/scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/transformers/utils/hub.py:127: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
/scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/transformers/utils/import_utils.py:575: FutureWarning: `is_torch_tpu_available` is deprecated and will be removed in 4.41.0. Please use the `is_torch_xla_available` instead.
  warnings.warn(
12/25/2024 15:21:27 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: True, fp16 training: False, bf16 training: True
12/25/2024 15:21:27 - INFO - __main__ - Additional arguments ScriptArguments(model_family='llama', model_name_or_path='/scratch3/workspace/ctpham_umass_edu-ft/models/Llama-3-8B-ProLong-512k-Base/', config_overrides=None, config_overrides_json='', config_name='/scratch3/workspace/ctpham_umass_edu-ft/models/Llama-3-8B-ProLong-512k-Base/', tokenizer_name='/scratch3/workspace/ctpham_umass_edu-ft/models/Llama-3-8B-ProLong-512k-Base/', cache_dir=None, use_fast_tokenizer=False, model_revision='main', use_auth_token=False, tokenized_mds_train=['/work/pi_miyyer_umass_edu/ctpham/BookClaim-dev/data/ft/bookclaim_balanced_pack_complete'], tokenized_mds_validation=[], tokenized_mds_test=[], token_scaled_loss=True)
12/25/2024 15:21:27 - INFO - __main__ - Checkpoint detected, resuming training at /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-800. To avoid this behavior, change the `--output_dir` or add `--overwrite_output_dir` to train from scratch.
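The deprecation warning above names its own fix: replace `TRANSFORMERS_CACHE` with `HF_HOME`. A minimal sketch of that environment-variable migration; the cache path here is a placeholder assumption, not the run's actual directory:

```python
import os

# Hypothetical cache location; substitute the real scratch directory.
cache_root = "/scratch3/workspace/hf_cache"

# Drop the deprecated variable (slated for removal in Transformers v5)...
os.environ.pop("TRANSFORMERS_CACHE", None)
# ...and point HF_HOME at the same location, as the warning advises.
os.environ["HF_HOME"] = cache_root
```

Setting this before the first `transformers` import keeps the hub cache in the same place without triggering the FutureWarning on every rank.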
12/25/2024 15:21:27 - WARNING - __main__ - Process rank: 7, device: cuda:7, n_gpu: 1, distributed training: True, fp16 training: False, bf16 training: True
[INFO|tokenization_utils_base.py:2267] 2024-12-25 15:21:27,593 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2267] 2024-12-25 15:21:27,593 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2267] 2024-12-25 15:21:27,593 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2267] 2024-12-25 15:21:27,593 >> loading file tokenizer_config.json
12/25/2024 15:21:27 - WARNING - __main__ - Process rank: 1, device: cuda:1, n_gpu: 1, distributed training: True, fp16 training: False, bf16 training: True
12/25/2024 15:21:27 - WARNING - __main__ - Process rank: 2, device: cuda:2, n_gpu: 1, distributed training: True, fp16 training: False, bf16 training: True
12/25/2024 15:21:27 - WARNING - __main__ - Process rank: 6, device: cuda:6, n_gpu: 1, distributed training: True, fp16 training: False, bf16 training: True
12/25/2024 15:21:27 - WARNING - __main__ - Process rank: 5, device: cuda:5, n_gpu: 1, distributed training: True, fp16 training: False, bf16 training: True
12/25/2024 15:21:27 - WARNING - __main__ - Process rank: 3, device: cuda:3, n_gpu: 1, distributed training: True, fp16 training: False, bf16 training: True
12/25/2024 15:21:27 - WARNING - __main__ - Process rank: 4, device: cuda:4, n_gpu: 1, distributed training: True, fp16 training: False, bf16 training: True
[INFO|tokenization_utils_base.py:2513] 2024-12-25 15:21:27,915 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|configuration_utils.py:731] 2024-12-25 15:21:27,915 >> loading configuration file /scratch3/workspace/ctpham_umass_edu-ft/models/Llama-3-8B-ProLong-512k-Base/config.json
[INFO|configuration_utils.py:800] 2024-12-25 15:21:27,916 >> Model config LlamaConfig {
  "_name_or_path": "/scratch3/workspace/ctpham_umass_edu-ft/models/Llama-3-8B-ProLong-512k-Base/",
  "architectures": ["LlamaForCausalLM"],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128009,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 524288,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pad_token_id": 0,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 128000000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "float32",
  "transformers_version": "4.44.2",
  "use_cache": true,
  "vocab_size": 128256
}
12/25/2024 15:21:27 - INFO - __main__ - Loaded tokenizer - CPU Memory usage: 633.58 MB RSS
12/25/2024 15:21:28 - INFO - __main__ - GPU 0: 4/81920 MB
12/25/2024 15:21:28 - INFO - __main__ - GPU 1: 423/81920 MB
12/25/2024 15:21:28 - INFO - __main__ - GPU 2: 423/81920 MB
12/25/2024 15:21:28 - INFO - __main__ - GPU 3: 423/81920 MB
12/25/2024 15:21:28 - INFO - __main__ - GPU 4: 423/81920 MB
12/25/2024 15:21:28 - INFO - __main__ - GPU 5: 423/81920 MB
12/25/2024 15:21:28 - INFO - __main__ - GPU 6: 423/81920 MB
12/25/2024 15:21:28 - INFO - __main__ - GPU 7: 423/81920 MB
[INFO|modeling_utils.py:3675] 2024-12-25 15:21:28,123 >> loading weights file /scratch3/workspace/ctpham_umass_edu-ft/models/Llama-3-8B-ProLong-512k-Base/model.safetensors.index.json
[INFO|configuration_utils.py:1038] 2024-12-25 15:21:28,125 >> Generate config GenerationConfig {
  "bos_token_id": 128000,
  "eos_token_id": 128009,
  "pad_token_id": 0
}
Loading checkpoint shards:   0%|          | 0/7 [00:00<?, ?it/s]
All model checkpoint weights were used when initializing LlamaForCausalLM.
[INFO|modeling_utils.py:4515] 2024-12-25 15:21:29,050 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at /scratch3/workspace/ctpham_umass_edu-ft/models/Llama-3-8B-ProLong-512k-Base/. If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[INFO|modeling_utils.py:4003] 2024-12-25 15:21:29,053 >> Generation config file not found, using a generation config created from the model config.
12/25/2024 15:21:29 - INFO - __main__ - Model: LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(128256, 4096, padding_idx=0)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
          (distributed_attn_func): DistributedAttention()
          (distributed_varlen_attn_func): DistributedAttention()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm()
  )
  (lm_head): Linear(in_features=4096, out_features=128256, bias=False)
)
12/25/2024 15:21:29 - INFO - __main__ - Loaded model - CPU Memory usage: 658.08 MB RSS
Loading checkpoint shards: 100%|██████████| 7/7 [00:00<00:00, 8.23it/s]
Loading checkpoint shards: 100%|██████████| 7/7 [00:00<00:00, 8.16it/s]
Loading checkpoint shards: 100%|██████████| 7/7 [00:00<00:00, 8.53it/s]
Loading checkpoint shards: 100%|██████████| 7/7 [00:00<00:00, 8.28it/s]
Loading checkpoint shards: 100%|██████████| 7/7 [00:00<00:00, 8.52it/s]
Loading checkpoint shards: 100%|██████████| 7/7 [00:00<00:00, 8.14it/s]
Loading checkpoint shards: 100%|██████████| 7/7 [00:00<00:00, 8.48it/s]
Loading checkpoint shards: 100%|██████████| 7/7 [00:00<00:00, 8.12it/s]
12/25/2024 15:21:29 - INFO - __main__ - GPU 0: 4/81920 MB
12/25/2024 15:21:29 - INFO - __main__ - GPU 1: 423/81920 MB
12/25/2024 15:21:29 - INFO - __main__ - GPU 2: 423/81920 MB
12/25/2024 15:21:29 - INFO - __main__ - GPU 3: 423/81920 MB
12/25/2024 15:21:29 - INFO - __main__ - GPU 4: 423/81920 MB
12/25/2024 15:21:29 - INFO - __main__ - GPU 5: 423/81920 MB
12/25/2024 15:21:29 - INFO - __main__ - GPU 6: 425/81920 MB
12/25/2024 15:21:29 - INFO - __main__ - GPU 7: 423/81920 MB
12/25/2024 15:21:33 - INFO - training.dataset - Loading datasets for training
12/25/2024 15:21:33 - WARNING - streaming.base.dataset - Because `predownload` was not specified, it will default to 8*batch_size if batch_size is not None, otherwise 64. Prior to Streaming v0.7.0, `predownload` defaulted to max(batch_size, 256 * batch_size // num_canonical_nodes).
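The Streaming warning above (emitted once per rank) spells out exactly how the `predownload` default is computed in both the current and pre-0.7.0 versions. A small stand-alone sketch of that rule; the function names are mine for illustration, not part of Streaming's API:

```python
def predownload_default(batch_size):
    """Streaming >= 0.7.0 default, as described in the warning above."""
    return 8 * batch_size if batch_size is not None else 64

def predownload_default_pre_070(batch_size, num_canonical_nodes):
    """Pre-0.7.0 default, also quoted in the warning."""
    return max(batch_size, 256 * batch_size // num_canonical_nodes)

# With this run's per-device batch size of 1:
print(predownload_default(1))             # 8
print(predownload_default(None))          # 64
print(predownload_default_pre_070(1, 1))  # max(1, 256) -> 256
```

Per the warning text, passing `predownload` explicitly when constructing the dataset silences the message and pins the behavior across Streaming versions.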
12/25/2024 15:21:33 - INFO - __main__ - Loaded training dataset
[INFO|trainer.py:648] 2024-12-25 15:21:33,924 >> Using auto half precision backend
[INFO|trainer.py:262] 2024-12-25 15:21:33,925 >> Using world as sequence parallel group
12/25/2024 15:21:33 - INFO - __main__ - Trainer created
12/25/2024 15:21:33 - INFO - __main__ - Training/evaluation parameters TrainingArguments(
_n_gpu=1,
accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False},
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.95,
adam_epsilon=1e-08,
auto_find_batch_size=False,
batch_eval_metrics=False,
bf16=True,
bf16_full_eval=False,
cuda_empty_cache=True,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=1,
dataloader_persistent_workers=False,
dataloader_pin_memory=True,
dataloader_prefetch_factor=None,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=False,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=True,
dispatch_batches=None,
do_eval=False,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_do_concat_batches=True,
eval_on_start=False,
eval_steps=None,
eval_strategy=no,
eval_use_gather_object=False,
evaluation_strategy=None,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[, ],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=2,
gradient_checkpointing=True,
gradient_checkpointing_kwargs=None,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_always_push=False,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=,
ignore_data_skip=False,
include_inputs_for_metrics=False,
include_num_input_tokens_seen=False,
include_tokens_per_second=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=1e-06,
length_column_name=length,
load_best_model_at_end=False,
local_rank=0,
log_level=info,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=/scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/runs/Dec25_15-21-22_gpu020,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=1.0,
logging_strategy=steps,
lr_scheduler_kwargs={},
lr_scheduler_type=cosine,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
min_lr_ratio=0.1,
mp_parameters=,
neftune_noise_alpha=None,
no_cuda=False,
num_train_epochs=1.0,
optim=adamw_torch,
optim_args=None,
optim_target_modules=None,
output_dir=/scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_,
overwrite_output_dir=False,
past_index=-1,
per_device_eval_batch_size=8,
per_device_train_batch_size=1,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=,
ray_scope=last,
remove_unused_columns=False,
report_to=['wandb'],
restore_callback_states_from_checkpoint=False,
resume_from_checkpoint=None,
run_name=balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_,
save_on_each_node=False,
save_only_model=False,
save_safetensors=True,
save_steps=200,
save_strategy=steps,
save_total_limit=None,
seed=42,
seq_parallel_size=8,
skip_memory_metrics=True,
split_batches=None,
streaming_dataset=True,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torch_empty_cache_steps=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_cpu=False,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.05,
warmup_steps=0,
weight_decay=0.1,
)
12/25/2024 15:21:33 - INFO - __main__ - *** Train ***
[WARNING|trainer.py:795] 2024-12-25 15:21:34,066 >> Use
streaming dataloader for train
[INFO|trainer.py:2526] 2024-12-25 15:21:47,974 >> Loading model from /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-800.
/scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:689: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict . Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
  warnings.warn(
/scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/accelerate/utils/fsdp_utils.py:126: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  state_dict = torch.load(input_model_file)
[WARNING|trainer.py:901] 2024-12-25 15:23:48,230 >> Load optimizer state from /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-800
/scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/accelerate/utils/fsdp_utils.py:209: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  optim_state = torch.load(input_optimizer_file)
[WARNING|trainer.py:921] 2024-12-25 15:27:38,368 >> Load scheduler state from /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-800
[WARNING|trainer.py:942] 2024-12-25 15:27:38,370 >> Resume streaming dataset state from /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-800: {'epoch': 0, 'sample_in_epoch': 1600, 'num_canonical_nodes': 1, 'shuffle_seed': 42, 'initial_physical_nodes': 1}
/scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/transformers/trainer.py:2833: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  checkpoint_rng_state = torch.load(rng_file)
This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature. checkpoint_rng_state = torch.load(rng_file) [WARNING|trainer.py:921] 2024-12-25 15:27:38,413 >> Load scheduler state from /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-800 /scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/transformers/trainer.py:2833: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature. 
checkpoint_rng_state = torch.load(rng_file) [WARNING|trainer.py:942] 2024-12-25 15:27:38,414 >> Resume streaming dataset state from /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-800: {'epoch': 0, 'sample_in_epoch': 1600, 'num_canonical_nodes': 1, 'shuffle_seed': 42, 'initial_physical_nodes': 1} /scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/transformers/trainer.py:2833: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature. 
checkpoint_rng_state = torch.load(rng_file) [WARNING|trainer.py:921] 2024-12-25 15:27:38,456 >> Load scheduler state from /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-800 [WARNING|trainer.py:942] 2024-12-25 15:27:38,456 >> Resume streaming dataset state from /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-800: {'epoch': 0, 'sample_in_epoch': 1600, 'num_canonical_nodes': 1, 'shuffle_seed': 42, 'initial_physical_nodes': 1} [WARNING|trainer.py:921] 2024-12-25 15:27:38,464 >> Load scheduler state from /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-800 [WARNING|trainer.py:942] 2024-12-25 15:27:38,465 >> Resume streaming dataset state from /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-800: {'epoch': 0, 'sample_in_epoch': 1600, 'num_canonical_nodes': 1, 'shuffle_seed': 42, 'initial_physical_nodes': 1} /scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/transformers/trainer.py:2833: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. 
Please open an issue on GitHub for any issues related to this experimental feature. checkpoint_rng_state = torch.load(rng_file) [WARNING|trainer.py:921] 2024-12-25 15:27:38,469 >> Load scheduler state from /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-800 [WARNING|trainer.py:942] 2024-12-25 15:27:38,470 >> Resume streaming dataset state from /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-800: {'epoch': 0, 'sample_in_epoch': 1600, 'num_canonical_nodes': 1, 'shuffle_seed': 42, 'initial_physical_nodes': 1} /scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/transformers/trainer.py:2833: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature. checkpoint_rng_state = torch.load(rng_file) /scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/transformers/trainer.py:2833: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. 
It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature. checkpoint_rng_state = torch.load(rng_file) 12/25/2024 15:27:38 - WARNING - streaming.base.dataset - Because `shuffle_block_size` was not specified, it will default to max(4_000_000 // num_canonical_nodes, 1 << 18) if num_canonical_nodes is not None, otherwise 262144. Prior to Streaming v0.7.0, `shuffle_block_size` defaulted to 262144. 12/25/2024 15:27:38 - WARNING - streaming.base.dataset - Because `shuffle_block_size` was not specified, it will default to max(4_000_000 // num_canonical_nodes, 1 << 18) if num_canonical_nodes is not None, otherwise 262144. Prior to Streaming v0.7.0, `shuffle_block_size` defaulted to 262144. 12/25/2024 15:27:38 - WARNING - streaming.base.dataset - Because `shuffle_block_size` was not specified, it will default to max(4_000_000 // num_canonical_nodes, 1 << 18) if num_canonical_nodes is not None, otherwise 262144. Prior to Streaming v0.7.0, `shuffle_block_size` defaulted to 262144. 12/25/2024 15:27:38 - WARNING - streaming.base.dataset - Because `shuffle_block_size` was not specified, it will default to max(4_000_000 // num_canonical_nodes, 1 << 18) if num_canonical_nodes is not None, otherwise 262144. Prior to Streaming v0.7.0, `shuffle_block_size` defaulted to 262144. 
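The repeated `torch.load` FutureWarnings above all point at the same fix: pass `weights_only=True` so unpickling is restricted to an allowlist of safe types. The underlying mechanism can be sketched with the stdlib `pickle` module alone (a simplified illustration of allowlisted unpickling, not PyTorch's actual implementation; `AllowlistUnpickler` and `safe_loads` are hypothetical names):

```python
import io
import pickle

class AllowlistUnpickler(pickle.Unpickler):
    """Illustrative unpickler: refuse any global not on an explicit allowlist."""
    ALLOWED = {("collections", "OrderedDict")}  # e.g. what a plain state dict needs

    def find_class(self, module, name):
        # Called whenever the pickle stream asks to import something.
        if (module, name) not in self.ALLOWED:
            raise pickle.UnpicklingError(f"blocked global: {module}.{name}")
        return super().find_class(module, name)

def safe_loads(data: bytes):
    return AllowlistUnpickler(io.BytesIO(data)).load()

# Plain containers of primitives need no globals, so they load fine:
state = safe_loads(pickle.dumps({"epoch": 0, "sample_in_epoch": 1600}))

# Anything that must import a module attribute (e.g. a function) is rejected:
try:
    safe_loads(pickle.dumps(print))
    blocked = ""
except pickle.UnpicklingError as err:
    blocked = str(err)
```

PyTorch's `weights_only=True` applies the same idea to checkpoint files, with `torch.serialization.add_safe_globals` extending the allowlist when a checkpoint legitimately contains extra types.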
[INFO|trainer.py:2134] 2024-12-25 15:27:42,938 >> ***** Running training *****
[INFO|trainer.py:2135] 2024-12-25 15:27:42,938 >> Num examples = 7,374
[INFO|trainer.py:2136] 2024-12-25 15:27:42,938 >> Num Epochs = 1
[INFO|trainer.py:2137] 2024-12-25 15:27:42,938 >> Instantaneous batch size per device = 1
[INFO|trainer.py:2140] 2024-12-25 15:27:42,938 >> Total train batch size (w. parallel, distributed & accumulation) = 16
[INFO|trainer.py:2141] 2024-12-25 15:27:42,938 >> Gradient Accumulation steps = 2
[INFO|trainer.py:2142] 2024-12-25 15:27:42,938 >> Total optimization steps = 3,687
[INFO|trainer.py:2143] 2024-12-25 15:27:42,940 >> Number of trainable parameters = 1,003,782,656
[INFO|trainer.py:2165] 2024-12-25 15:27:42,942 >> Continuing training from checkpoint, will skip to saved global_step
[INFO|trainer.py:2166] 2024-12-25 15:27:42,942 >> Continuing training from epoch 0
[INFO|trainer.py:2167] 2024-12-25 15:27:42,942 >> Continuing training from global step 800
[INFO|integration_utils.py:807] 2024-12-25 15:27:42,944 >> Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
wandb: Currently logged in as: chtmp223. Use `wandb login --relogin` to force relogin
wandb: wandb version 0.19.1 is available! To upgrade, please run:
wandb: $ pip install wandb --upgrade
wandb: Tracking run with wandb version 0.17.3
wandb: Run data is saved locally in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/wandb/run-20241225_152744-sac86tlv
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_
wandb: ⭐️ View project at https://wandb.ai/chtmp223/prolong
wandb: 🚀 View run at https://wandb.ai/chtmp223/prolong/runs/sac86tlv
12/25/2024 15:27:52 - WARNING - streaming.base.dataset - The `replication` arg has been set to 8 and training is resuming from sample 1600. Make sure you are accounting for sample replication when using StreamingDataset's `state_dict` method for deterministic resumption. Otherwise, you will resume training from the wrong sample.
/scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/torch/utils/checkpoint.py:1399: FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead.
  with device_autocast_ctx, torch.cpu.amp.autocast(**cpu_autocast_kwargs), recompute_context:  # type: ignore[attr-defined]
[INFO|trainer.py:175] 2024-12-25 15:28:35,986 >> {'loss': 0.3659, 'grad_norm': 14.524723052978516, 'learning_rate': 9.33022124140483e-07, 'epoch': 0.00027122321670735016, 'num_input_tokens_seen': 1679818752, 'completed': '21.72% (801 / 3_687)', 'remaining time': '34:45:09', 'throughput': '3023.54', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 15:29:06,898 >> {'loss': 0.3889, 'grad_norm': 12.789175987243652, 'learning_rate': 9.32810071644894e-07, 'epoch': 0.0005424464334147003, 'num_input_tokens_seen': 1681915904, 'completed': '21.75% (802 / 3_687)', 'remaining time': '29:45:24', 'throughput': '8480.25', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 15:29:37,539 >> {'loss': 0.3584, 'grad_norm': 15.237256050109863, 'learning_rate': 9.325977110783263e-07, 'epoch': 0.0008136696501220504, 'num_input_tokens_seen': 1684013056, 'completed': '21.78% (803 / 3_687)', 'remaining time': '28:00:47', 'throughput': '8555.47', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 15:30:04,732 >> {'loss': 0.5716, 'grad_norm': 17.71392250061035, 'learning_rate': 9.323850426116797e-07, 'epoch': 0.0010848928668294006, 'num_input_tokens_seen': 1686110208, 'completed': '21.81% (804 / 3_687)', 'remaining time': '26:26:48', 'throughput': '9640.04', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 15:30:36,798 >> {'loss': 0.5411, 'grad_norm': 20.835195541381836, 'learning_rate': 9.321720664161017e-07, 'epoch': 0.0013561160835367507, 'num_input_tokens_seen': 1688207360,
'completed': '21.83% (805 / 3_687)', 'remaining time': '26:17:02', 'throughput': '8175.29', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 15:31:05,593 >> {'loss': 0.5051, 'grad_norm': 20.98845100402832, 'learning_rate': 9.319587826629872e-07, 'epoch': 0.0016273393002441008, 'num_input_tokens_seen': 1690304512, 'completed': '21.86% (806 / 3_687)', 'remaining time': '25:44:11', 'throughput': '9103.82', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 15:31:37,803 >> {'loss': 0.6283, 'grad_norm': 17.86595916748047, 'learning_rate': 9.317451915239792e-07, 'epoch': 0.001898562516951451, 'num_input_tokens_seen': 1692401664, 'completed': '21.89% (807 / 3_687)', 'remaining time': '25:44:00', 'throughput': '8138.46', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 15:32:10,895 >> {'loss': 0.5475, 'grad_norm': 19.295974731445312, 'learning_rate': 9.315312931709674e-07, 'epoch': 0.0021697857336588013, 'num_input_tokens_seen': 1694498816, 'completed': '21.91% (808 / 3_687)', 'remaining time': '25:49:00', 'throughput': '7921.76', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 15:32:45,862 >> {'loss': 0.6961, 'grad_norm': 20.173925399780273, 'learning_rate': 9.313170877760892e-07, 'epoch': 0.0024410089503661514, 'num_input_tokens_seen': 1696595968, 'completed': '21.94% (809 / 3_687)', 'remaining time': '26:02:47', 'throughput': '7496.72', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 15:33:16,398 >> {'loss': 0.3755, 'grad_norm': 13.733884811401367, 'learning_rate': 9.311025755117291e-07, 'epoch': 0.0027122321670735015, 'num_input_tokens_seen': 1698693120, 'completed': '21.97% (810 / 3_687)', 'remaining time': '25:52:26', 'throughput': '8584.93', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 15:33:47,984 >> {'loss': 0.6359, 'grad_norm': 19.169233322143555, 'learning_rate': 9.308877565505181e-07, 'epoch': 0.0029834553837808516, 'num_input_tokens_seen': 1700790272, 'completed': '22.00% (811 / 3_687)',
'remaining time': '25:48:27', 'throughput': '8299.33', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 15:34:19,597 >> {'loss': 0.3631, 'grad_norm': 16.250288009643555, 'learning_rate': 9.306726310653346e-07, 'epoch': 0.0032546786004882017, 'num_input_tokens_seen': 1702887424, 'completed': '22.02% (812 / 3_687)', 'remaining time': '25:45:09', 'throughput': '8292.22', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 15:34:50,554 >> {'loss': 0.672, 'grad_norm': 34.65564727783203, 'learning_rate': 9.304571992293032e-07, 'epoch': 0.003525901817195552, 'num_input_tokens_seen': 1704984576, 'completed': '22.05% (813 / 3_687)', 'remaining time': '25:39:52', 'throughput': '8468.13', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 15:35:21,194 >> {'loss': 0.9443, 'grad_norm': 30.45555877685547, 'learning_rate': 9.302414612157954e-07, 'epoch': 0.003797125033902902, 'num_input_tokens_seen': 1707081728, 'completed': '22.08% (814 / 3_687)', 'remaining time': '25:34:10', 'throughput': '8555.66', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 15:35:50,992 >> {'loss': 0.5574, 'grad_norm': 19.15201759338379, 'learning_rate': 9.300254171984289e-07, 'epoch': 0.0040683482506102524, 'num_input_tokens_seen': 1709178880, 'completed': '22.10% (815 / 3_687)', 'remaining time': '25:26:29', 'throughput': '8797.37', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 15:36:24,017 >> {'loss': 0.724, 'grad_norm': 29.86651039123535, 'learning_rate': 9.298090673510677e-07, 'epoch': 0.0043395714673176026, 'num_input_tokens_seen': 1711276032, 'completed': '22.13% (816 / 3_687)', 'remaining time': '25:29:20', 'throughput': '7937.74', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 15:36:56,450 >> {'loss': 0.6076, 'grad_norm': 18.136457443237305, 'learning_rate': 9.295924118478218e-07, 'epoch': 0.004610794684024953, 'num_input_tokens_seen': 1713373184, 'completed': '22.16% (817 / 3_687)', 'remaining time': '25:30:08', 'throughput':
'8082.53', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 15:37:23,348 >> {'loss': 0.7823, 'grad_norm': 19.57548713684082, 'learning_rate': 9.293754508630473e-07, 'epoch': 0.004882017900732303, 'num_input_tokens_seen': 1715470336, 'completed': '22.19% (818 / 3_687)', 'remaining time': '25:16:05', 'throughput': '9745.83', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 15:37:52,391 >> {'loss': 0.4778, 'grad_norm': 14.906988143920898, 'learning_rate': 9.291581845713466e-07, 'epoch': 0.005153241117439653, 'num_input_tokens_seen': 1717567488, 'completed': '22.21% (819 / 3_687)', 'remaining time': '25:08:51', 'throughput': '9026.07', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 15:38:24,390 >> {'loss': 0.3655, 'grad_norm': 14.388298034667969, 'learning_rate': 9.289406131475665e-07, 'epoch': 0.005424464334147003, 'num_input_tokens_seen': 1719664640, 'completed': '22.24% (820 / 3_687)', 'remaining time': '25:09:21', 'throughput': '8192.32', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 15:38:53,145 >> {'loss': 0.6397, 'grad_norm': 19.326364517211914, 'learning_rate': 9.287227367668012e-07, 'epoch': 0.005695687550854353, 'num_input_tokens_seen': 1721761792, 'completed': '22.27% (821 / 3_687)', 'remaining time': '25:02:23', 'throughput': '9116.55', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 15:39:26,625 >> {'loss': 0.3671, 'grad_norm': 12.356776237487793, 'learning_rate': 9.285045556043885e-07, 'epoch': 0.005966910767561703, 'num_input_tokens_seen': 1723858944, 'completed': '22.29% (822 / 3_687)', 'remaining time': '25:06:16', 'throughput': '7829.66', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 15:39:53,398 >> {'loss': 0.763, 'grad_norm': 18.93061065673828, 'learning_rate': 9.282860698359128e-07, 'epoch': 0.006238133984269053, 'num_input_tokens_seen': 1725956096, 'completed': '22.32% (823 / 3_687)', 'remaining time': '24:55:50', 'throughput': '9791.56', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 15:40:20,749 >> {'loss': 0.9767, 'grad_norm': 23.214996337890625, 'learning_rate': 9.280672796372029e-07, 'epoch': 0.006509357200976403, 'num_input_tokens_seen': 1728053248, 'completed': '22.35% (824 / 3_687)', 'remaining time': '24:47:23', 'throughput': '9584.33', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 15:40:49,260 >> {'loss': 0.6971, 'grad_norm': 21.05919075012207, 'learning_rate': 9.278481851843327e-07, 'epoch': 0.0067805804176837535, 'num_input_tokens_seen': 1730150400, 'completed': '22.38% (825 / 3_687)', 'remaining time': '24:41:48', 'throughput': '9194.39', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 15:41:20,565 >> {'loss': 0.6836, 'grad_norm': 20.329505920410156, 'learning_rate': 9.276287866536215e-07, 'epoch': 0.007051803634391104, 'num_input_tokens_seen': 1732247552, 'completed': '22.40% (826 / 3_687)', 'remaining time': '24:41:43', 'throughput': '8374.10', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 15:41:50,661 >> {'loss': 0.3354, 'grad_norm': 13.536745071411133, 'learning_rate': 9.274090842216326e-07, 'epoch': 0.007323026851098454, 'num_input_tokens_seen': 1734344704, 'completed': '22.43% (827 / 3_687)', 'remaining time': '24:39:28', 'throughput': '8710.18', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 15:42:22,992 >> {'loss': 0.648, 'grad_norm': 19.788000106811523, 'learning_rate': 9.271890780651741e-07, 'epoch': 0.007594250067805804, 'num_input_tokens_seen': 1736441856, 'completed': '22.46% (828 / 3_687)', 'remaining time': '24:41:09', 'throughput': '8108.01', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 15:42:54,138 >> {'loss': 0.2731, 'grad_norm': 11.306535720825195, 'learning_rate': 9.269687683612987e-07, 'epoch': 0.007865473284513154, 'num_input_tokens_seen': 1738539008, 'completed': '22.48% (829 / 3_687)', 'remaining time': '24:40:44', 'throughput': '8416.76', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 15:43:23,485 >>
{'loss': 0.4084, 'grad_norm': 20.76605796813965, 'learning_rate': 9.267481552873033e-07, 'epoch': 0.008136696501220505, 'num_input_tokens_seen': 1740636160, 'completed': '22.51% (830 / 3_687)', 'remaining time': '24:37:27', 'throughput': '8932.63', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 15:43:53,981 >> {'loss': 0.4862, 'grad_norm': 17.467897415161133, 'learning_rate': 9.265272390207289e-07, 'epoch': 0.008407919717927854, 'num_input_tokens_seen': 1742733312, 'completed': '22.54% (831 / 3_687)', 'remaining time': '24:36:07', 'throughput': '8596.04', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 15:44:22,284 >> {'loss': 0.5836, 'grad_norm': 21.455224990844727, 'learning_rate': 9.263060197393603e-07, 'epoch': 0.008679142934635205, 'num_input_tokens_seen': 1744830464, 'completed': '22.57% (832 / 3_687)', 'remaining time': '24:31:35', 'throughput': '9262.05', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 15:44:49,520 >> {'loss': 0.9138, 'grad_norm': 22.814332962036133, 'learning_rate': 9.260844976212268e-07, 'epoch': 0.008950366151342554, 'num_input_tokens_seen': 1746927616, 'completed': '22.59% (833 / 3_687)', 'remaining time': '24:25:45', 'throughput': '9624.88', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 15:45:22,809 >> {'loss': 0.3955, 'grad_norm': 13.567204475402832, 'learning_rate': 9.258626728446004e-07, 'epoch': 0.009221589368049905, 'num_input_tokens_seen': 1749024768, 'completed': '22.62% (834 / 3_687)', 'remaining time': '24:28:41', 'throughput': '7874.65', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 15:45:50,201 >> {'loss': 0.7177, 'grad_norm': 19.025236129760742, 'learning_rate': 9.256405455879977e-07, 'epoch': 0.009492812584757255, 'num_input_tokens_seen': 1751121920, 'completed': '22.65% (835 / 3_687)', 'remaining time': '24:23:26', 'throughput': '9570.26', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 15:46:19,510 >> {'loss': 0.6642, 'grad_norm': 22.544050216674805,
'learning_rate': 9.25418116030178e-07, 'epoch': 0.009764035801464606, 'num_input_tokens_seen': 1753219072, 'completed': '22.67% (836 / 3_687)', 'remaining time': '24:20:58', 'throughput': '8944.18', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 15:46:50,922 >> {'loss': 0.3442, 'grad_norm': 13.098361015319824, 'learning_rate': 9.251953843501443e-07, 'epoch': 0.010035259018171955, 'num_input_tokens_seen': 1755316224, 'completed': '22.70% (837 / 3_687)', 'remaining time': '24:21:18', 'throughput': '8345.31', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 15:47:21,394 >> {'loss': 0.5423, 'grad_norm': 20.6568660736084, 'learning_rate': 9.249723507271425e-07, 'epoch': 0.010306482234879306, 'num_input_tokens_seen': 1757413376, 'completed': '22.73% (838 / 3_687)', 'remaining time': '24:20:26', 'throughput': '8602.70', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 15:47:51,979 >> {'loss': 0.5084, 'grad_norm': 17.13654899597168, 'learning_rate': 9.247490153406617e-07, 'epoch': 0.010577705451586655, 'num_input_tokens_seen': 1759510528, 'completed': '22.76% (839 / 3_687)', 'remaining time': '24:19:42', 'throughput': '8570.94', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 15:48:23,595 >> {'loss': 0.3407, 'grad_norm': 16.935338973999023, 'learning_rate': 9.245253783704334e-07, 'epoch': 0.010848928668294006, 'num_input_tokens_seen': 1761607680, 'completed': '22.78% (840 / 3_687)', 'remaining time': '24:20:13', 'throughput': '8291.62', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 15:48:53,372 >> {'loss': 0.3577, 'grad_norm': 12.196868896484375, 'learning_rate': 9.243014399964324e-07, 'epoch': 0.011120151885001357, 'num_input_tokens_seen': 1763704832, 'completed': '22.81% (841 / 3_687)', 'remaining time': '24:18:33', 'throughput': '8803.49', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 15:49:24,349 >> {'loss': 0.3152, 'grad_norm': 10.492193222045898, 'learning_rate': 9.240772003988758e-07, 'epoch':
0.011391375101708706, 'num_input_tokens_seen': 1765801984, 'completed': '22.84% (842 / 3_687)', 'remaining time': '24:18:18', 'throughput': '8462.62', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 15:49:54,691 >> {'loss': 0.4958, 'grad_norm': 17.036666870117188, 'learning_rate': 9.238526597582229e-07, 'epoch': 0.011662598318416057, 'num_input_tokens_seen': 1767899136, 'completed': '22.86% (843 / 3_687)', 'remaining time': '24:17:20', 'throughput': '8639.70', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 15:50:23,631 >> {'loss': 0.6916, 'grad_norm': 26.499019622802734, 'learning_rate': 9.236278182551758e-07, 'epoch': 0.011933821535123406, 'num_input_tokens_seen': 1769996288, 'completed': '22.89% (844 / 3_687)', 'remaining time': '24:14:52', 'throughput': '9058.05', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 15:50:51,039 >> {'loss': 0.5991, 'grad_norm': 18.293750762939453, 'learning_rate': 9.23402676070678e-07, 'epoch': 0.012205044751830757, 'num_input_tokens_seen': 1772093440, 'completed': '22.92% (845 / 3_687)', 'remaining time': '24:10:53', 'throughput': '9564.52', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 15:51:22,677 >> {'loss': 0.6295, 'grad_norm': 19.971363067626953, 'learning_rate': 9.231772333859154e-07, 'epoch': 0.012476267968538107, 'num_input_tokens_seen': 1774190592, 'completed': '22.95% (846 / 3_687)', 'remaining time': '24:11:25', 'throughput': '8285.82', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 15:51:52,254 >> {'loss': 0.3458, 'grad_norm': 15.316537857055664, 'learning_rate': 9.22951490382316e-07, 'epoch': 0.012747491185245458, 'num_input_tokens_seen': 1776287744, 'completed': '22.97% (847 / 3_687)', 'remaining time': '24:09:49', 'throughput': '8863.05', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 15:52:23,039 >> {'loss': 0.3044, 'grad_norm': 13.23933219909668, 'learning_rate': 9.22725447241549e-07, 'epoch': 0.013018714401952807, 'num_input_tokens_seen': 
1778384896, 'completed': '23.00% (848 / 3_687)', 'remaining time': '24:09:28', 'throughput': '8515.28', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 15:52:56,342 >> {'loss': 0.4336, 'grad_norm': 17.797571182250977, 'learning_rate': 9.224991041455252e-07, 'epoch': 0.013289937618660158, 'num_input_tokens_seen': 1780482048, 'completed': '23.03% (849 / 3_687)', 'remaining time': '24:11:32', 'throughput': '7871.39', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 15:53:26,337 >> {'loss': 0.6113, 'grad_norm': 17.58323860168457, 'learning_rate': 9.222724612763971e-07, 'epoch': 0.013561160835367507, 'num_input_tokens_seen': 1782579200, 'completed': '23.05% (850 / 3_687)', 'remaining time': '24:10:22', 'throughput': '8739.74', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 15:53:54,885 >> {'loss': 0.6039, 'grad_norm': 17.832801818847656, 'learning_rate': 9.220455188165582e-07, 'epoch': 0.013832384052074858, 'num_input_tokens_seen': 1784676352, 'completed': '23.08% (851 / 3_687)', 'remaining time': '24:07:53', 'throughput': '9182.49', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 15:54:24,703 >> {'loss': 0.7489, 'grad_norm': 23.844926834106445, 'learning_rate': 9.218182769486433e-07, 'epoch': 0.014103607268782207, 'num_input_tokens_seen': 1786773504, 'completed': '23.11% (852 / 3_687)', 'remaining time': '24:06:38', 'throughput': '8791.40', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 15:54:52,313 >> {'loss': 0.615, 'grad_norm': 16.524341583251953, 'learning_rate': 9.215907358555276e-07, 'epoch': 0.014374830485489558, 'num_input_tokens_seen': 1788870656, 'completed': '23.14% (853 / 3_687)', 'remaining time': '24:03:26', 'throughput': '9494.59', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 15:55:24,214 >> {'loss': 0.867, 'grad_norm': 24.72736358642578, 'learning_rate': 9.213628957203277e-07, 'epoch': 0.014646053702196907, 'num_input_tokens_seen': 1790967808, 'completed': '23.16% (854 / 3_687)', 
'remaining time': '24:04:06', 'throughput': '8217.57', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 15:55:54,594 >> {'loss': 0.8105, 'grad_norm': 25.9926700592041, 'learning_rate': 9.21134756726401e-07, 'epoch': 0.014917276918904258, 'num_input_tokens_seen': 1793064960, 'completed': '23.19% (855 / 3_687)', 'remaining time': '24:03:25', 'throughput': '8628.67', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 15:56:25,075 >> {'loss': 0.5625, 'grad_norm': 20.51833724975586, 'learning_rate': 9.209063190573445e-07, 'epoch': 0.015188500135611608, 'num_input_tokens_seen': 1795162112, 'completed': '23.22% (856 / 3_687)', 'remaining time': '24:02:49', 'throughput': '8600.28', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 15:56:57,529 >> {'loss': 0.4131, 'grad_norm': 15.484442710876465, 'learning_rate': 9.206775828969967e-07, 'epoch': 0.015459723352318959, 'num_input_tokens_seen': 1797259264, 'completed': '23.24% (857 / 3_687)', 'remaining time': '24:03:52', 'throughput': '8077.55', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 15:57:29,277 >> {'loss': 0.3329, 'grad_norm': 12.139299392700195, 'learning_rate': 9.204485484294355e-07, 'epoch': 0.015730946569026308, 'num_input_tokens_seen': 1799356416, 'completed': '23.27% (858 / 3_687)', 'remaining time': '24:04:17', 'throughput': '8256.88', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 15:57:59,433 >> {'loss': 0.5659, 'grad_norm': 15.683771133422852, 'learning_rate': 9.202192158389791e-07, 'epoch': 0.01600216978573366, 'num_input_tokens_seen': 1801453568, 'completed': '23.30% (859 / 3_687)', 'remaining time': '24:03:23', 'throughput': '8693.03', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 15:58:30,295 >> {'loss': 0.6709, 'grad_norm': 21.66155242919922, 'learning_rate': 9.199895853101856e-07, 'epoch': 0.01627339300244101, 'num_input_tokens_seen': 1803550720, 'completed': '23.33% (860 / 3_687)', 'remaining time': '24:03:04', 'throughput': '8493.97', 
'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 15:58:58,447 >> {'loss': 0.7169, 'grad_norm': 19.560569763183594, 'learning_rate': 9.197596570278529e-07, 'epoch': 0.01654461621914836, 'num_input_tokens_seen': 1805647872, 'completed': '23.35% (861 / 3_687)', 'remaining time': '24:00:39', 'throughput': '9311.76', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 15:59:28,555 >> {'loss': 0.3676, 'grad_norm': 14.26892375946045, 'learning_rate': 9.19529431177019e-07, 'epoch': 0.016815839435855708, 'num_input_tokens_seen': 1807745024, 'completed': '23.38% (862 / 3_687)', 'remaining time': '23:59:46', 'throughput': '8706.71', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 15:59:57,537 >> {'loss': 0.1885, 'grad_norm': 9.662439346313477, 'learning_rate': 9.192989079429603e-07, 'epoch': 0.01708706265256306, 'num_input_tokens_seen': 1809842176, 'completed': '23.41% (863 / 3_687)', 'remaining time': '23:58:04', 'throughput': '9045.20', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:00:28,114 >> {'loss': 0.4003, 'grad_norm': 16.50687026977539, 'learning_rate': 9.190680875111934e-07, 'epoch': 0.01735828586927041, 'num_input_tokens_seen': 1811939328, 'completed': '23.43% (864 / 3_687)', 'remaining time': '23:57:34', 'throughput': '8573.20', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:01:01,622 >> {'loss': 0.4239, 'grad_norm': 16.06983757019043, 'learning_rate': 9.188369700674735e-07, 'epoch': 0.01762950908597776, 'num_input_tokens_seen': 1814036480, 'completed': '23.46% (865 / 3_687)', 'remaining time': '23:59:12', 'throughput': '7823.39', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:01:32,933 >> {'loss': 0.5002, 'grad_norm': 17.57181739807129, 'learning_rate': 9.186055557977957e-07, 'epoch': 0.01790073230268511, 'num_input_tokens_seen': 1816133632, 'completed': '23.49% (866 / 3_687)', 'remaining time': '23:59:12', 'throughput': '8372.29', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 
16:02:01,295 >> {'loss': 0.6752, 'grad_norm': 22.786888122558594, 'learning_rate': 9.18373844888393e-07, 'epoch': 0.01817195551939246, 'num_input_tokens_seen': 1818230784, 'completed': '23.52% (867 / 3_687)', 'remaining time': '23:57:07', 'throughput': '9242.83', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:02:32,492 >> {'loss': 0.477, 'grad_norm': 15.312600135803223, 'learning_rate': 9.181418375257374e-07, 'epoch': 0.01844317873609981, 'num_input_tokens_seen': 1820327936, 'completed': '23.54% (868 / 3_687)', 'remaining time': '23:57:02', 'throughput': '8402.63', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:03:00,117 >> {'loss': 0.6745, 'grad_norm': 21.413515090942383, 'learning_rate': 9.179095338965401e-07, 'epoch': 0.01871440195280716, 'num_input_tokens_seen': 1822425088, 'completed': '23.57% (869 / 3_687)', 'remaining time': '23:54:30', 'throughput': '9489.46', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:03:31,859 >> {'loss': 0.4547, 'grad_norm': 14.375008583068848, 'learning_rate': 9.176769341877497e-07, 'epoch': 0.01898562516951451, 'num_input_tokens_seen': 1824522240, 'completed': '23.60% (870 / 3_687)', 'remaining time': '23:54:48', 'throughput': '8258.69', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:04:00,965 >> {'loss': 0.5549, 'grad_norm': 18.940818786621094, 'learning_rate': 9.17444038586554e-07, 'epoch': 0.01925684838622186, 'num_input_tokens_seen': 1826619392, 'completed': '23.62% (871 / 3_687)', 'remaining time': '23:53:20', 'throughput': '9006.42', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:04:31,371 >> {'loss': 0.6916, 'grad_norm': 22.97831916809082, 'learning_rate': 9.172108472803782e-07, 'epoch': 0.01952807160292921, 'num_input_tokens_seen': 1828716544, 'completed': '23.65% (872 / 3_687)', 'remaining time': '23:52:44', 'throughput': '8621.39', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:05:03,161 >> {'loss': 0.464, 'grad_norm': 
33.489131927490234, 'learning_rate': 9.16977360456886e-07, 'epoch': 0.019799294819636562, 'num_input_tokens_seen': 1830813696, 'completed': '23.68% (873 / 3_687)', 'remaining time': '23:53:02', 'throughput': '8246.14', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:05:35,377 >> {'loss': 0.3587, 'grad_norm': 12.543787002563477, 'learning_rate': 9.167435783039786e-07, 'epoch': 0.02007051803634391, 'num_input_tokens_seen': 1832910848, 'completed': '23.70% (874 / 3_687)', 'remaining time': '23:53:34', 'throughput': '8137.22', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:06:04,922 >> {'loss': 0.5598, 'grad_norm': 16.099515914916992, 'learning_rate': 9.165095010097949e-07, 'epoch': 0.02034174125305126, 'num_input_tokens_seen': 1835008000, 'completed': '23.73% (875 / 3_687)', 'remaining time': '23:52:25', 'throughput': '8872.72', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:06:35,393 >> {'loss': 0.645, 'grad_norm': 18.598281860351562, 'learning_rate': 9.162751287627116e-07, 'epoch': 0.02061296446975861, 'num_input_tokens_seen': 1837105152, 'completed': '23.76% (876 / 3_687)', 'remaining time': '23:51:51', 'throughput': '8603.05', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:07:05,974 >> {'loss': 0.4911, 'grad_norm': 14.963810920715332, 'learning_rate': 9.160404617513424e-07, 'epoch': 0.020884187686465962, 'num_input_tokens_seen': 1839202304, 'completed': '23.79% (877 / 3_687)', 'remaining time': '23:51:21', 'throughput': '8572.13', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:07:34,179 >> {'loss': 0.8007, 'grad_norm': 21.489377975463867, 'learning_rate': 9.158055001645385e-07, 'epoch': 0.02115541090317331, 'num_input_tokens_seen': 1841299456, 'completed': '23.81% (878 / 3_687)', 'remaining time': '23:49:26', 'throughput': '9293.98', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:08:05,242 >> {'loss': 0.4235, 'grad_norm': 14.658339500427246, 'learning_rate': 
9.155702441913881e-07, 'epoch': 0.02142663411988066, 'num_input_tokens_seen': 1843396608, 'completed': '23.84% (879 / 3_687)', 'remaining time': '23:49:14', 'throughput': '8439.10', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:08:35,579 >> {'loss': 0.5487, 'grad_norm': 20.184280395507812, 'learning_rate': 9.15334694021216e-07, 'epoch': 0.021697857336588012, 'num_input_tokens_seen': 1845493760, 'completed': '23.87% (880 / 3_687)', 'remaining time': '23:48:36', 'throughput': '8641.18', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:09:04,734 >> {'loss': 0.558, 'grad_norm': 17.544336318969727, 'learning_rate': 9.150988498435843e-07, 'epoch': 0.021969080553295363, 'num_input_tokens_seen': 1847590912, 'completed': '23.89% (881 / 3_687)', 'remaining time': '23:47:18', 'throughput': '8991.35', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:09:37,464 >> {'loss': 0.3181, 'grad_norm': 12.21959400177002, 'learning_rate': 9.148627118482912e-07, 'epoch': 0.022240303770002714, 'num_input_tokens_seen': 1849688064, 'completed': '23.92% (882 / 3_687)', 'remaining time': '23:48:03', 'throughput': '8009.24', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:10:05,401 >> {'loss': 0.7004, 'grad_norm': 17.65806007385254, 'learning_rate': 9.146262802253717e-07, 'epoch': 0.02251152698671006, 'num_input_tokens_seen': 1851785216, 'completed': '23.95% (883 / 3_687)', 'remaining time': '23:46:04', 'throughput': '9383.42', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:10:36,561 >> {'loss': 0.3288, 'grad_norm': 11.67097282409668, 'learning_rate': 9.14389555165097e-07, 'epoch': 0.022782750203417412, 'num_input_tokens_seen': 1853882368, 'completed': '23.98% (884 / 3_687)', 'remaining time': '23:45:55', 'throughput': '8412.94', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:11:08,831 >> {'loss': 0.4907, 'grad_norm': 17.335294723510742, 'learning_rate': 9.141525368579742e-07, 'epoch': 0.023053973420124763, 
'num_input_tokens_seen': 1855979520, 'completed': '24.00% (885 / 3_687)', 'remaining time': '23:46:22', 'throughput': '8123.32', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:11:40,256 >> {'loss': 0.424, 'grad_norm': 21.265504837036133, 'learning_rate': 9.139152254947469e-07, 'epoch': 0.023325196636832114, 'num_input_tokens_seen': 1858076672, 'completed': '24.03% (886 / 3_687)', 'remaining time': '23:46:20', 'throughput': '8341.93', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:12:13,263 >> {'loss': 0.4297, 'grad_norm': 12.256847381591797, 'learning_rate': 9.136776212663942e-07, 'epoch': 0.023596419853539462, 'num_input_tokens_seen': 1860173824, 'completed': '24.06% (887 / 3_687)', 'remaining time': '23:47:09', 'throughput': '7942.07', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:12:43,718 >> {'loss': 0.6221, 'grad_norm': 20.520448684692383, 'learning_rate': 9.134397243641307e-07, 'epoch': 0.023867643070246813, 'num_input_tokens_seen': 1862270976, 'completed': '24.08% (888 / 3_687)', 'remaining time': '23:46:34', 'throughput': '8607.72', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:13:16,508 >> {'loss': 0.3873, 'grad_norm': 15.912782669067383, 'learning_rate': 9.132015349794069e-07, 'epoch': 0.024138866286954164, 'num_input_tokens_seen': 1864368128, 'completed': '24.11% (889 / 3_687)', 'remaining time': '23:47:13', 'throughput': '7994.47', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:13:48,305 >> {'loss': 0.2584, 'grad_norm': 12.093875885009766, 'learning_rate': 9.129630533039086e-07, 'epoch': 0.024410089503661515, 'num_input_tokens_seen': 1866465280, 'completed': '24.14% (890 / 3_687)', 'remaining time': '23:47:20', 'throughput': '8244.42', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:14:20,405 >> {'loss': 0.7337, 'grad_norm': 25.329559326171875, 'learning_rate': 9.127242795295569e-07, 'epoch': 0.024681312720368862, 'num_input_tokens_seen': 1868562432, 'completed': 
'24.17% (891 / 3_687)', 'remaining time': '23:47:34', 'throughput': '8166.42', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:14:49,995 >> {'loss': 0.5747, 'grad_norm': 23.77729034423828, 'learning_rate': 9.124852138485076e-07, 'epoch': 0.024952535937076213, 'num_input_tokens_seen': 1870659584, 'completed': '24.19% (892 / 3_687)', 'remaining time': '23:46:32', 'throughput': '8859.19', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:15:21,394 >> {'loss': 0.8033, 'grad_norm': 24.904544830322266, 'learning_rate': 9.12245856453152e-07, 'epoch': 0.025223759153783564, 'num_input_tokens_seen': 1872756736, 'completed': '24.22% (893 / 3_687)', 'remaining time': '23:46:25', 'throughput': '8348.90', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:15:53,769 >> {'loss': 0.632, 'grad_norm': 19.269147872924805, 'learning_rate': 9.120062075361155e-07, 'epoch': 0.025494982370490915, 'num_input_tokens_seen': 1874853888, 'completed': '24.25% (894 / 3_687)', 'remaining time': '23:46:46', 'throughput': '8097.06', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:16:23,080 >> {'loss': 0.6049, 'grad_norm': 19.273340225219727, 'learning_rate': 9.117662672902584e-07, 'epoch': 0.025766205587198263, 'num_input_tokens_seen': 1876951040, 'completed': '24.27% (895 / 3_687)', 'remaining time': '23:45:36', 'throughput': '8943.66', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:16:54,234 >> {'loss': 0.4171, 'grad_norm': 16.550512313842773, 'learning_rate': 9.115260359086757e-07, 'epoch': 0.026037428803905614, 'num_input_tokens_seen': 1879048192, 'completed': '24.30% (896 / 3_687)', 'remaining time': '23:45:20', 'throughput': '8414.27', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:17:22,945 >> {'loss': 0.6028, 'grad_norm': 21.415002822875977, 'learning_rate': 9.112855135846964e-07, 'epoch': 0.026308652020612965, 'num_input_tokens_seen': 1881145344, 'completed': '24.33% (897 / 3_687)', 'remaining time': 
'23:43:54', 'throughput': '9130.67', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:17:53,164 >> {'loss': 0.4228, 'grad_norm': 13.791115760803223, 'learning_rate': 9.110447005118836e-07, 'epoch': 0.026579875237320316, 'num_input_tokens_seen': 1883242496, 'completed': '24.36% (898 / 3_687)', 'remaining time': '23:43:12', 'throughput': '8674.63', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:18:25,228 >> {'loss': 0.3593, 'grad_norm': 12.052141189575195, 'learning_rate': 9.108035968840348e-07, 'epoch': 0.026851098454027666, 'num_input_tokens_seen': 1885339648, 'completed': '24.38% (899 / 3_687)', 'remaining time': '23:43:22', 'throughput': '8175.78', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:18:56,984 >> {'loss': 0.6185, 'grad_norm': 19.59334373474121, 'learning_rate': 9.105622028951806e-07, 'epoch': 0.027122321670735014, 'num_input_tokens_seen': 1887436800, 'completed': '24.41% (900 / 3_687)', 'remaining time': '23:43:23', 'throughput': '8254.87', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:19:30,683 >> {'loss': 0.3392, 'grad_norm': 13.629782676696777, 'learning_rate': 9.103205187395861e-07, 'epoch': 0.027393544887442365, 'num_input_tokens_seen': 1889533952, 'completed': '24.44% (901 / 3_687)', 'remaining time': '23:44:17', 'throughput': '7778.90', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:20:00,743 >> {'loss': 0.388, 'grad_norm': 14.33342170715332, 'learning_rate': 9.100785446117493e-07, 'epoch': 0.027664768104149716, 'num_input_tokens_seen': 1891631104, 'completed': '24.46% (902 / 3_687)', 'remaining time': '23:43:29', 'throughput': '8720.74', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:20:29,097 >> {'loss': 0.5221, 'grad_norm': 17.475322723388672, 'learning_rate': 9.098362807064017e-07, 'epoch': 0.027935991320857067, 'num_input_tokens_seen': 1893728256, 'completed': '24.49% (903 / 3_687)', 'remaining time': '23:41:56', 'throughput': '9245.59', 'gpu_mem_free': 
'5581MB'} [INFO|trainer.py:175] 2024-12-25 16:20:58,559 >> {'loss': 0.3685, 'grad_norm': 15.671577453613281, 'learning_rate': 9.095937272185083e-07, 'epoch': 0.028207214537564414, 'num_input_tokens_seen': 1895825408, 'completed': '24.52% (904 / 3_687)', 'remaining time': '23:40:54', 'throughput': '8897.66', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:21:29,067 >> {'loss': 0.4047, 'grad_norm': 16.664928436279297, 'learning_rate': 9.093508843432667e-07, 'epoch': 0.028478437754271765, 'num_input_tokens_seen': 1897922560, 'completed': '24.55% (905 / 3_687)', 'remaining time': '23:40:20', 'throughput': '8592.57', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:21:59,558 >> {'loss': 0.6866, 'grad_norm': 23.7731876373291, 'learning_rate': 9.091077522761078e-07, 'epoch': 0.028749660970979116, 'num_input_tokens_seen': 1900019712, 'completed': '24.57% (906 / 3_687)', 'remaining time': '23:39:45', 'throughput': '8597.44', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:22:29,158 >> {'loss': 0.6913, 'grad_norm': 18.917259216308594, 'learning_rate': 9.088643312126948e-07, 'epoch': 0.029020884187686467, 'num_input_tokens_seen': 1902116864, 'completed': '24.60% (907 / 3_687)', 'remaining time': '23:38:48', 'throughput': '8856.33', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:23:01,117 >> {'loss': 0.4826, 'grad_norm': 16.237077713012695, 'learning_rate': 9.086206213489239e-07, 'epoch': 0.029292107404393815, 'num_input_tokens_seen': 1904214016, 'completed': '24.63% (908 / 3_687)', 'remaining time': '23:38:52', 'throughput': '8202.45', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:23:32,115 >> {'loss': 0.6956, 'grad_norm': 19.56369400024414, 'learning_rate': 9.083766228809234e-07, 'epoch': 0.029563330621101166, 'num_input_tokens_seen': 1906311168, 'completed': '24.65% (909 / 3_687)', 'remaining time': '23:38:30', 'throughput': '8456.83', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 
16:24:02,245 >> {'loss': 0.4732, 'grad_norm': 14.219721794128418, 'learning_rate': 9.081323360050543e-07, 'epoch': 0.029834553837808517, 'num_input_tokens_seen': 1908408320, 'completed': '24.68% (910 / 3_687)', 'remaining time': '23:37:47', 'throughput': '8700.47', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:24:32,457 >> {'loss': 0.4612, 'grad_norm': 17.175357818603516, 'learning_rate': 9.078877609179088e-07, 'epoch': 0.030105777054515868, 'num_input_tokens_seen': 1910505472, 'completed': '24.71% (911 / 3_687)', 'remaining time': '23:37:06', 'throughput': '8676.56', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:25:02,164 >> {'loss': 0.3071, 'grad_norm': 12.589146614074707, 'learning_rate': 9.076428978163121e-07, 'epoch': 0.030377000271223215, 'num_input_tokens_seen': 1912602624, 'completed': '24.74% (912 / 3_687)', 'remaining time': '23:36:12', 'throughput': '8824.35', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:25:29,515 >> {'loss': 1.031, 'grad_norm': 24.949342727661133, 'learning_rate': 9.073977468973206e-07, 'epoch': 0.030648223487930566, 'num_input_tokens_seen': 1914699776, 'completed': '24.76% (913 / 3_687)', 'remaining time': '23:34:21', 'throughput': '9584.61', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:25:58,980 >> {'loss': 0.3996, 'grad_norm': 17.8113956451416, 'learning_rate': 9.071523083582223e-07, 'epoch': 0.030919446704637917, 'num_input_tokens_seen': 1916796928, 'completed': '24.79% (914 / 3_687)', 'remaining time': '23:33:23', 'throughput': '8896.82', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:26:27,928 >> {'loss': 0.6838, 'grad_norm': 25.642656326293945, 'learning_rate': 9.06906582396537e-07, 'epoch': 0.031190669921345268, 'num_input_tokens_seen': 1918894080, 'completed': '24.82% (915 / 3_687)', 'remaining time': '23:32:13', 'throughput': '9055.67', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:27:00,168 >> {'loss': 0.3932, 'grad_norm': 
13.68373966217041, 'learning_rate': 9.066605692100155e-07, 'epoch': 0.031461893138052616, 'num_input_tokens_seen': 1920991232, 'completed': '24.84% (916 / 3_687)', 'remaining time': '23:32:23', 'throughput': '8131.09', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:27:31,212 >> {'loss': 0.3473, 'grad_norm': 12.269515037536621, 'learning_rate': 9.064142689966397e-07, 'epoch': 0.03173311635475997, 'num_input_tokens_seen': 1923088384, 'completed': '24.87% (917 / 3_687)', 'remaining time': '23:32:03', 'throughput': '8444.14', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:27:58,239 >> {'loss': 0.8176, 'grad_norm': 19.712133407592773, 'learning_rate': 9.061676819546229e-07, 'epoch': 0.03200433957146732, 'num_input_tokens_seen': 1925185536, 'completed': '24.90% (918 / 3_687)', 'remaining time': '23:30:09', 'throughput': '9699.19', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:28:29,039 >> {'loss': 0.4822, 'grad_norm': 15.215006828308105, 'learning_rate': 9.059208082824087e-07, 'epoch': 0.032275562788174665, 'num_input_tokens_seen': 1927282688, 'completed': '24.93% (919 / 3_687)', 'remaining time': '23:29:44', 'throughput': '8511.21', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:28:59,744 >> {'loss': 0.6293, 'grad_norm': 20.545522689819336, 'learning_rate': 9.05673648178672e-07, 'epoch': 0.03254678600488202, 'num_input_tokens_seen': 1929379840, 'completed': '24.95% (920 / 3_687)', 'remaining time': '23:29:17', 'throughput': '8537.57', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:29:31,343 >> {'loss': 0.3147, 'grad_norm': 13.10300350189209, 'learning_rate': 9.054262018423175e-07, 'epoch': 0.03281800922158937, 'num_input_tokens_seen': 1931476992, 'completed': '24.98% (921 / 3_687)', 'remaining time': '23:29:10', 'throughput': '8295.87', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:29:59,977 >> {'loss': 0.7481, 'grad_norm': 23.32817268371582, 'learning_rate': 
9.051784694724808e-07, 'epoch': 0.03308923243829672, 'num_input_tokens_seen': 1933574144, 'completed': '25.01% (922 / 3_687)', 'remaining time': '23:27:56', 'throughput': '9155.19', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:30:31,119 >> {'loss': 0.2809, 'grad_norm': 10.944870948791504, 'learning_rate': 9.049304512685274e-07, 'epoch': 0.03336045565500407, 'num_input_tokens_seen': 1935671296, 'completed': '25.03% (923 / 3_687)', 'remaining time': '23:27:38', 'throughput': '8417.50', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:31:00,180 >> {'loss': 0.297, 'grad_norm': 11.271623611450195, 'learning_rate': 9.046821474300527e-07, 'epoch': 0.033631678871711417, 'num_input_tokens_seen': 1937768448, 'completed': '25.06% (924 / 3_687)', 'remaining time': '23:26:35', 'throughput': '9020.76', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:31:30,524 >> {'loss': 0.5428, 'grad_norm': 17.19096565246582, 'learning_rate': 9.044335581568827e-07, 'epoch': 0.03390290208841877, 'num_input_tokens_seen': 1939865600, 'completed': '25.09% (925 / 3_687)', 'remaining time': '23:26:00', 'throughput': '8639.02', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:31:59,987 >> {'loss': 0.3738, 'grad_norm': 13.172834396362305, 'learning_rate': 9.041846836490723e-07, 'epoch': 0.03417412530512612, 'num_input_tokens_seen': 1941962752, 'completed': '25.12% (926 / 3_687)', 'remaining time': '23:25:05', 'throughput': '8897.23', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:32:32,069 >> {'loss': 0.39, 'grad_norm': 13.624527931213379, 'learning_rate': 9.03935524106906e-07, 'epoch': 0.034445348521833466, 'num_input_tokens_seen': 1944059904, 'completed': '25.14% (927 / 3_687)', 'remaining time': '23:25:08', 'throughput': '8171.24', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:33:02,317 >> {'loss': 0.2882, 'grad_norm': 11.111466407775879, 'learning_rate': 9.036860797308984e-07, 'epoch': 0.03471657173854082, 
'num_input_tokens_seen': 1946157056, 'completed': '25.17% (928 / 3_687)', 'remaining time': '23:24:31', 'throughput': '8666.50', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:33:29,961 >> {'loss': 0.4821, 'grad_norm': 18.21550178527832, 'learning_rate': 9.034363507217925e-07, 'epoch': 0.03498779495524817, 'num_input_tokens_seen': 1948254208, 'completed': '25.20% (929 / 3_687)', 'remaining time': '23:22:59', 'throughput': '9482.66', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:33:58,943 >> {'loss': 0.6339, 'grad_norm': 25.724552154541016, 'learning_rate': 9.031863372805606e-07, 'epoch': 0.03525901817195552, 'num_input_tokens_seen': 1950351360, 'completed': '25.22% (930 / 3_687)', 'remaining time': '23:21:56', 'throughput': '9045.13', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:34:30,358 >> {'loss': 0.2785, 'grad_norm': 22.317489624023438, 'learning_rate': 9.029360396084043e-07, 'epoch': 0.03553024138866287, 'num_input_tokens_seen': 1952448512, 'completed': '25.25% (931 / 3_687)', 'remaining time': '23:21:44', 'throughput': '8344.66', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:35:00,449 >> {'loss': 0.502, 'grad_norm': 15.543082237243652, 'learning_rate': 9.026854579067537e-07, 'epoch': 0.03580146460537022, 'num_input_tokens_seen': 1954545664, 'completed': '25.28% (932 / 3_687)', 'remaining time': '23:21:05', 'throughput': '8711.58', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:35:31,696 >> {'loss': 0.4526, 'grad_norm': 18.427461624145508, 'learning_rate': 9.024345923772671e-07, 'epoch': 0.03607268782207757, 'num_input_tokens_seen': 1956642816, 'completed': '25.31% (933 / 3_687)', 'remaining time': '23:20:50', 'throughput': '8389.35', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:36:00,814 >> {'loss': 0.3389, 'grad_norm': 13.186530113220215, 'learning_rate': 9.021834432218317e-07, 'epoch': 0.03634391103878492, 'num_input_tokens_seen': 1958739968, 'completed': '25.33% 
(934 / 3_687)', 'remaining time': '23:19:50', 'throughput': '9002.84', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:36:31,779 >> {'loss': 0.3434, 'grad_norm': 13.932114601135254, 'learning_rate': 9.019320106425629e-07, 'epoch': 0.03661513425549227, 'num_input_tokens_seen': 1960837120, 'completed': '25.36% (935 / 3_687)', 'remaining time': '23:19:29', 'throughput': '8465.92', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:37:01,500 >> {'loss': 0.2254, 'grad_norm': 12.41275691986084, 'learning_rate': 9.016802948418038e-07, 'epoch': 0.03688635747219962, 'num_input_tokens_seen': 1962934272, 'completed': '25.39% (936 / 3_687)', 'remaining time': '23:18:42', 'throughput': '8820.00', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:37:29,385 >> {'loss': 0.5891, 'grad_norm': 18.870506286621094, 'learning_rate': 9.014282960221257e-07, 'epoch': 0.03715758068890697, 'num_input_tokens_seen': 1965031424, 'completed': '25.41% (937 / 3_687)', 'remaining time': '23:17:19', 'throughput': '9400.87', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:38:00,095 >> {'loss': 0.4418, 'grad_norm': 15.504498481750488, 'learning_rate': 9.01176014386328e-07, 'epoch': 0.03742880390561432, 'num_input_tokens_seen': 1967128576, 'completed': '25.44% (938 / 3_687)', 'remaining time': '23:16:53', 'throughput': '8536.25', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:38:28,922 >> {'loss': 0.6721, 'grad_norm': 19.771808624267578, 'learning_rate': 9.009234501374371e-07, 'epoch': 0.03770002712232167, 'num_input_tokens_seen': 1969225728, 'completed': '25.47% (939 / 3_687)', 'remaining time': '23:15:50', 'throughput': '9093.59', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:39:01,111 >> {'loss': 0.5628, 'grad_norm': 23.42245101928711, 'learning_rate': 9.006706034787071e-07, 'epoch': 0.03797125033902902, 'num_input_tokens_seen': 1971322880, 'completed': '25.49% (940 / 3_687)', 'remaining time': '23:15:53', 
'throughput': '8144.06', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:39:29,356 >> {'loss': 0.6278, 'grad_norm': 18.261873245239258, 'learning_rate': 9.004174746136196e-07, 'epoch': 0.03824247355573637, 'num_input_tokens_seen': 1973420032, 'completed': '25.52% (941 / 3_687)', 'remaining time': '23:14:39', 'throughput': '9281.07', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:40:03,845 >> {'loss': 0.4231, 'grad_norm': 15.880273818969727, 'learning_rate': 9.001640637458829e-07, 'epoch': 0.03851369677244372, 'num_input_tokens_seen': 1975517184, 'completed': '25.55% (942 / 3_687)', 'remaining time': '23:15:26', 'throughput': '7600.61', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:40:32,662 >> {'loss': 0.8923, 'grad_norm': 26.52276611328125, 'learning_rate': 8.999103710794323e-07, 'epoch': 0.038784919989151075, 'num_input_tokens_seen': 1977614336, 'completed': '25.58% (943 / 3_687)', 'remaining time': '23:14:23', 'throughput': '9097.13', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:41:02,778 >> {'loss': 0.3194, 'grad_norm': 12.265298843383789, 'learning_rate': 8.996563968184302e-07, 'epoch': 0.03905614320585842, 'num_input_tokens_seen': 1979711488, 'completed': '25.60% (944 / 3_687)', 'remaining time': '23:13:46', 'throughput': '8704.22', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:41:33,474 >> {'loss': 0.7034, 'grad_norm': 20.407365798950195, 'learning_rate': 8.994021411672653e-07, 'epoch': 0.03932736642256577, 'num_input_tokens_seen': 1981808640, 'completed': '25.63% (945 / 3_687)', 'remaining time': '23:13:19', 'throughput': '8540.12', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:42:02,716 >> {'loss': 0.5223, 'grad_norm': 22.567445755004883, 'learning_rate': 8.99147604330553e-07, 'epoch': 0.039598589639273124, 'num_input_tokens_seen': 1983905792, 'completed': '25.66% (946 / 3_687)', 'remaining time': '23:12:25', 'throughput': '8964.64', 'gpu_mem_free': '5581MB'} 
[INFO|trainer.py:175] 2024-12-25 16:42:35,359 >> {'loss': 0.5478, 'grad_norm': 30.489513397216797, 'learning_rate': 8.988927865131347e-07, 'epoch': 0.03986981285598047, 'num_input_tokens_seen': 1986002944, 'completed': '25.68% (947 / 3_687)', 'remaining time': '23:12:35', 'throughput': '8030.53', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:43:07,617 >> {'loss': 0.5236, 'grad_norm': 18.10291862487793, 'learning_rate': 8.986376879200783e-07, 'epoch': 0.04014103607268782, 'num_input_tokens_seen': 1988100096, 'completed': '25.71% (948 / 3_687)', 'remaining time': '23:12:37', 'throughput': '8126.59', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:43:38,583 >> {'loss': 0.2878, 'grad_norm': 13.505256652832031, 'learning_rate': 8.983823087566772e-07, 'epoch': 0.040412259289395173, 'num_input_tokens_seen': 1990197248, 'completed': '25.74% (949 / 3_687)', 'remaining time': '23:12:15', 'throughput': '8465.67', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:44:09,662 >> {'loss': 0.5118, 'grad_norm': 16.355928421020508, 'learning_rate': 8.981266492284511e-07, 'epoch': 0.04068348250610252, 'num_input_tokens_seen': 1992294400, 'completed': '25.77% (950 / 3_687)', 'remaining time': '23:11:55', 'throughput': '8434.68', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:44:41,733 >> {'loss': 0.6593, 'grad_norm': 20.954561233520508, 'learning_rate': 8.978707095411446e-07, 'epoch': 0.040954705722809875, 'num_input_tokens_seen': 1994391552, 'completed': '25.79% (951 / 3_687)', 'remaining time': '23:11:53', 'throughput': '8174.05', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:45:10,174 >> {'loss': 1.0425, 'grad_norm': 28.557451248168945, 'learning_rate': 8.976144899007288e-07, 'epoch': 0.04122592893951722, 'num_input_tokens_seen': 1996488704, 'completed': '25.82% (952 / 3_687)', 'remaining time': '23:10:45', 'throughput': '9216.91', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:45:40,680 >> 
{'loss': 0.4375, 'grad_norm': 13.614970207214355, 'learning_rate': 8.973579905133991e-07, 'epoch': 0.04149715215622457, 'num_input_tokens_seen': 1998585856, 'completed': '25.85% (953 / 3_687)', 'remaining time': '23:10:14', 'throughput': '8593.08', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:46:11,488 >> {'loss': 0.2845, 'grad_norm': 10.78772258758545, 'learning_rate': 8.971012115855766e-07, 'epoch': 0.041768375372931925, 'num_input_tokens_seen': 2000683008, 'completed': '25.87% (954 / 3_687)', 'remaining time': '23:09:49', 'throughput': '8508.93', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:46:41,249 >> {'loss': 0.4906, 'grad_norm': 17.9395694732666, 'learning_rate': 8.968441533239073e-07, 'epoch': 0.04203959858963927, 'num_input_tokens_seen': 2002780160, 'completed': '25.90% (955 / 3_687)', 'remaining time': '23:09:05', 'throughput': '8808.66', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:47:12,945 >> {'loss': 0.4028, 'grad_norm': 13.34605884552002, 'learning_rate': 8.965868159352616e-07, 'epoch': 0.04231082180634662, 'num_input_tokens_seen': 2004877312, 'completed': '25.93% (956 / 3_687)', 'remaining time': '23:08:55', 'throughput': '8270.42', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:47:43,577 >> {'loss': 0.3782, 'grad_norm': 16.155813217163086, 'learning_rate': 8.963291996267354e-07, 'epoch': 0.042582045023053974, 'num_input_tokens_seen': 2006974464, 'completed': '25.96% (957 / 3_687)', 'remaining time': '23:08:27', 'throughput': '8557.91', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:48:16,018 >> {'loss': 0.3243, 'grad_norm': 11.215263366699219, 'learning_rate': 8.960713046056478e-07, 'epoch': 0.04285326823976132, 'num_input_tokens_seen': 2009071616, 'completed': '25.98% (958 / 3_687)', 'remaining time': '23:08:30', 'throughput': '8080.63', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:48:44,323 >> {'loss': 0.4452, 'grad_norm': 13.548431396484375, 
'learning_rate': 8.958131310795434e-07, 'epoch': 0.043124491456468676, 'num_input_tokens_seen': 2011168768, 'completed': '26.01% (959 / 3_687)', 'remaining time': '23:07:21', 'throughput': '9261.32', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:49:14,761 >> {'loss': 0.4006, 'grad_norm': 13.86419677734375, 'learning_rate': 8.955546792561902e-07, 'epoch': 0.043395714673176024, 'num_input_tokens_seen': 2013265920, 'completed': '26.04% (960 / 3_687)', 'remaining time': '23:06:49', 'throughput': '8612.39', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:49:44,699 >> {'loss': 0.3906, 'grad_norm': 18.16084098815918, 'learning_rate': 8.952959493435806e-07, 'epoch': 0.04366693788988337, 'num_input_tokens_seen': 2015363072, 'completed': '26.06% (961 / 3_687)', 'remaining time': '23:06:09', 'throughput': '8756.12', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:50:13,856 >> {'loss': 0.5608, 'grad_norm': 18.875059127807617, 'learning_rate': 8.950369415499304e-07, 'epoch': 0.043938161106590726, 'num_input_tokens_seen': 2017460224, 'completed': '26.09% (962 / 3_687)', 'remaining time': '23:05:16', 'throughput': '8990.89', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:50:48,212 >> {'loss': 0.7657, 'grad_norm': 22.01349639892578, 'learning_rate': 8.947776560836793e-07, 'epoch': 0.04420938432329807, 'num_input_tokens_seen': 2019557376, 'completed': '26.12% (963 / 3_687)', 'remaining time': '23:05:50', 'throughput': '7630.21', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:51:19,253 >> {'loss': 0.4577, 'grad_norm': 17.236188888549805, 'learning_rate': 8.945180931534902e-07, 'epoch': 0.04448060754000543, 'num_input_tokens_seen': 2021654528, 'completed': '26.15% (964 / 3_687)', 'remaining time': '23:05:28', 'throughput': '8445.05', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:51:51,635 >> {'loss': 0.4122, 'grad_norm': 15.46218204498291, 'learning_rate': 8.942582529682496e-07, 'epoch': 
0.044751830756712775, 'num_input_tokens_seen': 2023751680, 'completed': '26.17% (965 / 3_687)', 'remaining time': '23:05:28', 'throughput': '8095.43', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:52:21,810 >> {'loss': 0.3415, 'grad_norm': 14.264276504516602, 'learning_rate': 8.939981357370672e-07, 'epoch': 0.04502305397342012, 'num_input_tokens_seen': 2025848832, 'completed': '26.20% (966 / 3_687)', 'remaining time': '23:04:51', 'throughput': '8687.49', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:52:52,625 >> {'loss': 0.4217, 'grad_norm': 17.676618576049805, 'learning_rate': 8.937377416692752e-07, 'epoch': 0.04529427719012748, 'num_input_tokens_seen': 2027945984, 'completed': '26.23% (967 / 3_687)', 'remaining time': '23:04:25', 'throughput': '8506.96', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:53:22,579 >> {'loss': 0.4929, 'grad_norm': 16.618602752685547, 'learning_rate': 8.934770709744289e-07, 'epoch': 0.045565500406834825, 'num_input_tokens_seen': 2030043136, 'completed': '26.25% (968 / 3_687)', 'remaining time': '23:03:45', 'throughput': '8751.44', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:53:55,019 >> {'loss': 0.3282, 'grad_norm': 13.402697563171387, 'learning_rate': 8.93216123862306e-07, 'epoch': 0.04583672362354217, 'num_input_tokens_seen': 2032140288, 'completed': '26.28% (969 / 3_687)', 'remaining time': '23:03:45', 'throughput': '8081.06', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:54:22,026 >> {'loss': 1.1138, 'grad_norm': 24.799013137817383, 'learning_rate': 8.929549005429071e-07, 'epoch': 0.04610794684024953, 'num_input_tokens_seen': 2034237440, 'completed': '26.31% (970 / 3_687)', 'remaining time': '23:02:18', 'throughput': '9706.24', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:54:52,147 >> {'loss': 0.4161, 'grad_norm': 17.392616271972656, 'learning_rate': 8.926934012264546e-07, 'epoch': 0.046379170056956874, 'num_input_tokens_seen': 
2036334592, 'completed': '26.34% (971 / 3_687)', 'remaining time': '23:01:41', 'throughput': '8703.17', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:55:23,872 >> {'loss': 0.567, 'grad_norm': 18.09572410583496, 'learning_rate': 8.924316261233933e-07, 'epoch': 0.04665039327366423, 'num_input_tokens_seen': 2038431744, 'completed': '26.36% (972 / 3_687)', 'remaining time': '23:01:30', 'throughput': '8262.89', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:55:55,006 >> {'loss': 0.3801, 'grad_norm': 14.815757751464844, 'learning_rate': 8.921695754443898e-07, 'epoch': 0.046921616490371576, 'num_input_tokens_seen': 2040528896, 'completed': '26.39% (973 / 3_687)', 'remaining time': '23:01:09', 'throughput': '8419.87', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:56:23,994 >> {'loss': 0.4794, 'grad_norm': 16.38593864440918, 'learning_rate': 8.919072494003325e-07, 'epoch': 0.047192839707078924, 'num_input_tokens_seen': 2042626048, 'completed': '26.42% (974 / 3_687)', 'remaining time': '23:00:14', 'throughput': '9043.19', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:56:57,223 >> {'loss': 0.394, 'grad_norm': 13.091635704040527, 'learning_rate': 8.916446482023313e-07, 'epoch': 0.04746406292378628, 'num_input_tokens_seen': 2044723200, 'completed': '26.44% (975 / 3_687)', 'remaining time': '23:00:25', 'throughput': '7889.03', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:57:26,618 >> {'loss': 0.5268, 'grad_norm': 17.225982666015625, 'learning_rate': 8.913817720617178e-07, 'epoch': 0.047735286140493625, 'num_input_tokens_seen': 2046820352, 'completed': '26.47% (976 / 3_687)', 'remaining time': '22:59:37', 'throughput': '8918.13', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:57:59,009 >> {'loss': 0.3478, 'grad_norm': 12.275423049926758, 'learning_rate': 8.911186211900448e-07, 'epoch': 0.04800650935720097, 'num_input_tokens_seen': 2048917504, 'completed': '26.50% (977 / 3_687)', 
'remaining time': '22:59:35', 'throughput': '8093.09', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:58:31,641 >> {'loss': 0.619, 'grad_norm': 19.183958053588867, 'learning_rate': 8.908551957990858e-07, 'epoch': 0.04827773257390833, 'num_input_tokens_seen': 2051014656, 'completed': '26.53% (978 / 3_687)', 'remaining time': '22:59:36', 'throughput': '8033.18', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:59:00,661 >> {'loss': 0.6647, 'grad_norm': 22.74364471435547, 'learning_rate': 8.90591496100836e-07, 'epoch': 0.048548955790615675, 'num_input_tokens_seen': 2053111808, 'completed': '26.55% (979 / 3_687)', 'remaining time': '22:58:42', 'throughput': '9033.24', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:59:29,996 >> {'loss': 0.5442, 'grad_norm': 20.398710250854492, 'learning_rate': 8.903275223075104e-07, 'epoch': 0.04882017900732303, 'num_input_tokens_seen': 2055208960, 'completed': '26.58% (980 / 3_687)', 'remaining time': '22:57:54', 'throughput': '8936.43', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 16:59:58,409 >> {'loss': 0.683, 'grad_norm': 19.808082580566406, 'learning_rate': 8.900632746315454e-07, 'epoch': 0.04909140222403038, 'num_input_tokens_seen': 2057306112, 'completed': '26.61% (981 / 3_687)', 'remaining time': '22:56:51', 'throughput': '9226.10', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 17:00:27,001 >> {'loss': 1.0303, 'grad_norm': 26.348705291748047, 'learning_rate': 8.897987532855972e-07, 'epoch': 0.049362625440737724, 'num_input_tokens_seen': 2059403264, 'completed': '26.63% (982 / 3_687)', 'remaining time': '22:55:52', 'throughput': '9168.46', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 17:01:00,917 >> {'loss': 0.5595, 'grad_norm': 21.460338592529297, 'learning_rate': 8.895339584825429e-07, 'epoch': 0.04963384865744508, 'num_input_tokens_seen': 2061500416, 'completed': '26.66% (983 / 3_687)', 'remaining time': '22:56:12', 'throughput': '7729.12', 
'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 17:01:34,415 >> {'loss': 0.4599, 'grad_norm': 14.994651794433594, 'learning_rate': 8.892688904354787e-07, 'epoch': 0.049905071874152426, 'num_input_tokens_seen': 2063597568, 'completed': '26.69% (984 / 3_687)', 'remaining time': '22:56:25', 'throughput': '7825.80', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 17:02:03,784 >> {'loss': 0.373, 'grad_norm': 12.65827751159668, 'learning_rate': 8.890035493577219e-07, 'epoch': 0.05017629509085978, 'num_input_tokens_seen': 2065694720, 'completed': '26.72% (985 / 3_687)', 'remaining time': '22:55:37', 'throughput': '8925.71', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 17:02:33,403 >> {'loss': 0.9004, 'grad_norm': 22.335948944091797, 'learning_rate': 8.887379354628085e-07, 'epoch': 0.05044751830756713, 'num_input_tokens_seen': 2067791872, 'completed': '26.74% (986 / 3_687)', 'remaining time': '22:54:53', 'throughput': '8850.65', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 17:03:02,335 >> {'loss': 0.3551, 'grad_norm': 13.694454193115234, 'learning_rate': 8.884720489644945e-07, 'epoch': 0.050718741524274476, 'num_input_tokens_seen': 2069889024, 'completed': '26.77% (987 / 3_687)', 'remaining time': '22:53:59', 'throughput': '9060.54', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 17:03:29,658 >> {'loss': 0.735, 'grad_norm': 20.320920944213867, 'learning_rate': 8.882058900767555e-07, 'epoch': 0.05098996474098183, 'num_input_tokens_seen': 2071986176, 'completed': '26.80% (988 / 3_687)', 'remaining time': '22:52:42', 'throughput': '9594.43', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 17:04:04,966 >> {'loss': 0.5857, 'grad_norm': 19.12615966796875, 'learning_rate': 8.879394590137857e-07, 'epoch': 0.05126118795768918, 'num_input_tokens_seen': 2074083328, 'completed': '26.82% (989 / 3_687)', 'remaining time': '22:53:20', 'throughput': '7424.51', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 
2024-12-25 17:04:37,521 >> {'loss': 0.3749, 'grad_norm': 16.400854110717773, 'learning_rate': 8.876727559899989e-07, 'epoch': 0.051532411174396525, 'num_input_tokens_seen': 2076180480, 'completed': '26.85% (990 / 3_687)', 'remaining time': '22:53:18', 'throughput': '8052.30', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 17:05:04,342 >> {'loss': 0.9273, 'grad_norm': 24.33226203918457, 'learning_rate': 8.874057812200274e-07, 'epoch': 0.05180363439110388, 'num_input_tokens_seen': 2078277632, 'completed': '26.88% (991 / 3_687)', 'remaining time': '22:51:55', 'throughput': '9773.94', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 17:05:35,182 >> {'loss': 0.6876, 'grad_norm': 17.637216567993164, 'learning_rate': 8.871385349187225e-07, 'epoch': 0.05207485760781123, 'num_input_tokens_seen': 2080374784, 'completed': '26.91% (992 / 3_687)', 'remaining time': '22:51:29', 'throughput': '8500.08', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 17:06:03,983 >> {'loss': 0.7319, 'grad_norm': 22.15195083618164, 'learning_rate': 8.868710173011538e-07, 'epoch': 0.05234608082451858, 'num_input_tokens_seen': 2082471936, 'completed': '26.93% (993 / 3_687)', 'remaining time': '22:50:34', 'throughput': '9101.89', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 17:06:34,219 >> {'loss': 0.4949, 'grad_norm': 16.152517318725586, 'learning_rate': 8.866032285826091e-07, 'epoch': 0.05261730404122593, 'num_input_tokens_seen': 2084569088, 'completed': '26.96% (994 / 3_687)', 'remaining time': '22:50:00', 'throughput': '8670.02', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 17:07:06,973 >> {'loss': 0.3432, 'grad_norm': 13.20339584350586, 'learning_rate': 8.86335168978595e-07, 'epoch': 0.05288852725793328, 'num_input_tokens_seen': 2086666240, 'completed': '26.99% (995 / 3_687)', 'remaining time': '22:50:00', 'throughput': '8003.29', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 17:07:39,120 >> {'loss': 0.2759, 'grad_norm': 
12.26603889465332, 'learning_rate': 8.860668387048353e-07, 'epoch': 0.05315975047464063, 'num_input_tokens_seen': 2088763392, 'completed': '27.01% (996 / 3_687)', 'remaining time': '22:49:51', 'throughput': '8154.48', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 17:08:09,471 >> {'loss': 0.3028, 'grad_norm': 12.985957145690918, 'learning_rate': 8.85798237977272e-07, 'epoch': 0.05343097369134798, 'num_input_tokens_seen': 2090860544, 'completed': '27.04% (997 / 3_687)', 'remaining time': '22:49:18', 'throughput': '8637.28', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 17:08:42,451 >> {'loss': 0.63, 'grad_norm': 17.120885848999023, 'learning_rate': 8.85529367012065e-07, 'epoch': 0.05370219690805533, 'num_input_tokens_seen': 2092957696, 'completed': '27.07% (998 / 3_687)', 'remaining time': '22:49:21', 'throughput': '7948.65', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 17:09:13,732 >> {'loss': 0.4986, 'grad_norm': 17.12887191772461, 'learning_rate': 8.852602260255911e-07, 'epoch': 0.05397342012476268, 'num_input_tokens_seen': 2095054848, 'completed': '27.10% (999 / 3_687)', 'remaining time': '22:49:00', 'throughput': '8380.18', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 17:09:45,155 >> {'loss': 0.7428, 'grad_norm': 19.522432327270508, 'learning_rate': 8.849908152344451e-07, 'epoch': 0.05424464334147003, 'num_input_tokens_seen': 2097152000, 'completed': '27.12% (1_000 / 3_687)', 'remaining time': '22:48:41', 'throughput': '8342.42', 'gpu_mem_free': '5581MB'} /scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:689: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. 
API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . warnings.warn(
[INFO|trainer.py:3503] 2024-12-25 17:10:09,029 >> Saving model checkpoint to /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-1000
[INFO|configuration_utils.py:472] 2024-12-25 17:10:09,034 >> Configuration saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-1000/config.json
[INFO|configuration_utils.py:807] 2024-12-25 17:10:09,035 >> Configuration saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-1000/generation_config.json
[INFO|modeling_utils.py:2807] 2024-12-25 17:11:05,268 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 7 checkpoint shards. You can find where each parameters has been saved in the index located at /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-1000/model.safetensors.index.json.
[INFO|tokenization_utils_base.py:2684] 2024-12-25 17:11:05,273 >> tokenizer config file saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-1000/tokenizer_config.json
[INFO|tokenization_utils_base.py:2693] 2024-12-25 17:11:05,274 >> Special tokens file saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-1000/special_tokens_map.json
12/25/2024 17:14:59 - WARNING - streaming.base.dataset - Because `shuffle_block_size` was not specified, it will default to max(4_000_000 // num_canonical_nodes, 1 << 18) if num_canonical_nodes is not None, otherwise 262144. Prior to Streaming v0.7.0, `shuffle_block_size` defaulted to 262144.
[WARNING|trainer.py:868] 2024-12-25 17:14:59,159 >> Save streaming dataset state: {'epoch': 0, 'sample_in_epoch': 2000, 'num_canonical_nodes': 1, 'shuffle_seed': 42, 'initial_physical_nodes': 1}
/scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/torch/utils/checkpoint.py:1399: FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead.
with device_autocast_ctx, torch.cpu.amp.autocast(**cpu_autocast_kwargs), recompute_context: # type: ignore[attr-defined] [INFO|trainer.py:175] 2024-12-25 17:15:29,462 >> {'loss': 0.3491, 'grad_norm': 19.57962417602539, 'learning_rate': 8.847211348554382e-07, 'epoch': 0.05451586655817738, 'num_input_tokens_seen': 2099249152, 'completed': '27.15% (1_001 / 3_687)', 'remaining time': '23:58:03', 'throughput': '761.37', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 17:16:01,123 >> {'loss': 0.4238, 'grad_norm': 15.201761245727539, 'learning_rate': 8.844511851055991e-07, 'epoch': 0.05478708977488473, 'num_input_tokens_seen': 2101346304, 'completed': '27.18% (1_002 / 3_687)', 'remaining time': '23:57:25', 'throughput': '8279.79', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 17:16:32,013 >> {'loss': 0.4377, 'grad_norm': 15.727082252502441, 'learning_rate': 8.841809662021731e-07, 'epoch': 0.05505831299159208, 'num_input_tokens_seen': 2103443456, 'completed': '27.20% (1_003 / 3_687)', 'remaining time': '23:56:37', 'throughput': '8486.28', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 17:17:02,930 >> {'loss': 0.7603, 'grad_norm': 22.993534088134766, 'learning_rate': 8.839104783626219e-07, 'epoch': 0.05532953620829943, 'num_input_tokens_seen': 2105540608, 'completed': '27.23% (1_004 / 3_687)', 'remaining time': '23:55:49', 'throughput': '8479.07', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 17:17:32,870 >> {'loss': 0.3576, 'grad_norm': 13.314918518066406, 'learning_rate': 8.836397218046239e-07, 'epoch': 0.05560075942500678, 'num_input_tokens_seen': 2107637760, 'completed': '27.26% (1_005 / 3_687)', 'remaining time': '23:54:48', 'throughput': '8755.50', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 17:18:01,940 >> {'loss': 0.4315, 'grad_norm': 14.432062149047852, 'learning_rate': 8.83368696746074e-07, 'epoch': 0.055871982641714134, 'num_input_tokens_seen': 2109734912, 'completed': '27.29% (1_006 / 3_687)', 
'remaining time': '23:53:37', 'throughput': '9017.71', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 17:18:30,956 >> {'loss': 0.7692, 'grad_norm': 20.760156631469727, 'learning_rate': 8.830974034050824e-07, 'epoch': 0.05614320585842148, 'num_input_tokens_seen': 2111832064, 'completed': '27.31% (1_007 / 3_687)', 'remaining time': '23:52:25', 'throughput': '9034.47', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 17:18:59,818 >> {'loss': 0.283, 'grad_norm': 10.294124603271484, 'learning_rate': 8.828258419999759e-07, 'epoch': 0.05641442907512883, 'num_input_tokens_seen': 2113929216, 'completed': '27.34% (1_008 / 3_687)', 'remaining time': '23:51:12', 'throughput': '9082.63', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 17:19:31,762 >> {'loss': 0.2858, 'grad_norm': 10.813933372497559, 'learning_rate': 8.825540127492965e-07, 'epoch': 0.05668565229183618, 'num_input_tokens_seen': 2116026368, 'completed': '27.37% (1_009 / 3_687)', 'remaining time': '23:50:38', 'throughput': '8206.35', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 17:20:03,199 >> {'loss': 0.717, 'grad_norm': 23.46965217590332, 'learning_rate': 8.822819158718026e-07, 'epoch': 0.05695687550854353, 'num_input_tokens_seen': 2118123520, 'completed': '27.39% (1_010 / 3_687)', 'remaining time': '23:49:58', 'throughput': '8338.77', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 17:20:33,501 >> {'loss': 0.5001, 'grad_norm': 19.347179412841797, 'learning_rate': 8.820095515864669e-07, 'epoch': 0.05722809872525088, 'num_input_tokens_seen': 2120220672, 'completed': '27.42% (1_011 / 3_687)', 'remaining time': '23:49:04', 'throughput': '8650.98', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 17:21:03,570 >> {'loss': 0.5409, 'grad_norm': 18.49291229248047, 'learning_rate': 8.81736920112478e-07, 'epoch': 0.05749932194195823, 'num_input_tokens_seen': 2122317824, 'completed': '27.45% (1_012 / 3_687)', 'remaining time': '23:48:07', 'throughput': 
'8718.03', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:21:38,394 >> {'loss': 0.4008, 'grad_norm': 14.770519256591797, 'learning_rate': 8.814640216692391e-07, 'epoch': 0.05777054515866558, 'num_input_tokens_seen': 2124414976, 'completed': '27.47% (1_013 / 3_687)', 'remaining time': '23:48:10', 'throughput': '7527.69', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:22:05,927 >> {'loss': 0.8029, 'grad_norm': 22.392454147338867, 'learning_rate': 8.81190856476369e-07, 'epoch': 0.058041768375372935, 'num_input_tokens_seen': 2126512128, 'completed': '27.50% (1_014 / 3_687)', 'remaining time': '23:46:42', 'throughput': '9521.16', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:22:35,257 >> {'loss': 0.5948, 'grad_norm': 19.39711570739746, 'learning_rate': 8.809174247537003e-07, 'epoch': 0.05831299159208028, 'num_input_tokens_seen': 2128609280, 'completed': '27.53% (1_015 / 3_687)', 'remaining time': '23:45:36', 'throughput': '8937.85', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:23:07,442 >> {'loss': 0.2487, 'grad_norm': 11.22283935546875, 'learning_rate': 8.806437267212805e-07, 'epoch': 0.05858421480878763, 'num_input_tokens_seen': 2130706432, 'completed': '27.56% (1_016 / 3_687)', 'remaining time': '23:45:06', 'throughput': '8144.88', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:23:41,141 >> {'loss': 0.8866, 'grad_norm': 23.879262924194336, 'learning_rate': 8.803697625993713e-07, 'epoch': 0.058855438025494984, 'num_input_tokens_seen': 2132803584, 'completed': '27.58% (1_017 / 3_687)', 'remaining time': '23:44:55', 'throughput': '7778.92', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:24:10,954 >> {'loss': 0.6154, 'grad_norm': 17.010164260864258, 'learning_rate': 8.800955326084487e-07, 'epoch': 0.05912666124220233, 'num_input_tokens_seen': 2134900736, 'completed': '27.61% (1_018 / 3_687)', 'remaining time': '23:43:56', 'throughput': '8792.97', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:24:41,449 >> {'loss': 0.2381, 'grad_norm': 12.31965160369873, 'learning_rate': 8.798210369692025e-07, 'epoch': 0.059397884458909686, 'num_input_tokens_seen': 2136997888, 'completed': '27.64% (1_019 / 3_687)', 'remaining time': '23:43:05', 'throughput': '8596.30', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:25:12,178 >> {'loss': 0.4989, 'grad_norm': 14.956830978393555, 'learning_rate': 8.795462759025364e-07, 'epoch': 0.059669107675617034, 'num_input_tokens_seen': 2139095040, 'completed': '27.66% (1_020 / 3_687)', 'remaining time': '23:42:18', 'throughput': '8530.81', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:25:44,706 >> {'loss': 0.3168, 'grad_norm': 11.59531021118164, 'learning_rate': 8.792712496295677e-07, 'epoch': 0.05994033089232438, 'num_input_tokens_seen': 2141192192, 'completed': '27.69% (1_021 / 3_687)', 'remaining time': '23:41:52', 'throughput': '8058.92', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:26:15,354 >> {'loss': 0.4643, 'grad_norm': 15.695453643798828, 'learning_rate': 8.789959583716268e-07, 'epoch': 0.060211554109031735, 'num_input_tokens_seen': 2143289344, 'completed': '27.72% (1_022 / 3_687)', 'remaining time': '23:41:04', 'throughput': '8553.66', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:26:45,292 >> {'loss': 0.2886, 'grad_norm': 12.228116989135742, 'learning_rate': 8.787204023502579e-07, 'epoch': 0.06048277732573908, 'num_input_tokens_seen': 2145386496, 'completed': '27.75% (1_023 / 3_687)', 'remaining time': '23:40:08', 'throughput': '8756.13', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:27:12,539 >> {'loss': 0.4706, 'grad_norm': 15.869124412536621, 'learning_rate': 8.78444581787218e-07, 'epoch': 0.06075400054244643, 'num_input_tokens_seen': 2147483648, 'completed': '27.77% (1_024 / 3_687)', 'remaining time': '23:38:39', 'throughput': '9621.25', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:27:44,870 >> {'loss': 0.4961, 'grad_norm': 17.77741241455078, 'learning_rate': 8.781684969044769e-07, 'epoch': 0.061025223759153785, 'num_input_tokens_seen': 2149580800, 'completed': '27.80% (1_025 / 3_687)', 'remaining time': '23:38:12', 'throughput': '8107.97', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:28:16,783 >> {'loss': 0.5621, 'grad_norm': 18.270593643188477, 'learning_rate': 8.778921479242173e-07, 'epoch': 0.06129644697586113, 'num_input_tokens_seen': 2151677952, 'completed': '27.83% (1_026 / 3_687)', 'remaining time': '23:37:39', 'throughput': '8214.23', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:28:47,320 >> {'loss': 0.2092, 'grad_norm': 10.54736614227295, 'learning_rate': 8.776155350688342e-07, 'epoch': 0.06156767019256849, 'num_input_tokens_seen': 2153775104, 'completed': '27.85% (1_027 / 3_687)', 'remaining time': '23:36:50', 'throughput': '8584.49', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:29:17,452 >> {'loss': 0.2644, 'grad_norm': 14.527905464172363, 'learning_rate': 8.773386585609352e-07, 'epoch': 0.061838893409275834, 'num_input_tokens_seen': 2155872256, 'completed': '27.88% (1_028 / 3_687)', 'remaining time': '23:35:57', 'throughput': '8700.03', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:29:46,966 >> {'loss': 0.4327, 'grad_norm': 14.493075370788574, 'learning_rate': 8.770615186233398e-07, 'epoch': 0.06211011662598318, 'num_input_tokens_seen': 2157969408, 'completed': '27.91% (1_029 / 3_687)', 'remaining time': '23:34:57', 'throughput': '8881.99', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:30:17,553 >> {'loss': 0.3887, 'grad_norm': 15.283905982971191, 'learning_rate': 8.7678411547908e-07, 'epoch': 0.062381339842690536, 'num_input_tokens_seen': 2160066560, 'completed': '27.94% (1_030 / 3_687)', 'remaining time': '23:34:09', 'throughput': '8570.46', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:30:46,609 >> {'loss': 0.8746, 'grad_norm': 22.41847801208496, 'learning_rate': 8.76506449351399e-07, 'epoch': 0.06265256305939788, 'num_input_tokens_seen': 2162163712, 'completed': '27.96% (1_031 / 3_687)', 'remaining time': '23:33:04', 'throughput': '9021.79', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:31:19,114 >> {'loss': 0.3222, 'grad_norm': 12.715897560119629, 'learning_rate': 8.762285204637522e-07, 'epoch': 0.06292378627610523, 'num_input_tokens_seen': 2164260864, 'completed': '27.99% (1_032 / 3_687)', 'remaining time': '23:32:39', 'throughput': '8064.76', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:31:47,107 >> {'loss': 0.9287, 'grad_norm': 25.04868507385254, 'learning_rate': 8.75950329039806e-07, 'epoch': 0.06319500949281258, 'num_input_tokens_seen': 2166358016, 'completed': '28.02% (1_033 / 3_687)', 'remaining time': '23:31:22', 'throughput': '9364.55', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:32:18,344 >> {'loss': 0.4544, 'grad_norm': 14.874964714050293, 'learning_rate': 8.756718753034381e-07, 'epoch': 0.06346623270951994, 'num_input_tokens_seen': 2168455168, 'completed': '28.04% (1_034 / 3_687)', 'remaining time': '23:30:43', 'throughput': '8392.21', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:32:49,717 >> {'loss': 0.4122, 'grad_norm': 15.657155990600586, 'learning_rate': 8.75393159478738e-07, 'epoch': 0.06373745592622729, 'num_input_tokens_seen': 2170552320, 'completed': '28.07% (1_035 / 3_687)', 'remaining time': '23:30:05', 'throughput': '8355.73', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:33:20,213 >> {'loss': 0.5967, 'grad_norm': 20.458627700805664, 'learning_rate': 8.751141817900052e-07, 'epoch': 0.06400867914293464, 'num_input_tokens_seen': 2172649472, 'completed': '28.10% (1_036 / 3_687)', 'remaining time': '23:29:17', 'throughput': '8596.13', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:33:53,339 >> {'loss': 0.2614, 'grad_norm': 11.38912296295166, 'learning_rate': 8.748349424617504e-07, 'epoch': 0.06427990235964198, 'num_input_tokens_seen': 2174746624, 'completed': '28.13% (1_037 / 3_687)', 'remaining time': '23:28:59', 'throughput': '7913.33', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:34:24,599 >> {'loss': 0.6497, 'grad_norm': 20.700756072998047, 'learning_rate': 8.745554417186946e-07, 'epoch': 0.06455112557634933, 'num_input_tokens_seen': 2176843776, 'completed': '28.15% (1_038 / 3_687)', 'remaining time': '23:28:20', 'throughput': '8386.04', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:34:53,913 >> {'loss': 0.7146, 'grad_norm': 20.341596603393555, 'learning_rate': 8.742756797857698e-07, 'epoch': 0.06482234879305669, 'num_input_tokens_seen': 2178940928, 'completed': '28.18% (1_039 / 3_687)', 'remaining time': '23:27:19', 'throughput': '8942.49', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:35:26,048 >> {'loss': 0.4446, 'grad_norm': 17.363826751708984, 'learning_rate': 8.739956568881174e-07, 'epoch': 0.06509357200976404, 'num_input_tokens_seen': 2181038080, 'completed': '28.21% (1_040 / 3_687)', 'remaining time': '23:26:50', 'throughput': '8157.63', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:35:58,225 >> {'loss': 0.4774, 'grad_norm': 21.04477882385254, 'learning_rate': 8.737153732510894e-07, 'epoch': 0.06536479522647139, 'num_input_tokens_seen': 2183135232, 'completed': '28.23% (1_041 / 3_687)', 'remaining time': '23:26:22', 'throughput': '8147.05', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:36:27,572 >> {'loss': 0.4042, 'grad_norm': 13.529879570007324, 'learning_rate': 8.734348291002472e-07, 'epoch': 0.06563601844317873, 'num_input_tokens_seen': 2185232384, 'completed': '28.26% (1_042 / 3_687)', 'remaining time': '23:25:22', 'throughput': '8932.46', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:36:59,615 >> {'loss': 0.6437, 'grad_norm': 17.152034759521484, 'learning_rate': 8.731540246613621e-07, 'epoch': 0.06590724165988608, 'num_input_tokens_seen': 2187329536, 'completed': '28.29% (1_043 / 3_687)', 'remaining time': '23:24:52', 'throughput': '8181.00', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:37:30,757 >> {'loss': 0.2645, 'grad_norm': 11.518980979919434, 'learning_rate': 8.728729601604149e-07, 'epoch': 0.06617846487659344, 'num_input_tokens_seen': 2189426688, 'completed': '28.32% (1_044 / 3_687)', 'remaining time': '23:24:12', 'throughput': '8417.72', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:38:00,930 >> {'loss': 0.5292, 'grad_norm': 19.25179100036621, 'learning_rate': 8.725916358235956e-07, 'epoch': 0.06644968809330079, 'num_input_tokens_seen': 2191523840, 'completed': '28.34% (1_045 / 3_687)', 'remaining time': '23:23:22', 'throughput': '8688.03', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:38:27,912 >> {'loss': 0.6961, 'grad_norm': 19.196977615356445, 'learning_rate': 8.723100518773034e-07, 'epoch': 0.06672091131000814, 'num_input_tokens_seen': 2193620992, 'completed': '28.37% (1_046 / 3_687)', 'remaining time': '23:21:57', 'throughput': '9715.47', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:38:58,751 >> {'loss': 0.3306, 'grad_norm': 12.074904441833496, 'learning_rate': 8.720282085481463e-07, 'epoch': 0.06699213452671549, 'num_input_tokens_seen': 2195718144, 'completed': '28.40% (1_047 / 3_687)', 'remaining time': '23:21:15', 'throughput': '8500.54', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:39:30,226 >> {'loss': 0.4436, 'grad_norm': 15.995697975158691, 'learning_rate': 8.717461060629408e-07, 'epoch': 0.06726335774342283, 'num_input_tokens_seen': 2197815296, 'completed': '28.42% (1_048 / 3_687)', 'remaining time': '23:20:39', 'throughput': '8328.59', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:40:00,395 >> {'loss': 0.5042, 'grad_norm': 16.870420455932617, 'learning_rate': 8.714637446487127e-07, 'epoch': 0.06753458096013018, 'num_input_tokens_seen': 2199912448, 'completed': '28.45% (1_049 / 3_687)', 'remaining time': '23:19:49', 'throughput': '8689.21', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:40:29,814 >> {'loss': 0.4686, 'grad_norm': 16.621347427368164, 'learning_rate': 8.711811245326955e-07, 'epoch': 0.06780580417683754, 'num_input_tokens_seen': 2202009600, 'completed': '28.48% (1_050 / 3_687)', 'remaining time': '23:18:52', 'throughput': '8910.67', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:41:02,490 >> {'loss': 0.5436, 'grad_norm': 20.809621810913086, 'learning_rate': 8.70898245942331e-07, 'epoch': 0.06807702739354489, 'num_input_tokens_seen': 2204106752, 'completed': '28.51% (1_051 / 3_687)', 'remaining time': '23:18:29', 'throughput': '8022.41', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:41:31,667 >> {'loss': 0.5587, 'grad_norm': 18.218713760375977, 'learning_rate': 8.706151091052693e-07, 'epoch': 0.06834825061025224, 'num_input_tokens_seen': 2206203904, 'completed': '28.53% (1_052 / 3_687)', 'remaining time': '23:17:29', 'throughput': '8984.88', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:42:03,986 >> {'loss': 0.5315, 'grad_norm': 15.873334884643555, 'learning_rate': 8.703317142493681e-07, 'epoch': 0.06861947382695958, 'num_input_tokens_seen': 2208301056, 'completed': '28.56% (1_053 / 3_687)', 'remaining time': '23:17:03', 'throughput': '8111.05', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:42:32,071 >> {'loss': 0.6862, 'grad_norm': 22.45003890991211, 'learning_rate': 8.700480616026928e-07, 'epoch': 0.06889069704366693, 'num_input_tokens_seen': 2210398208, 'completed': '28.59% (1_054 / 3_687)', 'remaining time': '23:15:52', 'throughput': '9333.96', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:43:03,264 >> {'loss': 0.5649, 'grad_norm': 23.83209991455078, 'learning_rate': 8.697641513935164e-07, 'epoch': 0.0691619202603743, 'num_input_tokens_seen': 2212495360, 'completed': '28.61% (1_055 / 3_687)', 'remaining time': '23:15:14', 'throughput': '8403.91', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:43:32,681 >> {'loss': 0.5258, 'grad_norm': 16.99089241027832, 'learning_rate': 8.694799838503186e-07, 'epoch': 0.06943314347708164, 'num_input_tokens_seen': 2214592512, 'completed': '28.64% (1_056 / 3_687)', 'remaining time': '23:14:18', 'throughput': '8911.38', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:44:02,885 >> {'loss': 0.3312, 'grad_norm': 12.274675369262695, 'learning_rate': 8.691955592017872e-07, 'epoch': 0.06970436669378899, 'num_input_tokens_seen': 2216689664, 'completed': '28.67% (1_057 / 3_687)', 'remaining time': '23:13:29', 'throughput': '8679.04', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:44:28,508 >> {'loss': 0.6978, 'grad_norm': 20.667945861816406, 'learning_rate': 8.689108776768159e-07, 'epoch': 0.06997558991049634, 'num_input_tokens_seen': 2218786816, 'completed': '28.70% (1_058 / 3_687)', 'remaining time': '23:11:55', 'throughput': '10230.70', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:44:56,984 >> {'loss': 0.6072, 'grad_norm': 18.657791137695312, 'learning_rate': 8.686259395045056e-07, 'epoch': 0.07024681312720368, 'num_input_tokens_seen': 2220883968, 'completed': '28.72% (1_059 / 3_687)', 'remaining time': '23:10:50', 'throughput': '9205.76', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:45:29,116 >> {'loss': 0.4635, 'grad_norm': 15.566442489624023, 'learning_rate': 8.68340744914164e-07, 'epoch': 0.07051803634391104, 'num_input_tokens_seen': 2222981120, 'completed': '28.75% (1_060 / 3_687)', 'remaining time': '23:10:22', 'throughput': '8158.48', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:45:59,163 >> {'loss': 0.2827, 'grad_norm': 14.146040916442871, 'learning_rate': 8.680552941353045e-07, 'epoch': 0.07078925956061839, 'num_input_tokens_seen': 2225078272, 'completed': '28.78% (1_061 / 3_687)', 'remaining time': '23:09:33', 'throughput': '8724.28', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:46:29,381 >> {'loss': 0.6515, 'grad_norm': 22.9758358001709, 'learning_rate': 8.677695873976473e-07, 'epoch': 0.07106048277732574, 'num_input_tokens_seen': 2227175424, 'completed': '28.80% (1_062 / 3_687)', 'remaining time': '23:08:46', 'throughput': '8675.26', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:47:00,027 >> {'loss': 0.4001, 'grad_norm': 14.749287605285645, 'learning_rate': 8.674836249311182e-07, 'epoch': 0.07133170599403309, 'num_input_tokens_seen': 2229272576, 'completed': '28.83% (1_063 / 3_687)', 'remaining time': '23:08:03', 'throughput': '8554.00', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:47:27,607 >> {'loss': 0.6569, 'grad_norm': 18.237518310546875, 'learning_rate': 8.671974069658488e-07, 'epoch': 0.07160292921074043, 'num_input_tokens_seen': 2231369728, 'completed': '28.86% (1_064 / 3_687)', 'remaining time': '23:06:50', 'throughput': '9504.80', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:47:56,271 >> {'loss': 0.5926, 'grad_norm': 21.23828125, 'learning_rate': 8.669109337321767e-07, 'epoch': 0.0718741524274478, 'num_input_tokens_seen': 2233466880, 'completed': '28.89% (1_065 / 3_687)', 'remaining time': '23:05:48', 'throughput': '9145.24', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:48:25,236 >> {'loss': 0.4136, 'grad_norm': 13.614550590515137, 'learning_rate': 8.666242054606444e-07, 'epoch': 0.07214537564415514, 'num_input_tokens_seen': 2235564032, 'completed': '28.91% (1_066 / 3_687)', 'remaining time': '23:04:49', 'throughput': '9050.56', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:48:57,758 >> {'loss': 0.5478, 'grad_norm': 19.72222900390625, 'learning_rate': 8.66337222382e-07, 'epoch': 0.07241659886086249, 'num_input_tokens_seen': 2237661184, 'completed': '28.94% (1_067 / 3_687)', 'remaining time': '23:04:25', 'throughput': '8060.56', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:49:27,012 >> {'loss': 0.7417, 'grad_norm': 24.28289222717285, 'learning_rate': 8.660499847271965e-07, 'epoch': 0.07268782207756984, 'num_input_tokens_seen': 2239758336, 'completed': '28.97% (1_068 / 3_687)', 'remaining time': '23:03:30', 'throughput': '8960.92', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:49:58,909 >> {'loss': 0.3927, 'grad_norm': 15.440895080566406, 'learning_rate': 8.657624927273919e-07, 'epoch': 0.07295904529427719, 'num_input_tokens_seen': 2241855488, 'completed': '28.99% (1_069 / 3_687)', 'remaining time': '23:03:00', 'throughput': '8218.48', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:50:28,609 >> {'loss': 0.4269, 'grad_norm': 14.023824691772461, 'learning_rate': 8.654747466139488e-07, 'epoch': 0.07323026851098453, 'num_input_tokens_seen': 2243952640, 'completed': '29.02% (1_070 / 3_687)', 'remaining time': '23:02:09', 'throughput': '8826.23', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:50:57,534 >> {'loss': 0.5167, 'grad_norm': 25.52799415588379, 'learning_rate': 8.651867466184344e-07, 'epoch': 0.0735014917276919, 'num_input_tokens_seen': 2246049792, 'completed': '29.05% (1_071 / 3_687)', 'remaining time': '23:01:11', 'throughput': '9063.02', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:51:28,804 >> {'loss': 0.4347, 'grad_norm': 15.417862892150879, 'learning_rate': 8.6489849297262e-07, 'epoch': 0.07377271494439924, 'num_input_tokens_seen': 2248146944, 'completed': '29.08% (1_072 / 3_687)', 'remaining time': '23:00:35', 'throughput': '8383.22', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:51:59,433 >> {'loss': 0.4029, 'grad_norm': 17.311777114868164, 'learning_rate': 8.646099859084812e-07, 'epoch': 0.07404393816110659, 'num_input_tokens_seen': 2250244096, 'completed': '29.10% (1_073 / 3_687)', 'remaining time': '22:59:53', 'throughput': '8558.52', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:52:28,669 >> {'loss': 0.4559, 'grad_norm': 15.483137130737305, 'learning_rate': 8.643212256581978e-07, 'epoch': 0.07431516137781394, 'num_input_tokens_seen': 2252341248, 'completed': '29.13% (1_074 / 3_687)', 'remaining time': '22:58:58', 'throughput': '8966.62', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:52:58,955 >> {'loss': 0.4146, 'grad_norm': 18.89014434814453, 'learning_rate': 8.640322124541525e-07, 'epoch': 0.07458638459452128, 'num_input_tokens_seen': 2254438400, 'completed': '29.16% (1_075 / 3_687)', 'remaining time': '22:58:14', 'throughput': '8655.55', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:53:28,941 >> {'loss': 0.7084, 'grad_norm': 23.15615463256836, 'learning_rate': 8.637429465289324e-07, 'epoch': 0.07485760781122865, 'num_input_tokens_seen': 2256535552, 'completed': '29.18% (1_076 / 3_687)', 'remaining time': '22:57:26', 'throughput': '8742.28', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:53:57,328 >> {'loss': 0.9022, 'grad_norm': 26.009380340576172, 'learning_rate': 8.63453428115328e-07, 'epoch': 0.075128831027936, 'num_input_tokens_seen': 2258632704, 'completed': '29.21% (1_077 / 3_687)', 'remaining time': '22:56:24', 'throughput': '9234.74', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:54:24,859 >> {'loss': 0.9285, 'grad_norm': 28.278987884521484, 'learning_rate': 8.631636574463321e-07, 'epoch': 0.07540005424464334, 'num_input_tokens_seen': 2260729856, 'completed': '29.24% (1_078 / 3_687)', 'remaining time': '22:55:14', 'throughput': '9521.85', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:54:54,217 >> {'loss': 0.5127, 'grad_norm': 15.612220764160156, 'learning_rate': 8.628736347551417e-07, 'epoch': 0.07567127746135069, 'num_input_tokens_seen': 2262827008, 'completed': '29.26% (1_079 / 3_687)', 'remaining time': '22:54:21', 'throughput': '8928.97', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:55:27,538 >> {'loss': 0.4181, 'grad_norm': 15.422523498535156, 'learning_rate': 8.625833602751559e-07, 'epoch': 0.07594250067805804, 'num_input_tokens_seen': 2264924160, 'completed': '29.29% (1_080 / 3_687)', 'remaining time': '22:54:05', 'throughput': '7867.35', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:55:56,476 >> {'loss': 0.9629, 'grad_norm': 26.36939239501953, 'learning_rate': 8.622928342399762e-07, 'epoch': 0.0762137238947654, 'num_input_tokens_seen': 2267021312, 'completed': '29.32% (1_081 / 3_687)', 'remaining time': '22:53:08', 'throughput': '9058.66', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:56:26,134 >> {'loss': 0.6093, 'grad_norm': 17.090736389160156, 'learning_rate': 8.620020568834072e-07, 'epoch': 0.07648494711147275, 'num_input_tokens_seen': 2269118464, 'completed': '29.35% (1_082 / 3_687)', 'remaining time': '22:52:19', 'throughput': '8838.98', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:56:54,704 >> {'loss': 0.5369, 'grad_norm': 22.48678207397461, 'learning_rate': 8.617110284394553e-07, 'epoch': 0.07675617032818009, 'num_input_tokens_seen': 2271215616, 'completed': '29.37% (1_083 / 3_687)', 'remaining time': '22:51:19', 'throughput': '9175.50', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:57:23,909 >> {'loss': 0.3614, 'grad_norm': 16.42003059387207, 'learning_rate': 8.614197491423293e-07, 'epoch': 0.07702739354488744, 'num_input_tokens_seen': 2273312768, 'completed': '29.40% (1_084 / 3_687)', 'remaining time': '22:50:26', 'throughput': '8976.12', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:57:52,585 >> {'loss': 0.7799, 'grad_norm': 45.502445220947266, 'learning_rate': 8.611282192264396e-07, 'epoch': 0.07729861676159479, 'num_input_tokens_seen': 2275409920, 'completed': '29.43% (1_085 / 3_687)', 'remaining time': '22:49:27', 'throughput': '9141.60', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:58:22,285 >> {'loss': 0.3233, 'grad_norm': 11.837736129760742, 'learning_rate': 8.608364389263984e-07, 'epoch': 0.07756983997830215, 'num_input_tokens_seen': 2277507072, 'completed': '29.45% (1_086 / 3_687)', 'remaining time': '22:48:39', 'throughput': '8826.39', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:58:54,699 >> {'loss': 0.5218, 'grad_norm': 18.987957000732422, 'learning_rate': 8.605444084770192e-07, 'epoch': 0.0778410631950095, 'num_input_tokens_seen': 2279604224, 'completed': '29.48% (1_087 / 3_687)', 'remaining time': '22:48:15', 'throughput': '8087.20', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:59:25,308 >> {'loss': 0.4539, 'grad_norm': 17.756649017333984, 'learning_rate': 8.602521281133173e-07, 'epoch': 0.07811228641171684, 'num_input_tokens_seen': 2281701376, 'completed': '29.51% (1_088 / 3_687)', 'remaining time': '22:47:35', 'throughput': '8564.31', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 17:59:55,680 >> {'loss': 0.2876, 'grad_norm': 21.006437301635742, 'learning_rate': 8.599595980705085e-07, 'epoch': 0.07838350962842419, 'num_input_tokens_seen': 2283798528, 'completed': '29.54% (1_089 / 3_687)', 'remaining time': '22:46:52', 'throughput': '8631.11', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 18:00:28,369 >> {'loss': 0.5252, 'grad_norm': 17.649852752685547, 'learning_rate': 8.596668185840102e-07, 'epoch': 0.07865473284513154, 'num_input_tokens_seen': 2285895680, 'completed': '29.56% (1_090 / 3_687)', 'remaining time': '22:46:31', 'throughput': '8019.32', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 18:00:56,463 >> {'loss': 0.5004, 'grad_norm': 15.316368103027344, 'learning_rate': 8.593737898894398e-07, 'epoch': 0.07892595606183889, 'num_input_tokens_seen': 2287992832, 'completed': '29.59% (1_091 / 3_687)', 'remaining time': '22:45:28', 'throughput': '9331.08', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 18:01:23,674 >> {'loss': 0.5126, 'grad_norm': 16.443462371826172, 'learning_rate': 8.59080512222616e-07, 'epoch': 0.07919717927854625, 'num_input_tokens_seen': 2290089984, 'completed': '29.62% (1_092 / 3_687)', 'remaining time': '22:44:18', 'throughput': '9633.54', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 18:01:57,430 >> {'loss': 0.6624, 'grad_norm': 25.65877342224121, 'learning_rate': 8.587869858195574e-07, 'epoch': 0.0794684024952536, 'num_input_tokens_seen': 2292187136, 'completed': '29.64% (1_093 / 3_687)', 'remaining time': '22:44:06', 'throughput': '7765.85', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 18:02:26,558 >> {'loss': 0.5005, 'grad_norm': 15.421908378601074, 'learning_rate': 8.584932109164826e-07, 'epoch': 0.07973962571196094, 'num_input_tokens_seen': 2294284288, 'completed': '29.67% (1_094 / 3_687)', 'remaining time': '22:43:13', 'throughput': '8999.79', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 18:03:00,897 >> {'loss': 0.3672, 'grad_norm': 12.648368835449219, 'learning_rate': 8.581991877498109e-07, 'epoch': 0.08001084892866829, 'num_input_tokens_seen': 2296381440, 'completed': '29.70% (1_095 / 3_687)', 'remaining time': '22:43:06', 'throughput': '7634.14', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 18:03:29,569 >> {'loss': 0.6201, 'grad_norm': 20.8604679107666, 'learning_rate': 8.579049165561607e-07, 'epoch': 0.08028207214537564, 'num_input_tokens_seen': 2298478592, 'completed': '29.73% (1_096 / 3_687)', 'remaining time': '22:42:09', 'throughput': '9142.75', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 18:03:58,166 >> {'loss': 0.6181, 'grad_norm': 18.767131805419922, 'learning_rate': 8.576103975723502e-07, 'epoch': 0.080553295362083, 'num_input_tokens_seen': 2300575744, 'completed': '29.75% (1_097 / 3_687)', 'remaining time': '22:41:12', 'throughput': '9166.72', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 18:04:27,054 >> {'loss': 0.5909, 'grad_norm': 18.214879989624023, 'learning_rate': 8.573156310353974e-07, 'epoch': 0.08082451857879035, 'num_input_tokens_seen': 2302672896, 'completed': '29.78% (1_098 / 3_687)', 'remaining time': '22:40:17', 'throughput': '9074.70', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 18:04:53,968 >> {'loss': 0.5828, 'grad_norm': 15.940201759338379, 'learning_rate': 8.570206171825188e-07, 'epoch': 0.0810957417954977, 'num_input_tokens_seen': 2304770048, 'completed': '29.81% (1_099 / 3_687)', 'remaining time': '22:39:06', 'throughput': '9739.78', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 18:05:24,136 >> {'loss': 0.4176, 'grad_norm': 13.9312744140625, 'learning_rate': 8.567253562511306e-07, 'epoch': 0.08136696501220504, 'num_input_tokens_seen': 2306867200, 'completed': '29.83% (1_100 / 3_687)', 'remaining time': '22:38:23', 'throughput': '8689.72', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 18:05:53,322 >> {'loss': 0.4367, 'grad_norm': 16.388742446899414, 'learning_rate': 8.564298484788472e-07, 'epoch': 0.08163818822891239, 'num_input_tokens_seen': 2308964352, 'completed': '29.86% (1_101 / 3_687)', 'remaining time': '22:37:32', 'throughput': '8981.71', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 18:06:24,120 >> {'loss': 0.2912, 'grad_norm': 11.300383567810059, 'learning_rate': 8.561340941034825e-07, 'epoch': 0.08190941144561975, 'num_input_tokens_seen': 2311061504, 'completed': '29.89% (1_102 / 3_687)', 'remaining time': '22:36:54', 'throughput': '8511.79', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 18:06:54,937 >> {'loss': 0.6453, 'grad_norm': 19.192779541015625, 'learning_rate': 8.55838093363048e-07, 'epoch': 0.0821806346623271, 'num_input_tokens_seen': 2313158656, 'completed': '29.92% (1_103 / 3_687)', 'remaining time': '22:36:17', 'throughput': '8506.35', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 18:07:26,525 >> {'loss': 0.6856, 'grad_norm': 20.092891693115234, 'learning_rate': 8.555418464957542e-07, 'epoch': 0.08245185787903445, 'num_input_tokens_seen': 2315255808, 'completed': '29.94% (1_104 / 3_687)', 'remaining time': '22:35:46', 'throughput': '8298.90', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 18:07:55,385 >> {'loss': 0.4858, 'grad_norm': 16.484115600585938, 'learning_rate': 8.552453537400089e-07, 'epoch': 0.0827230810957418, 'num_input_tokens_seen': 2317352960, 'completed': '29.97% (1_105 / 3_687)', 'remaining time': '22:34:52', 'throughput': '9083.45', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 18:08:27,003 >> {'loss': 0.7113, 'grad_norm': 27.20563316345215, 'learning_rate': 8.549486153344183e-07, 'epoch': 0.08299430431244914, 'num_input_tokens_seen': 2319450112, 'completed': '30.00% (1_106 / 3_687)', 'remaining time': '22:34:22', 'throughput': '8290.74', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 18:08:54,435 >> {'loss': 0.6143, 'grad_norm': 17.81060028076172, 'learning_rate': 8.546516315177863e-07, 'epoch': 0.0832655275291565, 'num_input_tokens_seen': 2321547264, 'completed': '30.02% (1_107 / 3_687)', 'remaining time': '22:33:16', 'throughput': '9556.40', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 18:09:25,059 >> {'loss': 0.5014, 'grad_norm': 17.569068908691406, 'learning_rate': 8.543544025291143e-07, 'epoch': 0.08353675074586385, 'num_input_tokens_seen': 2323644416, 'completed': '30.05% (1_108 / 3_687)', 'remaining time': '22:32:38', 'throughput': '8560.07', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 18:09:55,155 >> {'loss': 0.4685, 'grad_norm': 15.904462814331055, 'learning_rate': 8.540569286076004e-07, 'epoch': 0.0838079739625712, 'num_input_tokens_seen': 2325741568, 'completed': '30.08% (1_109 / 3_687)', 'remaining time': '22:31:55', 'throughput': '8710.23', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 18:10:26,219 >> {'loss': 0.6062, 'grad_norm': 19.58460807800293, 'learning_rate': 8.537592099926407e-07, 'epoch': 0.08407919717927854, 'num_input_tokens_seen': 2327838720, 'completed': '30.11% (1_110 / 3_687)', 'remaining time': '22:31:20', 'throughput': '8438.71', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 18:10:58,092 >> {'loss': 0.3705, 'grad_norm': 13.513802528381348, 'learning_rate': 8.534612469238278e-07, 'epoch': 0.08435042039598589, 'num_input_tokens_seen': 2329935872, 'completed': '30.13% (1_111 / 3_687)', 'remaining time': '22:30:52', 'throughput': '8224.77', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 18:11:28,820 >> {'loss': 0.677, 'grad_norm': 20.021507263183594, 'learning_rate': 8.531630396409507e-07, 'epoch': 0.08462164361269324, 'num_input_tokens_seen': 2332033024, 'completed': '30.16% (1_112 / 3_687)', 'remaining time': '22:30:14', 'throughput': '8531.19', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 18:12:00,262 >> {'loss': 0.5015, 'grad_norm': 14.840436935424805, 'learning_rate': 8.528645883839956e-07, 'epoch': 0.0848928668294006, 'num_input_tokens_seen': 2334130176, 'completed': '30.19% (1_113 / 3_687)', 'remaining time': '22:29:43', 'throughput': '8337.19', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 18:12:31,096 >> {'loss': 0.5022, 'grad_norm': 15.523248672485352, 'learning_rate': 8.525658933931448e-07, 'epoch': 0.08516409004610795, 'num_input_tokens_seen': 2336227328, 'completed': '30.21% (1_114 / 3_687)', 'remaining time': '22:29:06', 'throughput': '8501.75', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 18:12:58,027 >> {'loss': 0.5963, 'grad_norm': 17.14645004272461, 'learning_rate': 8.522669549087762e-07, 'epoch': 0.0854353132628153, 'num_input_tokens_seen': 2338324480, 'completed': '30.24% (1_115 / 3_687)', 'remaining time': '22:27:58', 'throughput': '9734.04', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 18:13:28,648 >> {'loss': 0.7184, 'grad_norm': 20.818403244018555, 'learning_rate': 8.519677731714645e-07, 'epoch': 0.08570653647952264, 'num_input_tokens_seen': 2340421632, 'completed': '30.27% (1_116 / 3_687)', 'remaining time': '22:27:20', 'throughput': '8561.04', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 18:13:58,333 >> {'loss': 0.4394, 'grad_norm': 13.36194896697998, 'learning_rate': 8.516683484219797e-07, 'epoch': 0.08597775969622999, 'num_input_tokens_seen': 2342518784, 'completed': '30.30% (1_117 / 3_687)', 'remaining time': '22:26:34', 'throughput': '8830.88', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 18:14:29,050 >> {'loss': 0.5567, 'grad_norm': 18.085691452026367, 'learning_rate': 8.513686809012875e-07, 'epoch': 0.08624898291293735, 'num_input_tokens_seen': 2344615936, 'completed': '30.32% (1_118 / 3_687)', 'remaining time': '22:25:57', 'throughput': '8533.92', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 18:14:58,383 >> {'loss': 0.5698, 'grad_norm': 18.329633712768555, 'learning_rate': 8.510687708505489e-07, 'epoch': 0.0865202061296447, 'num_input_tokens_seen': 2346713088, 'completed': '30.35% (1_119 / 3_687)', 'remaining time': '22:25:08', 'throughput': '8936.88', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 18:15:28,760 >> {'loss': 0.5401, 'grad_norm': 23.978832244873047, 'learning_rate': 8.507686185111199e-07, 'epoch': 0.08679142934635205, 'num_input_tokens_seen': 2348810240, 'completed': '30.38% (1_120 / 3_687)', 'remaining time': '22:24:28', 'throughput': '8629.76', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 18:15:57,861 >> {'loss': 0.3782, 'grad_norm': 19.02138328552246, 'learning_rate': 8.504682241245516e-07, 'epoch': 0.0870626525630594, 'num_input_tokens_seen': 2350907392, 'completed': '30.40% (1_121 / 3_687)', 'remaining time': '22:23:38', 'throughput': '9007.98', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 18:16:27,778 >> {'loss': 0.2031, 'grad_norm': 10.588142395019531, 'learning_rate': 8.501675879325906e-07, 'epoch': 0.08733387577976674, 'num_input_tokens_seen': 2353004544, 'completed': '30.43% (1_122 / 3_687)', 'remaining time': '22:22:55', 'throughput': '8762.53', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 18:16:59,488 >> {'loss': 0.4395, 'grad_norm': 18.56726837158203, 'learning_rate': 8.498667101771769e-07, 'epoch': 0.0876050989964741, 'num_input_tokens_seen': 2355101696, 'completed': '30.46% (1_123 / 3_687)', 'remaining time': '22:22:26', 'throughput': '8266.81', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 18:17:32,899 >> {'loss': 0.6843, 'grad_norm': 21.876890182495117, 'learning_rate': 8.495655911004456e-07, 'epoch': 0.08787632221318145, 'num_input_tokens_seen': 2357198848, 'completed': '30.49% (1_124 / 3_687)', 'remaining time': '22:22:10', 'throughput': '7846.09', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 18:18:03,622 >> {'loss': 0.508, 'grad_norm': 17.283998489379883, 'learning_rate': 8.492642309447257e-07, 'epoch': 0.0881475454298888, 'num_input_tokens_seen': 2359296000, 'completed': '30.51% (1_125 / 3_687)', 'remaining time': '22:21:33', 'throughput': '8532.36', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 18:18:33,629 >> {'loss': 0.3963, 'grad_norm': 12.905929565429688, 'learning_rate': 8.489626299525409e-07, 'epoch': 0.08841876864659615, 'num_input_tokens_seen': 2361393152, 'completed': '30.54% (1_126 / 3_687)', 'remaining time': '22:20:51', 'throughput': '8736.28', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 18:19:03,545 >> {'loss': 0.4069, 'grad_norm': 14.936822891235352, 'learning_rate': 8.486607883666077e-07, 'epoch': 0.0886899918633035, 'num_input_tokens_seen': 2363490304, 'completed': '30.57% (1_127 / 3_687)', 'remaining time': '22:20:08', 'throughput': '8762.50', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 18:19:34,742 >> {'loss': 0.5104, 'grad_norm': 16.91001319885254, 'learning_rate': 8.483587064298372e-07, 'epoch': 0.08896121508001086, 'num_input_tokens_seen': 2365587456, 'completed': '30.59% (1_128 / 3_687)', 'remaining time': '22:19:35', 'throughput': '8402.94', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 18:20:07,848 >> {'loss': 0.3908, 'grad_norm': 15.12585735321045, 'learning_rate': 8.480563843853328e-07, 'epoch': 0.0892324382967182, 'num_input_tokens_seen': 2367684608, 'completed': '30.62% (1_129 / 3_687)', 'remaining time': '22:19:17', 'throughput': '7918.40', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 18:20:38,884 >> {'loss': 0.6482, 'grad_norm': 25.309404373168945, 'learning_rate': 8.477538224763923e-07, 'epoch': 0.08950366151342555, 'num_input_tokens_seen': 2369781760, 'completed': '30.65% (1_130 / 3_687)', 'remaining time': '22:18:42', 'throughput': '8446.36', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 18:21:09,859 >> {'loss': 0.4492, 'grad_norm': 26.27945899963379, 'learning_rate': 8.474510209465058e-07, 'epoch': 0.0897748847301329, 'num_input_tokens_seen': 2371878912, 'completed': '30.68% (1_131 / 3_687)', 'remaining time': '22:18:07', 'throughput': '8463.04', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 18:21:40,840 >> {'loss': 0.545, 'grad_norm': 19.251035690307617, 'learning_rate': 8.471479800393565e-07, 'epoch': 0.09004610794684025, 'num_input_tokens_seen': 2373976064, 'completed': '30.70% (1_132 / 3_687)', 'remaining time': '22:17:33', 'throughput': '8461.63', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 18:22:12,496 >> {'loss': 0.5393, 'grad_norm': 19.276018142700195, 'learning_rate': 8.468446999988202e-07, 'epoch': 0.09031733116354759, 'num_input_tokens_seen': 2376073216, 'completed': '30.73% (1_133 / 3_687)', 'remaining time': '22:17:03', 'throughput': '8281.13', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 18:22:38,222 >> {'loss': 0.9826, 'grad_norm': 25.423450469970703, 'learning_rate': 8.465411810689653e-07, 'epoch': 0.09058855438025495, 'num_input_tokens_seen': 2378170368, 'completed': '30.76% (1_134 / 3_687)', 'remaining time': '22:15:48', 'throughput': '10189.71', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 18:23:09,214 >> {'loss': 0.487, 'grad_norm': 17.12239646911621, 'learning_rate':
8.462374234940517e-07, 'epoch': 0.0908597775969623, 'num_input_tokens_seen': 2380267520, 'completed': '30.78% (1_135 / 3_687)', 'remaining time': '22:15:14', 'throughput': '8458.33', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:23:38,886 >> {'loss': 0.3742, 'grad_norm': 14.614494323730469, 'learning_rate': 8.459334275185325e-07, 'epoch': 0.09113100081366965, 'num_input_tokens_seen': 2382364672, 'completed': '30.81% (1_136 / 3_687)', 'remaining time': '22:14:29', 'throughput': '8834.77', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:24:07,126 >> {'loss': 1.0046, 'grad_norm': 27.48741340637207, 'learning_rate': 8.456291933870521e-07, 'epoch': 0.091402224030377, 'num_input_tokens_seen': 2384461824, 'completed': '30.84% (1_137 / 3_687)', 'remaining time': '22:13:34', 'throughput': '9282.57', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:24:38,578 >> {'loss': 0.2549, 'grad_norm': 12.554402351379395, 'learning_rate': 8.453247213444463e-07, 'epoch': 0.09167344724708434, 'num_input_tokens_seen': 2386558976, 'completed': '30.87% (1_138 / 3_687)', 'remaining time': '22:13:03', 'throughput': '8334.67', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:25:09,342 >> {'loss': 0.5053, 'grad_norm': 14.78433895111084, 'learning_rate': 8.450200116357428e-07, 'epoch': 0.0919446704637917, 'num_input_tokens_seen': 2388656128, 'completed': '30.89% (1_139 / 3_687)', 'remaining time': '22:12:27', 'throughput': '8521.32', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:25:41,290 >> {'loss': 0.3622, 'grad_norm': 12.076690673828125, 'learning_rate': 8.4471506450616e-07, 'epoch': 0.09221589368049905, 'num_input_tokens_seen': 2390753280, 'completed': '30.92% (1_140 / 3_687)', 'remaining time': '22:12:00', 'throughput': '8205.27', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:26:11,284 >> {'loss': 0.472, 'grad_norm': 18.700605392456055, 'learning_rate': 8.444098802011083e-07, 'epoch': 0.0924871168972064, 
'num_input_tokens_seen': 2392850432, 'completed': '30.95% (1_141 / 3_687)', 'remaining time': '22:11:19', 'throughput': '8740.00', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:26:42,699 >> {'loss': 0.6367, 'grad_norm': 19.415464401245117, 'learning_rate': 8.441044589661881e-07, 'epoch': 0.09275834011391375, 'num_input_tokens_seen': 2394947584, 'completed': '30.97% (1_142 / 3_687)', 'remaining time': '22:10:47', 'throughput': '8344.39', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:27:13,078 >> {'loss': 0.51, 'grad_norm': 19.17083740234375, 'learning_rate': 8.437988010471907e-07, 'epoch': 0.0930295633306211, 'num_input_tokens_seen': 2397044736, 'completed': '31.00% (1_143 / 3_687)', 'remaining time': '22:10:09', 'throughput': '8629.32', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:27:43,326 >> {'loss': 0.533, 'grad_norm': 16.67664337158203, 'learning_rate': 8.434929066900982e-07, 'epoch': 0.09330078654732846, 'num_input_tokens_seen': 2399141888, 'completed': '31.03% (1_144 / 3_687)', 'remaining time': '22:09:29', 'throughput': '8666.30', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:28:10,973 >> {'loss': 0.8218, 'grad_norm': 20.507274627685547, 'learning_rate': 8.431867761410826e-07, 'epoch': 0.0935720097640358, 'num_input_tokens_seen': 2401239040, 'completed': '31.06% (1_145 / 3_687)', 'remaining time': '22:08:30', 'throughput': '9482.05', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:28:41,241 >> {'loss': 0.4415, 'grad_norm': 14.855219841003418, 'learning_rate': 8.42880409646506e-07, 'epoch': 0.09384323298074315, 'num_input_tokens_seen': 2403336192, 'completed': '31.08% (1_146 / 3_687)', 'remaining time': '22:07:51', 'throughput': '8660.66', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:29:12,361 >> {'loss': 0.485, 'grad_norm': 19.247512817382812, 'learning_rate': 8.42573807452921e-07, 'epoch': 0.0941144561974505, 'num_input_tokens_seen': 2405433344, 'completed': 
'31.11% (1_147 / 3_687)', 'remaining time': '22:07:18', 'throughput': '8423.92', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:29:41,653 >> {'loss': 0.5588, 'grad_norm': 23.100740432739258, 'learning_rate': 8.422669698070687e-07, 'epoch': 0.09438567941415785, 'num_input_tokens_seen': 2407530496, 'completed': '31.14% (1_148 / 3_687)', 'remaining time': '22:06:31', 'throughput': '8948.93', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:30:12,026 >> {'loss': 0.2426, 'grad_norm': 12.322291374206543, 'learning_rate': 8.419598969558808e-07, 'epoch': 0.09465690263086521, 'num_input_tokens_seen': 2409627648, 'completed': '31.16% (1_149 / 3_687)', 'remaining time': '22:05:53', 'throughput': '8631.02', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:30:44,036 >> {'loss': 0.3201, 'grad_norm': 11.358101844787598, 'learning_rate': 8.416525891464776e-07, 'epoch': 0.09492812584757256, 'num_input_tokens_seen': 2411724800, 'completed': '31.19% (1_150 / 3_687)', 'remaining time': '22:05:26', 'throughput': '8189.38', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:31:16,392 >> {'loss': 0.4272, 'grad_norm': 16.695714950561523, 'learning_rate': 8.413450466261691e-07, 'epoch': 0.0951993490642799, 'num_input_tokens_seen': 2413821952, 'completed': '31.22% (1_151 / 3_687)', 'remaining time': '22:05:02', 'throughput': '8101.81', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:31:46,647 >> {'loss': 0.3545, 'grad_norm': 14.923744201660156, 'learning_rate': 8.410372696424535e-07, 'epoch': 0.09547057228098725, 'num_input_tokens_seen': 2415919104, 'completed': '31.24% (1_152 / 3_687)', 'remaining time': '22:04:23', 'throughput': '8664.43', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:32:17,410 >> {'loss': 0.4019, 'grad_norm': 13.441437721252441, 'learning_rate': 8.40729258443018e-07, 'epoch': 0.0957417954976946, 'num_input_tokens_seen': 2418016256, 'completed': '31.27% (1_153 / 3_687)', 'remaining time': 
'22:03:48', 'throughput': '8521.57', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:32:46,540 >> {'loss': 0.3352, 'grad_norm': 13.833053588867188, 'learning_rate': 8.404210132757385e-07, 'epoch': 0.09601301871440195, 'num_input_tokens_seen': 2420113408, 'completed': '31.30% (1_154 / 3_687)', 'remaining time': '22:03:00', 'throughput': '8999.10', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:33:16,469 >> {'loss': 0.5448, 'grad_norm': 13.472502708435059, 'learning_rate': 8.401125343886787e-07, 'epoch': 0.09628424193110931, 'num_input_tokens_seen': 2422210560, 'completed': '31.33% (1_155 / 3_687)', 'remaining time': '22:02:19', 'throughput': '8758.76', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:33:46,833 >> {'loss': 0.3018, 'grad_norm': 10.587477684020996, 'learning_rate': 8.398038220300908e-07, 'epoch': 0.09655546514781665, 'num_input_tokens_seen': 2424307712, 'completed': '31.35% (1_156 / 3_687)', 'remaining time': '22:01:41', 'throughput': '8633.45', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:34:12,978 >> {'loss': 0.7406, 'grad_norm': 19.353105545043945, 'learning_rate': 8.39494876448415e-07, 'epoch': 0.096826688364524, 'num_input_tokens_seen': 2426404864, 'completed': '31.38% (1_157 / 3_687)', 'remaining time': '22:00:33', 'throughput': '10026.56', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:34:42,632 >> {'loss': 0.8482, 'grad_norm': 24.70726203918457, 'learning_rate': 8.391856978922785e-07, 'epoch': 0.09709791158123135, 'num_input_tokens_seen': 2428502016, 'completed': '31.41% (1_158 / 3_687)', 'remaining time': '21:59:50', 'throughput': '8840.15', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:35:13,337 >> {'loss': 0.232, 'grad_norm': 15.665159225463867, 'learning_rate': 8.38876286610497e-07, 'epoch': 0.0973691347979387, 'num_input_tokens_seen': 2430599168, 'completed': '31.43% (1_159 / 3_687)', 'remaining time': '21:59:14', 'throughput': '8537.42', 
'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:35:43,400 >> {'loss': 0.6336, 'grad_norm': 19.37482452392578, 'learning_rate': 8.385666428520723e-07, 'epoch': 0.09764035801464606, 'num_input_tokens_seen': 2432696320, 'completed': '31.46% (1_160 / 3_687)', 'remaining time': '21:58:34', 'throughput': '8719.64', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:36:15,051 >> {'loss': 0.6754, 'grad_norm': 21.107303619384766, 'learning_rate': 8.382567668661943e-07, 'epoch': 0.0979115812313534, 'num_input_tokens_seen': 2434793472, 'completed': '31.49% (1_161 / 3_687)', 'remaining time': '21:58:05', 'throughput': '8282.56', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:36:45,077 >> {'loss': 0.613, 'grad_norm': 21.173263549804688, 'learning_rate': 8.379466589022393e-07, 'epoch': 0.09818280444806075, 'num_input_tokens_seen': 2436890624, 'completed': '31.52% (1_162 / 3_687)', 'remaining time': '21:57:25', 'throughput': '8730.34', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:37:15,381 >> {'loss': 0.5053, 'grad_norm': 23.236482620239258, 'learning_rate': 8.376363192097703e-07, 'epoch': 0.0984540276647681, 'num_input_tokens_seen': 2438987776, 'completed': '31.54% (1_163 / 3_687)', 'remaining time': '21:56:47', 'throughput': '8650.60', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:37:47,163 >> {'loss': 0.6246, 'grad_norm': 24.89634132385254, 'learning_rate': 8.37325748038537e-07, 'epoch': 0.09872525088147545, 'num_input_tokens_seen': 2441084928, 'completed': '31.57% (1_164 / 3_687)', 'remaining time': '21:56:19', 'throughput': '8248.04', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:38:16,304 >> {'loss': 0.4242, 'grad_norm': 17.755111694335938, 'learning_rate': 8.370149456384754e-07, 'epoch': 0.09899647409818281, 'num_input_tokens_seen': 2443182080, 'completed': '31.60% (1_165 / 3_687)', 'remaining time': '21:55:32', 'throughput': '8995.81', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 
2024-12-25 18:38:47,355 >> {'loss': 0.2753, 'grad_norm': 12.603349685668945, 'learning_rate': 8.36703912259707e-07, 'epoch': 0.09926769731489016, 'num_input_tokens_seen': 2445279232, 'completed': '31.62% (1_166 / 3_687)', 'remaining time': '21:54:59', 'throughput': '8442.46', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:39:16,796 >> {'loss': 0.7312, 'grad_norm': 24.28446388244629, 'learning_rate': 8.363926481525402e-07, 'epoch': 0.0995389205315975, 'num_input_tokens_seen': 2447376384, 'completed': '31.65% (1_167 / 3_687)', 'remaining time': '21:54:15', 'throughput': '8904.10', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:39:46,999 >> {'loss': 0.478, 'grad_norm': 16.006052017211914, 'learning_rate': 8.360811535674682e-07, 'epoch': 0.09981014374830485, 'num_input_tokens_seen': 2449473536, 'completed': '31.68% (1_168 / 3_687)', 'remaining time': '21:53:37', 'throughput': '8679.42', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:40:18,027 >> {'loss': 0.779, 'grad_norm': 25.193614959716797, 'learning_rate': 8.357694287551698e-07, 'epoch': 0.1000813669650122, 'num_input_tokens_seen': 2451570688, 'completed': '31.71% (1_169 / 3_687)', 'remaining time': '21:53:03', 'throughput': '8448.62', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:40:49,556 >> {'loss': 0.496, 'grad_norm': 14.870660781860352, 'learning_rate': 8.354574739665096e-07, 'epoch': 0.10035259018171956, 'num_input_tokens_seen': 2453667840, 'completed': '31.73% (1_170 / 3_687)', 'remaining time': '21:52:34', 'throughput': '8314.19', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:41:23,186 >> {'loss': 0.305, 'grad_norm': 17.222614288330078, 'learning_rate': 8.351452894525368e-07, 'epoch': 0.10062381339842691, 'num_input_tokens_seen': 2455764992, 'completed': '31.76% (1_171 / 3_687)', 'remaining time': '21:52:18', 'throughput': '7794.90', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:41:52,950 >> {'loss': 0.4074, 
'grad_norm': 14.791529655456543, 'learning_rate': 8.348328754644855e-07, 'epoch': 0.10089503661513426, 'num_input_tokens_seen': 2457862144, 'completed': '31.79% (1_172 / 3_687)', 'remaining time': '21:51:37', 'throughput': '8807.42', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:42:22,545 >> {'loss': 0.6192, 'grad_norm': 20.521638870239258, 'learning_rate': 8.34520232253775e-07, 'epoch': 0.1011662598318416, 'num_input_tokens_seen': 2459959296, 'completed': '31.81% (1_173 / 3_687)', 'remaining time': '21:50:54', 'throughput': '8857.86', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:42:53,901 >> {'loss': 0.6189, 'grad_norm': 25.750980377197266, 'learning_rate': 8.342073600720082e-07, 'epoch': 0.10143748304854895, 'num_input_tokens_seen': 2462056448, 'completed': '31.84% (1_174 / 3_687)', 'remaining time': '21:50:23', 'throughput': '8360.24', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:43:30,368 >> {'loss': 0.477, 'grad_norm': 18.543386459350586, 'learning_rate': 8.33894259170973e-07, 'epoch': 0.10170870626525631, 'num_input_tokens_seen': 2464153600, 'completed': '31.87% (1_175 / 3_687)', 'remaining time': '21:50:27', 'throughput': '7188.46', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:44:02,140 >> {'loss': 0.3135, 'grad_norm': 13.476463317871094, 'learning_rate': 8.335809298026409e-07, 'epoch': 0.10197992948196366, 'num_input_tokens_seen': 2466250752, 'completed': '31.90% (1_176 / 3_687)', 'remaining time': '21:49:59', 'throughput': '8250.80', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:44:34,253 >> {'loss': 0.3671, 'grad_norm': 19.402223587036133, 'learning_rate': 8.332673722191677e-07, 'epoch': 0.10225115269867101, 'num_input_tokens_seen': 2468347904, 'completed': '31.92% (1_177 / 3_687)', 'remaining time': '21:49:33', 'throughput': '8163.18', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:45:04,811 >> {'loss': 0.4054, 'grad_norm': 21.820781707763672, 
'learning_rate': 8.329535866728922e-07, 'epoch': 0.10252237591537836, 'num_input_tokens_seen': 2470445056, 'completed': '31.95% (1_178 / 3_687)', 'remaining time': '21:48:56', 'throughput': '8578.63', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:45:36,619 >> {'loss': 0.6286, 'grad_norm': 19.219013214111328, 'learning_rate': 8.326395734163375e-07, 'epoch': 0.1027935991320857, 'num_input_tokens_seen': 2472542208, 'completed': '31.98% (1_179 / 3_687)', 'remaining time': '21:48:28', 'throughput': '8241.44', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:46:12,339 >> {'loss': 0.4077, 'grad_norm': 22.508085250854492, 'learning_rate': 8.323253327022094e-07, 'epoch': 0.10306482234879305, 'num_input_tokens_seen': 2474639360, 'completed': '32.00% (1_180 / 3_687)', 'remaining time': '21:48:26', 'throughput': '7338.88', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:46:46,229 >> {'loss': 0.3092, 'grad_norm': 12.82617473602295, 'learning_rate': 8.320108647833967e-07, 'epoch': 0.10333604556550041, 'num_input_tokens_seen': 2476736512, 'completed': '32.03% (1_181 / 3_687)', 'remaining time': '21:48:12', 'throughput': '7735.22', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:47:16,554 >> {'loss': 0.2255, 'grad_norm': 12.152515411376953, 'learning_rate': 8.316961699129714e-07, 'epoch': 0.10360726878220776, 'num_input_tokens_seen': 2478833664, 'completed': '32.06% (1_182 / 3_687)', 'remaining time': '21:47:34', 'throughput': '8644.46', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:47:48,881 >> {'loss': 0.6904, 'grad_norm': 20.471603393554688, 'learning_rate': 8.313812483441879e-07, 'epoch': 0.1038784919989151, 'num_input_tokens_seen': 2480930816, 'completed': '32.09% (1_183 / 3_687)', 'remaining time': '21:47:09', 'throughput': '8109.04', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:48:17,928 >> {'loss': 0.3722, 'grad_norm': 14.457162857055664, 'learning_rate': 8.310661003304829e-07, 
'epoch': 0.10414971521562245, 'num_input_tokens_seen': 2483027968, 'completed': '32.11% (1_184 / 3_687)', 'remaining time': '21:46:23', 'throughput': '9024.91', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:48:47,009 >> {'loss': 0.5091, 'grad_norm': 25.704435348510742, 'learning_rate': 8.30750726125476e-07, 'epoch': 0.1044209384323298, 'num_input_tokens_seen': 2485125120, 'completed': '32.14% (1_185 / 3_687)', 'remaining time': '21:45:37', 'throughput': '9014.35', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:49:15,388 >> {'loss': 0.4869, 'grad_norm': 16.720243453979492, 'learning_rate': 8.304351259829678e-07, 'epoch': 0.10469216164903716, 'num_input_tokens_seen': 2487222272, 'completed': '32.17% (1_186 / 3_687)', 'remaining time': '21:44:47', 'throughput': '9237.29', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:49:44,392 >> {'loss': 0.3771, 'grad_norm': 14.137613296508789, 'learning_rate': 8.301193001569418e-07, 'epoch': 0.10496338486574451, 'num_input_tokens_seen': 2489319424, 'completed': '32.19% (1_187 / 3_687)', 'remaining time': '21:44:01', 'throughput': '9038.20', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:50:15,412 >> {'loss': 0.317, 'grad_norm': 15.297533988952637, 'learning_rate': 8.298032489015623e-07, 'epoch': 0.10523460808245186, 'num_input_tokens_seen': 2491416576, 'completed': '32.22% (1_188 / 3_687)', 'remaining time': '21:43:28', 'throughput': '8450.61', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:50:44,660 >> {'loss': 0.5508, 'grad_norm': 21.42536163330078, 'learning_rate': 8.294869724711752e-07, 'epoch': 0.1055058312991592, 'num_input_tokens_seen': 2493513728, 'completed': '32.25% (1_189 / 3_687)', 'remaining time': '21:42:43', 'throughput': '8962.93', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:51:15,879 >> {'loss': 0.3826, 'grad_norm': 14.231220245361328, 'learning_rate': 8.291704711203082e-07, 'epoch': 0.10577705451586655, 
'num_input_tokens_seen': 2495610880, 'completed': '32.28% (1_190 / 3_687)', 'remaining time': '21:42:12', 'throughput': '8396.99', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:51:48,578 >> {'loss': 0.3962, 'grad_norm': 16.597139358520508, 'learning_rate': 8.288537451036691e-07, 'epoch': 0.10604827773257391, 'num_input_tokens_seen': 2497708032, 'completed': '32.30% (1_191 / 3_687)', 'remaining time': '21:41:49', 'throughput': '8016.73', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:52:20,256 >> {'loss': 0.3347, 'grad_norm': 25.333345413208008, 'learning_rate': 8.28536794676147e-07, 'epoch': 0.10631950094928126, 'num_input_tokens_seen': 2499805184, 'completed': '32.33% (1_192 / 3_687)', 'remaining time': '21:41:20', 'throughput': '8275.45', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:52:53,673 >> {'loss': 0.4289, 'grad_norm': 29.860403060913086, 'learning_rate': 8.282196200928119e-07, 'epoch': 0.10659072416598861, 'num_input_tokens_seen': 2501902336, 'completed': '32.36% (1_193 / 3_687)', 'remaining time': '21:41:03', 'throughput': '7844.42', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:53:23,946 >> {'loss': 0.3256, 'grad_norm': 12.253214836120605, 'learning_rate': 8.279022216089135e-07, 'epoch': 0.10686194738269596, 'num_input_tokens_seen': 2503999488, 'completed': '32.38% (1_194 / 3_687)', 'remaining time': '21:40:25', 'throughput': '8659.37', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:53:55,064 >> {'loss': 0.4446, 'grad_norm': 17.277803421020508, 'learning_rate': 8.275845994798821e-07, 'epoch': 0.1071331705994033, 'num_input_tokens_seen': 2506096640, 'completed': '32.41% (1_195 / 3_687)', 'remaining time': '21:39:52', 'throughput': '8424.40', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:54:24,659 >> {'loss': 0.4053, 'grad_norm': 13.697084426879883, 'learning_rate': 8.272667539613281e-07, 'epoch': 0.10740439381611067, 'num_input_tokens_seen': 2508193792, 
'completed': '32.44% (1_196 / 3_687)', 'remaining time': '21:39:10', 'throughput': '8857.56', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:54:56,223 >> {'loss': 0.7259, 'grad_norm': 22.16970443725586, 'learning_rate': 8.26948685309041e-07, 'epoch': 0.10767561703281801, 'num_input_tokens_seen': 2510290944, 'completed': '32.47% (1_197 / 3_687)', 'remaining time': '21:38:41', 'throughput': '8305.20', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:55:28,077 >> {'loss': 0.4267, 'grad_norm': 16.95806884765625, 'learning_rate': 8.266303937789908e-07, 'epoch': 0.10794684024952536, 'num_input_tokens_seen': 2512388096, 'completed': '32.49% (1_198 / 3_687)', 'remaining time': '21:38:13', 'throughput': '8229.52', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:55:57,582 >> {'loss': 0.5967, 'grad_norm': 15.100922584533691, 'learning_rate': 8.263118796273263e-07, 'epoch': 0.10821806346623271, 'num_input_tokens_seen': 2514485248, 'completed': '32.52% (1_199 / 3_687)', 'remaining time': '21:37:30', 'throughput': '8884.61', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 18:56:29,840 >> {'loss': 0.4512, 'grad_norm': 20.644742965698242, 'learning_rate': 8.259931431103754e-07, 'epoch': 0.10848928668294006, 'num_input_tokens_seen': 2516582400, 'completed': '32.55% (1_200 / 3_687)', 'remaining time': '21:37:05', 'throughput': '8126.53', 'gpu_mem_free': '5581MB'} /scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:689: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
warnings.warn( [INFO|trainer.py:3503] 2024-12-25 18:56:59,416 >> Saving model checkpoint to /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-1200 [INFO|configuration_utils.py:472] 2024-12-25 18:56:59,419 >> Configuration saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-1200/config.json [INFO|configuration_utils.py:807] 2024-12-25 18:56:59,420 >> Configuration saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-1200/generation_config.json [INFO|modeling_utils.py:2807] 2024-12-25 18:58:06,272 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 7 checkpoint shards. You can find where each parameters has been saved in the index located at /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-1200/model.safetensors.index.json. [INFO|tokenization_utils_base.py:2684] 2024-12-25 18:58:06,276 >> tokenizer config file saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-1200/tokenizer_config.json [INFO|tokenization_utils_base.py:2693] 2024-12-25 18:58:06,277 >> Special tokens file saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-1200/special_tokens_map.json 
[WARNING|trainer.py:868] 2024-12-25 19:02:10,058 >> Save streaming dataset state: {'epoch': 0, 'sample_in_epoch': 2400, 'num_canonical_nodes': 1, 'shuffle_seed': 42, 'initial_physical_nodes': 1} /scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/torch/utils/checkpoint.py:1399: FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead. with device_autocast_ctx, torch.cpu.amp.autocast(**cpu_autocast_kwargs), recompute_context: # type: ignore[attr-defined] [INFO|trainer.py:175] 2024-12-25 19:02:41,669 >> {'loss': 0.5051, 'grad_norm': 16.619178771972656, 'learning_rate': 8.256741844846452e-07, 'epoch': 0.1087605098996474, 'num_input_tokens_seen': 2518679552, 'completed': '32.57% (1_201 / 3_687)', 'remaining time': '22:11:45', 'throughput': '705.01', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 19:03:16,491 >> {'loss': 0.3861, 'grad_norm': 13.504317283630371, 'learning_rate': 8.253550040068216e-07, 'epoch': 0.10903173311635476, 'num_input_tokens_seen': 2520776704, 'completed': '32.60% (1_202 / 3_687)', 'remaining time': '22:11:29', 'throughput': '7528.31', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 19:03:50,270 >> {'loss': 0.417, 'grad_norm': 15.428912162780762, 'learning_rate': 8.250356019337688e-07, 'epoch': 0.10930295633306211, 'num_input_tokens_seen': 2522873856, 'completed': '32.63% (1_203 / 3_687)', 'remaining time': '22:11:07', 'throughput': '7760.51', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 19:04:18,628 >> {'loss': 0.9491, 'grad_norm': 25.92603874206543, 'learning_rate': 8.247159785225295e-07, 'epoch': 0.10957417954976946, 'num_input_tokens_seen': 2524971008, 'completed': '32.66% (1_204 / 
3_687)', 'remaining time': '22:10:12', 'throughput': '9244.11', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 19:04:46,436 >> {'loss': 0.756, 'grad_norm': 21.164356231689453, 'learning_rate': 8.243961340303245e-07, 'epoch': 0.10984540276647681, 'num_input_tokens_seen': 2527068160, 'completed': '32.68% (1_205 / 3_687)', 'remaining time': '22:09:13', 'throughput': '9426.88', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 19:05:16,744 >> {'loss': 0.2077, 'grad_norm': 9.049589157104492, 'learning_rate': 8.240760687145521e-07, 'epoch': 0.11011662598318415, 'num_input_tokens_seen': 2529165312, 'completed': '32.71% (1_206 / 3_687)', 'remaining time': '22:08:30', 'throughput': '8649.24', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 19:05:47,045 >> {'loss': 0.6235, 'grad_norm': 17.34234046936035, 'learning_rate': 8.237557828327891e-07, 'epoch': 0.11038784919989152, 'num_input_tokens_seen': 2531262464, 'completed': '32.74% (1_207 / 3_687)', 'remaining time': '22:07:47', 'throughput': '8651.23', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 19:06:19,016 >> {'loss': 0.6889, 'grad_norm': 21.27731704711914, 'learning_rate': 8.234352766427894e-07, 'epoch': 0.11065907241659886, 'num_input_tokens_seen': 2533359616, 'completed': '32.76% (1_208 / 3_687)', 'remaining time': '22:07:14', 'throughput': '8199.55', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 19:06:48,570 >> {'loss': 0.7666, 'grad_norm': 21.866683959960938, 'learning_rate': 8.231145504024838e-07, 'epoch': 0.11093029563330621, 'num_input_tokens_seen': 2535456768, 'completed': '32.79% (1_209 / 3_687)', 'remaining time': '22:06:26', 'throughput': '8870.01', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 19:07:19,275 >> {'loss': 0.3829, 'grad_norm': 12.252056121826172, 'learning_rate': 8.22793604369981e-07, 'epoch': 0.11120151885001356, 'num_input_tokens_seen': 2537553920, 'completed': '32.82% (1_210 / 3_687)', 'remaining time': '22:05:45', 
'throughput': '8537.48', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:07:49,979 >> {'loss': 0.4294, 'grad_norm': 18.94308853149414, 'learning_rate': 8.224724388035659e-07, 'epoch': 0.1114727420667209, 'num_input_tokens_seen': 2539651072, 'completed': '32.85% (1_211 / 3_687)', 'remaining time': '22:05:05', 'throughput': '8537.88', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:08:19,971 >> {'loss': 0.5497, 'grad_norm': 18.54946517944336, 'learning_rate': 8.221510539617003e-07, 'epoch': 0.11174396528342827, 'num_input_tokens_seen': 2541748224, 'completed': '32.87% (1_212 / 3_687)', 'remaining time': '22:04:20', 'throughput': '8740.49', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:08:51,112 >> {'loss': 0.3906, 'grad_norm': 13.319576263427734, 'learning_rate': 8.218294501030226e-07, 'epoch': 0.11201518850013562, 'num_input_tokens_seen': 2543845376, 'completed': '32.90% (1_213 / 3_687)', 'remaining time': '22:03:42', 'throughput': '8417.82', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:09:23,222 >> {'loss': 0.4973, 'grad_norm': 14.632641792297363, 'learning_rate': 8.215076274863476e-07, 'epoch': 0.11228641171684296, 'num_input_tokens_seen': 2545942528, 'completed': '32.93% (1_214 / 3_687)', 'remaining time': '22:03:10', 'throughput': '8163.99', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:09:55,108 >> {'loss': 0.3886, 'grad_norm': 14.780923843383789, 'learning_rate': 8.211855863706654e-07, 'epoch': 0.11255763493355031, 'num_input_tokens_seen': 2548039680, 'completed': '32.95% (1_215 / 3_687)', 'remaining time': '22:02:36', 'throughput': '8221.35', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:10:26,172 >> {'loss': 0.5352, 'grad_norm': 17.888994216918945, 'learning_rate': 8.208633270151426e-07, 'epoch': 0.11282885815025766, 'num_input_tokens_seen': 2550136832, 'completed': '32.98% (1_216 / 3_687)', 'remaining time': '22:01:58', 'throughput': '8438.82', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:10:57,408 >> {'loss': 0.6396, 'grad_norm': 21.13353157043457, 'learning_rate': 8.205408496791216e-07, 'epoch': 0.11310008136696502, 'num_input_tokens_seen': 2552233984, 'completed': '33.01% (1_217 / 3_687)', 'remaining time': '22:01:21', 'throughput': '8392.39', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:11:31,901 >> {'loss': 0.3852, 'grad_norm': 15.377924919128418, 'learning_rate': 8.202181546221193e-07, 'epoch': 0.11337130458367237, 'num_input_tokens_seen': 2554331136, 'completed': '33.03% (1_218 / 3_687)', 'remaining time': '22:01:03', 'throughput': '7599.94', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:12:00,080 >> {'loss': 0.7451, 'grad_norm': 19.80136489868164, 'learning_rate': 8.19895242103829e-07, 'epoch': 0.11364252780037971, 'num_input_tokens_seen': 2556428288, 'completed': '33.06% (1_219 / 3_687)', 'remaining time': '22:00:08', 'throughput': '9302.83', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:12:29,122 >> {'loss': 0.7119, 'grad_norm': 20.684595108032227, 'learning_rate': 8.19572112384118e-07, 'epoch': 0.11391375101708706, 'num_input_tokens_seen': 2558525440, 'completed': '33.09% (1_220 / 3_687)', 'remaining time': '21:59:18', 'throughput': '9026.17', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:13:00,654 >> {'loss': 0.3151, 'grad_norm': 14.285638809204102, 'learning_rate': 8.192487657230288e-07, 'epoch': 0.11418497423379441, 'num_input_tokens_seen': 2560622592, 'completed': '33.12% (1_221 / 3_687)', 'remaining time': '21:58:42', 'throughput': '8313.59', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:13:34,484 >> {'loss': 0.3626, 'grad_norm': 15.149106979370117, 'learning_rate': 8.18925202380779e-07, 'epoch': 0.11445619745050176, 'num_input_tokens_seen': 2562719744, 'completed': '33.14% (1_222 / 3_687)', 'remaining time': '21:58:21', 'throughput': '7748.87', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:14:07,442 >> {'loss': 0.4939, 'grad_norm': 16.11283302307129, 'learning_rate': 8.186014226177594e-07, 'epoch': 0.11472742066720912, 'num_input_tokens_seen': 2564816896, 'completed': '33.17% (1_223 / 3_687)', 'remaining time': '21:57:54', 'throughput': '7953.98', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:14:37,126 >> {'loss': 0.4951, 'grad_norm': 16.55274200439453, 'learning_rate': 8.18277426694536e-07, 'epoch': 0.11499864388391647, 'num_input_tokens_seen': 2566914048, 'completed': '33.20% (1_224 / 3_687)', 'remaining time': '21:57:07', 'throughput': '8831.18', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:15:07,555 >> {'loss': 0.4981, 'grad_norm': 16.552610397338867, 'learning_rate': 8.179532148718483e-07, 'epoch': 0.11526986710062381, 'num_input_tokens_seen': 2569011200, 'completed': '33.22% (1_225 / 3_687)', 'remaining time': '21:56:26', 'throughput': '8614.85', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:15:38,815 >> {'loss': 0.6232, 'grad_norm': 20.186845779418945, 'learning_rate': 8.176287874106097e-07, 'epoch': 0.11554109031733116, 'num_input_tokens_seen': 2571108352, 'completed': '33.25% (1_226 / 3_687)', 'remaining time': '21:55:49', 'throughput': '8385.92', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:16:06,750 >> {'loss': 0.7157, 'grad_norm': 18.383283615112305, 'learning_rate': 8.173041445719069e-07, 'epoch': 0.11581231353403851, 'num_input_tokens_seen': 2573205504, 'completed': '33.28% (1_227 / 3_687)', 'remaining time': '21:54:53', 'throughput': '9384.15', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:16:38,856 >> {'loss': 0.7858, 'grad_norm': 28.215744018554688, 'learning_rate': 8.169792866170003e-07, 'epoch': 0.11608353675074587, 'num_input_tokens_seen': 2575302656, 'completed': '33.31% (1_228 / 3_687)', 'remaining time': '21:54:21', 'throughput': '8164.94', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:17:08,168 >> {'loss': 0.5306, 'grad_norm': 
15.474285125732422, 'learning_rate': 8.166542138073232e-07, 'epoch': 0.11635475996745322, 'num_input_tokens_seen': 2577399808, 'completed': '33.33% (1_229 / 3_687)', 'remaining time': '21:53:33', 'throughput': '8943.09', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:17:40,750 >> {'loss': 0.5879, 'grad_norm': 30.002483367919922, 'learning_rate': 8.163289264044817e-07, 'epoch': 0.11662598318416056, 'num_input_tokens_seen': 2579496960, 'completed': '33.36% (1_230 / 3_687)', 'remaining time': '21:53:04', 'throughput': '8045.63', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:18:12,798 >> {'loss': 0.5828, 'grad_norm': 17.683021545410156, 'learning_rate': 8.160034246702548e-07, 'epoch': 0.11689720640086791, 'num_input_tokens_seen': 2581594112, 'completed': '33.39% (1_231 / 3_687)', 'remaining time': '21:52:32', 'throughput': '8179.92', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:18:44,396 >> {'loss': 0.4591, 'grad_norm': 18.124019622802734, 'learning_rate': 8.156777088665939e-07, 'epoch': 0.11716842961757526, 'num_input_tokens_seen': 2583691264, 'completed': '33.41% (1_232 / 3_687)', 'remaining time': '21:51:57', 'throughput': '8296.26', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:19:15,406 >> {'loss': 0.2872, 'grad_norm': 11.012289047241211, 'learning_rate': 8.153517792556226e-07, 'epoch': 0.11743965283428262, 'num_input_tokens_seen': 2585788416, 'completed': '33.44% (1_233 / 3_687)', 'remaining time': '21:51:19', 'throughput': '8453.52', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:19:44,060 >> {'loss': 0.6456, 'grad_norm': 19.892656326293945, 'learning_rate': 8.15025636099637e-07, 'epoch': 0.11771087605098997, 'num_input_tokens_seen': 2587885568, 'completed': '33.47% (1_234 / 3_687)', 'remaining time': '21:50:28', 'throughput': '9148.40', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:20:17,372 >> {'loss': 0.3137, 'grad_norm': 14.017526626586914, 'learning_rate': 8.146992796611042e-07, 'epoch': 0.11798209926769732, 'num_input_tokens_seen': 2589982720, 'completed': '33.50% (1_235 / 3_687)', 'remaining time': '21:50:03', 'throughput': '7869.44', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:20:44,383 >> {'loss': 0.8757, 'grad_norm': 21.26898765563965, 'learning_rate': 8.143727102026638e-07, 'epoch': 0.11825332248440466, 'num_input_tokens_seen': 2592079872, 'completed': '33.52% (1_236 / 3_687)', 'remaining time': '21:49:03', 'throughput': '9704.93', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:21:15,259 >> {'loss': 0.3173, 'grad_norm': 10.634222030639648, 'learning_rate': 8.140459279871264e-07, 'epoch': 0.11852454570111201, 'num_input_tokens_seen': 2594177024, 'completed': '33.55% (1_237 / 3_687)', 'remaining time': '21:48:24', 'throughput': '8490.43', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:21:45,877 >> {'loss': 0.7255, 'grad_norm': 20.876935958862305, 'learning_rate': 8.137189332774738e-07, 'epoch': 0.11879576891781937, 'num_input_tokens_seen': 2596274176, 'completed': '33.58% (1_238 / 3_687)', 'remaining time': '21:47:44', 'throughput': '8561.69', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:22:16,163 >> {'loss': 0.3449, 'grad_norm': 13.55611801147461, 'learning_rate': 8.133917263368589e-07, 'epoch': 0.11906699213452672, 'num_input_tokens_seen': 2598371328, 'completed': '33.60% (1_239 / 3_687)', 'remaining time': '21:47:02', 'throughput': '8655.68', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:22:47,340 >> {'loss': 0.4658, 'grad_norm': 23.81563949584961, 'learning_rate': 8.130643074286056e-07, 'epoch': 0.11933821535123407, 'num_input_tokens_seen': 2600468480, 'completed': '33.63% (1_240 / 3_687)', 'remaining time': '21:46:25', 'throughput': '8408.07', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:23:23,962 >> {'loss': 0.4361, 'grad_norm': 15.136983871459961, 'learning_rate': 8.127366768162077e-07, 'epoch': 0.11960943856794141, 'num_input_tokens_seen': 2602565632, 'completed': '33.66% (1_241 / 3_687)', 'remaining time': '21:46:19', 'throughput': '7158.22', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:23:56,246 >> {'loss': 0.3938, 'grad_norm': 14.615781784057617, 'learning_rate': 8.124088347633304e-07, 'epoch': 0.11988066178464876, 'num_input_tokens_seen': 2604662784, 'completed': '33.69% (1_242 / 3_687)', 'remaining time': '21:45:48', 'throughput': '8119.76', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:24:26,910 >> {'loss': 0.377, 'grad_norm': 13.33740234375, 'learning_rate': 8.120807815338083e-07, 'epoch': 0.12015188500135611, 'num_input_tokens_seen': 2606759936, 'completed': '33.71% (1_243 / 3_687)', 'remaining time': '21:45:08', 'throughput': '8548.88', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:25:00,896 >> {'loss': 0.3627, 'grad_norm': 16.450172424316406, 'learning_rate': 8.11752517391646e-07, 'epoch': 0.12042310821806347, 'num_input_tokens_seen': 2608857088, 'completed': '33.74% (1_244 / 3_687)', 'remaining time': '21:44:47', 'throughput': '7713.30', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:25:32,140 >> {'loss': 0.3407, 'grad_norm': 12.913908958435059, 'learning_rate': 8.114240426010183e-07, 'epoch': 0.12069433143477082, 'num_input_tokens_seen': 2610954240, 'completed': '33.77% (1_245 / 3_687)', 'remaining time': '21:44:11', 'throughput': '8390.46', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:26:01,656 >> {'loss': 0.6169, 'grad_norm': 19.959516525268555, 'learning_rate': 8.11095357426269e-07, 'epoch': 0.12096555465147817, 'num_input_tokens_seen': 2613051392, 'completed': '33.79% (1_246 / 3_687)', 'remaining time': '21:43:25', 'throughput': '8881.20', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:26:31,762 >> {'loss': 0.5242, 'grad_norm': 15.97438907623291, 'learning_rate': 8.107664621319113e-07, 'epoch': 0.12123677786818551, 'num_input_tokens_seen': 
2615148544, 'completed': '33.82% (1_247 / 3_687)', 'remaining time': '21:42:42', 'throughput': '8707.37', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:27:01,492 >> {'loss': 0.3952, 'grad_norm': 14.837607383728027, 'learning_rate': 8.10437356982628e-07, 'epoch': 0.12150800108489286, 'num_input_tokens_seen': 2617245696, 'completed': '33.85% (1_248 / 3_687)', 'remaining time': '21:41:57', 'throughput': '8817.64', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:27:35,868 >> {'loss': 0.4423, 'grad_norm': 15.677940368652344, 'learning_rate': 8.1010804224327e-07, 'epoch': 0.12177922430160022, 'num_input_tokens_seen': 2619342848, 'completed': '33.88% (1_249 / 3_687)', 'remaining time': '21:41:38', 'throughput': '7625.74', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:28:05,735 >> {'loss': 0.3551, 'grad_norm': 13.296334266662598, 'learning_rate': 8.097785181788574e-07, 'epoch': 0.12205044751830757, 'num_input_tokens_seen': 2621440000, 'completed': '33.90% (1_250 / 3_687)', 'remaining time': '21:40:54', 'throughput': '8777.03', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:28:37,113 >> {'loss': 0.371, 'grad_norm': 17.426509857177734, 'learning_rate': 8.09448785054579e-07, 'epoch': 0.12232167073501492, 'num_input_tokens_seen': 2623537152, 'completed': '33.93% (1_251 / 3_687)', 'remaining time': '21:40:19', 'throughput': '8354.35', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:29:05,840 >> {'loss': 0.9746, 'grad_norm': 24.391117095947266, 'learning_rate': 8.091188431357908e-07, 'epoch': 0.12259289395172226, 'num_input_tokens_seen': 2625634304, 'completed': '33.96% (1_252 / 3_687)', 'remaining time': '21:39:29', 'throughput': '9125.50', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:29:35,732 >> {'loss': 0.9044, 'grad_norm': 22.590383529663086, 'learning_rate': 8.087886926880181e-07, 'epoch': 0.12286411716842961, 'num_input_tokens_seen': 2627731456, 'completed': '33.98% (1_253 / 3_687)', 'remaining time': '21:38:46', 'throughput': '8769.75', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:30:09,227 >> {'loss': 0.4665, 'grad_norm': 15.536231994628906, 'learning_rate': 8.084583339769531e-07, 'epoch': 0.12313534038513697, 'num_input_tokens_seen': 2629828608, 'completed': '34.01% (1_254 / 3_687)', 'remaining time': '21:38:22', 'throughput': '7826.30', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:30:40,307 >> {'loss': 0.3953, 'grad_norm': 16.86744499206543, 'learning_rate': 8.081277672684557e-07, 'epoch': 0.12340656360184432, 'num_input_tokens_seen': 2631925760, 'completed': '34.04% (1_255 / 3_687)', 'remaining time': '21:37:45', 'throughput': '8434.57', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:31:08,724 >> {'loss': 0.5952, 'grad_norm': 18.4736385345459, 'learning_rate': 8.077969928285541e-07, 'epoch': 0.12367778681855167, 'num_input_tokens_seen': 2634022912, 'completed': '34.07% (1_256 / 3_687)', 'remaining time': '21:36:53', 'throughput': '9224.69', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:31:40,027 >> {'loss': 0.4361, 'grad_norm': 15.2562837600708, 'learning_rate': 8.074660109234424e-07, 'epoch': 0.12394901003525902, 'num_input_tokens_seen': 2636120064, 'completed': '34.09% (1_257 / 3_687)', 'remaining time': '21:36:18', 'throughput': '8374.59', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:32:10,036 >> {'loss': 0.5753, 'grad_norm': 20.01945686340332, 'learning_rate': 8.071348218194823e-07, 'epoch': 0.12422023325196636, 'num_input_tokens_seen': 2638217216, 'completed': '34.12% (1_258 / 3_687)', 'remaining time': '21:35:35', 'throughput': '8735.49', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:32:41,080 >> {'loss': 0.7434, 'grad_norm': 20.903608322143555, 'learning_rate': 8.068034257832026e-07, 'epoch': 0.12449145646867373, 'num_input_tokens_seen': 2640314368, 'completed': '34.15% (1_259 / 3_687)', 'remaining time': '21:34:58', 'throughput': '8444.06', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:33:10,607 >> {'loss': 0.5717, 'grad_norm': 18.132099151611328, 'learning_rate': 8.064718230812976e-07, 'epoch': 0.12476267968538107, 'num_input_tokens_seen': 2642411520, 'completed': '34.17% (1_260 / 3_687)', 'remaining time': '21:34:13', 'throughput': '8878.31', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:33:42,033 >> {'loss': 0.6942, 'grad_norm': 19.513261795043945, 'learning_rate': 8.06140013980629e-07, 'epoch': 0.1250339029020884, 'num_input_tokens_seen': 2644508672, 'completed': '34.20% (1_261 / 3_687)', 'remaining time': '21:33:38', 'throughput': '8341.87', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:34:14,349 >> {'loss': 0.3526, 'grad_norm': 15.275543212890625, 'learning_rate': 8.05807998748224e-07, 'epoch': 0.12530512611879577, 'num_input_tokens_seen': 2646605824, 'completed': '34.23% (1_262 / 3_687)', 'remaining time': '21:33:07', 'throughput': '8111.70', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:34:44,683 >> {'loss': 0.3085, 'grad_norm': 15.771416664123535, 'learning_rate': 8.05475777651276e-07, 'epoch': 0.12557634933550313, 'num_input_tokens_seen': 2648702976, 'completed': '34.26% (1_263 / 3_687)', 'remaining time': '21:32:27', 'throughput': '8641.85', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:35:14,140 >> {'loss': 0.6849, 'grad_norm': 20.19382095336914, 'learning_rate': 8.051433509571435e-07, 'epoch': 0.12584757255221046, 'num_input_tokens_seen': 2650800128, 'completed': '34.28% (1_264 / 3_687)', 'remaining time': '21:31:42', 'throughput': '8899.26', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:35:46,718 >> {'loss': 0.5637, 'grad_norm': 19.399974822998047, 'learning_rate': 8.04810718933351e-07, 'epoch': 0.12611879576891782, 'num_input_tokens_seen': 2652897280, 'completed': '34.31% (1_265 / 3_687)', 'remaining time': '21:31:13', 'throughput': '8046.45', 'gpu_mem_free': 
'5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:36:17,134 >> {'loss': 0.3837, 'grad_norm': 14.588866233825684, 'learning_rate': 8.044778818475884e-07, 'epoch': 0.12639001898562516, 'num_input_tokens_seen': 2654994432, 'completed': '34.34% (1_266 / 3_687)', 'remaining time': '21:30:33', 'throughput': '8618.62', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:36:47,465 >> {'loss': 0.6066, 'grad_norm': 18.231657028198242, 'learning_rate': 8.0414483996771e-07, 'epoch': 0.12666124220233252, 'num_input_tokens_seen': 2657091584, 'completed': '34.36% (1_267 / 3_687)', 'remaining time': '21:29:52', 'throughput': '8643.01', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:37:19,628 >> {'loss': 0.8268, 'grad_norm': 21.163400650024414, 'learning_rate': 8.038115935617355e-07, 'epoch': 0.12693246541903988, 'num_input_tokens_seen': 2659188736, 'completed': '34.39% (1_268 / 3_687)', 'remaining time': '21:29:21', 'throughput': '8150.42', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:37:49,592 >> {'loss': 0.3457, 'grad_norm': 13.187846183776855, 'learning_rate': 8.034781428978484e-07, 'epoch': 0.12720368863574721, 'num_input_tokens_seen': 2661285888, 'completed': '34.42% (1_269 / 3_687)', 'remaining time': '21:28:39', 'throughput': '8748.57', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:38:19,440 >> {'loss': 0.223, 'grad_norm': 11.001519203186035, 'learning_rate': 8.031444882443976e-07, 'epoch': 0.12747491185245458, 'num_input_tokens_seen': 2663383040, 'completed': '34.45% (1_270 / 3_687)', 'remaining time': '21:27:56', 'throughput': '8782.72', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:38:52,272 >> {'loss': 0.5616, 'grad_norm': 16.197965621948242, 'learning_rate': 8.028106298698957e-07, 'epoch': 0.1277461350691619, 'num_input_tokens_seen': 2665480192, 'completed': '34.47% (1_271 / 3_687)', 'remaining time': '21:27:28', 'throughput': '7984.30', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:39:22,906 >> {'loss': 0.314, 'grad_norm': 13.498997688293457, 'learning_rate': 8.024765680430188e-07, 'epoch': 0.12801735828586927, 'num_input_tokens_seen': 2667577344, 'completed': '34.50% (1_272 / 3_687)', 'remaining time': '21:26:49', 'throughput': '8557.41', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:39:54,178 >> {'loss': 0.585, 'grad_norm': 18.05949592590332, 'learning_rate': 8.021423030326075e-07, 'epoch': 0.12828858150257663, 'num_input_tokens_seen': 2669674496, 'completed': '34.53% (1_273 / 3_687)', 'remaining time': '21:26:14', 'throughput': '8382.72', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:40:24,034 >> {'loss': 0.2417, 'grad_norm': 10.726323127746582, 'learning_rate': 8.018078351076653e-07, 'epoch': 0.12855980471928397, 'num_input_tokens_seen': 2671771648, 'completed': '34.55% (1_274 / 3_687)', 'remaining time': '21:25:31', 'throughput': '8780.36', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:40:57,426 >> {'loss': 0.3998, 'grad_norm': 13.169657707214355, 'learning_rate': 8.014731645373595e-07, 'epoch': 0.12883102793599133, 'num_input_tokens_seen': 2673868800, 'completed': '34.58% (1_275 / 3_687)', 'remaining time': '21:25:06', 'throughput': '7850.28', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:41:29,315 >> {'loss': 0.5083, 'grad_norm': 18.775423049926758, 'learning_rate': 8.011382915910203e-07, 'epoch': 0.12910225115269866, 'num_input_tokens_seen': 2675965952, 'completed': '34.61% (1_276 / 3_687)', 'remaining time': '21:24:34', 'throughput': '8220.57', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:42:00,640 >> {'loss': 0.324, 'grad_norm': 14.191473007202148, 'learning_rate': 8.008032165381403e-07, 'epoch': 0.12937347436940602, 'num_input_tokens_seen': 2678063104, 'completed': '34.64% (1_277 / 3_687)', 'remaining time': '21:23:59', 'throughput': '8368.54', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:42:31,640 >> {'loss': 0.2584, 'grad_norm': 19.03130531311035, 'learning_rate': 8.004679396483756e-07, 'epoch': 0.12964469758611338, 'num_input_tokens_seen': 2680160256, 'completed': '34.66% (1_278 / 3_687)', 'remaining time': '21:23:22', 'throughput': '8456.32', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:43:02,003 >> {'loss': 0.8237, 'grad_norm': 22.582111358642578, 'learning_rate': 8.001324611915441e-07, 'epoch': 0.12991592080282072, 'num_input_tokens_seen': 2682257408, 'completed': '34.69% (1_279 / 3_687)', 'remaining time': '21:22:42', 'throughput': '8633.74', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:43:31,739 >> {'loss': 0.4733, 'grad_norm': 16.168638229370117, 'learning_rate': 7.99796781437626e-07, 'epoch': 0.13018714401952808, 'num_input_tokens_seen': 2684354560, 'completed': '34.72% (1_280 / 3_687)', 'remaining time': '21:21:59', 'throughput': '8815.74', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:44:06,936 >> {'loss': 0.2726, 'grad_norm': 11.249455451965332, 'learning_rate': 7.994609006567635e-07, 'epoch': 0.1304583672362354, 'num_input_tokens_seen': 2686451712, 'completed': '34.74% (1_281 / 3_687)', 'remaining time': '21:21:43', 'throughput': '7447.91', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:44:37,393 >> {'loss': 0.4514, 'grad_norm': 24.108003616333008, 'learning_rate': 7.99124819119261e-07, 'epoch': 0.13072959045294277, 'num_input_tokens_seen': 2688548864, 'completed': '34.77% (1_282 / 3_687)', 'remaining time': '21:21:03', 'throughput': '8606.97', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:45:05,834 >> {'loss': 0.6907, 'grad_norm': 22.541616439819336, 'learning_rate': 7.987885370955841e-07, 'epoch': 0.13100081366965013, 'num_input_tokens_seen': 2690646016, 'completed': '34.80% (1_283 / 3_687)', 'remaining time': '21:20:14', 'throughput': '9216.95', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:45:34,268 >> {'loss': 0.6375, 'grad_norm': 17.23971176147461, 'learning_rate': 
7.984520548563594e-07, 'epoch': 0.13127203688635747, 'num_input_tokens_seen': 2692743168, 'completed': '34.83% (1_284 / 3_687)', 'remaining time': '21:19:25', 'throughput': '9219.33', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:46:05,282 >> {'loss': 0.4198, 'grad_norm': 14.531922340393066, 'learning_rate': 7.981153726723755e-07, 'epoch': 0.13154326010306483, 'num_input_tokens_seen': 2694840320, 'completed': '34.85% (1_285 / 3_687)', 'remaining time': '21:18:48', 'throughput': '8452.51', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:46:35,059 >> {'loss': 0.3334, 'grad_norm': 12.036006927490234, 'learning_rate': 7.977784908145809e-07, 'epoch': 0.13181448331977216, 'num_input_tokens_seen': 2696937472, 'completed': '34.88% (1_286 / 3_687)', 'remaining time': '21:18:05', 'throughput': '8803.61', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:47:06,921 >> {'loss': 0.2007, 'grad_norm': 10.315123558044434, 'learning_rate': 7.974414095540858e-07, 'epoch': 0.13208570653647952, 'num_input_tokens_seen': 2699034624, 'completed': '34.91% (1_287 / 3_687)', 'remaining time': '21:17:33', 'throughput': '8227.44', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:47:37,720 >> {'loss': 0.3, 'grad_norm': 10.276031494140625, 'learning_rate': 7.971041291621601e-07, 'epoch': 0.13235692975318689, 'num_input_tokens_seen': 2701131776, 'completed': '34.93% (1_288 / 3_687)', 'remaining time': '21:16:56', 'throughput': '8511.57', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:48:13,104 >> {'loss': 0.4751, 'grad_norm': 16.799928665161133, 'learning_rate': 7.967666499102341e-07, 'epoch': 0.13262815296989422, 'num_input_tokens_seen': 2703228928, 'completed': '34.96% (1_289 / 3_687)', 'remaining time': '21:16:40', 'throughput': '7408.50', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:48:44,921 >> {'loss': 0.3534, 'grad_norm': 13.778836250305176, 'learning_rate': 7.964289720698986e-07, 'epoch': 0.13289937618660158, 'num_input_tokens_seen': 2705326080, 'completed': '34.99% (1_290 / 3_687)', 'remaining time': '21:16:08', 'throughput': '8239.21', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:49:18,180 >> {'loss': 0.6054, 'grad_norm': 17.865440368652344, 'learning_rate': 7.960910959129037e-07, 'epoch': 0.13317059940330891, 'num_input_tokens_seen': 2707423232, 'completed': '35.01% (1_291 / 3_687)', 'remaining time': '21:15:42', 'throughput': '7881.72', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:49:47,631 >> {'loss': 0.5136, 'grad_norm': 17.34958267211914, 'learning_rate': 7.957530217111591e-07, 'epoch': 0.13344182262001628, 'num_input_tokens_seen': 2709520384, 'completed': '35.04% (1_292 / 3_687)', 'remaining time': '21:14:58', 'throughput': '8901.26', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:50:17,946 >> {'loss': 0.853, 'grad_norm': 24.52153205871582, 'learning_rate': 7.954147497367343e-07, 'epoch': 0.13371304583672364, 'num_input_tokens_seen': 2711617536, 'completed': '35.07% (1_293 / 3_687)', 'remaining time': '21:14:18', 'throughput': '8647.24', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:50:47,195 >> {'loss': 0.4762, 'grad_norm': 18.36350440979004, 'learning_rate': 7.950762802618576e-07, 'epoch': 0.13398426905343097, 'num_input_tokens_seen': 2713714688, 'completed': '35.10% (1_294 / 3_687)', 'remaining time': '21:13:34', 'throughput': '8962.43', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:51:15,323 >> {'loss': 0.5823, 'grad_norm': 17.231550216674805, 'learning_rate': 7.947376135589164e-07, 'epoch': 0.13425549227013833, 'num_input_tokens_seen': 2715811840, 'completed': '35.12% (1_295 / 3_687)', 'remaining time': '21:12:43', 'throughput': '9319.68', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:51:45,055 >> {'loss': 0.7359, 'grad_norm': 22.24859619140625, 'learning_rate': 7.943987499004569e-07, 'epoch': 0.13452671548684567, 'num_input_tokens_seen': 2717908992, 'completed': '35.15% (1_296 / 3_687)', 'remaining time': '21:12:01', 'throughput': '8817.02', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:52:14,273 >> {'loss': 0.6906, 'grad_norm': 22.393712997436523, 'learning_rate': 7.940596895591835e-07, 'epoch': 0.13479793870355303, 'num_input_tokens_seen': 2720006144, 'completed': '35.18% (1_297 / 3_687)', 'remaining time': '21:11:16', 'throughput': '8972.07', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:52:45,089 >> {'loss': 0.3609, 'grad_norm': 13.456273078918457, 'learning_rate': 7.93720432807959e-07, 'epoch': 0.13506916192026036, 'num_input_tokens_seen': 2722103296, 'completed': '35.20% (1_298 / 3_687)', 'remaining time': '21:10:39', 'throughput': '8506.58', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:53:14,456 >> {'loss': 0.5547, 'grad_norm': 18.195858001708984, 'learning_rate': 7.933809799198045e-07, 'epoch': 0.13534038513696772, 'num_input_tokens_seen': 2724200448, 'completed': '35.23% (1_299 / 3_687)', 'remaining time': '21:09:55', 'throughput': '8926.39', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:53:45,111 >> {'loss': 0.4255, 'grad_norm': 23.925018310546875, 'learning_rate': 7.930413311678987e-07, 'epoch': 0.13561160835367508, 'num_input_tokens_seen': 2726297600, 'completed': '35.26% (1_300 / 3_687)', 'remaining time': '21:09:17', 'throughput': '8551.61', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:54:15,184 >> {'loss': 0.3462, 'grad_norm': 17.64158821105957, 'learning_rate': 7.92701486825578e-07, 'epoch': 0.13588283157038242, 'num_input_tokens_seen': 2728394752, 'completed': '35.29% (1_301 / 3_687)', 'remaining time': '21:08:36', 'throughput': '8716.96', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:54:49,236 >> {'loss': 0.5714, 'grad_norm': 25.17290496826172, 'learning_rate': 7.923614471663361e-07, 'epoch': 0.13615405478708978, 'num_input_tokens_seen': 2730491904, 'completed': '35.31% (1_302 / 
3_687)', 'remaining time': '21:08:14', 'throughput': '7698.28', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:55:18,419 >> {'loss': 0.2677, 'grad_norm': 10.592412948608398, 'learning_rate': 7.920212124638241e-07, 'epoch': 0.1364252780037971, 'num_input_tokens_seen': 2732589056, 'completed': '35.34% (1_303 / 3_687)', 'remaining time': '21:07:29', 'throughput': '8982.82', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:55:48,508 >> {'loss': 0.3103, 'grad_norm': 12.378978729248047, 'learning_rate': 7.916807829918499e-07, 'epoch': 0.13669650122050447, 'num_input_tokens_seen': 2734686208, 'completed': '35.37% (1_304 / 3_687)', 'remaining time': '21:06:49', 'throughput': '8712.32', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:56:18,337 >> {'loss': 0.3225, 'grad_norm': 17.801116943359375, 'learning_rate': 7.913401590243781e-07, 'epoch': 0.13696772443721184, 'num_input_tokens_seen': 2736783360, 'completed': '35.39% (1_305 / 3_687)', 'remaining time': '21:06:07', 'throughput': '8788.23', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:56:50,131 >> {'loss': 0.5376, 'grad_norm': 19.611040115356445, 'learning_rate': 7.909993408355302e-07, 'epoch': 0.13723894765391917, 'num_input_tokens_seen': 2738880512, 'completed': '35.42% (1_306 / 3_687)', 'remaining time': '21:05:35', 'throughput': '8244.91', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:57:19,154 >> {'loss': 0.644, 'grad_norm': 20.099191665649414, 'learning_rate': 7.906583286995834e-07, 'epoch': 0.13751017087062653, 'num_input_tokens_seen': 2740977664, 'completed': '35.45% (1_307 / 3_687)', 'remaining time': '21:04:50', 'throughput': '9032.46', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:57:48,426 >> {'loss': 0.2898, 'grad_norm': 13.194962501525879, 'learning_rate': 7.903171228909714e-07, 'epoch': 0.13778139408733386, 'num_input_tokens_seen': 2743074816, 'completed': '35.48% (1_308 / 3_687)', 'remaining time': '21:04:06', 'throughput': '8955.42', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:58:22,255 >> {'loss': 0.3795, 'grad_norm': 13.878562927246094, 'learning_rate': 7.899757236842836e-07, 'epoch': 0.13805261730404123, 'num_input_tokens_seen': 2745171968, 'completed': '35.50% (1_309 / 3_687)', 'remaining time': '21:03:43', 'throughput': '7748.95', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:58:53,669 >> {'loss': 0.7278, 'grad_norm': 25.963001251220703, 'learning_rate': 7.89634131354265e-07, 'epoch': 0.1383238405207486, 'num_input_tokens_seen': 2747269120, 'completed': '35.53% (1_310 / 3_687)', 'remaining time': '21:03:09', 'throughput': '8345.01', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:59:23,995 >> {'loss': 0.5589, 'grad_norm': 19.090431213378906, 'learning_rate': 7.892923461758165e-07, 'epoch': 0.13859506373745592, 'num_input_tokens_seen': 2749366272, 'completed': '35.56% (1_311 / 3_687)', 'remaining time': '21:02:30', 'throughput': '8644.11', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 19:59:53,855 >> {'loss': 0.4635, 'grad_norm': 15.916430473327637, 'learning_rate': 7.889503684239933e-07, 'epoch': 0.13886628695416328, 'num_input_tokens_seen': 2751463424, 'completed': '35.58% (1_312 / 3_687)', 'remaining time': '21:01:48', 'throughput': '8779.11', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 20:00:24,348 >> {'loss': 0.3096, 'grad_norm': 12.155187606811523, 'learning_rate': 7.886081983740066e-07, 'epoch': 0.13913751017087062, 'num_input_tokens_seen': 2753560576, 'completed': '35.61% (1_313 / 3_687)', 'remaining time': '21:01:10', 'throughput': '8596.77', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 20:00:53,207 >> {'loss': 0.5558, 'grad_norm': 19.226377487182617, 'learning_rate': 7.882658363012214e-07, 'epoch': 0.13940873338757798, 'num_input_tokens_seen': 2755657728, 'completed': '35.64% (1_314 / 3_687)', 'remaining time': '21:00:24', 'throughput': '9083.81', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 20:01:23,482 >> {'loss': 0.4074, 'grad_norm': 16.780195236206055, 'learning_rate': 7.879232824811579e-07, 'epoch': 0.13967995660428534, 'num_input_tokens_seen': 2757754880, 'completed': '35.67% (1_315 / 3_687)', 'remaining time': '20:59:45', 'throughput': '8658.73', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 20:01:52,315 >> {'loss': 0.8599, 'grad_norm': 24.132198333740234, 'learning_rate': 7.875805371894904e-07, 'epoch': 0.13995117982099267, 'num_input_tokens_seen': 2759852032, 'completed': '35.69% (1_316 / 3_687)', 'remaining time': '20:58:59', 'throughput': '9091.76', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 20:02:25,130 >> {'loss': 0.4509, 'grad_norm': 16.48426055908203, 'learning_rate': 7.872376007020469e-07, 'epoch': 0.14022240303770003, 'num_input_tokens_seen': 2761949184, 'completed': '35.72% (1_317 / 3_687)', 'remaining time': '20:58:32', 'throughput': '7988.51', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 20:02:56,615 >> {'loss': 0.3677, 'grad_norm': 13.625266075134277, 'learning_rate': 7.8689447329481e-07, 'epoch': 0.14049362625440737, 'num_input_tokens_seen': 2764046336, 'completed': '35.75% (1_318 / 3_687)', 'remaining time': '20:57:58', 'throughput': '8326.08', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 20:03:29,766 >> {'loss': 0.5272, 'grad_norm': 17.589357376098633, 'learning_rate': 7.865511552439156e-07, 'epoch': 0.14076484947111473, 'num_input_tokens_seen': 2766143488, 'completed': '35.77% (1_319 / 3_687)', 'remaining time': '20:57:32', 'throughput': '7907.40', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 20:03:57,394 >> {'loss': 0.5088, 'grad_norm': 15.197576522827148, 'learning_rate': 7.862076468256529e-07, 'epoch': 0.1410360726878221, 'num_input_tokens_seen': 2768240640, 'completed': '35.80% (1_320 / 3_687)', 'remaining time': '20:56:41', 'throughput': '9488.36', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 
20:04:28,026 >> {'loss': 0.4647, 'grad_norm': 18.043136596679688, 'learning_rate': 7.858639483164643e-07, 'epoch': 0.14130729590452942, 'num_input_tokens_seen': 2770337792, 'completed': '35.83% (1_321 / 3_687)', 'remaining time': '20:56:04', 'throughput': '8557.97', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 20:04:59,031 >> {'loss': 0.5039, 'grad_norm': 15.092884063720703, 'learning_rate': 7.855200599929456e-07, 'epoch': 0.14157851912123678, 'num_input_tokens_seen': 2772434944, 'completed': '35.86% (1_322 / 3_687)', 'remaining time': '20:55:28', 'throughput': '8455.02', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 20:05:29,581 >> {'loss': 0.424, 'grad_norm': 15.708367347717285, 'learning_rate': 7.851759821318447e-07, 'epoch': 0.14184974233794412, 'num_input_tokens_seen': 2774532096, 'completed': '35.88% (1_323 / 3_687)', 'remaining time': '20:54:50', 'throughput': '8580.66', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 20:05:59,982 >> {'loss': 0.5721, 'grad_norm': 22.44023323059082, 'learning_rate': 7.848317150100624e-07, 'epoch': 0.14212096555465148, 'num_input_tokens_seen': 2776629248, 'completed': '35.91% (1_324 / 3_687)', 'remaining time': '20:54:12', 'throughput': '8622.91', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 20:06:31,735 >> {'loss': 0.4114, 'grad_norm': 13.735273361206055, 'learning_rate': 7.84487258904652e-07, 'epoch': 0.14239218877135884, 'num_input_tokens_seen': 2778726400, 'completed': '35.94% (1_325 / 3_687)', 'remaining time': '20:53:40', 'throughput': '8255.84', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 20:07:02,168 >> {'loss': 0.7805, 'grad_norm': 24.59619140625, 'learning_rate': 7.841426140928184e-07, 'epoch': 0.14266341198806617, 'num_input_tokens_seen': 2780823552, 'completed': '35.96% (1_326 / 3_687)', 'remaining time': '20:53:01', 'throughput': '8613.59', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 20:07:34,994 >> {'loss': 0.7438, 'grad_norm': 24.768539428710938, 'learning_rate': 7.837977808519189e-07, 'epoch': 0.14293463520477354, 'num_input_tokens_seen': 2782920704, 'completed': '35.99% (1_327 / 3_687)', 'remaining time': '20:52:34', 'throughput': '7986.00', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 20:08:06,338 >> {'loss': 0.6571, 'grad_norm': 20.256254196166992, 'learning_rate': 7.834527594594618e-07, 'epoch': 0.14320585842148087, 'num_input_tokens_seen': 2785017856, 'completed': '36.02% (1_328 / 3_687)', 'remaining time': '20:52:00', 'throughput': '8363.32', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 20:08:39,747 >> {'loss': 0.5158, 'grad_norm': 18.815889358520508, 'learning_rate': 7.831075501931078e-07, 'epoch': 0.14347708163818823, 'num_input_tokens_seen': 2787115008, 'completed': '36.05% (1_329 / 3_687)', 'remaining time': '20:51:35', 'throughput': '7846.51', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 20:09:06,962 >> {'loss': 0.6597, 'grad_norm': 18.49612045288086, 'learning_rate': 7.827621533306677e-07, 'epoch': 0.1437483048548956, 'num_input_tokens_seen': 2789212160, 'completed': '36.07% (1_330 / 3_687)', 'remaining time': '20:50:42', 'throughput': '9632.22', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 20:09:35,565 >> {'loss': 0.479, 'grad_norm': 15.946340560913086, 'learning_rate': 7.824165691501036e-07, 'epoch': 0.14401952807160293, 'num_input_tokens_seen': 2791309312, 'completed': '36.10% (1_331 / 3_687)', 'remaining time': '20:49:56', 'throughput': '9165.10', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 20:10:06,115 >> {'loss': 0.3821, 'grad_norm': 17.465255737304688, 'learning_rate': 7.820707979295289e-07, 'epoch': 0.1442907512883103, 'num_input_tokens_seen': 2793406464, 'completed': '36.13% (1_332 / 3_687)', 'remaining time': '20:49:19', 'throughput': '8580.86', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 20:10:37,905 >> {'loss': 0.6755, 'grad_norm': 28.799354553222656, 'learning_rate': 7.81724839947207e-07, 'epoch': 0.14456197450501762, 'num_input_tokens_seen': 2795503616, 'completed': '36.15% (1_333 / 3_687)', 'remaining time': '20:48:47', 'throughput': '8246.13', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 20:11:08,278 >> {'loss': 0.4634, 'grad_norm': 19.517778396606445, 'learning_rate': 7.813786954815517e-07, 'epoch': 0.14483319772172498, 'num_input_tokens_seen': 2797600768, 'completed': '36.18% (1_334 / 3_687)', 'remaining time': '20:48:09', 'throughput': '8630.63', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 20:11:38,952 >> {'loss': 0.3967, 'grad_norm': 18.770734786987305, 'learning_rate': 7.810323648111269e-07, 'epoch': 0.14510442093843234, 'num_input_tokens_seen': 2799697920, 'completed': '36.21% (1_335 / 3_687)', 'remaining time': '20:47:32', 'throughput': '8546.38', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 20:12:13,009 >> {'loss': 0.4468, 'grad_norm': 14.419391632080078, 'learning_rate': 7.806858482146463e-07, 'epoch': 0.14537564415513968, 'num_input_tokens_seen': 2801795072, 'completed': '36.24% (1_336 / 3_687)', 'remaining time': '20:47:10', 'throughput': '7697.14', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 20:12:44,412 >> {'loss': 0.4719, 'grad_norm': 17.39090919494629, 'learning_rate': 7.803391459709733e-07, 'epoch': 0.14564686737184704, 'num_input_tokens_seen': 2803892224, 'completed': '36.26% (1_337 / 3_687)', 'remaining time': '20:46:36', 'throughput': '8347.79', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 20:13:14,192 >> {'loss': 0.4681, 'grad_norm': 16.507200241088867, 'learning_rate': 7.799922583591206e-07, 'epoch': 0.14591809058855437, 'num_input_tokens_seen': 2805989376, 'completed': '36.29% (1_338 / 3_687)', 'remaining time': '20:45:55', 'throughput': '8802.64', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 20:13:44,357 >> {'loss': 0.4491, 'grad_norm': 14.502838134765625, 'learning_rate': 7.796451856582504e-07, 'epoch': 
0.14618931380526173, 'num_input_tokens_seen': 2808086528, 'completed': '36.32% (1_339 / 3_687)', 'remaining time': '20:45:16', 'throughput': '8690.20', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:14:14,052 >> {'loss': 0.332, 'grad_norm': 12.314398765563965, 'learning_rate': 7.792979281476734e-07, 'epoch': 0.14646053702196907, 'num_input_tokens_seen': 2810183680, 'completed': '36.34% (1_340 / 3_687)', 'remaining time': '20:44:35', 'throughput': '8828.07', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:14:47,765 >> {'loss': 0.4365, 'grad_norm': 13.12981128692627, 'learning_rate': 7.789504861068492e-07, 'epoch': 0.14673176023867643, 'num_input_tokens_seen': 2812280832, 'completed': '36.37% (1_341 / 3_687)', 'remaining time': '20:44:11', 'throughput': '7775.64', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:15:16,930 >> {'loss': 0.516, 'grad_norm': 19.02878761291504, 'learning_rate': 7.78602859815386e-07, 'epoch': 0.1470029834553838, 'num_input_tokens_seen': 2814377984, 'completed': '36.40% (1_342 / 3_687)', 'remaining time': '20:43:28', 'throughput': '8988.22', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:15:46,348 >> {'loss': 0.2956, 'grad_norm': 11.80683422088623, 'learning_rate': 7.782550495530402e-07, 'epoch': 0.14727420667209112, 'num_input_tokens_seen': 2816475136, 'completed': '36.43% (1_343 / 3_687)', 'remaining time': '20:42:46', 'throughput': '8911.08', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:16:16,226 >> {'loss': 0.7255, 'grad_norm': 23.65553855895996, 'learning_rate': 7.779070555997162e-07, 'epoch': 0.14754542988879848, 'num_input_tokens_seen': 2818572288, 'completed': '36.45% (1_344 / 3_687)', 'remaining time': '20:42:06', 'throughput': '8774.01', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:16:48,696 >> {'loss': 0.6579, 'grad_norm': 20.80998420715332, 'learning_rate': 7.775588782354666e-07, 'epoch': 0.14781665310550582, 'num_input_tokens_seen': 
2820669440, 'completed': '36.48% (1_345 / 3_687)', 'remaining time': '20:41:37', 'throughput': '8073.27', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:17:24,675 >> {'loss': 0.7579, 'grad_norm': 20.969602584838867, 'learning_rate': 7.77210517740491e-07, 'epoch': 0.14808787632221318, 'num_input_tokens_seen': 2822766592, 'completed': '36.51% (1_346 / 3_687)', 'remaining time': '20:41:23', 'throughput': '7285.96', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:17:56,212 >> {'loss': 0.2488, 'grad_norm': 9.614218711853027, 'learning_rate': 7.768619743951366e-07, 'epoch': 0.14835909953892054, 'num_input_tokens_seen': 2824863744, 'completed': '36.53% (1_347 / 3_687)', 'remaining time': '20:40:50', 'throughput': '8312.41', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:18:26,116 >> {'loss': 0.3422, 'grad_norm': 12.458992004394531, 'learning_rate': 7.765132484798978e-07, 'epoch': 0.14863032275562787, 'num_input_tokens_seen': 2826960896, 'completed': '36.56% (1_348 / 3_687)', 'remaining time': '20:40:10', 'throughput': '8766.08', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:18:56,658 >> {'loss': 0.5978, 'grad_norm': 31.408218383789062, 'learning_rate': 7.761643402754163e-07, 'epoch': 0.14890154597233524, 'num_input_tokens_seen': 2829058048, 'completed': '36.59% (1_349 / 3_687)', 'remaining time': '20:39:33', 'throughput': '8583.26', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:19:27,732 >> {'loss': 0.5435, 'grad_norm': 19.415218353271484, 'learning_rate': 7.7581525006248e-07, 'epoch': 0.14917276918904257, 'num_input_tokens_seen': 2831155200, 'completed': '36.62% (1_350 / 3_687)', 'remaining time': '20:38:58', 'throughput': '8435.94', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:20:01,980 >> {'loss': 0.4826, 'grad_norm': 17.554052352905273, 'learning_rate': 7.754659781220232e-07, 'epoch': 0.14944399240574993, 'num_input_tokens_seen': 2833252352, 'completed': '36.64% (1_351 / 
3_687)', 'remaining time': '20:38:36', 'throughput': '7654.33', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:20:32,706 >> {'loss': 0.3339, 'grad_norm': 14.18742561340332, 'learning_rate': 7.751165247351269e-07, 'epoch': 0.1497152156224573, 'num_input_tokens_seen': 2835349504, 'completed': '36.67% (1_352 / 3_687)', 'remaining time': '20:38:00', 'throughput': '8531.58', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:21:03,123 >> {'loss': 0.5135, 'grad_norm': 15.89776611328125, 'learning_rate': 7.747668901830178e-07, 'epoch': 0.14998643883916463, 'num_input_tokens_seen': 2837446656, 'completed': '36.70% (1_353 / 3_687)', 'remaining time': '20:37:22', 'throughput': '8618.36', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:21:33,933 >> {'loss': 0.5667, 'grad_norm': 19.270442962646484, 'learning_rate': 7.744170747470685e-07, 'epoch': 0.150257662055872, 'num_input_tokens_seen': 2839543808, 'completed': '36.72% (1_354 / 3_687)', 'remaining time': '20:36:46', 'throughput': '8508.38', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:21:59,571 >> {'loss': 0.954, 'grad_norm': 23.312227249145508, 'learning_rate': 7.740670787087972e-07, 'epoch': 0.15052888527257932, 'num_input_tokens_seen': 2841640960, 'completed': '36.75% (1_355 / 3_687)', 'remaining time': '20:35:48', 'throughput': '10225.02', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:22:31,015 >> {'loss': 0.3264, 'grad_norm': 13.209382057189941, 'learning_rate': 7.737169023498672e-07, 'epoch': 0.15080010848928668, 'num_input_tokens_seen': 2843738112, 'completed': '36.78% (1_356 / 3_687)', 'remaining time': '20:35:15', 'throughput': '8336.94', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:23:01,263 >> {'loss': 0.3447, 'grad_norm': 15.09355354309082, 'learning_rate': 7.733665459520873e-07, 'epoch': 0.15107133170599404, 'num_input_tokens_seen': 2845835264, 'completed': '36.80% (1_357 / 3_687)', 'remaining time': '20:34:37', 
'throughput': '8666.37', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:23:32,135 >> {'loss': 0.3267, 'grad_norm': 12.502537727355957, 'learning_rate': 7.730160097974109e-07, 'epoch': 0.15134255492270138, 'num_input_tokens_seen': 2847932416, 'completed': '36.83% (1_358 / 3_687)', 'remaining time': '20:34:01', 'throughput': '8491.43', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:24:03,818 >> {'loss': 0.4452, 'grad_norm': 17.678531646728516, 'learning_rate': 7.726652941679365e-07, 'epoch': 0.15161377813940874, 'num_input_tokens_seen': 2850029568, 'completed': '36.86% (1_359 / 3_687)', 'remaining time': '20:33:29', 'throughput': '8273.99', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:24:34,855 >> {'loss': 0.443, 'grad_norm': 14.068532943725586, 'learning_rate': 7.723143993459061e-07, 'epoch': 0.15188500135611607, 'num_input_tokens_seen': 2852126720, 'completed': '36.89% (1_360 / 3_687)', 'remaining time': '20:32:54', 'throughput': '8446.01', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:25:04,826 >> {'loss': 0.4935, 'grad_norm': 17.834699630737305, 'learning_rate': 7.719633256137067e-07, 'epoch': 0.15215622457282343, 'num_input_tokens_seen': 2854223872, 'completed': '36.91% (1_361 / 3_687)', 'remaining time': '20:32:15', 'throughput': '8746.62', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:25:35,913 >> {'loss': 0.2926, 'grad_norm': 23.74162483215332, 'learning_rate': 7.716120732538696e-07, 'epoch': 0.1524274477895308, 'num_input_tokens_seen': 2856321024, 'completed': '36.94% (1_362 / 3_687)', 'remaining time': '20:31:40', 'throughput': '8432.58', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:26:06,658 >> {'loss': 0.6429, 'grad_norm': 21.309246063232422, 'learning_rate': 7.712606425490687e-07, 'epoch': 0.15269867100623813, 'num_input_tokens_seen': 2858418176, 'completed': '36.97% (1_363 / 3_687)', 'remaining time': '20:31:04', 'throughput': '8526.46', 'gpu_mem_free': 
'5581MB'} [INFO|trainer.py:175] 2024-12-25 20:26:35,044 >> {'loss': 0.522, 'grad_norm': 28.85659408569336, 'learning_rate': 7.70909033782122e-07, 'epoch': 0.1529698942229455, 'num_input_tokens_seen': 2860515328, 'completed': '36.99% (1_364 / 3_687)', 'remaining time': '20:30:18', 'throughput': '9235.01', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:27:05,684 >> {'loss': 0.5151, 'grad_norm': 16.417978286743164, 'learning_rate': 7.705572472359913e-07, 'epoch': 0.15324111743965282, 'num_input_tokens_seen': 2862612480, 'completed': '37.02% (1_365 / 3_687)', 'remaining time': '20:29:42', 'throughput': '8555.50', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:27:35,262 >> {'loss': 0.4831, 'grad_norm': 15.138096809387207, 'learning_rate': 7.702052831937811e-07, 'epoch': 0.15351234065636019, 'num_input_tokens_seen': 2864709632, 'completed': '37.05% (1_366 / 3_687)', 'remaining time': '20:29:01', 'throughput': '8862.83', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:28:07,030 >> {'loss': 0.6095, 'grad_norm': 17.662302017211914, 'learning_rate': 7.698531419387382e-07, 'epoch': 0.15378356387306755, 'num_input_tokens_seen': 2866806784, 'completed': '37.08% (1_367 / 3_687)', 'remaining time': '20:28:29', 'throughput': '8251.94', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:28:38,893 >> {'loss': 0.3174, 'grad_norm': 13.90149211883545, 'learning_rate': 7.695008237542526e-07, 'epoch': 0.15405478708977488, 'num_input_tokens_seen': 2868903936, 'completed': '37.10% (1_368 / 3_687)', 'remaining time': '20:27:58', 'throughput': '8227.12', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:29:08,433 >> {'loss': 0.9373, 'grad_norm': 25.922029495239258, 'learning_rate': 7.691483289238569e-07, 'epoch': 0.15432601030648224, 'num_input_tokens_seen': 2871001088, 'completed': '37.13% (1_369 / 3_687)', 'remaining time': '20:27:17', 'throughput': '8874.11', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 
20:29:39,788 >> {'loss': 0.5871, 'grad_norm': 18.99474334716797, 'learning_rate': 7.687956577312251e-07, 'epoch': 0.15459723352318958, 'num_input_tokens_seen': 2873098240, 'completed': '37.16% (1_370 / 3_687)', 'remaining time': '20:26:43', 'throughput': '8360.56', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:30:11,591 >> {'loss': 0.3569, 'grad_norm': 16.25048828125, 'learning_rate': 7.684428104601739e-07, 'epoch': 0.15486845673989694, 'num_input_tokens_seen': 2875195392, 'completed': '37.18% (1_371 / 3_687)', 'remaining time': '20:26:12', 'throughput': '8242.93', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:30:43,685 >> {'loss': 0.3744, 'grad_norm': 12.14373779296875, 'learning_rate': 7.680897873946611e-07, 'epoch': 0.1551396799566043, 'num_input_tokens_seen': 2877292544, 'completed': '37.21% (1_372 / 3_687)', 'remaining time': '20:25:41', 'throughput': '8167.75', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:31:17,336 >> {'loss': 0.4764, 'grad_norm': 15.390382766723633, 'learning_rate': 7.677365888187864e-07, 'epoch': 0.15541090317331163, 'num_input_tokens_seen': 2879389696, 'completed': '37.24% (1_373 / 3_687)', 'remaining time': '20:25:17', 'throughput': '7790.15', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:31:48,396 >> {'loss': 0.4289, 'grad_norm': 16.843217849731445, 'learning_rate': 7.673832150167906e-07, 'epoch': 0.155682126390019, 'num_input_tokens_seen': 2881486848, 'completed': '37.27% (1_374 / 3_687)', 'remaining time': '20:24:43', 'throughput': '8439.99', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:32:18,937 >> {'loss': 0.3706, 'grad_norm': 13.040255546569824, 'learning_rate': 7.670296662730552e-07, 'epoch': 0.15595334960672633, 'num_input_tokens_seen': 2883584000, 'completed': '37.29% (1_375 / 3_687)', 'remaining time': '20:24:06', 'throughput': '8583.21', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:32:47,997 >> {'loss': 0.5491, 'grad_norm': 
16.71036148071289, 'learning_rate': 7.66675942872103e-07, 'epoch': 0.1562245728234337, 'num_input_tokens_seen': 2885681152, 'completed': '37.32% (1_376 / 3_687)', 'remaining time': '20:23:23', 'throughput': '9021.07', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:33:21,390 >> {'loss': 0.401, 'grad_norm': 14.318278312683105, 'learning_rate': 7.663220450985973e-07, 'epoch': 0.15649579604014105, 'num_input_tokens_seen': 2887778304, 'completed': '37.35% (1_377 / 3_687)', 'remaining time': '20:22:58', 'throughput': '7850.19', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:33:51,977 >> {'loss': 0.6904, 'grad_norm': 20.72394371032715, 'learning_rate': 7.659679732373413e-07, 'epoch': 0.15676701925684838, 'num_input_tokens_seen': 2889875456, 'completed': '37.37% (1_378 / 3_687)', 'remaining time': '20:22:22', 'throughput': '8570.28', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:34:22,699 >> {'loss': 0.3329, 'grad_norm': 12.829943656921387, 'learning_rate': 7.656137275732786e-07, 'epoch': 0.15703824247355574, 'num_input_tokens_seen': 2891972608, 'completed': '37.40% (1_379 / 3_687)', 'remaining time': '20:21:46', 'throughput': '8532.99', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:34:50,863 >> {'loss': 0.6388, 'grad_norm': 19.38332748413086, 'learning_rate': 7.652593083914927e-07, 'epoch': 0.15730946569026308, 'num_input_tokens_seen': 2894069760, 'completed': '37.43% (1_380 / 3_687)', 'remaining time': '20:21:00', 'throughput': '9307.56', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:35:20,022 >> {'loss': 0.5891, 'grad_norm': 19.630373001098633, 'learning_rate': 7.649047159772064e-07, 'epoch': 0.15758068890697044, 'num_input_tokens_seen': 2896166912, 'completed': '37.46% (1_381 / 3_687)', 'remaining time': '20:20:18', 'throughput': '8990.21', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:35:50,986 >> {'loss': 0.42, 'grad_norm': 15.700737953186035, 'learning_rate': 
7.645499506157827e-07, 'epoch': 0.15785191212367777, 'num_input_tokens_seen': 2898264064, 'completed': '37.48% (1_382 / 3_687)', 'remaining time': '20:19:43', 'throughput': '8466.16', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:36:22,796 >> {'loss': 0.5836, 'grad_norm': 17.65544319152832, 'learning_rate': 7.641950125927228e-07, 'epoch': 0.15812313534038513, 'num_input_tokens_seen': 2900361216, 'completed': '37.51% (1_383 / 3_687)', 'remaining time': '20:19:11', 'throughput': '8240.99', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:36:52,381 >> {'loss': 0.3025, 'grad_norm': 12.071609497070312, 'learning_rate': 7.638399021936675e-07, 'epoch': 0.1583943585570925, 'num_input_tokens_seen': 2902458368, 'completed': '37.54% (1_384 / 3_687)', 'remaining time': '20:18:31', 'throughput': '8860.65', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:37:24,077 >> {'loss': 0.4026, 'grad_norm': 16.76353645324707, 'learning_rate': 7.634846197043963e-07, 'epoch': 0.15866558177379983, 'num_input_tokens_seen': 2904555520, 'completed': '37.56% (1_385 / 3_687)', 'remaining time': '20:17:59', 'throughput': '8270.45', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:37:54,314 >> {'loss': 0.642, 'grad_norm': 18.88435173034668, 'learning_rate': 7.631291654108266e-07, 'epoch': 0.1589368049905072, 'num_input_tokens_seen': 2906652672, 'completed': '37.59% (1_386 / 3_687)', 'remaining time': '20:17:21', 'throughput': '8669.80', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:38:22,988 >> {'loss': 0.8431, 'grad_norm': 25.31294822692871, 'learning_rate': 7.627735395990149e-07, 'epoch': 0.15920802820721452, 'num_input_tokens_seen': 2908749824, 'completed': '37.62% (1_387 / 3_687)', 'remaining time': '20:16:37', 'throughput': '9142.09', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:38:49,758 >> {'loss': 0.9865, 'grad_norm': 30.660036087036133, 'learning_rate': 7.624177425551552e-07, 'epoch': 
0.1594792514239219, 'num_input_tokens_seen': 2910846976, 'completed': '37.65% (1_388 / 3_687)', 'remaining time': '20:15:46', 'throughput': '9792.47', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:39:20,164 >> {'loss': 0.7311, 'grad_norm': 23.799144744873047, 'learning_rate': 7.620617745655793e-07, 'epoch': 0.15975047464062925, 'num_input_tokens_seen': 2912944128, 'completed': '37.67% (1_389 / 3_687)', 'remaining time': '20:15:09', 'throughput': '8621.59', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:39:56,028 >> {'loss': 0.4711, 'grad_norm': 21.239885330200195, 'learning_rate': 7.617056359167568e-07, 'epoch': 0.16002169785733658, 'num_input_tokens_seen': 2915041280, 'completed': '37.70% (1_390 / 3_687)', 'remaining time': '20:14:54', 'throughput': '7309.29', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:40:24,112 >> {'loss': 0.5114, 'grad_norm': 19.135162353515625, 'learning_rate': 7.613493268952947e-07, 'epoch': 0.16029292107404394, 'num_input_tokens_seen': 2917138432, 'completed': '37.73% (1_391 / 3_687)', 'remaining time': '20:14:08', 'throughput': '9334.21', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:40:55,803 >> {'loss': 0.3154, 'grad_norm': 16.555891036987305, 'learning_rate': 7.609928477879365e-07, 'epoch': 0.16056414429075128, 'num_input_tokens_seen': 2919235584, 'completed': '37.75% (1_392 / 3_687)', 'remaining time': '20:13:36', 'throughput': '8271.93', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:41:26,096 >> {'loss': 0.5111, 'grad_norm': 28.75040626525879, 'learning_rate': 7.606361988815633e-07, 'epoch': 0.16083536750745864, 'num_input_tokens_seen': 2921332736, 'completed': '37.78% (1_393 / 3_687)', 'remaining time': '20:12:59', 'throughput': '8653.80', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:41:58,125 >> {'loss': 0.2485, 'grad_norm': 22.3262939453125, 'learning_rate': 7.602793804631927e-07, 'epoch': 0.161106590724166, 'num_input_tokens_seen': 
2923429888, 'completed': '37.81% (1_394 / 3_687)', 'remaining time': '20:12:28', 'throughput': '8184.57', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:42:29,054 >> {'loss': 0.4578, 'grad_norm': 15.991744041442871, 'learning_rate': 7.599223928199781e-07, 'epoch': 0.16137781394087333, 'num_input_tokens_seen': 2925527040, 'completed': '37.84% (1_395 / 3_687)', 'remaining time': '20:11:53', 'throughput': '8475.68', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:42:55,971 >> {'loss': 0.5837, 'grad_norm': 19.883453369140625, 'learning_rate': 7.595652362392103e-07, 'epoch': 0.1616490371575807, 'num_input_tokens_seen': 2927624192, 'completed': '37.86% (1_396 / 3_687)', 'remaining time': '20:11:03', 'throughput': '9738.99', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:43:24,520 >> {'loss': 0.6984, 'grad_norm': 18.910655975341797, 'learning_rate': 7.592079110083146e-07, 'epoch': 0.16192026037428803, 'num_input_tokens_seen': 2929721344, 'completed': '37.89% (1_397 / 3_687)', 'remaining time': '20:10:19', 'throughput': '9182.04', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:43:55,413 >> {'loss': 0.4046, 'grad_norm': 14.154722213745117, 'learning_rate': 7.588504174148532e-07, 'epoch': 0.1621914835909954, 'num_input_tokens_seen': 2931818496, 'completed': '37.92% (1_398 / 3_687)', 'remaining time': '20:09:44', 'throughput': '8485.57', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:44:23,478 >> {'loss': 0.729, 'grad_norm': 24.498065948486328, 'learning_rate': 7.584927557465233e-07, 'epoch': 0.16246270680770275, 'num_input_tokens_seen': 2933915648, 'completed': '37.94% (1_399 / 3_687)', 'remaining time': '20:08:59', 'throughput': '9340.58', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:44:54,237 >> {'loss': 0.3138, 'grad_norm': 18.18712043762207, 'learning_rate': 7.581349262911573e-07, 'epoch': 0.16273393002441008, 'num_input_tokens_seen': 2936012800, 'completed': '37.97% (1_400 / 
3_687)', 'remaining time': '20:08:24', 'throughput': '8522.50', 'gpu_mem_free': '5581MB'} /scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:689: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . warnings.warn( [INFO|trainer.py:3503] 2024-12-25 20:45:27,790 >> Saving model checkpoint to /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-1400 [INFO|configuration_utils.py:472] 2024-12-25 20:45:27,794 >> Configuration saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-1400/config.json [INFO|configuration_utils.py:807] 2024-12-25 20:45:27,795 >> Configuration saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-1400/generation_config.json [INFO|modeling_utils.py:2807] 2024-12-25 20:46:38,615 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 7 checkpoint shards. You can find where each parameters has been saved in the index located at /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-1400/model.safetensors.index.json. 
[INFO|tokenization_utils_base.py:2684] 2024-12-25 20:46:38,619 >> tokenizer config file saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-1400/tokenizer_config.json [INFO|tokenization_utils_base.py:2693] 2024-12-25 20:46:38,620 >> Special tokens file saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-1400/special_tokens_map.json [WARNING|trainer.py:868] 2024-12-25 20:50:21,744 >> Save streaming dataset state: {'epoch': 0, 'sample_in_epoch': 2800, 'num_canonical_nodes': 1, 'shuffle_seed': 42, 'initial_physical_nodes': 1} /scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/torch/utils/checkpoint.py:1399: FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead.
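Each trainer record above is a Python-dict literal emitted after the `>>` marker. If you need to recover the loss curve or throughput series from a log like this one, a minimal parsing sketch (an assumption about the record shape shown here, not part of the training code) could look like:

```python
import ast
import re

# Each trainer metric record looks like:
#   "[INFO|trainer.py:175] <timestamp> >> {'loss': ..., 'grad_norm': ..., ...}"
# The payload is a plain dict literal, so ast.literal_eval can parse it safely.
RECORD_RE = re.compile(r">>\s*(\{'loss'.*?\})", re.DOTALL)

def parse_records(log_text: str) -> list[dict]:
    """Extract the metric dicts logged by the trainer from raw log text."""
    return [ast.literal_eval(m.group(1)) for m in RECORD_RE.finditer(log_text)]

sample = (
    "[INFO|trainer.py:175] 2024-12-25 20:01:23,482 >> "
    "{'loss': 0.4074, 'grad_norm': 16.78, 'learning_rate': 7.879e-07, "
    "'completed': '35.67% (1_315 / 3_687)', 'throughput': '8658.73'}"
)
records = parse_records(sample)
print(records[0]["loss"])  # → 0.4074
```

The non-greedy match stops at the first `}`, which is safe because the logged dicts are flat (no nested braces), and `re.DOTALL` lets a record span the wrapped lines of a captured console log.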
with device_autocast_ctx, torch.cpu.amp.autocast(**cpu_autocast_kwargs), recompute_context: # type: ignore[attr-defined] [INFO|trainer.py:175] 2024-12-25 20:50:52,140 >> {'loss': 0.4787, 'grad_norm': 16.402795791625977, 'learning_rate': 7.577769293367226e-07, 'epoch': 0.16300515324111745, 'num_input_tokens_seen': 2938109952, 'completed': '38.00% (1_401 / 3_687)', 'remaining time': '20:28:33', 'throughput': '732.44', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:51:25,997 >> {'loss': 0.4823, 'grad_norm': 16.0742130279541, 'learning_rate': 7.574187651713218e-07, 'epoch': 0.16327637645782478, 'num_input_tokens_seen': 2940207104, 'completed': '38.03% (1_402 / 3_687)', 'remaining time': '20:28:06', 'throughput': '7742.78', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:51:56,166 >> {'loss': 0.392, 'grad_norm': 16.962581634521484, 'learning_rate': 7.570604340831916e-07, 'epoch': 0.16354759967453214, 'num_input_tokens_seen': 2942304256, 'completed': '38.05% (1_403 / 3_687)', 'remaining time': '20:27:26', 'throughput': '8689.10', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:52:27,735 >> {'loss': 0.262, 'grad_norm': 12.302724838256836, 'learning_rate': 7.567019363607032e-07, 'epoch': 0.1638188228912395, 'num_input_tokens_seen': 2944401408, 'completed': '38.08% (1_404 / 3_687)', 'remaining time': '20:26:52', 'throughput': '8303.98', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:52:59,842 >> {'loss': 0.3794, 'grad_norm': 13.02363109588623, 'learning_rate': 7.563432722923621e-07, 'epoch': 0.16409004610794684, 'num_input_tokens_seen': 2946498560, 'completed': '38.11% (1_405 / 3_687)', 'remaining time': '20:26:19', 'throughput': '8164.67', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:53:31,447 >> {'loss': 0.2997, 'grad_norm': 13.745916366577148, 'learning_rate': 7.559844421668074e-07, 'epoch': 0.1643612693246542, 'num_input_tokens_seen': 2948595712, 'completed': '38.13% (1_406 / 3_687)', 'remaining 
time': '20:25:44', 'throughput': '8294.27', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:53:59,596 >> {'loss': 0.5652, 'grad_norm': 16.339326858520508, 'learning_rate': 7.556254462728122e-07, 'epoch': 0.16463249254136153, 'num_input_tokens_seen': 2950692864, 'completed': '38.16% (1_407 / 3_687)', 'remaining time': '20:24:56', 'throughput': '9312.87', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:54:30,469 >> {'loss': 0.4496, 'grad_norm': 16.774593353271484, 'learning_rate': 7.552662848992822e-07, 'epoch': 0.1649037157580689, 'num_input_tokens_seen': 2952790016, 'completed': '38.19% (1_408 / 3_687)', 'remaining time': '20:24:19', 'throughput': '8490.87', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:55:01,429 >> {'loss': 0.4142, 'grad_norm': 14.789047241210938, 'learning_rate': 7.54906958335257e-07, 'epoch': 0.16517493897477625, 'num_input_tokens_seen': 2954887168, 'completed': '38.22% (1_409 / 3_687)', 'remaining time': '20:23:42', 'throughput': '8467.29', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:55:31,575 >> {'loss': 0.3419, 'grad_norm': 15.732011795043945, 'learning_rate': 7.545474668699091e-07, 'epoch': 0.1654461621914836, 'num_input_tokens_seen': 2956984320, 'completed': '38.24% (1_410 / 3_687)', 'remaining time': '20:23:02', 'throughput': '8695.80', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:56:01,476 >> {'loss': 0.3786, 'grad_norm': 14.704493522644043, 'learning_rate': 7.541878107925435e-07, 'epoch': 0.16571738540819095, 'num_input_tokens_seen': 2959081472, 'completed': '38.27% (1_411 / 3_687)', 'remaining time': '20:22:21', 'throughput': '8767.00', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 20:56:32,158 >> {'loss': 0.2512, 'grad_norm': 13.177338600158691, 'learning_rate': 7.538279903925977e-07, 'epoch': 0.16598860862489828, 'num_input_tokens_seen': 2961178624, 'completed': '38.30% (1_412 / 3_687)', 'remaining time': '20:21:43', 'throughput': '8544.09', 
'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 20:56:59,335 >> {'loss': 0.4833, 'grad_norm': 17.341163635253906, 'learning_rate': 7.534680059596414e-07, 'epoch': 0.16625983184160564, 'num_input_tokens_seen': 2963275776, 'completed': '38.32% (1_413 / 3_687)', 'remaining time': '20:20:52', 'throughput': '9645.58', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 20:57:29,592 >> {'loss': 0.6702, 'grad_norm': 23.333568572998047, 'learning_rate': 7.531078577833765e-07, 'epoch': 0.166531055058313, 'num_input_tokens_seen': 2965372928, 'completed': '38.35% (1_414 / 3_687)', 'remaining time': '20:20:13', 'throughput': '8663.95', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 20:58:00,737 >> {'loss': 0.3171, 'grad_norm': 12.770888328552246, 'learning_rate': 7.527475461536363e-07, 'epoch': 0.16680227827502034, 'num_input_tokens_seen': 2967470080, 'completed': '38.38% (1_415 / 3_687)', 'remaining time': '20:19:37', 'throughput': '8416.77', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 20:58:34,854 >> {'loss': 0.3315, 'grad_norm': 16.709857940673828, 'learning_rate': 7.523870713603864e-07, 'epoch': 0.1670735014917277, 'num_input_tokens_seen': 2969567232, 'completed': '38.41% (1_416 / 3_687)', 'remaining time': '20:19:12', 'throughput': '7683.68', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 20:59:05,088 >> {'loss': 0.3777, 'grad_norm': 14.015904426574707, 'learning_rate': 7.520264336937227e-07, 'epoch': 0.16734472470843503, 'num_input_tokens_seen': 2971664384, 'completed': '38.43% (1_417 / 3_687)', 'remaining time': '20:18:32', 'throughput': '8670.71', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 20:59:36,593 >> {'loss': 0.3262, 'grad_norm': 12.915022850036621, 'learning_rate': 7.516656334438727e-07, 'epoch': 0.1676159479251424, 'num_input_tokens_seen': 2973761536, 'completed': '38.46% (1_418 / 3_687)', 'remaining time': '20:17:57', 'throughput': '8320.57', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:00:07,850 >> {'loss': 0.5589, 'grad_norm': 19.277992248535156, 'learning_rate': 7.513046709011951e-07, 'epoch': 0.16788717114184976, 'num_input_tokens_seen': 2975858688, 'completed': '38.49% (1_419 / 3_687)', 'remaining time': '20:17:22', 'throughput': '8386.83', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:00:38,844 >> {'loss': 0.3394, 'grad_norm': 15.75462532043457, 'learning_rate': 7.509435463561785e-07, 'epoch': 0.1681583943585571, 'num_input_tokens_seen': 2977955840, 'completed': '38.51% (1_420 / 3_687)', 'remaining time': '20:16:45', 'throughput': '8457.78', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:01:09,050 >> {'loss': 0.7979, 'grad_norm': 21.828617095947266, 'learning_rate': 7.505822600994423e-07, 'epoch': 0.16842961757526445, 'num_input_tokens_seen': 2980052992, 'completed': '38.54% (1_421 / 3_687)', 'remaining time': '20:16:05', 'throughput': '8678.75', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:01:40,278 >> {'loss': 0.4936, 'grad_norm': 24.69605255126953, 'learning_rate': 7.502208124217357e-07, 'epoch': 0.16870084079197178, 'num_input_tokens_seen': 2982150144, 'completed': '38.57% (1_422 / 3_687)', 'remaining time': '20:15:30', 'throughput': '8394.29', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:02:08,418 >> {'loss': 0.6869, 'grad_norm': 21.24989891052246, 'learning_rate': 7.498592036139383e-07, 'epoch': 0.16897206400867915, 'num_input_tokens_seen': 2984247296, 'completed': '38.60% (1_423 / 3_687)', 'remaining time': '20:14:43', 'throughput': '9315.95', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:02:38,938 >> {'loss': 0.3728, 'grad_norm': 18.29302215576172, 'learning_rate': 7.494974339670591e-07, 'epoch': 0.16924328722538648, 'num_input_tokens_seen': 2986344448, 'completed': '38.62% (1_424 / 3_687)', 'remaining time': '20:14:05', 'throughput': '8589.29', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:03:08,246 >> {'loss': 0.3315, 'grad_norm': 15.673701286315918, 'learning_rate': 7.491355037722365e-07, 'epoch': 0.16951451044209384, 'num_input_tokens_seen': 2988441600, 'completed': '38.65% (1_425 / 3_687)', 'remaining time': '20:13:22', 'throughput': '8944.22', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:03:38,551 >> {'loss': 0.5771, 'grad_norm': 20.936105728149414, 'learning_rate': 7.487734133207382e-07, 'epoch': 0.1697857336588012, 'num_input_tokens_seen': 2990538752, 'completed': '38.68% (1_426 / 3_687)', 'remaining time': '20:12:43', 'throughput': '8650.22', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:04:07,986 >> {'loss': 0.2192, 'grad_norm': 10.056077003479004, 'learning_rate': 7.484111629039607e-07, 'epoch': 0.17005695687550854, 'num_input_tokens_seen': 2992635904, 'completed': '38.70% (1_427 / 3_687)', 'remaining time': '20:12:01', 'throughput': '8905.77', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:04:37,237 >> {'loss': 0.7378, 'grad_norm': 21.529075622558594, 'learning_rate': 7.480487528134292e-07, 'epoch': 0.1703281800922159, 'num_input_tokens_seen': 2994733056, 'completed': '38.73% (1_428 / 3_687)', 'remaining time': '20:11:18', 'throughput': '8962.10', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:05:09,316 >> {'loss': 0.3931, 'grad_norm': 16.297054290771484, 'learning_rate': 7.47686183340798e-07, 'epoch': 0.17059940330892323, 'num_input_tokens_seen': 2996830208, 'completed': '38.76% (1_429 / 3_687)', 'remaining time': '20:10:46', 'throughput': '8171.90', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:05:42,893 >> {'loss': 0.6421, 'grad_norm': 20.00302505493164, 'learning_rate': 7.473234547778489e-07, 'epoch': 0.1708706265256306, 'num_input_tokens_seen': 2998927360, 'completed': '38.78% (1_430 / 3_687)', 'remaining time': '20:10:19', 'throughput': '7807.22', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:06:14,005 >> {'loss': 0.4505, 'grad_norm': 16.071239471435547, 'learning_rate': 7.46960567416492e-07, 'epoch': 0.17114184974233795, 'num_input_tokens_seen': 3001024512, 'completed': '38.81% (1_431 / 3_687)', 'remaining time': '20:09:43', 'throughput': '8425.69', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:06:43,639 >> {'loss': 0.2977, 'grad_norm': 14.117545127868652, 'learning_rate': 7.465975215487655e-07, 'epoch': 0.1714130729590453, 'num_input_tokens_seen': 3003121664, 'completed': '38.84% (1_432 / 3_687)', 'remaining time': '20:09:01', 'throughput': '8845.99', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:07:13,192 >> {'loss': 0.6603, 'grad_norm': 21.76426124572754, 'learning_rate': 7.462343174668346e-07, 'epoch': 0.17168429617575265, 'num_input_tokens_seen': 3005218816, 'completed': '38.87% (1_433 / 3_687)', 'remaining time': '20:08:20', 'throughput': '8870.41', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:07:42,162 >> {'loss': 0.5771, 'grad_norm': 20.707962036132812, 'learning_rate': 7.458709554629924e-07, 'epoch': 0.17195551939245998, 'num_input_tokens_seen': 3007315968, 'completed': '38.89% (1_434 / 3_687)', 'remaining time': '20:07:36', 'throughput': '9048.89', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:08:11,132 >> {'loss': 0.6713, 'grad_norm': 19.084461212158203, 'learning_rate': 7.455074358296586e-07, 'epoch': 0.17222674260916734, 'num_input_tokens_seen': 3009413120, 'completed': '38.92% (1_435 / 3_687)', 'remaining time': '20:06:53', 'throughput': '9048.75', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:08:44,950 >> {'loss': 0.3933, 'grad_norm': 13.040257453918457, 'learning_rate': 7.451437588593802e-07, 'epoch': 0.1724979658258747, 'num_input_tokens_seen': 3011510272, 'completed': '38.95% (1_436 / 3_687)', 'remaining time': '20:06:27', 'throughput': '7751.78', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:09:15,209 >> {'loss': 0.5142, 'grad_norm': 16.921184539794922, 'learning_rate': 7.447799248448303e-07,
'epoch': 0.17276918904258204, 'num_input_tokens_seen': 3013607424, 'completed': '38.97% (1_437 / 3_687)', 'remaining time': '20:05:48', 'throughput': '8663.22', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:09:49,155 >> {'loss': 0.5446, 'grad_norm': 16.22494125366211, 'learning_rate': 7.444159340788088e-07, 'epoch': 0.1730404122592894, 'num_input_tokens_seen': 3015704576, 'completed': '39.00% (1_438 / 3_687)', 'remaining time': '20:05:22', 'throughput': '7722.36', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:10:16,513 >> {'loss': 0.3896, 'grad_norm': 13.45175838470459, 'learning_rate': 7.440517868542417e-07, 'epoch': 0.17331163547599673, 'num_input_tokens_seen': 3017801728, 'completed': '39.03% (1_439 / 3_687)', 'remaining time': '20:04:33', 'throughput': '9581.98', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:10:48,579 >> {'loss': 0.3761, 'grad_norm': 14.45242691040039, 'learning_rate': 7.436874834641807e-07, 'epoch': 0.1735828586927041, 'num_input_tokens_seen': 3019898880, 'completed': '39.06% (1_440 / 3_687)', 'remaining time': '20:04:00', 'throughput': '8174.96', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:11:20,167 >> {'loss': 0.5964, 'grad_norm': 21.77591323852539, 'learning_rate': 7.433230242018035e-07, 'epoch': 0.17385408190941146, 'num_input_tokens_seen': 3021996032, 'completed': '39.08% (1_441 / 3_687)', 'remaining time': '20:03:26', 'throughput': '8298.94', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:11:53,033 >> {'loss': 0.3924, 'grad_norm': 16.96351432800293, 'learning_rate': 7.429584093604128e-07, 'epoch': 0.1741253051261188, 'num_input_tokens_seen': 3024093184, 'completed': '39.11% (1_442 / 3_687)', 'remaining time': '20:02:57', 'throughput': '7976.25', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:12:22,648 >> {'loss': 0.6515, 'grad_norm': 19.076356887817383, 'learning_rate': 7.425936392334368e-07, 'epoch': 0.17439652834282615, 'num_input_tokens_seen': 3026190336, 'completed': '39.14% (1_443 / 3_687)', 'remaining time': '20:02:16', 'throughput': '8851.59', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:12:50,334 >> {'loss': 0.6622, 'grad_norm': 18.412277221679688, 'learning_rate': 7.422287141144287e-07, 'epoch': 0.17466775155953349, 'num_input_tokens_seen': 3028287488, 'completed': '39.16% (1_444 / 3_687)', 'remaining time': '20:01:28', 'throughput': '9468.61', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:13:17,575 >> {'loss': 0.7812, 'grad_norm': 19.85443687438965, 'learning_rate': 7.418636342970665e-07, 'epoch': 0.17493897477624085, 'num_input_tokens_seen': 3030384640, 'completed': '39.19% (1_445 / 3_687)', 'remaining time': '20:00:39', 'throughput': '9623.07', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:13:47,353 >> {'loss': 0.3791, 'grad_norm': 18.57454490661621, 'learning_rate': 7.414984000751521e-07, 'epoch': 0.1752101979929482, 'num_input_tokens_seen': 3032481792, 'completed': '39.22% (1_446 / 3_687)', 'remaining time': '19:59:58', 'throughput': '8803.32', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:14:17,011 >> {'loss': 0.3755, 'grad_norm': 13.877761840820312, 'learning_rate': 7.411330117426125e-07, 'epoch': 0.17548142120965554, 'num_input_tokens_seen': 3034578944, 'completed': '39.25% (1_447 / 3_687)', 'remaining time': '19:59:18', 'throughput': '8838.82', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:14:46,993 >> {'loss': 0.2692, 'grad_norm': 11.10472583770752, 'learning_rate': 7.407674695934983e-07, 'epoch': 0.1757526444263629, 'num_input_tokens_seen': 3036676096, 'completed': '39.27% (1_448 / 3_687)', 'remaining time': '19:58:38', 'throughput': '8743.52', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:15:19,343 >> {'loss': 0.4897, 'grad_norm': 16.257774353027344, 'learning_rate': 7.404017739219836e-07, 'epoch': 0.17602386764307024, 'num_input_tokens_seen': 3038773248, 'completed': '39.30% (1_449 / 3_687)', 'remaining time': '19:58:07', 'throughput': '8103.32', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:15:51,049 >> {'loss': 0.2769, 'grad_norm': 13.585163116455078, 'learning_rate': 7.400359250223667e-07, 'epoch': 0.1762950908597776, 'num_input_tokens_seen': 3040870400, 'completed': '39.33% (1_450 / 3_687)', 'remaining time': '19:57:33', 'throughput': '8267.87', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:16:20,439 >> {'loss': 0.5257, 'grad_norm': 16.29515266418457, 'learning_rate': 7.396699231890689e-07, 'epoch': 0.17656631407648496, 'num_input_tokens_seen': 3042967552, 'completed': '39.35% (1_451 / 3_687)', 'remaining time': '19:56:52', 'throughput': '8919.47', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:16:49,023 >> {'loss': 0.5871, 'grad_norm': 19.343111038208008, 'learning_rate': 7.393037687166342e-07, 'epoch': 0.1768375372931923, 'num_input_tokens_seen': 3045064704, 'completed': '39.38% (1_452 / 3_687)', 'remaining time': '19:56:08', 'throughput': '9171.19', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:17:23,844 >> {'loss': 0.5044, 'grad_norm': 17.655189514160156, 'learning_rate': 7.3893746189973e-07, 'epoch': 0.17710876050989965, 'num_input_tokens_seen': 3047161856, 'completed': '39.41% (1_453 / 3_687)', 'remaining time': '19:55:45', 'throughput': '7528.16', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:17:52,835 >> {'loss': 0.4438, 'grad_norm': 15.424549102783203, 'learning_rate': 7.385710030331461e-07, 'epoch': 0.177379983726607, 'num_input_tokens_seen': 3049259008, 'completed': '39.44% (1_454 / 3_687)', 'remaining time': '19:55:02', 'throughput': '9042.53', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:18:22,224 >> {'loss': 0.3884, 'grad_norm': 15.889403343200684, 'learning_rate': 7.382043924117945e-07, 'epoch': 0.17765120694331435, 'num_input_tokens_seen': 3051356160, 'completed': '39.46% (1_455 / 3_687)', 'remaining time': '19:54:21', 'throughput': '8919.53', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:18:52,081 >> {'loss': 0.2612, 'grad_norm': 11.11455249786377, 'learning_rate': 7.378376303307099e-07, 'epoch': 0.1779224301600217, 'num_input_tokens_seen': 3053453312, 'completed': '39.49% (1_456 / 3_687)', 'remaining time': '19:53:41', 'throughput': '8780.20', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:19:24,449 >> {'loss': 0.45, 'grad_norm': 18.22006607055664, 'learning_rate': 7.374707170850479e-07, 'epoch': 0.17819365337672904, 'num_input_tokens_seen': 3055550464, 'completed': '39.52% (1_457 / 3_687)', 'remaining time': '19:53:10', 'throughput': '8098.83', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:19:54,924 >> {'loss': 0.4225, 'grad_norm': 14.16891098022461, 'learning_rate': 7.371036529700866e-07, 'epoch': 0.1784648765934364, 'num_input_tokens_seen': 3057647616, 'completed': '39.54% (1_458 / 3_687)', 'remaining time': '19:52:32', 'throughput': '8601.96', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:20:23,997 >> {'loss': 0.3465, 'grad_norm': 13.474692344665527, 'learning_rate': 7.367364382812253e-07, 'epoch': 0.17873609981014374, 'num_input_tokens_seen': 3059744768, 'completed': '39.57% (1_459 / 3_687)', 'remaining time': '19:51:50', 'throughput': '9016.64', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:20:54,636 >> {'loss': 0.6558, 'grad_norm': 19.395767211914062, 'learning_rate': 7.363690733139842e-07, 'epoch': 0.1790073230268511, 'num_input_tokens_seen': 3061841920, 'completed': '39.60% (1_460 / 3_687)', 'remaining time': '19:51:13', 'throughput': '8555.99', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:21:25,288 >> {'loss': 0.7213, 'grad_norm': 22.699716567993164, 'learning_rate': 7.360015583640049e-07, 'epoch': 0.17927854624355846, 'num_input_tokens_seen': 3063939072, 'completed': '39.63% (1_461 / 3_687)', 'remaining time': '19:50:36', 'throughput': '8552.31',
'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:21:54,334 >> {'loss': 0.2935, 'grad_norm': 10.556961059570312, 'learning_rate': 7.356338937270492e-07, 'epoch': 0.1795497694602658, 'num_input_tokens_seen': 3066036224, 'completed': '39.65% (1_462 / 3_687)', 'remaining time': '19:49:53', 'throughput': '9024.96', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:22:23,705 >> {'loss': 0.3626, 'grad_norm': 21.715940475463867, 'learning_rate': 7.352660796989999e-07, 'epoch': 0.17982099267697316, 'num_input_tokens_seen': 3068133376, 'completed': '39.68% (1_463 / 3_687)', 'remaining time': '19:49:12', 'throughput': '8925.38', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:22:52,700 >> {'loss': 0.4329, 'grad_norm': 14.98128604888916, 'learning_rate': 7.348981165758595e-07, 'epoch': 0.1800922158936805, 'num_input_tokens_seen': 3070230528, 'completed': '39.71% (1_464 / 3_687)', 'remaining time': '19:48:30', 'throughput': '9041.10', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:23:23,041 >> {'loss': 0.4274, 'grad_norm': 13.6019868850708, 'learning_rate': 7.345300046537507e-07, 'epoch': 0.18036343911038785, 'num_input_tokens_seen': 3072327680, 'completed': '39.73% (1_465 / 3_687)', 'remaining time': '19:47:52', 'throughput': '8639.79', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:23:53,961 >> {'loss': 0.6069, 'grad_norm': 19.526018142700195, 'learning_rate': 7.341617442289159e-07, 'epoch': 0.18063466232709519, 'num_input_tokens_seen': 3074424832, 'completed': '39.76% (1_466 / 3_687)', 'remaining time': '19:47:16', 'throughput': '8478.00', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:24:26,387 >> {'loss': 0.484, 'grad_norm': 18.811630249023438, 'learning_rate': 7.337933355977175e-07, 'epoch': 0.18090588554380255, 'num_input_tokens_seen': 3076521984, 'completed': '39.79% (1_467 / 3_687)', 'remaining time': '19:46:45', 'throughput': '8084.57', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:24:57,639 >> {'loss': 0.3532, 'grad_norm': 11.840438842773438, 'learning_rate': 7.334247790566364e-07, 'epoch': 0.1811771087605099, 'num_input_tokens_seen': 3078619136, 'completed': '39.82% (1_468 / 3_687)', 'remaining time': '19:46:10', 'throughput': '8387.96', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:25:28,550 >> {'loss': 0.348, 'grad_norm': 13.839353561401367, 'learning_rate': 7.330560749022728e-07, 'epoch': 0.18144833197721724, 'num_input_tokens_seen': 3080716288, 'completed': '39.84% (1_469 / 3_687)', 'remaining time': '19:45:34', 'throughput': '8480.66', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:26:01,782 >> {'loss': 0.5346, 'grad_norm': 19.16689682006836, 'learning_rate': 7.326872234313459e-07, 'epoch': 0.1817195551939246, 'num_input_tokens_seen': 3082813440, 'completed': '39.87% (1_470 / 3_687)', 'remaining time': '19:45:06', 'throughput': '7888.39', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:26:33,950 >> {'loss': 0.3933, 'grad_norm': 14.058456420898438, 'learning_rate': 7.323182249406936e-07, 'epoch': 0.18199077841063194, 'num_input_tokens_seen': 3084910592, 'completed': '39.90% (1_471 / 3_687)', 'remaining time': '19:44:34', 'throughput': '8149.22', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:27:03,736 >> {'loss': 0.6038, 'grad_norm': 20.379125595092773, 'learning_rate': 7.319490797272714e-07, 'epoch': 0.1822620016273393, 'num_input_tokens_seen': 3087007744, 'completed': '39.92% (1_472 / 3_687)', 'remaining time': '19:43:55', 'throughput': '8800.73', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:27:37,011 >> {'loss': 0.459, 'grad_norm': 15.348321914672852, 'learning_rate': 7.315797880881535e-07, 'epoch': 0.18253322484404666, 'num_input_tokens_seen': 3089104896, 'completed': '39.95% (1_473 / 3_687)', 'remaining time': '19:43:27', 'throughput': '7878.06', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:28:07,424 >> {'loss': 0.4144, 'grad_norm': 12.275803565979004, 'learning_rate': 7.312103503205318e-07, 'epoch': 0.182804448060754, 'num_input_tokens_seen': 3091202048, 'completed': '39.98% (1_474 / 3_687)', 'remaining time': '19:42:49', 'throughput': '8619.43', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:28:39,041 >> {'loss': 0.1598, 'grad_norm': 9.52855110168457, 'learning_rate': 7.308407667217158e-07, 'epoch': 0.18307567127746135, 'num_input_tokens_seen': 3093299200, 'completed': '40.01% (1_475 / 3_687)', 'remaining time': '19:42:16', 'throughput': '8291.28', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:29:10,474 >> {'loss': 0.5072, 'grad_norm': 17.207202911376953, 'learning_rate': 7.304710375891323e-07, 'epoch': 0.1833468944941687, 'num_input_tokens_seen': 3095396352, 'completed': '40.03% (1_476 / 3_687)', 'remaining time': '19:41:41', 'throughput': '8339.97', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:29:42,454 >> {'loss': 0.7901, 'grad_norm': 21.52804946899414, 'learning_rate': 7.30101163220325e-07, 'epoch': 0.18361811771087605, 'num_input_tokens_seen': 3097493504, 'completed': '40.06% (1_477 / 3_687)', 'remaining time': '19:41:09', 'throughput': '8197.28', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:30:15,809 >> {'loss': 0.6151, 'grad_norm': 19.89445686340332, 'learning_rate': 7.29731143912955e-07, 'epoch': 0.1838893409275834, 'num_input_tokens_seen': 3099590656, 'completed': '40.09% (1_478 / 3_687)', 'remaining time': '19:40:41', 'throughput': '7858.96', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:30:48,046 >> {'loss': 0.4936, 'grad_norm': 15.77376651763916, 'learning_rate': 7.293609799647996e-07, 'epoch': 0.18416056414429074, 'num_input_tokens_seen': 3101687808, 'completed': '40.11% (1_479 / 3_687)', 'remaining time': '19:40:10', 'throughput': '8131.69', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:31:20,268 >> {'loss': 0.5218, 'grad_norm': 18.152956008911133, 'learning_rate': 7.289906716737528e-07, 'epoch': 0.1844317873609981, 'num_input_tokens_seen': 3103784960, 'completed': '40.14% (1_480 / 3_687)', 'remaining time': '19:39:38', 'throughput': '8135.65', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:31:53,199 >> {'loss': 0.3309, 'grad_norm': 13.07524585723877, 'learning_rate': 7.286202193378244e-07, 'epoch': 0.18470301057770544, 'num_input_tokens_seen': 3105882112, 'completed': '40.17% (1_481 / 3_687)', 'remaining time': '19:39:09', 'throughput': '7960.57', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:32:24,762 >> {'loss': 0.4835, 'grad_norm': 15.021199226379395, 'learning_rate': 7.282496232551406e-07, 'epoch': 0.1849742337944128, 'num_input_tokens_seen': 3107979264, 'completed': '40.20% (1_482 / 3_687)', 'remaining time': '19:38:35', 'throughput': '8305.21', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:32:55,497 >> {'loss': 0.5135, 'grad_norm': 18.78805923461914, 'learning_rate': 7.278788837239429e-07, 'epoch': 0.18524545701112016, 'num_input_tokens_seen': 3110076416, 'completed': '40.22% (1_483 / 3_687)', 'remaining time': '19:37:59', 'throughput': '8529.09', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:33:28,011 >> {'loss': 0.3278, 'grad_norm': 12.501900672912598, 'learning_rate': 7.275080010425883e-07, 'epoch': 0.1855166802278275, 'num_input_tokens_seen': 3112173568, 'completed': '40.25% (1_484 / 3_687)', 'remaining time': '19:37:28', 'throughput': '8062.58', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:33:56,138 >> {'loss': 0.5373, 'grad_norm': 15.580584526062012, 'learning_rate': 7.271369755095494e-07, 'epoch': 0.18578790344453486, 'num_input_tokens_seen': 3114270720, 'completed': '40.28% (1_485 / 3_687)', 'remaining time': '19:36:43', 'throughput': '9320.03', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:34:22,676 >> {'loss': 0.7682, 'grad_norm': 20.438100814819336, 'learning_rate': 7.267658074234128e-07, 'epoch':
0.1860591266612422, 'num_input_tokens_seen': 3116367872, 'completed': '40.30% (1_486 / 3_687)', 'remaining time': '19:35:54', 'throughput': '9877.96', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:34:53,672 >> {'loss': 0.5544, 'grad_norm': 19.1994686126709, 'learning_rate': 7.26394497082881e-07, 'epoch': 0.18633034987794955, 'num_input_tokens_seen': 3118465024, 'completed': '40.33% (1_487 / 3_687)', 'remaining time': '19:35:18', 'throughput': '8457.32', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:35:24,826 >> {'loss': 0.461, 'grad_norm': 14.906885147094727, 'learning_rate': 7.260230447867703e-07, 'epoch': 0.18660157309465691, 'num_input_tokens_seen': 3120562176, 'completed': '40.36% (1_488 / 3_687)', 'remaining time': '19:34:43', 'throughput': '8414.57', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:35:56,993 >> {'loss': 0.5265, 'grad_norm': 16.73325538635254, 'learning_rate': 7.256514508340114e-07, 'epoch': 0.18687279631136425, 'num_input_tokens_seen': 3122659328, 'completed': '40.39% (1_489 / 3_687)', 'remaining time': '19:34:11', 'throughput': '8149.48', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:36:25,301 >> {'loss': 0.6415, 'grad_norm': 22.4089412689209, 'learning_rate': 7.252797155236488e-07, 'epoch': 0.1871440195280716, 'num_input_tokens_seen': 3124756480, 'completed': '40.41% (1_490 / 3_687)', 'remaining time': '19:33:28', 'throughput': '9260.28', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:36:55,795 >> {'loss': 0.5622, 'grad_norm': 17.32809066772461, 'learning_rate': 7.249078391548409e-07, 'epoch': 0.18741524274477894, 'num_input_tokens_seen': 3126853632, 'completed': '40.44% (1_491 / 3_687)', 'remaining time': '19:32:51', 'throughput': '8596.66', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:37:28,067 >> {'loss': 0.2816, 'grad_norm': 10.146015167236328, 'learning_rate': 7.245358220268599e-07, 'epoch': 0.1876864659614863, 'num_input_tokens_seen': 3128950784, 'completed': '40.47% (1_492 / 3_687)', 'remaining time': '19:32:19', 'throughput': '8122.96', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:38:00,913 >> {'loss': 0.617, 'grad_norm': 24.50543975830078, 'learning_rate': 7.24163664439091e-07, 'epoch': 0.18795768917819367, 'num_input_tokens_seen': 3131047936, 'completed': '40.49% (1_493 / 3_687)', 'remaining time': '19:31:50', 'throughput': '7980.93', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:38:31,865 >> {'loss': 0.3164, 'grad_norm': 13.438515663146973, 'learning_rate': 7.237913666910322e-07, 'epoch': 0.188228912394901, 'num_input_tokens_seen': 3133145088, 'completed': '40.52% (1_494 / 3_687)', 'remaining time': '19:31:14', 'throughput': '8469.38', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:39:02,499 >> {'loss': 0.5909, 'grad_norm': 17.03264045715332, 'learning_rate': 7.234189290822947e-07, 'epoch': 0.18850013561160836, 'num_input_tokens_seen': 3135242240, 'completed': '40.55% (1_495 / 3_687)', 'remaining time': '19:30:38', 'throughput': '8557.38', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:39:31,580 >> {'loss': 0.4531, 'grad_norm': 14.248628616333008, 'learning_rate': 7.230463519126024e-07, 'epoch': 0.1887713588283157, 'num_input_tokens_seen': 3137339392, 'completed': '40.57% (1_496 / 3_687)', 'remaining time': '19:29:56', 'throughput': '9014.17', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:40:02,625 >> {'loss': 0.433, 'grad_norm': 16.5543212890625, 'learning_rate': 7.226736354817908e-07, 'epoch': 0.18904258204502306, 'num_input_tokens_seen': 3139436544, 'completed': '40.60% (1_497 / 3_687)', 'remaining time': '19:29:21', 'throughput': '8444.01', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:40:27,129 >> {'loss': 1.4103, 'grad_norm': 35.90945053100586, 'learning_rate': 7.223007800898082e-07, 'epoch': 0.18931380526173042, 'num_input_tokens_seen': 3141533696, 'completed': '40.63% (1_498 / 3_687)', 'remaining time': '19:28:25', 'throughput': '10698.28', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:40:55,959 >> {'loss': 0.6804, 'grad_norm': 19.54884910583496, 'learning_rate': 7.219277860367143e-07, 'epoch': 0.18958502847843775, 'num_input_tokens_seen': 3143630848, 'completed': '40.66% (1_499 / 3_687)', 'remaining time': '19:27:43', 'throughput': '9092.66', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:41:26,008 >> {'loss': 0.7872, 'grad_norm': 24.173274993896484, 'learning_rate': 7.215546536226805e-07, 'epoch': 0.1898562516951451, 'num_input_tokens_seen': 3145728000, 'completed': '40.68% (1_500 / 3_687)', 'remaining time': '19:27:05', 'throughput': '8723.85', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:41:53,619 >> {'loss': 0.4702, 'grad_norm': 15.859006881713867, 'learning_rate': 7.211813831479896e-07, 'epoch': 0.19012747491185245, 'num_input_tokens_seen': 3147825152, 'completed': '40.71% (1_501 / 3_687)', 'remaining time': '19:26:20', 'throughput': '9494.32', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:42:25,635 >> {'loss': 0.4567, 'grad_norm': 14.973697662353516, 'learning_rate': 7.208079749130356e-07, 'epoch': 0.1903986981285598, 'num_input_tokens_seen': 3149922304, 'completed': '40.74% (1_502 / 3_687)', 'remaining time': '19:25:48', 'throughput': '8187.85', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:42:52,441 >> {'loss': 0.5384, 'grad_norm': 17.608720779418945, 'learning_rate': 7.204344292183228e-07, 'epoch': 0.19066992134526717, 'num_input_tokens_seen': 3152019456, 'completed': '40.76% (1_503 / 3_687)', 'remaining time': '19:24:59', 'throughput': '9779.26', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:43:21,512 >> {'loss': 0.3972, 'grad_norm': 17.97117042541504, 'learning_rate': 7.200607463644673e-07, 'epoch': 0.1909411445619745, 'num_input_tokens_seen': 3154116608, 'completed': '40.79% (1_504 / 3_687)', 'remaining time': '19:24:18', 'throughput': '9017.54', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:43:53,415 >> {'loss': 0.235, 'grad_norm': 17.989160537719727, 'learning_rate': 7.196869266521941e-07, 'epoch': 0.19121236777868186, 'num_input_tokens_seen': 3156213760, 'completed': '40.82% (1_505 / 3_687)', 'remaining time': '19:23:46', 'throughput': '8216.67', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:44:27,477 >> {'loss': 0.5892, 'grad_norm': 14.863112449645996, 'learning_rate': 7.193129703823395e-07, 'epoch': 0.1914835909953892, 'num_input_tokens_seen': 3158310912, 'completed': '40.85% (1_506 / 3_687)', 'remaining time': '19:23:20', 'throughput': '7696.21', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:44:56,793 >> {'loss': 0.4773, 'grad_norm': 16.317642211914062, 'learning_rate': 7.189388778558491e-07, 'epoch': 0.19175481421209656, 'num_input_tokens_seen': 3160408064, 'completed': '40.87% (1_507 / 3_687)', 'remaining time': '19:22:40', 'throughput': '8942.12', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:45:26,619 >> {'loss': 0.3363, 'grad_norm': 12.433012962341309, 'learning_rate': 7.185646493737785e-07, 'epoch': 0.1920260374288039, 'num_input_tokens_seen': 3162505216, 'completed': '40.90% (1_508 / 3_687)', 'remaining time': '19:22:01', 'throughput': '8789.01', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:45:58,310 >> {'loss': 0.3374, 'grad_norm': 12.628132820129395, 'learning_rate': 7.181902852372924e-07, 'epoch': 0.19229726064551125, 'num_input_tokens_seen': 3164602368, 'completed': '40.93% (1_509 / 3_687)', 'remaining time': '19:21:28', 'throughput': '8271.86', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:46:28,587 >> {'loss': 0.5477, 'grad_norm': 18.057632446289062, 'learning_rate': 7.17815785747665e-07, 'epoch': 0.19256848386221861, 'num_input_tokens_seen': 3166699520, 'completed': '40.95% (1_510 / 3_687)', 'remaining time': '19:20:51', 'throughput': '8658.22', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:46:57,561 >> {'loss': 0.4914, 'grad_norm': 17.424755096435547, 'learning_rate': 7.174411512062789e-07, 'epoch': 0.19283970707892595, 'num_input_tokens_seen': 3168796672, 'completed': '40.98% (1_511 / 3_687)', 'remaining time': '19:20:10', 'throughput': '9047.53', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:47:26,636 >> {'loss': 0.5325, 'grad_norm': 15.669259071350098, 'learning_rate': 7.170663819146259e-07, 'epoch': 0.1931109302956333, 'num_input_tokens_seen': 3170893824, 'completed': '41.01% (1_512 / 3_687)', 'remaining time': '19:19:29', 'throughput': '9016.16', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:47:55,258 >> {'loss': 0.6616, 'grad_norm': 19.925724029541016, 'learning_rate': 7.166914781743062e-07, 'epoch': 0.19338215351234064, 'num_input_tokens_seen': 3172990976, 'completed': '41.04% (1_513 / 3_687)', 'remaining time': '19:18:47', 'throughput': '9158.89', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:48:26,781 >> {'loss': 0.3527, 'grad_norm': 12.23149585723877, 'learning_rate': 7.16316440287028e-07, 'epoch': 0.193653376729048, 'num_input_tokens_seen': 3175088128, 'completed': '41.06% (1_514 / 3_687)', 'remaining time': '19:18:13', 'throughput': '8315.87', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:48:58,365 >> {'loss': 0.5477, 'grad_norm': 20.181726455688477, 'learning_rate': 7.159412685546073e-07, 'epoch': 0.19392459994575537, 'num_input_tokens_seen': 3177185280, 'completed': '41.09% (1_515 / 3_687)', 'remaining time': '19:17:40', 'throughput': '8300.03', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:49:28,774 >> {'loss': 0.7233, 'grad_norm': 19.928884506225586, 'learning_rate': 7.155659632789683e-07, 'epoch': 0.1941958231624627, 'num_input_tokens_seen': 3179282432, 'completed': '41.12% (1_516 / 3_687)', 'remaining time': '19:17:03', 'throughput': '8620.44', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:50:01,697 >> {'loss': 0.6027, 'grad_norm': 20.663057327270508, 'learning_rate': 7.151905247621422e-07, 'epoch': 0.19446704637917006, 'num_input_tokens_seen': 3181379584, 'completed': '41.14% (1_517 / 3_687)', 'remaining time': '19:16:34', 'throughput': '7962.39', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:50:29,017 >> {'loss': 0.4609, 'grad_norm': 15.95964527130127, 'learning_rate': 7.148149533062678e-07, 'epoch': 0.1947382695958774, 'num_input_tokens_seen': 3183476736, 'completed': '41.17% (1_518 / 3_687)', 'remaining time': '19:15:48', 'throughput': '9595.41', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:50:58,815 >> {'loss': 0.4759, 'grad_norm': 18.054109573364258, 'learning_rate': 7.144392492135908e-07, 'epoch': 0.19500949281258476, 'num_input_tokens_seen': 3185573888, 'completed': '41.20% (1_519 / 3_687)', 'remaining time': '19:15:10', 'throughput': '8797.27', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:51:27,532 >> {'loss': 0.7295, 'grad_norm': 36.16549301147461, 'learning_rate': 7.140634127864632e-07, 'epoch': 0.19528071602929212, 'num_input_tokens_seen': 3187671040, 'completed': '41.23% (1_520 / 3_687)', 'remaining time': '19:14:28', 'throughput': '9128.54', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:51:57,950 >> {'loss': 0.4694, 'grad_norm': 16.543094635009766, 'learning_rate': 7.136874443273442e-07, 'epoch': 0.19555193924599945, 'num_input_tokens_seen': 3189768192, 'completed': '41.25% (1_521 / 3_687)', 'remaining time': '19:13:51', 'throughput': '8617.99', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:52:25,510 >> {'loss': 0.6975, 'grad_norm': 18.826963424682617, 'learning_rate': 7.133113441387988e-07, 'epoch': 0.1958231624627068, 'num_input_tokens_seen': 3191865344, 'completed': '41.28% (1_522 / 3_687)', 'remaining time': '19:13:06', 'throughput': '9511.91', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:52:57,351 >> {'loss': 0.2668, 'grad_norm': 11.468395233154297, 'learning_rate': 7.129351125234979e-07, 'epoch': 0.19609438567941415, 'num_input_tokens_seen': 3193962496, 'completed': '41.31% (1_523 / 3_687)', 'remaining time': '19:12:34', 'throughput': '8232.74', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:53:25,071 >> {'loss': 0.6557, 'grad_norm': 18.200729370117188, 'learning_rate': 7.125587497842189e-07, 'epoch': 0.1963656088961215, 'num_input_tokens_seen': 3196059648, 'completed': '41.33% (1_524 / 3_687)', 'remaining time': '19:11:49', 'throughput': '9456.96', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:53:56,504 >> {'loss': 0.2728, 'grad_norm': 11.098851203918457, 'learning_rate': 7.121822562238436e-07, 'epoch': 0.19663683211282887, 'num_input_tokens_seen': 3198156800, 'completed': '41.36% (1_525 / 3_687)', 'remaining time': '19:11:16', 'throughput': '8339.77', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:54:26,806 >> {'loss': 0.5025, 'grad_norm': 15.489395141601562, 'learning_rate': 7.118056321453601e-07, 'epoch': 0.1969080553295362, 'num_input_tokens_seen': 3200253952, 'completed': '41.39% (1_526 / 3_687)', 'remaining time': '19:10:39', 'throughput': '8651.07', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:54:57,282 >> {'loss': 0.4449, 'grad_norm': 14.967645645141602, 'learning_rate': 7.11428877851861e-07, 'epoch': 0.19717927854624356, 'num_input_tokens_seen': 3202351104, 'completed': '41.42% (1_527 / 3_687)', 'remaining time': '19:10:03', 'throughput': '8601.61', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:55:27,496 >> {'loss': 0.4, 'grad_norm': 15.567303657531738, 'learning_rate': 7.110519936465438e-07, 'epoch': 0.1974505017629509, 'num_input_tokens_seen': 3204448256, 'completed': '41.44% (1_528 / 3_687)', 'remaining time': '19:09:25', 'throughput': '8676.25', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:55:59,252 >> {'loss': 0.3363, 'grad_norm': 16.95789337158203, 'learning_rate': 7.106749798327106e-07, 'epoch': 0.19772172497965826, 'num_input_tokens_seen': 3206545408, 'completed': '41.47% (1_529 / 3_687)', 'remaining time': '19:08:53', 'throughput': '8255.09', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:56:30,653 >> {'loss': 0.4492, 'grad_norm': 16.923965454101562, 'learning_rate': 7.102978367137679e-07, 'epoch': 0.19799294819636562, 'num_input_tokens_seen': 3208642560, 'completed': '41.50% (1_530 / 3_687)', 'remaining time': '19:08:19', 'throughput': '8348.03', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:56:59,754 >> {'loss': 0.4185, 'grad_norm': 16.046384811401367, 'learning_rate': 7.099205645932258e-07, 'epoch': 0.19826417141307295, 'num_input_tokens_seen': 3210739712, 'completed': '41.52% (1_531 / 3_687)', 'remaining time': '19:07:39', 'throughput': '9008.24', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:57:28,494 >> {'loss': 0.6639, 'grad_norm': 20.16338348388672, 'learning_rate': 7.095431637746988e-07, 'epoch': 0.19853539462978032, 'num_input_tokens_seen': 3212836864, 'completed': '41.55% (1_532 / 3_687)', 'remaining time': '19:06:58', 'throughput': '9121.17', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:58:00,326 >> {'loss': 0.3112, 'grad_norm': 12.801714897155762, 'learning_rate': 7.091656345619047e-07, 'epoch': 0.19880661784648765, 'num_input_tokens_seen': 3214934016, 'completed': '41.58% (1_533 / 3_687)', 'remaining time': '19:06:26', 'throughput': '8235.20', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:58:31,568 >> {'loss': 0.4997, 'grad_norm': 14.88927936553955, 'learning_rate': 7.087879772586647e-07, 'epoch': 0.199077841063195, 'num_input_tokens_seen': 3217031168, 'completed': '41.61% (1_534 / 3_687)', 'remaining time': '19:05:52', 'throughput': '8390.82', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 21:59:04,076 >> {'loss': 0.5001, 'grad_norm': 20.41654396057129, 'learning_rate': 7.084101921689029e-07, 'epoch':
0.19934906427990237, 'num_input_tokens_seen': 3219128320, 'completed': '41.63% (1_535 / 3_687)', 'remaining time': '19:05:21', 'throughput': '8064.03', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 21:59:34,319 >> {'loss': 0.5536, 'grad_norm': 18.13739013671875, 'learning_rate': 7.080322795966462e-07, 'epoch': 0.1996202874966097, 'num_input_tokens_seen': 3221225472, 'completed': '41.66% (1_536 / 3_687)', 'remaining time': '19:04:44', 'throughput': '8667.84', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:00:05,399 >> {'loss': 0.4459, 'grad_norm': 15.031099319458008, 'learning_rate': 7.076542398460247e-07, 'epoch': 0.19989151071331707, 'num_input_tokens_seen': 3223322624, 'completed': '41.69% (1_537 / 3_687)', 'remaining time': '19:04:10', 'throughput': '8434.48', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:00:32,677 >> {'loss': 0.4325, 'grad_norm': 16.9238224029541, 'learning_rate': 7.0727607322127e-07, 'epoch': 0.2001627339300244, 'num_input_tokens_seen': 3225419776, 'completed': '41.71% (1_538 / 3_687)', 'remaining time': '19:03:25', 'throughput': '9610.16', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:01:03,316 >> {'loss': 0.5089, 'grad_norm': 17.309598922729492, 'learning_rate': 7.068977800267164e-07, 'epoch': 0.20043395714673176, 'num_input_tokens_seen': 3227516928, 'completed': '41.74% (1_539 / 3_687)', 'remaining time': '19:02:49', 'throughput': '8555.80', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:01:34,443 >> {'loss': 0.3359, 'grad_norm': 14.402442932128906, 'learning_rate': 7.065193605667999e-07, 'epoch': 0.20070518036343912, 'num_input_tokens_seen': 3229614080, 'completed': '41.77% (1_540 / 3_687)', 'remaining time': '19:02:15', 'throughput': '8421.88', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:02:07,606 >> {'loss': 0.3825, 'grad_norm': 12.946207046508789, 'learning_rate': 7.06140815146058e-07, 'epoch': 0.20097640358014646, 'num_input_tokens_seen': 
3231711232, 'completed': '41.80% (1_541 / 3_687)', 'remaining time': '19:01:46', 'throughput': '7904.67', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:02:36,541 >> {'loss': 0.6832, 'grad_norm': 21.32681655883789, 'learning_rate': 7.057621440691296e-07, 'epoch': 0.20124762679685382, 'num_input_tokens_seen': 3233808384, 'completed': '41.82% (1_542 / 3_687)', 'remaining time': '19:01:06', 'throughput': '9059.65', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:03:08,328 >> {'loss': 0.5939, 'grad_norm': 18.230838775634766, 'learning_rate': 7.053833476407549e-07, 'epoch': 0.20151885001356115, 'num_input_tokens_seen': 3235905536, 'completed': '41.85% (1_543 / 3_687)', 'remaining time': '19:00:33', 'throughput': '8246.88', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:03:38,896 >> {'loss': 0.3152, 'grad_norm': 14.465167045593262, 'learning_rate': 7.050044261657748e-07, 'epoch': 0.2017900732302685, 'num_input_tokens_seen': 3238002688, 'completed': '41.88% (1_544 / 3_687)', 'remaining time': '18:59:58', 'throughput': '8575.78', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:04:08,887 >> {'loss': 0.2369, 'grad_norm': 11.428820610046387, 'learning_rate': 7.04625379949131e-07, 'epoch': 0.20206129644697587, 'num_input_tokens_seen': 3240099840, 'completed': '41.90% (1_545 / 3_687)', 'remaining time': '18:59:20', 'throughput': '8740.81', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:04:38,843 >> {'loss': 0.4465, 'grad_norm': 19.767881393432617, 'learning_rate': 7.042462092958651e-07, 'epoch': 0.2023325196636832, 'num_input_tokens_seen': 3242196992, 'completed': '41.93% (1_546 / 3_687)', 'remaining time': '18:58:43', 'throughput': '8751.18', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:05:08,612 >> {'loss': 0.4017, 'grad_norm': 13.485447883605957, 'learning_rate': 7.038669145111195e-07, 'epoch': 0.20260374288039057, 'num_input_tokens_seen': 3244294144, 'completed': '41.96% (1_547 / 
3_687)', 'remaining time': '18:58:05', 'throughput': '8805.77', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:05:35,918 >> {'loss': 0.4824, 'grad_norm': 15.75723648071289, 'learning_rate': 7.034874959001363e-07, 'epoch': 0.2028749660970979, 'num_input_tokens_seen': 3246391296, 'completed': '41.99% (1_548 / 3_687)', 'remaining time': '18:57:20', 'throughput': '9600.22', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:06:07,100 >> {'loss': 0.3523, 'grad_norm': 13.907236099243164, 'learning_rate': 7.031079537682569e-07, 'epoch': 0.20314618931380526, 'num_input_tokens_seen': 3248488448, 'completed': '42.01% (1_549 / 3_687)', 'remaining time': '18:56:46', 'throughput': '8407.04', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:06:36,035 >> {'loss': 0.6136, 'grad_norm': 24.97413444519043, 'learning_rate': 7.027282884209227e-07, 'epoch': 0.20341741253051263, 'num_input_tokens_seen': 3250585600, 'completed': '42.04% (1_550 / 3_687)', 'remaining time': '18:56:05', 'throughput': '9059.65', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:07:08,197 >> {'loss': 0.3508, 'grad_norm': 15.245758056640625, 'learning_rate': 7.023485001636737e-07, 'epoch': 0.20368863574721996, 'num_input_tokens_seen': 3252682752, 'completed': '42.07% (1_551 / 3_687)', 'remaining time': '18:55:34', 'throughput': '8150.70', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:07:39,910 >> {'loss': 0.4695, 'grad_norm': 16.83147430419922, 'learning_rate': 7.019685893021488e-07, 'epoch': 0.20395985896392732, 'num_input_tokens_seen': 3254779904, 'completed': '42.09% (1_552 / 3_687)', 'remaining time': '18:55:02', 'throughput': '8266.03', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:08:09,734 >> {'loss': 0.2958, 'grad_norm': 17.15081787109375, 'learning_rate': 7.015885561420863e-07, 'epoch': 0.20423108218063465, 'num_input_tokens_seen': 3256877056, 'completed': '42.12% (1_553 / 3_687)', 'remaining time': '18:54:24', 
'throughput': '8789.85', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:08:41,267 >> {'loss': 0.2439, 'grad_norm': 11.432106018066406, 'learning_rate': 7.012084009893221e-07, 'epoch': 0.20450230539734202, 'num_input_tokens_seen': 3258974208, 'completed': '42.15% (1_554 / 3_687)', 'remaining time': '18:53:51', 'throughput': '8313.40', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:09:10,762 >> {'loss': 0.604, 'grad_norm': 18.06707000732422, 'learning_rate': 7.008281241497908e-07, 'epoch': 0.20477352861404935, 'num_input_tokens_seen': 3261071360, 'completed': '42.18% (1_555 / 3_687)', 'remaining time': '18:53:12', 'throughput': '8887.98', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:09:41,068 >> {'loss': 0.4381, 'grad_norm': 17.187938690185547, 'learning_rate': 7.004477259295244e-07, 'epoch': 0.2050447518307567, 'num_input_tokens_seen': 3263168512, 'completed': '42.20% (1_556 / 3_687)', 'remaining time': '18:52:36', 'throughput': '8649.67', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:10:10,778 >> {'loss': 0.1795, 'grad_norm': 8.261337280273438, 'learning_rate': 7.000672066346532e-07, 'epoch': 0.20531597504746407, 'num_input_tokens_seen': 3265265664, 'completed': '42.23% (1_557 / 3_687)', 'remaining time': '18:51:58', 'throughput': '8823.30', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:10:41,821 >> {'loss': 0.319, 'grad_norm': 14.19255256652832, 'learning_rate': 6.996865665714046e-07, 'epoch': 0.2055871982641714, 'num_input_tokens_seen': 3267362816, 'completed': '42.26% (1_558 / 3_687)', 'remaining time': '18:51:24', 'throughput': '8445.35', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:11:12,730 >> {'loss': 0.5702, 'grad_norm': 21.178258895874023, 'learning_rate': 6.993058060461035e-07, 'epoch': 0.20585842148087877, 'num_input_tokens_seen': 3269459968, 'completed': '42.28% (1_559 / 3_687)', 'remaining time': '18:50:49', 'throughput': '8481.93', 'gpu_mem_free': '5581MB'} 
[INFO|trainer.py:175] 2024-12-25 22:11:43,260 >> {'loss': 0.5158, 'grad_norm': 17.968137741088867, 'learning_rate': 6.989249253651708e-07, 'epoch': 0.2061296446975861, 'num_input_tokens_seen': 3271557120, 'completed': '42.31% (1_560 / 3_687)', 'remaining time': '18:50:13', 'throughput': '8584.74', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:12:17,525 >> {'loss': 0.2995, 'grad_norm': 12.363228797912598, 'learning_rate': 6.985439248351254e-07, 'epoch': 0.20640086791429346, 'num_input_tokens_seen': 3273654272, 'completed': '42.34% (1_561 / 3_687)', 'remaining time': '18:49:48', 'throughput': '7650.63', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:12:48,748 >> {'loss': 0.9089, 'grad_norm': 28.861257553100586, 'learning_rate': 6.981628047625818e-07, 'epoch': 0.20667209113100082, 'num_input_tokens_seen': 3275751424, 'completed': '42.37% (1_562 / 3_687)', 'remaining time': '18:49:14', 'throughput': '8395.88', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:13:19,864 >> {'loss': 0.5814, 'grad_norm': 19.901887893676758, 'learning_rate': 6.977815654542508e-07, 'epoch': 0.20694331434770816, 'num_input_tokens_seen': 3277848576, 'completed': '42.39% (1_563 / 3_687)', 'remaining time': '18:48:40', 'throughput': '8424.58', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:13:48,857 >> {'loss': 0.2016, 'grad_norm': 15.665919303894043, 'learning_rate': 6.974002072169395e-07, 'epoch': 0.20721453756441552, 'num_input_tokens_seen': 3279945728, 'completed': '42.42% (1_564 / 3_687)', 'remaining time': '18:48:00', 'throughput': '9041.81', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:14:22,556 >> {'loss': 0.3062, 'grad_norm': 19.99432945251465, 'learning_rate': 6.970187303575505e-07, 'epoch': 0.20748576078112285, 'num_input_tokens_seen': 3282042880, 'completed': '42.45% (1_565 / 3_687)', 'remaining time': '18:47:34', 'throughput': '7779.13', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:14:50,244 
>> {'loss': 0.8428, 'grad_norm': 22.537017822265625, 'learning_rate': 6.966371351830818e-07, 'epoch': 0.2077569839978302, 'num_input_tokens_seen': 3284140032, 'completed': '42.47% (1_566 / 3_687)', 'remaining time': '18:46:50', 'throughput': '9467.61', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:15:22,905 >> {'loss': 0.3116, 'grad_norm': 13.440853118896484, 'learning_rate': 6.962554220006265e-07, 'epoch': 0.20802820721453757, 'num_input_tokens_seen': 3286237184, 'completed': '42.50% (1_567 / 3_687)', 'remaining time': '18:46:20', 'throughput': '8026.16', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:15:53,636 >> {'loss': 0.4781, 'grad_norm': 17.965656280517578, 'learning_rate': 6.958735911173729e-07, 'epoch': 0.2082994304312449, 'num_input_tokens_seen': 3288334336, 'completed': '42.53% (1_568 / 3_687)', 'remaining time': '18:45:45', 'throughput': '8530.34', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:16:23,573 >> {'loss': 0.6427, 'grad_norm': 19.18067741394043, 'learning_rate': 6.954916428406045e-07, 'epoch': 0.20857065364795227, 'num_input_tokens_seen': 3290431488, 'completed': '42.55% (1_569 / 3_687)', 'remaining time': '18:45:08', 'throughput': '8756.27', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:16:54,263 >> {'loss': 0.5277, 'grad_norm': 19.246889114379883, 'learning_rate': 6.95109577477698e-07, 'epoch': 0.2088418768646596, 'num_input_tokens_seen': 3292528640, 'completed': '42.58% (1_570 / 3_687)', 'remaining time': '18:44:33', 'throughput': '8541.75', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:17:22,873 >> {'loss': 0.5097, 'grad_norm': 16.653470993041992, 'learning_rate': 6.947273953361255e-07, 'epoch': 0.20911310008136696, 'num_input_tokens_seen': 3294625792, 'completed': '42.61% (1_571 / 3_687)', 'remaining time': '18:43:52', 'throughput': '9162.75', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:17:55,698 >> {'loss': 0.3398, 'grad_norm': 
16.175172805786133, 'learning_rate': 6.943450967234524e-07, 'epoch': 0.20938432329807433, 'num_input_tokens_seen': 3296722944, 'completed': '42.64% (1_572 / 3_687)', 'remaining time': '18:43:23', 'throughput': '7986.01', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:18:24,755 >> {'loss': 0.3364, 'grad_norm': 15.780593872070312, 'learning_rate': 6.939626819473384e-07, 'epoch': 0.20965554651478166, 'num_input_tokens_seen': 3298820096, 'completed': '42.66% (1_573 / 3_687)', 'remaining time': '18:42:43', 'throughput': '9021.94', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:18:51,694 >> {'loss': 0.7428, 'grad_norm': 22.667606353759766, 'learning_rate': 6.93580151315536e-07, 'epoch': 0.20992676973148902, 'num_input_tokens_seen': 3300917248, 'completed': '42.69% (1_574 / 3_687)', 'remaining time': '18:41:58', 'throughput': '9730.92', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:19:20,552 >> {'loss': 0.3328, 'grad_norm': 15.267520904541016, 'learning_rate': 6.931975051358914e-07, 'epoch': 0.21019799294819635, 'num_input_tokens_seen': 3303014400, 'completed': '42.72% (1_575 / 3_687)', 'remaining time': '18:41:18', 'throughput': '9083.76', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:19:48,862 >> {'loss': 0.5853, 'grad_norm': 19.079408645629883, 'learning_rate': 6.928147437163439e-07, 'epoch': 0.21046921616490372, 'num_input_tokens_seen': 3305111552, 'completed': '42.74% (1_576 / 3_687)', 'remaining time': '18:40:37', 'throughput': '9259.75', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:20:16,599 >> {'loss': 0.3437, 'grad_norm': 13.351947784423828, 'learning_rate': 6.92431867364925e-07, 'epoch': 0.21074043938161108, 'num_input_tokens_seen': 3307208704, 'completed': '42.77% (1_577 / 3_687)', 'remaining time': '18:39:54', 'throughput': '9451.13', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:20:44,418 >> {'loss': 0.8318, 'grad_norm': 24.631118774414062, 'learning_rate': 
6.920488763897593e-07, 'epoch': 0.2110116625983184, 'num_input_tokens_seen': 3309305856, 'completed': '42.80% (1_578 / 3_687)', 'remaining time': '18:39:11', 'throughput': '9423.30', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:21:16,556 >> {'loss': 0.5461, 'grad_norm': 17.947328567504883, 'learning_rate': 6.916657710990632e-07, 'epoch': 0.21128288581502577, 'num_input_tokens_seen': 3311403008, 'completed': '42.83% (1_579 / 3_687)', 'remaining time': '18:38:40', 'throughput': '8156.81', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:21:45,659 >> {'loss': 0.6746, 'grad_norm': 21.530193328857422, 'learning_rate': 6.912825518011452e-07, 'epoch': 0.2115541090317331, 'num_input_tokens_seen': 3313500160, 'completed': '42.85% (1_580 / 3_687)', 'remaining time': '18:38:01', 'throughput': '9007.45', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:22:15,560 >> {'loss': 0.3837, 'grad_norm': 16.4180850982666, 'learning_rate': 6.90899218804406e-07, 'epoch': 0.21182533224844047, 'num_input_tokens_seen': 3315597312, 'completed': '42.88% (1_581 / 3_687)', 'remaining time': '18:37:23', 'throughput': '8767.06', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:22:48,229 >> {'loss': 0.3731, 'grad_norm': 18.914493560791016, 'learning_rate': 6.905157724173369e-07, 'epoch': 0.21209655546514783, 'num_input_tokens_seen': 3317694464, 'completed': '42.91% (1_582 / 3_687)', 'remaining time': '18:36:54', 'throughput': '8024.23', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:23:18,118 >> {'loss': 0.4683, 'grad_norm': 15.548588752746582, 'learning_rate': 6.901322129485212e-07, 'epoch': 0.21236777868185516, 'num_input_tokens_seen': 3319791616, 'completed': '42.93% (1_583 / 3_687)', 'remaining time': '18:36:17', 'throughput': '8770.66', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:23:48,827 >> {'loss': 0.3252, 'grad_norm': 14.097211837768555, 'learning_rate': 6.897485407066329e-07, 'epoch': 
0.21263900189856252, 'num_input_tokens_seen': 3321888768, 'completed': '42.96% (1_584 / 3_687)', 'remaining time': '18:35:42', 'throughput': '8536.37', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:24:21,721 >> {'loss': 0.5362, 'grad_norm': 15.844295501708984, 'learning_rate': 6.893647560004369e-07, 'epoch': 0.21291022511526986, 'num_input_tokens_seen': 3323985920, 'completed': '42.99% (1_585 / 3_687)', 'remaining time': '18:35:13', 'throughput': '7969.33', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:24:54,665 >> {'loss': 0.4111, 'grad_norm': 18.061901092529297, 'learning_rate': 6.889808591387885e-07, 'epoch': 0.21318144833197722, 'num_input_tokens_seen': 3326083072, 'completed': '43.02% (1_586 / 3_687)', 'remaining time': '18:34:44', 'throughput': '7957.20', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:25:25,384 >> {'loss': 0.408, 'grad_norm': 15.680930137634277, 'learning_rate': 6.885968504306334e-07, 'epoch': 0.21345267154868458, 'num_input_tokens_seen': 3328180224, 'completed': '43.04% (1_587 / 3_687)', 'remaining time': '18:34:09', 'throughput': '8533.81', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:25:55,408 >> {'loss': 0.684, 'grad_norm': 22.98466682434082, 'learning_rate': 6.882127301850069e-07, 'epoch': 0.21372389476539191, 'num_input_tokens_seen': 3330277376, 'completed': '43.07% (1_588 / 3_687)', 'remaining time': '18:33:33', 'throughput': '8731.01', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:26:25,616 >> {'loss': 0.576, 'grad_norm': 20.19873046875, 'learning_rate': 6.878284987110345e-07, 'epoch': 0.21399511798209928, 'num_input_tokens_seen': 3332374528, 'completed': '43.10% (1_589 / 3_687)', 'remaining time': '18:32:56', 'throughput': '8678.06', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:26:55,606 >> {'loss': 0.3008, 'grad_norm': 12.961610794067383, 'learning_rate': 6.874441563179313e-07, 'epoch': 0.2142663411988066, 'num_input_tokens_seen': 
3334471680, 'completed': '43.12% (1_590 / 3_687)', 'remaining time': '18:32:20', 'throughput': '8741.01', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:27:25,466 >> {'loss': 0.6572, 'grad_norm': 22.00704574584961, 'learning_rate': 6.870597033150012e-07, 'epoch': 0.21453756441551397, 'num_input_tokens_seen': 3336568832, 'completed': '43.15% (1_591 / 3_687)', 'remaining time': '18:31:43', 'throughput': '8779.01', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:27:59,957 >> {'loss': 0.4674, 'grad_norm': 18.104747772216797, 'learning_rate': 6.866751400116374e-07, 'epoch': 0.21480878763222133, 'num_input_tokens_seen': 3338665984, 'completed': '43.18% (1_592 / 3_687)', 'remaining time': '18:31:18', 'throughput': '7600.47', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:28:29,972 >> {'loss': 0.7152, 'grad_norm': 24.767946243286133, 'learning_rate': 6.862904667173216e-07, 'epoch': 0.21508001084892867, 'num_input_tokens_seen': 3340763136, 'completed': '43.21% (1_593 / 3_687)', 'remaining time': '18:30:41', 'throughput': '8733.58', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:29:01,482 >> {'loss': 0.8405, 'grad_norm': 23.480615615844727, 'learning_rate': 6.859056837416245e-07, 'epoch': 0.21535123406563603, 'num_input_tokens_seen': 3342860288, 'completed': '43.23% (1_594 / 3_687)', 'remaining time': '18:30:09', 'throughput': '8319.48', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:29:30,663 >> {'loss': 0.3923, 'grad_norm': 13.278249740600586, 'learning_rate': 6.855207913942048e-07, 'epoch': 0.21562245728234336, 'num_input_tokens_seen': 3344957440, 'completed': '43.26% (1_595 / 3_687)', 'remaining time': '18:29:30', 'throughput': '8983.22', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:30:01,592 >> {'loss': 0.3845, 'grad_norm': 14.385771751403809, 'learning_rate': 6.85135789984809e-07, 'epoch': 0.21589368049905072, 'num_input_tokens_seen': 3347054592, 'completed': '43.29% (1_596 / 
3_687)', 'remaining time': '18:28:56', 'throughput': '8475.79', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:30:30,753 >> {'loss': 0.5702, 'grad_norm': 15.939520835876465, 'learning_rate': 6.847506798232719e-07, 'epoch': 0.21616490371575806, 'num_input_tokens_seen': 3349151744, 'completed': '43.31% (1_597 / 3_687)', 'remaining time': '18:28:17', 'throughput': '8989.46', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:31:02,708 >> {'loss': 0.4505, 'grad_norm': 15.751465797424316, 'learning_rate': 6.843654612195152e-07, 'epoch': 0.21643612693246542, 'num_input_tokens_seen': 3351248896, 'completed': '43.34% (1_598 / 3_687)', 'remaining time': '18:27:45', 'throughput': '8203.63', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:31:32,590 >> {'loss': 0.5949, 'grad_norm': 16.361385345458984, 'learning_rate': 6.839801344835484e-07, 'epoch': 0.21670735014917278, 'num_input_tokens_seen': 3353346048, 'completed': '43.37% (1_599 / 3_687)', 'remaining time': '18:27:09', 'throughput': '8772.67', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:32:04,576 >> {'loss': 0.6065, 'grad_norm': 15.845523834228516, 'learning_rate': 6.835946999254677e-07, 'epoch': 0.2169785733658801, 'num_input_tokens_seen': 3355443200, 'completed': '43.40% (1_600 / 3_687)', 'remaining time': '18:26:37', 'throughput': '8195.59', 'gpu_mem_free': '5581MB'} /scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:689: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
warnings.warn(
[INFO|trainer.py:3503] 2024-12-25 22:32:28,633 >> Saving model checkpoint to /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-1600
[INFO|configuration_utils.py:472] 2024-12-25 22:32:28,636 >> Configuration saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-1600/config.json
[INFO|configuration_utils.py:807] 2024-12-25 22:32:28,637 >> Configuration saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-1600/generation_config.json
[INFO|modeling_utils.py:2807] 2024-12-25 22:33:30,637 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 7 checkpoint shards. You can find where each parameters has been saved in the index located at /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-1600/model.safetensors.index.json.
[INFO|tokenization_utils_base.py:2684] 2024-12-25 22:33:30,641 >> tokenizer config file saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-1600/tokenizer_config.json
[INFO|tokenization_utils_base.py:2693] 2024-12-25 22:33:30,642 >> Special tokens file saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-1600/special_tokens_map.json
[WARNING|trainer.py:868] 2024-12-25 22:37:29,295 >> Save streaming dataset state: {'epoch': 0, 'sample_in_epoch': 3200, 'num_canonical_nodes': 1, 'shuffle_seed': 42, 'initial_physical_nodes': 1}
/scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/torch/utils/checkpoint.py:1399: FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead.
with device_autocast_ctx, torch.cpu.amp.autocast(**cpu_autocast_kwargs), recompute_context: # type: ignore[attr-defined]
[INFO|trainer.py:175] 2024-12-25 22:37:59,455 >> {'loss': 0.4201, 'grad_norm': 13.730551719665527, 'learning_rate': 6.83209157855456e-07, 'epoch': 0.21724979658258747, 'num_input_tokens_seen': 3357540352, 'completed': '43.42% (1_601 / 3_687)', 'remaining time': '18:40:07', 'throughput': '738.69', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 22:38:31,391 >> {'loss': 0.4174, 'grad_norm': 17.727937698364258, 'learning_rate': 6.828235085837831e-07, 'epoch': 0.2175210197992948, 'num_input_tokens_seen': 3359637504, 'completed': '43.45% (1_602 / 3_687)', 'remaining time': '18:39:34', 'throughput': '8208.25', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 22:39:01,721 >> {'loss': 0.5056, 'grad_norm': 16.495952606201172, 'learning_rate': 6.824377524208047e-07, 'epoch': 0.21779224301600217, 'num_input_tokens_seen': 3361734656, 'completed': '43.48% (1_603 / 3_687)', 'remaining time': '18:38:57', 'throughput': '8643.24', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 22:39:32,963 >> {'loss': 0.4776, 'grad_norm': 16.65219497680664, 'learning_rate': 6.820518896769629e-07, 'epoch': 0.21806346623270953, 'num_input_tokens_seen': 3363831808, 'completed': '43.50% (1_604 / 3_687)', 'remaining time': '18:38:22', 'throughput': '8390.70', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 22:40:05,132 >> {'loss': 0.2099, 'grad_norm': 9.43620777130127, 'learning_rate': 6.816659206627853e-07, 'epoch': 0.21833468944941686, 'num_input_tokens_seen': 3365928960, 'completed': '43.53% (1_605 / 3_687)', 'remaining time': '18:37:50', 'throughput': '8148.87', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 22:40:37,364 >> {'loss': 0.4603, 'grad_norm': 16.026655197143555, 'learning_rate': 6.812798456888849e-07, 'epoch': 0.21860591266612422, 'num_input_tokens_seen': 3368026112, 'completed': '43.56% (1_606 / 3_687)', 'remaining time': '18:37:17', 'throughput': '8133.13', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 22:41:06,039 >> {'loss': 0.658, 'grad_norm': 18.902629852294922, 'learning_rate': 6.808936650659605e-07, 'epoch': 0.21887713588283156, 'num_input_tokens_seen': 3370123264, 'completed': '43.59% (1_607 / 3_687)', 'remaining time': '18:36:36', 'throughput': '9141.96', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 22:41:35,181 >> {'loss': 0.7047, 'grad_norm': 18.937278747558594, 'learning_rate': 6.805073791047951e-07, 'epoch': 0.21914835909953892, 'num_input_tokens_seen': 3372220416, 'completed': '43.61% (1_608 / 3_687)', 'remaining time': '18:35:56', 'throughput': '8995.28', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 22:42:05,128 >> {'loss': 0.303, 'grad_norm': 19.952560424804688, 'learning_rate': 6.80120988116257e-07, 'epoch': 0.21941958231624628, 'num_input_tokens_seen': 3374317568, 'completed': '43.64% (1_609 / 3_687)', 'remaining time': '18:35:18', 'throughput': '8753.63', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 22:42:37,905 >> {'loss': 0.3612, 'grad_norm': 16.27387046813965, 'learning_rate': 6.797344924112995e-07, 'epoch': 0.21969080553295361, 'num_input_tokens_seen': 3376414720, 'completed': '43.67% (1_610 / 3_687)', 'remaining time': '18:34:47', 'throughput': '7997.89', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 22:43:06,692 >> {'loss': 0.5804, 'grad_norm': 18.694629669189453, 'learning_rate': 6.793478923009592e-07, 'epoch': 0.21996202874966098, 'num_input_tokens_seen': 3378511872, 'completed': '43.69% (1_611 / 3_687)', 'remaining time': '18:34:06', 'throughput': '9106.20', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 22:43:37,436 >> {'loss': 0.2376, 'grad_norm': 11.021830558776855, 'learning_rate': 6.789611880963569e-07, 'epoch': 0.2202332519663683, 'num_input_tokens_seen': 3380609024, 'completed': '43.72% (1_612 / 3_687)', 'remaining time': '18:33:30', 'throughput': '8526.67', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 22:44:06,944 >> {'loss': 0.4313, 'grad_norm': 15.804499626159668, 'learning_rate': 6.78574380108698e-07, 'epoch': 0.22050447518307567, 'num_input_tokens_seen': 3382706176, 'completed': '43.75% (1_613 / 3_687)', 'remaining time': '18:32:51', 'throughput': '8883.99', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 22:44:35,342 >> {'loss': 0.4538, 'grad_norm': 15.294432640075684, 'learning_rate': 6.781874686492706e-07, 'epoch': 0.22077569839978303, 'num_input_tokens_seen': 3384803328, 'completed': '43.78% (1_614 / 3_687)', 'remaining time': '18:32:09', 'throughput': '9231.01', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 22:45:05,925 >> {'loss': 0.3465, 'grad_norm': 14.991973876953125, 'learning_rate': 6.778004540294464e-07, 'epoch': 0.22104692161649037, 'num_input_tokens_seen': 3386900480, 'completed': '43.80% (1_615 / 3_687)', 'remaining time': '18:31:33', 'throughput': '8571.49', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 22:45:32,432 >> {'loss': 0.7659, 'grad_norm': 22.537893295288086, 'learning_rate': 6.774133365606801e-07, 'epoch': 0.22131814483319773, 'num_input_tokens_seen': 3388997632, 'completed': '43.83% (1_616 / 3_687)', 'remaining time': '18:30:47', 'throughput': '9889.73', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 22:46:00,749 >> {'loss': 0.797, 'grad_norm': 21.337339401245117, 'learning_rate': 6.770261165545085e-07, 'epoch': 0.22158936804990506, 'num_input_tokens_seen': 3391094784, 'completed': '43.86% (1_617 / 3_687)', 'remaining time': '18:30:05', 'throughput': '9257.57', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 22:46:30,052 >> {'loss': 0.7643, 'grad_norm': 20.40045166015625, 'learning_rate': 6.766387943225524e-07, 'epoch': 0.22186059126661242, 'num_input_tokens_seen': 3393191936, 'completed': '43.88% (1_618 / 3_687)', 'remaining time': '18:29:25', 'throughput': '8945.75', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 22:46:57,876 >> {'loss': 0.409, 'grad_norm': 14.96789836883545, 'learning_rate': 6.762513701765135e-07, 'epoch': 0.22213181448331978, 'num_input_tokens_seen': 3395289088, 'completed': '43.91% (1_619 / 3_687)', 'remaining time': '18:28:42', 'throughput': '9421.67', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 22:47:30,523 >> {'loss': 0.4159, 'grad_norm': 14.952929496765137, 'learning_rate': 6.75863844428176e-07, 'epoch': 0.22240303770002712, 'num_input_tokens_seen': 3397386240, 'completed': '43.94% (1_620 / 3_687)', 'remaining time': '18:28:11', 'throughput': '8029.61', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 22:48:02,618 >> {'loss': 0.495, 'grad_norm': 16.563785552978516, 'learning_rate': 6.754762173894061e-07, 'epoch': 0.22267426091673448, 'num_input_tokens_seen': 3399483392, 'completed': '43.97% (1_621 / 3_687)', 'remaining time': '18:27:39', 'throughput': '8167.83', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 22:48:35,222 >> {'loss': 0.2923, 'grad_norm': 13.857934951782227, 'learning_rate': 6.750884893721511e-07, 'epoch': 0.2229454841334418, 'num_input_tokens_seen': 3401580544, 'completed': '43.99% (1_622 / 3_687)', 'remaining time': '18:27:08', 'throughput': '8040.21', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 22:49:07,590 >> {'loss': 0.4906, 'grad_norm': 16.87193489074707, 'learning_rate': 6.747006606884398e-07, 'epoch': 0.22321670735014917, 'num_input_tokens_seen': 3403677696, 'completed': '44.02% (1_623 / 3_687)', 'remaining time': '18:26:36', 'throughput': '8098.72', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 22:49:39,218 >> {'loss': 0.5691, 'grad_norm': 28.209985733032227, 'learning_rate': 6.74312731650382e-07, 'epoch': 0.22348793056685654, 'num_input_tokens_seen': 3405774848, 'completed': '44.05% (1_624 / 3_687)', 'remaining time': '18:26:02', 'throughput': '8288.34', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 22:50:11,747 >> {'loss': 0.4746, 'grad_norm': 18.54442596435547, 'learning_rate': 6.739247025701683e-07, 'epoch': 0.22375915378356387, 'num_input_tokens_seen': 3407872000, 'completed': '44.07% (1_625 / 3_687)', 'remaining time': '18:25:31', 'throughput': '8058.97', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 22:50:41,873 >> {'loss': 0.8189, 'grad_norm': 21.610734939575195, 'learning_rate': 6.735365737600695e-07, 'epoch': 0.22403037700027123, 'num_input_tokens_seen': 3409969152, 'completed': '44.10% (1_626 / 3_687)', 'remaining time': '18:24:54', 'throughput': '8702.57', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 22:51:13,741 >> {'loss': 0.3669, 'grad_norm': 13.132299423217773, 'learning_rate': 6.731483455324374e-07, 'epoch': 0.22430160021697856, 'num_input_tokens_seen': 3412066304, 'completed': '44.13% (1_627 / 3_687)', 'remaining time': '18:24:21', 'throughput': '8224.83', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 22:51:43,485 >> {'loss': 0.2336, 'grad_norm': 10.474742889404297, 'learning_rate': 6.727600181997026e-07, 'epoch': 0.22457282343368593, 'num_input_tokens_seen': 3414163456, 'completed': '44.16% (1_628 / 3_687)', 'remaining time': '18:23:43', 'throughput': '8813.38', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-25 22:52:11,498 >> {'loss': 0.5633, 'grad_norm':
20.9769287109375, 'learning_rate': 6.723715920743767e-07, 'epoch': 0.2248440466503933, 'num_input_tokens_seen': 3416260608, 'completed': '44.18% (1_629 / 3_687)', 'remaining time': '18:23:00', 'throughput': '9358.26', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:52:42,692 >> {'loss': 0.3815, 'grad_norm': 14.54977798461914, 'learning_rate': 6.7198306746905e-07, 'epoch': 0.22511526986710062, 'num_input_tokens_seen': 3418357760, 'completed': '44.21% (1_630 / 3_687)', 'remaining time': '18:22:26', 'throughput': '8403.44', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:53:11,593 >> {'loss': 0.5412, 'grad_norm': 19.554121017456055, 'learning_rate': 6.715944446963924e-07, 'epoch': 0.22538649308380798, 'num_input_tokens_seen': 3420454912, 'completed': '44.24% (1_631 / 3_687)', 'remaining time': '18:21:46', 'throughput': '9071.76', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:53:40,698 >> {'loss': 0.9327, 'grad_norm': 23.622020721435547, 'learning_rate': 6.712057240691527e-07, 'epoch': 0.22565771630051532, 'num_input_tokens_seen': 3422552064, 'completed': '44.26% (1_632 / 3_687)', 'remaining time': '18:21:06', 'throughput': '9005.52', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:54:10,728 >> {'loss': 0.3727, 'grad_norm': 13.846428871154785, 'learning_rate': 6.708169059001586e-07, 'epoch': 0.22592893951722268, 'num_input_tokens_seen': 3424649216, 'completed': '44.29% (1_633 / 3_687)', 'remaining time': '18:20:29', 'throughput': '8729.55', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:54:41,433 >> {'loss': 0.5854, 'grad_norm': 20.11954689025879, 'learning_rate': 6.704279905023159e-07, 'epoch': 0.22620016273393004, 'num_input_tokens_seen': 3426746368, 'completed': '44.32% (1_634 / 3_687)', 'remaining time': '18:19:53', 'throughput': '8537.28', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:55:12,407 >> {'loss': 0.6741, 'grad_norm': 28.113088607788086, 'learning_rate': 
6.700389781886091e-07, 'epoch': 0.22647138595063737, 'num_input_tokens_seen': 3428843520, 'completed': '44.34% (1_635 / 3_687)', 'remaining time': '18:19:18', 'throughput': '8463.38', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:55:41,699 >> {'loss': 0.3548, 'grad_norm': 12.346856117248535, 'learning_rate': 6.696498692721006e-07, 'epoch': 0.22674260916734473, 'num_input_tokens_seen': 3430940672, 'completed': '44.37% (1_636 / 3_687)', 'remaining time': '18:18:39', 'throughput': '8949.57', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:56:13,973 >> {'loss': 0.5486, 'grad_norm': 16.389089584350586, 'learning_rate': 6.692606640659302e-07, 'epoch': 0.22701383238405207, 'num_input_tokens_seen': 3433037824, 'completed': '44.40% (1_637 / 3_687)', 'remaining time': '18:18:07', 'throughput': '8122.35', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:56:43,623 >> {'loss': 0.7321, 'grad_norm': 24.49221420288086, 'learning_rate': 6.688713628833157e-07, 'epoch': 0.22728505560075943, 'num_input_tokens_seen': 3435134976, 'completed': '44.43% (1_638 / 3_687)', 'remaining time': '18:17:29', 'throughput': '8841.27', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:57:14,200 >> {'loss': 0.4644, 'grad_norm': 16.724529266357422, 'learning_rate': 6.684819660375516e-07, 'epoch': 0.22755627881746676, 'num_input_tokens_seen': 3437232128, 'completed': '44.45% (1_639 / 3_687)', 'remaining time': '18:16:53', 'throughput': '8573.16', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:57:46,634 >> {'loss': 0.6855, 'grad_norm': 20.410425186157227, 'learning_rate': 6.6809247384201e-07, 'epoch': 0.22782750203417412, 'num_input_tokens_seen': 3439329280, 'completed': '44.48% (1_640 / 3_687)', 'remaining time': '18:16:21', 'throughput': '8082.50', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:58:19,112 >> {'loss': 0.4161, 'grad_norm': 14.939435005187988, 'learning_rate': 6.67702886610139e-07, 'epoch': 
0.22809872525088148, 'num_input_tokens_seen': 3441426432, 'completed': '44.51% (1_641 / 3_687)', 'remaining time': '18:15:50', 'throughput': '8071.33', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:58:47,776 >> {'loss': 0.6421, 'grad_norm': 16.925172805786133, 'learning_rate': 6.673132046554639e-07, 'epoch': 0.22836994846758882, 'num_input_tokens_seen': 3443523584, 'completed': '44.53% (1_642 / 3_687)', 'remaining time': '18:15:09', 'throughput': '9145.49', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:59:17,837 >> {'loss': 0.2435, 'grad_norm': 11.812236785888672, 'learning_rate': 6.669234282915857e-07, 'epoch': 0.22864117168429618, 'num_input_tokens_seen': 3445620736, 'completed': '44.56% (1_643 / 3_687)', 'remaining time': '18:14:32', 'throughput': '8720.30', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 22:59:49,986 >> {'loss': 0.4875, 'grad_norm': 16.797849655151367, 'learning_rate': 6.665335578321819e-07, 'epoch': 0.2289123949010035, 'num_input_tokens_seen': 3447717888, 'completed': '44.59% (1_644 / 3_687)', 'remaining time': '18:14:00', 'throughput': '8154.17', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:00:19,280 >> {'loss': 0.6485, 'grad_norm': 20.539812088012695, 'learning_rate': 6.661435935910048e-07, 'epoch': 0.22918361811771087, 'num_input_tokens_seen': 3449815040, 'completed': '44.62% (1_645 / 3_687)', 'remaining time': '18:13:21', 'throughput': '8948.79', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:00:49,159 >> {'loss': 0.3879, 'grad_norm': 12.565629959106445, 'learning_rate': 6.657535358818833e-07, 'epoch': 0.22945484133441824, 'num_input_tokens_seen': 3451912192, 'completed': '44.64% (1_646 / 3_687)', 'remaining time': '18:12:44', 'throughput': '8773.38', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:01:19,434 >> {'loss': 0.6757, 'grad_norm': 21.272024154663086, 'learning_rate': 6.653633850187211e-07, 'epoch': 0.22972606455112557, 
'num_input_tokens_seen': 3454009344, 'completed': '44.67% (1_647 / 3_687)', 'remaining time': '18:12:07', 'throughput': '8658.82', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:01:50,137 >> {'loss': 0.4601, 'grad_norm': 17.750259399414062, 'learning_rate': 6.649731413154964e-07, 'epoch': 0.22999728776783293, 'num_input_tokens_seen': 3456106496, 'completed': '44.70% (1_648 / 3_687)', 'remaining time': '18:11:32', 'throughput': '8537.94', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:02:18,932 >> {'loss': 0.5663, 'grad_norm': 17.899972915649414, 'learning_rate': 6.645828050862626e-07, 'epoch': 0.23026851098454026, 'num_input_tokens_seen': 3458203648, 'completed': '44.72% (1_649 / 3_687)', 'remaining time': '18:10:51', 'throughput': '9104.04', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:02:52,955 >> {'loss': 0.1971, 'grad_norm': 10.090818405151367, 'learning_rate': 6.641923766451475e-07, 'epoch': 0.23053973420124763, 'num_input_tokens_seen': 3460300800, 'completed': '44.75% (1_650 / 3_687)', 'remaining time': '18:10:24', 'throughput': '7704.76', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:03:24,660 >> {'loss': 0.4693, 'grad_norm': 14.428431510925293, 'learning_rate': 6.63801856306353e-07, 'epoch': 0.230810957417955, 'num_input_tokens_seen': 3462397952, 'completed': '44.78% (1_651 / 3_687)', 'remaining time': '18:09:51', 'throughput': '8268.31', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:03:53,814 >> {'loss': 0.5592, 'grad_norm': 18.0393123626709, 'learning_rate': 6.634112443841551e-07, 'epoch': 0.23108218063466232, 'num_input_tokens_seen': 3464495104, 'completed': '44.81% (1_652 / 3_687)', 'remaining time': '18:09:12', 'throughput': '8991.68', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:04:23,777 >> {'loss': 0.5662, 'grad_norm': 15.629161834716797, 'learning_rate': 6.630205411929032e-07, 'epoch': 0.23135340385136968, 'num_input_tokens_seen': 3466592256, 
'completed': '44.83% (1_653 / 3_687)', 'remaining time': '18:08:34', 'throughput': '8749.04', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:04:55,329 >> {'loss': 0.3379, 'grad_norm': 15.491093635559082, 'learning_rate': 6.626297470470205e-07, 'epoch': 0.23162462706807702, 'num_input_tokens_seen': 3468689408, 'completed': '44.86% (1_654 / 3_687)', 'remaining time': '18:08:01', 'throughput': '8308.15', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:05:30,382 >> {'loss': 0.2769, 'grad_norm': 13.871119499206543, 'learning_rate': 6.622388622610034e-07, 'epoch': 0.23189585028478438, 'num_input_tokens_seen': 3470786560, 'completed': '44.89% (1_655 / 3_687)', 'remaining time': '18:07:36', 'throughput': '7478.58', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:06:01,520 >> {'loss': 0.3845, 'grad_norm': 12.842281341552734, 'learning_rate': 6.618478871494209e-07, 'epoch': 0.23216707350149174, 'num_input_tokens_seen': 3472883712, 'completed': '44.91% (1_656 / 3_687)', 'remaining time': '18:07:01', 'throughput': '8418.62', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:06:35,294 >> {'loss': 0.3783, 'grad_norm': 14.781960487365723, 'learning_rate': 6.61456822026915e-07, 'epoch': 0.23243829671819907, 'num_input_tokens_seen': 3474980864, 'completed': '44.94% (1_657 / 3_687)', 'remaining time': '18:06:33', 'throughput': '7761.78', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:07:05,529 >> {'loss': 0.5104, 'grad_norm': 18.07168960571289, 'learning_rate': 6.610656672081999e-07, 'epoch': 0.23270951993490643, 'num_input_tokens_seen': 3477078016, 'completed': '44.97% (1_658 / 3_687)', 'remaining time': '18:05:57', 'throughput': '8670.25', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:07:35,666 >> {'loss': 0.5309, 'grad_norm': 19.128738403320312, 'learning_rate': 6.606744230080622e-07, 'epoch': 0.23298074315161377, 'num_input_tokens_seen': 3479175168, 'completed': '45.00% (1_659 / 3_687)', 
'remaining time': '18:05:20', 'throughput': '8698.54', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:08:03,181 >> {'loss': 0.7695, 'grad_norm': 21.630807876586914, 'learning_rate': 6.602830897413603e-07, 'epoch': 0.23325196636832113, 'num_input_tokens_seen': 3481272320, 'completed': '45.02% (1_660 / 3_687)', 'remaining time': '18:04:37', 'throughput': '9527.31', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:08:29,913 >> {'loss': 0.8973, 'grad_norm': 22.350727081298828, 'learning_rate': 6.598916677230243e-07, 'epoch': 0.2335231895850285, 'num_input_tokens_seen': 3483369472, 'completed': '45.05% (1_661 / 3_687)', 'remaining time': '18:03:52', 'throughput': '9806.17', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:09:00,301 >> {'loss': 0.4983, 'grad_norm': 16.569183349609375, 'learning_rate': 6.59500157268056e-07, 'epoch': 0.23379441280173582, 'num_input_tokens_seen': 3485466624, 'completed': '45.08% (1_662 / 3_687)', 'remaining time': '18:03:16', 'throughput': '8626.69', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:09:30,375 >> {'loss': 0.4791, 'grad_norm': 17.87363052368164, 'learning_rate': 6.591085586915279e-07, 'epoch': 0.23406563601844319, 'num_input_tokens_seen': 3487563776, 'completed': '45.10% (1_663 / 3_687)', 'remaining time': '18:02:39', 'throughput': '8716.67', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:10:01,386 >> {'loss': 0.7941, 'grad_norm': 22.503154754638672, 'learning_rate': 6.587168723085836e-07, 'epoch': 0.23433685923515052, 'num_input_tokens_seen': 3489660928, 'completed': '45.13% (1_664 / 3_687)', 'remaining time': '18:02:05', 'throughput': '8453.20', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:10:30,768 >> {'loss': 0.4266, 'grad_norm': 16.588794708251953, 'learning_rate': 6.583250984344374e-07, 'epoch': 0.23460808245185788, 'num_input_tokens_seen': 3491758080, 'completed': '45.16% (1_665 / 3_687)', 'remaining time': '18:01:26', 'throughput': 
'8921.94', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:11:00,000 >> {'loss': 0.3887, 'grad_norm': 20.612022399902344, 'learning_rate': 6.57933237384374e-07, 'epoch': 0.23487930566856524, 'num_input_tokens_seen': 3493855232, 'completed': '45.19% (1_666 / 3_687)', 'remaining time': '18:00:47', 'throughput': '8967.66', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:11:31,626 >> {'loss': 0.655, 'grad_norm': 19.953716278076172, 'learning_rate': 6.575412894737484e-07, 'epoch': 0.23515052888527258, 'num_input_tokens_seen': 3495952384, 'completed': '45.21% (1_667 / 3_687)', 'remaining time': '18:00:14', 'throughput': '8288.91', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:11:58,337 >> {'loss': 0.8792, 'grad_norm': 21.730518341064453, 'learning_rate': 6.571492550179853e-07, 'epoch': 0.23542175210197994, 'num_input_tokens_seen': 3498049536, 'completed': '45.24% (1_668 / 3_687)', 'remaining time': '17:59:30', 'throughput': '9813.88', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:12:27,522 >> {'loss': 0.6458, 'grad_norm': 20.50918197631836, 'learning_rate': 6.567571343325791e-07, 'epoch': 0.23569297531868727, 'num_input_tokens_seen': 3500146688, 'completed': '45.27% (1_669 / 3_687)', 'remaining time': '17:58:51', 'throughput': '8982.31', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:12:54,946 >> {'loss': 0.5007, 'grad_norm': 18.293838500976562, 'learning_rate': 6.563649277330935e-07, 'epoch': 0.23596419853539463, 'num_input_tokens_seen': 3502243840, 'completed': '45.29% (1_670 / 3_687)', 'remaining time': '17:58:08', 'throughput': '9558.70', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:13:26,913 >> {'loss': 0.4134, 'grad_norm': 13.126777648925781, 'learning_rate': 6.559726355351617e-07, 'epoch': 0.236235421752102, 'num_input_tokens_seen': 3504340992, 'completed': '45.32% (1_671 / 3_687)', 'remaining time': '17:57:36', 'throughput': '8200.44', 'gpu_mem_free': '5581MB'} 
[INFO|trainer.py:175] 2024-12-25 23:13:55,937 >> {'loss': 0.345, 'grad_norm': 12.992348670959473, 'learning_rate': 6.555802580544853e-07, 'epoch': 0.23650664496880933, 'num_input_tokens_seen': 3506438144, 'completed': '45.35% (1_672 / 3_687)', 'remaining time': '17:56:57', 'throughput': '9032.16', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:14:28,735 >> {'loss': 0.413, 'grad_norm': 15.970709800720215, 'learning_rate': 6.551877956068349e-07, 'epoch': 0.2367778681855167, 'num_input_tokens_seen': 3508535296, 'completed': '45.38% (1_673 / 3_687)', 'remaining time': '17:56:26', 'throughput': '7992.67', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:14:57,936 >> {'loss': 0.5876, 'grad_norm': 19.215749740600586, 'learning_rate': 6.547952485080491e-07, 'epoch': 0.23704909140222402, 'num_input_tokens_seen': 3510632448, 'completed': '45.40% (1_674 / 3_687)', 'remaining time': '17:55:47', 'throughput': '8977.27', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:15:32,950 >> {'loss': 0.2881, 'grad_norm': 10.450196266174316, 'learning_rate': 6.544026170740352e-07, 'epoch': 0.23732031461893138, 'num_input_tokens_seen': 3512729600, 'completed': '45.43% (1_675 / 3_687)', 'remaining time': '17:55:22', 'throughput': '7486.79', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:16:05,376 >> {'loss': 0.3993, 'grad_norm': 13.23245906829834, 'learning_rate': 6.540099016207679e-07, 'epoch': 0.23759153783563874, 'num_input_tokens_seen': 3514826752, 'completed': '45.46% (1_676 / 3_687)', 'remaining time': '17:54:51', 'throughput': '8084.37', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:16:35,987 >> {'loss': 0.41, 'grad_norm': 13.348737716674805, 'learning_rate': 6.536171024642896e-07, 'epoch': 0.23786276105234608, 'num_input_tokens_seen': 3516923904, 'completed': '45.48% (1_677 / 3_687)', 'remaining time': '17:54:16', 'throughput': '8563.68', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:17:06,924 >> 
{'loss': 0.3554, 'grad_norm': 15.210914611816406, 'learning_rate': 6.532242199207103e-07, 'epoch': 0.23813398426905344, 'num_input_tokens_seen': 3519021056, 'completed': '45.51% (1_678 / 3_687)', 'remaining time': '17:53:41', 'throughput': '8473.51', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:17:35,964 >> {'loss': 0.2304, 'grad_norm': 14.020462989807129, 'learning_rate': 6.528312543062066e-07, 'epoch': 0.23840520748576077, 'num_input_tokens_seen': 3521118208, 'completed': '45.54% (1_679 / 3_687)', 'remaining time': '17:53:02', 'throughput': '9027.13', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:18:04,327 >> {'loss': 0.5624, 'grad_norm': 18.36133575439453, 'learning_rate': 6.524382059370226e-07, 'epoch': 0.23867643070246813, 'num_input_tokens_seen': 3523215360, 'completed': '45.57% (1_680 / 3_687)', 'remaining time': '17:52:21', 'throughput': '9242.32', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:18:33,330 >> {'loss': 0.5282, 'grad_norm': 16.818580627441406, 'learning_rate': 6.520450751294685e-07, 'epoch': 0.23894765391917547, 'num_input_tokens_seen': 3525312512, 'completed': '45.59% (1_681 / 3_687)', 'remaining time': '17:51:42', 'throughput': '9038.65', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:19:02,093 >> {'loss': 0.6669, 'grad_norm': 25.353300094604492, 'learning_rate': 6.516518621999209e-07, 'epoch': 0.23921887713588283, 'num_input_tokens_seen': 3527409664, 'completed': '45.62% (1_682 / 3_687)', 'remaining time': '17:51:03', 'throughput': '9113.95', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:19:33,164 >> {'loss': 0.6045, 'grad_norm': 18.05834197998047, 'learning_rate': 6.512585674648227e-07, 'epoch': 0.2394901003525902, 'num_input_tokens_seen': 3529506816, 'completed': '45.65% (1_683 / 3_687)', 'remaining time': '17:50:29', 'throughput': '8437.39', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:20:03,968 >> {'loss': 0.3037, 'grad_norm': 
11.763015747070312, 'learning_rate': 6.50865191240683e-07, 'epoch': 0.23976132356929752, 'num_input_tokens_seen': 3531603968, 'completed': '45.67% (1_684 / 3_687)', 'remaining time': '17:49:54', 'throughput': '8509.91', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:20:33,641 >> {'loss': 0.5238, 'grad_norm': 19.146381378173828, 'learning_rate': 6.504717338440751e-07, 'epoch': 0.24003254678600489, 'num_input_tokens_seen': 3533701120, 'completed': '45.70% (1_685 / 3_687)', 'remaining time': '17:49:16', 'throughput': '8834.04', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:21:03,869 >> {'loss': 0.3522, 'grad_norm': 13.004151344299316, 'learning_rate': 6.500781955916393e-07, 'epoch': 0.24030377000271222, 'num_input_tokens_seen': 3535798272, 'completed': '45.73% (1_686 / 3_687)', 'remaining time': '17:48:40', 'throughput': '8672.12', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:21:32,974 >> {'loss': 0.5728, 'grad_norm': 17.20205307006836, 'learning_rate': 6.4968457680008e-07, 'epoch': 0.24057499321941958, 'num_input_tokens_seen': 3537895424, 'completed': '45.76% (1_687 / 3_687)', 'remaining time': '17:48:01', 'throughput': '9007.01', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:22:05,330 >> {'loss': 0.3135, 'grad_norm': 12.467523574829102, 'learning_rate': 6.492908777861664e-07, 'epoch': 0.24084621643612694, 'num_input_tokens_seen': 3539992576, 'completed': '45.78% (1_688 / 3_687)', 'remaining time': '17:47:30', 'throughput': '8101.82', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:22:34,829 >> {'loss': 0.6591, 'grad_norm': 21.274076461791992, 'learning_rate': 6.488970988667327e-07, 'epoch': 0.24111743965283428, 'num_input_tokens_seen': 3542089728, 'completed': '45.81% (1_689 / 3_687)', 'remaining time': '17:46:52', 'throughput': '8886.53', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:23:01,492 >> {'loss': 0.927, 'grad_norm': 24.0526180267334, 'learning_rate': 
6.485032403586772e-07, 'epoch': 0.24138866286954164, 'num_input_tokens_seen': 3544186880, 'completed': '45.84% (1_690 / 3_687)', 'remaining time': '17:46:08', 'throughput': '9831.86', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:23:30,515 >> {'loss': 0.3657, 'grad_norm': 16.187849044799805, 'learning_rate': 6.481093025789621e-07, 'epoch': 0.24165988608624897, 'num_input_tokens_seen': 3546284032, 'completed': '45.86% (1_691 / 3_687)', 'remaining time': '17:45:29', 'throughput': '9032.05', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:24:00,466 >> {'loss': 0.5053, 'grad_norm': 25.65354347229004, 'learning_rate': 6.477152858446136e-07, 'epoch': 0.24193110930295633, 'num_input_tokens_seen': 3548381184, 'completed': '45.89% (1_692 / 3_687)', 'remaining time': '17:44:53', 'throughput': '8752.54', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:24:30,496 >> {'loss': 0.5336, 'grad_norm': 16.25813865661621, 'learning_rate': 6.473211904727216e-07, 'epoch': 0.2422023325196637, 'num_input_tokens_seen': 3550478336, 'completed': '45.92% (1_693 / 3_687)', 'remaining time': '17:44:16', 'throughput': '8729.51', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:25:00,872 >> {'loss': 0.5435, 'grad_norm': 19.955923080444336, 'learning_rate': 6.469270167804386e-07, 'epoch': 0.24247355573637103, 'num_input_tokens_seen': 3552575488, 'completed': '45.95% (1_694 / 3_687)', 'remaining time': '17:43:41', 'throughput': '8629.71', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:25:33,829 >> {'loss': 0.3573, 'grad_norm': 12.152280807495117, 'learning_rate': 6.46532765084981e-07, 'epoch': 0.2427447789530784, 'num_input_tokens_seen': 3554672640, 'completed': '45.97% (1_695 / 3_687)', 'remaining time': '17:43:11', 'throughput': '7954.14', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:26:06,242 >> {'loss': 0.2561, 'grad_norm': 13.028290748596191, 'learning_rate': 6.461384357036274e-07, 'epoch': 
0.24301600216978572, 'num_input_tokens_seen': 3556769792, 'completed': '46.00% (1_696 / 3_687)', 'remaining time': '17:42:40', 'throughput': '8087.70', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:26:37,666 >> {'loss': 0.4922, 'grad_norm': 20.247438430786133, 'learning_rate': 6.457440289537191e-07, 'epoch': 0.24328722538649308, 'num_input_tokens_seen': 3558866944, 'completed': '46.03% (1_697 / 3_687)', 'remaining time': '17:42:06', 'throughput': '8342.22', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:27:07,987 >> {'loss': 0.301, 'grad_norm': 12.755880355834961, 'learning_rate': 6.453495451526595e-07, 'epoch': 0.24355844860320044, 'num_input_tokens_seen': 3560964096, 'completed': '46.05% (1_698 / 3_687)', 'remaining time': '17:41:30', 'throughput': '8645.43', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:27:40,148 >> {'loss': 0.3143, 'grad_norm': 13.987512588500977, 'learning_rate': 6.449549846179144e-07, 'epoch': 0.24382967181990778, 'num_input_tokens_seen': 3563061248, 'completed': '46.08% (1_699 / 3_687)', 'remaining time': '17:40:59', 'throughput': '8151.07', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:28:12,977 >> {'loss': 0.5411, 'grad_norm': 26.919103622436523, 'learning_rate': 6.445603476670109e-07, 'epoch': 0.24410089503661514, 'num_input_tokens_seen': 3565158400, 'completed': '46.11% (1_700 / 3_687)', 'remaining time': '17:40:28', 'throughput': '7985.14', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:28:40,416 >> {'loss': 0.7823, 'grad_norm': 21.93929672241211, 'learning_rate': 6.44165634617538e-07, 'epoch': 0.24437211825332247, 'num_input_tokens_seen': 3567255552, 'completed': '46.14% (1_701 / 3_687)', 'remaining time': '17:39:46', 'throughput': '9553.60', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:29:10,596 >> {'loss': 0.4474, 'grad_norm': 16.736602783203125, 'learning_rate': 6.437708457871455e-07, 'epoch': 0.24464334147002983, 'num_input_tokens_seen': 
3569352704, 'completed': '46.16% (1_702 / 3_687)', 'remaining time': '17:39:10', 'throughput': '8686.12', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:29:45,133 >> {'loss': 0.5609, 'grad_norm': 17.7639217376709, 'learning_rate': 6.433759814935447e-07, 'epoch': 0.2449145646867372, 'num_input_tokens_seen': 3571449856, 'completed': '46.19% (1_703 / 3_687)', 'remaining time': '17:38:44', 'throughput': '7590.23', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:30:18,842 >> {'loss': 0.3755, 'grad_norm': 12.938990592956543, 'learning_rate': 6.429810420545072e-07, 'epoch': 0.24518578790344453, 'num_input_tokens_seen': 3573547008, 'completed': '46.22% (1_704 / 3_687)', 'remaining time': '17:38:15', 'throughput': '7776.77', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:30:49,195 >> {'loss': 0.6132, 'grad_norm': 19.107515335083008, 'learning_rate': 6.425860277878651e-07, 'epoch': 0.2454570111201519, 'num_input_tokens_seen': 3575644160, 'completed': '46.24% (1_705 / 3_687)', 'remaining time': '17:37:40', 'throughput': '8636.36', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:31:22,096 >> {'loss': 0.4449, 'grad_norm': 16.069114685058594, 'learning_rate': 6.42190939011511e-07, 'epoch': 0.24572823433685922, 'num_input_tokens_seen': 3577741312, 'completed': '46.27% (1_706 / 3_687)', 'remaining time': '17:37:10', 'throughput': '7967.70', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:31:55,142 >> {'loss': 0.4561, 'grad_norm': 18.453479766845703, 'learning_rate': 6.417957760433974e-07, 'epoch': 0.2459994575535666, 'num_input_tokens_seen': 3579838464, 'completed': '46.30% (1_707 / 3_687)', 'remaining time': '17:36:40', 'throughput': '7932.71', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:32:27,163 >> {'loss': 0.4495, 'grad_norm': 22.410266876220703, 'learning_rate': 6.414005392015364e-07, 'epoch': 0.24627068077027395, 'num_input_tokens_seen': 3581935616, 'completed': '46.32% (1_708 / 
3_687)', 'remaining time': '17:36:08', 'throughput': '8186.51', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:33:00,223 >> {'loss': 0.647, 'grad_norm': 18.5291690826416, 'learning_rate': 6.410052288039994e-07, 'epoch': 0.24654190398698128, 'num_input_tokens_seen': 3584032768, 'completed': '46.35% (1_709 / 3_687)', 'remaining time': '17:35:38', 'throughput': '7929.38', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:33:30,498 >> {'loss': 0.3701, 'grad_norm': 15.33918571472168, 'learning_rate': 6.406098451689171e-07, 'epoch': 0.24681312720368864, 'num_input_tokens_seen': 3586129920, 'completed': '46.38% (1_710 / 3_687)', 'remaining time': '17:35:02', 'throughput': '8658.81', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:34:00,436 >> {'loss': 0.4231, 'grad_norm': 16.662532806396484, 'learning_rate': 6.402143886144797e-07, 'epoch': 0.24708435042039598, 'num_input_tokens_seen': 3588227072, 'completed': '46.41% (1_711 / 3_687)', 'remaining time': '17:34:26', 'throughput': '8756.41', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:34:30,781 >> {'loss': 0.431, 'grad_norm': 16.196252822875977, 'learning_rate': 6.39818859458935e-07, 'epoch': 0.24735557363710334, 'num_input_tokens_seen': 3590324224, 'completed': '46.43% (1_712 / 3_687)', 'remaining time': '17:33:50', 'throughput': '8638.79', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:34:58,869 >> {'loss': 0.9444, 'grad_norm': 24.346651077270508, 'learning_rate': 6.394232580205903e-07, 'epoch': 0.2476267968538107, 'num_input_tokens_seen': 3592421376, 'completed': '46.46% (1_713 / 3_687)', 'remaining time': '17:33:10', 'throughput': '9332.88', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:35:28,688 >> {'loss': 0.6498, 'grad_norm': 18.19024658203125, 'learning_rate': 6.390275846178102e-07, 'epoch': 0.24789802007051803, 'num_input_tokens_seen': 3594518528, 'completed': '46.49% (1_714 / 3_687)', 'remaining time': '17:32:33', 
'throughput': '8791.09', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:35:58,894 >> {'loss': 0.279, 'grad_norm': 11.127647399902344, 'learning_rate': 6.386318395690178e-07, 'epoch': 0.2481692432872254, 'num_input_tokens_seen': 3596615680, 'completed': '46.51% (1_715 / 3_687)', 'remaining time': '17:31:57', 'throughput': '8678.65', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:36:29,936 >> {'loss': 0.5013, 'grad_norm': 21.56477165222168, 'learning_rate': 6.382360231926935e-07, 'epoch': 0.24844046650393273, 'num_input_tokens_seen': 3598712832, 'completed': '46.54% (1_716 / 3_687)', 'remaining time': '17:31:23', 'throughput': '8444.65', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:36:59,771 >> {'loss': 0.469, 'grad_norm': 15.533906936645508, 'learning_rate': 6.378401358073754e-07, 'epoch': 0.2487116897206401, 'num_input_tokens_seen': 3600809984, 'completed': '46.57% (1_717 / 3_687)', 'remaining time': '17:30:46', 'throughput': '8786.59', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:37:29,273 >> {'loss': 0.5628, 'grad_norm': 16.186729431152344, 'learning_rate': 6.374441777316587e-07, 'epoch': 0.24898291293734745, 'num_input_tokens_seen': 3602907136, 'completed': '46.60% (1_718 / 3_687)', 'remaining time': '17:30:09', 'throughput': '8885.68', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:37:58,559 >> {'loss': 0.4068, 'grad_norm': 14.087782859802246, 'learning_rate': 6.370481492841952e-07, 'epoch': 0.24925413615405478, 'num_input_tokens_seen': 3605004288, 'completed': '46.62% (1_719 / 3_687)', 'remaining time': '17:29:31', 'throughput': '8950.98', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:38:29,557 >> {'loss': 0.4257, 'grad_norm': 14.505183219909668, 'learning_rate': 6.366520507836934e-07, 'epoch': 0.24952535937076215, 'num_input_tokens_seen': 3607101440, 'completed': '46.65% (1_720 / 3_687)', 'remaining time': '17:28:57', 'throughput': '8457.01', 'gpu_mem_free': 
'5581MB'} [INFO|trainer.py:175] 2024-12-25 23:38:59,510 >> {'loss': 0.3571, 'grad_norm': 14.800230979919434, 'learning_rate': 6.362558825489187e-07, 'epoch': 0.24979658258746948, 'num_input_tokens_seen': 3609198592, 'completed': '46.68% (1_721 / 3_687)', 'remaining time': '17:28:21', 'throughput': '8751.73', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:39:32,108 >> {'loss': 0.5663, 'grad_norm': 18.357351303100586, 'learning_rate': 6.35859644898692e-07, 'epoch': 0.2500678058041768, 'num_input_tokens_seen': 3611295744, 'completed': '46.70% (1_722 / 3_687)', 'remaining time': '17:27:50', 'throughput': '8041.62', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:40:00,610 >> {'loss': 0.6229, 'grad_norm': 18.787809371948242, 'learning_rate': 6.354633381518901e-07, 'epoch': 0.2503390290208842, 'num_input_tokens_seen': 3613392896, 'completed': '46.73% (1_723 / 3_687)', 'remaining time': '17:27:10', 'throughput': '9197.63', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:40:32,337 >> {'loss': 0.4966, 'grad_norm': 22.12680435180664, 'learning_rate': 6.350669626274461e-07, 'epoch': 0.25061025223759154, 'num_input_tokens_seen': 3615490048, 'completed': '46.76% (1_724 / 3_687)', 'remaining time': '17:26:38', 'throughput': '8262.44', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:41:03,315 >> {'loss': 0.4202, 'grad_norm': 14.827641487121582, 'learning_rate': 6.346705186443474e-07, 'epoch': 0.2508814754542989, 'num_input_tokens_seen': 3617587200, 'completed': '46.79% (1_725 / 3_687)', 'remaining time': '17:26:04', 'throughput': '8462.33', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:41:32,671 >> {'loss': 0.5388, 'grad_norm': 16.624746322631836, 'learning_rate': 6.342740065216371e-07, 'epoch': 0.25115269867100626, 'num_input_tokens_seen': 3619684352, 'completed': '46.81% (1_726 / 3_687)', 'remaining time': '17:25:26', 'throughput': '8929.72', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 
23:42:01,681 >> {'loss': 0.3481, 'grad_norm': 13.560758590698242, 'learning_rate': 6.338774265784134e-07, 'epoch': 0.25142392188771356, 'num_input_tokens_seen': 3621781504, 'completed': '46.84% (1_727 / 3_687)', 'remaining time': '17:24:48', 'throughput': '9036.44', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:42:33,695 >> {'loss': 0.4449, 'grad_norm': 17.08588409423828, 'learning_rate': 6.334807791338286e-07, 'epoch': 0.2516951451044209, 'num_input_tokens_seen': 3623878656, 'completed': '46.87% (1_728 / 3_687)', 'remaining time': '17:24:16', 'throughput': '8188.32', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:43:04,242 >> {'loss': 0.6385, 'grad_norm': 25.524866104125977, 'learning_rate': 6.330840645070894e-07, 'epoch': 0.2519663683211283, 'num_input_tokens_seen': 3625975808, 'completed': '46.89% (1_729 / 3_687)', 'remaining time': '17:23:41', 'throughput': '8581.73', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:43:33,864 >> {'loss': 0.5051, 'grad_norm': 16.190298080444336, 'learning_rate': 6.326872830174566e-07, 'epoch': 0.25223759153783565, 'num_input_tokens_seen': 3628072960, 'completed': '46.92% (1_730 / 3_687)', 'remaining time': '17:23:04', 'throughput': '8849.49', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:44:04,127 >> {'loss': 0.4167, 'grad_norm': 17.590593338012695, 'learning_rate': 6.32290434984245e-07, 'epoch': 0.252508814754543, 'num_input_tokens_seen': 3630170112, 'completed': '46.95% (1_731 / 3_687)', 'remaining time': '17:22:28', 'throughput': '8662.23', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:44:36,499 >> {'loss': 0.5106, 'grad_norm': 23.287769317626953, 'learning_rate': 6.318935207268227e-07, 'epoch': 0.2527800379712503, 'num_input_tokens_seen': 3632267264, 'completed': '46.98% (1_732 / 3_687)', 'remaining time': '17:21:57', 'throughput': '8097.79', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:45:09,353 >> {'loss': 0.5227, 'grad_norm': 
17.762920379638672, 'learning_rate': 6.314965405646113e-07, 'epoch': 0.2530512611879577, 'num_input_tokens_seen': 3634364416, 'completed': '47.00% (1_733 / 3_687)', 'remaining time': '17:21:27', 'throughput': '7979.23', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:45:39,052 >> {'loss': 0.7039, 'grad_norm': 23.90679931640625, 'learning_rate': 6.31099494817085e-07, 'epoch': 0.25332248440466504, 'num_input_tokens_seen': 3636461568, 'completed': '47.03% (1_734 / 3_687)', 'remaining time': '17:20:50', 'throughput': '8826.52', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:46:11,454 >> {'loss': 0.4308, 'grad_norm': 14.712848663330078, 'learning_rate': 6.307023838037714e-07, 'epoch': 0.2535937076213724, 'num_input_tokens_seen': 3638558720, 'completed': '47.06% (1_735 / 3_687)', 'remaining time': '17:20:19', 'throughput': '8090.30', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:46:44,369 >> {'loss': 0.6353, 'grad_norm': 20.47460174560547, 'learning_rate': 6.303052078442503e-07, 'epoch': 0.25386493083807976, 'num_input_tokens_seen': 3640655872, 'completed': '47.08% (1_736 / 3_687)', 'remaining time': '17:19:49', 'throughput': '7964.34', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:47:11,646 >> {'loss': 0.4013, 'grad_norm': 15.011195182800293, 'learning_rate': 6.299079672581537e-07, 'epoch': 0.25413615405478707, 'num_input_tokens_seen': 3642753024, 'completed': '47.11% (1_737 / 3_687)', 'remaining time': '17:19:07', 'throughput': '9610.47', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:47:41,924 >> {'loss': 0.5446, 'grad_norm': 17.400686264038086, 'learning_rate': 6.295106623651661e-07, 'epoch': 0.25440737727149443, 'num_input_tokens_seen': 3644850176, 'completed': '47.14% (1_738 / 3_687)', 'remaining time': '17:18:32', 'throughput': '8657.91', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:48:13,647 >> {'loss': 0.2983, 'grad_norm': 11.729777336120605, 'learning_rate': 
6.291132934850225e-07, 'epoch': 0.2546786004882018, 'num_input_tokens_seen': 3646947328, 'completed': '47.17% (1_739 / 3_687)', 'remaining time': '17:18:00', 'throughput': '8263.59', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:48:45,326 >> {'loss': 0.8498, 'grad_norm': 21.60021209716797, 'learning_rate': 6.28715860937511e-07, 'epoch': 0.25494982370490915, 'num_input_tokens_seen': 3649044480, 'completed': '47.19% (1_740 / 3_687)', 'remaining time': '17:17:27', 'throughput': '8275.10', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:49:15,258 >> {'loss': 0.5259, 'grad_norm': 18.79129981994629, 'learning_rate': 6.283183650424701e-07, 'epoch': 0.2552210469216165, 'num_input_tokens_seen': 3651141632, 'completed': '47.22% (1_741 / 3_687)', 'remaining time': '17:16:51', 'throughput': '8757.73', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:49:45,104 >> {'loss': 0.6427, 'grad_norm': 19.276283264160156, 'learning_rate': 6.279208061197892e-07, 'epoch': 0.2554922701383238, 'num_input_tokens_seen': 3653238784, 'completed': '47.25% (1_742 / 3_687)', 'remaining time': '17:16:14', 'throughput': '8783.47', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:50:17,165 >> {'loss': 0.3971, 'grad_norm': 13.992378234863281, 'learning_rate': 6.275231844894086e-07, 'epoch': 0.2557634933550312, 'num_input_tokens_seen': 3655335936, 'completed': '47.27% (1_743 / 3_687)', 'remaining time': '17:15:43', 'throughput': '8176.42', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:50:47,591 >> {'loss': 0.4849, 'grad_norm': 16.478370666503906, 'learning_rate': 6.271255004713192e-07, 'epoch': 0.25603471657173854, 'num_input_tokens_seen': 3657433088, 'completed': '47.30% (1_744 / 3_687)', 'remaining time': '17:15:07', 'throughput': '8615.67', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:51:18,660 >> {'loss': 0.3075, 'grad_norm': 11.54108715057373, 'learning_rate': 6.267277543855618e-07, 'epoch': 0.2563059397884459, 
'num_input_tokens_seen': 3659530240, 'completed': '47.33% (1_745 / 3_687)', 'remaining time': '17:14:34', 'throughput': '8437.54', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:51:47,469 >> {'loss': 0.5958, 'grad_norm': 19.243192672729492, 'learning_rate': 6.263299465522274e-07, 'epoch': 0.25657716300515326, 'num_input_tokens_seen': 3661627392, 'completed': '47.36% (1_746 / 3_687)', 'remaining time': '17:13:55', 'throughput': '9099.28', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:52:19,208 >> {'loss': 0.4542, 'grad_norm': 14.843926429748535, 'learning_rate': 6.259320772914566e-07, 'epoch': 0.25684838622186057, 'num_input_tokens_seen': 3663724544, 'completed': '47.38% (1_747 / 3_687)', 'remaining time': '17:13:23', 'throughput': '8259.50', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:52:49,908 >> {'loss': 0.8589, 'grad_norm': 22.970197677612305, 'learning_rate': 6.255341469234393e-07, 'epoch': 0.25711960943856793, 'num_input_tokens_seen': 3665821696, 'completed': '47.41% (1_748 / 3_687)', 'remaining time': '17:12:48', 'throughput': '8538.81', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:53:18,859 >> {'loss': 0.5667, 'grad_norm': 16.67009162902832, 'learning_rate': 6.25136155768415e-07, 'epoch': 0.2573908326552753, 'num_input_tokens_seen': 3667918848, 'completed': '47.44% (1_749 / 3_687)', 'remaining time': '17:12:10', 'throughput': '9054.59', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:53:53,336 >> {'loss': 0.4136, 'grad_norm': 13.778572082519531, 'learning_rate': 6.247381041466716e-07, 'epoch': 0.25766205587198265, 'num_input_tokens_seen': 3670016000, 'completed': '47.46% (1_750 / 3_687)', 'remaining time': '17:11:43', 'throughput': '7603.48', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:54:25,667 >> {'loss': 0.3029, 'grad_norm': 11.786734580993652, 'learning_rate': 6.243399923785459e-07, 'epoch': 0.25793327908869, 'num_input_tokens_seen': 3672113152, 'completed': 
'47.49% (1_751 / 3_687)', 'remaining time': '17:11:12', 'throughput': '8108.26', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:54:55,262 >> {'loss': 0.614, 'grad_norm': 18.904006958007812, 'learning_rate': 6.239418207844232e-07, 'epoch': 0.2582045023053973, 'num_input_tokens_seen': 3674210304, 'completed': '47.52% (1_752 / 3_687)', 'remaining time': '17:10:35', 'throughput': '8857.64', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:55:27,206 >> {'loss': 0.4589, 'grad_norm': 16.964792251586914, 'learning_rate': 6.23543589684737e-07, 'epoch': 0.2584757255221047, 'num_input_tokens_seen': 3676307456, 'completed': '47.55% (1_753 / 3_687)', 'remaining time': '17:10:03', 'throughput': '8206.22', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:55:58,670 >> {'loss': 0.4943, 'grad_norm': 15.05766487121582, 'learning_rate': 6.231452993999683e-07, 'epoch': 0.25874694873881204, 'num_input_tokens_seen': 3678404608, 'completed': '47.57% (1_754 / 3_687)', 'remaining time': '17:09:30', 'throughput': '8331.77', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:56:27,761 >> {'loss': 0.558, 'grad_norm': 18.096086502075195, 'learning_rate': 6.22746950250646e-07, 'epoch': 0.2590181719555194, 'num_input_tokens_seen': 3680501760, 'completed': '47.60% (1_755 / 3_687)', 'remaining time': '17:08:53', 'throughput': '9011.19', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:56:57,402 >> {'loss': 0.4175, 'grad_norm': 15.403691291809082, 'learning_rate': 6.223485425573463e-07, 'epoch': 0.25928939517222677, 'num_input_tokens_seen': 3682598912, 'completed': '47.63% (1_756 / 3_687)', 'remaining time': '17:08:16', 'throughput': '8843.74', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:57:29,116 >> {'loss': 0.3693, 'grad_norm': 14.380963325500488, 'learning_rate': 6.219500766406926e-07, 'epoch': 0.2595606183889341, 'num_input_tokens_seen': 3684696064, 'completed': '47.65% (1_757 / 3_687)', 'remaining time': 
'17:07:44', 'throughput': '8266.06', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:58:04,841 >> {'loss': 0.5975, 'grad_norm': 20.143505096435547, 'learning_rate': 6.215515528213553e-07, 'epoch': 0.25983184160564143, 'num_input_tokens_seen': 3686793216, 'completed': '47.68% (1_758 / 3_687)', 'remaining time': '17:07:19', 'throughput': '7337.83', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:58:34,241 >> {'loss': 0.3873, 'grad_norm': 18.09722900390625, 'learning_rate': 6.211529714200509e-07, 'epoch': 0.2601030648223488, 'num_input_tokens_seen': 3688890368, 'completed': '47.71% (1_759 / 3_687)', 'remaining time': '17:06:42', 'throughput': '8916.29', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:59:05,361 >> {'loss': 0.511, 'grad_norm': 18.81859588623047, 'learning_rate': 6.207543327575426e-07, 'epoch': 0.26037428803905616, 'num_input_tokens_seen': 3690987520, 'completed': '47.74% (1_760 / 3_687)', 'remaining time': '17:06:09', 'throughput': '8423.76', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-25 23:59:37,070 >> {'loss': 0.5123, 'grad_norm': 18.815465927124023, 'learning_rate': 6.203556371546399e-07, 'epoch': 0.2606455112557635, 'num_input_tokens_seen': 3693084672, 'completed': '47.76% (1_761 / 3_687)', 'remaining time': '17:05:36', 'throughput': '8267.25', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 00:00:06,286 >> {'loss': 0.4696, 'grad_norm': 15.527567863464355, 'learning_rate': 6.199568849321975e-07, 'epoch': 0.2609167344724708, 'num_input_tokens_seen': 3695181824, 'completed': '47.79% (1_762 / 3_687)', 'remaining time': '17:04:59', 'throughput': '8972.32', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 00:00:38,029 >> {'loss': 0.4165, 'grad_norm': 15.812019348144531, 'learning_rate': 6.195580764111163e-07, 'epoch': 0.2611879576891782, 'num_input_tokens_seen': 3697278976, 'completed': '47.82% (1_763 / 3_687)', 'remaining time': '17:04:26', 'throughput': '8258.37', 
'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 00:01:09,462 >> {'loss': 0.2559, 'grad_norm': 11.180591583251953, 'learning_rate': 6.191592119123419e-07, 'epoch': 0.26145918090588555, 'num_input_tokens_seen': 3699376128, 'completed': '47.84% (1_764 / 3_687)', 'remaining time': '17:03:53', 'throughput': '8339.90', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 00:01:38,798 >> {'loss': 0.1839, 'grad_norm': 8.583144187927246, 'learning_rate': 6.187602917568655e-07, 'epoch': 0.2617304041225929, 'num_input_tokens_seen': 3701473280, 'completed': '47.87% (1_765 / 3_687)', 'remaining time': '17:03:16', 'throughput': '8935.94', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 00:02:11,682 >> {'loss': 0.2641, 'grad_norm': 12.009425163269043, 'learning_rate': 6.183613162657229e-07, 'epoch': 0.26200162733930027, 'num_input_tokens_seen': 3703570432, 'completed': '47.90% (1_766 / 3_687)', 'remaining time': '17:02:46', 'throughput': '7971.94', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 00:02:43,684 >> {'loss': 0.4814, 'grad_norm': 14.513581275939941, 'learning_rate': 6.179622857599942e-07, 'epoch': 0.2622728505560076, 'num_input_tokens_seen': 3705667584, 'completed': '47.93% (1_767 / 3_687)', 'remaining time': '17:02:14', 'throughput': '8191.20', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 00:03:16,289 >> {'loss': 0.6047, 'grad_norm': 23.496122360229492, 'learning_rate': 6.175632005608043e-07, 'epoch': 0.26254407377271494, 'num_input_tokens_seen': 3707764736, 'completed': '47.95% (1_768 / 3_687)', 'remaining time': '17:01:44', 'throughput': '8039.94', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 00:03:48,135 >> {'loss': 0.4534, 'grad_norm': 19.470165252685547, 'learning_rate': 6.171640609893213e-07, 'epoch': 0.2628152969894223, 'num_input_tokens_seen': 3709861888, 'completed': '47.98% (1_769 / 3_687)', 'remaining time': '17:01:12', 'throughput': '8231.70', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 
2024-12-26 00:04:17,395 >> {'loss': 0.403, 'grad_norm': 15.71687126159668, 'learning_rate': 6.167648673667581e-07, 'epoch': 0.26308652020612966, 'num_input_tokens_seen': 3711959040, 'completed': '48.01% (1_770 / 3_687)', 'remaining time': '17:00:34', 'throughput': '8959.06', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 00:04:50,064 >> {'loss': 0.6622, 'grad_norm': 23.09616470336914, 'learning_rate': 6.1636562001437e-07, 'epoch': 0.263357743422837, 'num_input_tokens_seen': 3714056192, 'completed': '48.03% (1_771 / 3_687)', 'remaining time': '17:00:04', 'throughput': '8024.23', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 00:05:20,743 >> {'loss': 0.4457, 'grad_norm': 15.076805114746094, 'learning_rate': 6.159663192534561e-07, 'epoch': 0.2636289666395443, 'num_input_tokens_seen': 3716153344, 'completed': '48.06% (1_772 / 3_687)', 'remaining time': '16:59:29', 'throughput': '8544.81', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 00:05:49,741 >> {'loss': 0.3627, 'grad_norm': 13.646811485290527, 'learning_rate': 6.155669654053592e-07, 'epoch': 0.2639001898562517, 'num_input_tokens_seen': 3718250496, 'completed': '48.09% (1_773 / 3_687)', 'remaining time': '16:58:52', 'throughput': '9040.17', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 00:06:19,625 >> {'loss': 0.6803, 'grad_norm': 20.863006591796875, 'learning_rate': 6.151675587914631e-07, 'epoch': 0.26417141307295905, 'num_input_tokens_seen': 3720347648, 'completed': '48.11% (1_774 / 3_687)', 'remaining time': '16:58:16', 'throughput': '8772.00', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 00:06:46,039 >> {'loss': 0.7848, 'grad_norm': 22.40181541442871, 'learning_rate': 6.147680997331958e-07, 'epoch': 0.2644426362896664, 'num_input_tokens_seen': 3722444800, 'completed': '48.14% (1_775 / 3_687)', 'remaining time': '16:57:33', 'throughput': '9924.52', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 00:07:19,456 >> {'loss': 0.5724, 
'grad_norm': 20.66728973388672, 'learning_rate': 6.143685885520263e-07, 'epoch': 0.26471385950637377, 'num_input_tokens_seen': 3724541952, 'completed': '48.17% (1_776 / 3_687)', 'remaining time': '16:57:04', 'throughput': '7844.44', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 00:07:51,954 >> {'loss': 0.3131, 'grad_norm': 16.2211971282959, 'learning_rate': 6.13969025569466e-07, 'epoch': 0.2649850827230811, 'num_input_tokens_seen': 3726639104, 'completed': '48.20% (1_777 / 3_687)', 'remaining time': '16:56:33', 'throughput': '8066.61', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 00:08:22,035 >> {'loss': 0.3258, 'grad_norm': 14.444147109985352, 'learning_rate': 6.135694111070684e-07, 'epoch': 0.26525630593978844, 'num_input_tokens_seen': 3728736256, 'completed': '48.22% (1_778 / 3_687)', 'remaining time': '16:55:57', 'throughput': '8714.65', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 00:08:52,012 >> {'loss': 0.4398, 'grad_norm': 15.806771278381348, 'learning_rate': 6.131697454864277e-07, 'epoch': 0.2655275291564958, 'num_input_tokens_seen': 3730833408, 'completed': '48.25% (1_779 / 3_687)', 'remaining time': '16:55:22', 'throughput': '8744.82', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 00:09:22,046 >> {'loss': 0.6636, 'grad_norm': 19.829689025878906, 'learning_rate': 6.127700290291794e-07, 'epoch': 0.26579875237320316, 'num_input_tokens_seen': 3732930560, 'completed': '48.28% (1_780 / 3_687)', 'remaining time': '16:54:46', 'throughput': '8728.06', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 00:09:52,042 >> {'loss': 0.4753, 'grad_norm': 16.643007278442383, 'learning_rate': 6.123702620570005e-07, 'epoch': 0.2660699755899105, 'num_input_tokens_seen': 3735027712, 'completed': '48.30% (1_781 / 3_687)', 'remaining time': '16:54:10', 'throughput': '8739.46', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 00:10:24,794 >> {'loss': 0.4912, 'grad_norm': 21.29400062561035, 'learning_rate': 
6.11970444891608e-07, 'epoch': 0.26634119880661783, 'num_input_tokens_seen': 3737124864, 'completed': '48.33% (1_782 / 3_687)', 'remaining time': '16:53:40', 'throughput': '8003.84', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 00:10:54,068 >> {'loss': 0.2683, 'grad_norm': 14.493866920471191, 'learning_rate': 6.115705778547597e-07, 'epoch': 0.2666124220233252, 'num_input_tokens_seen': 3739222016, 'completed': '48.36% (1_783 / 3_687)', 'remaining time': '16:53:03', 'throughput': '8954.75', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 00:11:23,248 >> {'loss': 0.6509, 'grad_norm': 19.871868133544922, 'learning_rate': 6.111706612682532e-07, 'epoch': 0.26688364524003255, 'num_input_tokens_seen': 3741319168, 'completed': '48.39% (1_784 / 3_687)', 'remaining time': '16:52:26', 'throughput': '8983.80', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 00:11:53,468 >> {'loss': 0.3895, 'grad_norm': 15.187943458557129, 'learning_rate': 6.107706954539261e-07, 'epoch': 0.2671548684567399, 'num_input_tokens_seen': 3743416320, 'completed': '48.41% (1_785 / 3_687)', 'remaining time': '16:51:51', 'throughput': '8674.43', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 00:12:24,128 >> {'loss': 0.54, 'grad_norm': 15.838203430175781, 'learning_rate': 6.103706807336559e-07, 'epoch': 0.2674260916734473, 'num_input_tokens_seen': 3745513472, 'completed': '48.44% (1_786 / 3_687)', 'remaining time': '16:51:16', 'throughput': '8550.06', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 00:12:56,043 >> {'loss': 0.4153, 'grad_norm': 12.953702926635742, 'learning_rate': 6.099706174293592e-07, 'epoch': 0.2676973148901546, 'num_input_tokens_seen': 3747610624, 'completed': '48.47% (1_787 / 3_687)', 'remaining time': '16:50:44', 'throughput': '8213.78', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 00:13:26,081 >> {'loss': 0.2817, 'grad_norm': 14.383577346801758, 'learning_rate': 6.095705058629914e-07, 'epoch': 
0.26796853810686194, 'num_input_tokens_seen': 3749707776, 'completed': '48.49% (1_788 / 3_687)', 'remaining time': '16:50:09', 'throughput': '8727.18', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 00:13:54,960 >> {'loss': 0.3399, 'grad_norm': 14.063627243041992, 'learning_rate': 6.091703463565475e-07, 'epoch': 0.2682397613235693, 'num_input_tokens_seen': 3751804928, 'completed': '48.52% (1_789 / 3_687)', 'remaining time': '16:49:31', 'throughput': '9077.22', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 00:14:26,084 >> {'loss': 0.5057, 'grad_norm': 16.237741470336914, 'learning_rate': 6.087701392320606e-07, 'epoch': 0.26851098454027666, 'num_input_tokens_seen': 3753902080, 'completed': '48.55% (1_790 / 3_687)', 'remaining time': '16:48:58', 'throughput': '8422.72', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 00:14:56,498 >> {'loss': 0.3234, 'grad_norm': 12.57223129272461, 'learning_rate': 6.083698848116018e-07, 'epoch': 0.26878220775698397, 'num_input_tokens_seen': 3755999232, 'completed': '48.58% (1_791 / 3_687)', 'remaining time': '16:48:23', 'throughput': '8619.05', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 00:15:20,739 >> {'loss': 0.9309, 'grad_norm': 26.410602569580078, 'learning_rate': 6.079695834172808e-07, 'epoch': 0.26905343097369133, 'num_input_tokens_seen': 3758096384, 'completed': '48.60% (1_792 / 3_687)', 'remaining time': '16:47:36', 'throughput': '10814.24', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 00:15:54,764 >> {'loss': 0.4636, 'grad_norm': 17.888185501098633, 'learning_rate': 6.075692353712451e-07, 'epoch': 0.2693246541903987, 'num_input_tokens_seen': 3760193536, 'completed': '48.63% (1_793 / 3_687)', 'remaining time': '16:47:08', 'throughput': '7704.57', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 00:16:23,571 >> {'loss': 0.5735, 'grad_norm': 20.749839782714844, 'learning_rate': 6.071688409956793e-07, 'epoch': 0.26959587740710605, 'num_input_tokens_seen': 
3762290688, 'completed': '48.66% (1_794 / 3_687)', 'remaining time': '16:46:31', 'throughput': '9099.97', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 00:16:51,933 >> {'loss': 0.5885, 'grad_norm': 23.78219223022461, 'learning_rate': 6.06768400612806e-07, 'epoch': 0.2698671006238134, 'num_input_tokens_seen': 3764387840, 'completed': '48.68% (1_795 / 3_687)', 'remaining time': '16:45:52', 'throughput': '9242.55', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 00:17:20,253 >> {'loss': 0.7039, 'grad_norm': 25.068920135498047, 'learning_rate': 6.063679145448838e-07, 'epoch': 0.2701383238405207, 'num_input_tokens_seen': 3766484992, 'completed': '48.71% (1_796 / 3_687)', 'remaining time': '16:45:13', 'throughput': '9256.71', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 00:17:53,128 >> {'loss': 0.3652, 'grad_norm': 15.582983016967773, 'learning_rate': 6.05967383114209e-07, 'epoch': 0.2704095470572281, 'num_input_tokens_seen': 3768582144, 'completed': '48.74% (1_797 / 3_687)', 'remaining time': '16:44:43', 'throughput': '7973.93', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 00:18:24,129 >> {'loss': 0.6668, 'grad_norm': 27.859983444213867, 'learning_rate': 6.055668066431142e-07, 'epoch': 0.27068077027393544, 'num_input_tokens_seen': 3770679296, 'completed': '48.77% (1_798 / 3_687)', 'remaining time': '16:44:10', 'throughput': '8455.99', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 00:18:54,144 >> {'loss': 0.3411, 'grad_norm': 12.221877098083496, 'learning_rate': 6.051661854539677e-07, 'epoch': 0.2709519934906428, 'num_input_tokens_seen': 3772776448, 'completed': '48.79% (1_799 / 3_687)', 'remaining time': '16:43:34', 'throughput': '8733.80', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 00:19:24,205 >> {'loss': 0.794, 'grad_norm': 20.4038143157959, 'learning_rate': 6.047655198691742e-07, 'epoch': 0.27122321670735017, 'num_input_tokens_seen': 3774873600, 'completed': '48.82% (1_800 / 3_687)', 
'remaining time': '16:42:59', 'throughput': '8720.45', 'gpu_mem_free': '5581MB'}
/scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:689: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use the APIs get_state_dict() and set_state_dict(), which can support different parallelisms (FSDP1, FSDP2, DDP). API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html
  warnings.warn(
[INFO|trainer.py:3503] 2024-12-26 00:19:48,102 >> Saving model checkpoint to /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-1800
[INFO|configuration_utils.py:472] 2024-12-26 00:19:48,105 >> Configuration saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-1800/config.json
[INFO|configuration_utils.py:807] 2024-12-26 00:19:48,106 >> Configuration saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-1800/generation_config.json
[INFO|modeling_utils.py:2807] 2024-12-26 00:20:43,418 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 7 checkpoint shards. You can find where each parameter has been saved in the index located at /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-1800/model.safetensors.index.json.
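The "split in 7 checkpoint shards" message above comes from the 5 GB per-shard cap (transformers' default `max_shard_size="5GB"`, i.e. 5 * 10**9 bytes), which puts the total weight payload somewhere above 30 GB. A rough even-split sketch of that arithmetic; the real splitter packs whole tensors into shards, so actual counts can differ slightly from this estimate:

```python
# Even-split estimate of safetensors shard count under a byte cap.
# MAX_SHARD_BYTES mirrors transformers' default "5GB" = 5 * 10**9; the
# per-tensor packing in the real saver is only approximated here.
import math

MAX_SHARD_BYTES = 5 * 10**9

def approx_shards(total_bytes: int) -> int:
    """Estimate how many shards a checkpoint of total_bytes needs."""
    return max(1, math.ceil(total_bytes / MAX_SHARD_BYTES))

print(approx_shards(32 * 10**9))  # 7 -- consistent with the 7 shards logged above
```

Under this model, 7 shards correspond to a total between 30 and 35 GB.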
[INFO|tokenization_utils_base.py:2684] 2024-12-26 00:20:43,422 >> tokenizer config file saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-1800/tokenizer_config.json
[INFO|tokenization_utils_base.py:2693] 2024-12-26 00:20:43,423 >> Special tokens file saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-1800/special_tokens_map.json
[WARNING|trainer.py:868] 2024-12-26 00:24:38,073 >> Save streaming dataset state: {'epoch': 0, 'sample_in_epoch': 3600, 'num_canonical_nodes': 1, 'shuffle_seed': 42, 'initial_physical_nodes': 1}
/scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/torch/utils/checkpoint.py:1399: FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead.
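The same FutureWarnings (`TRANSFORMERS_CACHE`, `is_torch_tpu_available`, FSDP state-dict APIs, `torch.cpu.amp.autocast`) recur throughout this transcript, once per worker process and warning site. Within a single process, Python's standard "once"-style filtering collapses such repeats; note it cannot deduplicate across the multiple ranks that each print their own copy here. A minimal standard-library sketch:

```python
# Demonstrate "once" warning filtering: identical FutureWarnings raised
# repeatedly are reported a single time within one process.
import warnings

def deprecated_call():
    # Stand-in for a library call that warns on every invocation.
    warnings.warn("Using `TRANSFORMERS_CACHE` is deprecated...", FutureWarning)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("once")
    for _ in range(5):
        deprecated_call()

print(len(caught))  # 1: the four repeats were suppressed
```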
[INFO|trainer.py:175] 2024-12-26 00:25:09,242 >> {'loss': 0.2082, 'grad_norm': 13.717342376708984, 'learning_rate': 6.043648102111745e-07, 'epoch': 0.2714944399240575, 'num_input_tokens_seen': 3776970752, 'completed': '48.85% (1_801 / 3_687)', 'remaining time': '16:52:17', 'throughput': '759.76', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:25:39,665 >> {'loss': 0.6036, 'grad_norm': 19.058935165405273, 'learning_rate': 6.039640568024443e-07, 'epoch': 0.27176566314076483, 'num_input_tokens_seen': 3779067904, 'completed': '48.87% (1_802 / 3_687)', 'remaining time': '16:51:41', 'throughput': '8616.59', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:26:08,060 >> {'loss': 0.5712, 'grad_norm': 15.670707702636719, 'learning_rate': 6.035632599654946e-07, 'epoch': 0.2720368863574722, 'num_input_tokens_seen': 3781165056, 'completed': '48.90% (1_803 / 3_687)', 'remaining time': '16:51:02', 'throughput': '9231.82', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:26:38,625 >> {'loss': 0.3603, 'grad_norm': 13.817047119140625, 'learning_rate': 6.031624200228715e-07, 'epoch': 0.27230810957417956, 'num_input_tokens_seen': 3783262208, 'completed': '48.93% (1_804 / 3_687)', 'remaining time': '16:50:27', 'throughput': '8576.61', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:27:10,846 >> {'loss': 0.4361, 'grad_norm': 15.026437759399414, 'learning_rate': 6.027615372971558e-07, 'epoch': 0.2725793327908869, 'num_input_tokens_seen': 3785359360, 'completed': '48.96% (1_805 / 3_687)', 'remaining time': '16:49:55', 'throughput': '8136.03', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:27:40,910 >> {'loss': 0.4687, 'grad_norm': 16.934778213500977, 'learning_rate': 6.023606121109626e-07, 'epoch': 0.2728505560075942, 'num_input_tokens_seen': 3787456512, 'completed': '48.98% (1_806 / 3_687)', 'remaining time': '16:49:18', 'throughput': '8719.85', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:28:12,974 >> {'loss': 0.423, 'grad_norm': 18.090913772583008, 'learning_rate': 6.019596447869413e-07, 'epoch': 0.2731217792243016, 'num_input_tokens_seen': 3789553664, 'completed': '49.01% (1_807 / 3_687)', 'remaining time': '16:48:46', 'throughput': '8175.74', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:28:43,160 >> {'loss': 0.3783, 'grad_norm': 15.30374813079834, 'learning_rate': 6.015586356477749e-07, 'epoch': 0.27339300244100895, 'num_input_tokens_seen': 3791650816, 'completed': '49.04% (1_808 / 3_687)', 'remaining time': '16:48:10', 'throughput': '8683.60', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:29:14,167 >> {'loss': 0.576, 'grad_norm': 21.7882080078125, 'learning_rate': 6.011575850161805e-07, 'epoch': 0.2736642256577163, 'num_input_tokens_seen': 3793747968, 'completed': '49.06% (1_809 / 3_687)', 'remaining time': '16:47:36', 'throughput': '8454.34', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:29:41,556 >> {'loss': 0.5201, 'grad_norm': 17.875171661376953, 'learning_rate': 6.007564932149086e-07, 'epoch': 0.27393544887442367, 'num_input_tokens_seen': 3795845120, 'completed': '49.09% (1_810 / 3_687)', 'remaining time': '16:46:55', 'throughput': '9571.38', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:30:14,062 >> {'loss': 0.3927, 'grad_norm': 15.981590270996094, 'learning_rate': 6.003553605667423e-07, 'epoch': 0.274206672091131, 'num_input_tokens_seen': 3797942272, 'completed': '49.12% (1_811 / 3_687)', 'remaining time': '16:46:23', 'throughput': '8064.32', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:30:43,113 >> {'loss': 0.4337, 'grad_norm': 14.422945022583008, 'learning_rate': 5.999541873944979e-07, 'epoch': 0.27447789530783834, 'num_input_tokens_seen': 3800039424, 'completed': '49.15% (1_812 / 3_687)', 'remaining time': '16:45:45', 'throughput': '9023.85', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:31:15,424 >> {'loss': 0.4126, 'grad_norm': 17.687833786010742, 'learning_rate': 5.995529740210244e-07, 'epoch': 0.2747491185245457, 'num_input_tokens_seen': 3802136576, 'completed': '49.17% (1_813 / 3_687)', 'remaining time': '16:45:13', 'throughput': '8113.10', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:31:46,617 >> {'loss': 0.2607, 'grad_norm': 11.91661262512207, 'learning_rate': 5.99151720769203e-07, 'epoch': 0.27502034174125306, 'num_input_tokens_seen': 3804233728, 'completed': '49.20% (1_814 / 3_687)', 'remaining time': '16:44:39', 'throughput': '8403.94', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:32:14,528 >> {'loss': 0.4029, 'grad_norm': 13.01582145690918, 'learning_rate': 5.987504279619473e-07, 'epoch': 0.2752915649579604, 'num_input_tokens_seen': 3806330880, 'completed': '49.23% (1_815 / 3_687)', 'remaining time': '16:43:59', 'throughput': '9391.88', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:32:46,271 >> {'loss': 0.3179, 'grad_norm': 14.431076049804688, 'learning_rate': 5.98349095922202e-07, 'epoch': 0.2755627881746677, 'num_input_tokens_seen': 3808428032, 'completed': '49.25% (1_816 / 3_687)', 'remaining time': '16:43:26', 'throughput': '8258.35', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:33:15,598 >> {'loss': 0.9278, 'grad_norm': 23.85053825378418, 'learning_rate': 5.979477249729442e-07, 'epoch': 0.2758340113913751, 'num_input_tokens_seen': 3810525184, 'completed': '49.28% (1_817 / 3_687)', 'remaining time': '16:42:49', 'throughput': '8938.74', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:33:46,164 >> {'loss': 0.5163, 'grad_norm': 16.68154525756836, 'learning_rate': 5.975463154371822e-07, 'epoch': 0.27610523460808245, 'num_input_tokens_seen': 3812622336, 'completed': '49.31% (1_818 / 3_687)', 'remaining time': '16:42:13', 'throughput': '8576.24', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:34:15,642 >> {'loss': 0.4715, 'grad_norm': 20.487937927246094, 'learning_rate': 5.971448676379544e-07, 'epoch': 0.2763764578247898, 'num_input_tokens_seen': 3814719488, 'completed': '49.34% (1_819 / 3_687)', 'remaining time': '16:41:36', 'throughput': '8892.88', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:34:47,881 >> {'loss': 0.3283, 'grad_norm': 11.562193870544434, 'learning_rate': 5.967433818983311e-07, 'epoch': 0.2766476810414972, 'num_input_tokens_seen': 3816816640, 'completed': '49.36% (1_820 / 3_687)', 'remaining time': '16:41:04', 'throughput': '8131.27', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:35:17,497 >> {'loss': 0.3784, 'grad_norm': 14.949207305908203, 'learning_rate': 5.963418585414129e-07, 'epoch': 0.2769189042582045, 'num_input_tokens_seen': 3818913792, 'completed': '49.39% (1_821 / 3_687)', 'remaining time': '16:40:27', 'throughput': '8851.69', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:35:48,494 >> {'loss': 0.2296, 'grad_norm': 13.64500617980957, 'learning_rate': 5.959402978903306e-07, 'epoch': 0.27719012747491184, 'num_input_tokens_seen': 3821010944, 'completed': '49.42% (1_822 / 3_687)', 'remaining time': '16:39:53', 'throughput': '8456.91', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:36:20,347 >> {'loss': 0.7141, 'grad_norm': 20.391284942626953, 'learning_rate': 5.955387002682445e-07, 'epoch': 0.2774613506916192, 'num_input_tokens_seen': 3823108096, 'completed': '49.44% (1_823 / 3_687)', 'remaining time': '16:39:20', 'throughput': '8229.82', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:36:51,617 >> {'loss': 0.3659, 'grad_norm': 12.009023666381836, 'learning_rate': 5.951370659983452e-07, 'epoch': 0.27773257390832656, 'num_input_tokens_seen': 3825205248, 'completed': '49.47% (1_824 / 3_687)', 'remaining time': '16:38:47', 'throughput': '8383.26', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:37:21,537 >> {'loss': 0.273, 'grad_norm': 12.802525520324707, 'learning_rate': 5.94735395403853e-07, 'epoch': 0.2780037971250339, 'num_input_tokens_seen': 3827302400, 'completed': '49.50% (1_825 / 3_687)', 'remaining time': '16:38:10', 'throughput': '8761.51', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:37:50,520 >> {'loss': 0.7285, 'grad_norm': 22.112329483032227, 'learning_rate': 5.943336888080167e-07, 'epoch': 0.27827502034174123, 'num_input_tokens_seen': 3829399552, 'completed': '49.53% (1_826 / 3_687)', 'remaining time': '16:37:32', 'throughput': '9044.71', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:38:20,791 >> {'loss': 0.5338, 'grad_norm': 17.80236053466797, 'learning_rate': 5.939319465341148e-07, 'epoch': 0.2785462435584486, 'num_input_tokens_seen': 3831496704, 'completed': '49.55% (1_827 / 3_687)', 'remaining time': '16:36:57', 'throughput': '8660.04', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:38:51,950 >> {'loss': 0.444, 'grad_norm': 17.097667694091797, 'learning_rate': 5.93530168905454e-07, 'epoch': 0.27881746677515595, 'num_input_tokens_seen': 3833593856, 'completed': '49.58% (1_828 / 3_687)', 'remaining time': '16:36:23', 'throughput': '8413.11', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:39:23,073 >> {'loss': 0.4893, 'grad_norm': 16.976318359375, 'learning_rate': 5.931283562453696e-07, 'epoch': 0.2790886899918633, 'num_input_tokens_seen': 3835691008, 'completed': '49.61% (1_829 / 3_687)', 'remaining time': '16:35:49', 'throughput': '8422.76', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:39:50,033 >> {'loss': 0.4142, 'grad_norm': 14.774353981018066, 'learning_rate': 5.927265088772255e-07, 'epoch': 0.2793599132085707, 'num_input_tokens_seen': 3837788160, 'completed': '49.63% (1_830 / 3_687)', 'remaining time': '16:35:07', 'throughput': '9723.48', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:40:20,310 >> {'loss': 0.5785, 'grad_norm': 17.598939895629883, 'learning_rate': 5.923246271244127e-07, 'epoch': 0.279631136425278, 'num_input_tokens_seen': 3839885312, 'completed': '49.66% (1_831 / 3_687)', 'remaining time': '16:34:32', 'throughput': '8658.22', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:40:53,408 >> {'loss': 0.4257, 'grad_norm': 18.117116928100586, 'learning_rate': 5.919227113103508e-07, 'epoch': 0.27990235964198534, 'num_input_tokens_seen': 3841982464, 'completed': '49.69% (1_832 / 3_687)', 'remaining time': '16:34:01', 'throughput': '7920.29', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:41:22,719 >> {'loss': 0.279, 'grad_norm': 12.194579124450684, 'learning_rate': 5.915207617584858e-07, 'epoch': 0.2801735828586927, 'num_input_tokens_seen': 3844079616, 'completed': '49.72% (1_833 / 3_687)', 'remaining time': '16:33:24', 'throughput': '8943.53', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:41:53,504 >> {'loss': 0.4616, 'grad_norm': 15.989289283752441, 'learning_rate': 5.911187787922918e-07, 'epoch': 0.28044480607540007, 'num_input_tokens_seen': 3846176768, 'completed': '49.74% (1_834 / 3_687)', 'remaining time': '16:32:49', 'throughput': '8515.08', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:42:27,366 >> {'loss': 0.3184, 'grad_norm': 14.382404327392578, 'learning_rate': 5.907167627352697e-07, 'epoch': 0.2807160292921074, 'num_input_tokens_seen': 3848273920, 'completed': '49.77% (1_835 / 3_687)', 'remaining time': '16:32:20', 'throughput': '7741.57', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:42:58,160 >> {'loss': 0.4872, 'grad_norm': 15.597938537597656, 'learning_rate': 5.90314713910946e-07, 'epoch': 0.28098725250881473, 'num_input_tokens_seen': 3850371072, 'completed': '49.80% (1_836 / 3_687)', 'remaining time': '16:31:46', 'throughput': '8512.92', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:43:27,040 >> {'loss': 0.4539, 'grad_norm': 14.058372497558594, 'learning_rate': 5.899126326428746e-07, 'epoch': 0.2812584757255221, 'num_input_tokens_seen': 3852468224, 'completed': '49.82% (1_837 / 3_687)', 'remaining time': '16:31:08', 'throughput': '9077.10', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:43:56,305 >> {'loss': 0.742, 'grad_norm': 23.597742080688477, 'learning_rate': 5.895105192546353e-07, 'epoch': 0.28152969894222946, 'num_input_tokens_seen': 3854565376, 'completed': '49.85% (1_838 / 3_687)', 'remaining time': '16:30:31', 'throughput': '8957.59', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:44:26,254 >> {'loss': 0.243, 'grad_norm': 10.927760124206543, 'learning_rate': 5.891083740698337e-07, 'epoch': 0.2818009221589368, 'num_input_tokens_seen': 3856662528, 'completed': '49.88% (1_839 / 3_687)', 'remaining time': '16:29:55', 'throughput': '8752.93', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:44:59,172 >> {'loss': 0.4831, 'grad_norm': 19.77141571044922, 'learning_rate': 5.887061974121007e-07, 'epoch': 0.2820721453756442, 'num_input_tokens_seen': 3858759680, 'completed': '49.91% (1_840 / 3_687)', 'remaining time': '16:29:24', 'throughput': '7963.42', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:45:29,194 >> {'loss': 0.6657, 'grad_norm': 18.980165481567383, 'learning_rate': 5.883039896050928e-07, 'epoch': 0.2823433685923515, 'num_input_tokens_seen': 3860856832, 'completed': '49.93% (1_841 / 3_687)', 'remaining time': '16:28:48', 'throughput': '8731.72', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:46:00,539 >> {'loss': 0.4045, 'grad_norm': 14.606121063232422, 'learning_rate': 5.879017509724919e-07, 'epoch': 0.28261459180905885, 'num_input_tokens_seen': 3862953984, 'completed': '49.96% (1_842 / 3_687)', 'remaining time': '16:28:14', 'throughput': '8363.28', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:46:32,693 >> {'loss': 0.5391, 'grad_norm': 18.364437103271484, 'learning_rate': 5.874994818380039e-07, 'epoch': 0.2828858150257662, 'num_input_tokens_seen': 3865051136, 'completed': '49.99% (1_843 / 3_687)', 'remaining time': '16:27:42', 'throughput': '8152.85', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:47:05,464 >> {'loss': 0.4491, 'grad_norm': 16.989797592163086, 'learning_rate': 5.8709718252536e-07, 'epoch': 0.28315703824247357, 'num_input_tokens_seen': 3867148288, 'completed': '50.01% (1_844 / 3_687)', 'remaining time': '16:27:11', 'throughput': '7999.25', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:47:34,705 >> {'loss': 0.338, 'grad_norm': 14.282411575317383, 'learning_rate': 5.866948533583152e-07, 'epoch': 0.28342826145918093, 'num_input_tokens_seen': 3869245440, 'completed': '50.04% (1_845 / 3_687)', 'remaining time': '16:26:34', 'throughput': '8964.86', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:48:04,495 >> {'loss': 0.2293, 'grad_norm': 11.778130531311035, 'learning_rate': 5.862924946606487e-07, 'epoch': 0.28369948467588824, 'num_input_tokens_seen': 3871342592, 'completed': '50.07% (1_846 / 3_687)', 'remaining time': '16:25:58', 'throughput': '8799.74', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:48:37,247 >> {'loss': 0.4903, 'grad_norm': 17.8021297454834, 'learning_rate': 5.858901067561637e-07, 'epoch': 0.2839707078925956, 'num_input_tokens_seen': 3873439744, 'completed': '50.09% (1_847 / 3_687)', 'remaining time': '16:25:27', 'throughput': '8003.91', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:49:08,297 >> {'loss': 0.8826, 'grad_norm': 21.23961067199707, 'learning_rate': 5.854876899686864e-07, 'epoch': 0.28424193110930296, 'num_input_tokens_seen': 3875536896, 'completed': '50.12% (1_848 / 3_687)', 'remaining time': '16:24:53', 'throughput': '8442.69', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:49:36,784 >> {'loss': 0.6914, 'grad_norm': 21.922151565551758, 'learning_rate': 5.850852446220666e-07, 'epoch': 0.2845131543260103, 'num_input_tokens_seen': 3877634048, 'completed': '50.15% (1_849 / 3_687)', 'remaining time': '16:24:14', 'throughput': '9202.19', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:50:05,884 >> {'loss': 0.3711, 'grad_norm': 15.679913520812988, 'learning_rate': 5.84682771040177e-07, 'epoch': 0.2847843775427177, 'num_input_tokens_seen': 3879731200, 'completed': '50.18% (1_850 / 3_687)', 'remaining time': '16:23:37', 'throughput': '9008.31', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:50:37,762 >> {'loss': 0.3313, 'grad_norm': 12.335871696472168, 'learning_rate': 5.842802695469131e-07, 'epoch': 0.285055600759425, 'num_input_tokens_seen': 3881828352, 'completed': '50.20% (1_851 / 3_687)', 'remaining time': '16:23:04', 'throughput': '8223.52', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:51:07,502 >> {'loss': 0.3604, 'grad_norm': 13.751896858215332, 'learning_rate': 5.838777404661927e-07, 'epoch': 0.28532682397613235, 'num_input_tokens_seen': 3883925504, 'completed': '50.23% (1_852 / 3_687)', 'remaining time': '16:22:28', 'throughput': '8814.37', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:51:35,129 >> {'loss': 0.6989, 'grad_norm': 17.7875919342041, 'learning_rate': 5.83475184121956e-07, 'epoch': 0.2855980471928397, 'num_input_tokens_seen': 3886022656, 'completed': '50.26% (1_853 / 3_687)', 'remaining time': '16:21:48', 'throughput': '9488.68', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:52:06,094 >> {'loss': 0.6116, 'grad_norm': 21.652841567993164, 'learning_rate': 5.830726008381648e-07, 'epoch': 0.28586927040954707, 'num_input_tokens_seen': 3888119808, 'completed': '50.28% (1_854 / 3_687)', 'remaining time': '16:21:14', 'throughput': '8465.75', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:52:36,090 >> {'loss': 0.4374, 'grad_norm': 17.218547821044922, 'learning_rate': 5.826699909388031e-07, 'epoch': 0.28614049362625443, 'num_input_tokens_seen': 3890216960, 'completed': '50.31% (1_855 / 3_687)', 'remaining time': '16:20:38', 'throughput': '8739.43', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:53:07,944 >> {'loss': 0.5781, 'grad_norm': 16.56511116027832, 'learning_rate': 5.822673547478757e-07, 'epoch': 0.28641171684296174, 'num_input_tokens_seen': 3892314112, 'completed': '50.34% (1_856 / 3_687)', 'remaining time': '16:20:05', 'throughput': '8229.42', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:53:39,898 >> {'loss': 0.6474, 'grad_norm': 18.990068435668945, 'learning_rate': 5.818646925894092e-07, 'epoch': 0.2866829400596691, 'num_input_tokens_seen': 3894411264, 'completed': '50.37% (1_857 / 3_687)', 'remaining time': '16:19:33', 'throughput': '8203.90', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:54:11,311 >> {'loss': 0.2562, 'grad_norm': 9.938685417175293, 'learning_rate': 5.814620047874505e-07, 'epoch': 0.28695416327637646, 'num_input_tokens_seen': 3896508416, 'completed': '50.39% (1_858 / 3_687)', 'remaining time': '16:19:00', 'throughput': '8345.13', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:54:41,235 >> {'loss': 0.4856, 'grad_norm': 16.420949935913086, 'learning_rate': 5.810592916660677e-07, 'epoch': 0.2872253864930838, 'num_input_tokens_seen': 3898605568, 'completed': '50.42% (1_859 / 3_687)', 'remaining time': '16:18:24', 'throughput': '8760.16', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:55:12,301 >> {'loss': 0.3751, 'grad_norm': 17.336559295654297, 'learning_rate': 5.806565535493489e-07, 'epoch': 0.2874966097097912, 'num_input_tokens_seen': 3900702720, 'completed': '50.45% (1_860 / 3_687)', 'remaining time': '16:17:50', 'throughput': '8439.32', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:55:43,127 >> {'loss': 0.3521, 'grad_norm': 12.965970039367676, 'learning_rate': 5.802537907614023e-07, 'epoch': 0.2877678329264985, 'num_input_tokens_seen': 3902799872, 'completed': '50.47% (1_861 / 3_687)', 'remaining time': '16:17:15', 'throughput': '8503.01', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:56:13,576 >> {'loss': 0.7629, 'grad_norm': 20.108755111694336, 'learning_rate': 5.798510036263561e-07, 'epoch': 0.28803905614320585, 'num_input_tokens_seen': 3904897024, 'completed': '50.50% (1_862 / 3_687)', 'remaining time': '16:16:40', 'throughput': '8609.27', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:56:42,649 >> {'loss': 0.5382, 'grad_norm': 15.406062126159668, 'learning_rate': 5.794481924683581e-07, 'epoch': 0.2883102793599132, 'num_input_tokens_seen': 3906994176, 'completed': '50.53% (1_863 / 3_687)', 'remaining time': '16:16:03', 'throughput': '9016.88', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:57:15,971 >> {'loss': 0.4859, 'grad_norm': 21.75050163269043, 'learning_rate': 5.790453576115756e-07, 'epoch': 0.2885815025766206, 'num_input_tokens_seen': 3909091328, 'completed': '50.56% (1_864 / 3_687)', 'remaining time': '16:15:33', 'throughput': '7867.05', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:57:46,769 >> {'loss': 0.2867, 'grad_norm': 11.159966468811035, 'learning_rate': 5.786424993801942e-07, 'epoch': 0.28885272579332794, 'num_input_tokens_seen': 3911188480, 'completed': '50.58% (1_865 / 3_687)', 'remaining time': '16:14:59', 'throughput': '8511.71', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:58:19,411 >> {'loss': 0.3638, 'grad_norm': 16.282310485839844, 'learning_rate': 5.782396180984194e-07, 'epoch': 0.28912394901003524, 'num_input_tokens_seen': 3913285632, 'completed': '50.61% (1_866 / 3_687)', 'remaining time': '16:14:28', 'throughput': '8030.84', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:58:50,984 >> {'loss': 0.4206, 'grad_norm': 15.254396438598633, 'learning_rate': 5.778367140904746e-07, 'epoch': 0.2893951722267426, 'num_input_tokens_seen': 3915382784, 'completed': '50.64% (1_867 / 3_687)', 'remaining time': '16:13:55', 'throughput': '8302.83', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:59:23,080 >> {'loss': 0.5837, 'grad_norm': 18.541379928588867, 'learning_rate': 5.774337876806016e-07, 'epoch': 0.28966639544344996, 'num_input_tokens_seen': 3917479936, 'completed': '50.66% (1_868 / 3_687)', 'remaining time': '16:13:22', 'throughput': '8167.46', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 00:59:53,540 >> {'loss': 0.5548, 'grad_norm': 20.998781204223633, 'learning_rate': 5.770308391930601e-07, 'epoch': 0.2899376186601573, 'num_input_tokens_seen': 3919577088, 'completed': '50.69% (1_869 / 3_687)', 'remaining time': '16:12:48', 'throughput': '8606.09', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:00:24,165 >> {'loss': 0.428, 'grad_norm': 14.681344985961914, 'learning_rate': 5.766278689521278e-07, 'epoch': 0.2902088418768647, 'num_input_tokens_seen': 3921674240, 'completed': '50.72% (1_870 / 3_687)', 'remaining time': '16:12:13', 'throughput': '8559.95', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:00:53,018 >> {'loss': 0.3078, 'grad_norm': 11.808730125427246, 'learning_rate': 5.762248772820999e-07, 'epoch': 0.290480065093572, 'num_input_tokens_seen': 3923771392, 'completed': '50.75% (1_871 / 3_687)', 'remaining time': '16:11:35', 'throughput': '9085.28', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:01:24,030 >> {'loss': 0.3758, 'grad_norm': 12.644157409667969, 'learning_rate': 5.758218645072887e-07, 'epoch': 0.29075128831027935, 'num_input_tokens_seen': 3925868544, 'completed': '50.77% (1_872 / 3_687)', 'remaining time': '16:11:01', 'throughput': '8453.17', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:01:52,895 >> {'loss': 0.5625, 'grad_norm': 18.32072639465332, 'learning_rate': 5.754188309520235e-07, 'epoch': 0.2910225115269867, 'num_input_tokens_seen': 3927965696, 'completed': '50.80% (1_873 / 3_687)', 'remaining time': '16:10:24', 'throughput': '9081.67', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:02:21,630 >> {'loss': 0.6177, 'grad_norm': 20.193140029907227, 'learning_rate': 5.750157769406504e-07, 'epoch': 0.2912937347436941, 'num_input_tokens_seen': 3930062848, 'completed': '50.83% (1_874 / 3_687)', 'remaining time': '16:09:46', 'throughput': '9122.64', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:02:50,497 >> {'loss': 0.7787, 'grad_norm': 20.246875762939453, 'learning_rate': 5.74612702797532e-07, 'epoch': 0.2915649579604014, 'num_input_tokens_seen': 3932160000, 'completed': '50.85% (1_875 / 3_687)', 'remaining time': '16:09:08', 'throughput': '9081.18', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:03:17,710 >> {'loss': 0.7121, 'grad_norm': 44.87726974487305, 'learning_rate': 5.74209608847047e-07, 'epoch': 0.29183618117710874, 'num_input_tokens_seen': 3934257152, 'completed': '50.88% (1_876 / 3_687)', 'remaining time': '16:08:28', 'throughput': '9632.91', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:03:47,425 >> {'loss': 0.6667, 'grad_norm': 22.28316879272461, 'learning_rate': 5.7380649541359e-07, 'epoch': 0.2921074043938161, 'num_input_tokens_seen': 3936354304, 'completed': '50.91% (1_877 / 3_687)', 'remaining time': '16:07:52', 'throughput': '8821.97', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:04:17,483 >> {'loss': 0.3718, 'grad_norm': 12.953632354736328, 'learning_rate': 5.734033628215714e-07, 'epoch': 0.29237862761052347, 'num_input_tokens_seen': 3938451456, 'completed': '50.94% (1_878 / 3_687)', 'remaining time': '16:07:17', 'throughput': '8721.50', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:04:48,857 >> {'loss': 0.6267, 'grad_norm': 18.659269332885742, 'learning_rate': 5.730002113954169e-07, 'epoch': 0.29264985082723083, 'num_input_tokens_seen': 3940548608, 'completed': '50.96% (1_879 / 3_687)', 'remaining time': '16:06:43', 'throughput': '8355.36', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:05:19,595 >> {'loss': 0.7124, 'grad_norm': 24.658376693725586, 'learning_rate': 5.725970414595675e-07, 'epoch': 0.29292107404393813, 'num_input_tokens_seen': 3942645760, 'completed': '50.99% (1_880 / 3_687)', 'remaining time': '16:06:09', 'throughput': '8528.26', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:05:48,903 >> {'loss': 0.5809, 'grad_norm': 15.565204620361328, 'learning_rate': 5.721938533384791e-07, 'epoch': 0.2931922972606455, 'num_input_tokens_seen': 3944742912, 'completed': '51.02% (1_881 / 3_687)', 'remaining time': '16:05:32', 'throughput': '8944.48', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:06:19,440 >> {'loss': 0.4206, 'grad_norm': 16.507081985473633, 'learning_rate': 5.717906473566219e-07, 'epoch': 0.29346352047735286, 'num_input_tokens_seen': 3946840064, 'completed': '51.04% (1_882 / 3_687)', 'remaining time': '16:04:58', 'throughput': '8584.44', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:06:49,039 >> {'loss': 0.6355, 'grad_norm': 19.66336441040039, 'learning_rate': 5.71387423838481e-07, 'epoch': 0.2937347436940602, 'num_input_tokens_seen': 3948937216, 'completed': '51.07% (1_883 / 3_687)', 'remaining time': '16:04:21', 'throughput': '8856.64', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:07:19,866 >> {'loss': 0.3339, 'grad_norm': 13.438267707824707, 'learning_rate': 5.709841831085553e-07, 'epoch': 0.2940059669107676, 'num_input_tokens_seen': 3951034368, 'completed': '51.10% (1_884 / 3_687)', 'remaining time': '16:03:47', 'throughput': '8503.59', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:07:51,507 >> {'loss': 0.3658, 'grad_norm': 12.036945343017578, 'learning_rate': 5.705809254913576e-07, 'epoch': 0.2942771901274749, 'num_input_tokens_seen': 3953131520, 'completed': '51.13% (1_885 / 3_687)', 'remaining time': '16:03:14', 'throughput': '8284.88', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:08:19,303 >> {'loss': 0.6545, 'grad_norm': 19.046716690063477, 'learning_rate': 5.70177651311414e-07, 'epoch': 0.29454841334418225, 'num_input_tokens_seen': 3955228672, 'completed': '51.15% (1_886 / 3_687)', 'remaining time': '16:02:35', 'throughput': '9431.01', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:08:51,718 >> {'loss': 0.3465, 'grad_norm': 16.657411575317383, 'learning_rate': 5.697743608932646e-07, 'epoch': 0.2948196365608896, 'num_input_tokens_seen': 3957325824, 'completed': '51.18% (1_887 / 3_687)', 'remaining time': '16:02:04', 'throughput': '8087.16', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:09:21,507 >> {'loss': 1.0207, 'grad_norm': 28.270238876342773, 'learning_rate': 5.693710545614621e-07, 'epoch': 0.29509085977759697, 'num_input_tokens_seen': 3959422976, 'completed': '51.21% (1_888 / 3_687)', 'remaining time': '16:01:28', 'throughput': '8800.25', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:09:50,136 >> {'loss': 0.6864, 'grad_norm': 18.105026245117188, 'learning_rate': 5.689677326405719e-07, 'epoch': 0.29536208299430433, 'num_input_tokens_seen': 3961520128, 'completed': '51.23% (1_889 / 3_687)', 'remaining time': '16:00:50', 'throughput': '9156.51', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:10:20,526 >> {'loss': 0.8733, 'grad_norm': 22.831960678100586, 'learning_rate': 5.685643954551722e-07, 'epoch': 0.29563330621101164, 'num_input_tokens_seen': 3963617280, 'completed': '51.26% (1_890 / 3_687)', 'remaining time': '16:00:15', 'throughput': '8626.07', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:10:51,972 >> {'loss': 0.5093, 'grad_norm': 18.551103591918945, 'learning_rate': 5.681610433298535e-07, 'epoch': 0.295904529427719, 'num_input_tokens_seen': 3965714432, 'completed': '51.29% (1_891 / 3_687)', 'remaining time': '15:59:42', 'throughput': '8336.15', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:11:24,618 >> {'loss': 0.4713, 'grad_norm': 15.365251541137695, 'learning_rate': 5.677576765892182e-07, 'epoch': 0.29617575264442636, 'num_input_tokens_seen': 3967811584, 'completed': '51.32% (1_892 / 3_687)', 'remaining time': '15:59:11', 'throughput': '8030.02', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:11:54,766 >> {'loss': 0.4017, 'grad_norm': 16.42351722717285, 'learning_rate': 5.673542955578806e-07, 'epoch': 0.2964469758611337, 'num_input_tokens_seen': 3969908736, 'completed': '51.34% (1_893 / 3_687)', 'remaining time': '15:58:36', 'throughput': '8695.15', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:12:25,203 >> {'loss': 0.3471, 'grad_norm': 15.9588623046875, 'learning_rate': 5.669509005604663e-07, 'epoch': 0.2967181990778411, 'num_input_tokens_seen': 3972005888, 'completed': '51.37% (1_894 / 3_687)', 'remaining time': '15:58:01', 'throughput': '8612.55', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:12:55,073 >> {'loss': 0.7455, 'grad_norm': 21.854177474975586, 'learning_rate': 5.665474919216122e-07, 'epoch': 0.2969894222945484, 'num_input_tokens_seen': 3974103040, 'completed': '51.40% (1_895 / 3_687)', 'remaining time': '15:57:26', 'throughput': '8776.36', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:13:25,290 >> {'loss': 0.4565, 'grad_norm': 15.875153541564941, 'learning_rate': 5.661440699659663e-07, 'epoch': 0.29726064551125575, 'num_input_tokens_seen': 3976200192, 'completed': '51.42% (1_896 / 3_687)', 'remaining time': '15:56:51', 'throughput': '8675.28', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:13:54,176 >> {'loss': 0.3644, 'grad_norm': 13.967791557312012, 'learning_rate': 5.657406350181872e-07, 'epoch': 0.2975318687279631, 'num_input_tokens_seen': 3978297344, 'completed': '51.45% (1_897 / 3_687)', 'remaining time': '15:56:13', 'throughput': '9075.24', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:14:24,610 >> {'loss': 0.7488, 'grad_norm': 22.632991790771484, 'learning_rate': 5.65337187402944e-07, 'epoch': 0.2978030919446705, 'num_input_tokens_seen': 3980394496, 'completed': '51.48% (1_898 / 3_687)', 'remaining time': '15:55:39', 'throughput': '8613.54', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:14:57,226 >> {'loss': 0.5608, 'grad_norm': 35.31967544555664, 'learning_rate': 5.64933727444916e-07, 'epoch': 0.29807431516137783, 'num_input_tokens_seen': 3982491648, 'completed': '51.51% (1_899 / 3_687)', 'remaining time': '15:55:08', 'throughput': '8037.24', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:15:27,162 >> {'loss': 0.3702, 'grad_norm': 13.531403541564941, 'learning_rate': 5.645302554687925e-07, 'epoch': 0.29834553837808514, 'num_input_tokens_seen': 3984588800, 'completed': '51.53% (1_900 / 3_687)', 'remaining time': '15:54:32', 'throughput': '8756.63', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:15:57,748 >> {'loss': 0.4089, 'grad_norm': 15.457035064697266, 'learning_rate': 5.641267717992723e-07, 'epoch': 0.2986167615947925, 'num_input_tokens_seen': 3986685952, 'completed': '51.56% (1_901 / 3_687)', 'remaining time': '15:53:58', 'throughput': '8570.82', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:16:29,760 >> {'loss': 0.4136, 'grad_norm': 15.16627311706543, 'learning_rate': 5.637232767610637e-07, 'epoch': 0.29888798481149986, 'num_input_tokens_seen': 3988783104, 'completed': '51.59% (1_902 / 3_687)', 'remaining time': '15:53:26', 'throughput': '8188.88', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:17:02,471 >> {'loss': 0.4861, 'grad_norm': 15.52268123626709, 'learning_rate': 5.63319770678884e-07, 'epoch': 0.2991592080282072, 'num_input_tokens_seen': 3990880256, 'completed': '51.61% (1_903 / 3_687)', 'remaining time': '15:52:55', 'throughput': '8014.06', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:17:34,357 >> {'loss': 0.3985, 'grad_norm': 14.946885108947754, 'learning_rate': 5.6291625387746e-07, 'epoch': 0.2994304312449146, 'num_input_tokens_seen': 3992977408, 'completed': '51.64% (1_904 / 3_687)', 'remaining time': '15:52:22', 'throughput': '8221.17', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:18:03,808 >> {'loss': 0.3807, 'grad_norm': 18.582761764526367, 'learning_rate': 5.625127266815263e-07, 'epoch': 0.2997016544616219, 'num_input_tokens_seen': 3995074560, 'completed': '51.67% (1_905 / 3_687)', 'remaining time': '15:51:46', 'throughput': '8901.22', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:18:35,236 >> {'loss': 0.3393, 'grad_norm': 17.98885726928711, 'learning_rate': 5.621091894158261e-07, 'epoch': 0.29997287767832925, 'num_input_tokens_seen': 3997171712, 'completed': '51.70% (1_906 / 3_687)', 'remaining time': '15:51:13', 'throughput': '8340.90', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:19:02,755 >> {'loss': 0.5337, 'grad_norm': 17.400163650512695, 'learning_rate': 5.617056424051113e-07, 'epoch': 0.3002441008950366, 'num_input_tokens_seen': 3999268864, 'completed': '51.72% (1_907 / 3_687)', 'remaining time': '15:50:34', 'throughput': '9526.21', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:19:34,580 >> {'loss': 0.4942, 'grad_norm': 20.89246368408203, 'learning_rate': 5.613020859741408e-07, 'epoch': 0.300515324111744, 'num_input_tokens_seen': 4001366016, 'completed': '51.75% (1_908 / 3_687)', 'remaining time': '15:50:01', 'throughput': '8236.96', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:20:08,771 >> {'loss': 0.6841, 'grad_norm': 22.6678524017334, 'learning_rate': 5.608985204476817e-07, 'epoch': 0.30078654732845134, 'num_input_tokens_seen': 4003463168, 'completed': '51.78% (1_909 / 3_687)', 'remaining time': '15:49:33', 'throughput': '7667.13', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:20:41,786 >> {'loss': 0.5315, 'grad_norm': 29.596494674682617, 'learning_rate': 5.60494946150508e-07, 'epoch': 0.30105777054515864, 'num_input_tokens_seen': 4005560320, 'completed': '51.80% (1_910 / 3_687)', 'remaining time': '15:49:02', 'throughput': '7940.15', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:21:13,288 >> {'loss': 0.4301, 'grad_norm': 16.15056037902832, 'learning_rate': 5.600913634074009e-07, 'epoch': 0.301328993761866, 'num_input_tokens_seen': 4007657472, 'completed': '51.83% (1_911 / 3_687)', 'remaining time': '15:48:29', 'throughput': '8321.40', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:21:43,918 >> {'loss': 0.4657, 'grad_norm': 16.99407196044922, 'learning_rate': 5.596877725431487e-07, 'epoch': 0.30160021697857337, 'num_input_tokens_seen': 4009754624, 'completed': '51.86% (1_912 / 3_687)', 'remaining time': '15:47:55', 'throughput': '8558.38', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:22:14,060 >> {'loss': 0.313, 'grad_norm': 13.268138885498047, 'learning_rate': 5.592841738825457e-07, 'epoch': 0.3018714401952807, 'num_input_tokens_seen': 4011851776, 'completed': '51.89% (1_913 / 3_687)', 'remaining time': '15:47:20', 'throughput': '8696.91', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:22:43,982 >> {'loss': 0.3853, 'grad_norm': 16.610029220581055, 'learning_rate': 5.588805677503928e-07, 'epoch': 0.3021426634119881, 'num_input_tokens_seen': 4013948928, 'completed': '51.91% (1_914 / 3_687)', 'remaining time': '15:46:44', 'throughput': '8760.95', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:23:19,793 >> {'loss': 0.4144, 'grad_norm': 14.709389686584473, 'learning_rate': 5.58476954471497e-07, 'epoch': 0.3024138866286954, 'num_input_tokens_seen': 4016046080, 'completed': '51.94% (1_915 / 3_687)', 'remaining time': '15:46:18', 'throughput': '7320.14', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:23:51,379 >> {'loss': 0.6721, 'grad_norm': 18.49999237060547, 'learning_rate': 5.580733343706708e-07, 'epoch': 0.30268510984540276, 'num_input_tokens_seen': 4018143232, 'completed': '51.97% (1_916 / 3_687)', 'remaining time': '15:45:46', 'throughput': '8299.56', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:24:22,240 >> {'loss': 0.4901, 'grad_norm': 17.968942642211914, 'learning_rate': 5.576697077727323e-07, 'epoch': 0.3029563330621101, 'num_input_tokens_seen': 4020240384, 'completed': '51.99% (1_917 / 3_687)', 'remaining time': '15:45:12', 'throughput': '8494.17', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:24:52,410 >> {'loss': 0.4983, 'grad_norm': 18.4853515625, 'learning_rate': 5.572660750025049e-07, 'epoch': 0.3032275562788175, 'num_input_tokens_seen': 4022337536, 'completed': '52.02% (1_918 / 3_687)', 'remaining time': '15:44:37', 'throughput': '8688.89', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:25:25,310 >> {'loss': 0.3819, 'grad_norm': 15.520868301391602, 'learning_rate': 5.568624363848166e-07, 'epoch': 0.30349877949552484, 'num_input_tokens_seen': 4024434688, 'completed': '52.05% (1_919 / 3_687)', 'remaining time': '15:44:06', 'throughput': '7967.96', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:25:56,294 >> {'loss': 0.2858, 'grad_norm': 10.888456344604492, 'learning_rate': 5.564587922445008e-07, 'epoch': 0.30377000271223215, 'num_input_tokens_seen': 4026531840, 'completed': '52.07% (1_920 / 3_687)', 'remaining time': '15:43:32', 'throughput': '8460.63', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:26:30,073 >> {'loss': 0.3409, 'grad_norm': 13.872169494628906, 'learning_rate': 5.560551429063949e-07, 'epoch': 0.3040412259289395, 'num_input_tokens_seen': 4028628992, 'completed': '52.10% (1_921 / 3_687)', 'remaining time': '15:43:03', 'throughput': '7760.66', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:26:58,298 >> {'loss': 0.7254, 'grad_norm': 22.901479721069336, 'learning_rate': 5.556514886953403e-07, 'epoch': 0.30431244914564687, 'num_input_tokens_seen': 4030726144, 'completed': '52.13% (1_922 / 3_687)', 'remaining time': '15:42:25', 'throughput': '9287.51', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:27:30,150 >> {'loss': 0.4352, 'grad_norm': 15.373479843139648, 'learning_rate': 5.552478299361826e-07, 'epoch': 0.30458367236235423, 'num_input_tokens_seen': 4032823296, 'completed': '52.16% (1_923 / 3_687)', 'remaining time': '15:41:53', 'throughput': '8230.05', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:28:00,922 >> {'loss': 0.6943, 'grad_norm': 17.427900314331055, 'learning_rate': 5.548441669537712e-07, 'epoch': 0.3048548955790616, 'num_input_tokens_seen': 4034920448, 'completed': '52.18% (1_924 / 3_687)', 'remaining time': '15:41:19', 'throughput': '8518.99', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:28:30,047 >> {'loss': 0.5922, 'grad_norm': 20.171907424926758, 'learning_rate': 5.544405000729584e-07, 'epoch': 0.3051261187957689, 'num_input_tokens_seen': 4037017600, 'completed': '52.21% (1_925 / 3_687)', 'remaining time': '15:40:42', 'throughput': '9000.74', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:28:59,315 >> {'loss': 0.4632, 'grad_norm': 15.804996490478516, 'learning_rate': 5.540368296186002e-07, 'epoch': 0.30539734201247626, 'num_input_tokens_seen': 4039114752, 'completed': '52.24% (1_926 / 3_687)', 'remaining time': '15:40:06', 'throughput': '8956.51', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:29:29,123 >> {'loss': 0.3587, 'grad_norm': 13.54218864440918, 'learning_rate': 5.53633155915555e-07, 'epoch': 0.3056685652291836, 'num_input_tokens_seen': 4041211904, 'completed': '52.26% (1_927 / 3_687)', 'remaining time': '15:39:30', 'throughput': '8794.61', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:30:00,195 >> {'loss': 0.8946, 'grad_norm': 24.575578689575195, 'learning_rate': 5.532294792886843e-07, 'epoch': 0.305939788445891, 'num_input_tokens_seen': 4043309056, 'completed': '52.29% (1_928 / 3_687)', 'remaining time': '15:38:57', 'throughput': '8436.64', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 01:30:31,640 >> {'loss': 0.4468, 'grad_norm': 18.244264602661133, 'learning_rate': 5.528258000628518e-07, 'epoch':
0.30621101166259834, 'num_input_tokens_seen': 4045406208, 'completed': '52.32% (1_929 / 3_687)', 'remaining time': '15:38:24', 'throughput': '8336.50', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:30:59,061 >> {'loss': 0.5916, 'grad_norm': 18.432344436645508, 'learning_rate': 5.524221185629224e-07, 'epoch': 0.30648223487930565, 'num_input_tokens_seen': 4047503360, 'completed': '52.35% (1_930 / 3_687)', 'remaining time': '15:37:45', 'throughput': '9559.85', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:31:29,188 >> {'loss': 0.428, 'grad_norm': 14.628120422363281, 'learning_rate': 5.520184351137646e-07, 'epoch': 0.306753458096013, 'num_input_tokens_seen': 4049600512, 'completed': '52.37% (1_931 / 3_687)', 'remaining time': '15:37:10', 'throughput': '8701.52', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:32:01,545 >> {'loss': 0.3338, 'grad_norm': 15.325323104858398, 'learning_rate': 5.51614750040247e-07, 'epoch': 0.30702468131272037, 'num_input_tokens_seen': 4051697664, 'completed': '52.40% (1_932 / 3_687)', 'remaining time': '15:36:38', 'throughput': '8101.45', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:32:31,887 >> {'loss': 0.5334, 'grad_norm': 18.96306037902832, 'learning_rate': 5.512110636672396e-07, 'epoch': 0.30729590452942773, 'num_input_tokens_seen': 4053794816, 'completed': '52.43% (1_933 / 3_687)', 'remaining time': '15:36:03', 'throughput': '8639.56', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:33:02,432 >> {'loss': 0.405, 'grad_norm': 16.209426879882812, 'learning_rate': 5.508073763196146e-07, 'epoch': 0.3075671277461351, 'num_input_tokens_seen': 4055891968, 'completed': '52.45% (1_934 / 3_687)', 'remaining time': '15:35:29', 'throughput': '8582.26', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:33:32,336 >> {'loss': 0.4682, 'grad_norm': 17.50925064086914, 'learning_rate': 5.504036883222438e-07, 'epoch': 0.3078383509628424, 'num_input_tokens_seen': 
4057989120, 'completed': '52.48% (1_935 / 3_687)', 'remaining time': '15:34:54', 'throughput': '8766.26', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:34:07,142 >> {'loss': 0.6966, 'grad_norm': 21.680133819580078, 'learning_rate': 5.5e-07, 'epoch': 0.30810957417954976, 'num_input_tokens_seen': 4060086272, 'completed': '52.51% (1_936 / 3_687)', 'remaining time': '15:34:26', 'throughput': '7531.56', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:34:36,619 >> {'loss': 0.3213, 'grad_norm': 16.945390701293945, 'learning_rate': 5.495963116777562e-07, 'epoch': 0.3083807973962571, 'num_input_tokens_seen': 4062183424, 'completed': '52.54% (1_937 / 3_687)', 'remaining time': '15:33:50', 'throughput': '8893.10', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:35:07,271 >> {'loss': 0.6936, 'grad_norm': 19.299571990966797, 'learning_rate': 5.491926236803854e-07, 'epoch': 0.3086520206129645, 'num_input_tokens_seen': 4064280576, 'completed': '52.56% (1_938 / 3_687)', 'remaining time': '15:33:16', 'throughput': '8552.37', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:35:41,302 >> {'loss': 0.5167, 'grad_norm': 16.683467864990234, 'learning_rate': 5.487889363327603e-07, 'epoch': 0.30892324382967185, 'num_input_tokens_seen': 4066377728, 'completed': '52.59% (1_939 / 3_687)', 'remaining time': '15:32:47', 'throughput': '7703.28', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:36:14,584 >> {'loss': 0.308, 'grad_norm': 11.593538284301758, 'learning_rate': 5.483852499597532e-07, 'epoch': 0.30919446704637915, 'num_input_tokens_seen': 4068474880, 'completed': '52.62% (1_940 / 3_687)', 'remaining time': '15:32:17', 'throughput': '7876.25', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:36:45,496 >> {'loss': 0.6182, 'grad_norm': 21.42290687561035, 'learning_rate': 5.479815648862355e-07, 'epoch': 0.3094656902630865, 'num_input_tokens_seen': 4070572032, 'completed': '52.64% (1_941 / 3_687)', 'remaining 
time': '15:31:43', 'throughput': '8480.48', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:37:17,417 >> {'loss': 0.2313, 'grad_norm': 10.256348609924316, 'learning_rate': 5.475778814370776e-07, 'epoch': 0.3097369134797939, 'num_input_tokens_seen': 4072669184, 'completed': '52.67% (1_942 / 3_687)', 'remaining time': '15:31:11', 'throughput': '8212.25', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:37:48,371 >> {'loss': 0.3522, 'grad_norm': 12.884416580200195, 'learning_rate': 5.471741999371483e-07, 'epoch': 0.31000813669650124, 'num_input_tokens_seen': 4074766336, 'completed': '52.70% (1_943 / 3_687)', 'remaining time': '15:30:38', 'throughput': '8468.74', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:38:17,584 >> {'loss': 0.3817, 'grad_norm': 15.000761032104492, 'learning_rate': 5.467705207113156e-07, 'epoch': 0.3102793599132086, 'num_input_tokens_seen': 4076863488, 'completed': '52.73% (1_944 / 3_687)', 'remaining time': '15:30:01', 'throughput': '8973.49', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:38:47,899 >> {'loss': 0.5017, 'grad_norm': 18.647953033447266, 'learning_rate': 5.46366844084445e-07, 'epoch': 0.3105505831299159, 'num_input_tokens_seen': 4078960640, 'completed': '52.75% (1_945 / 3_687)', 'remaining time': '15:29:27', 'throughput': '8647.24', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:39:15,118 >> {'loss': 0.4558, 'grad_norm': 16.23006820678711, 'learning_rate': 5.459631703813998e-07, 'epoch': 0.31082180634662326, 'num_input_tokens_seen': 4081057792, 'completed': '52.78% (1_946 / 3_687)', 'remaining time': '15:28:47', 'throughput': '9630.91', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:39:45,536 >> {'loss': 0.3032, 'grad_norm': 14.229667663574219, 'learning_rate': 5.455594999270416e-07, 'epoch': 0.3110930295633306, 'num_input_tokens_seen': 4083154944, 'completed': '52.81% (1_947 / 3_687)', 'remaining time': '15:28:13', 'throughput': '8618.11', 
'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:40:17,046 >> {'loss': 0.5615, 'grad_norm': 22.243261337280273, 'learning_rate': 5.451558330462289e-07, 'epoch': 0.311364252780038, 'num_input_tokens_seen': 4085252096, 'completed': '52.83% (1_948 / 3_687)', 'remaining time': '15:27:40', 'throughput': '8319.42', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:40:46,459 >> {'loss': 0.4128, 'grad_norm': 16.879133224487305, 'learning_rate': 5.447521700638174e-07, 'epoch': 0.31163547599674535, 'num_input_tokens_seen': 4087349248, 'completed': '52.86% (1_949 / 3_687)', 'remaining time': '15:27:04', 'throughput': '8912.63', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:41:17,013 >> {'loss': 0.4147, 'grad_norm': 17.294158935546875, 'learning_rate': 5.443485113046597e-07, 'epoch': 0.31190669921345265, 'num_input_tokens_seen': 4089446400, 'completed': '52.89% (1_950 / 3_687)', 'remaining time': '15:26:30', 'throughput': '8579.75', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:41:46,907 >> {'loss': 0.3165, 'grad_norm': 12.599308967590332, 'learning_rate': 5.439448570936052e-07, 'epoch': 0.31217792243016, 'num_input_tokens_seen': 4091543552, 'completed': '52.92% (1_951 / 3_687)', 'remaining time': '15:25:55', 'throughput': '8769.13', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:42:16,783 >> {'loss': 0.4091, 'grad_norm': 16.82218360900879, 'learning_rate': 5.435412077554991e-07, 'epoch': 0.3124491456468674, 'num_input_tokens_seen': 4093640704, 'completed': '52.94% (1_952 / 3_687)', 'remaining time': '15:25:20', 'throughput': '8774.31', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:42:45,757 >> {'loss': 0.4083, 'grad_norm': 15.809752464294434, 'learning_rate': 5.431375636151833e-07, 'epoch': 0.31272036886357474, 'num_input_tokens_seen': 4095737856, 'completed': '52.97% (1_953 / 3_687)', 'remaining time': '15:24:43', 'throughput': '9047.65', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 
2024-12-26 01:43:15,094 >> {'loss': 0.8117, 'grad_norm': 24.349023818969727, 'learning_rate': 5.427339249974952e-07, 'epoch': 0.3129915920802821, 'num_input_tokens_seen': 4097835008, 'completed': '53.00% (1_954 / 3_687)', 'remaining time': '15:24:07', 'throughput': '8935.57', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:43:45,733 >> {'loss': 0.4125, 'grad_norm': 15.348997116088867, 'learning_rate': 5.423302922272677e-07, 'epoch': 0.3132628152969894, 'num_input_tokens_seen': 4099932160, 'completed': '53.02% (1_955 / 3_687)', 'remaining time': '15:23:33', 'throughput': '8555.93', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:44:17,696 >> {'loss': 0.6195, 'grad_norm': 21.631223678588867, 'learning_rate': 5.419266656293293e-07, 'epoch': 0.31353403851369677, 'num_input_tokens_seen': 4102029312, 'completed': '53.05% (1_956 / 3_687)', 'remaining time': '15:23:01', 'throughput': '8201.25', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:44:46,115 >> {'loss': 0.4847, 'grad_norm': 17.50019645690918, 'learning_rate': 5.415230455285031e-07, 'epoch': 0.31380526173040413, 'num_input_tokens_seen': 4104126464, 'completed': '53.08% (1_957 / 3_687)', 'remaining time': '15:22:24', 'throughput': '9224.39', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:45:16,497 >> {'loss': 0.3183, 'grad_norm': 12.668447494506836, 'learning_rate': 5.411194322496073e-07, 'epoch': 0.3140764849471115, 'num_input_tokens_seen': 4106223616, 'completed': '53.11% (1_958 / 3_687)', 'remaining time': '15:21:49', 'throughput': '8628.37', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:45:47,586 >> {'loss': 0.2835, 'grad_norm': 14.355048179626465, 'learning_rate': 5.407158261174543e-07, 'epoch': 0.3143477081638188, 'num_input_tokens_seen': 4108320768, 'completed': '53.13% (1_959 / 3_687)', 'remaining time': '15:21:16', 'throughput': '8432.00', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:46:21,705 >> {'loss': 0.2686, 
'grad_norm': 13.801246643066406, 'learning_rate': 5.403122274568514e-07, 'epoch': 0.31461893138052616, 'num_input_tokens_seen': 4110417920, 'completed': '53.16% (1_960 / 3_687)', 'remaining time': '15:20:47', 'throughput': '7683.12', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:46:51,022 >> {'loss': 0.8419, 'grad_norm': 23.078720092773438, 'learning_rate': 5.399086365925991e-07, 'epoch': 0.3148901545972335, 'num_input_tokens_seen': 4112515072, 'completed': '53.19% (1_961 / 3_687)', 'remaining time': '15:20:11', 'throughput': '8941.68', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:47:23,468 >> {'loss': 0.3, 'grad_norm': 13.70639419555664, 'learning_rate': 5.395050538494922e-07, 'epoch': 0.3151613778139409, 'num_input_tokens_seen': 4114612224, 'completed': '53.21% (1_962 / 3_687)', 'remaining time': '15:19:40', 'throughput': '8079.48', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:47:52,401 >> {'loss': 0.2514, 'grad_norm': 11.593742370605469, 'learning_rate': 5.391014795523184e-07, 'epoch': 0.31543260103064824, 'num_input_tokens_seen': 4116709376, 'completed': '53.24% (1_963 / 3_687)', 'remaining time': '15:19:03', 'throughput': '9060.25', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:48:26,541 >> {'loss': 0.5374, 'grad_norm': 17.700437545776367, 'learning_rate': 5.386979140258592e-07, 'epoch': 0.31570382424735555, 'num_input_tokens_seen': 4118806528, 'completed': '53.27% (1_964 / 3_687)', 'remaining time': '15:18:35', 'throughput': '7678.57', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:48:58,673 >> {'loss': 0.3823, 'grad_norm': 13.762934684753418, 'learning_rate': 5.382943575948888e-07, 'epoch': 0.3159750474640629, 'num_input_tokens_seen': 4120903680, 'completed': '53.30% (1_965 / 3_687)', 'remaining time': '15:18:03', 'throughput': '8158.27', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:49:30,071 >> {'loss': 0.5349, 'grad_norm': 21.619094848632812, 'learning_rate': 
5.378908105841738e-07, 'epoch': 0.31624627068077027, 'num_input_tokens_seen': 4123000832, 'completed': '53.32% (1_966 / 3_687)', 'remaining time': '15:17:30', 'throughput': '8349.31', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:50:02,621 >> {'loss': 0.4939, 'grad_norm': 16.5258731842041, 'learning_rate': 5.374872733184737e-07, 'epoch': 0.31651749389747763, 'num_input_tokens_seen': 4125097984, 'completed': '53.35% (1_967 / 3_687)', 'remaining time': '15:16:59', 'throughput': '8053.60', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:50:30,809 >> {'loss': 0.6686, 'grad_norm': 19.478639602661133, 'learning_rate': 5.370837461225402e-07, 'epoch': 0.316788717114185, 'num_input_tokens_seen': 4127195136, 'completed': '53.38% (1_968 / 3_687)', 'remaining time': '15:16:21', 'throughput': '9299.74', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:51:01,494 >> {'loss': 0.3994, 'grad_norm': 15.19251537322998, 'learning_rate': 5.36680229321116e-07, 'epoch': 0.3170599403308923, 'num_input_tokens_seen': 4129292288, 'completed': '53.40% (1_969 / 3_687)', 'remaining time': '15:15:47', 'throughput': '8543.14', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:51:30,546 >> {'loss': 0.5067, 'grad_norm': 18.16364288330078, 'learning_rate': 5.362767232389365e-07, 'epoch': 0.31733116354759966, 'num_input_tokens_seen': 4131389440, 'completed': '53.43% (1_970 / 3_687)', 'remaining time': '15:15:11', 'throughput': '9023.02', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:52:00,985 >> {'loss': 0.3757, 'grad_norm': 13.165485382080078, 'learning_rate': 5.358732282007278e-07, 'epoch': 0.317602386764307, 'num_input_tokens_seen': 4133486592, 'completed': '53.46% (1_971 / 3_687)', 'remaining time': '15:14:37', 'throughput': '8612.23', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:52:33,475 >> {'loss': 0.5601, 'grad_norm': 17.412708282470703, 'learning_rate': 5.354697445312074e-07, 'epoch': 0.3178736099810144, 
'num_input_tokens_seen': 4135583744, 'completed': '53.49% (1_972 / 3_687)', 'remaining time': '15:14:06', 'throughput': '8068.39', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:53:03,442 >> {'loss': 0.6123, 'grad_norm': 16.720489501953125, 'learning_rate': 5.35066272555084e-07, 'epoch': 0.31814483319772174, 'num_input_tokens_seen': 4137680896, 'completed': '53.51% (1_973 / 3_687)', 'remaining time': '15:13:31', 'throughput': '8747.80', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:53:35,586 >> {'loss': 0.401, 'grad_norm': 16.069841384887695, 'learning_rate': 5.346628125970562e-07, 'epoch': 0.31841605641442905, 'num_input_tokens_seen': 4139778048, 'completed': '53.54% (1_974 / 3_687)', 'remaining time': '15:12:59', 'throughput': '8155.28', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:54:06,934 >> {'loss': 0.6885, 'grad_norm': 25.234981536865234, 'learning_rate': 5.342593649818129e-07, 'epoch': 0.3186872796311364, 'num_input_tokens_seen': 4141875200, 'completed': '53.57% (1_975 / 3_687)', 'remaining time': '15:12:26', 'throughput': '8362.40', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:54:38,555 >> {'loss': 0.4314, 'grad_norm': 14.604377746582031, 'learning_rate': 5.338559300340338e-07, 'epoch': 0.3189585028478438, 'num_input_tokens_seen': 4143972352, 'completed': '53.59% (1_976 / 3_687)', 'remaining time': '15:11:54', 'throughput': '8290.30', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:55:10,125 >> {'loss': 0.4862, 'grad_norm': 17.683347702026367, 'learning_rate': 5.334525080783879e-07, 'epoch': 0.31922972606455113, 'num_input_tokens_seen': 4146069504, 'completed': '53.62% (1_977 / 3_687)', 'remaining time': '15:11:21', 'throughput': '8303.52', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:55:40,428 >> {'loss': 0.2884, 'grad_norm': 15.543233871459961, 'learning_rate': 5.330490994395338e-07, 'epoch': 0.3195009492812585, 'num_input_tokens_seen': 4148166656, 
'completed': '53.65% (1_978 / 3_687)', 'remaining time': '15:10:47', 'throughput': '8650.62', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:56:11,200 >> {'loss': 0.5094, 'grad_norm': 16.377304077148438, 'learning_rate': 5.326457044421195e-07, 'epoch': 0.3197721724979658, 'num_input_tokens_seen': 4150263808, 'completed': '53.68% (1_979 / 3_687)', 'remaining time': '15:10:13', 'throughput': '8519.08', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:56:42,318 >> {'loss': 0.4386, 'grad_norm': 14.058331489562988, 'learning_rate': 5.322423234107818e-07, 'epoch': 0.32004339571467316, 'num_input_tokens_seen': 4152360960, 'completed': '53.70% (1_980 / 3_687)', 'remaining time': '15:09:40', 'throughput': '8424.23', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:57:15,401 >> {'loss': 0.5509, 'grad_norm': 20.853776931762695, 'learning_rate': 5.318389566701465e-07, 'epoch': 0.3203146189313805, 'num_input_tokens_seen': 4154458112, 'completed': '53.73% (1_981 / 3_687)', 'remaining time': '15:09:09', 'throughput': '7923.80', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:57:44,957 >> {'loss': 0.403, 'grad_norm': 17.144161224365234, 'learning_rate': 5.314356045448278e-07, 'epoch': 0.3205858421480879, 'num_input_tokens_seen': 4156555264, 'completed': '53.76% (1_982 / 3_687)', 'remaining time': '15:08:34', 'throughput': '8869.31', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:58:14,983 >> {'loss': 0.4065, 'grad_norm': 17.98135757446289, 'learning_rate': 5.310322673594282e-07, 'epoch': 0.32085706536479525, 'num_input_tokens_seen': 4158652416, 'completed': '53.78% (1_983 / 3_687)', 'remaining time': '15:07:59', 'throughput': '8730.64', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:58:46,692 >> {'loss': 0.2229, 'grad_norm': 9.543654441833496, 'learning_rate': 5.306289454385379e-07, 'epoch': 0.32112828858150255, 'num_input_tokens_seen': 4160749568, 'completed': '53.81% (1_984 / 3_687)', 'remaining 
time': '15:07:27', 'throughput': '8267.03', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:59:17,385 >> {'loss': 0.4597, 'grad_norm': 15.082742691040039, 'learning_rate': 5.302256391067354e-07, 'epoch': 0.3213995117982099, 'num_input_tokens_seen': 4162846720, 'completed': '53.84% (1_985 / 3_687)', 'remaining time': '15:06:53', 'throughput': '8540.84', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 01:59:49,529 >> {'loss': 0.4349, 'grad_norm': 14.424219131469727, 'learning_rate': 5.29822348688586e-07, 'epoch': 0.3216707350149173, 'num_input_tokens_seen': 4164943872, 'completed': '53.86% (1_986 / 3_687)', 'remaining time': '15:06:21', 'throughput': '8155.49', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:00:22,894 >> {'loss': 0.4877, 'grad_norm': 17.661540985107422, 'learning_rate': 5.294190745086426e-07, 'epoch': 0.32194195823162464, 'num_input_tokens_seen': 4167041024, 'completed': '53.89% (1_987 / 3_687)', 'remaining time': '15:05:51', 'throughput': '7856.69', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:00:52,013 >> {'loss': 0.6855, 'grad_norm': 21.022674560546875, 'learning_rate': 5.290158168914447e-07, 'epoch': 0.322213181448332, 'num_input_tokens_seen': 4169138176, 'completed': '53.92% (1_988 / 3_687)', 'remaining time': '15:05:15', 'throughput': '9002.50', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:01:21,146 >> {'loss': 0.616, 'grad_norm': 19.004026412963867, 'learning_rate': 5.28612576161519e-07, 'epoch': 0.3224844046650393, 'num_input_tokens_seen': 4171235328, 'completed': '53.95% (1_989 / 3_687)', 'remaining time': '15:04:39', 'throughput': '8998.12', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:01:49,817 >> {'loss': 0.5411, 'grad_norm': 16.85509490966797, 'learning_rate': 5.282093526433781e-07, 'epoch': 0.32275562788174667, 'num_input_tokens_seen': 4173332480, 'completed': '53.97% (1_990 / 3_687)', 'remaining time': '15:04:02', 'throughput': '9143.47', 
'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:02:19,876 >> {'loss': 0.2902, 'grad_norm': 14.019049644470215, 'learning_rate': 5.27806146661521e-07, 'epoch': 0.323026851098454, 'num_input_tokens_seen': 4175429632, 'completed': '54.00% (1_991 / 3_687)', 'remaining time': '15:03:28', 'throughput': '8720.96', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:02:51,169 >> {'loss': 0.3034, 'grad_norm': 16.84299087524414, 'learning_rate': 5.274029585404326e-07, 'epoch': 0.3232980743151614, 'num_input_tokens_seen': 4177526784, 'completed': '54.03% (1_992 / 3_687)', 'remaining time': '15:02:55', 'throughput': '8377.10', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:03:25,753 >> {'loss': 0.3721, 'grad_norm': 13.026562690734863, 'learning_rate': 5.269997886045833e-07, 'epoch': 0.32356929753186875, 'num_input_tokens_seen': 4179623936, 'completed': '54.05% (1_993 / 3_687)', 'remaining time': '15:02:27', 'throughput': '7579.81', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:03:55,120 >> {'loss': 0.5713, 'grad_norm': 16.872188568115234, 'learning_rate': 5.265966371784287e-07, 'epoch': 0.32384052074857606, 'num_input_tokens_seen': 4181721088, 'completed': '54.08% (1_994 / 3_687)', 'remaining time': '15:01:51', 'throughput': '8926.51', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:04:25,783 >> {'loss': 0.2909, 'grad_norm': 16.673782348632812, 'learning_rate': 5.261935045864101e-07, 'epoch': 0.3241117439652834, 'num_input_tokens_seen': 4183818240, 'completed': '54.11% (1_995 / 3_687)', 'remaining time': '15:01:17', 'throughput': '8549.05', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:04:57,508 >> {'loss': 0.4015, 'grad_norm': 14.819953918457031, 'learning_rate': 5.257903911529532e-07, 'epoch': 0.3243829671819908, 'num_input_tokens_seen': 4185915392, 'completed': '54.14% (1_996 / 3_687)', 'remaining time': '15:00:45', 'throughput': '8263.08', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 
2024-12-26 02:05:27,109 >> {'loss': 0.2481, 'grad_norm': 13.379781723022461, 'learning_rate': 5.253872972024681e-07, 'epoch': 0.32465419039869814, 'num_input_tokens_seen': 4188012544, 'completed': '54.16% (1_997 / 3_687)', 'remaining time': '15:00:10', 'throughput': '8856.08', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:05:55,192 >> {'loss': 0.376, 'grad_norm': 14.248574256896973, 'learning_rate': 5.249842230593497e-07, 'epoch': 0.3249254136154055, 'num_input_tokens_seen': 4190109696, 'completed': '54.19% (1_998 / 3_687)', 'remaining time': '14:59:32', 'throughput': '9334.35', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:06:26,095 >> {'loss': 0.3937, 'grad_norm': 22.307329177856445, 'learning_rate': 5.245811690479766e-07, 'epoch': 0.3251966368321128, 'num_input_tokens_seen': 4192206848, 'completed': '54.22% (1_999 / 3_687)', 'remaining time': '14:58:59', 'throughput': '8483.01', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:06:56,437 >> {'loss': 0.3849, 'grad_norm': 14.40904426574707, 'learning_rate': 5.241781354927113e-07, 'epoch': 0.32546786004882017, 'num_input_tokens_seen': 4194304000, 'completed': '54.24% (2_000 / 3_687)', 'remaining time': '14:58:24', 'throughput': '8639.61', 'gpu_mem_free': '5581MB'} /scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:689: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
warnings.warn(
[INFO|trainer.py:3503] 2024-12-26 02:07:20,350 >> Saving model checkpoint to /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-2000
[INFO|configuration_utils.py:472] 2024-12-26 02:07:20,353 >> Configuration saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-2000/config.json
[INFO|configuration_utils.py:807] 2024-12-26 02:07:20,354 >> Configuration saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-2000/generation_config.json
[INFO|modeling_utils.py:2807] 2024-12-26 02:08:16,686 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 7 checkpoint shards. You can find where each parameter has been saved in the index located at /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-2000/model.safetensors.index.json.
[INFO|tokenization_utils_base.py:2684] 2024-12-26 02:08:16,690 >> tokenizer config file saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-2000/tokenizer_config.json
[INFO|tokenization_utils_base.py:2693] 2024-12-26 02:08:16,690 >> Special tokens file saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-2000/special_tokens_map.json
[WARNING|trainer.py:868] 2024-12-26 02:12:10,689 >> Save streaming dataset state: {'epoch': 0, 'sample_in_epoch': 4000, 'num_canonical_nodes': 1, 'shuffle_seed': 42, 'initial_physical_nodes': 1}
/scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/torch/utils/checkpoint.py:1399: FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead. with device_autocast_ctx, torch.cpu.amp.autocast(**cpu_autocast_kwargs), recompute_context: # type: ignore[attr-defined]
[INFO|trainer.py:175] 2024-12-26 02:12:42,740 >> {'loss': 0.3802, 'grad_norm': 18.996400833129883, 'learning_rate': 5.237751227179001e-07, 'epoch': 0.32573908326552753, 'num_input_tokens_seen': 4196401152, 'completed': '54.27% (2_001 / 3_687)', 'remaining time': '15:05:14', 'throughput': '756.98', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 02:13:13,335 >> {'loss': 0.5296, 'grad_norm': 17.524658203125, 'learning_rate': 5.233721310478722e-07, 'epoch': 0.3260103064822349, 'num_input_tokens_seen': 4198498304, 'completed': '54.30% (2_002 / 3_687)', 'remaining time': '15:04:39', 'throughput': '8568.32', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 02:13:43,119 >> {'loss': 0.2159, 'grad_norm': 11.911806106567383, 'learning_rate': 5.2296916080694e-07, 'epoch': 0.32628152969894225, 'num_input_tokens_seen': 4200595456, 'completed': '54.33% (2_003 / 3_687)', 'remaining time': '15:04:04', 'throughput': '8801.44', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 02:14:16,989 >> {'loss': 0.4125, 'grad_norm': 17.387310028076172, 'learning_rate': 5.225662123193985e-07, 'epoch': 0.32655275291564956, 'num_input_tokens_seen': 4202692608, 'completed': '54.35% (2_004 / 3_687)',
'remaining time': '15:03:34', 'throughput': '7739.64', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:14:47,142 >> {'loss': 0.3276, 'grad_norm': 17.1206111907959, 'learning_rate': 5.221632859095254e-07, 'epoch': 0.3268239761323569, 'num_input_tokens_seen': 4204789760, 'completed': '54.38% (2_005 / 3_687)', 'remaining time': '15:02:59', 'throughput': '8694.03', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:15:14,679 >> {'loss': 0.4545, 'grad_norm': 17.467138290405273, 'learning_rate': 5.217603819015805e-07, 'epoch': 0.3270951993490643, 'num_input_tokens_seen': 4206886912, 'completed': '54.41% (2_006 / 3_687)', 'remaining time': '15:02:20', 'throughput': '9519.54', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:15:46,881 >> {'loss': 0.7608, 'grad_norm': 22.18386459350586, 'learning_rate': 5.213575006198057e-07, 'epoch': 0.32736642256577164, 'num_input_tokens_seen': 4208984064, 'completed': '54.43% (2_007 / 3_687)', 'remaining time': '15:01:48', 'throughput': '8140.61', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:16:17,200 >> {'loss': 0.512, 'grad_norm': 18.72940444946289, 'learning_rate': 5.209546423884246e-07, 'epoch': 0.327637645782479, 'num_input_tokens_seen': 4211081216, 'completed': '54.46% (2_008 / 3_687)', 'remaining time': '15:01:13', 'throughput': '8646.07', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:16:48,696 >> {'loss': 0.3453, 'grad_norm': 15.777796745300293, 'learning_rate': 5.20551807531642e-07, 'epoch': 0.3279088689991863, 'num_input_tokens_seen': 4213178368, 'completed': '54.49% (2_009 / 3_687)', 'remaining time': '15:00:40', 'throughput': '8323.06', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:17:18,401 >> {'loss': 0.6098, 'grad_norm': 20.98493194580078, 'learning_rate': 5.20148996373644e-07, 'epoch': 0.32818009221589367, 'num_input_tokens_seen': 4215275520, 'completed': '54.52% (2_010 / 3_687)', 'remaining time': '15:00:04', 'throughput': '8824.94', 
'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:17:48,462 >> {'loss': 0.3278, 'grad_norm': 13.770686149597168, 'learning_rate': 5.197462092385978e-07, 'epoch': 0.32845131543260103, 'num_input_tokens_seen': 4217372672, 'completed': '54.54% (2_011 / 3_687)', 'remaining time': '14:59:29', 'throughput': '8720.58', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:18:19,096 >> {'loss': 0.1932, 'grad_norm': 10.093859672546387, 'learning_rate': 5.193434464506511e-07, 'epoch': 0.3287225386493084, 'num_input_tokens_seen': 4219469824, 'completed': '54.57% (2_012 / 3_687)', 'remaining time': '14:58:55', 'throughput': '8557.32', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:18:48,920 >> {'loss': 0.3751, 'grad_norm': 16.35398292541504, 'learning_rate': 5.189407083339324e-07, 'epoch': 0.32899376186601575, 'num_input_tokens_seen': 4221566976, 'completed': '54.60% (2_013 / 3_687)', 'remaining time': '14:58:19', 'throughput': '8789.57', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:19:19,019 >> {'loss': 0.2063, 'grad_norm': 10.368598937988281, 'learning_rate': 5.185379952125494e-07, 'epoch': 0.32926498508272306, 'num_input_tokens_seen': 4223664128, 'completed': '54.62% (2_014 / 3_687)', 'remaining time': '14:57:44', 'throughput': '8709.29', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:19:49,808 >> {'loss': 0.3463, 'grad_norm': 15.229095458984375, 'learning_rate': 5.18135307410591e-07, 'epoch': 0.3295362082994304, 'num_input_tokens_seen': 4225761280, 'completed': '54.65% (2_015 / 3_687)', 'remaining time': '14:57:10', 'throughput': '8514.47', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:20:20,900 >> {'loss': 0.3193, 'grad_norm': 21.608442306518555, 'learning_rate': 5.177326452521242e-07, 'epoch': 0.3298074315161378, 'num_input_tokens_seen': 4227858432, 'completed': '54.68% (2_016 / 3_687)', 'remaining time': '14:56:36', 'throughput': '8431.05', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 
2024-12-26 02:20:51,508 >> {'loss': 0.3732, 'grad_norm': 16.298704147338867, 'learning_rate': 5.17330009061197e-07, 'epoch': 0.33007865473284514, 'num_input_tokens_seen': 4229955584, 'completed': '54.71% (2_017 / 3_687)', 'remaining time': '14:56:02', 'throughput': '8564.58', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:21:18,576 >> {'loss': 0.3864, 'grad_norm': 17.584896087646484, 'learning_rate': 5.169273991618351e-07, 'epoch': 0.3303498779495525, 'num_input_tokens_seen': 4232052736, 'completed': '54.73% (2_018 / 3_687)', 'remaining time': '14:55:23', 'throughput': '9684.54', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:21:52,464 >> {'loss': 0.3894, 'grad_norm': 17.949689865112305, 'learning_rate': 5.165248158780441e-07, 'epoch': 0.3306211011662598, 'num_input_tokens_seen': 4234149888, 'completed': '54.76% (2_019 / 3_687)', 'remaining time': '14:54:53', 'throughput': '7735.78', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:22:19,602 >> {'loss': 0.7639, 'grad_norm': 22.562408447265625, 'learning_rate': 5.161222595338073e-07, 'epoch': 0.3308923243829672, 'num_input_tokens_seen': 4236247040, 'completed': '54.79% (2_020 / 3_687)', 'remaining time': '14:54:14', 'throughput': '9659.51', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:22:52,612 >> {'loss': 0.3913, 'grad_norm': 15.829678535461426, 'learning_rate': 5.157197304530869e-07, 'epoch': 0.33116354759967453, 'num_input_tokens_seen': 4238344192, 'completed': '54.81% (2_021 / 3_687)', 'remaining time': '14:53:43', 'throughput': '7941.47', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:23:23,094 >> {'loss': 0.4058, 'grad_norm': 19.865497589111328, 'learning_rate': 5.15317228959823e-07, 'epoch': 0.3314347708163819, 'num_input_tokens_seen': 4240441344, 'completed': '54.84% (2_022 / 3_687)', 'remaining time': '14:53:08', 'throughput': '8599.91', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:23:53,062 >> {'loss': 0.61, 
'grad_norm': 20.204696655273438, 'learning_rate': 5.149147553779335e-07, 'epoch': 0.33170599403308926, 'num_input_tokens_seen': 4242538496, 'completed': '54.87% (2_023 / 3_687)', 'remaining time': '14:52:33', 'throughput': '8747.51', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:24:27,007 >> {'loss': 0.2718, 'grad_norm': 11.350019454956055, 'learning_rate': 5.145123100313137e-07, 'epoch': 0.33197721724979656, 'num_input_tokens_seen': 4244635648, 'completed': '54.90% (2_024 / 3_687)', 'remaining time': '14:52:03', 'throughput': '7722.59', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:24:57,757 >> {'loss': 0.4187, 'grad_norm': 20.93108558654785, 'learning_rate': 5.141098932438365e-07, 'epoch': 0.3322484404665039, 'num_input_tokens_seen': 4246732800, 'completed': '54.92% (2_025 / 3_687)', 'remaining time': '14:51:29', 'throughput': '8524.80', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:25:33,948 >> {'loss': 0.3879, 'grad_norm': 17.941852569580078, 'learning_rate': 5.137075053393512e-07, 'epoch': 0.3325196636832113, 'num_input_tokens_seen': 4248829952, 'completed': '54.95% (2_026 / 3_687)', 'remaining time': '14:51:02', 'throughput': '7243.52', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:26:03,935 >> {'loss': 0.6891, 'grad_norm': 23.98504638671875, 'learning_rate': 5.133051466416849e-07, 'epoch': 0.33279088689991865, 'num_input_tokens_seen': 4250927104, 'completed': '54.98% (2_027 / 3_687)', 'remaining time': '14:50:27', 'throughput': '8741.87', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:26:34,294 >> {'loss': 0.3502, 'grad_norm': 17.1292667388916, 'learning_rate': 5.129028174746399e-07, 'epoch': 0.333062110116626, 'num_input_tokens_seen': 4253024256, 'completed': '55.00% (2_028 / 3_687)', 'remaining time': '14:49:52', 'throughput': '8634.74', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:27:03,481 >> {'loss': 0.939, 'grad_norm': 23.580167770385742, 'learning_rate': 
5.12500518161996e-07, 'epoch': 0.3333333333333333, 'num_input_tokens_seen': 4255121408, 'completed': '55.03% (2_029 / 3_687)', 'remaining time': '14:49:16', 'throughput': '8981.50', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:27:36,513 >> {'loss': 0.3974, 'grad_norm': 16.317161560058594, 'learning_rate': 5.120982490275081e-07, 'epoch': 0.3336045565500407, 'num_input_tokens_seen': 4257218560, 'completed': '55.06% (2_030 / 3_687)', 'remaining time': '14:48:45', 'throughput': '7936.15', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:28:05,360 >> {'loss': 0.7093, 'grad_norm': 19.563129425048828, 'learning_rate': 5.116960103949071e-07, 'epoch': 0.33387577976674804, 'num_input_tokens_seen': 4259315712, 'completed': '55.09% (2_031 / 3_687)', 'remaining time': '14:48:08', 'throughput': '9087.33', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:28:37,821 >> {'loss': 0.3047, 'grad_norm': 14.194764137268066, 'learning_rate': 5.112938025878994e-07, 'epoch': 0.3341470029834554, 'num_input_tokens_seen': 4261412864, 'completed': '55.11% (2_032 / 3_687)', 'remaining time': '14:47:37', 'throughput': '8075.67', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:29:07,036 >> {'loss': 0.2439, 'grad_norm': 12.254755973815918, 'learning_rate': 5.108916259301663e-07, 'epoch': 0.33441822620016276, 'num_input_tokens_seen': 4263510016, 'completed': '55.14% (2_033 / 3_687)', 'remaining time': '14:47:00', 'throughput': '8972.83', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:29:40,702 >> {'loss': 0.3246, 'grad_norm': 13.140419006347656, 'learning_rate': 5.104894807453647e-07, 'epoch': 0.33468944941687007, 'num_input_tokens_seen': 4265607168, 'completed': '55.17% (2_034 / 3_687)', 'remaining time': '14:46:30', 'throughput': '7786.67', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:30:10,034 >> {'loss': 0.5276, 'grad_norm': 18.896413803100586, 'learning_rate': 5.100873673571253e-07, 'epoch': 
0.3349606726335774, 'num_input_tokens_seen': 4267704320, 'completed': '55.19% (2_035 / 3_687)', 'remaining time': '14:45:54', 'throughput': '8937.24', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:30:43,945 >> {'loss': 0.335, 'grad_norm': 13.808533668518066, 'learning_rate': 5.096852860890541e-07, 'epoch': 0.3352318958502848, 'num_input_tokens_seen': 4269801472, 'completed': '55.22% (2_036 / 3_687)', 'remaining time': '14:45:24', 'throughput': '7730.30', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:31:13,083 >> {'loss': 0.6184, 'grad_norm': 21.51602554321289, 'learning_rate': 5.092832372647304e-07, 'epoch': 0.33550311906699215, 'num_input_tokens_seen': 4271898624, 'completed': '55.25% (2_037 / 3_687)', 'remaining time': '14:44:48', 'throughput': '8996.57', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:31:45,133 >> {'loss': 0.519, 'grad_norm': 18.561599731445312, 'learning_rate': 5.08881221207708e-07, 'epoch': 0.3357743422836995, 'num_input_tokens_seen': 4273995776, 'completed': '55.28% (2_038 / 3_687)', 'remaining time': '14:44:16', 'throughput': '8179.35', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:32:16,343 >> {'loss': 0.2667, 'grad_norm': 15.325385093688965, 'learning_rate': 5.084792382415141e-07, 'epoch': 0.3360455655004068, 'num_input_tokens_seen': 4276092928, 'completed': '55.30% (2_039 / 3_687)', 'remaining time': '14:43:42', 'throughput': '8399.28', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:32:45,382 >> {'loss': 0.429, 'grad_norm': 15.208858489990234, 'learning_rate': 5.080772886896493e-07, 'epoch': 0.3363167887171142, 'num_input_tokens_seen': 4278190080, 'completed': '55.33% (2_040 / 3_687)', 'remaining time': '14:43:06', 'throughput': '9027.18', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:33:15,269 >> {'loss': 0.2648, 'grad_norm': 13.985490798950195, 'learning_rate': 5.076753728755871e-07, 'epoch': 0.33658801193382154, 'num_input_tokens_seen': 
4280287232, 'completed': '55.36% (2_041 / 3_687)', 'remaining time': '14:42:31', 'throughput': '8771.32', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:33:45,371 >> {'loss': 0.3776, 'grad_norm': 19.45819664001465, 'learning_rate': 5.072734911227746e-07, 'epoch': 0.3368592351505289, 'num_input_tokens_seen': 4282384384, 'completed': '55.38% (2_042 / 3_687)', 'remaining time': '14:41:56', 'throughput': '8708.56', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:34:15,996 >> {'loss': 0.1879, 'grad_norm': 11.178092956542969, 'learning_rate': 5.068716437546305e-07, 'epoch': 0.33713045836723626, 'num_input_tokens_seen': 4284481536, 'completed': '55.41% (2_043 / 3_687)', 'remaining time': '14:41:22', 'throughput': '8559.84', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:34:49,613 >> {'loss': 0.5829, 'grad_norm': 19.116785049438477, 'learning_rate': 5.06469831094546e-07, 'epoch': 0.33740168158394357, 'num_input_tokens_seen': 4286578688, 'completed': '55.44% (2_044 / 3_687)', 'remaining time': '14:40:52', 'throughput': '7797.87', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:35:18,656 >> {'loss': 0.4346, 'grad_norm': 17.29070281982422, 'learning_rate': 5.060680534658852e-07, 'epoch': 0.33767290480065093, 'num_input_tokens_seen': 4288675840, 'completed': '55.47% (2_045 / 3_687)', 'remaining time': '14:40:15', 'throughput': '9025.96', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:35:48,760 >> {'loss': 0.3758, 'grad_norm': 11.40611457824707, 'learning_rate': 5.056663111919833e-07, 'epoch': 0.3379441280173583, 'num_input_tokens_seen': 4290772992, 'completed': '55.49% (2_046 / 3_687)', 'remaining time': '14:39:40', 'throughput': '8707.91', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:36:17,726 >> {'loss': 0.2305, 'grad_norm': 10.537538528442383, 'learning_rate': 5.052646045961471e-07, 'epoch': 0.33821535123406565, 'num_input_tokens_seen': 4292870144, 'completed': '55.52% (2_047 / 
3_687)', 'remaining time': '14:39:04', 'throughput': '9050.24', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:36:46,914 >> {'loss': 0.3792, 'grad_norm': 15.972824096679688, 'learning_rate': 5.048629340016548e-07, 'epoch': 0.33848657445077296, 'num_input_tokens_seen': 4294967296, 'completed': '55.55% (2_048 / 3_687)', 'remaining time': '14:38:28', 'throughput': '8981.31', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:37:17,144 >> {'loss': 0.3346, 'grad_norm': 13.639211654663086, 'learning_rate': 5.044612997317557e-07, 'epoch': 0.3387577976674803, 'num_input_tokens_seen': 4297064448, 'completed': '55.57% (2_049 / 3_687)', 'remaining time': '14:37:53', 'throughput': '8671.52', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:37:48,662 >> {'loss': 0.6396, 'grad_norm': 17.853118896484375, 'learning_rate': 5.040597021096695e-07, 'epoch': 0.3390290208841877, 'num_input_tokens_seen': 4299161600, 'completed': '55.60% (2_050 / 3_687)', 'remaining time': '14:37:20', 'throughput': '8317.17', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:38:22,120 >> {'loss': 0.4423, 'grad_norm': 18.226207733154297, 'learning_rate': 5.03658141458587e-07, 'epoch': 0.33930024410089504, 'num_input_tokens_seen': 4301258752, 'completed': '55.63% (2_051 / 3_687)', 'remaining time': '14:36:50', 'throughput': '7835.11', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:38:54,167 >> {'loss': 0.3318, 'grad_norm': 14.695561408996582, 'learning_rate': 5.032566181016688e-07, 'epoch': 0.3395714673176024, 'num_input_tokens_seen': 4303355904, 'completed': '55.66% (2_052 / 3_687)', 'remaining time': '14:36:17', 'throughput': '8180.00', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:39:21,555 >> {'loss': 0.6093, 'grad_norm': 18.532405853271484, 'learning_rate': 5.028551323620458e-07, 'epoch': 0.3398426905343097, 'num_input_tokens_seen': 4305453056, 'completed': '55.68% (2_053 / 3_687)', 'remaining time': '14:35:39', 
'throughput': '9571.58', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:39:51,695 >> {'loss': 0.4432, 'grad_norm': 15.558454513549805, 'learning_rate': 5.024536845628181e-07, 'epoch': 0.34011391375101707, 'num_input_tokens_seen': 4307550208, 'completed': '55.71% (2_054 / 3_687)', 'remaining time': '14:35:04', 'throughput': '8697.56', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:40:22,837 >> {'loss': 0.6549, 'grad_norm': 22.573936462402344, 'learning_rate': 5.020522750270558e-07, 'epoch': 0.34038513696772443, 'num_input_tokens_seen': 4309647360, 'completed': '55.74% (2_055 / 3_687)', 'remaining time': '14:34:31', 'throughput': '8417.65', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:40:52,720 >> {'loss': 0.27, 'grad_norm': 13.960949897766113, 'learning_rate': 5.01650904077798e-07, 'epoch': 0.3406563601844318, 'num_input_tokens_seen': 4311744512, 'completed': '55.76% (2_056 / 3_687)', 'remaining time': '14:33:56', 'throughput': '8772.35', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:41:22,211 >> {'loss': 0.6896, 'grad_norm': 19.82161521911621, 'learning_rate': 5.012495720380527e-07, 'epoch': 0.34092758340113916, 'num_input_tokens_seen': 4313841664, 'completed': '55.79% (2_057 / 3_687)', 'remaining time': '14:33:20', 'throughput': '8888.86', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:41:54,000 >> {'loss': 0.3401, 'grad_norm': 13.285265922546387, 'learning_rate': 5.008482792307968e-07, 'epoch': 0.34119880661784646, 'num_input_tokens_seen': 4315938816, 'completed': '55.82% (2_058 / 3_687)', 'remaining time': '14:32:48', 'throughput': '8246.62', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:42:26,964 >> {'loss': 0.3066, 'grad_norm': 12.160155296325684, 'learning_rate': 5.004470259789757e-07, 'epoch': 0.3414700298345538, 'num_input_tokens_seen': 4318035968, 'completed': '55.84% (2_059 / 3_687)', 'remaining time': '14:32:16', 'throughput': '7952.41', 'gpu_mem_free': 
'5581MB'} [INFO|trainer.py:175] 2024-12-26 02:43:00,311 >> {'loss': 0.324, 'grad_norm': 12.414044380187988, 'learning_rate': 5.000458126055021e-07, 'epoch': 0.3417412530512612, 'num_input_tokens_seen': 4320133120, 'completed': '55.87% (2_060 / 3_687)', 'remaining time': '14:31:46', 'throughput': '7860.81', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:43:31,722 >> {'loss': 0.5181, 'grad_norm': 18.17831802368164, 'learning_rate': 4.996446394332578e-07, 'epoch': 0.34201247626796855, 'num_input_tokens_seen': 4322230272, 'completed': '55.90% (2_061 / 3_687)', 'remaining time': '14:31:13', 'throughput': '8345.69', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:44:04,322 >> {'loss': 0.3709, 'grad_norm': 14.670184135437012, 'learning_rate': 4.992435067850914e-07, 'epoch': 0.3422836994846759, 'num_input_tokens_seen': 4324327424, 'completed': '55.93% (2_062 / 3_687)', 'remaining time': '14:30:41', 'throughput': '8041.22', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:44:34,106 >> {'loss': 0.5221, 'grad_norm': 17.332820892333984, 'learning_rate': 4.988424149838192e-07, 'epoch': 0.3425549227013832, 'num_input_tokens_seen': 4326424576, 'completed': '55.95% (2_063 / 3_687)', 'remaining time': '14:30:06', 'throughput': '8801.63', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:45:04,121 >> {'loss': 0.7608, 'grad_norm': 22.735279083251953, 'learning_rate': 4.984413643522251e-07, 'epoch': 0.3428261459180906, 'num_input_tokens_seen': 4328521728, 'completed': '55.98% (2_064 / 3_687)', 'remaining time': '14:29:31', 'throughput': '8733.60', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:45:32,316 >> {'loss': 0.3522, 'grad_norm': 14.321521759033203, 'learning_rate': 4.980403552130589e-07, 'epoch': 0.34309736913479794, 'num_input_tokens_seen': 4330618880, 'completed': '56.01% (2_065 / 3_687)', 'remaining time': '14:28:54', 'throughput': '9297.54', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 
02:46:01,482 >> {'loss': 0.4359, 'grad_norm': 16.680944442749023, 'learning_rate': 4.976393878890375e-07, 'epoch': 0.3433685923515053, 'num_input_tokens_seen': 4332716032, 'completed': '56.03% (2_066 / 3_687)', 'remaining time': '14:28:18', 'throughput': '8988.10', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:46:31,498 >> {'loss': 0.4501, 'grad_norm': 15.195837020874023, 'learning_rate': 4.972384627028442e-07, 'epoch': 0.34363981556821266, 'num_input_tokens_seen': 4334813184, 'completed': '56.06% (2_067 / 3_687)', 'remaining time': '14:27:43', 'throughput': '8733.56', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:47:04,062 >> {'loss': 0.3573, 'grad_norm': 13.431507110595703, 'learning_rate': 4.968375799771285e-07, 'epoch': 0.34391103878491996, 'num_input_tokens_seen': 4336910336, 'completed': '56.09% (2_068 / 3_687)', 'remaining time': '14:27:11', 'throughput': '8049.97', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:47:33,016 >> {'loss': 0.3762, 'grad_norm': 12.854487419128418, 'learning_rate': 4.964367400345053e-07, 'epoch': 0.3441822620016273, 'num_input_tokens_seen': 4339007488, 'completed': '56.12% (2_069 / 3_687)', 'remaining time': '14:26:35', 'throughput': '9053.97', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:48:02,436 >> {'loss': 0.7081, 'grad_norm': 17.647762298583984, 'learning_rate': 4.960359431975555e-07, 'epoch': 0.3444534852183347, 'num_input_tokens_seen': 4341104640, 'completed': '56.14% (2_070 / 3_687)', 'remaining time': '14:26:00', 'throughput': '8910.30', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:48:33,412 >> {'loss': 0.7699, 'grad_norm': 21.555879592895508, 'learning_rate': 4.956351897888256e-07, 'epoch': 0.34472470843504205, 'num_input_tokens_seen': 4343201792, 'completed': '56.17% (2_071 / 3_687)', 'remaining time': '14:25:26', 'throughput': '8462.71', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:49:01,236 >> {'loss': 0.5833, 'grad_norm': 
17.82149887084961, 'learning_rate': 4.952344801308258e-07, 'epoch': 0.3449959316517494, 'num_input_tokens_seen': 4345298944, 'completed': '56.20% (2_072 / 3_687)', 'remaining time': '14:24:48', 'throughput': '9421.71', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:49:29,202 >> {'loss': 0.3707, 'grad_norm': 13.02746295928955, 'learning_rate': 4.948338145460324e-07, 'epoch': 0.3452671548684567, 'num_input_tokens_seen': 4347396096, 'completed': '56.22% (2_073 / 3_687)', 'remaining time': '14:24:11', 'throughput': '9373.66', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:49:59,422 >> {'loss': 0.6831, 'grad_norm': 18.583114624023438, 'learning_rate': 4.944331933568858e-07, 'epoch': 0.3455383780851641, 'num_input_tokens_seen': 4349493248, 'completed': '56.25% (2_074 / 3_687)', 'remaining time': '14:23:37', 'throughput': '8674.41', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:50:25,829 >> {'loss': 0.618, 'grad_norm': 18.889436721801758, 'learning_rate': 4.94032616885791e-07, 'epoch': 0.34580960130187144, 'num_input_tokens_seen': 4351590400, 'completed': '56.28% (2_075 / 3_687)', 'remaining time': '14:22:57', 'throughput': '9927.23', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:50:59,635 >> {'loss': 0.4065, 'grad_norm': 14.465885162353516, 'learning_rate': 4.936320854551163e-07, 'epoch': 0.3460808245185788, 'num_input_tokens_seen': 4353687552, 'completed': '56.31% (2_076 / 3_687)', 'remaining time': '14:22:27', 'throughput': '7754.26', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:51:30,948 >> {'loss': 0.2518, 'grad_norm': 11.185941696166992, 'learning_rate': 4.932315993871942e-07, 'epoch': 0.34635204773528616, 'num_input_tokens_seen': 4355784704, 'completed': '56.33% (2_077 / 3_687)', 'remaining time': '14:21:54', 'throughput': '8371.70', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:52:02,946 >> {'loss': 0.3067, 'grad_norm': 14.272321701049805, 'learning_rate': 
4.928311590043206e-07, 'epoch': 0.34662327095199347, 'num_input_tokens_seen': 4357881856, 'completed': '56.36% (2_078 / 3_687)', 'remaining time': '14:21:22', 'throughput': '8192.55', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:52:33,017 >> {'loss': 0.6866, 'grad_norm': 23.103809356689453, 'learning_rate': 4.924307646287549e-07, 'epoch': 0.34689449416870083, 'num_input_tokens_seen': 4359979008, 'completed': '56.39% (2_079 / 3_687)', 'remaining time': '14:20:47', 'throughput': '8717.66', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:53:00,405 >> {'loss': 0.7426, 'grad_norm': 24.843984603881836, 'learning_rate': 4.920304165827192e-07, 'epoch': 0.3471657173854082, 'num_input_tokens_seen': 4362076160, 'completed': '56.41% (2_080 / 3_687)', 'remaining time': '14:20:09', 'throughput': '9571.46', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:53:30,433 >> {'loss': 0.4618, 'grad_norm': 15.894787788391113, 'learning_rate': 4.916301151883983e-07, 'epoch': 0.34743694060211555, 'num_input_tokens_seen': 4364173312, 'completed': '56.44% (2_081 / 3_687)', 'remaining time': '14:19:34', 'throughput': '8729.84', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:54:00,534 >> {'loss': 0.6288, 'grad_norm': 19.025161743164062, 'learning_rate': 4.912298607679396e-07, 'epoch': 0.3477081638188229, 'num_input_tokens_seen': 4366270464, 'completed': '56.47% (2_082 / 3_687)', 'remaining time': '14:19:00', 'throughput': '8708.98', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:54:29,404 >> {'loss': 0.4496, 'grad_norm': 17.685871124267578, 'learning_rate': 4.908296536434525e-07, 'epoch': 0.3479793870355302, 'num_input_tokens_seen': 4368367616, 'completed': '56.50% (2_083 / 3_687)', 'remaining time': '14:18:23', 'throughput': '9080.17', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:54:59,702 >> {'loss': 0.7546, 'grad_norm': 26.235916137695312, 'learning_rate': 4.904294941370086e-07, 'epoch': 
0.3482506102522376, 'num_input_tokens_seen': 4370464768, 'completed': '56.52% (2_084 / 3_687)', 'remaining time': '14:17:49', 'throughput': '8651.95', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:55:29,186 >> {'loss': 0.2447, 'grad_norm': 12.8169584274292, 'learning_rate': 4.900293825706408e-07, 'epoch': 0.34852183346894494, 'num_input_tokens_seen': 4372561920, 'completed': '56.55% (2_085 / 3_687)', 'remaining time': '14:17:14', 'throughput': '8891.22', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:55:59,472 >> {'loss': 0.4664, 'grad_norm': 16.72005271911621, 'learning_rate': 4.89629319266344e-07, 'epoch': 0.3487930566856523, 'num_input_tokens_seen': 4374659072, 'completed': '56.58% (2_086 / 3_687)', 'remaining time': '14:16:39', 'throughput': '8655.49', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:56:32,021 >> {'loss': 0.3813, 'grad_norm': 14.927910804748535, 'learning_rate': 4.892293045460738e-07, 'epoch': 0.34906427990235966, 'num_input_tokens_seen': 4376756224, 'completed': '56.60% (2_087 / 3_687)', 'remaining time': '14:16:08', 'throughput': '8053.98', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:57:03,647 >> {'loss': 0.3661, 'grad_norm': 14.392501831054688, 'learning_rate': 4.88829338731747e-07, 'epoch': 0.34933550311906697, 'num_input_tokens_seen': 4378853376, 'completed': '56.63% (2_088 / 3_687)', 'remaining time': '14:15:35', 'throughput': '8288.86', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:57:33,723 >> {'loss': 0.3709, 'grad_norm': 15.27480697631836, 'learning_rate': 4.884294221452405e-07, 'epoch': 0.34960672633577433, 'num_input_tokens_seen': 4380950528, 'completed': '56.66% (2_089 / 3_687)', 'remaining time': '14:15:00', 'throughput': '8715.98', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:58:02,715 >> {'loss': 0.58, 'grad_norm': 16.14817237854004, 'learning_rate': 4.88029555108392e-07, 'epoch': 0.3498779495524817, 'num_input_tokens_seen': 
4383047680, 'completed': '56.69% (2_090 / 3_687)', 'remaining time': '14:14:25', 'throughput': '9042.06', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:58:32,984 >> {'loss': 0.2754, 'grad_norm': 11.918871879577637, 'learning_rate': 4.876297379429995e-07, 'epoch': 0.35014917276918905, 'num_input_tokens_seen': 4385144832, 'completed': '56.71% (2_091 / 3_687)', 'remaining time': '14:13:50', 'throughput': '8660.42', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:59:02,391 >> {'loss': 0.453, 'grad_norm': 24.65549659729004, 'learning_rate': 4.872299709708206e-07, 'epoch': 0.3504203959858964, 'num_input_tokens_seen': 4387241984, 'completed': '56.74% (2_092 / 3_687)', 'remaining time': '14:13:15', 'throughput': '8914.28', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 02:59:36,672 >> {'loss': 0.3193, 'grad_norm': 13.130111694335938, 'learning_rate': 4.868302545135725e-07, 'epoch': 0.3506916192026037, 'num_input_tokens_seen': 4389339136, 'completed': '56.77% (2_093 / 3_687)', 'remaining time': '14:12:45', 'throughput': '7646.93', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 03:00:02,128 >> {'loss': 0.9149, 'grad_norm': 22.114221572875977, 'learning_rate': 4.864305888929318e-07, 'epoch': 0.3509628424193111, 'num_input_tokens_seen': 4391436288, 'completed': '56.79% (2_094 / 3_687)', 'remaining time': '14:12:05', 'throughput': '10297.88', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 03:00:31,615 >> {'loss': 0.7968, 'grad_norm': 22.851913452148438, 'learning_rate': 4.860309744305339e-07, 'epoch': 0.35123406563601844, 'num_input_tokens_seen': 4393533440, 'completed': '56.82% (2_095 / 3_687)', 'remaining time': '14:11:30', 'throughput': '8890.13', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 03:01:00,965 >> {'loss': 0.7142, 'grad_norm': 19.785030364990234, 'learning_rate': 4.856314114479738e-07, 'epoch': 0.3515052888527258, 'num_input_tokens_seen': 4395630592, 'completed': '56.85% (2_096 / 
3_687)', 'remaining time': '14:10:54', 'throughput': '8931.84', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 03:01:32,028 >> {'loss': 0.3404, 'grad_norm': 13.180719375610352, 'learning_rate': 4.852319002668044e-07, 'epoch': 0.35177651206943317, 'num_input_tokens_seen': 4397727744, 'completed': '56.88% (2_097 / 3_687)', 'remaining time': '14:10:21', 'throughput': '8438.97', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 03:02:01,065 >> {'loss': 0.5252, 'grad_norm': 18.110231399536133, 'learning_rate': 4.848324412085367e-07, 'epoch': 0.3520477352861405, 'num_input_tokens_seen': 4399824896, 'completed': '56.90% (2_098 / 3_687)', 'remaining time': '14:09:45', 'throughput': '9028.02', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 03:02:30,035 >> {'loss': 0.4975, 'grad_norm': 18.176883697509766, 'learning_rate': 4.84433034594641e-07, 'epoch': 0.35231895850284783, 'num_input_tokens_seen': 4401922048, 'completed': '56.93% (2_099 / 3_687)', 'remaining time': '14:09:09', 'throughput': '9048.77', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 03:03:00,315 >> {'loss': 0.4609, 'grad_norm': 16.192501068115234, 'learning_rate': 4.840336807465439e-07, 'epoch': 0.3525901817195552, 'num_input_tokens_seen': 4404019200, 'completed': '56.96% (2_100 / 3_687)', 'remaining time': '14:08:35', 'throughput': '8657.41', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 03:03:29,913 >> {'loss': 0.4662, 'grad_norm': 15.730474472045898, 'learning_rate': 4.836343799856302e-07, 'epoch': 0.35286140493626256, 'num_input_tokens_seen': 4406116352, 'completed': '56.98% (2_101 / 3_687)', 'remaining time': '14:08:00', 'throughput': '8856.80', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 03:04:00,522 >> {'loss': 0.2086, 'grad_norm': 9.630202293395996, 'learning_rate': 4.83235132633242e-07, 'epoch': 0.3531326281529699, 'num_input_tokens_seen': 4408213504, 'completed': '57.01% (2_102 / 3_687)', 'remaining time': '14:07:26', 
'throughput': '8564.56', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:04:31,386 >> {'loss': 0.2325, 'grad_norm': 10.681892395019531, 'learning_rate': 4.828359390106786e-07, 'epoch': 0.3534038513696772, 'num_input_tokens_seen': 4410310656, 'completed': '57.04% (2_103 / 3_687)', 'remaining time': '14:06:52', 'throughput': '8493.25', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:05:00,066 >> {'loss': 0.4176, 'grad_norm': 15.098684310913086, 'learning_rate': 4.824367994391958e-07, 'epoch': 0.3536750745863846, 'num_input_tokens_seen': 4412407808, 'completed': '57.07% (2_104 / 3_687)', 'remaining time': '14:06:16', 'throughput': '9140.09', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:05:32,104 >> {'loss': 0.2685, 'grad_norm': 11.902873039245605, 'learning_rate': 4.820377142400058e-07, 'epoch': 0.35394629780309195, 'num_input_tokens_seen': 4414504960, 'completed': '57.09% (2_105 / 3_687)', 'remaining time': '14:05:44', 'throughput': '8182.51', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:06:02,557 >> {'loss': 0.5198, 'grad_norm': 19.398834228515625, 'learning_rate': 4.816386837342771e-07, 'epoch': 0.3542175210197993, 'num_input_tokens_seen': 4416602112, 'completed': '57.12% (2_106 / 3_687)', 'remaining time': '14:05:10', 'throughput': '8608.04', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:06:34,621 >> {'loss': 0.5469, 'grad_norm': 21.086822509765625, 'learning_rate': 4.812397082431345e-07, 'epoch': 0.35448874423650667, 'num_input_tokens_seen': 4418699264, 'completed': '57.15% (2_107 / 3_687)', 'remaining time': '14:04:38', 'throughput': '8175.67', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:07:05,458 >> {'loss': 0.6044, 'grad_norm': 21.071117401123047, 'learning_rate': 4.808407880876581e-07, 'epoch': 0.354759967453214, 'num_input_tokens_seen': 4420796416, 'completed': '57.17% (2_108 / 3_687)', 'remaining time': '14:04:04', 'throughput': '8500.88', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:07:36,788 >> {'loss': 0.5796, 'grad_norm': 17.908489227294922, 'learning_rate': 4.804419235888838e-07, 'epoch': 0.35503119066992134, 'num_input_tokens_seen': 4422893568, 'completed': '57.20% (2_109 / 3_687)', 'remaining time': '14:03:31', 'throughput': '8367.30', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:08:09,238 >> {'loss': 0.4045, 'grad_norm': 15.03857135772705, 'learning_rate': 4.800431150678026e-07, 'epoch': 0.3553024138866287, 'num_input_tokens_seen': 4424990720, 'completed': '57.23% (2_110 / 3_687)', 'remaining time': '14:03:00', 'throughput': '8078.28', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:08:38,195 >> {'loss': 0.6847, 'grad_norm': 22.05478858947754, 'learning_rate': 4.796443628453603e-07, 'epoch': 0.35557363710333606, 'num_input_tokens_seen': 4427087872, 'completed': '57.26% (2_111 / 3_687)', 'remaining time': '14:02:24', 'throughput': '9052.98', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:09:05,421 >> {'loss': 0.5185, 'grad_norm': 18.630102157592773, 'learning_rate': 4.792456672424574e-07, 'epoch': 0.3558448603200434, 'num_input_tokens_seen': 4429185024, 'completed': '57.28% (2_112 / 3_687)', 'remaining time': '14:01:46', 'throughput': '9628.31', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:09:38,348 >> {'loss': 0.2514, 'grad_norm': 10.600038528442383, 'learning_rate': 4.788470285799492e-07, 'epoch': 0.3561160835367507, 'num_input_tokens_seen': 4431282176, 'completed': '57.31% (2_113 / 3_687)', 'remaining time': '14:01:15', 'throughput': '7961.50', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:10:08,581 >> {'loss': 0.3868, 'grad_norm': 15.612627029418945, 'learning_rate': 4.784484471786447e-07, 'epoch': 0.3563873067534581, 'num_input_tokens_seen': 4433379328, 'completed': '57.34% (2_114 / 3_687)', 'remaining time': '14:00:41', 'throughput': '8670.72', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:10:38,651 >> {'loss': 0.5039, 'grad_norm': 17.270404815673828, 'learning_rate': 4.780499233593073e-07, 'epoch': 0.35665852997016545, 'num_input_tokens_seen': 4435476480, 'completed': '57.36% (2_115 / 3_687)', 'remaining time': '14:00:06', 'throughput': '8717.86', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:11:11,776 >> {'loss': 0.4082, 'grad_norm': 17.62727928161621, 'learning_rate': 4.776514574426538e-07, 'epoch': 0.3569297531868728, 'num_input_tokens_seen': 4437573632, 'completed': '57.39% (2_116 / 3_687)', 'remaining time': '13:59:36', 'throughput': '7913.73', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:11:43,054 >> {'loss': 0.3864, 'grad_norm': 15.866716384887695, 'learning_rate': 4.772530497493541e-07, 'epoch': 0.3572009764035802, 'num_input_tokens_seen': 4439670784, 'completed': '57.42% (2_117 / 3_687)', 'remaining time': '13:59:03', 'throughput': '8381.00', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:12:13,103 >> {'loss': 0.505, 'grad_norm': 17.658111572265625, 'learning_rate': 4.768547006000317e-07, 'epoch': 0.3574721996202875, 'num_input_tokens_seen': 4441767936, 'completed': '57.45% (2_118 / 3_687)', 'remaining time': '13:58:28', 'throughput': '8723.84', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:12:43,474 >> {'loss': 0.5577, 'grad_norm': 17.13514518737793, 'learning_rate': 4.76456410315263e-07, 'epoch': 0.35774342283699484, 'num_input_tokens_seen': 4443865088, 'completed': '57.47% (2_119 / 3_687)', 'remaining time': '13:57:54', 'throughput': '8631.43', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:13:14,202 >> {'loss': 0.3055, 'grad_norm': 12.680811882019043, 'learning_rate': 4.7605817921557666e-07, 'epoch': 0.3580146460537022, 'num_input_tokens_seen': 4445962240, 'completed': '57.50% (2_120 / 3_687)', 'remaining time': '13:57:20', 'throughput': '8531.15', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:13:44,833 >> {'loss': 0.3165, 'grad_norm': 10.903252601623535, 'learning_rate': 4.756600076214541e-07, 'epoch': 0.35828586927040956, 'num_input_tokens_seen': 4448059392, 'completed': '57.53% (2_121 / 3_687)', 'remaining time': '13:56:47', 'throughput': '8558.13', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:14:12,025 >> {'loss': 0.489, 'grad_norm': 20.912986755371094, 'learning_rate': 4.7526189585332855e-07, 'epoch': 0.3585570924871169, 'num_input_tokens_seen': 4450156544, 'completed': '57.55% (2_122 / 3_687)', 'remaining time': '13:56:09', 'throughput': '9640.57', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:14:43,057 >> {'loss': 0.7097, 'grad_norm': 19.84836196899414, 'learning_rate': 4.748638442315851e-07, 'epoch': 0.35882831570382423, 'num_input_tokens_seen': 4452253696, 'completed': '57.58% (2_123 / 3_687)', 'remaining time': '13:55:35', 'throughput': '8447.63', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:15:14,427 >> {'loss': 0.4103, 'grad_norm': 14.485566139221191, 'learning_rate': 4.7446585307656074e-07, 'epoch': 0.3590995389205316, 'num_input_tokens_seen': 4454350848, 'completed': '57.61% (2_124 / 3_687)', 'remaining time': '13:55:03', 'throughput': '8356.47', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:15:47,712 >> {'loss': 0.6003, 'grad_norm': 17.28524398803711, 'learning_rate': 4.740679227085436e-07, 'epoch': 0.35937076213723895, 'num_input_tokens_seen': 4456448000, 'completed': '57.63% (2_125 / 3_687)', 'remaining time': '13:54:32', 'throughput': '7875.76', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:16:20,128 >> {'loss': 0.4924, 'grad_norm': 17.543121337890625, 'learning_rate': 4.7367005344777255e-07, 'epoch': 0.3596419853539463, 'num_input_tokens_seen': 4458545152, 'completed': '57.66% (2_126 / 3_687)', 'remaining time': '13:54:00', 'throughput': '8086.66', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:16:49,776 >> {'loss': 0.4224, 'grad_norm': 15.53244400024414, 'learning_rate': 4.732722456144381e-07, 'epoch': 0.3599132085706537, 'num_input_tokens_seen': 4460642304, 'completed': '57.69% (2_127 / 3_687)', 'remaining time': '13:53:25', 'throughput': '8841.98', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:17:22,269 >> {'loss': 0.4343, 'grad_norm': 15.329564094543457, 'learning_rate': 4.7287449952868084e-07, 'epoch': 0.360184431787361, 'num_input_tokens_seen': 4462739456, 'completed': '57.72% (2_128 / 3_687)', 'remaining time': '13:52:54', 'throughput': '8067.84', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:17:52,480 >> {'loss': 0.2533, 'grad_norm': 11.179370880126953, 'learning_rate': 4.724768155105914e-07, 'epoch': 0.36045565500406834, 'num_input_tokens_seen': 4464836608, 'completed': '57.74% (2_129 / 3_687)', 'remaining time': '13:52:20', 'throughput': '8677.12', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:18:22,042 >> {'loss': 0.6616, 'grad_norm': 19.77837371826172, 'learning_rate': 4.7207919388021076e-07, 'epoch': 0.3607268782207757, 'num_input_tokens_seen': 4466933760, 'completed': '57.77% (2_130 / 3_687)', 'remaining time': '13:51:45', 'throughput': '8867.48', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:18:53,908 >> {'loss': 0.4892, 'grad_norm': 18.568798065185547, 'learning_rate': 4.7168163495752977e-07, 'epoch': 0.36099810143748307, 'num_input_tokens_seen': 4469030912, 'completed': '57.80% (2_131 / 3_687)', 'remaining time': '13:51:12', 'throughput': '8226.47', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:19:25,652 >> {'loss': 0.5263, 'grad_norm': 21.5096492767334, 'learning_rate': 4.7128413906248885e-07, 'epoch': 0.36126932465419037, 'num_input_tokens_seen': 4471128064, 'completed': '57.82% (2_132 / 3_687)', 'remaining time': '13:50:40', 'throughput': '8258.02', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:19:55,545 >> {'loss': 0.3734, 'grad_norm': 14.280765533447266, 'learning_rate': 4.708867065149775e-07, 'epoch': 0.36154054787089773, 'num_input_tokens_seen': 4473225216, 'completed': '57.85% (2_133 / 3_687)', 'remaining time': '13:50:06', 'throughput': '8769.45', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:20:24,241 >> {'loss': 0.5905, 'grad_norm': 21.687335968017578, 'learning_rate': 4.704893376348342e-07, 'epoch': 0.3618117710876051, 'num_input_tokens_seen': 4475322368, 'completed': '57.88% (2_134 / 3_687)', 'remaining time': '13:49:30', 'throughput': '9135.08', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:20:51,885 >> {'loss': 0.6312, 'grad_norm': 18.599130630493164, 'learning_rate': 4.700920327418463e-07, 'epoch': 0.36208299430431246, 'num_input_tokens_seen': 4477419520, 'completed': '57.91% (2_135 / 3_687)', 'remaining time': '13:48:52', 'throughput': '9482.93', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:21:22,683 >> {'loss': 0.3828, 'grad_norm': 15.053811073303223, 'learning_rate': 4.696947921557498e-07, 'epoch': 0.3623542175210198, 'num_input_tokens_seen': 4479516672, 'completed': '57.93% (2_136 / 3_687)', 'remaining time': '13:48:19', 'throughput': '8511.76', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:21:52,835 >> {'loss': 0.443, 'grad_norm': 15.113408088684082, 'learning_rate': 4.6929761619622866e-07, 'epoch': 0.3626254407377271, 'num_input_tokens_seen': 4481613824, 'completed': '57.96% (2_137 / 3_687)', 'remaining time': '13:47:45', 'throughput': '8693.97', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:22:21,958 >> {'loss': 0.3773, 'grad_norm': 14.625730514526367, 'learning_rate': 4.68900505182915e-07, 'epoch': 0.3628966639544345, 'num_input_tokens_seen': 4483710976, 'completed': '57.99% (2_138 / 3_687)', 'remaining time': '13:47:09', 'throughput': '9001.40', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:22:53,336 >> {'loss': 0.2264, 'grad_norm': 13.605332374572754, 'learning_rate': 4.6850345943538896e-07, 'epoch': 0.36316788717114185, 'num_input_tokens_seen': 4485808128, 'completed': '58.01% (2_139 / 3_687)', 'remaining time': '13:46:36', 'throughput': '8354.37', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:23:25,162 >> {'loss': 0.5183, 'grad_norm': 18.40673065185547, 'learning_rate': 4.6810647927317735e-07, 'epoch': 0.3634391103878492, 'num_input_tokens_seen': 4487905280, 'completed': '58.04% (2_140 / 3_687)', 'remaining time': '13:46:04', 'throughput': '8236.90', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:23:55,546 >> {'loss': 0.4819, 'grad_norm': 17.36908721923828, 'learning_rate': 4.677095650157551e-07, 'epoch': 0.36371033360455657, 'num_input_tokens_seen': 4490002432, 'completed': '58.07% (2_141 / 3_687)', 'remaining time': '13:45:30', 'throughput': '8627.80', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:24:25,059 >> {'loss': 0.5503, 'grad_norm': 17.948476791381836, 'learning_rate': 4.6731271698254326e-07, 'epoch': 0.3639815568212639, 'num_input_tokens_seen': 4492099584, 'completed': '58.10% (2_142 / 3_687)', 'remaining time': '13:44:55', 'throughput': '8882.09', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:24:54,516 >> {'loss': 0.9079, 'grad_norm': 22.524932861328125, 'learning_rate': 4.669159354929105e-07, 'epoch': 0.36425278003797124, 'num_input_tokens_seen': 4494196736, 'completed': '58.12% (2_143 / 3_687)', 'remaining time': '13:44:20', 'throughput': '8899.23', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:25:24,284 >> {'loss': 0.2876, 'grad_norm': 11.158336639404297, 'learning_rate': 4.6651922086617134e-07, 'epoch': 0.3645240032546786, 'num_input_tokens_seen': 4496293888, 'completed': '58.15% (2_144 / 3_687)', 'remaining time': '13:43:46', 'throughput': '8806.29', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:25:56,082 >> {'loss': 0.4339, 'grad_norm': 16.561214447021484, 'learning_rate': 4.661225734215867e-07, 'epoch': 0.36479522647138596, 'num_input_tokens_seen': 4498391040, 'completed': '58.18% (2_145 / 3_687)', 'remaining time': '13:43:13', 'throughput': '8244.01', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:26:25,404 >> {'loss': 0.2669, 'grad_norm': 11.833779335021973, 'learning_rate': 4.657259934783628e-07, 'epoch': 0.3650664496880933, 'num_input_tokens_seen': 4500488192, 'completed': '58.20% (2_146 / 3_687)', 'remaining time': '13:42:38', 'throughput': '8940.15', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:26:53,304 >> {'loss': 0.5366, 'grad_norm': 15.685547828674316, 'learning_rate': 4.6532948135565264e-07, 'epoch': 0.3653376729048006, 'num_input_tokens_seen': 4502585344, 'completed': '58.23% (2_147 / 3_687)', 'remaining time': '13:42:01', 'throughput': '9395.75', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:27:25,489 >> {'loss': 0.4777, 'grad_norm': 17.699525833129883, 'learning_rate': 4.6493303737255397e-07, 'epoch': 0.365608896121508, 'num_input_tokens_seen': 4504682496, 'completed': '58.26% (2_148 / 3_687)', 'remaining time': '13:41:30', 'throughput': '8145.06', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:27:54,561 >> {'loss': 0.6268, 'grad_norm': 18.16535186767578, 'learning_rate': 4.645366618481098e-07, 'epoch': 0.36588011933821535, 'num_input_tokens_seen': 4506779648, 'completed': '58.29% (2_149 / 3_687)', 'remaining time': '13:40:54', 'throughput': '9017.05', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:28:23,520 >> {'loss': 0.729, 'grad_norm': 18.208988189697266, 'learning_rate': 4.641403551013081e-07, 'epoch': 0.3661513425549227, 'num_input_tokens_seen': 4508876800, 'completed': '58.31% (2_150 / 3_687)', 'remaining time': '13:40:19', 'throughput': '9052.05', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:28:56,487 >> {'loss': 0.3639, 'grad_norm': 14.158181190490723, 'learning_rate': 4.637441174510813e-07, 'epoch': 0.36642256577163007, 'num_input_tokens_seen': 4510973952, 'completed': '58.34% (2_151 / 3_687)', 'remaining time': '13:39:48', 'throughput': '7951.76', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:29:25,881 >> {'loss': 0.4579, 'grad_norm': 15.75287914276123, 'learning_rate': 4.633479492163066e-07, 'epoch': 0.3666937889883374, 'num_input_tokens_seen': 4513071104, 'completed': '58.37% (2_152 / 3_687)', 'remaining time': '13:39:13', 'throughput': '8918.27', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:29:56,410 >> {'loss': 0.413, 'grad_norm': 14.739073753356934, 'learning_rate': 4.6295185071580487e-07, 'epoch': 0.36696501220504474, 'num_input_tokens_seen': 4515168256, 'completed': '58.39% (2_153 / 3_687)', 'remaining time': '13:38:39', 'throughput': '8586.90', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:30:25,733 >> {'loss': 0.4131, 'grad_norm': 14.379277229309082, 'learning_rate': 4.6255582226834133e-07, 'epoch': 0.3672362354217521, 'num_input_tokens_seen': 4517265408, 'completed': '58.42% (2_154 / 3_687)', 'remaining time': '13:38:04', 'throughput': '8939.76', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:30:56,242 >> {'loss': 0.5776, 'grad_norm': 19.29395294189453, 'learning_rate': 4.6215986419262444e-07, 'epoch': 0.36750745863845946, 'num_input_tokens_seen': 4519362560, 'completed': '58.45% (2_155 / 3_687)', 'remaining time': '13:37:30', 'throughput': '8592.35', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:31:22,814 >> {'loss': 0.9065, 'grad_norm': 22.25945281982422, 'learning_rate': 4.617639768073066e-07, 'epoch': 0.3677786818551668, 'num_input_tokens_seen': 4521459712, 'completed': '58.48% (2_156 / 3_687)', 'remaining time': '13:36:52', 'throughput': '9865.36', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:31:53,797 >> {'loss': 0.302, 'grad_norm': 11.623411178588867, 'learning_rate': 4.613681604309824e-07, 'epoch': 0.36804990507187413, 'num_input_tokens_seen': 4523556864, 'completed': '58.50% (2_157 / 3_687)', 'remaining time': '13:36:19', 'throughput': '8460.91', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:32:24,896 >> {'loss': 0.5756, 'grad_norm': 19.57111358642578, 'learning_rate': 4.609724153821898e-07, 'epoch': 0.3683211282885815, 'num_input_tokens_seen': 4525654016, 'completed': '58.53% (2_158 / 3_687)', 'remaining time': '13:35:46', 'throughput': '8429.33', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:32:55,048 >> {'loss': 0.4861, 'grad_norm': 19.20432472229004, 'learning_rate': 4.6057674197940974e-07, 'epoch': 0.36859235150528885, 'num_input_tokens_seen': 4527751168, 'completed': '58.56% (2_159 / 3_687)', 'remaining time': '13:35:12', 'throughput': '8694.02', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:33:28,406 >> {'loss': 0.3036, 'grad_norm': 12.728202819824219, 'learning_rate': 4.6018114054106494e-07, 'epoch': 0.3688635747219962, 'num_input_tokens_seen': 4529848320, 'completed': '58.58% (2_160 / 3_687)', 'remaining time': '13:34:41', 'throughput': '7858.58', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:33:58,361 >> {'loss': 0.3981, 'grad_norm': 15.91971492767334, 'learning_rate': 4.597856113855203e-07, 'epoch': 0.3691347979387036, 'num_input_tokens_seen': 4531945472, 'completed': '58.61% (2_161 / 3_687)', 'remaining time': '13:34:07', 'throughput': '8751.31', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:34:25,755 >> {'loss': 0.6073, 'grad_norm': 20.145366668701172, 'learning_rate': 4.593901548310828e-07, 'epoch': 0.3694060211554109, 'num_input_tokens_seen': 4534042624, 'completed': '58.64% (2_162 / 3_687)', 'remaining time': '13:33:30', 'throughput': '9569.33', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:34:53,845 >> {'loss': 0.4431, 'grad_norm': 18.48992156982422, 'learning_rate': 4.5899477119600073e-07, 'epoch': 0.36967724437211824, 'num_input_tokens_seen': 4536139776, 'completed': '58.67% (2_163 / 3_687)', 'remaining time': '13:32:53', 'throughput': '9332.36', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:35:27,993 >> {'loss': 0.4718, 'grad_norm': 20.453821182250977, 'learning_rate': 4.585994607984637e-07, 'epoch': 0.3699484675888256, 'num_input_tokens_seen': 4538236928, 'completed': '58.69% (2_164 / 3_687)', 'remaining time': '13:32:24', 'throughput': '7676.63', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:35:59,442 >> {'loss': 0.555, 'grad_norm': 19.20899772644043, 'learning_rate': 4.582042239566026e-07, 'epoch': 0.37021969080553296, 'num_input_tokens_seen': 4540334080, 'completed': '58.72% (2_165 / 3_687)', 'remaining time': '13:31:51', 'throughput': '8335.55', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:36:35,410 >> {'loss': 0.3494, 'grad_norm': 14.162814140319824, 'learning_rate': 4.578090609884889e-07, 'epoch': 0.3704909140222403, 'num_input_tokens_seen': 4542431232, 'completed': '58.75% (2_166 / 3_687)', 'remaining time': '13:31:23', 'throughput': '7288.19', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:37:05,591 >> {'loss': 0.3046, 'grad_norm': 16.07235336303711, 'learning_rate': 4.57413972212135e-07, 'epoch': 0.37076213723894763, 'num_input_tokens_seen': 4544528384, 'completed': '58.77% (2_167 / 3_687)', 'remaining time': '13:30:49', 'throughput': '8685.81', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:37:34,698 >> {'loss': 0.403, 'grad_norm': 18.295516967773438, 'learning_rate': 4.5701895794549293e-07, 'epoch': 0.371033360455655, 'num_input_tokens_seen': 4546625536, 'completed': '58.80% (2_168 / 3_687)', 'remaining time': '13:30:14', 'throughput': '9006.33', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:38:06,775 >> {'loss': 0.3592, 'grad_norm': 19.14270782470703, 'learning_rate': 4.566240185064554e-07, 'epoch': 0.37130458367236235, 'num_input_tokens_seen': 4548722688, 'completed': '58.83% (2_169 / 3_687)', 'remaining time': '13:29:42', 'throughput': '8172.25', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:38:39,232 >> {'loss': 0.5408, 'grad_norm': 17.336620330810547, 'learning_rate': 4.5622915421285446e-07, 'epoch': 0.3715758068890697, 'num_input_tokens_seen': 4550819840, 'completed': '58.86% (2_170 / 3_687)', 'remaining time': '13:29:11', 'throughput': '8076.64', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:39:09,038 >> {'loss': 0.0982, 'grad_norm': 7.247825622558594, 'learning_rate': 4.558343653824619e-07, 'epoch': 0.3718470301057771, 'num_input_tokens_seen': 4552916992, 'completed': '58.88% (2_171 / 3_687)', 'remaining time': '13:28:36', 'throughput': '8795.12', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:39:39,899 >> {'loss': 0.3333, 'grad_norm': 14.099793434143066, 'learning_rate': 4.55439652332989e-07, 'epoch': 0.3721182533224844, 'num_input_tokens_seen': 4555014144, 'completed': '58.91% (2_172 / 3_687)', 'remaining time': '13:28:03', 'throughput': '8494.28', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:40:11,025 >> {'loss': 0.4548, 'grad_norm': 17.59613800048828, 'learning_rate': 4.550450153820857e-07, 'epoch': 0.37238947653919174, 'num_input_tokens_seen': 4557111296, 'completed': '58.94% (2_173 / 3_687)', 'remaining time': '13:27:30', 'throughput': '8422.02', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:40:43,235 >> {'loss': 0.3559, 'grad_norm': 12.305207252502441, 'learning_rate': 4.5465045484734044e-07, 'epoch': 0.3726606997558991, 'num_input_tokens_seen': 4559208448, 'completed': '58.96% (2_174 / 3_687)', 'remaining time': '13:26:58', 'throughput': '8138.55', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:41:12,505 >> {'loss': 0.3236, 'grad_norm': 13.379070281982422, 'learning_rate': 4.5425597104628086e-07, 'epoch': 0.37293192297260647, 'num_input_tokens_seen': 4561305600, 'completed': '58.99% (2_175 / 3_687)', 'remaining time': '13:26:23', 'throughput': '8956.22', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:41:41,082 >> {'loss': 0.4341, 'grad_norm': 15.527801513671875, 'learning_rate': 4.5386156429637256e-07, 'epoch': 0.37320314618931383, 'num_input_tokens_seen': 4563402752, 'completed': '59.02% (2_176 / 3_687)', 'remaining time': '13:25:48', 'throughput': '9173.26', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:42:14,281 >> {'loss': 0.3883, 'grad_norm': 22.03895378112793, 'learning_rate': 4.534672349150189e-07, 'epoch': 0.37347436940602113, 'num_input_tokens_seen': 4565499904, 'completed': '59.05% (2_177 / 3_687)', 'remaining time': '13:25:17', 'throughput': '7895.98', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:42:44,417 >> {'loss': 0.311, 'grad_norm': 15.303971290588379, 'learning_rate': 4.530729832195612e-07, 'epoch': 0.3737455926227285, 'num_input_tokens_seen': 4567597056, 'completed': '59.07% (2_178 / 3_687)', 'remaining time': '13:24:43', 'throughput': '8698.66', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:43:16,664 >> {'loss': 0.3807, 'grad_norm': 15.900908470153809, 'learning_rate': 4.5267880952727845e-07, 'epoch': 0.37401681583943586, 'num_input_tokens_seen': 4569694208, 'completed': '59.10% (2_179 / 3_687)', 'remaining time': '13:24:11', 'throughput': '8129.40', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:43:44,421 >> {'loss': 0.9865, 'grad_norm': 26.075777053833008, 'learning_rate': 4.5228471415538637e-07, 'epoch': 0.3742880390561432, 'num_input_tokens_seen': 4571791360, 'completed': '59.13% (2_180 / 3_687)', 'remaining time': '13:23:35', 'throughput': '9444.22', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:44:15,566 >> {'loss': 0.3947, 'grad_norm': 18.148820877075195, 'learning_rate': 4.518906974210379e-07, 'epoch': 0.3745592622728506, 'num_input_tokens_seen': 4573888512, 'completed': '59.15% (2_181 / 3_687)', 'remaining time': '13:23:02', 'throughput': '8416.86', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:44:45,447 >> {'loss': 0.4087, 'grad_norm': 32.52322006225586, 'learning_rate': 4.514967596413228e-07, 'epoch': 0.3748304854895579, 'num_input_tokens_seen': 4575985664, 'completed': '59.18% (2_182 / 3_687)', 'remaining time': '13:22:27', 'throughput': '8772.87', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:45:15,887 >> {'loss': 0.3748, 'grad_norm': 14.72100830078125, 'learning_rate': 4.511029011332672e-07, 'epoch': 0.37510170870626525, 'num_input_tokens_seen': 4578082816, 'completed': '59.21% (2_183 / 3_687)', 'remaining time': '13:21:54', 'throughput': '8612.02', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:45:47,047 >> {'loss': 0.5725, 'grad_norm': 18.871889114379883, 'learning_rate': 4.507091222138335e-07, 'epoch': 0.3753729319229726, 'num_input_tokens_seen': 4580179968, 'completed': '59.24% (2_184 / 3_687)', 'remaining time': '13:21:21', 'throughput': '8412.64', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:46:17,793 >> {'loss': 0.2177, 'grad_norm': 15.662956237792969, 'learning_rate': 4.503154231999202e-07, 'epoch': 0.37564415513967997, 'num_input_tokens_seen': 4582277120, 'completed': '59.26% (2_185 / 3_687)', 'remaining time': '13:20:47', 'throughput': '8526.22', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:46:47,673 >> {'loss': 0.799, 'grad_norm': 23.3785343170166, 'learning_rate': 4.499218044083608e-07, 'epoch': 0.37591537835638733, 'num_input_tokens_seen': 4584374272, 'completed': '59.29% (2_186 / 3_687)', 'remaining time': '13:20:13', 'throughput': '8773.24', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:47:18,729 >> {'loss': 0.4064, 'grad_norm': 15.690712928771973, 'learning_rate': 4.495282661559248e-07, 'epoch': 0.37618660157309464, 'num_input_tokens_seen': 4586471424, 'completed': '59.32% (2_187 / 3_687)', 'remaining time': '13:19:40', 'throughput': '8441.11', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:47:47,776 >> {'loss': 0.5982, 'grad_norm': 17.365020751953125, 'learning_rate': 4.491348087593171e-07, 'epoch': 0.376457824789802, 'num_input_tokens_seen': 4588568576, 'completed': '59.34% (2_188 / 3_687)', 'remaining time': '13:19:05', 'throughput': '9024.81', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:48:17,890 >> {'loss': 0.2989, 'grad_norm': 12.830045700073242, 'learning_rate': 4.4874143253517706e-07, 'epoch': 0.37672904800650936, 'num_input_tokens_seen': 4590665728, 'completed': '59.37% (2_189 / 3_687)', 'remaining time': '13:18:31', 'throughput': '8704.82', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:48:48,676 >> {'loss': 0.7782, 'grad_norm': 21.898000717163086, 'learning_rate': 4.483481378000791e-07, 'epoch': 0.3770002712232167, 'num_input_tokens_seen': 4592762880, 'completed': '59.40% (2_190 / 3_687)', 'remaining time': '13:17:58', 'throughput': '8514.99', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:49:19,928 >> {'loss': 0.5265, 'grad_norm': 17.4901065826416, 'learning_rate': 4.4795492487053155e-07, 'epoch': 0.3772714944399241, 'num_input_tokens_seen': 4594860032, 'completed': '59.43% (2_191 / 3_687)', 'remaining time': '13:17:25', 'throughput': '8388.13', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:49:51,613 >> {'loss': 0.3187, 'grad_norm': 11.635929107666016, 'learning_rate': 4.4756179406297744e-07, 'epoch': 0.3775427176566314, 'num_input_tokens_seen': 4596957184, 'completed': '59.45% (2_192 / 3_687)', 'remaining time': '13:16:53', 'throughput': '8273.47', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:50:20,547 >> {'loss': 0.3864, 'grad_norm': 16.157140731811523, 'learning_rate': 4.4716874569379336e-07, 'epoch': 0.37781394087333875, 'num_input_tokens_seen': 4599054336, 'completed': '59.48% (2_193 / 3_687)', 'remaining time': '13:16:17', 'throughput': '9060.17', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:50:47,885 >> {'loss': 0.3689, 'grad_norm': 15.90643310546875, 'learning_rate': 4.467757800792897e-07, 'epoch': 0.3780851640900461, 'num_input_tokens_seen': 4601151488, 'completed': '59.51% (2_194 / 3_687)', 'remaining time': '13:15:40', 'throughput': '9588.96', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:51:17,153 >> {'loss': 0.6203, 'grad_norm': 20.70597267150879, 'learning_rate': 4.4638289753571025e-07, 'epoch': 0.37835638730675347, 'num_input_tokens_seen': 4603248640, 'completed': '59.53% (2_195 / 3_687)', 'remaining time': '13:15:06', 'throughput': '8956.74', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:51:47,613 >> {'loss': 0.3584, 'grad_norm': 13.205007553100586, 'learning_rate': 4.459900983792321e-07, 'epoch': 0.37862761052346083, 'num_input_tokens_seen': 4605345792, 'completed': '59.56% (2_196 / 3_687)', 'remaining time': '13:14:32', 'throughput': '8606.19', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:52:18,480 >> {'loss': 0.2914, 'grad_norm': 15.028583526611328, 'learning_rate': 4.455973829259648e-07, 'epoch': 0.37889883374016814, 'num_input_tokens_seen': 4607442944, 'completed': '59.59% (2_197 / 3_687)', 'remaining time': '13:13:59', 'throughput': '8492.49', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:52:50,750 >> {'loss': 0.4295, 'grad_norm': 14.172557830810547, 'learning_rate': 4.4520475149195093e-07, 'epoch': 0.3791700569568755, 'num_input_tokens_seen': 4609540096, 'completed': '59.61% (2_198 / 3_687)', 'remaining time': '13:13:27', 'throughput': '8123.48', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:53:20,198 >> {'loss': 0.2735, 'grad_norm': 11.871790885925293, 'learning_rate': 4.4481220439316514e-07, 'epoch': 0.37944128017358286, 'num_input_tokens_seen': 4611637248, 'completed': '59.64% (2_199 / 3_687)', 'remaining time': '13:12:52', 'throughput': '8901.95', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 03:53:50,365 >> {'loss': 0.6647, 'grad_norm': 17.767791748046875, 'learning_rate': 4.444197419455147e-07, 'epoch': 0.3797125033902902, 'num_input_tokens_seen': 4613734400, 'completed': '59.67% (2_200 / 3_687)', 'remaining time': '13:12:19', 'throughput': '8689.80', 'gpu_mem_free': '5581MB'}
/scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:689: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
warnings.warn(
[INFO|trainer.py:3503] 2024-12-26 03:54:13,820 >> Saving model checkpoint to /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-2200
[INFO|configuration_utils.py:472] 2024-12-26 03:54:13,823 >> Configuration saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-2200/config.json
[INFO|configuration_utils.py:807] 2024-12-26 03:54:13,825 >> Configuration saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-2200/generation_config.json
[INFO|modeling_utils.py:2807] 2024-12-26 03:55:09,960 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 7 checkpoint shards. You can find where each parameters has been saved in the index located at /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-2200/model.safetensors.index.json.
[INFO|tokenization_utils_base.py:2684] 2024-12-26 03:55:09,964 >> tokenizer config file saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-2200/tokenizer_config.json
[INFO|tokenization_utils_base.py:2693] 2024-12-26 03:55:09,965 >> Special tokens file saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-2200/special_tokens_map.json
[WARNING|trainer.py:868] 2024-12-26 03:59:00,091 >> Save streaming dataset state: {'epoch': 0, 'sample_in_epoch': 4400, 'num_canonical_nodes': 1, 'shuffle_seed': 42, 'initial_physical_nodes': 1}
/scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/torch/utils/checkpoint.py:1399: FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead.
with device_autocast_ctx, torch.cpu.amp.autocast(**cpu_autocast_kwargs), recompute_context: # type: ignore[attr-defined]
[INFO|trainer.py:175] 2024-12-26 03:59:32,063 >> {'loss': 0.4934, 'grad_norm': 15.653825759887695, 'learning_rate': 4.4402736446483824e-07, 'epoch': 0.3799837266069976, 'num_input_tokens_seen': 4615831552, 'completed': '59.70% (2_201 / 3_687)', 'remaining time': '13:17:15', 'throughput': '767.18', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 04:00:05,569 >> {'loss': 0.4065, 'grad_norm': 16.383865356445312, 'learning_rate': 4.436350722669065e-07, 'epoch': 0.3802549498237049, 'num_input_tokens_seen': 4617928704, 'completed': '59.72% (2_202 / 3_687)', 'remaining time': '13:16:44', 'throughput': '7823.91', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 04:00:34,089 >> {'loss': 0.5536, 'grad_norm': 17.470500946044922, 'learning_rate': 4.43242865667421e-07, 'epoch': 0.38052617304041225, 'num_input_tokens_seen': 4620025856, 'completed': '59.75% (2_203 / 3_687)', 'remaining time': '13:16:08', 'throughput': '9191.38', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 04:01:05,211 >> {'loss': 0.4773, 'grad_norm': 17.186763763427734, 'learning_rate': 4.428507449820147e-07, 'epoch': 0.3807973962571196, 'num_input_tokens_seen': 4622123008, 'completed': '59.78% (2_204 / 3_687)', 'remaining time': '13:15:35', 'throughput': '8423.09', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 04:01:33,843 >> {'loss': 0.6064, 'grad_norm': 18.592941284179688, 'learning_rate': 4.424587105262516e-07, 'epoch': 0.381068619473827, 'num_input_tokens_seen': 4624220160, 'completed': '59.80% (2_205 / 3_687)', 'remaining time': '13:14:59', 'throughput': '9155.62', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 04:02:00,907 >> {'loss': 0.6243, 'grad_norm': 18.531604766845703, 'learning_rate': 4.4206676261562603e-07, 'epoch': 0.38133984269053434, 'num_input_tokens_seen': 4626317312, 'completed': '59.83% (2_206 / 3_687)', 'remaining time': '13:14:21', 'throughput': '9686.36', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 04:02:31,693 >> {'loss': 0.3981, 'grad_norm': 18.356502532958984, 'learning_rate': 4.4167490156556274e-07, 'epoch': 0.38161106590724164, 'num_input_tokens_seen': 4628414464, 'completed': '59.86% (2_207 / 3_687)', 'remaining time': '13:13:48', 'throughput': '8515.02', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 04:03:01,766 >> {'loss': 0.4111, 'grad_norm': 16.479177474975586, 'learning_rate': 4.4128312769141664e-07, 'epoch': 0.381882289123949, 'num_input_tokens_seen': 4630511616, 'completed': '59.89% (2_208 / 3_687)', 'remaining time': '13:13:13', 'throughput': '8716.87', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 04:03:32,290 >> {'loss': 0.3192, 'grad_norm': 13.391749382019043, 'learning_rate': 4.408914413084722e-07, 'epoch': 0.38215351234065636, 'num_input_tokens_seen': 4632608768, 'completed': '59.91% (2_209 / 3_687)', 'remaining time': '13:12:39', 'throughput': '8588.21', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 04:04:03,567 >> {'loss': 0.7114, 'grad_norm': 24.691661834716797, 'learning_rate': 4.4049984273194406e-07, 'epoch': 0.3824247355573637, 'num_input_tokens_seen': 4634705920, 'completed': '59.94% (2_210 / 3_687)', 'remaining time': '13:12:06', 'throughput': '8381.32', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 04:04:33,226 >> {'loss': 0.7462, 'grad_norm': 27.44068717956543, 'learning_rate': 4.401083322769756e-07, 'epoch': 0.3826959587740711, 'num_input_tokens_seen': 4636803072, 'completed': '59.97% (2_211 / 3_687)', 'remaining time': '13:11:32', 'throughput': '8838.55', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 04:05:04,595 >> {'loss': 0.4238, 'grad_norm': 14.547937393188477, 'learning_rate': 4.3971691025863964e-07, 'epoch': 0.3829671819907784, 'num_input_tokens_seen': 4638900224, 'completed': '59.99% (2_212 / 3_687)', 'remaining time': '13:10:59', 'throughput': '8356.81', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 04:05:34,842 >> {'loss': 0.5338, 'grad_norm': 16.86309242248535, 'learning_rate': 4.3932557699193794e-07, 'epoch': 0.38323840520748575, 'num_input_tokens_seen': 4640997376, 'completed': '60.02% (2_213 / 3_687)', 'remaining time': '13:10:24', 'throughput': '8666.64', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 04:06:05,326 >> {'loss': 0.3702, 'grad_norm': 13.788909912109375, 'learning_rate': 4.389343327918004e-07, 'epoch': 0.3835096284241931, 'num_input_tokens_seen': 4643094528, 'completed': '60.05% (2_214 / 3_687)', 'remaining time': '13:09:50', 'throughput': '8599.56', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 04:06:35,588 >> {'loss': 0.4389, 'grad_norm': 18.939809799194336, 'learning_rate': 4.3854317797308515e-07, 'epoch': 0.3837808516409005, 'num_input_tokens_seen': 4645191680, 'completed': '60.08% (2_215 / 3_687)', 'remaining time': '13:09:16', 'throughput': '8662.37', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 04:07:07,720 >> {'loss': 0.3214, 'grad_norm': 13.284666061401367, 'learning_rate': 4.381521128505791e-07, 'epoch': 0.3840520748576078, 'num_input_tokens_seen': 4647288832, 'completed': '60.10% (2_216 / 3_687)', 'remaining time': '13:08:44', 'throughput': '8158.42', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 04:07:39,203 >> {'loss': 0.2278, 'grad_norm': 10.879389762878418, 'learning_rate': 4.377611377389966e-07, 'epoch': 0.38432329807431515, 'num_input_tokens_seen': 4649385984, 'completed': '60.13% (2_217 / 3_687)', 'remaining time': '13:08:11', 'throughput': '8326.42', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 04:08:11,830 >> {'loss': 0.3289, 'grad_norm': 12.832559585571289, 'learning_rate': 4.3737025295297937e-07, 'epoch': 0.3845945212910225, 'num_input_tokens_seen': 4651483136, 'completed': '60.16% (2_218 / 3_687)', 'remaining time': '13:07:39', 'throughput': '8034.70', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 04:08:42,943 >> {'loss': 0.5121, 'grad_norm': 15.852336883544922, 'learning_rate': 4.3697945880709686e-07, 'epoch': 0.38486574450772987, 'num_input_tokens_seen': 4653580288, 'completed': '60.18% (2_219 / 3_687)', 'remaining time': '13:07:06', 'throughput': '8425.48', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:09:14,306 >> {'loss': 0.5452, 'grad_norm': 19.836444854736328, 'learning_rate': 4.3658875561584494e-07, 'epoch': 0.38513696772443723, 'num_input_tokens_seen': 4655677440, 'completed': '60.21% (2_220 / 3_687)', 'remaining time': '13:06:33', 'throughput': '8358.34', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:09:45,939 >> {'loss': 0.1762, 'grad_norm': 18.720237731933594, 'learning_rate': 4.36198143693647e-07, 'epoch': 0.38540819094114454, 'num_input_tokens_seen': 4657774592, 'completed': '60.24% (2_221 / 3_687)', 'remaining time': '13:06:00', 'throughput': '8286.97', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:10:16,051 >> {'loss': 0.2582, 'grad_norm': 12.053540229797363, 'learning_rate': 4.3580762335485253e-07, 'epoch': 0.3856794141578519, 'num_input_tokens_seen': 4659871744, 'completed': '60.27% (2_222 / 3_687)', 'remaining time': '13:05:26', 'throughput': '8705.85', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:10:47,620 >> {'loss': 0.7018, 'grad_norm': 21.45743751525879, 'learning_rate': 4.3541719491373743e-07, 'epoch': 0.38595063737455926, 'num_input_tokens_seen': 4661968896, 'completed': '60.29% (2_223 / 3_687)', 'remaining time': '13:04:53', 'throughput': '8303.71', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:11:19,425 >> {'loss': 0.686, 'grad_norm': 18.61931610107422, 'learning_rate': 4.350268586845035e-07, 'epoch': 0.3862218605912666, 'num_input_tokens_seen': 4664066048, 'completed': '60.32% (2_224 / 3_687)', 'remaining time': '13:04:21', 'throughput': '8242.14', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:11:50,991 
>> {'loss': 0.4922, 'grad_norm': 15.874135971069336, 'learning_rate': 4.34636614981279e-07, 'epoch': 0.386493083807974, 'num_input_tokens_seen': 4666163200, 'completed': '60.35% (2_225 / 3_687)', 'remaining time': '13:03:48', 'throughput': '8304.72', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:12:20,366 >> {'loss': 0.4819, 'grad_norm': 17.982471466064453, 'learning_rate': 4.342464641181166e-07, 'epoch': 0.3867643070246813, 'num_input_tokens_seen': 4668260352, 'completed': '60.37% (2_226 / 3_687)', 'remaining time': '13:03:13', 'throughput': '8924.12', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:12:50,318 >> {'loss': 0.3803, 'grad_norm': 13.340551376342773, 'learning_rate': 4.3385640640899524e-07, 'epoch': 0.38703553024138865, 'num_input_tokens_seen': 4670357504, 'completed': '60.40% (2_227 / 3_687)', 'remaining time': '13:02:39', 'throughput': '8752.12', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:13:20,902 >> {'loss': 0.4039, 'grad_norm': 15.015870094299316, 'learning_rate': 4.3346644216781823e-07, 'epoch': 0.387306753458096, 'num_input_tokens_seen': 4672454656, 'completed': '60.43% (2_228 / 3_687)', 'remaining time': '13:02:05', 'throughput': '8571.41', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:13:51,402 >> {'loss': 0.2899, 'grad_norm': 13.041723251342773, 'learning_rate': 4.3307657170841417e-07, 'epoch': 0.38757797667480337, 'num_input_tokens_seen': 4674551808, 'completed': '60.46% (2_229 / 3_687)', 'remaining time': '13:01:31', 'throughput': '8594.84', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:14:22,499 >> {'loss': 0.7029, 'grad_norm': 21.264972686767578, 'learning_rate': 4.326867953445362e-07, 'epoch': 0.38784919989151073, 'num_input_tokens_seen': 4676648960, 'completed': '60.48% (2_230 / 3_687)', 'remaining time': '13:00:58', 'throughput': '8429.74', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:14:55,637 >> {'loss': 0.4447, 'grad_norm': 
17.26335906982422, 'learning_rate': 4.322971133898611e-07, 'epoch': 0.38812042310821804, 'num_input_tokens_seen': 4678746112, 'completed': '60.51% (2_231 / 3_687)', 'remaining time': '13:00:27', 'throughput': '7910.84', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:15:24,050 >> {'loss': 0.3625, 'grad_norm': 13.549259185791016, 'learning_rate': 4.319075261579901e-07, 'epoch': 0.3883916463249254, 'num_input_tokens_seen': 4680843264, 'completed': '60.54% (2_232 / 3_687)', 'remaining time': '12:59:51', 'throughput': '9225.94', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:15:52,690 >> {'loss': 0.5931, 'grad_norm': 17.16545295715332, 'learning_rate': 4.3151803396244833e-07, 'epoch': 0.38866286954163276, 'num_input_tokens_seen': 4682940416, 'completed': '60.56% (2_233 / 3_687)', 'remaining time': '12:59:15', 'throughput': '9153.03', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:16:23,526 >> {'loss': 0.3044, 'grad_norm': 20.503673553466797, 'learning_rate': 4.3112863711668435e-07, 'epoch': 0.3889340927583401, 'num_input_tokens_seen': 4685037568, 'completed': '60.59% (2_234 / 3_687)', 'remaining time': '12:58:41', 'throughput': '8501.50', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:16:54,225 >> {'loss': 0.9052, 'grad_norm': 24.06279182434082, 'learning_rate': 4.3073933593406974e-07, 'epoch': 0.3892053159750475, 'num_input_tokens_seen': 4687134720, 'completed': '60.62% (2_235 / 3_687)', 'remaining time': '12:58:08', 'throughput': '8539.17', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:17:23,292 >> {'loss': 0.3166, 'grad_norm': 13.339831352233887, 'learning_rate': 4.303501307278994e-07, 'epoch': 0.3894765391917548, 'num_input_tokens_seen': 4689231872, 'completed': '60.65% (2_236 / 3_687)', 'remaining time': '12:57:32', 'throughput': '9018.42', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:17:52,786 >> {'loss': 0.4647, 'grad_norm': 14.58847713470459, 'learning_rate': 
4.299610218113908e-07, 'epoch': 0.38974776240846215, 'num_input_tokens_seen': 4691329024, 'completed': '60.67% (2_237 / 3_687)', 'remaining time': '12:56:58', 'throughput': '8888.18', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:18:26,221 >> {'loss': 0.3683, 'grad_norm': 16.615280151367188, 'learning_rate': 4.2957200949768414e-07, 'epoch': 0.3900189856251695, 'num_input_tokens_seen': 4693426176, 'completed': '60.70% (2_238 / 3_687)', 'remaining time': '12:56:27', 'throughput': '7840.31', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:18:57,851 >> {'loss': 0.6683, 'grad_norm': 25.591224670410156, 'learning_rate': 4.2918309409984145e-07, 'epoch': 0.3902902088418769, 'num_input_tokens_seen': 4695523328, 'completed': '60.73% (2_239 / 3_687)', 'remaining time': '12:55:54', 'throughput': '8287.86', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:19:29,708 >> {'loss': 0.1616, 'grad_norm': 9.293283462524414, 'learning_rate': 4.2879427593084714e-07, 'epoch': 0.39056143205858423, 'num_input_tokens_seen': 4697620480, 'completed': '60.75% (2_240 / 3_687)', 'remaining time': '12:55:22', 'throughput': '8228.68', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:20:02,239 >> {'loss': 0.3878, 'grad_norm': 16.830848693847656, 'learning_rate': 4.2840555530360756e-07, 'epoch': 0.39083265527529154, 'num_input_tokens_seen': 4699717632, 'completed': '60.78% (2_241 / 3_687)', 'remaining time': '12:54:50', 'throughput': '8058.33', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:20:31,788 >> {'loss': 0.3504, 'grad_norm': 13.508627891540527, 'learning_rate': 4.280169325309502e-07, 'epoch': 0.3911038784919989, 'num_input_tokens_seen': 4701814784, 'completed': '60.81% (2_242 / 3_687)', 'remaining time': '12:54:15', 'throughput': '8871.61', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:21:03,206 >> {'loss': 0.2894, 'grad_norm': 12.19189739227295, 'learning_rate': 4.276284079256235e-07, 'epoch': 
0.39137510170870626, 'num_input_tokens_seen': 4703911936, 'completed': '60.84% (2_243 / 3_687)', 'remaining time': '12:53:42', 'throughput': '8343.76', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:21:33,547 >> {'loss': 0.5448, 'grad_norm': 24.510799407958984, 'learning_rate': 4.272399818002974e-07, 'epoch': 0.3916463249254136, 'num_input_tokens_seen': 4706009088, 'completed': '60.86% (2_244 / 3_687)', 'remaining time': '12:53:08', 'throughput': '8639.90', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:22:03,652 >> {'loss': 0.2909, 'grad_norm': 17.75606918334961, 'learning_rate': 4.268516544675628e-07, 'epoch': 0.391917548142121, 'num_input_tokens_seen': 4708106240, 'completed': '60.89% (2_245 / 3_687)', 'remaining time': '12:52:34', 'throughput': '8707.63', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:22:37,489 >> {'loss': 0.3659, 'grad_norm': 16.183746337890625, 'learning_rate': 4.2646342623993035e-07, 'epoch': 0.3921887713588283, 'num_input_tokens_seen': 4710203392, 'completed': '60.92% (2_246 / 3_687)', 'remaining time': '12:52:04', 'throughput': '7747.20', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:23:06,349 >> {'loss': 0.3025, 'grad_norm': 12.006651878356934, 'learning_rate': 4.2607529742983174e-07, 'epoch': 0.39245999457553565, 'num_input_tokens_seen': 4712300544, 'completed': '60.94% (2_247 / 3_687)', 'remaining time': '12:51:28', 'throughput': '9083.38', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:23:34,924 >> {'loss': 0.6958, 'grad_norm': 22.091772079467773, 'learning_rate': 4.2568726834961797e-07, 'epoch': 0.392731217792243, 'num_input_tokens_seen': 4714397696, 'completed': '60.97% (2_248 / 3_687)', 'remaining time': '12:50:53', 'throughput': '9173.82', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:24:05,734 >> {'loss': 0.2623, 'grad_norm': 13.590865135192871, 'learning_rate': 4.252993393115601e-07, 'epoch': 0.3930024410089504, 'num_input_tokens_seen': 
4716494848, 'completed': '61.00% (2_249 / 3_687)', 'remaining time': '12:50:19', 'throughput': '8508.58', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:24:35,096 >> {'loss': 0.4497, 'grad_norm': 18.128318786621094, 'learning_rate': 4.249115106278489e-07, 'epoch': 0.39327366422565774, 'num_input_tokens_seen': 4718592000, 'completed': '61.03% (2_250 / 3_687)', 'remaining time': '12:49:44', 'throughput': '8928.02', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:25:04,693 >> {'loss': 0.5144, 'grad_norm': 17.97321319580078, 'learning_rate': 4.2452378261059397e-07, 'epoch': 0.39354488744236504, 'num_input_tokens_seen': 4720689152, 'completed': '61.05% (2_251 / 3_687)', 'remaining time': '12:49:09', 'throughput': '8857.06', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:25:33,608 >> {'loss': 0.4629, 'grad_norm': 17.074485778808594, 'learning_rate': 4.24136155571824e-07, 'epoch': 0.3938161106590724, 'num_input_tokens_seen': 4722786304, 'completed': '61.08% (2_252 / 3_687)', 'remaining time': '12:48:34', 'throughput': '9065.77', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:26:05,313 >> {'loss': 0.4546, 'grad_norm': 16.58645248413086, 'learning_rate': 4.2374862982348657e-07, 'epoch': 0.39408733387577977, 'num_input_tokens_seen': 4724883456, 'completed': '61.11% (2_253 / 3_687)', 'remaining time': '12:48:02', 'throughput': '8268.46', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:26:35,855 >> {'loss': 0.8026, 'grad_norm': 20.560569763183594, 'learning_rate': 4.233612056774477e-07, 'epoch': 0.3943585570924871, 'num_input_tokens_seen': 4726980608, 'completed': '61.13% (2_254 / 3_687)', 'remaining time': '12:47:28', 'throughput': '8582.95', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:27:06,351 >> {'loss': 0.2919, 'grad_norm': 13.33758544921875, 'learning_rate': 4.2297388344549146e-07, 'epoch': 0.3946297803091945, 'num_input_tokens_seen': 4729077760, 'completed': '61.16% (2_255 / 
3_687)', 'remaining time': '12:46:54', 'throughput': '8595.95', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:27:36,729 >> {'loss': 0.4084, 'grad_norm': 17.15109634399414, 'learning_rate': 4.2258666343932004e-07, 'epoch': 0.3949010035259018, 'num_input_tokens_seen': 4731174912, 'completed': '61.19% (2_256 / 3_687)', 'remaining time': '12:46:20', 'throughput': '8629.37', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:28:05,930 >> {'loss': 0.365, 'grad_norm': 16.682199478149414, 'learning_rate': 4.2219954597055354e-07, 'epoch': 0.39517222674260916, 'num_input_tokens_seen': 4733272064, 'completed': '61.22% (2_257 / 3_687)', 'remaining time': '12:45:45', 'throughput': '8977.47', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:28:37,127 >> {'loss': 0.3236, 'grad_norm': 13.481849670410156, 'learning_rate': 4.2181253135072925e-07, 'epoch': 0.3954434499593165, 'num_input_tokens_seen': 4735369216, 'completed': '61.24% (2_258 / 3_687)', 'remaining time': '12:45:12', 'throughput': '8402.67', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:29:10,995 >> {'loss': 0.5844, 'grad_norm': 27.910301208496094, 'learning_rate': 4.21425619891302e-07, 'epoch': 0.3957146731760239, 'num_input_tokens_seen': 4737466368, 'completed': '61.27% (2_259 / 3_687)', 'remaining time': '12:44:42', 'throughput': '7740.32', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:29:43,357 >> {'loss': 0.6685, 'grad_norm': 24.167097091674805, 'learning_rate': 4.21038811903643e-07, 'epoch': 0.39598589639273124, 'num_input_tokens_seen': 4739563520, 'completed': '61.30% (2_260 / 3_687)', 'remaining time': '12:44:10', 'throughput': '8100.33', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:30:11,701 >> {'loss': 0.5551, 'grad_norm': 17.239904403686523, 'learning_rate': 4.206521076990409e-07, 'epoch': 0.39625711960943855, 'num_input_tokens_seen': 4741660672, 'completed': '61.32% (2_261 / 3_687)', 'remaining time': '12:43:34', 
'throughput': '9248.42', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:30:43,434 >> {'loss': 0.4008, 'grad_norm': 21.12584114074707, 'learning_rate': 4.202655075887005e-07, 'epoch': 0.3965283428261459, 'num_input_tokens_seen': 4743757824, 'completed': '61.35% (2_262 / 3_687)', 'remaining time': '12:43:02', 'throughput': '8261.03', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:31:15,341 >> {'loss': 0.2911, 'grad_norm': 12.652347564697266, 'learning_rate': 4.1987901188374286e-07, 'epoch': 0.39679956604285327, 'num_input_tokens_seen': 4745854976, 'completed': '61.38% (2_263 / 3_687)', 'remaining time': '12:42:29', 'throughput': '8215.90', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:31:44,955 >> {'loss': 0.414, 'grad_norm': 13.351825714111328, 'learning_rate': 4.194926208952051e-07, 'epoch': 0.39707078925956063, 'num_input_tokens_seen': 4747952128, 'completed': '61.40% (2_264 / 3_687)', 'remaining time': '12:41:55', 'throughput': '8852.06', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:32:15,096 >> {'loss': 0.3764, 'grad_norm': 13.11346435546875, 'learning_rate': 4.191063349340397e-07, 'epoch': 0.397342012476268, 'num_input_tokens_seen': 4750049280, 'completed': '61.43% (2_265 / 3_687)', 'remaining time': '12:41:21', 'throughput': '8697.18', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:32:47,052 >> {'loss': 0.3352, 'grad_norm': 13.020018577575684, 'learning_rate': 4.1872015431111505e-07, 'epoch': 0.3976132356929753, 'num_input_tokens_seen': 4752146432, 'completed': '61.46% (2_266 / 3_687)', 'remaining time': '12:40:48', 'throughput': '8203.41', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:33:16,408 >> {'loss': 0.6615, 'grad_norm': 20.551406860351562, 'learning_rate': 4.1833407933721476e-07, 'epoch': 0.39788445890968266, 'num_input_tokens_seen': 4754243584, 'completed': '61.49% (2_267 / 3_687)', 'remaining time': '12:40:14', 'throughput': '8929.69', 'gpu_mem_free': 
'5581MB'} [INFO|trainer.py:175] 2024-12-26 04:33:46,219 >> {'loss': 0.3388, 'grad_norm': 18.05826759338379, 'learning_rate': 4.17948110323037e-07, 'epoch': 0.39815568212639, 'num_input_tokens_seen': 4756340736, 'completed': '61.51% (2_268 / 3_687)', 'remaining time': '12:39:39', 'throughput': '8793.66', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:34:15,818 >> {'loss': 0.3536, 'grad_norm': 13.186254501342773, 'learning_rate': 4.1756224757919513e-07, 'epoch': 0.3984269053430974, 'num_input_tokens_seen': 4758437888, 'completed': '61.54% (2_269 / 3_687)', 'remaining time': '12:39:05', 'throughput': '8856.43', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:34:46,409 >> {'loss': 0.5714, 'grad_norm': 19.050817489624023, 'learning_rate': 4.17176491416217e-07, 'epoch': 0.39869812855980474, 'num_input_tokens_seen': 4760535040, 'completed': '61.57% (2_270 / 3_687)', 'remaining time': '12:38:31', 'throughput': '8569.22', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:35:16,984 >> {'loss': 0.2979, 'grad_norm': 13.171993255615234, 'learning_rate': 4.1679084214454405e-07, 'epoch': 0.39896935177651205, 'num_input_tokens_seen': 4762632192, 'completed': '61.59% (2_271 / 3_687)', 'remaining time': '12:37:57', 'throughput': '8573.97', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:35:47,890 >> {'loss': 0.4981, 'grad_norm': 19.698951721191406, 'learning_rate': 4.1640530007453245e-07, 'epoch': 0.3992405749932194, 'num_input_tokens_seen': 4764729344, 'completed': '61.62% (2_272 / 3_687)', 'remaining time': '12:37:24', 'throughput': '8481.81', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:36:19,170 >> {'loss': 0.3392, 'grad_norm': 14.195073127746582, 'learning_rate': 4.1601986551645163e-07, 'epoch': 0.39951179820992677, 'num_input_tokens_seen': 4766826496, 'completed': '61.65% (2_273 / 3_687)', 'remaining time': '12:36:51', 'throughput': '8380.56', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 
04:36:49,154 >> {'loss': 0.3579, 'grad_norm': 14.913055419921875, 'learning_rate': 4.156345387804847e-07, 'epoch': 0.39978302142663413, 'num_input_tokens_seen': 4768923648, 'completed': '61.68% (2_274 / 3_687)', 'remaining time': '12:36:17', 'throughput': '8742.84', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:37:19,187 >> {'loss': 0.3177, 'grad_norm': 14.513870239257812, 'learning_rate': 4.152493201767281e-07, 'epoch': 0.4000542446433415, 'num_input_tokens_seen': 4771020800, 'completed': '61.70% (2_275 / 3_687)', 'remaining time': '12:35:43', 'throughput': '8728.65', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:37:47,568 >> {'loss': 0.6674, 'grad_norm': 18.92793083190918, 'learning_rate': 4.1486421001519087e-07, 'epoch': 0.4003254678600488, 'num_input_tokens_seen': 4773117952, 'completed': '61.73% (2_276 / 3_687)', 'remaining time': '12:35:07', 'throughput': '9236.60', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:38:19,214 >> {'loss': 0.3269, 'grad_norm': 14.322976112365723, 'learning_rate': 4.1447920860579524e-07, 'epoch': 0.40059669107675616, 'num_input_tokens_seen': 4775215104, 'completed': '61.76% (2_277 / 3_687)', 'remaining time': '12:34:35', 'throughput': '8283.46', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:38:47,058 >> {'loss': 0.5806, 'grad_norm': 20.151758193969727, 'learning_rate': 4.1409431625837545e-07, 'epoch': 0.4008679142934635, 'num_input_tokens_seen': 4777312256, 'completed': '61.78% (2_278 / 3_687)', 'remaining time': '12:33:59', 'throughput': '9414.85', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:39:20,372 >> {'loss': 0.3868, 'grad_norm': 17.131452560424805, 'learning_rate': 4.137095332826784e-07, 'epoch': 0.4011391375101709, 'num_input_tokens_seen': 4779409408, 'completed': '61.81% (2_279 / 3_687)', 'remaining time': '12:33:28', 'throughput': '7869.01', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:39:52,018 >> {'loss': 0.4183, 'grad_norm': 
15.69462776184082, 'learning_rate': 4.1332485998836277e-07, 'epoch': 0.40141036072687825, 'num_input_tokens_seen': 4781506560, 'completed': '61.84% (2_280 / 3_687)', 'remaining time': '12:32:55', 'throughput': '8283.42', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:40:23,764 >> {'loss': 0.265, 'grad_norm': 13.978616714477539, 'learning_rate': 4.129402966849987e-07, 'epoch': 0.40168158394358555, 'num_input_tokens_seen': 4783603712, 'completed': '61.87% (2_281 / 3_687)', 'remaining time': '12:32:23', 'throughput': '8257.73', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:40:56,106 >> {'loss': 0.5206, 'grad_norm': 17.568675994873047, 'learning_rate': 4.1255584368206877e-07, 'epoch': 0.4019528071602929, 'num_input_tokens_seen': 4785700864, 'completed': '61.89% (2_282 / 3_687)', 'remaining time': '12:31:51', 'throughput': '8105.18', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:41:27,609 >> {'loss': 0.3761, 'grad_norm': 26.41534423828125, 'learning_rate': 4.121715012889655e-07, 'epoch': 0.4022240303770003, 'num_input_tokens_seen': 4787798016, 'completed': '61.92% (2_283 / 3_687)', 'remaining time': '12:31:18', 'throughput': '8321.28', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:41:57,080 >> {'loss': 0.3851, 'grad_norm': 14.635917663574219, 'learning_rate': 4.1178726981499313e-07, 'epoch': 0.40249525359370764, 'num_input_tokens_seen': 4789895168, 'completed': '61.95% (2_284 / 3_687)', 'remaining time': '12:30:43', 'throughput': '8895.11', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:42:27,397 >> {'loss': 0.6027, 'grad_norm': 18.9488582611084, 'learning_rate': 4.1140314956936663e-07, 'epoch': 0.402766476810415, 'num_input_tokens_seen': 4791992320, 'completed': '61.97% (2_285 / 3_687)', 'remaining time': '12:30:10', 'throughput': '8646.63', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:42:58,099 >> {'loss': 0.269, 'grad_norm': 17.327831268310547, 'learning_rate': 
4.1101914086121137e-07, 'epoch': 0.4030377000271223, 'num_input_tokens_seen': 4794089472, 'completed': '62.00% (2_286 / 3_687)', 'remaining time': '12:29:36', 'throughput': '8538.48', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:43:29,354 >> {'loss': 0.4427, 'grad_norm': 15.44780445098877, 'learning_rate': 4.106352439995632e-07, 'epoch': 0.40330892324382966, 'num_input_tokens_seen': 4796186624, 'completed': '62.03% (2_287 / 3_687)', 'remaining time': '12:29:03', 'throughput': '8387.27', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:44:00,585 >> {'loss': 0.3452, 'grad_norm': 21.893918991088867, 'learning_rate': 4.102514592933671e-07, 'epoch': 0.403580146460537, 'num_input_tokens_seen': 4798283776, 'completed': '62.06% (2_288 / 3_687)', 'remaining time': '12:28:30', 'throughput': '8393.56', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:44:29,832 >> {'loss': 0.4563, 'grad_norm': 15.465867042541504, 'learning_rate': 4.098677870514788e-07, 'epoch': 0.4038513696772444, 'num_input_tokens_seen': 4800380928, 'completed': '62.08% (2_289 / 3_687)', 'remaining time': '12:27:56', 'throughput': '8963.29', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:45:00,750 >> {'loss': 0.4338, 'grad_norm': 18.96814727783203, 'learning_rate': 4.094842275826631e-07, 'epoch': 0.40412259289395175, 'num_input_tokens_seen': 4802478080, 'completed': '62.11% (2_290 / 3_687)', 'remaining time': '12:27:22', 'throughput': '8478.67', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:45:29,952 >> {'loss': 0.4135, 'grad_norm': 18.4038143157959, 'learning_rate': 4.091007811955941e-07, 'epoch': 0.40439381611065905, 'num_input_tokens_seen': 4804575232, 'completed': '62.14% (2_291 / 3_687)', 'remaining time': '12:26:48', 'throughput': '8976.74', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:45:57,221 >> {'loss': 0.5425, 'grad_norm': 16.829374313354492, 'learning_rate': 4.0871744819885467e-07, 'epoch': 
0.4046650393273664, 'num_input_tokens_seen': 4806672384, 'completed': '62.16% (2_292 / 3_687)', 'remaining time': '12:26:11', 'throughput': '9613.51', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:46:28,722 >> {'loss': 0.5627, 'grad_norm': 16.508983612060547, 'learning_rate': 4.0833422890093684e-07, 'epoch': 0.4049362625440738, 'num_input_tokens_seen': 4808769536, 'completed': '62.19% (2_293 / 3_687)', 'remaining time': '12:25:38', 'throughput': '8321.73', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:47:01,343 >> {'loss': 0.5495, 'grad_norm': 19.47481918334961, 'learning_rate': 4.0795112361024075e-07, 'epoch': 0.40520748576078114, 'num_input_tokens_seen': 4810866688, 'completed': '62.22% (2_294 / 3_687)', 'remaining time': '12:25:07', 'throughput': '8035.99', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:47:32,672 >> {'loss': 0.3162, 'grad_norm': 12.910174369812012, 'learning_rate': 4.0756813263507496e-07, 'epoch': 0.4054787089774885, 'num_input_tokens_seen': 4812963840, 'completed': '62.25% (2_295 / 3_687)', 'remaining time': '12:24:34', 'throughput': '8367.49', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:48:05,325 >> {'loss': 0.3257, 'grad_norm': 15.603672981262207, 'learning_rate': 4.0718525628365617e-07, 'epoch': 0.4057499321941958, 'num_input_tokens_seen': 4815060992, 'completed': '62.27% (2_296 / 3_687)', 'remaining time': '12:24:02', 'throughput': '8028.22', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:48:36,149 >> {'loss': 0.4128, 'grad_norm': 17.36051368713379, 'learning_rate': 4.0680249486410845e-07, 'epoch': 0.40602115541090317, 'num_input_tokens_seen': 4817158144, 'completed': '62.30% (2_297 / 3_687)', 'remaining time': '12:23:29', 'throughput': '8504.36', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:49:07,695 >> {'loss': 0.5455, 'grad_norm': 20.466211318969727, 'learning_rate': 4.0641984868446386e-07, 'epoch': 0.40629237862761053, 
'num_input_tokens_seen': 4819255296, 'completed': '62.33% (2_298 / 3_687)', 'remaining time': '12:22:56', 'throughput': '8309.88', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:49:40,828 >> {'loss': 0.3718, 'grad_norm': 15.06020736694336, 'learning_rate': 4.0603731805266175e-07, 'epoch': 0.4065636018443179, 'num_input_tokens_seen': 4821352448, 'completed': '62.35% (2_299 / 3_687)', 'remaining time': '12:22:25', 'throughput': '7911.94', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:50:12,104 >> {'loss': 0.4691, 'grad_norm': 16.064971923828125, 'learning_rate': 4.056549032765476e-07, 'epoch': 0.40683482506102525, 'num_input_tokens_seen': 4823449600, 'completed': '62.38% (2_300 / 3_687)', 'remaining time': '12:21:52', 'throughput': '8381.64', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:50:41,960 >> {'loss': 0.7604, 'grad_norm': 24.106367111206055, 'learning_rate': 4.0527260466387446e-07, 'epoch': 0.40710604827773256, 'num_input_tokens_seen': 4825546752, 'completed': '62.41% (2_301 / 3_687)', 'remaining time': '12:21:18', 'throughput': '8780.50', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:51:13,042 >> {'loss': 0.4701, 'grad_norm': 17.96955680847168, 'learning_rate': 4.0489042252230197e-07, 'epoch': 0.4073772714944399, 'num_input_tokens_seen': 4827643904, 'completed': '62.44% (2_302 / 3_687)', 'remaining time': '12:20:45', 'throughput': '8433.75', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:51:46,346 >> {'loss': 0.36, 'grad_norm': 13.112820625305176, 'learning_rate': 4.045083571593955e-07, 'epoch': 0.4076484947111473, 'num_input_tokens_seen': 4829741056, 'completed': '62.46% (2_303 / 3_687)', 'remaining time': '12:20:14', 'throughput': '7871.17', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:52:14,767 >> {'loss': 0.7148, 'grad_norm': 19.882753372192383, 'learning_rate': 4.0412640888262697e-07, 'epoch': 0.40791971792785464, 'num_input_tokens_seen': 4831838208, 
'completed': '62.49% (2_304 / 3_687)', 'remaining time': '12:19:39', 'throughput': '9223.72', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:52:45,559 >> {'loss': 0.3871, 'grad_norm': 15.251856803894043, 'learning_rate': 4.0374457799937354e-07, 'epoch': 0.40819094114456195, 'num_input_tokens_seen': 4833935360, 'completed': '62.52% (2_305 / 3_687)', 'remaining time': '12:19:06', 'throughput': '8513.41', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:53:17,639 >> {'loss': 0.3569, 'grad_norm': 18.7488956451416, 'learning_rate': 4.033628648169184e-07, 'epoch': 0.4084621643612693, 'num_input_tokens_seen': 4836032512, 'completed': '62.54% (2_306 / 3_687)', 'remaining time': '12:18:33', 'throughput': '8171.66', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:53:47,086 >> {'loss': 0.561, 'grad_norm': 20.533279418945312, 'learning_rate': 4.0298126964244947e-07, 'epoch': 0.40873338757797667, 'num_input_tokens_seen': 4838129664, 'completed': '62.57% (2_307 / 3_687)', 'remaining time': '12:17:59', 'throughput': '8902.17', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:54:18,084 >> {'loss': 0.3199, 'grad_norm': 13.255075454711914, 'learning_rate': 4.025997927830604e-07, 'epoch': 0.40900461079468403, 'num_input_tokens_seen': 4840226816, 'completed': '62.60% (2_308 / 3_687)', 'remaining time': '12:17:26', 'throughput': '8456.75', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:54:48,352 >> {'loss': 0.3569, 'grad_norm': 18.75166130065918, 'learning_rate': 4.022184345457492e-07, 'epoch': 0.4092758340113914, 'num_input_tokens_seen': 4842323968, 'completed': '62.63% (2_309 / 3_687)', 'remaining time': '12:16:52', 'throughput': '8660.87', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:55:18,507 >> {'loss': 0.5254, 'grad_norm': 18.911184310913086, 'learning_rate': 4.0183719523741846e-07, 'epoch': 0.4095470572280987, 'num_input_tokens_seen': 4844421120, 'completed': '62.65% (2_310 / 3_687)', 
'remaining time': '12:16:18', 'throughput': '8693.07', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:55:47,519 >> {'loss': 0.4653, 'grad_norm': 18.690418243408203, 'learning_rate': 4.014560751648747e-07, 'epoch': 0.40981828044480606, 'num_input_tokens_seen': 4846518272, 'completed': '62.68% (2_311 / 3_687)', 'remaining time': '12:15:43', 'throughput': '9035.88', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:56:15,015 >> {'loss': 0.4713, 'grad_norm': 19.980985641479492, 'learning_rate': 4.0107507463482924e-07, 'epoch': 0.4100895036615134, 'num_input_tokens_seen': 4848615424, 'completed': '62.71% (2_312 / 3_687)', 'remaining time': '12:15:07', 'throughput': '9533.87', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:56:44,981 >> {'loss': 0.5677, 'grad_norm': 18.602523803710938, 'learning_rate': 4.0069419395389657e-07, 'epoch': 0.4103607268782208, 'num_input_tokens_seen': 4850712576, 'completed': '62.73% (2_313 / 3_687)', 'remaining time': '12:14:33', 'throughput': '8747.99', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:57:14,941 >> {'loss': 0.4518, 'grad_norm': 14.569324493408203, 'learning_rate': 4.0031343342859526e-07, 'epoch': 0.41063195009492814, 'num_input_tokens_seen': 4852809728, 'completed': '62.76% (2_314 / 3_687)', 'remaining time': '12:13:59', 'throughput': '8749.65', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:57:44,329 >> {'loss': 0.5966, 'grad_norm': 17.369930267333984, 'learning_rate': 3.9993279336534667e-07, 'epoch': 0.41090317331163545, 'num_input_tokens_seen': 4854906880, 'completed': '62.79% (2_315 / 3_687)', 'remaining time': '12:13:25', 'throughput': '8920.19', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:58:15,078 >> {'loss': 0.568, 'grad_norm': 16.838441848754883, 'learning_rate': 3.995522740704758e-07, 'epoch': 0.4111743965283428, 'num_input_tokens_seen': 4857004032, 'completed': '62.82% (2_316 / 3_687)', 'remaining time': '12:12:51', 'throughput': 
'8525.25', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:58:48,215 >> {'loss': 0.3684, 'grad_norm': 16.305984497070312, 'learning_rate': 3.991718758502094e-07, 'epoch': 0.4114456197450502, 'num_input_tokens_seen': 4859101184, 'completed': '62.84% (2_317 / 3_687)', 'remaining time': '12:12:20', 'throughput': '7911.07', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:59:18,343 >> {'loss': 0.5516, 'grad_norm': 27.06982421875, 'learning_rate': 3.987915990106779e-07, 'epoch': 0.41171684296175753, 'num_input_tokens_seen': 4861198336, 'completed': '62.87% (2_318 / 3_687)', 'remaining time': '12:11:46', 'throughput': '8700.85', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 04:59:48,355 >> {'loss': 0.417, 'grad_norm': 15.275781631469727, 'learning_rate': 3.984114438579137e-07, 'epoch': 0.4119880661784649, 'num_input_tokens_seen': 4863295488, 'completed': '62.90% (2_319 / 3_687)', 'remaining time': '12:11:13', 'throughput': '8734.69', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:00:20,275 >> {'loss': 0.3962, 'grad_norm': 14.07772159576416, 'learning_rate': 3.980314106978512e-07, 'epoch': 0.4122592893951722, 'num_input_tokens_seen': 4865392640, 'completed': '62.92% (2_320 / 3_687)', 'remaining time': '12:10:40', 'throughput': '8212.41', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:00:50,009 >> {'loss': 0.3945, 'grad_norm': 14.107834815979004, 'learning_rate': 3.976514998363265e-07, 'epoch': 0.41253051261187956, 'num_input_tokens_seen': 4867489792, 'completed': '62.95% (2_321 / 3_687)', 'remaining time': '12:10:06', 'throughput': '8816.52', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:01:20,819 >> {'loss': 0.3535, 'grad_norm': 15.831769943237305, 'learning_rate': 3.972717115790773e-07, 'epoch': 0.4128017358285869, 'num_input_tokens_seen': 4869586944, 'completed': '62.98% (2_322 / 3_687)', 'remaining time': '12:09:33', 'throughput': '8508.21', 'gpu_mem_free': '5581MB'} 
[INFO|trainer.py:175] 2024-12-26 05:01:50,939 >> {'loss': 0.4851, 'grad_norm': 16.84862518310547, 'learning_rate': 3.968920462317431e-07, 'epoch': 0.4130729590452943, 'num_input_tokens_seen': 4871684096, 'completed': '63.01% (2_323 / 3_687)', 'remaining time': '12:08:59', 'throughput': '8703.55', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:02:20,597 >> {'loss': 0.3473, 'grad_norm': 12.755271911621094, 'learning_rate': 3.965125040998637e-07, 'epoch': 0.41334418226200165, 'num_input_tokens_seen': 4873781248, 'completed': '63.03% (2_324 / 3_687)', 'remaining time': '12:08:25', 'throughput': '8838.89', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:02:48,657 >> {'loss': 0.4099, 'grad_norm': 15.923316955566406, 'learning_rate': 3.961330854888805e-07, 'epoch': 0.41361540547870895, 'num_input_tokens_seen': 4875878400, 'completed': '63.06% (2_325 / 3_687)', 'remaining time': '12:07:49', 'throughput': '9342.06', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:03:18,705 >> {'loss': 0.2539, 'grad_norm': 11.70615291595459, 'learning_rate': 3.9575379070413485e-07, 'epoch': 0.4138866286954163, 'num_input_tokens_seen': 4877975552, 'completed': '63.09% (2_326 / 3_687)', 'remaining time': '12:07:15', 'throughput': '8724.17', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:03:48,402 >> {'loss': 0.7379, 'grad_norm': 27.65726661682129, 'learning_rate': 3.953746200508693e-07, 'epoch': 0.4141578519121237, 'num_input_tokens_seen': 4880072704, 'completed': '63.11% (2_327 / 3_687)', 'remaining time': '12:06:41', 'throughput': '8827.35', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:04:17,427 >> {'loss': 0.7167, 'grad_norm': 20.938331604003906, 'learning_rate': 3.9499557383422534e-07, 'epoch': 0.41442907512883104, 'num_input_tokens_seen': 4882169856, 'completed': '63.14% (2_328 / 3_687)', 'remaining time': '12:06:06', 'throughput': '9031.79', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:04:46,610 
>> {'loss': 0.2987, 'grad_norm': 13.526237487792969, 'learning_rate': 3.94616652359245e-07, 'epoch': 0.4147002983455384, 'num_input_tokens_seen': 4884267008, 'completed': '63.17% (2_329 / 3_687)', 'remaining time': '12:05:32', 'throughput': '8982.72', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:05:15,985 >> {'loss': 0.2497, 'grad_norm': 10.629281997680664, 'learning_rate': 3.942378559308703e-07, 'epoch': 0.4149715215622457, 'num_input_tokens_seen': 4886364160, 'completed': '63.20% (2_330 / 3_687)', 'remaining time': '12:04:57', 'throughput': '8923.85', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:05:48,585 >> {'loss': 0.3717, 'grad_norm': 15.359818458557129, 'learning_rate': 3.93859184853942e-07, 'epoch': 0.41524274477895307, 'num_input_tokens_seen': 4888461312, 'completed': '63.22% (2_331 / 3_687)', 'remaining time': '12:04:26', 'throughput': '8041.35', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:06:18,725 >> {'loss': 0.3598, 'grad_norm': 13.921889305114746, 'learning_rate': 3.934806394332001e-07, 'epoch': 0.4155139679956604, 'num_input_tokens_seen': 4890558464, 'completed': '63.25% (2_332 / 3_687)', 'remaining time': '12:03:52', 'throughput': '8697.44', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:06:48,906 >> {'loss': 0.5743, 'grad_norm': 21.2630558013916, 'learning_rate': 3.9310221997328363e-07, 'epoch': 0.4157851912123678, 'num_input_tokens_seen': 4892655616, 'completed': '63.28% (2_333 / 3_687)', 'remaining time': '12:03:18', 'throughput': '8685.83', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:07:18,981 >> {'loss': 0.6111, 'grad_norm': 20.155832290649414, 'learning_rate': 3.9272392677873e-07, 'epoch': 0.41605641442907515, 'num_input_tokens_seen': 4894752768, 'completed': '63.30% (2_334 / 3_687)', 'remaining time': '12:02:45', 'throughput': '8716.27', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:07:49,033 >> {'loss': 0.8151, 'grad_norm': 24.92179298400879, 
'learning_rate': 3.9234576015397536e-07, 'epoch': 0.41632763764578246, 'num_input_tokens_seen': 4896849920, 'completed': '63.33% (2_335 / 3_687)', 'remaining time': '12:02:11', 'throughput': '8723.14', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:08:18,425 >> {'loss': 0.325, 'grad_norm': 14.339614868164062, 'learning_rate': 3.9196772040335367e-07, 'epoch': 0.4165988608624898, 'num_input_tokens_seen': 4898947072, 'completed': '63.36% (2_336 / 3_687)', 'remaining time': '12:01:36', 'throughput': '8918.82', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:08:48,056 >> {'loss': 0.6659, 'grad_norm': 24.353408813476562, 'learning_rate': 3.915898078310972e-07, 'epoch': 0.4168700840791972, 'num_input_tokens_seen': 4901044224, 'completed': '63.38% (2_337 / 3_687)', 'remaining time': '12:01:02', 'throughput': '8846.89', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:09:17,905 >> {'loss': 0.5632, 'grad_norm': 20.132877349853516, 'learning_rate': 3.9121202274133525e-07, 'epoch': 0.41714130729590454, 'num_input_tokens_seen': 4903141376, 'completed': '63.41% (2_338 / 3_687)', 'remaining time': '12:00:28', 'throughput': '8782.52', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:09:47,896 >> {'loss': 0.6138, 'grad_norm': 17.58623695373535, 'learning_rate': 3.9083436543809536e-07, 'epoch': 0.4174125305126119, 'num_input_tokens_seen': 4905238528, 'completed': '63.44% (2_339 / 3_687)', 'remaining time': '11:59:54', 'throughput': '8740.64', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:10:17,267 >> {'loss': 0.3483, 'grad_norm': 14.02330493927002, 'learning_rate': 3.904568362253011e-07, 'epoch': 0.4176837537293192, 'num_input_tokens_seen': 4907335680, 'completed': '63.47% (2_340 / 3_687)', 'remaining time': '11:59:20', 'throughput': '8925.33', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:10:46,695 >> {'loss': 0.4415, 'grad_norm': 15.86801528930664, 'learning_rate': 3.9007943540677426e-07, 
'epoch': 0.41795497694602657, 'num_input_tokens_seen': 4909432832, 'completed': '63.49% (2_341 / 3_687)', 'remaining time': '11:58:46', 'throughput': '8908.00', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:11:17,362 >> {'loss': 0.3732, 'grad_norm': 14.43091869354248, 'learning_rate': 3.897021632862321e-07, 'epoch': 0.41822620016273393, 'num_input_tokens_seen': 4911529984, 'completed': '63.52% (2_342 / 3_687)', 'remaining time': '11:58:12', 'throughput': '8548.15', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:11:47,558 >> {'loss': 0.4415, 'grad_norm': 15.3654203414917, 'learning_rate': 3.893250201672893e-07, 'epoch': 0.4184974233794413, 'num_input_tokens_seen': 4913627136, 'completed': '63.55% (2_343 / 3_687)', 'remaining time': '11:57:39', 'throughput': '8681.16', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:12:20,789 >> {'loss': 0.4123, 'grad_norm': 15.816032409667969, 'learning_rate': 3.889480063534563e-07, 'epoch': 0.41876864659614865, 'num_input_tokens_seen': 4915724288, 'completed': '63.57% (2_344 / 3_687)', 'remaining time': '11:57:08', 'throughput': '7888.76', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:12:53,423 >> {'loss': 0.2871, 'grad_norm': 12.658620834350586, 'learning_rate': 3.88571122148139e-07, 'epoch': 0.41903986981285596, 'num_input_tokens_seen': 4917821440, 'completed': '63.60% (2_345 / 3_687)', 'remaining time': '11:56:36', 'throughput': '8032.75', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:13:25,600 >> {'loss': 0.4812, 'grad_norm': 14.271703720092773, 'learning_rate': 3.881943678546399e-07, 'epoch': 0.4193110930295633, 'num_input_tokens_seen': 4919918592, 'completed': '63.63% (2_346 / 3_687)', 'remaining time': '11:56:04', 'throughput': '8146.94', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:13:53,513 >> {'loss': 0.9452, 'grad_norm': 25.85860824584961, 'learning_rate': 3.878177437761564e-07, 'epoch': 0.4195823162462707, 
'num_input_tokens_seen': 4922015744, 'completed': '63.66% (2_347 / 3_687)', 'remaining time': '11:55:29', 'throughput': '9391.30', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:14:20,384 >> {'loss': 0.47, 'grad_norm': 17.978992462158203, 'learning_rate': 3.8744125021578123e-07, 'epoch': 0.41985353946297804, 'num_input_tokens_seen': 4924112896, 'completed': '63.68% (2_348 / 3_687)', 'remaining time': '11:54:52', 'throughput': '9755.72', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:14:52,141 >> {'loss': 0.6245, 'grad_norm': 21.466154098510742, 'learning_rate': 3.87064887476502e-07, 'epoch': 0.4201247626796854, 'num_input_tokens_seen': 4926210048, 'completed': '63.71% (2_349 / 3_687)', 'remaining time': '11:54:20', 'throughput': '8254.79', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:15:22,943 >> {'loss': 0.3928, 'grad_norm': 17.588224411010742, 'learning_rate': 3.8668865586120124e-07, 'epoch': 0.4203959858963927, 'num_input_tokens_seen': 4928307200, 'completed': '63.74% (2_350 / 3_687)', 'remaining time': '11:53:47', 'throughput': '8510.42', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:15:54,613 >> {'loss': 0.5021, 'grad_norm': 19.852426528930664, 'learning_rate': 3.8631255567265573e-07, 'epoch': 0.42066720911310007, 'num_input_tokens_seen': 4930404352, 'completed': '63.76% (2_351 / 3_687)', 'remaining time': '11:53:15', 'throughput': '8277.52', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:16:25,633 >> {'loss': 0.3786, 'grad_norm': 17.130460739135742, 'learning_rate': 3.859365872135367e-07, 'epoch': 0.42093843232980743, 'num_input_tokens_seen': 4932501504, 'completed': '63.79% (2_352 / 3_687)', 'remaining time': '11:52:42', 'throughput': '8450.90', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:16:56,279 >> {'loss': 0.4072, 'grad_norm': 15.021489143371582, 'learning_rate': 3.8556075078640925e-07, 'epoch': 0.4212096555465148, 'num_input_tokens_seen': 4934598656, 
'completed': '63.82% (2_353 / 3_687)', 'remaining time': '11:52:08', 'throughput': '8553.82', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:17:25,169 >> {'loss': 0.5125, 'grad_norm': 18.32381820678711, 'learning_rate': 3.85185046693732e-07, 'epoch': 0.42148087876322216, 'num_input_tokens_seen': 4936695808, 'completed': '63.85% (2_354 / 3_687)', 'remaining time': '11:51:34', 'throughput': '9073.89', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:17:55,054 >> {'loss': 0.4148, 'grad_norm': 14.185894966125488, 'learning_rate': 3.8480947523785767e-07, 'epoch': 0.42175210197992946, 'num_input_tokens_seen': 4938792960, 'completed': '63.87% (2_355 / 3_687)', 'remaining time': '11:51:00', 'throughput': '8771.70', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:18:25,535 >> {'loss': 0.497, 'grad_norm': 17.715118408203125, 'learning_rate': 3.844340367210318e-07, 'epoch': 0.4220233251966368, 'num_input_tokens_seen': 4940890112, 'completed': '63.90% (2_356 / 3_687)', 'remaining time': '11:50:26', 'throughput': '8600.37', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:18:58,080 >> {'loss': 0.3361, 'grad_norm': 13.911568641662598, 'learning_rate': 3.840587314453928e-07, 'epoch': 0.4222945484133442, 'num_input_tokens_seen': 4942987264, 'completed': '63.93% (2_357 / 3_687)', 'remaining time': '11:49:55', 'throughput': '8054.74', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:19:29,956 >> {'loss': 0.3143, 'grad_norm': 16.71425437927246, 'learning_rate': 3.8368355971297204e-07, 'epoch': 0.42256577163005155, 'num_input_tokens_seen': 4945084416, 'completed': '63.95% (2_358 / 3_687)', 'remaining time': '11:49:23', 'throughput': '8223.98', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:20:02,277 >> {'loss': 0.2808, 'grad_norm': 12.690278053283691, 'learning_rate': 3.8330852182569374e-07, 'epoch': 0.4228369948467589, 'num_input_tokens_seen': 4947181568, 'completed': '63.98% (2_359 / 3_687)', 
'remaining time': '11:48:51', 'throughput': '8110.61', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:20:31,274 >> {'loss': 0.532, 'grad_norm': 17.643646240234375, 'learning_rate': 3.8293361808537404e-07, 'epoch': 0.4231082180634662, 'num_input_tokens_seen': 4949278720, 'completed': '64.01% (2_360 / 3_687)', 'remaining time': '11:48:16', 'throughput': '9040.24', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:21:00,195 >> {'loss': 0.5061, 'grad_norm': 19.790626525878906, 'learning_rate': 3.825588487937211e-07, 'epoch': 0.4233794412801736, 'num_input_tokens_seen': 4951375872, 'completed': '64.04% (2_361 / 3_687)', 'remaining time': '11:47:42', 'throughput': '9064.19', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:21:29,983 >> {'loss': 0.2747, 'grad_norm': 17.743762969970703, 'learning_rate': 3.821842142523352e-07, 'epoch': 0.42365066449688094, 'num_input_tokens_seen': 4953473024, 'completed': '64.06% (2_362 / 3_687)', 'remaining time': '11:47:08', 'throughput': '8800.27', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:22:01,516 >> {'loss': 0.4853, 'grad_norm': 18.460596084594727, 'learning_rate': 3.818097147627076e-07, 'epoch': 0.4239218877135883, 'num_input_tokens_seen': 4955570176, 'completed': '64.09% (2_363 / 3_687)', 'remaining time': '11:46:35', 'throughput': '8313.27', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:22:31,524 >> {'loss': 0.563, 'grad_norm': 20.445358276367188, 'learning_rate': 3.8143535062622154e-07, 'epoch': 0.42419311093029566, 'num_input_tokens_seen': 4957667328, 'completed': '64.12% (2_364 / 3_687)', 'remaining time': '11:46:02', 'throughput': '8735.92', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:23:01,675 >> {'loss': 0.6022, 'grad_norm': 21.970741271972656, 'learning_rate': 3.8106112214415087e-07, 'epoch': 0.42446433414700296, 'num_input_tokens_seen': 4959764480, 'completed': '64.14% (2_365 / 3_687)', 'remaining time': '11:45:28', 'throughput': 
'8694.22', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:23:32,295 >> {'loss': 0.4444, 'grad_norm': 15.904183387756348, 'learning_rate': 3.8068702961766053e-07, 'epoch': 0.4247355573637103, 'num_input_tokens_seen': 4961861632, 'completed': '64.17% (2_366 / 3_687)', 'remaining time': '11:44:55', 'throughput': '8561.29', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:24:03,108 >> {'loss': 0.1744, 'grad_norm': 9.053658485412598, 'learning_rate': 3.8031307334780594e-07, 'epoch': 0.4250067805804177, 'num_input_tokens_seen': 4963958784, 'completed': '64.20% (2_367 / 3_687)', 'remaining time': '11:44:22', 'throughput': '8507.66', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:24:32,903 >> {'loss': 0.3963, 'grad_norm': 14.599189758300781, 'learning_rate': 3.7993925363553294e-07, 'epoch': 0.42527800379712505, 'num_input_tokens_seen': 4966055936, 'completed': '64.23% (2_368 / 3_687)', 'remaining time': '11:43:48', 'throughput': '8798.35', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:25:05,423 >> {'loss': 0.5241, 'grad_norm': 18.069921493530273, 'learning_rate': 3.795655707816772e-07, 'epoch': 0.4255492270138324, 'num_input_tokens_seen': 4968153088, 'completed': '64.25% (2_369 / 3_687)', 'remaining time': '11:43:16', 'throughput': '8060.82', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:25:35,150 >> {'loss': 0.515, 'grad_norm': 19.06882095336914, 'learning_rate': 3.791920250869645e-07, 'epoch': 0.4258204502305397, 'num_input_tokens_seen': 4970250240, 'completed': '64.28% (2_370 / 3_687)', 'remaining time': '11:42:42', 'throughput': '8818.34', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:26:05,006 >> {'loss': 0.94, 'grad_norm': 26.823545455932617, 'learning_rate': 3.7881861685201033e-07, 'epoch': 0.4260916734472471, 'num_input_tokens_seen': 4972347392, 'completed': '64.31% (2_371 / 3_687)', 'remaining time': '11:42:09', 'throughput': '8780.42', 'gpu_mem_free': '5581MB'} 
[INFO|trainer.py:175] 2024-12-26 05:26:33,888 >> {'loss': 0.2774, 'grad_norm': 12.676380157470703, 'learning_rate': 3.784453463773194e-07, 'epoch': 0.42636289666395444, 'num_input_tokens_seen': 4974444544, 'completed': '64.33% (2_372 / 3_687)', 'remaining time': '11:41:34', 'throughput': '9076.35', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:27:04,174 >> {'loss': 0.5369, 'grad_norm': 19.58039665222168, 'learning_rate': 3.780722139632858e-07, 'epoch': 0.4266341198806618, 'num_input_tokens_seen': 4976541696, 'completed': '64.36% (2_373 / 3_687)', 'remaining time': '11:41:00', 'throughput': '8655.76', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:27:32,575 >> {'loss': 0.7374, 'grad_norm': 23.31465721130371, 'learning_rate': 3.776992199101918e-07, 'epoch': 0.42690534309736916, 'num_input_tokens_seen': 4978638848, 'completed': '64.39% (2_374 / 3_687)', 'remaining time': '11:40:25', 'throughput': '9229.93', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:28:02,956 >> {'loss': 0.5331, 'grad_norm': 18.41512107849121, 'learning_rate': 3.773263645182091e-07, 'epoch': 0.42717656631407647, 'num_input_tokens_seen': 4980736000, 'completed': '64.42% (2_375 / 3_687)', 'remaining time': '11:39:52', 'throughput': '8628.48', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:28:34,765 >> {'loss': 0.3192, 'grad_norm': 15.421567916870117, 'learning_rate': 3.769536480873976e-07, 'epoch': 0.42744778953078383, 'num_input_tokens_seen': 4982833152, 'completed': '64.44% (2_376 / 3_687)', 'remaining time': '11:39:20', 'throughput': '8241.33', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:29:03,527 >> {'loss': 0.6033, 'grad_norm': 19.701927185058594, 'learning_rate': 3.765810709177052e-07, 'epoch': 0.4277190127474912, 'num_input_tokens_seen': 4984930304, 'completed': '64.47% (2_377 / 3_687)', 'remaining time': '11:38:45', 'throughput': '9114.17', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:29:32,064 >> 
{'loss': 0.6628, 'grad_norm': 20.730276107788086, 'learning_rate': 3.762086333089678e-07, 'epoch': 0.42799023596419855, 'num_input_tokens_seen': 4987027456, 'completed': '64.50% (2_378 / 3_687)', 'remaining time': '11:38:10', 'throughput': '9185.94', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:30:00,987 >> {'loss': 0.4378, 'grad_norm': 17.648303985595703, 'learning_rate': 3.758363355609092e-07, 'epoch': 0.4282614591809059, 'num_input_tokens_seen': 4989124608, 'completed': '64.52% (2_379 / 3_687)', 'remaining time': '11:37:36', 'throughput': '9063.65', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:30:29,919 >> {'loss': 0.3577, 'grad_norm': 14.124812126159668, 'learning_rate': 3.7546417797314023e-07, 'epoch': 0.4285326823976132, 'num_input_tokens_seen': 4991221760, 'completed': '64.55% (2_380 / 3_687)', 'remaining time': '11:37:01', 'throughput': '9060.72', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:30:57,834 >> {'loss': 0.4176, 'grad_norm': 16.618432998657227, 'learning_rate': 3.7509216084515916e-07, 'epoch': 0.4288039056143206, 'num_input_tokens_seen': 4993318912, 'completed': '64.58% (2_381 / 3_687)', 'remaining time': '11:36:26', 'throughput': '9390.74', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:31:27,067 >> {'loss': 0.553, 'grad_norm': 18.444599151611328, 'learning_rate': 3.747202844763514e-07, 'epoch': 0.42907512883102794, 'num_input_tokens_seen': 4995416064, 'completed': '64.61% (2_382 / 3_687)', 'remaining time': '11:35:52', 'throughput': '8967.44', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:31:56,539 >> {'loss': 0.6629, 'grad_norm': 21.762187957763672, 'learning_rate': 3.743485491659887e-07, 'epoch': 0.4293463520477353, 'num_input_tokens_seen': 4997513216, 'completed': '64.63% (2_383 / 3_687)', 'remaining time': '11:35:18', 'throughput': '8894.69', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:32:26,743 >> {'loss': 0.4474, 'grad_norm': 
16.13730239868164, 'learning_rate': 3.739769552132298e-07, 'epoch': 0.42961757526444266, 'num_input_tokens_seen': 4999610368, 'completed': '64.66% (2_384 / 3_687)', 'remaining time': '11:34:44', 'throughput': '8679.24', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:32:59,749 >> {'loss': 0.2769, 'grad_norm': 12.197819709777832, 'learning_rate': 3.73605502917119e-07, 'epoch': 0.42988879848114997, 'num_input_tokens_seen': 5001707520, 'completed': '64.69% (2_385 / 3_687)', 'remaining time': '11:34:13', 'throughput': '7942.24', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:33:32,168 >> {'loss': 0.5748, 'grad_norm': 16.63471031188965, 'learning_rate': 3.7323419257658716e-07, 'epoch': 0.43016002169785733, 'num_input_tokens_seen': 5003804672, 'completed': '64.71% (2_386 / 3_687)', 'remaining time': '11:33:41', 'throughput': '8086.04', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:34:02,012 >> {'loss': 0.3293, 'grad_norm': 13.66933822631836, 'learning_rate': 3.728630244904507e-07, 'epoch': 0.4304312449145647, 'num_input_tokens_seen': 5005901824, 'completed': '64.74% (2_387 / 3_687)', 'remaining time': '11:33:08', 'throughput': '8783.95', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:34:32,970 >> {'loss': 0.4252, 'grad_norm': 15.115860939025879, 'learning_rate': 3.724919989574116e-07, 'epoch': 0.43070246813127205, 'num_input_tokens_seen': 5007998976, 'completed': '64.77% (2_388 / 3_687)', 'remaining time': '11:32:35', 'throughput': '8467.63', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:35:01,103 >> {'loss': 0.7401, 'grad_norm': 21.96157455444336, 'learning_rate': 3.7212111627605704e-07, 'epoch': 0.43097369134797936, 'num_input_tokens_seen': 5010096128, 'completed': '64.80% (2_389 / 3_687)', 'remaining time': '11:32:00', 'throughput': '9317.89', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:35:32,660 >> {'loss': 0.4966, 'grad_norm': 17.18273162841797, 'learning_rate': 
3.717503767448593e-07, 'epoch': 0.4312449145646867, 'num_input_tokens_seen': 5012193280, 'completed': '64.82% (2_390 / 3_687)', 'remaining time': '11:31:27', 'throughput': '8307.12', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:36:02,650 >> {'loss': 0.4974, 'grad_norm': 23.42146110534668, 'learning_rate': 3.7137978066217555e-07, 'epoch': 0.4315161377813941, 'num_input_tokens_seen': 5014290432, 'completed': '64.85% (2_391 / 3_687)', 'remaining time': '11:30:54', 'throughput': '8741.10', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:36:32,995 >> {'loss': 0.7683, 'grad_norm': 21.250967025756836, 'learning_rate': 3.710093283262472e-07, 'epoch': 0.43178736099810144, 'num_input_tokens_seen': 5016387584, 'completed': '64.88% (2_392 / 3_687)', 'remaining time': '11:30:20', 'throughput': '8638.68', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:37:01,832 >> {'loss': 0.347, 'grad_norm': 14.832980155944824, 'learning_rate': 3.706390200352003e-07, 'epoch': 0.4320585842148088, 'num_input_tokens_seen': 5018484736, 'completed': '64.90% (2_393 / 3_687)', 'remaining time': '11:29:46', 'throughput': '9090.47', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:37:32,462 >> {'loss': 0.4751, 'grad_norm': 20.22023582458496, 'learning_rate': 3.7026885608704494e-07, 'epoch': 0.4323298074315161, 'num_input_tokens_seen': 5020581888, 'completed': '64.93% (2_394 / 3_687)', 'remaining time': '11:29:13', 'throughput': '8558.65', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:38:04,751 >> {'loss': 0.3332, 'grad_norm': 15.793986320495605, 'learning_rate': 3.6989883677967483e-07, 'epoch': 0.4326010306482235, 'num_input_tokens_seen': 5022679040, 'completed': '64.96% (2_395 / 3_687)', 'remaining time': '11:28:41', 'throughput': '8118.62', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:38:37,920 >> {'loss': 0.4048, 'grad_norm': 15.396735191345215, 'learning_rate': 3.6952896241086783e-07, 'epoch': 
0.43287225386493083, 'num_input_tokens_seen': 5024776192, 'completed': '64.99% (2_396 / 3_687)', 'remaining time': '11:28:10', 'throughput': '7903.32', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:39:09,955 >> {'loss': 0.3603, 'grad_norm': 16.177085876464844, 'learning_rate': 3.6915923327828423e-07, 'epoch': 0.4331434770816382, 'num_input_tokens_seen': 5026873344, 'completed': '65.01% (2_397 / 3_687)', 'remaining time': '11:27:38', 'throughput': '8182.91', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:39:42,173 >> {'loss': 0.3433, 'grad_norm': 12.686070442199707, 'learning_rate': 3.6878964967946813e-07, 'epoch': 0.43341470029834556, 'num_input_tokens_seen': 5028970496, 'completed': '65.04% (2_398 / 3_687)', 'remaining time': '11:27:06', 'throughput': '8136.81', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:40:12,990 >> {'loss': 0.4353, 'grad_norm': 16.852243423461914, 'learning_rate': 3.6842021191184636e-07, 'epoch': 0.43368592351505286, 'num_input_tokens_seen': 5031067648, 'completed': '65.07% (2_399 / 3_687)', 'remaining time': '11:26:33', 'throughput': '8506.29', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:40:42,545 >> {'loss': 0.8005, 'grad_norm': 25.399932861328125, 'learning_rate': 3.6805092027272853e-07, 'epoch': 0.4339571467317602, 'num_input_tokens_seen': 5033164800, 'completed': '65.09% (2_400 / 3_687)', 'remaining time': '11:25:59', 'throughput': '8869.71', 'gpu_mem_free': '5581MB'} /scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:689: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. 
API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . warnings.warn(
[INFO|trainer.py:3503] 2024-12-26 05:41:05,933 >> Saving model checkpoint to /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-2400
[INFO|configuration_utils.py:472] 2024-12-26 05:41:05,936 >> Configuration saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-2400/config.json
[INFO|configuration_utils.py:807] 2024-12-26 05:41:05,938 >> Configuration saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-2400/generation_config.json
[INFO|modeling_utils.py:2807] 2024-12-26 05:42:04,546 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 7 checkpoint shards. You can find where each parameters has been saved in the index located at /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-2400/model.safetensors.index.json.
[INFO|tokenization_utils_base.py:2684] 2024-12-26 05:42:04,550 >> tokenizer config file saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-2400/tokenizer_config.json
[INFO|tokenization_utils_base.py:2693] 2024-12-26 05:42:04,550 >> Special tokens file saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-2400/special_tokens_map.json
[WARNING|trainer.py:868] 2024-12-26 05:45:49,977 >> Save streaming dataset state: {'epoch': 0, 'sample_in_epoch': 4800, 'num_canonical_nodes': 1, 'shuffle_seed': 42, 'initial_physical_nodes': 1}
/scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/torch/utils/checkpoint.py:1399: FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead. with device_autocast_ctx, torch.cpu.amp.autocast(**cpu_autocast_kwargs), recompute_context: # type: ignore[attr-defined]
[INFO|trainer.py:175] 2024-12-26 05:46:23,281 >> {'loss': 0.4201, 'grad_norm': 16.505773544311523, 'learning_rate': 3.6768177505930655e-07, 'epoch': 0.4342283699484676, 'num_input_tokens_seen': 5035261952, 'completed': '65.12% (2_401 / 3_687)', 'remaining time': '11:29:35', 'throughput': '769.35', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 05:46:55,393 >> {'loss': 0.5035, 'grad_norm': 18.550128936767578, 'learning_rate': 3.67312776568654e-07, 'epoch': 0.43449959316517495, 'num_input_tokens_seen': 5037359104, 'completed': '65.15% (2_402 / 3_687)', 'remaining time': '11:29:03', 'throughput': '8163.39', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 05:47:23,729 >> {'loss': 0.6078, 'grad_norm': 19.673357009887695, 'learning_rate': 3.669439250977272e-07, 'epoch': 0.4347708163818823, 'num_input_tokens_seen': 5039456256, 'completed': '65.17% (2_403 / 3_687)', 'remaining time': '11:28:28', 'throughput': '9251.23', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 05:47:52,824 >> {'loss': 0.5501, 'grad_norm': 17.886112213134766, 'learning_rate':
3.665752209433637e-07, 'epoch': 0.4350420395985896, 'num_input_tokens_seen': 5041553408, 'completed': '65.20% (2_404 / 3_687)', 'remaining time': '11:27:53', 'throughput': '9009.93', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:48:23,067 >> {'loss': 0.338, 'grad_norm': 13.1587495803833, 'learning_rate': 3.6620666440228254e-07, 'epoch': 0.435313262815297, 'num_input_tokens_seen': 5043650560, 'completed': '65.23% (2_405 / 3_687)', 'remaining time': '11:27:20', 'throughput': '8667.84', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:48:52,577 >> {'loss': 0.2726, 'grad_norm': 16.396703720092773, 'learning_rate': 3.6583825577108397e-07, 'epoch': 0.43558448603200434, 'num_input_tokens_seen': 5045747712, 'completed': '65.26% (2_406 / 3_687)', 'remaining time': '11:26:45', 'throughput': '8883.36', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:49:22,582 >> {'loss': 0.5378, 'grad_norm': 24.322254180908203, 'learning_rate': 3.654699953462494e-07, 'epoch': 0.4358557092487117, 'num_input_tokens_seen': 5047844864, 'completed': '65.28% (2_407 / 3_687)', 'remaining time': '11:26:11', 'throughput': '8736.65', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:49:52,389 >> {'loss': 0.333, 'grad_norm': 13.831670761108398, 'learning_rate': 3.651018834241406e-07, 'epoch': 0.43612693246541906, 'num_input_tokens_seen': 5049942016, 'completed': '65.31% (2_408 / 3_687)', 'remaining time': '11:25:37', 'throughput': '8794.70', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:50:26,276 >> {'loss': 0.3657, 'grad_norm': 13.609641075134277, 'learning_rate': 3.6473392030100014e-07, 'epoch': 0.43639815568212637, 'num_input_tokens_seen': 5052039168, 'completed': '65.34% (2_409 / 3_687)', 'remaining time': '11:25:06', 'throughput': '7735.88', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:50:57,757 >> {'loss': 0.4042, 'grad_norm': 15.657468795776367, 'learning_rate': 3.6436610627295074e-07, 'epoch': 
0.4366693788988337, 'num_input_tokens_seen': 5054136320, 'completed': '65.36% (2_410 / 3_687)', 'remaining time': '11:24:34', 'throughput': '8326.93', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:51:25,847 >> {'loss': 0.6303, 'grad_norm': 18.23346519470215, 'learning_rate': 3.639984416359949e-07, 'epoch': 0.4369406021155411, 'num_input_tokens_seen': 5056233472, 'completed': '65.39% (2_411 / 3_687)', 'remaining time': '11:23:58', 'throughput': '9332.39', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:51:56,887 >> {'loss': 0.2942, 'grad_norm': 10.519538879394531, 'learning_rate': 3.636309266860156e-07, 'epoch': 0.43721182533224845, 'num_input_tokens_seen': 5058330624, 'completed': '65.42% (2_412 / 3_687)', 'remaining time': '11:23:25', 'throughput': '8445.36', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:52:26,795 >> {'loss': 0.5607, 'grad_norm': 17.80950927734375, 'learning_rate': 3.6326356171877483e-07, 'epoch': 0.4374830485489558, 'num_input_tokens_seen': 5060427776, 'completed': '65.45% (2_413 / 3_687)', 'remaining time': '11:22:51', 'throughput': '8765.03', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:52:59,400 >> {'loss': 0.348, 'grad_norm': 17.949668884277344, 'learning_rate': 3.6289634702991343e-07, 'epoch': 0.4377542717656631, 'num_input_tokens_seen': 5062524928, 'completed': '65.47% (2_414 / 3_687)', 'remaining time': '11:22:20', 'throughput': '8039.97', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:53:31,454 >> {'loss': 0.2518, 'grad_norm': 12.619994163513184, 'learning_rate': 3.625292829149521e-07, 'epoch': 0.4380254949823705, 'num_input_tokens_seen': 5064622080, 'completed': '65.50% (2_415 / 3_687)', 'remaining time': '11:21:47', 'throughput': '8178.23', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:54:01,385 >> {'loss': 0.785, 'grad_norm': 23.58609962463379, 'learning_rate': 3.6216236966929015e-07, 'epoch': 0.43829671819907784, 'num_input_tokens_seen': 
5066719232, 'completed': '65.53% (2_416 / 3_687)', 'remaining time': '11:21:13', 'throughput': '8758.22', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:54:32,484 >> {'loss': 0.357, 'grad_norm': 17.113204956054688, 'learning_rate': 3.6179560758820527e-07, 'epoch': 0.4385679414157852, 'num_input_tokens_seen': 5068816384, 'completed': '65.55% (2_417 / 3_687)', 'remaining time': '11:20:40', 'throughput': '8429.37', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:55:02,465 >> {'loss': 0.6071, 'grad_norm': 29.91645622253418, 'learning_rate': 3.61428996966854e-07, 'epoch': 0.43883916463249256, 'num_input_tokens_seen': 5070913536, 'completed': '65.58% (2_418 / 3_687)', 'remaining time': '11:20:07', 'throughput': '8743.66', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:55:37,881 >> {'loss': 0.3174, 'grad_norm': 16.93893814086914, 'learning_rate': 3.610625381002701e-07, 'epoch': 0.43911038784919987, 'num_input_tokens_seen': 5073010688, 'completed': '65.61% (2_419 / 3_687)', 'remaining time': '11:19:37', 'throughput': '7401.88', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:56:08,360 >> {'loss': 0.5038, 'grad_norm': 17.732091903686523, 'learning_rate': 3.606962312833659e-07, 'epoch': 0.43938161106590723, 'num_input_tokens_seen': 5075107840, 'completed': '65.64% (2_420 / 3_687)', 'remaining time': '11:19:04', 'throughput': '8600.81', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:56:40,487 >> {'loss': 0.3963, 'grad_norm': 16.852190017700195, 'learning_rate': 3.603300768109311e-07, 'epoch': 0.4396528342826146, 'num_input_tokens_seen': 5077204992, 'completed': '65.66% (2_421 / 3_687)', 'remaining time': '11:18:31', 'throughput': '8159.55', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:57:11,479 >> {'loss': 0.67, 'grad_norm': 22.718647003173828, 'learning_rate': 3.5996407497763305e-07, 'epoch': 0.43992405749932195, 'num_input_tokens_seen': 5079302144, 'completed': '65.69% (2_422 / 
3_687)', 'remaining time': '11:17:58', 'throughput': '8458.34', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:57:42,872 >> {'loss': 0.3855, 'grad_norm': 14.987257957458496, 'learning_rate': 3.5959822607801617e-07, 'epoch': 0.4401952807160293, 'num_input_tokens_seen': 5081399296, 'completed': '65.72% (2_423 / 3_687)', 'remaining time': '11:17:26', 'throughput': '8350.48', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:58:13,980 >> {'loss': 0.5356, 'grad_norm': 19.861112594604492, 'learning_rate': 3.592325304065018e-07, 'epoch': 0.4404665039327366, 'num_input_tokens_seen': 5083496448, 'completed': '65.74% (2_424 / 3_687)', 'remaining time': '11:16:53', 'throughput': '8426.93', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:58:43,550 >> {'loss': 0.4618, 'grad_norm': 17.07244873046875, 'learning_rate': 3.588669882573875e-07, 'epoch': 0.440737727149444, 'num_input_tokens_seen': 5085593600, 'completed': '65.77% (2_425 / 3_687)', 'remaining time': '11:16:18', 'throughput': '8865.13', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:59:12,283 >> {'loss': 0.9808, 'grad_norm': 27.226409912109375, 'learning_rate': 3.5850159992484787e-07, 'epoch': 0.44100895036615134, 'num_input_tokens_seen': 5087690752, 'completed': '65.80% (2_426 / 3_687)', 'remaining time': '11:15:44', 'throughput': '9123.56', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 05:59:43,183 >> {'loss': 0.344, 'grad_norm': 16.227130889892578, 'learning_rate': 3.581363657029336e-07, 'epoch': 0.4412801735828587, 'num_input_tokens_seen': 5089787904, 'completed': '65.83% (2_427 / 3_687)', 'remaining time': '11:15:10', 'throughput': '8483.48', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:00:16,044 >> {'loss': 0.202, 'grad_norm': 9.734285354614258, 'learning_rate': 3.5777128588557126e-07, 'epoch': 0.44155139679956606, 'num_input_tokens_seen': 5091885056, 'completed': '65.85% (2_428 / 3_687)', 'remaining time': '11:14:39', 
'throughput': '7977.56', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:00:47,086 >> {'loss': 0.6473, 'grad_norm': 18.710073471069336, 'learning_rate': 3.574063607665633e-07, 'epoch': 0.44182262001627337, 'num_input_tokens_seen': 5093982208, 'completed': '65.88% (2_429 / 3_687)', 'remaining time': '11:14:06', 'throughput': '8444.74', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:01:16,563 >> {'loss': 0.5162, 'grad_norm': 18.220901489257812, 'learning_rate': 3.570415906395873e-07, 'epoch': 0.44209384323298073, 'num_input_tokens_seen': 5096079360, 'completed': '65.91% (2_430 / 3_687)', 'remaining time': '11:13:32', 'throughput': '8893.14', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:01:46,517 >> {'loss': 0.4738, 'grad_norm': 17.251747131347656, 'learning_rate': 3.5667697579819655e-07, 'epoch': 0.4423650664496881, 'num_input_tokens_seen': 5098176512, 'completed': '65.93% (2_431 / 3_687)', 'remaining time': '11:12:58', 'throughput': '8751.57', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:02:16,919 >> {'loss': 0.5164, 'grad_norm': 19.4804744720459, 'learning_rate': 3.563125165358193e-07, 'epoch': 0.44263628966639545, 'num_input_tokens_seen': 5100273664, 'completed': '65.96% (2_432 / 3_687)', 'remaining time': '11:12:24', 'throughput': '8622.54', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:02:46,297 >> {'loss': 0.4864, 'grad_norm': 17.455604553222656, 'learning_rate': 3.559482131457583e-07, 'epoch': 0.4429075128831028, 'num_input_tokens_seen': 5102370816, 'completed': '65.99% (2_433 / 3_687)', 'remaining time': '11:11:50', 'throughput': '8923.21', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:03:17,921 >> {'loss': 0.4475, 'grad_norm': 16.524316787719727, 'learning_rate': 3.5558406592119115e-07, 'epoch': 0.4431787360998101, 'num_input_tokens_seen': 5104467968, 'completed': '66.02% (2_434 / 3_687)', 'remaining time': '11:11:17', 'throughput': '8289.32', 'gpu_mem_free': 
'5581MB'} [INFO|trainer.py:175] 2024-12-26 06:03:47,096 >> {'loss': 0.5127, 'grad_norm': 15.812894821166992, 'learning_rate': 3.552200751551697e-07, 'epoch': 0.4434499593165175, 'num_input_tokens_seen': 5106565120, 'completed': '66.04% (2_435 / 3_687)', 'remaining time': '11:10:43', 'throughput': '8985.19', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:04:17,048 >> {'loss': 0.4391, 'grad_norm': 17.802688598632812, 'learning_rate': 3.548562411406201e-07, 'epoch': 0.44372118253322484, 'num_input_tokens_seen': 5108662272, 'completed': '66.07% (2_436 / 3_687)', 'remaining time': '11:10:09', 'throughput': '8752.11', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:04:47,664 >> {'loss': 0.465, 'grad_norm': 21.305082321166992, 'learning_rate': 3.544925641703413e-07, 'epoch': 0.4439924057499322, 'num_input_tokens_seen': 5110759424, 'completed': '66.10% (2_437 / 3_687)', 'remaining time': '11:09:36', 'throughput': '8562.38', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:05:16,343 >> {'loss': 0.6903, 'grad_norm': 23.760269165039062, 'learning_rate': 3.5412904453700754e-07, 'epoch': 0.44426362896663957, 'num_input_tokens_seen': 5112856576, 'completed': '66.12% (2_438 / 3_687)', 'remaining time': '11:09:01', 'throughput': '9140.56', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:05:44,450 >> {'loss': 0.6868, 'grad_norm': 35.48469543457031, 'learning_rate': 3.537656825331653e-07, 'epoch': 0.4445348521833469, 'num_input_tokens_seen': 5114953728, 'completed': '66.15% (2_439 / 3_687)', 'remaining time': '11:08:26', 'throughput': '9326.67', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:06:17,643 >> {'loss': 0.5361, 'grad_norm': 18.90154457092285, 'learning_rate': 3.534024784512345e-07, 'epoch': 0.44480607540005423, 'num_input_tokens_seen': 5117050880, 'completed': '66.18% (2_440 / 3_687)', 'remaining time': '11:07:55', 'throughput': '7897.63', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 
06:06:48,821 >> {'loss': 0.4076, 'grad_norm': 16.816068649291992, 'learning_rate': 3.5303943258350813e-07, 'epoch': 0.4450772986167616, 'num_input_tokens_seen': 5119148032, 'completed': '66.21% (2_441 / 3_687)', 'remaining time': '11:07:22', 'throughput': '8408.03', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:07:20,078 >> {'loss': 0.3967, 'grad_norm': 31.906917572021484, 'learning_rate': 3.526765452221512e-07, 'epoch': 0.44534852183346896, 'num_input_tokens_seen': 5121245184, 'completed': '66.23% (2_442 / 3_687)', 'remaining time': '11:06:49', 'throughput': '8386.59', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:07:52,329 >> {'loss': 0.468, 'grad_norm': 16.148090362548828, 'learning_rate': 3.523138166592021e-07, 'epoch': 0.4456197450501763, 'num_input_tokens_seen': 5123342336, 'completed': '66.26% (2_443 / 3_687)', 'remaining time': '11:06:17', 'throughput': '8128.36', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:08:22,275 >> {'loss': 0.7676, 'grad_norm': 22.877065658569336, 'learning_rate': 3.5195124718657075e-07, 'epoch': 0.4458909682668836, 'num_input_tokens_seen': 5125439488, 'completed': '66.29% (2_444 / 3_687)', 'remaining time': '11:05:43', 'throughput': '8753.84', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:08:52,076 >> {'loss': 0.3127, 'grad_norm': 16.407276153564453, 'learning_rate': 3.5158883709603946e-07, 'epoch': 0.446162191483591, 'num_input_tokens_seen': 5127536640, 'completed': '66.31% (2_445 / 3_687)', 'remaining time': '11:05:09', 'throughput': '8796.64', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:09:26,225 >> {'loss': 0.5557, 'grad_norm': 17.211170196533203, 'learning_rate': 3.5122658667926177e-07, 'epoch': 0.44643341470029835, 'num_input_tokens_seen': 5129633792, 'completed': '66.34% (2_446 / 3_687)', 'remaining time': '11:04:39', 'throughput': '7676.36', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:09:55,282 >> {'loss': 0.6258, 'grad_norm': 
17.69900131225586, 'learning_rate': 3.5086449622776346e-07, 'epoch': 0.4467046379170057, 'num_input_tokens_seen': 5131730944, 'completed': '66.37% (2_447 / 3_687)', 'remaining time': '11:04:04', 'throughput': '9021.58', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:10:25,859 >> {'loss': 0.441, 'grad_norm': 17.46190643310547, 'learning_rate': 3.505025660329408e-07, 'epoch': 0.44697586113371307, 'num_input_tokens_seen': 5133828096, 'completed': '66.40% (2_448 / 3_687)', 'remaining time': '11:03:31', 'throughput': '8573.41', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:10:56,341 >> {'loss': 0.3493, 'grad_norm': 13.633960723876953, 'learning_rate': 3.5014079638606164e-07, 'epoch': 0.4472470843504204, 'num_input_tokens_seen': 5135925248, 'completed': '66.42% (2_449 / 3_687)', 'remaining time': '11:02:57', 'throughput': '8599.99', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:11:24,726 >> {'loss': 0.3263, 'grad_norm': 15.675495147705078, 'learning_rate': 3.497791875782643e-07, 'epoch': 0.44751830756712774, 'num_input_tokens_seen': 5138022400, 'completed': '66.45% (2_450 / 3_687)', 'remaining time': '11:02:23', 'throughput': '9235.09', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:11:56,370 >> {'loss': 0.4154, 'grad_norm': 17.797544479370117, 'learning_rate': 3.4941773990055777e-07, 'epoch': 0.4477895307838351, 'num_input_tokens_seen': 5140119552, 'completed': '66.48% (2_451 / 3_687)', 'remaining time': '11:01:50', 'throughput': '8284.30', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:12:25,946 >> {'loss': 0.7703, 'grad_norm': 26.41427993774414, 'learning_rate': 3.490564536438215e-07, 'epoch': 0.44806075400054246, 'num_input_tokens_seen': 5142216704, 'completed': '66.50% (2_452 / 3_687)', 'remaining time': '11:01:16', 'throughput': '8863.30', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:12:55,931 >> {'loss': 0.4778, 'grad_norm': 16.2842960357666, 'learning_rate': 
3.4869532909880485e-07, 'epoch': 0.4483319772172498, 'num_input_tokens_seen': 5144313856, 'completed': '66.53% (2_453 / 3_687)', 'remaining time': '11:00:42', 'throughput': '8742.43', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:13:25,489 >> {'loss': 0.3942, 'grad_norm': 18.36961555480957, 'learning_rate': 3.483343665561271e-07, 'epoch': 0.4486032004339571, 'num_input_tokens_seen': 5146411008, 'completed': '66.56% (2_454 / 3_687)', 'remaining time': '11:00:08', 'throughput': '8869.00', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:13:58,058 >> {'loss': 0.5367, 'grad_norm': 19.449003219604492, 'learning_rate': 3.479735663062773e-07, 'epoch': 0.4488744236506645, 'num_input_tokens_seen': 5148508160, 'completed': '66.59% (2_455 / 3_687)', 'remaining time': '10:59:36', 'throughput': '8048.99', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:14:27,951 >> {'loss': 0.3392, 'grad_norm': 11.512831687927246, 'learning_rate': 3.4761292863961354e-07, 'epoch': 0.44914564686737185, 'num_input_tokens_seen': 5150605312, 'completed': '66.61% (2_456 / 3_687)', 'remaining time': '10:59:03', 'throughput': '8769.10', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:14:58,717 >> {'loss': 0.4308, 'grad_norm': 17.354673385620117, 'learning_rate': 3.4725245384636347e-07, 'epoch': 0.4494168700840792, 'num_input_tokens_seen': 5152702464, 'completed': '66.64% (2_457 / 3_687)', 'remaining time': '10:58:30', 'throughput': '8520.69', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:15:28,938 >> {'loss': 0.2647, 'grad_norm': 12.854970932006836, 'learning_rate': 3.4689214221662364e-07, 'epoch': 0.4496880933007866, 'num_input_tokens_seen': 5154799616, 'completed': '66.67% (2_458 / 3_687)', 'remaining time': '10:57:56', 'throughput': '8674.13', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:15:59,875 >> {'loss': 0.5143, 'grad_norm': 16.739013671875, 'learning_rate': 3.465319940403587e-07, 'epoch': 
0.4499593165174939, 'num_input_tokens_seen': 5156896768, 'completed': '66.69% (2_459 / 3_687)', 'remaining time': '10:57:23', 'throughput': '8473.57', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:16:29,558 >> {'loss': 0.3709, 'grad_norm': 15.290994644165039, 'learning_rate': 3.4617200960740247e-07, 'epoch': 0.45023053973420124, 'num_input_tokens_seen': 5158993920, 'completed': '66.72% (2_460 / 3_687)', 'remaining time': '10:56:49', 'throughput': '8831.52', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:17:01,243 >> {'loss': 0.34, 'grad_norm': 12.358589172363281, 'learning_rate': 3.4581218920745663e-07, 'epoch': 0.4505017629509086, 'num_input_tokens_seen': 5161091072, 'completed': '66.75% (2_461 / 3_687)', 'remaining time': '10:56:17', 'throughput': '8273.30', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:17:30,717 >> {'loss': 0.3167, 'grad_norm': 13.680041313171387, 'learning_rate': 3.454525331300908e-07, 'epoch': 0.45077298616761596, 'num_input_tokens_seen': 5163188224, 'completed': '66.78% (2_462 / 3_687)', 'remaining time': '10:55:43', 'throughput': '8894.03', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:18:01,850 >> {'loss': 0.4137, 'grad_norm': 22.562665939331055, 'learning_rate': 3.450930416647429e-07, 'epoch': 0.4510442093843233, 'num_input_tokens_seen': 5165285376, 'completed': '66.80% (2_463 / 3_687)', 'remaining time': '10:55:10', 'throughput': '8420.30', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:18:31,592 >> {'loss': 0.4412, 'grad_norm': 14.763349533081055, 'learning_rate': 3.4473371510071795e-07, 'epoch': 0.45131543260103063, 'num_input_tokens_seen': 5167382528, 'completed': '66.83% (2_464 / 3_687)', 'remaining time': '10:54:36', 'throughput': '8813.78', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:19:02,750 >> {'loss': 0.578, 'grad_norm': 20.03864097595215, 'learning_rate': 3.4437455372718795e-07, 'epoch': 0.451586655817738, 'num_input_tokens_seen': 
5169479680, 'completed': '66.86% (2_465 / 3_687)', 'remaining time': '10:54:03', 'throughput': '8413.46', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:19:35,576 >> {'loss': 0.3843, 'grad_norm': 19.433835983276367, 'learning_rate': 3.440155578331925e-07, 'epoch': 0.45185787903444535, 'num_input_tokens_seen': 5171576832, 'completed': '66.88% (2_466 / 3_687)', 'remaining time': '10:53:31', 'throughput': '7985.84', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:20:03,064 >> {'loss': 0.3445, 'grad_norm': 15.715560913085938, 'learning_rate': 3.4365672770763783e-07, 'epoch': 0.4521291022511527, 'num_input_tokens_seen': 5173673984, 'completed': '66.91% (2_467 / 3_687)', 'remaining time': '10:52:56', 'throughput': '9536.69', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:20:35,602 >> {'loss': 0.3954, 'grad_norm': 16.51280975341797, 'learning_rate': 3.432980636392967e-07, 'epoch': 0.4524003254678601, 'num_input_tokens_seen': 5175771136, 'completed': '66.94% (2_468 / 3_687)', 'remaining time': '10:52:24', 'throughput': '8056.50', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:21:05,524 >> {'loss': 0.3528, 'grad_norm': 13.621045112609863, 'learning_rate': 3.429395659168084e-07, 'epoch': 0.4526715486845674, 'num_input_tokens_seen': 5177868288, 'completed': '66.97% (2_469 / 3_687)', 'remaining time': '10:51:50', 'throughput': '8761.03', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:21:33,663 >> {'loss': 0.5579, 'grad_norm': 19.30001449584961, 'learning_rate': 3.425812348286782e-07, 'epoch': 0.45294277190127474, 'num_input_tokens_seen': 5179965440, 'completed': '66.99% (2_470 / 3_687)', 'remaining time': '10:51:15', 'throughput': '9316.10', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:22:01,272 >> {'loss': 0.8619, 'grad_norm': 25.89718246459961, 'learning_rate': 3.422230706632774e-07, 'epoch': 0.4532139951179821, 'num_input_tokens_seen': 5182062592, 'completed': '67.02% (2_471 / 
3_687)', 'remaining time': '10:50:40', 'throughput': '9494.76', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:22:30,973 >> {'loss': 0.489, 'grad_norm': 17.709318161010742, 'learning_rate': 3.418650737088427e-07, 'epoch': 0.45348521833468947, 'num_input_tokens_seen': 5184159744, 'completed': '67.05% (2_472 / 3_687)', 'remaining time': '10:50:06', 'throughput': '8826.00', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:23:02,231 >> {'loss': 0.3265, 'grad_norm': 17.07084846496582, 'learning_rate': 3.415072442534767e-07, 'epoch': 0.45375644155139677, 'num_input_tokens_seen': 5186256896, 'completed': '67.07% (2_473 / 3_687)', 'remaining time': '10:49:33', 'throughput': '8386.59', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:23:33,564 >> {'loss': 0.3484, 'grad_norm': 14.670482635498047, 'learning_rate': 3.411495825851467e-07, 'epoch': 0.45402766476810413, 'num_input_tokens_seen': 5188354048, 'completed': '67.10% (2_474 / 3_687)', 'remaining time': '10:49:01', 'throughput': '8366.37', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:24:03,273 >> {'loss': 0.3738, 'grad_norm': 15.770243644714355, 'learning_rate': 3.4079208899168545e-07, 'epoch': 0.4542988879848115, 'num_input_tokens_seen': 5190451200, 'completed': '67.13% (2_475 / 3_687)', 'remaining time': '10:48:27', 'throughput': '8823.78', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:24:34,149 >> {'loss': 0.4085, 'grad_norm': 18.940078735351562, 'learning_rate': 3.404347637607899e-07, 'epoch': 0.45457011120151886, 'num_input_tokens_seen': 5192548352, 'completed': '67.15% (2_476 / 3_687)', 'remaining time': '10:47:54', 'throughput': '8490.06', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:25:03,644 >> {'loss': 0.5006, 'grad_norm': 20.297151565551758, 'learning_rate': 3.400776071800219e-07, 'epoch': 0.4548413344182262, 'num_input_tokens_seen': 5194645504, 'completed': '67.18% (2_477 / 3_687)', 'remaining time': '10:47:20', 
'throughput': '8887.86', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:25:33,558 >> {'loss': 0.2718, 'grad_norm': 13.716742515563965, 'learning_rate': 3.3972061953680734e-07, 'epoch': 0.4551125576349335, 'num_input_tokens_seen': 5196742656, 'completed': '67.21% (2_478 / 3_687)', 'remaining time': '10:46:46', 'throughput': '8763.22', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:26:06,626 >> {'loss': 0.4441, 'grad_norm': 16.452495574951172, 'learning_rate': 3.3936380111843666e-07, 'epoch': 0.4553837808516409, 'num_input_tokens_seen': 5198839808, 'completed': '67.24% (2_479 / 3_687)', 'remaining time': '10:46:15', 'throughput': '7927.52', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:26:38,628 >> {'loss': 0.3707, 'grad_norm': 12.949135780334473, 'learning_rate': 3.390071522120635e-07, 'epoch': 0.45565500406834825, 'num_input_tokens_seen': 5200936960, 'completed': '67.26% (2_480 / 3_687)', 'remaining time': '10:45:43', 'throughput': '8191.45', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:27:09,762 >> {'loss': 0.3601, 'grad_norm': 13.870152473449707, 'learning_rate': 3.3865067310470554e-07, 'epoch': 0.4559262272850556, 'num_input_tokens_seen': 5203034112, 'completed': '67.29% (2_481 / 3_687)', 'remaining time': '10:45:10', 'throughput': '8419.76', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:27:38,877 >> {'loss': 0.3517, 'grad_norm': 12.890467643737793, 'learning_rate': 3.3829436408324316e-07, 'epoch': 0.45619745050176297, 'num_input_tokens_seen': 5205131264, 'completed': '67.32% (2_482 / 3_687)', 'remaining time': '10:44:36', 'throughput': '9003.89', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:28:10,520 >> {'loss': 0.4483, 'grad_norm': 14.944725036621094, 'learning_rate': 3.3793822543442074e-07, 'epoch': 0.4564686737184703, 'num_input_tokens_seen': 5207228416, 'completed': '67.34% (2_483 / 3_687)', 'remaining time': '10:44:03', 'throughput': '8284.34', 'gpu_mem_free': 
'5581MB'} [INFO|trainer.py:175] 2024-12-26 06:28:43,383 >> {'loss': 0.4053, 'grad_norm': 16.919103622436523, 'learning_rate': 3.3758225744484483e-07, 'epoch': 0.45673989693517764, 'num_input_tokens_seen': 5209325568, 'completed': '67.37% (2_484 / 3_687)', 'remaining time': '10:43:32', 'throughput': '7976.92', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:29:15,376 >> {'loss': 0.4156, 'grad_norm': 14.759577751159668, 'learning_rate': 3.372264604009851e-07, 'epoch': 0.457011120151885, 'num_input_tokens_seen': 5211422720, 'completed': '67.40% (2_485 / 3_687)', 'remaining time': '10:43:00', 'throughput': '8193.78', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:29:44,470 >> {'loss': 0.4126, 'grad_norm': 15.936949729919434, 'learning_rate': 3.3687083458917344e-07, 'epoch': 0.45728234336859236, 'num_input_tokens_seen': 5213519872, 'completed': '67.43% (2_486 / 3_687)', 'remaining time': '10:42:25', 'throughput': '9010.12', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:30:15,661 >> {'loss': 0.3871, 'grad_norm': 15.409136772155762, 'learning_rate': 3.3651538029560377e-07, 'epoch': 0.4575535665852997, 'num_input_tokens_seen': 5215617024, 'completed': '67.45% (2_487 / 3_687)', 'remaining time': '10:41:53', 'throughput': '8404.56', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:30:44,282 >> {'loss': 0.4349, 'grad_norm': 15.34483814239502, 'learning_rate': 3.361600978063325e-07, 'epoch': 0.457824789802007, 'num_input_tokens_seen': 5217714176, 'completed': '67.48% (2_488 / 3_687)', 'remaining time': '10:41:18', 'throughput': '9159.20', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:31:13,890 >> {'loss': 0.4529, 'grad_norm': 18.0634708404541, 'learning_rate': 3.358049874072771e-07, 'epoch': 0.4580960130187144, 'num_input_tokens_seen': 5219811328, 'completed': '67.51% (2_489 / 3_687)', 'remaining time': '10:40:44', 'throughput': '8853.71', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 
06:31:45,525 >> {'loss': 0.4033, 'grad_norm': 17.72582244873047, 'learning_rate': 3.3545004938421734e-07, 'epoch': 0.45836723623542175, 'num_input_tokens_seen': 5221908480, 'completed': '67.53% (2_490 / 3_687)', 'remaining time': '10:40:12', 'throughput': '8286.47', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:32:12,877 >> {'loss': 0.7801, 'grad_norm': 20.901613235473633, 'learning_rate': 3.3509528402279357e-07, 'epoch': 0.4586384594521291, 'num_input_tokens_seen': 5224005632, 'completed': '67.56% (2_491 / 3_687)', 'remaining time': '10:39:36', 'throughput': '9584.21', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:32:44,377 >> {'loss': 0.9615, 'grad_norm': 34.06769943237305, 'learning_rate': 3.347406916085074e-07, 'epoch': 0.45890968266883647, 'num_input_tokens_seen': 5226102784, 'completed': '67.59% (2_492 / 3_687)', 'remaining time': '10:39:04', 'throughput': '8322.01', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:33:15,591 >> {'loss': 0.7134, 'grad_norm': 19.558860778808594, 'learning_rate': 3.3438627242672164e-07, 'epoch': 0.4591809058855438, 'num_input_tokens_seen': 5228199936, 'completed': '67.62% (2_493 / 3_687)', 'remaining time': '10:38:31', 'throughput': '8398.40', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:33:46,832 >> {'loss': 0.4235, 'grad_norm': 17.086078643798828, 'learning_rate': 3.3403202676265875e-07, 'epoch': 0.45945212910225114, 'num_input_tokens_seen': 5230297088, 'completed': '67.64% (2_494 / 3_687)', 'remaining time': '10:37:58', 'throughput': '8390.88', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:34:20,162 >> {'loss': 0.2654, 'grad_norm': 11.597755432128906, 'learning_rate': 3.336779549014026e-07, 'epoch': 0.4597233523189585, 'num_input_tokens_seen': 5232394240, 'completed': '67.67% (2_495 / 3_687)', 'remaining time': '10:37:27', 'throughput': '7865.09', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 06:34:49,598 >> {'loss': 0.3075, 
'grad_norm': 12.856144905090332, 'learning_rate': 3.333240571278968e-07, 'epoch': 0.45999457553566586, 'num_input_tokens_seen': 5234491392, 'completed': '67.70% (2_496 / 3_687)', 'remaining time': '10:36:53', 'throughput': '8905.68', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 06:35:21,034 >> {'loss': 0.2412, 'grad_norm': 9.705018997192383, 'learning_rate': 3.3297033372694473e-07, 'epoch': 0.4602657987523732, 'num_input_tokens_seen': 5236588544, 'completed': '67.72% (2_497 / 3_687)', 'remaining time': '10:36:21', 'throughput': '8338.83', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 06:35:47,640 >> {'loss': 0.8226, 'grad_norm': 20.149858474731445, 'learning_rate': 3.3261678498320954e-07, 'epoch': 0.46053702196908053, 'num_input_tokens_seen': 5238685696, 'completed': '67.75% (2_498 / 3_687)', 'remaining time': '10:35:45', 'throughput': '9852.88', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 06:36:14,868 >> {'loss': 0.5587, 'grad_norm': 19.860633850097656, 'learning_rate': 3.3226341118121367e-07, 'epoch': 0.4608082451857879, 'num_input_tokens_seen': 5240782848, 'completed': '67.78% (2_499 / 3_687)', 'remaining time': '10:35:09', 'throughput': '9627.96', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 06:36:46,261 >> {'loss': 0.3087, 'grad_norm': 11.387728691101074, 'learning_rate': 3.319102126053389e-07, 'epoch': 0.46107946840249525, 'num_input_tokens_seen': 5242880000, 'completed': '67.81% (2_500 / 3_687)', 'remaining time': '10:34:37', 'throughput': '8350.35', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 06:37:19,114 >> {'loss': 0.2748, 'grad_norm': 16.481552124023438, 'learning_rate': 3.315571895398261e-07, 'epoch': 0.4613506916192026, 'num_input_tokens_seen': 5244977152, 'completed': '67.83% (2_501 / 3_687)', 'remaining time': '10:34:05', 'throughput': '7979.23', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 06:37:50,250 >> {'loss': 0.3922, 'grad_norm': 14.168554306030273, 'learning_rate': 3.312043422687749e-07, 'epoch': 0.46162191483591, 'num_input_tokens_seen': 5247074304, 'completed': '67.86% (2_502 / 3_687)', 'remaining time': '10:33:33', 'throughput': '8419.38', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 06:38:23,262 >> {'loss': 0.2, 'grad_norm': 10.951041221618652, 'learning_rate': 3.3085167107614297e-07, 'epoch': 0.4618931380526173, 'num_input_tokens_seen': 5249171456, 'completed': '67.89% (2_503 / 3_687)', 'remaining time': '10:33:01', 'throughput': '7940.89', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 06:38:52,200 >> {'loss': 0.714, 'grad_norm': 21.78125762939453, 'learning_rate': 3.3049917624574737e-07, 'epoch': 0.46216436126932464, 'num_input_tokens_seen': 5251268608, 'completed': '67.91% (2_504 / 3_687)', 'remaining time': '10:32:27', 'throughput': '9058.79', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 06:39:24,274 >> {'loss': 0.3656, 'grad_norm': 14.395316123962402, 'learning_rate': 3.301468580612619e-07, 'epoch': 0.462435584486032, 'num_input_tokens_seen': 5253365760, 'completed': '67.94% (2_505 / 3_687)', 'remaining time': '10:31:55', 'throughput': '8173.09', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 06:39:55,035 >> {'loss': 0.3252, 'grad_norm': 16.801877975463867, 'learning_rate': 3.2979471680621903e-07, 'epoch': 0.46270680770273936, 'num_input_tokens_seen': 5255462912, 'completed': '67.97% (2_506 / 3_687)', 'remaining time': '10:31:22', 'throughput': '8521.97', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 06:40:23,477 >> {'loss': 0.5567, 'grad_norm': 18.835493087768555, 'learning_rate': 3.2944275276400857e-07, 'epoch': 0.4629780309194467, 'num_input_tokens_seen': 5257560064, 'completed': '68.00% (2_507 / 3_687)', 'remaining time': '10:30:47', 'throughput': '9216.72', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 06:40:53,980 >> {'loss': 0.4733, 'grad_norm': 18.04051971435547, 'learning_rate': 3.290909662178779e-07, 'epoch': 0.46324925413615403, 'num_input_tokens_seen': 5259657216, 'completed': '68.02% (2_508 / 3_687)', 'remaining time': '10:30:14', 'throughput': '8594.09', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 06:41:24,296 >> {'loss': 0.1792, 'grad_norm': 11.844828605651855, 'learning_rate': 3.2873935745093145e-07, 'epoch': 0.4635204773528614, 'num_input_tokens_seen': 5261754368, 'completed': '68.05% (2_509 / 3_687)', 'remaining time': '10:29:41', 'throughput': '8646.94', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 06:41:56,088 >> {'loss': 0.3807, 'grad_norm': 15.879327774047852, 'learning_rate': 3.283879267461305e-07, 'epoch': 0.46379170056956875, 'num_input_tokens_seen': 5263851520, 'completed': '68.08% (2_510 / 3_687)', 'remaining time': '10:29:08', 'throughput': '8245.72', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 06:42:25,974 >> {'loss': 0.2678, 'grad_norm': 14.160933494567871, 'learning_rate': 3.280366743862931e-07, 'epoch': 0.4640629237862761, 'num_input_tokens_seen': 5265948672, 'completed': '68.10% (2_511 / 3_687)', 'remaining time': '10:28:35', 'throughput': '8771.49', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 06:43:00,489 >> {'loss': 0.4197, 'grad_norm': 16.566879272460938, 'learning_rate': 3.276856006540939e-07, 'epoch': 0.4643341470029835, 'num_input_tokens_seen': 5268045824, 'completed': '68.13% (2_512 / 3_687)', 'remaining time': '10:28:05', 'throughput': '7594.93', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 06:43:31,469 >> {'loss': 0.7876, 'grad_norm': 22.101320266723633, 'learning_rate': 3.2733470583206357e-07, 'epoch': 0.4646053702196908, 'num_input_tokens_seen': 5270142976, 'completed': '68.16% (2_513 / 3_687)', 'remaining time': '10:27:32', 'throughput': '8461.96', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 06:44:01,831 >> {'loss': 0.6538, 'grad_norm': 22.703693389892578, 'learning_rate': 3.2698399020258895e-07, 'epoch': 0.46487659343639814, 'num_input_tokens_seen': 5272240128, 'completed': '68.19% (2_514 / 3_687)', 'remaining time': '10:26:58', 'throughput': '8633.87', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 06:44:32,337 >> {'loss': 0.324, 'grad_norm': 14.076583862304688, 'learning_rate': 3.266334540479128e-07, 'epoch': 0.4651478166531055, 'num_input_tokens_seen': 5274337280, 'completed': '68.21% (2_515 / 3_687)', 'remaining time': '10:26:25', 'throughput': '8593.07', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 06:45:02,418 >> {'loss': 0.4449, 'grad_norm': 17.571779251098633, 'learning_rate': 3.262830976501329e-07, 'epoch': 0.46541903986981287, 'num_input_tokens_seen': 5276434432, 'completed': '68.24% (2_516 / 3_687)', 'remaining time': '10:25:52', 'throughput': '8714.67', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 06:45:34,754 >> {'loss': 0.3639, 'grad_norm': 13.718509674072266, 'learning_rate': 3.2593292129120295e-07, 'epoch': 0.46569026308652023, 'num_input_tokens_seen': 5278531584, 'completed': '68.27% (2_517 / 3_687)', 'remaining time': '10:25:20', 'throughput': '8106.95', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 06:46:03,699 >> {'loss': 0.3981, 'grad_norm': 16.070167541503906, 'learning_rate': 3.2558292525293156e-07, 'epoch': 0.46596148630322753, 'num_input_tokens_seen': 5280628736, 'completed': '68.29% (2_518 / 3_687)', 'remaining time': '10:24:46', 'throughput': '9056.49', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 06:46:34,936 >> {'loss': 0.4636, 'grad_norm': 17.910581588745117, 'learning_rate': 3.2523310981698213e-07, 'epoch': 0.4662327095199349, 'num_input_tokens_seen': 5282725888, 'completed': '68.32% (2_519 / 3_687)', 'remaining time': '10:24:13', 'throughput': '8392.15', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 06:47:04,363 >> {'loss': 0.6682, 'grad_norm': 21.05547332763672, 'learning_rate': 3.24883475264873e-07, 'epoch': 0.46650393273664226, 'num_input_tokens_seen': 5284823040, 'completed': '68.35% (2_520 / 3_687)', 'remaining time': '10:23:39', 'throughput': '8908.43', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 06:47:35,403 >> {'loss': 0.4988, 'grad_norm': 14.658893585205078, 'learning_rate': 3.2453402187797684e-07, 'epoch': 0.4667751559533496, 'num_input_tokens_seen': 5286920192, 'completed': '68.38% (2_521 / 3_687)', 'remaining time': '10:23:07', 'throughput': '8445.36', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 06:48:05,633 >> {'loss': 0.4897, 'grad_norm': 17.221160888671875, 'learning_rate': 3.241847499375201e-07, 'epoch': 0.467046379170057, 'num_input_tokens_seen': 5289017344, 'completed': '68.40% (2_522 / 3_687)', 'remaining time': '10:22:33', 'throughput': '8671.55', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 06:48:34,553 >> {'loss': 0.2647, 'grad_norm': 14.185545921325684, 'learning_rate': 3.238356597245837e-07, 'epoch': 0.4673176023867643, 'num_input_tokens_seen': 5291114496, 'completed': '68.43% (2_523 / 3_687)', 'remaining time': '10:21:59', 'throughput': '9064.40', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 06:49:03,957 >> {'loss': 0.4688, 'grad_norm': 17.018024444580078, 'learning_rate': 3.2348675152010217e-07, 'epoch': 0.46758882560347165, 'num_input_tokens_seen': 5293211648, 'completed': '68.46% (2_524 / 3_687)', 'remaining time': '10:21:25', 'throughput': '8915.27', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 06:49:35,354 >> {'loss': 0.5423, 'grad_norm': 21.121870040893555, 'learning_rate': 3.2313802560486353e-07, 'epoch': 0.467860048820179, 'num_input_tokens_seen': 5295308800, 'completed': '68.48% (2_525 / 3_687)', 'remaining time': '10:20:53', 'throughput': '8349.22', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 06:50:06,196 >> {'loss': 0.6087, 'grad_norm': 41.621604919433594, 'learning_rate': 3.2278948225950916e-07, 'epoch': 0.46813127203688637, 'num_input_tokens_seen': 5297405952, 'completed': '68.51% (2_526 / 3_687)', 'remaining time': '10:20:20', 'throughput': '8499.76', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 06:50:36,336 >> {'loss': 0.4773, 'grad_norm': 15.993080139160156, 'learning_rate': 3.2244112176453343e-07, 'epoch': 0.46840249525359373, 'num_input_tokens_seen': 5299503104, 'completed': '68.54% (2_527 / 3_687)', 'remaining time': '10:19:46', 'throughput': '8697.53', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 06:51:06,493 >> {'loss': 0.6761, 'grad_norm': 19.632001876831055, 'learning_rate': 3.2209294440028366e-07, 'epoch': 0.46867371847030104, 'num_input_tokens_seen': 5301600256, 'completed': '68.57% (2_528 / 3_687)', 'remaining time': '10:19:13', 'throughput': '8692.65', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 06:51:39,333 >> {'loss': 0.3395, 'grad_norm': 14.451026916503906, 'learning_rate': 3.2174495044695973e-07, 'epoch': 0.4689449416870084, 'num_input_tokens_seen': 5303697408, 'completed': '68.59% (2_529 / 3_687)', 'remaining time': '10:18:42', 'throughput': '7982.37', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 06:52:10,971 >> {'loss': 0.483, 'grad_norm': 18.51044464111328, 'learning_rate': 3.2139714018461396e-07, 'epoch': 0.46921616490371576, 'num_input_tokens_seen': 5305794560, 'completed': '68.62% (2_530 / 3_687)', 'remaining time': '10:18:09', 'throughput': '8286.31', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 06:52:42,584 >> {'loss': 0.5314, 'grad_norm': 16.25364875793457, 'learning_rate': 3.2104951389315073e-07, 'epoch': 0.4694873881204231, 'num_input_tokens_seen': 5307891712, 'completed': '68.65% (2_531 / 3_687)', 'remaining time': '10:17:37', 'throughput': '8292.15', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 06:53:10,760 >> {'loss': 0.603, 'grad_norm': 21.938037872314453, 'learning_rate': 3.207020718523266e-07, 'epoch': 0.4697586113371305, 'num_input_tokens_seen': 5309988864, 'completed': '68.67% (2_532 / 3_687)', 'remaining time': '10:17:02', 'throughput': '9303.42', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 06:53:42,394 >> {'loss': 0.3931, 'grad_norm': 15.501047134399414, 'learning_rate': 3.2035481434174966e-07, 'epoch': 0.4700298345538378, 'num_input_tokens_seen': 5312086016, 'completed': '68.70% (2_533 / 3_687)', 'remaining time': '10:16:30', 'throughput': '8286.76', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 06:54:12,167 >> {'loss': 0.4532, 'grad_norm': 16.000394821166992, 'learning_rate': 3.200077416408794e-07, 'epoch': 0.47030105777054515, 'num_input_tokens_seen': 5314183168, 'completed': '68.73% (2_534 / 3_687)', 'remaining time': '10:15:56', 'throughput': '8804.71', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 06:54:41,966 >> {'loss': 0.631, 'grad_norm': 21.34868049621582, 'learning_rate': 3.196608540290266e-07, 'epoch': 0.4705722809872525, 'num_input_tokens_seen': 5316280320, 'completed': '68.76% (2_535 / 3_687)', 'remaining time': '10:15:23', 'throughput': '8798.00', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 06:55:14,571 >> {'loss': 0.5566, 'grad_norm': 23.8501033782959, 'learning_rate': 3.193141517853536e-07, 'epoch': 0.4708435042039599, 'num_input_tokens_seen': 5318377472, 'completed': '68.78% (2_536 / 3_687)', 'remaining time': '10:14:51', 'throughput': '8039.20', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 06:55:45,261 >> {'loss': 0.4631, 'grad_norm': 14.53099250793457, 'learning_rate': 3.1896763518887305e-07, 'epoch': 0.47111472742066723, 'num_input_tokens_seen': 5320474624, 'completed': '68.81% (2_537 / 3_687)', 'remaining time': '10:14:18', 'throughput': '8541.73', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 06:56:12,292 >> {'loss': 0.9052, 'grad_norm': 25.955514907836914, 'learning_rate': 3.186213045184484e-07, 'epoch': 0.47138595063737454, 'num_input_tokens_seen': 5322571776, 'completed': '68.84% (2_538 / 3_687)', 'remaining time': '10:13:43', 'throughput': '9697.83', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 06:56:45,509 >> {'loss': 0.329, 'grad_norm': 12.901291847229004, 'learning_rate': 3.1827516005279306e-07, 'epoch': 0.4716571738540819, 'num_input_tokens_seen': 5324668928, 'completed': '68.86% (2_539 / 3_687)', 'remaining time': '10:13:12', 'throughput': '7891.78', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 06:57:16,267 >> {'loss': 0.3753, 'grad_norm': 14.975889205932617, 'learning_rate': 3.1792920207047114e-07, 'epoch': 0.47192839707078926, 'num_input_tokens_seen': 5326766080, 'completed': '68.89% (2_540 / 3_687)', 'remaining time': '10:12:39', 'throughput': '8522.91', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 06:57:48,738 >> {'loss': 0.4148, 'grad_norm': 17.118913650512695, 'learning_rate': 3.175834308498964e-07, 'epoch': 0.4721996202874966, 'num_input_tokens_seen': 5328863232, 'completed': '68.92% (2_541 / 3_687)', 'remaining time': '10:12:07', 'throughput': '8073.19', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 06:58:18,778 >> {'loss': 0.3607, 'grad_norm': 16.228837966918945, 'learning_rate': 3.172378466693325e-07, 'epoch': 0.472470843504204, 'num_input_tokens_seen': 5330960384, 'completed': '68.94% (2_542 / 3_687)', 'remaining time': '10:11:33', 'throughput': '8726.50', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 06:58:46,741 >> {'loss': 0.9653, 'grad_norm': 22.712890625, 'learning_rate': 3.168924498068923e-07, 'epoch': 0.4727420667209113, 'num_input_tokens_seen': 5333057536, 'completed': '68.97% (2_543 / 3_687)', 'remaining time': '10:10:59', 'throughput': '9374.65', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 06:59:17,948 >> {'loss': 0.4317, 'grad_norm': 18.086130142211914, 'learning_rate': 3.1654724054053805e-07, 'epoch': 0.47301328993761865, 'num_input_tokens_seen': 5335154688, 'completed': '69.00% (2_544 / 3_687)', 'remaining time': '10:10:26', 'throughput': '8400.13', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 06:59:50,306 >> {'loss': 0.4091, 'grad_norm': 16.25860595703125, 'learning_rate': 3.1620221914808115e-07, 'epoch': 0.473284513154326, 'num_input_tokens_seen': 5337251840, 'completed': '69.03% (2_545 / 3_687)', 'remaining time': '10:09:54', 'throughput': '8101.33', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 07:00:22,136 >> {'loss': 0.3707, 'grad_norm': 19.094097137451172, 'learning_rate': 3.1585738590718157e-07, 'epoch': 0.4735557363710334, 'num_input_tokens_seen': 5339348992, 'completed': '69.05% (2_546 / 3_687)', 'remaining time': '10:09:22', 'throughput': '8235.73', 'gpu_mem_free': '5581MB'}
/scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/transformers/utils/hub.py:127: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
/scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/transformers/utils/import_utils.py:575: FutureWarning: `is_torch_tpu_available` is deprecated and will be removed in 4.41.0. Please use the `is_torch_xla_available` instead.
  warnings.warn(
12/26/2024 14:47:26 - WARNING - __main__ - Process rank: 7, device: cuda:7, n_gpu: 1, distributed training: True, fp16 training: False, bf16 training: True
12/26/2024 14:47:27 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: True, fp16 training: False, bf16 training: True
12/26/2024 14:47:27 - INFO - __main__ - Additional arguments ScriptArguments(model_family='llama', model_name_or_path='/scratch3/workspace/ctpham_umass_edu-ft/models/Llama-3-8B-ProLong-512k-Base/', config_overrides=None, config_overrides_json='', config_name='/scratch3/workspace/ctpham_umass_edu-ft/models/Llama-3-8B-ProLong-512k-Base/', tokenizer_name='/scratch3/workspace/ctpham_umass_edu-ft/models/Llama-3-8B-ProLong-512k-Base/', cache_dir=None, use_fast_tokenizer=False, model_revision='main', use_auth_token=False, tokenized_mds_train=['/work/pi_miyyer_umass_edu/ctpham/BookClaim-dev/data/ft/bookclaim_balanced_pack_complete'], tokenized_mds_validation=[], tokenized_mds_test=[], token_scaled_loss=True)
12/26/2024 14:47:27 - INFO - __main__ - Checkpoint detected, resuming training at /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-2400. To avoid this behavior, change the `--output_dir` or add `--overwrite_output_dir` to train from scratch.
[INFO|tokenization_utils_base.py:2267] 2024-12-26 14:47:27,918 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2267] 2024-12-26 14:47:27,918 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2267] 2024-12-26 14:47:27,918 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2267] 2024-12-26 14:47:27,918 >> loading file tokenizer_config.json
>> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|configuration_utils.py:731] 2024-12-26 14:47:28,230 >> loading configuration file /scratch3/workspace/ctpham_umass_edu-ft/models/Llama-3-8B-ProLong-512k-Base/config.json
[INFO|configuration_utils.py:800] 2024-12-26 14:47:28,231 >> Model config LlamaConfig {
  "_name_or_path": "/scratch3/workspace/ctpham_umass_edu-ft/models/Llama-3-8B-ProLong-512k-Base/",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128009,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 524288,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pad_token_id": 0,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 128000000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "float32",
  "transformers_version": "4.44.2",
  "use_cache": true,
  "vocab_size": 128256
}
12/26/2024 14:47:28 - INFO - __main__ - Loaded tokenizer - CPU Memory usage: 633.36 MB RSS
12/26/2024 14:47:28 - INFO - __main__ - GPU 0: 4/81920 MB
12/26/2024 14:47:28 - INFO - __main__ - GPU 1: 423/81920 MB
12/26/2024 14:47:28 - INFO - __main__ - GPU 2: 423/81920 MB
12/26/2024 14:47:28 - INFO - __main__ - GPU 3: 423/81920 MB
12/26/2024 14:47:28 - INFO - __main__ - GPU 4: 423/81920 MB
12/26/2024 14:47:28 - INFO - __main__ - GPU 5: 423/81920 MB
12/26/2024 14:47:28 - INFO - __main__ - GPU 6: 423/81920 MB
12/26/2024 14:47:28 - INFO - __main__ - GPU 7: 423/81920 MB
[INFO|modeling_utils.py:3675] 2024-12-26 14:47:28,314 >> loading weights file /scratch3/workspace/ctpham_umass_edu-ft/models/Llama-3-8B-ProLong-512k-Base/model.safetensors.index.json
[INFO|configuration_utils.py:1038] 2024-12-26 14:47:28,315 >> Generate config GenerationConfig {
  "bos_token_id": 128000,
  "eos_token_id": 128009,
  "pad_token_id": 0
}
>> All model checkpoint weights were used when initializing LlamaForCausalLM.
[INFO|modeling_utils.py:4515] 2024-12-26 14:47:29,080 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at /scratch3/workspace/ctpham_umass_edu-ft/models/Llama-3-8B-ProLong-512k-Base/. If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[INFO|modeling_utils.py:4003] 2024-12-26 14:47:29,083 >> Generation config file not found, using a generation config created from the model config.
12/26/2024 14:47:29 - INFO - __main__ - Model: LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(128256, 4096, padding_idx=0)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
          (distributed_attn_func): DistributedAttention()
          (distributed_varlen_attn_func): DistributedAttention()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm()
  )
  (lm_head): Linear(in_features=4096, out_features=128256, bias=False)
)
12/26/2024 14:47:29 - INFO - __main__ - Loaded model - CPU Memory usage: 657.86 MB RSS
12/26/2024 14:47:29 - INFO - __main__ - GPU 0: 4/81920 MB
12/26/2024 14:47:29 - INFO - __main__ - GPU 1: 423/81920 MB
12/26/2024 14:47:29 - INFO - __main__ - GPU 2: 423/81920 MB
12/26/2024 14:47:29 - INFO - __main__ - GPU 3: 423/81920 MB
12/26/2024 14:47:29 - INFO - __main__ - GPU 4: 423/81920 MB
12/26/2024 14:47:29 - INFO - __main__ - GPU 5: 423/81920 MB
12/26/2024 14:47:29 - INFO - __main__ - GPU 6: 423/81920 MB
12/26/2024 14:47:29 - INFO - __main__ - GPU 7: 425/81920 MB
Loading checkpoint shards: 100%|██████████| 7/7 [00:00<00:00, 9.18it/s]
12/26/2024 14:47:33 - WARNING - streaming.base.dataset - Because `predownload` was not specified, it will default to 8*batch_size if batch_size is not None, otherwise 64. Prior to Streaming v0.7.0, `predownload` defaulted to max(batch_size, 256 * batch_size // num_canonical_nodes).
12/26/2024 14:47:33 - INFO - training.dataset - Loading datasets for training
12/26/2024 14:47:33 - INFO - __main__ - Loaded training dataset
[INFO|trainer.py:648] 2024-12-26 14:47:34,155 >> Using auto half precision backend
[INFO|trainer.py:262] 2024-12-26 14:47:34,156 >> Using world as sequence parallel group
12/26/2024 14:47:34 - INFO - __main__ - Trainer created
12/26/2024 14:47:34 - INFO - __main__ - Training/evaluation parameters TrainingArguments(
_n_gpu=1,
accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False},
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.95,
adam_epsilon=1e-08,
auto_find_batch_size=False,
batch_eval_metrics=False,
bf16=True,
bf16_full_eval=False,
cuda_empty_cache=True,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=1,
dataloader_persistent_workers=False,
dataloader_pin_memory=True,
dataloader_prefetch_factor=None,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=False,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=True,
dispatch_batches=None,
do_eval=False,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_do_concat_batches=True,
eval_on_start=False,
eval_steps=None,
eval_strategy=no,
eval_use_gather_object=False,
evaluation_strategy=None,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[, ],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=2,
gradient_checkpointing=True,
gradient_checkpointing_kwargs=None,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_always_push=False,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=,
ignore_data_skip=False,
include_inputs_for_metrics=False,
include_num_input_tokens_seen=False,
include_tokens_per_second=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=1e-06,
length_column_name=length,
load_best_model_at_end=False,
local_rank=0,
log_level=info,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=/scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/runs/Dec26_14-47-22_gpu021,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=1.0,
logging_strategy=steps,
lr_scheduler_kwargs={},
lr_scheduler_type=cosine,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
min_lr_ratio=0.1,
mp_parameters=,
neftune_noise_alpha=None,
no_cuda=False,
num_train_epochs=1.0,
optim=adamw_torch,
optim_args=None,
optim_target_modules=None,
output_dir=/scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_,
overwrite_output_dir=False,
past_index=-1,
per_device_eval_batch_size=8,
per_device_train_batch_size=1,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=,
ray_scope=last,
remove_unused_columns=False,
report_to=['wandb'],
restore_callback_states_from_checkpoint=False,
resume_from_checkpoint=None,
run_name=balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_,
save_on_each_node=False,
save_only_model=False,
save_safetensors=True,
save_steps=200,
save_strategy=steps,
save_total_limit=None,
seed=42,
seq_parallel_size=8,
skip_memory_metrics=True,
split_batches=None,
streaming_dataset=True,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torch_empty_cache_steps=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_cpu=False,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.05,
warmup_steps=0,
weight_decay=0.1,
)
12/26/2024 14:47:34 - INFO - __main__ - *** Train ***
[WARNING|trainer.py:795] 2024-12-26 14:47:34,305 >> Use streaming dataloader for train
[INFO|trainer.py:2526] 2024-12-26 14:47:47,199 >> Loading model from /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-2400.
/scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:689: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP.
API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict . Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
  warnings.warn(
/scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/accelerate/utils/fsdp_utils.py:126: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  state_dict = torch.load(input_model_file)
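The `torch.load` FutureWarning above fires once per rank. The migration it recommends is to pass `weights_only=True` (and allowlist any non-tensor types via `torch.serialization.add_safe_globals`). A minimal sketch of the safe pattern, not the `accelerate` code itself; the file path is illustrative:

```python
import torch

# Save a plain tensor state dict, then reload it with weights_only=True,
# which restricts unpickling to tensors and primitive containers so a
# malicious pickle cannot execute arbitrary code on load.
state = {"w": torch.zeros(2, 2), "step": 2400}
torch.save(state, "/tmp/demo_state.pt")

loaded = torch.load("/tmp/demo_state.pt", weights_only=True)
print(loaded["step"], loaded["w"].shape)
```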
[WARNING|trainer.py:901] 2024-12-26 14:50:03,530 >> Load optimizer state from /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-2400
/scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/accelerate/utils/fsdp_utils.py:209: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details).
In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  optim_state = torch.load(input_optimizer_file)
[WARNING|trainer.py:921] 2024-12-26 14:53:56,865 >> Load scheduler state from /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-2400
[WARNING|trainer.py:942] 2024-12-26 14:53:56,868 >> Resume streaming dataset state from /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-2400: {'epoch': 0, 'sample_in_epoch': 4800, 'num_canonical_nodes': 1, 'shuffle_seed': 42, 'initial_physical_nodes': 1}
/scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/transformers/trainer.py:2833: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file.
Please open an issue on GitHub for any issues related to this experimental feature.
  checkpoint_rng_state = torch.load(rng_file)
12/26/2024 14:53:57 - WARNING - streaming.base.dataset - Because `shuffle_block_size` was not specified, it will default to max(4_000_000 // num_canonical_nodes, 1 << 18) if num_canonical_nodes is not None, otherwise 262144. Prior to Streaming v0.7.0, `shuffle_block_size` defaulted to 262144.
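The streaming warning above spells out the default it applies; for this run (`num_canonical_nodes=1` in the resume state) the formula works out to 4,000,000. A quick sketch of the stated rule (assumption: the actual streaming library code may differ in detail):

```python
def default_shuffle_block_size(num_canonical_nodes):
    """Default described by the streaming.base.dataset warning."""
    if num_canonical_nodes is not None:
        return max(4_000_000 // num_canonical_nodes, 1 << 18)
    return 262144  # fallback when canonical node count is unknown

# num_canonical_nodes=1, as in this run's resume state
print(default_shuffle_block_size(1))
# with many canonical nodes the 1 << 18 (262144) floor kicks in
print(default_shuffle_block_size(64))
```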
[INFO|trainer.py:2134] 2024-12-26 14:54:01,589 >> ***** Running training *****
[INFO|trainer.py:2135] 2024-12-26 14:54:01,589 >> Num examples = 7,374
[INFO|trainer.py:2136] 2024-12-26 14:54:01,589 >> Num Epochs = 1
[INFO|trainer.py:2137] 2024-12-26 14:54:01,589 >> Instantaneous batch size per device = 1
[INFO|trainer.py:2140] 2024-12-26 14:54:01,589 >> Total train batch size (w. parallel, distributed & accumulation) = 16
[INFO|trainer.py:2141] 2024-12-26 14:54:01,589 >> Gradient Accumulation steps = 2
[INFO|trainer.py:2142] 2024-12-26 14:54:01,589 >> Total optimization steps = 3,687
[INFO|trainer.py:2143] 2024-12-26 14:54:01,591 >> Number of trainable parameters = 1,003,782,656
[INFO|trainer.py:2165] 2024-12-26 14:54:01,597 >> Continuing training from checkpoint, will skip to saved global_step
[INFO|trainer.py:2166] 2024-12-26 14:54:01,597 >> Continuing training from epoch 0
[INFO|trainer.py:2167] 2024-12-26 14:54:01,597 >> Continuing training from global step 2400
[INFO|integration_utils.py:807] 2024-12-26 14:54:01,598 >> Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
wandb: Currently logged in as: chtmp223.
Use `wandb login --relogin` to force relogin
wandb: wandb version 0.19.1 is available! To upgrade, please run:
wandb: $ pip install wandb --upgrade
wandb: Tracking run with wandb version 0.17.3
wandb: Run data is saved locally in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/wandb/run-20241226_145403-c7ovtvny
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_
wandb: ⭐️ View project at https://wandb.ai/chtmp223/prolong
wandb: 🚀 View run at https://wandb.ai/chtmp223/prolong/runs/c7ovtvny
12/26/2024 14:54:11 - WARNING - streaming.base.dataset - The `replication` arg has been set to 8 and training is resuming from sample 4800. Make sure you are accounting for sample replication when using StreamingDataset's `state_dict` method for deterministic resumption. Otherwise, you will resume training from the wrong sample.
/scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/torch/utils/checkpoint.py:1399: FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead.
  with device_autocast_ctx, torch.cpu.amp.autocast(**cpu_autocast_kwargs), recompute_context:  # type: ignore[attr-defined]
Please use `torch.amp.autocast('cpu', args...)` instead. with device_autocast_ctx, torch.cpu.amp.autocast(**cpu_autocast_kwargs), recompute_context: # type: ignore[attr-defined] /scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/torch/utils/checkpoint.py:1399: FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead. with device_autocast_ctx, torch.cpu.amp.autocast(**cpu_autocast_kwargs), recompute_context: # type: ignore[attr-defined] /scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/torch/utils/checkpoint.py:1399: FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead. with device_autocast_ctx, torch.cpu.amp.autocast(**cpu_autocast_kwargs), recompute_context: # type: ignore[attr-defined] /scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/torch/utils/checkpoint.py:1399: FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead. 
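The autocast FutureWarning points at a one-line migration: the device-specific `torch.cpu.amp.autocast(...)` context manager becomes `torch.amp.autocast('cpu', ...)` with the device passed explicitly. A minimal sketch of the replacement spelling (the tensor and dtype here are illustrative):

```python
import torch

x = torch.randn(8, 8)

# Deprecated:  with torch.cpu.amp.autocast(dtype=torch.bfloat16): ...
# Replacement: same behavior, device type given as the first argument.
with torch.amp.autocast("cpu", dtype=torch.bfloat16):
    # Autocast-eligible ops (e.g. matmul) run in the lower-precision dtype here.
    y = x @ x

assert y.shape == (8, 8)
```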
[INFO|trainer.py:175] 2024-12-26 14:54:56,722 >> {'loss': 0.4364, 'grad_norm': 17.1519832611084, 'learning_rate': 3.6768177505930655e-07, 'epoch': 0.00027122321670735016, 'num_input_tokens_seen': 5035261952, 'completed': '65.12% (2_401 / 3_687)', 'remaining time': '16:03:40', 'throughput': '2915.21', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 14:55:28,853 >> {'loss': 0.5551, 'grad_norm': 19.901262283325195, 'learning_rate': 3.67312776568654e-07, 'epoch': 0.0005424464334147003, 'num_input_tokens_seen': 5037359104, 'completed': '65.15% (2_402 / 3_687)', 'remaining time': '13:45:31', 'throughput': '8158.65', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 14:55:57,560 >> {'loss': 0.624, 'grad_norm': 20.185373306274414, 'learning_rate': 3.669439250977272e-07, 'epoch': 0.0008136696501220504, 'num_input_tokens_seen': 5039456256, 'completed': '65.17% (2_403 / 3_687)', 'remaining time': '12:34:42', 'throughput': '9131.76', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 14:56:26,588 >> {'loss': 0.5393, 'grad_norm': 17.927940368652344, 'learning_rate': 3.665752209433637e-07, 'epoch': 0.0010848928668294006, 'num_input_tokens_seen': 5041553408, 'completed': '65.20% (2_404 / 3_687)', 'remaining time': '12:00:45', 'throughput': '9030.55', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 14:56:56,750 >> {'loss': 0.3492, 'grad_norm': 13.626057624816895, 'learning_rate': 3.6620666440228254e-07, 'epoch': 0.0013561160835367507, 'num_input_tokens_seen': 5043650560, 'completed': '65.23% (2_405 / 3_687)', 'remaining time': '11:45:03', 'throughput': '8691.14', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 14:57:26,079 >> {'loss': 0.2869, 'grad_norm': 15.946044921875, 'learning_rate': 3.6583825577108397e-07, 'epoch': 0.0016273393002441008, 'num_input_tokens_seen': 5045747712, 'completed': '65.26% (2_406 /
3_687)', 'remaining time': '11:31:27', 'throughput': '8938.09', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 14:57:55,973 >> {'loss': 0.544, 'grad_norm': 24.774717330932617, 'learning_rate': 3.654699953462494e-07, 'epoch': 0.001898562516951451, 'num_input_tokens_seen': 5047844864, 'completed': '65.28% (2_407 / 3_687)', 'remaining time': '11:23:18', 'throughput': '8769.31', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 14:58:25,739 >> {'loss': 0.34, 'grad_norm': 14.22808837890625, 'learning_rate': 3.651018834241406e-07, 'epoch': 0.0021697857336588013, 'num_input_tokens_seen': 5049942016, 'completed': '65.31% (2_408 / 3_687)', 'remaining time': '11:16:44', 'throughput': '8806.74', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 14:59:00,238 >> {'loss': 0.3746, 'grad_norm': 13.739176750183105, 'learning_rate': 3.6473392030100014e-07, 'epoch': 0.0024410089503661514, 'num_input_tokens_seen': 5052039168, 'completed': '65.34% (2_409 / 3_687)', 'remaining time': '11:22:43', 'throughput': '7598.49', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 14:59:31,662 >> {'loss': 0.4212, 'grad_norm': 16.19430160522461, 'learning_rate': 3.6436610627295074e-07, 'epoch': 0.0027122321670735015, 'num_input_tokens_seen': 5054136320, 'completed': '65.36% (2_410 / 3_687)', 'remaining time': '11:20:51', 'throughput': '8342.11', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 14:59:59,722 >> {'loss': 0.6517, 'grad_norm': 18.812156677246094, 'learning_rate': 3.639984416359949e-07, 'epoch': 0.0029834553837808516, 'num_input_tokens_seen': 5056233472, 'completed': '65.39% (2_411 / 3_687)', 'remaining time': '11:12:43', 'throughput': '9342.54', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:00:30,808 >> {'loss': 0.301, 'grad_norm': 10.88464069366455, 'learning_rate': 3.636309266860156e-07, 'epoch': 0.0032546786004882017, 'num_input_tokens_seen': 5058330624, 'completed': '65.42% (2_412 / 3_687)', 'remaining time': 
'11:11:13', 'throughput': '8432.72', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:01:00,624 >> {'loss': 0.5711, 'grad_norm': 17.772342681884766, 'learning_rate': 3.6326356171877483e-07, 'epoch': 0.003525901817195552, 'num_input_tokens_seen': 5060427776, 'completed': '65.45% (2_413 / 3_687)', 'remaining time': '11:07:48', 'throughput': '8792.11', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:01:32,736 >> {'loss': 0.3508, 'grad_norm': 18.594541549682617, 'learning_rate': 3.6289634702991343e-07, 'epoch': 0.003797125033902902, 'num_input_tokens_seen': 5062524928, 'completed': '65.47% (2_414 / 3_687)', 'remaining time': '11:08:17', 'throughput': '8163.46', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:02:04,847 >> {'loss': 0.2594, 'grad_norm': 12.95483112335205, 'learning_rate': 3.625292829149521e-07, 'epoch': 0.0040683482506102524, 'num_input_tokens_seen': 5064622080, 'completed': '65.50% (2_415 / 3_687)', 'remaining time': '11:08:37', 'throughput': '8163.72', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:02:34,568 >> {'loss': 0.7946, 'grad_norm': 23.838085174560547, 'learning_rate': 3.6216236966929015e-07, 'epoch': 0.0043395714673176026, 'num_input_tokens_seen': 5066719232, 'completed': '65.53% (2_416 / 3_687)', 'remaining time': '11:05:41', 'throughput': '8820.07', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:03:05,536 >> {'loss': 0.359, 'grad_norm': 17.340089797973633, 'learning_rate': 3.6179560758820527e-07, 'epoch': 0.004610794684024953, 'num_input_tokens_seen': 5068816384, 'completed': '65.55% (2_417 / 3_687)', 'remaining time': '11:04:36', 'throughput': '8464.94', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:03:35,512 >> {'loss': 0.5987, 'grad_norm': 30.172731399536133, 'learning_rate': 3.61428996966854e-07, 'epoch': 0.004882017900732303, 'num_input_tokens_seen': 5070913536, 'completed': '65.58% (2_418 / 3_687)', 'remaining time': '11:02:24', 'throughput': 
'8745.29', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:04:10,852 >> {'loss': 0.311, 'grad_norm': 16.280479431152344, 'learning_rate': 3.610625381002701e-07, 'epoch': 0.005153241117439653, 'num_input_tokens_seen': 5073010688, 'completed': '65.61% (2_419 / 3_687)', 'remaining time': '11:06:21', 'throughput': '7417.72', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:04:42,282 >> {'loss': 0.495, 'grad_norm': 17.15519142150879, 'learning_rate': 3.606962312833659e-07, 'epoch': 0.005424464334147003, 'num_input_tokens_seen': 5075107840, 'completed': '65.64% (2_420 / 3_687)', 'remaining time': '11:05:43', 'throughput': '8340.48', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:05:14,488 >> {'loss': 0.3901, 'grad_norm': 16.51264190673828, 'learning_rate': 3.603300768109311e-07, 'epoch': 0.005695687550854353, 'num_input_tokens_seen': 5077204992, 'completed': '65.66% (2_421 / 3_687)', 'remaining time': '11:05:53', 'throughput': '8139.58', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:05:44,685 >> {'loss': 0.6579, 'grad_norm': 22.195186614990234, 'learning_rate': 3.5996407497763305e-07, 'epoch': 0.005966910767561703, 'num_input_tokens_seen': 5079302144, 'completed': '65.69% (2_422 / 3_687)', 'remaining time': '11:04:03', 'throughput': '8681.23', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:06:15,336 >> {'loss': 0.3808, 'grad_norm': 14.603880882263184, 'learning_rate': 3.5959822607801617e-07, 'epoch': 0.006238133984269053, 'num_input_tokens_seen': 5081399296, 'completed': '65.72% (2_423 / 3_687)', 'remaining time': '11:02:45', 'throughput': '8552.52', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:06:46,312 >> {'loss': 0.5294, 'grad_norm': 19.53980255126953, 'learning_rate': 3.592325304065018e-07, 'epoch': 0.006509357200976403, 'num_input_tokens_seen': 5083496448, 'completed': '65.74% (2_424 / 3_687)', 'remaining time': '11:01:48', 'throughput': '8462.76', 'gpu_mem_free': '5581MB'} 
[INFO|trainer.py:175] 2024-12-26 15:07:15,687 >> {'loss': 0.4601, 'grad_norm': 17.005908966064453, 'learning_rate': 3.588669882573875e-07, 'epoch': 0.0067805804176837535, 'num_input_tokens_seen': 5085593600, 'completed': '65.77% (2_425 / 3_687)', 'remaining time': '10:59:32', 'throughput': '8924.22', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:07:44,379 >> {'loss': 0.9515, 'grad_norm': 26.571182250976562, 'learning_rate': 3.5850159992484787e-07, 'epoch': 0.007051803634391104, 'num_input_tokens_seen': 5087690752, 'completed': '65.80% (2_426 / 3_687)', 'remaining time': '10:56:52', 'throughput': '9136.30', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:08:15,169 >> {'loss': 0.3284, 'grad_norm': 15.586816787719727, 'learning_rate': 3.581363657029336e-07, 'epoch': 0.007323026851098454, 'num_input_tokens_seen': 5089787904, 'completed': '65.83% (2_427 / 3_687)', 'remaining time': '10:55:59', 'throughput': '8513.97', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:08:47,836 >> {'loss': 0.1959, 'grad_norm': 9.308566093444824, 'learning_rate': 3.5777128588557126e-07, 'epoch': 0.007594250067805804, 'num_input_tokens_seen': 5091885056, 'completed': '65.85% (2_428 / 3_687)', 'remaining time': '10:56:32', 'throughput': '8024.66', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:09:18,729 >> {'loss': 0.6265, 'grad_norm': 18.024173736572266, 'learning_rate': 3.574063607665633e-07, 'epoch': 0.007865473284513154, 'num_input_tokens_seen': 5093982208, 'completed': '65.88% (2_429 / 3_687)', 'remaining time': '10:55:43', 'throughput': '8485.54', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:09:48,173 >> {'loss': 0.4991, 'grad_norm': 17.78046417236328, 'learning_rate': 3.570415906395873e-07, 'epoch': 0.008136696501220505, 'num_input_tokens_seen': 5096079360, 'completed': '65.91% (2_430 / 3_687)', 'remaining time': '10:53:55', 'throughput': '8903.13', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 
15:10:18,288 >> {'loss': 0.455, 'grad_norm': 16.535097122192383, 'learning_rate': 3.5667697579819655e-07, 'epoch': 0.008407919717927854, 'num_input_tokens_seen': 5098176512, 'completed': '65.93% (2_431 / 3_687)', 'remaining time': '10:52:39', 'throughput': '8704.94', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:10:48,631 >> {'loss': 0.5008, 'grad_norm': 18.70627784729004, 'learning_rate': 3.563125165358193e-07, 'epoch': 0.008679142934635205, 'num_input_tokens_seen': 5100273664, 'completed': '65.96% (2_432 / 3_687)', 'remaining time': '10:51:36', 'throughput': '8639.31', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:11:17,922 >> {'loss': 0.4694, 'grad_norm': 17.156047821044922, 'learning_rate': 3.559482131457583e-07, 'epoch': 0.008950366151342554, 'num_input_tokens_seen': 5102370816, 'completed': '65.99% (2_433 / 3_687)', 'remaining time': '10:49:54', 'throughput': '8949.70', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:11:49,615 >> {'loss': 0.4304, 'grad_norm': 15.62370777130127, 'learning_rate': 3.5558406592119115e-07, 'epoch': 0.009221589368049905, 'num_input_tokens_seen': 5104467968, 'completed': '66.02% (2_434 / 3_687)', 'remaining time': '10:49:45', 'throughput': '8271.34', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:12:18,801 >> {'loss': 0.4884, 'grad_norm': 14.981012344360352, 'learning_rate': 3.552200751551697e-07, 'epoch': 0.009492812584757255, 'num_input_tokens_seen': 5106565120, 'completed': '66.04% (2_435 / 3_687)', 'remaining time': '10:48:04', 'throughput': '8981.90', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:12:48,695 >> {'loss': 0.4192, 'grad_norm': 16.954042434692383, 'learning_rate': 3.548562411406201e-07, 'epoch': 0.009764035801464606, 'num_input_tokens_seen': 5108662272, 'completed': '66.07% (2_436 / 3_687)', 'remaining time': '10:46:53', 'throughput': '8769.02', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:13:19,340 >> {'loss': 0.4403, 
'grad_norm': 20.573305130004883, 'learning_rate': 3.544925641703413e-07, 'epoch': 0.010035259018171955, 'num_input_tokens_seen': 5110759424, 'completed': '66.10% (2_437 / 3_687)', 'remaining time': '10:46:09', 'throughput': '8554.38', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:13:47,923 >> {'loss': 0.6544, 'grad_norm': 22.3472957611084, 'learning_rate': 3.5412904453700754e-07, 'epoch': 0.010306482234879306, 'num_input_tokens_seen': 5112856576, 'completed': '66.12% (2_438 / 3_687)', 'remaining time': '10:44:18', 'throughput': '9171.13', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:14:15,963 >> {'loss': 0.6427, 'grad_norm': 33.12949752807617, 'learning_rate': 3.537656825331653e-07, 'epoch': 0.010577705451586655, 'num_input_tokens_seen': 5114953728, 'completed': '66.15% (2_439 / 3_687)', 'remaining time': '10:42:14', 'throughput': '9348.85', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:14:49,229 >> {'loss': 0.501, 'grad_norm': 18.552413940429688, 'learning_rate': 3.534024784512345e-07, 'epoch': 0.010848928668294006, 'num_input_tokens_seen': 5117050880, 'completed': '66.18% (2_440 / 3_687)', 'remaining time': '10:42:58', 'throughput': '7880.35', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:15:20,514 >> {'loss': 0.3836, 'grad_norm': 15.270557403564453, 'learning_rate': 3.5303943258350813e-07, 'epoch': 0.011120151885001357, 'num_input_tokens_seen': 5119148032, 'completed': '66.21% (2_441 / 3_687)', 'remaining time': '10:42:37', 'throughput': '8379.17', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:15:51,724 >> {'loss': 0.3691, 'grad_norm': 32.084510803222656, 'learning_rate': 3.526765452221512e-07, 'epoch': 0.011391375101708706, 'num_input_tokens_seen': 5121245184, 'completed': '66.23% (2_442 / 3_687)', 'remaining time': '10:42:14', 'throughput': '8399.50', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:16:24,072 >> {'loss': 0.4435, 'grad_norm': 15.427001953125, 
'learning_rate': 3.523138166592021e-07, 'epoch': 0.011662598318416057, 'num_input_tokens_seen': 5123342336, 'completed': '66.26% (2_443 / 3_687)', 'remaining time': '10:42:24', 'throughput': '8103.71', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:16:54,026 >> {'loss': 0.725, 'grad_norm': 21.460124969482422, 'learning_rate': 3.5195124718657075e-07, 'epoch': 0.011933821535123406, 'num_input_tokens_seen': 5125439488, 'completed': '66.29% (2_444 / 3_687)', 'remaining time': '10:41:24', 'throughput': '8751.49', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:17:23,822 >> {'loss': 0.2977, 'grad_norm': 15.456875801086426, 'learning_rate': 3.5158883709603946e-07, 'epoch': 0.012205044751830757, 'num_input_tokens_seen': 5127536640, 'completed': '66.31% (2_445 / 3_687)', 'remaining time': '10:40:20', 'throughput': '8798.15', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:17:58,011 >> {'loss': 0.5319, 'grad_norm': 16.195775985717773, 'learning_rate': 3.5122658667926177e-07, 'epoch': 0.012476267968538107, 'num_input_tokens_seen': 5129633792, 'completed': '66.34% (2_446 / 3_687)', 'remaining time': '10:41:17', 'throughput': '7667.44', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:18:27,105 >> {'loss': 0.5982, 'grad_norm': 16.876602172851562, 'learning_rate': 3.5086449622776346e-07, 'epoch': 0.012747491185245458, 'num_input_tokens_seen': 5131730944, 'completed': '66.37% (2_447 / 3_687)', 'remaining time': '10:39:56', 'throughput': '9010.38', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:18:57,681 >> {'loss': 0.4243, 'grad_norm': 16.981779098510742, 'learning_rate': 3.505025660329408e-07, 'epoch': 0.013018714401952807, 'num_input_tokens_seen': 5133828096, 'completed': '66.40% (2_448 / 3_687)', 'remaining time': '10:39:15', 'throughput': '8573.54', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:19:28,568 >> {'loss': 0.3373, 'grad_norm': 13.226367950439453, 'learning_rate': 
3.5014079638606164e-07, 'epoch': 0.013289937618660158, 'num_input_tokens_seen': 5135925248, 'completed': '66.42% (2_449 / 3_687)', 'remaining time': '10:38:42', 'throughput': '8487.20', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:19:57,176 >> {'loss': 0.3153, 'grad_norm': 14.811175346374512, 'learning_rate': 3.497791875782643e-07, 'epoch': 0.013561160835367507, 'num_input_tokens_seen': 5138022400, 'completed': '66.45% (2_450 / 3_687)', 'remaining time': '10:37:13', 'throughput': '9163.01', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:20:29,107 >> {'loss': 0.4019, 'grad_norm': 17.1142578125, 'learning_rate': 3.4941773990055777e-07, 'epoch': 0.013832384052074858, 'num_input_tokens_seen': 5140119552, 'completed': '66.48% (2_451 / 3_687)', 'remaining time': '10:37:07', 'throughput': '8209.89', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:20:58,933 >> {'loss': 0.7408, 'grad_norm': 24.696069717407227, 'learning_rate': 3.490564536438215e-07, 'epoch': 0.014103607268782207, 'num_input_tokens_seen': 5142216704, 'completed': '66.50% (2_452 / 3_687)', 'remaining time': '10:36:10', 'throughput': '8788.97', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:21:29,219 >> {'loss': 0.4626, 'grad_norm': 15.740670204162598, 'learning_rate': 3.4869532909880485e-07, 'epoch': 0.014374830485489558, 'num_input_tokens_seen': 5144313856, 'completed': '66.53% (2_453 / 3_687)', 'remaining time': '10:35:24', 'throughput': '8655.80', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:21:59,181 >> {'loss': 0.3852, 'grad_norm': 17.356019973754883, 'learning_rate': 3.483343665561271e-07, 'epoch': 0.014646053702196907, 'num_input_tokens_seen': 5146411008, 'completed': '66.56% (2_454 / 3_687)', 'remaining time': '10:34:32', 'throughput': '8749.03', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:22:31,655 >> {'loss': 0.5229, 'grad_norm': 19.49802589416504, 'learning_rate': 3.479735663062773e-07, 'epoch': 
0.014917276918904258, 'num_input_tokens_seen': 5148508160, 'completed': '66.59% (2_455 / 3_687)', 'remaining time': '10:34:37', 'throughput': '8072.48', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:23:01,569 >> {'loss': 0.3333, 'grad_norm': 11.227045059204102, 'learning_rate': 3.4761292863961354e-07, 'epoch': 0.015188500135611608, 'num_input_tokens_seen': 5150605312, 'completed': '66.61% (2_456 / 3_687)', 'remaining time': '10:33:44', 'throughput': '8763.10', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:23:31,993 >> {'loss': 0.4247, 'grad_norm': 17.022945404052734, 'learning_rate': 3.4725245384636347e-07, 'epoch': 0.015459723352318959, 'num_input_tokens_seen': 5152702464, 'completed': '66.64% (2_457 / 3_687)', 'remaining time': '10:33:03', 'throughput': '8616.41', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:24:01,898 >> {'loss': 0.2605, 'grad_norm': 12.745018005371094, 'learning_rate': 3.4689214221662364e-07, 'epoch': 0.015730946569026308, 'num_input_tokens_seen': 5154799616, 'completed': '66.67% (2_458 / 3_687)', 'remaining time': '10:32:12', 'throughput': '8765.92', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:24:33,027 >> {'loss': 0.5055, 'grad_norm': 16.132686614990234, 'learning_rate': 3.465319940403587e-07, 'epoch': 0.01600216978573366, 'num_input_tokens_seen': 5156896768, 'completed': '66.69% (2_459 / 3_687)', 'remaining time': '10:31:47', 'throughput': '8421.26', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:25:02,591 >> {'loss': 0.3678, 'grad_norm': 15.20864486694336, 'learning_rate': 3.4617200960740247e-07, 'epoch': 0.01627339300244101, 'num_input_tokens_seen': 5158993920, 'completed': '66.72% (2_460 / 3_687)', 'remaining time': '10:30:49', 'throughput': '8867.05', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:25:34,521 >> {'loss': 0.3402, 'grad_norm': 12.260754585266113, 'learning_rate': 3.4581218920745663e-07, 'epoch': 0.01654461621914836, 
'num_input_tokens_seen': 5161091072, 'completed': '66.75% (2_461 / 3_687)', 'remaining time': '10:30:40', 'throughput': '8209.91', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:26:04,071 >> {'loss': 0.3149, 'grad_norm': 13.886555671691895, 'learning_rate': 3.454525331300908e-07, 'epoch': 0.016815839435855708, 'num_input_tokens_seen': 5163188224, 'completed': '66.78% (2_462 / 3_687)', 'remaining time': '10:29:43', 'throughput': '8871.26', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:26:35,288 >> {'loss': 0.4094, 'grad_norm': 22.19728660583496, 'learning_rate': 3.450930416647429e-07, 'epoch': 0.01708706265256306, 'num_input_tokens_seen': 5165285376, 'completed': '66.80% (2_463 / 3_687)', 'remaining time': '10:29:19', 'throughput': '8397.31', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:27:05,006 >> {'loss': 0.4385, 'grad_norm': 14.57243824005127, 'learning_rate': 3.4473371510071795e-07, 'epoch': 0.01735828586927041, 'num_input_tokens_seen': 5167382528, 'completed': '66.83% (2_464 / 3_687)', 'remaining time': '10:28:27', 'throughput': '8821.10', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:27:36,034 >> {'loss': 0.5753, 'grad_norm': 20.04316520690918, 'learning_rate': 3.4437455372718795e-07, 'epoch': 0.01762950908597776, 'num_input_tokens_seen': 5169479680, 'completed': '66.86% (2_465 / 3_687)', 'remaining time': '10:28:00', 'throughput': '8448.62', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:28:08,889 >> {'loss': 0.3834, 'grad_norm': 18.932300567626953, 'learning_rate': 3.440155578331925e-07, 'epoch': 0.01790073230268511, 'num_input_tokens_seen': 5171576832, 'completed': '66.88% (2_466 / 3_687)', 'remaining time': '10:28:06', 'throughput': '7978.91', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:28:36,506 >> {'loss': 0.3425, 'grad_norm': 15.249947547912598, 'learning_rate': 3.4365672770763783e-07, 'epoch': 0.01817195551939246, 'num_input_tokens_seen': 5173673984, 
'completed': '66.91% (2_467 / 3_687)', 'remaining time': '10:26:36', 'throughput': '9492.15', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:29:08,955 >> {'loss': 0.3937, 'grad_norm': 16.69132423400879, 'learning_rate': 3.432980636392967e-07, 'epoch': 0.01844317873609981, 'num_input_tokens_seen': 5175771136, 'completed': '66.94% (2_468 / 3_687)', 'remaining time': '10:26:35', 'throughput': '8078.55', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:29:38,880 >> {'loss': 0.3516, 'grad_norm': 14.121512413024902, 'learning_rate': 3.429395659168084e-07, 'epoch': 0.01871440195280716, 'num_input_tokens_seen': 5177868288, 'completed': '66.97% (2_469 / 3_687)', 'remaining time': '10:25:48', 'throughput': '8760.24', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:30:06,994 >> {'loss': 0.5579, 'grad_norm': 19.293235778808594, 'learning_rate': 3.425812348286782e-07, 'epoch': 0.01898562516951451, 'num_input_tokens_seen': 5179965440, 'completed': '66.99% (2_470 / 3_687)', 'remaining time': '10:24:30', 'throughput': '9324.02', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:30:34,632 >> {'loss': 0.8517, 'grad_norm': 26.16200065612793, 'learning_rate': 3.422230706632774e-07, 'epoch': 0.01925684838622186, 'num_input_tokens_seen': 5182062592, 'completed': '67.02% (2_471 / 3_687)', 'remaining time': '10:23:05', 'throughput': '9485.00', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:31:04,208 >> {'loss': 0.4796, 'grad_norm': 16.774105072021484, 'learning_rate': 3.418650737088427e-07, 'epoch': 0.01952807160292921, 'num_input_tokens_seen': 5184159744, 'completed': '67.05% (2_472 / 3_687)', 'remaining time': '10:22:15', 'throughput': '8863.54', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:31:35,457 >> {'loss': 0.3243, 'grad_norm': 16.8173885345459, 'learning_rate': 3.415072442534767e-07, 'epoch': 0.019799294819636562, 'num_input_tokens_seen': 5186256896, 'completed': '67.07% (2_473 / 3_687)', 
'remaining time': '10:21:52', 'throughput': '8388.73', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:32:06,739 >> {'loss': 0.3466, 'grad_norm': 14.54384994506836, 'learning_rate': 3.411495825851467e-07, 'epoch': 0.02007051803634391, 'num_input_tokens_seen': 5188354048, 'completed': '67.10% (2_474 / 3_687)', 'remaining time': '10:21:31', 'throughput': '8380.18', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:32:36,423 >> {'loss': 0.3709, 'grad_norm': 16.708059310913086, 'learning_rate': 3.4079208899168545e-07, 'epoch': 0.02034174125305126, 'num_input_tokens_seen': 5190451200, 'completed': '67.13% (2_475 / 3_687)', 'remaining time': '10:20:43', 'throughput': '8830.95', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:33:07,157 >> {'loss': 0.4063, 'grad_norm': 18.377099990844727, 'learning_rate': 3.404347637607899e-07, 'epoch': 0.02061296446975861, 'num_input_tokens_seen': 5192548352, 'completed': '67.15% (2_476 / 3_687)', 'remaining time': '10:20:12', 'throughput': '8529.54', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:33:36,670 >> {'loss': 0.4975, 'grad_norm': 21.520936965942383, 'learning_rate': 3.400776071800219e-07, 'epoch': 0.020884187686465962, 'num_input_tokens_seen': 5194645504, 'completed': '67.18% (2_477 / 3_687)', 'remaining time': '10:19:22', 'throughput': '8882.37', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:34:06,501 >> {'loss': 0.2697, 'grad_norm': 12.55679702758789, 'learning_rate': 3.3972061953680734e-07, 'epoch': 0.02115541090317331, 'num_input_tokens_seen': 5196742656, 'completed': '67.21% (2_478 / 3_687)', 'remaining time': '10:18:38', 'throughput': '8787.74', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:34:39,660 >> {'loss': 0.4419, 'grad_norm': 16.432504653930664, 'learning_rate': 3.3936380111843666e-07, 'epoch': 0.02142663411988066, 'num_input_tokens_seen': 5198839808, 'completed': '67.24% (2_479 / 3_687)', 'remaining time': '10:18:45', 
'throughput': '7905.55', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:35:11,771 >> {'loss': 0.3704, 'grad_norm': 12.978721618652344, 'learning_rate': 3.390071522120635e-07, 'epoch': 0.021697857336588012, 'num_input_tokens_seen': 5200936960, 'completed': '67.26% (2_480 / 3_687)', 'remaining time': '10:18:35', 'throughput': '8163.77', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:35:43,392 >> {'loss': 0.3597, 'grad_norm': 13.821551322937012, 'learning_rate': 3.3865067310470554e-07, 'epoch': 0.021969080553295363, 'num_input_tokens_seen': 5203034112, 'completed': '67.29% (2_481 / 3_687)', 'remaining time': '10:18:17', 'throughput': '8290.04', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:36:12,516 >> {'loss': 0.3515, 'grad_norm': 13.253721237182617, 'learning_rate': 3.3829436408324316e-07, 'epoch': 0.022240303770002714, 'num_input_tokens_seen': 5205131264, 'completed': '67.32% (2_482 / 3_687)', 'remaining time': '10:17:22', 'throughput': '9001.01', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:36:44,328 >> {'loss': 0.4505, 'grad_norm': 15.398584365844727, 'learning_rate': 3.3793822543442074e-07, 'epoch': 0.02251152698671006, 'num_input_tokens_seen': 5207228416, 'completed': '67.34% (2_483 / 3_687)', 'remaining time': '10:17:07', 'throughput': '8240.31', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:37:16,285 >> {'loss': 0.4072, 'grad_norm': 17.346776962280273, 'learning_rate': 3.3758225744484483e-07, 'epoch': 0.022782750203417412, 'num_input_tokens_seen': 5209325568, 'completed': '67.37% (2_484 / 3_687)', 'remaining time': '10:16:54', 'throughput': '8203.24', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:37:48,188 >> {'loss': 0.4183, 'grad_norm': 14.590849876403809, 'learning_rate': 3.372264604009851e-07, 'epoch': 0.023053973420124763, 'num_input_tokens_seen': 5211422720, 'completed': '67.40% (2_485 / 3_687)', 'remaining time': '10:16:39', 'throughput': '8216.69', 
'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:38:17,177 >> {'loss': 0.4176, 'grad_norm': 16.34534454345703, 'learning_rate': 3.3687083458917344e-07, 'epoch': 0.023325196636832114, 'num_input_tokens_seen': 5213519872, 'completed': '67.43% (2_486 / 3_687)', 'remaining time': '10:15:43', 'throughput': '9043.05', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:38:48,130 >> {'loss': 0.3908, 'grad_norm': 15.394360542297363, 'learning_rate': 3.3651538029560377e-07, 'epoch': 0.023596419853539462, 'num_input_tokens_seen': 5215617024, 'completed': '67.45% (2_487 / 3_687)', 'remaining time': '10:15:15', 'throughput': '8469.07', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:39:16,722 >> {'loss': 0.4385, 'grad_norm': 15.60722541809082, 'learning_rate': 3.361600978063325e-07, 'epoch': 0.023867643070246813, 'num_input_tokens_seen': 5217714176, 'completed': '67.48% (2_488 / 3_687)', 'remaining time': '10:14:15', 'throughput': '9168.53', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:39:46,385 >> {'loss': 0.4593, 'grad_norm': 18.028072357177734, 'learning_rate': 3.358049874072771e-07, 'epoch': 0.024138866286954164, 'num_input_tokens_seen': 5219811328, 'completed': '67.51% (2_489 / 3_687)', 'remaining time': '10:13:29', 'throughput': '8837.25', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:40:18,046 >> {'loss': 0.4059, 'grad_norm': 18.258270263671875, 'learning_rate': 3.3545004938421734e-07, 'epoch': 0.024410089503661515, 'num_input_tokens_seen': 5221908480, 'completed': '67.53% (2_490 / 3_687)', 'remaining time': '10:13:11', 'throughput': '8279.81', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:40:45,496 >> {'loss': 0.7823, 'grad_norm': 21.254680633544922, 'learning_rate': 3.3509528402279357e-07, 'epoch': 0.024681312720368862, 'num_input_tokens_seen': 5224005632, 'completed': '67.56% (2_491 / 3_687)', 'remaining time': '10:11:57', 'throughput': '9549.78', 'gpu_mem_free': '5581MB'} 
[INFO|trainer.py:175] 2024-12-26 15:41:17,066 >> {'loss': 0.9546, 'grad_norm': 37.15273666381836, 'learning_rate': 3.347406916085074e-07, 'epoch': 0.024952535937076213, 'num_input_tokens_seen': 5226102784, 'completed': '67.59% (2_492 / 3_687)', 'remaining time': '10:11:38', 'throughput': '8303.73', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:41:48,190 >> {'loss': 0.7096, 'grad_norm': 19.634872436523438, 'learning_rate': 3.3438627242672164e-07, 'epoch': 0.025223759153783564, 'num_input_tokens_seen': 5228199936, 'completed': '67.62% (2_493 / 3_687)', 'remaining time': '10:11:12', 'throughput': '8422.34', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:42:19,435 >> {'loss': 0.4221, 'grad_norm': 18.12146759033203, 'learning_rate': 3.3403202676265875e-07, 'epoch': 0.025494982370490915, 'num_input_tokens_seen': 5230297088, 'completed': '67.64% (2_494 / 3_687)', 'remaining time': '10:10:48', 'throughput': '8390.26', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:42:52,937 >> {'loss': 0.2631, 'grad_norm': 11.500085830688477, 'learning_rate': 3.336779549014026e-07, 'epoch': 0.025766205587198263, 'num_input_tokens_seen': 5232394240, 'completed': '67.67% (2_495 / 3_687)', 'remaining time': '10:10:53', 'throughput': '7824.50', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:43:22,285 >> {'loss': 0.3061, 'grad_norm': 13.041985511779785, 'learning_rate': 3.333240571278968e-07, 'epoch': 0.026037428803905614, 'num_input_tokens_seen': 5234491392, 'completed': '67.70% (2_496 / 3_687)', 'remaining time': '10:10:04', 'throughput': '8932.22', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:43:53,848 >> {'loss': 0.2412, 'grad_norm': 9.483246803283691, 'learning_rate': 3.3297033372694473e-07, 'epoch': 0.026308652020612965, 'num_input_tokens_seen': 5236588544, 'completed': '67.72% (2_497 / 3_687)', 'remaining time': '10:09:44', 'throughput': '8305.46', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 
15:44:20,471 >> {'loss': 0.8186, 'grad_norm': 20.214548110961914, 'learning_rate': 3.3261678498320954e-07, 'epoch': 0.026579875237320316, 'num_input_tokens_seen': 5238685696, 'completed': '67.75% (2_498 / 3_687)', 'remaining time': '10:08:23', 'throughput': '9846.55', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:44:47,452 >> {'loss': 0.5537, 'grad_norm': 16.735645294189453, 'learning_rate': 3.3226341118121367e-07, 'epoch': 0.026851098454027666, 'num_input_tokens_seen': 5240782848, 'completed': '67.78% (2_499 / 3_687)', 'remaining time': '10:07:08', 'throughput': '9716.01', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:45:18,670 >> {'loss': 0.3094, 'grad_norm': 11.246064186096191, 'learning_rate': 3.319102126053389e-07, 'epoch': 0.027122321670735014, 'num_input_tokens_seen': 5242880000, 'completed': '67.81% (2_500 / 3_687)', 'remaining time': '10:06:44', 'throughput': '8397.08', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:45:51,530 >> {'loss': 0.2749, 'grad_norm': 16.4909610748291, 'learning_rate': 3.315571895398261e-07, 'epoch': 0.027393544887442365, 'num_input_tokens_seen': 5244977152, 'completed': '67.83% (2_501 / 3_687)', 'remaining time': '10:06:39', 'throughput': '7977.61', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:46:22,796 >> {'loss': 0.3926, 'grad_norm': 14.262845993041992, 'learning_rate': 3.312043422687749e-07, 'epoch': 0.027664768104149716, 'num_input_tokens_seen': 5247074304, 'completed': '67.86% (2_502 / 3_687)', 'remaining time': '10:06:15', 'throughput': '8384.47', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:46:55,572 >> {'loss': 0.2012, 'grad_norm': 11.372760772705078, 'learning_rate': 3.3085167107614297e-07, 'epoch': 0.027935991320857067, 'num_input_tokens_seen': 5249171456, 'completed': '67.89% (2_503 / 3_687)', 'remaining time': '10:06:08', 'throughput': '7997.86', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:47:24,427 >> {'loss': 0.7148, 
'grad_norm': 21.803176879882812, 'learning_rate': 3.3049917624574737e-07, 'epoch': 0.028207214537564414, 'num_input_tokens_seen': 5251268608, 'completed': '67.91% (2_504 / 3_687)', 'remaining time': '10:05:16', 'throughput': '9085.02', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:47:56,443 >> {'loss': 0.3626, 'grad_norm': 14.963140487670898, 'learning_rate': 3.301468580612619e-07, 'epoch': 0.028478437754271765, 'num_input_tokens_seen': 5253365760, 'completed': '67.94% (2_505 / 3_687)', 'remaining time': '10:05:00', 'throughput': '8187.77', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:48:27,253 >> {'loss': 0.325, 'grad_norm': 16.734357833862305, 'learning_rate': 3.2979471680621903e-07, 'epoch': 0.028749660970979116, 'num_input_tokens_seen': 5255462912, 'completed': '67.97% (2_506 / 3_687)', 'remaining time': '10:04:31', 'throughput': '8508.45', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:48:55,714 >> {'loss': 0.5537, 'grad_norm': 18.527103424072266, 'learning_rate': 3.2944275276400857e-07, 'epoch': 0.029020884187686467, 'num_input_tokens_seen': 5257560064, 'completed': '68.00% (2_507 / 3_687)', 'remaining time': '10:03:35', 'throughput': '9210.87', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:49:26,266 >> {'loss': 0.4739, 'grad_norm': 17.63797950744629, 'learning_rate': 3.290909662178779e-07, 'epoch': 0.029292107404393815, 'num_input_tokens_seen': 5259657216, 'completed': '68.02% (2_508 / 3_687)', 'remaining time': '10:03:03', 'throughput': '8580.18', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:49:56,659 >> {'loss': 0.1794, 'grad_norm': 11.94525146484375, 'learning_rate': 3.2873935745093145e-07, 'epoch': 0.029563330621101166, 'num_input_tokens_seen': 5261754368, 'completed': '68.05% (2_509 / 3_687)', 'remaining time': '10:02:29', 'throughput': '8625.19', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:50:28,804 >> {'loss': 0.3807, 'grad_norm': 15.735551834106445, 
'learning_rate': 3.283879267461305e-07, 'epoch': 0.029834553837808517, 'num_input_tokens_seen': 5263851520, 'completed': '68.08% (2_510 / 3_687)', 'remaining time': '10:02:14', 'throughput': '8155.10', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:50:58,987 >> {'loss': 0.2682, 'grad_norm': 14.347485542297363, 'learning_rate': 3.280366743862931e-07, 'epoch': 0.030105777054515868, 'num_input_tokens_seen': 5265948672, 'completed': '68.10% (2_511 / 3_687)', 'remaining time': '10:01:38', 'throughput': '8685.13', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:51:33,129 >> {'loss': 0.4171, 'grad_norm': 16.711332321166992, 'learning_rate': 3.276856006540939e-07, 'epoch': 0.030377000271223215, 'num_input_tokens_seen': 5268045824, 'completed': '68.13% (2_512 / 3_687)', 'remaining time': '10:01:43', 'throughput': '7677.88', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:52:04,025 >> {'loss': 0.7826, 'grad_norm': 22.094377517700195, 'learning_rate': 3.2733470583206357e-07, 'epoch': 0.030648223487930566, 'num_input_tokens_seen': 5270142976, 'completed': '68.16% (2_513 / 3_687)', 'remaining time': '10:01:14', 'throughput': '8484.85', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:52:34,386 >> {'loss': 0.6522, 'grad_norm': 24.63713264465332, 'learning_rate': 3.2698399020258895e-07, 'epoch': 0.030919446704637917, 'num_input_tokens_seen': 5272240128, 'completed': '68.19% (2_514 / 3_687)', 'remaining time': '10:00:40', 'throughput': '8634.10', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:53:05,024 >> {'loss': 0.3224, 'grad_norm': 14.236292839050293, 'learning_rate': 3.266334540479128e-07, 'epoch': 0.031190669921345268, 'num_input_tokens_seen': 5274337280, 'completed': '68.21% (2_515 / 3_687)', 'remaining time': '10:00:08', 'throughput': '8556.35', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:53:35,004 >> {'loss': 0.4438, 'grad_norm': 17.167448043823242, 'learning_rate': 
3.262830976501329e-07, 'epoch': 0.031461893138052616, 'num_input_tokens_seen': 5276434432, 'completed': '68.24% (2_516 / 3_687)', 'remaining time': '9:59:30', 'throughput': '8743.83', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:54:07,321 >> {'loss': 0.3641, 'grad_norm': 14.156777381896973, 'learning_rate': 3.2593292129120295e-07, 'epoch': 0.03173311635475997, 'num_input_tokens_seen': 5278531584, 'completed': '68.27% (2_517 / 3_687)', 'remaining time': '9:59:15', 'throughput': '8111.71', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:54:36,172 >> {'loss': 0.4017, 'grad_norm': 16.016916275024414, 'learning_rate': 3.2558292525293156e-07, 'epoch': 0.03200433957146732, 'num_input_tokens_seen': 5280628736, 'completed': '68.29% (2_518 / 3_687)', 'remaining time': '9:58:26', 'throughput': '9086.20', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:55:07,397 >> {'loss': 0.4632, 'grad_norm': 17.823566436767578, 'learning_rate': 3.2523310981698213e-07, 'epoch': 0.032275562788174665, 'num_input_tokens_seen': 5282725888, 'completed': '68.32% (2_519 / 3_687)', 'remaining time': '9:58:00', 'throughput': '8395.19', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:55:36,886 >> {'loss': 0.6673, 'grad_norm': 21.00741958618164, 'learning_rate': 3.24883475264873e-07, 'epoch': 0.03254678600488202, 'num_input_tokens_seen': 5284823040, 'completed': '68.35% (2_520 / 3_687)', 'remaining time': '9:57:17', 'throughput': '8889.63', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:56:07,967 >> {'loss': 0.4991, 'grad_norm': 14.62010669708252, 'learning_rate': 3.2453402187797684e-07, 'epoch': 0.03281800922158937, 'num_input_tokens_seen': 5286920192, 'completed': '68.38% (2_521 / 3_687)', 'remaining time': '9:56:50', 'throughput': '8434.15', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:56:38,269 >> {'loss': 0.4899, 'grad_norm': 17.261287689208984, 'learning_rate': 3.241847499375201e-07, 'epoch': 
0.03308923243829672, 'num_input_tokens_seen': 5289017344, 'completed': '68.40% (2_522 / 3_687)', 'remaining time': '9:56:16', 'throughput': '8650.99', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:57:07,140 >> {'loss': 0.2637, 'grad_norm': 14.625133514404297, 'learning_rate': 3.238356597245837e-07, 'epoch': 0.03336045565500407, 'num_input_tokens_seen': 5291114496, 'completed': '68.43% (2_523 / 3_687)', 'remaining time': '9:55:27', 'throughput': '9080.09', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:57:36,373 >> {'loss': 0.4681, 'grad_norm': 17.17628288269043, 'learning_rate': 3.2348675152010217e-07, 'epoch': 0.033631678871711417, 'num_input_tokens_seen': 5293211648, 'completed': '68.46% (2_524 / 3_687)', 'remaining time': '9:54:43', 'throughput': '8967.27', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:58:07,870 >> {'loss': 0.5437, 'grad_norm': 21.153200149536133, 'learning_rate': 3.2313802560486353e-07, 'epoch': 0.03390290208841877, 'num_input_tokens_seen': 5295308800, 'completed': '68.48% (2_525 / 3_687)', 'remaining time': '9:54:20', 'throughput': '8322.85', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:58:38,746 >> {'loss': 0.607, 'grad_norm': 21.99086570739746, 'learning_rate': 3.2278948225950916e-07, 'epoch': 0.03417412530512612, 'num_input_tokens_seen': 5297405952, 'completed': '68.51% (2_526 / 3_687)', 'remaining time': '9:53:51', 'throughput': '8490.18', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:59:08,957 >> {'loss': 0.4772, 'grad_norm': 16.050336837768555, 'learning_rate': 3.2244112176453343e-07, 'epoch': 0.034445348521833466, 'num_input_tokens_seen': 5299503104, 'completed': '68.54% (2_527 / 3_687)', 'remaining time': '9:53:16', 'throughput': '8677.02', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 15:59:39,223 >> {'loss': 0.6767, 'grad_norm': 19.21502685546875, 'learning_rate': 3.2209294440028366e-07, 'epoch': 0.03471657173854082, 'num_input_tokens_seen': 
5301600256, 'completed': '68.57% (2_528 / 3_687)', 'remaining time': '9:52:41', 'throughput': '8661.51', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:00:12,247 >> {'loss': 0.3405, 'grad_norm': 14.736310958862305, 'learning_rate': 3.2174495044695973e-07, 'epoch': 0.03498779495524817, 'num_input_tokens_seen': 5303697408, 'completed': '68.59% (2_529 / 3_687)', 'remaining time': '9:52:32', 'throughput': '7937.81', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:00:44,055 >> {'loss': 0.4872, 'grad_norm': 17.85927391052246, 'learning_rate': 3.2139714018461396e-07, 'epoch': 0.03525901817195552, 'num_input_tokens_seen': 5305794560, 'completed': '68.62% (2_530 / 3_687)', 'remaining time': '9:52:11', 'throughput': '8241.67', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:01:14,830 >> {'loss': 0.5331, 'grad_norm': 16.15865707397461, 'learning_rate': 3.2104951389315073e-07, 'epoch': 0.03553024138866287, 'num_input_tokens_seen': 5307891712, 'completed': '68.65% (2_531 / 3_687)', 'remaining time': '9:51:41', 'throughput': '8517.84', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:01:43,240 >> {'loss': 0.6059, 'grad_norm': 21.816003799438477, 'learning_rate': 3.207020718523266e-07, 'epoch': 0.03580146460537022, 'num_input_tokens_seen': 5309988864, 'completed': '68.67% (2_532 / 3_687)', 'remaining time': '9:50:50', 'throughput': '9227.23', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:02:14,090 >> {'loss': 0.3919, 'grad_norm': 15.507525444030762, 'learning_rate': 3.2035481434174966e-07, 'epoch': 0.03607268782207757, 'num_input_tokens_seen': 5312086016, 'completed': '68.70% (2_533 / 3_687)', 'remaining time': '9:50:21', 'throughput': '8497.39', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:02:43,737 >> {'loss': 0.4488, 'grad_norm': 15.8746919631958, 'learning_rate': 3.200077416408794e-07, 'epoch': 0.03634391103878492, 'num_input_tokens_seen': 5314183168, 'completed': '68.73% (2_534 / 
3_687)', 'remaining time': '9:49:41', 'throughput': '8842.23', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:03:13,383 >> {'loss': 0.627, 'grad_norm': 20.78430938720703, 'learning_rate': 3.196608540290266e-07, 'epoch': 0.03661513425549227, 'num_input_tokens_seen': 5316280320, 'completed': '68.76% (2_535 / 3_687)', 'remaining time': '9:49:01', 'throughput': '8842.53', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:03:45,329 >> {'loss': 0.5527, 'grad_norm': 21.117902755737305, 'learning_rate': 3.193141517853536e-07, 'epoch': 0.03688635747219962, 'num_input_tokens_seen': 5318377472, 'completed': '68.78% (2_536 / 3_687)', 'remaining time': '9:48:41', 'throughput': '8205.90', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:04:16,028 >> {'loss': 0.4613, 'grad_norm': 14.952088356018066, 'learning_rate': 3.1896763518887305e-07, 'epoch': 0.03715758068890697, 'num_input_tokens_seen': 5320474624, 'completed': '68.81% (2_537 / 3_687)', 'remaining time': '9:48:11', 'throughput': '8538.98', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:04:43,100 >> {'loss': 0.8986, 'grad_norm': 25.334875106811523, 'learning_rate': 3.186213045184484e-07, 'epoch': 0.03742880390561432, 'num_input_tokens_seen': 5322571776, 'completed': '68.84% (2_538 / 3_687)', 'remaining time': '9:47:10', 'throughput': '9683.30', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:05:16,282 >> {'loss': 0.3267, 'grad_norm': 12.866583824157715, 'learning_rate': 3.1827516005279306e-07, 'epoch': 0.03770002712232167, 'num_input_tokens_seen': 5324668928, 'completed': '68.86% (2_539 / 3_687)', 'remaining time': '9:47:00', 'throughput': '7900.27', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:05:46,997 >> {'loss': 0.3742, 'grad_norm': 14.773965835571289, 'learning_rate': 3.1792920207047114e-07, 'epoch': 0.03797125033902902, 'num_input_tokens_seen': 5326766080, 'completed': '68.89% (2_540 / 3_687)', 'remaining time': '9:46:30', 
'throughput': '8534.74', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:06:19,421 >> {'loss': 0.4086, 'grad_norm': 17.13579559326172, 'learning_rate': 3.175834308498964e-07, 'epoch': 0.03824247355573637, 'num_input_tokens_seen': 5328863232, 'completed': '68.92% (2_541 / 3_687)', 'remaining time': '9:46:13', 'throughput': '8084.78', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:06:49,500 >> {'loss': 0.3569, 'grad_norm': 16.100215911865234, 'learning_rate': 3.172378466693325e-07, 'epoch': 0.03851369677244372, 'num_input_tokens_seen': 5330960384, 'completed': '68.94% (2_542 / 3_687)', 'remaining time': '9:45:38', 'throughput': '8715.31', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:07:17,444 >> {'loss': 0.9546, 'grad_norm': 22.463777542114258, 'learning_rate': 3.168924498068923e-07, 'epoch': 0.038784919989151075, 'num_input_tokens_seen': 5333057536, 'completed': '68.97% (2_543 / 3_687)', 'remaining time': '9:44:45', 'throughput': '9381.06', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:07:48,533 >> {'loss': 0.4266, 'grad_norm': 17.804658889770508, 'learning_rate': 3.1654724054053805e-07, 'epoch': 0.03905614320585842, 'num_input_tokens_seen': 5335154688, 'completed': '69.00% (2_544 / 3_687)', 'remaining time': '9:44:18', 'throughput': '8431.91', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:08:20,914 >> {'loss': 0.4095, 'grad_norm': 16.298038482666016, 'learning_rate': 3.1620221914808115e-07, 'epoch': 0.03932736642256577, 'num_input_tokens_seen': 5337251840, 'completed': '69.03% (2_545 / 3_687)', 'remaining time': '9:44:00', 'throughput': '8095.61', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:08:52,820 >> {'loss': 0.3647, 'grad_norm': 18.965286254882812, 'learning_rate': 3.1585738590718157e-07, 'epoch': 0.039598589639273124, 'num_input_tokens_seen': 5339348992, 'completed': '69.05% (2_546 / 3_687)', 'remaining time': '9:43:39', 'throughput': '8216.21', 'gpu_mem_free': 
'5581MB'} [INFO|trainer.py:175] 2024-12-26 16:09:21,858 >> {'loss': 0.6584, 'grad_norm': 22.938982009887695, 'learning_rate': 3.1551274109534805e-07, 'epoch': 0.03986981285598047, 'num_input_tokens_seen': 5341446144, 'completed': '69.08% (2_547 / 3_687)', 'remaining time': '9:42:56', 'throughput': '9027.63', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:09:52,493 >> {'loss': 0.6629, 'grad_norm': 20.005109786987305, 'learning_rate': 3.151682849899376e-07, 'epoch': 0.04014103607268782, 'num_input_tokens_seen': 5343543296, 'completed': '69.11% (2_548 / 3_687)', 'remaining time': '9:42:25', 'throughput': '8556.99', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:10:23,366 >> {'loss': 0.4334, 'grad_norm': 16.92494773864746, 'learning_rate': 3.148240178681553e-07, 'epoch': 0.040412259289395173, 'num_input_tokens_seen': 5345640448, 'completed': '69.13% (2_549 / 3_687)', 'remaining time': '9:41:56', 'throughput': '8490.97', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:10:56,298 >> {'loss': 0.6027, 'grad_norm': 20.46824073791504, 'learning_rate': 3.1447994000705456e-07, 'epoch': 0.04068348250610252, 'num_input_tokens_seen': 5347737600, 'completed': '69.16% (2_550 / 3_687)', 'remaining time': '9:41:42', 'throughput': '7960.10', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:11:25,510 >> {'loss': 0.3774, 'grad_norm': 13.558602333068848, 'learning_rate': 3.141360516835356e-07, 'epoch': 0.040954705722809875, 'num_input_tokens_seen': 5349834752, 'completed': '69.19% (2_551 / 3_687)', 'remaining time': '9:41:00', 'throughput': '8974.01', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:11:56,187 >> {'loss': 0.1835, 'grad_norm': 8.050308227539062, 'learning_rate': 3.1379235317434703e-07, 'epoch': 0.04122592893951722, 'num_input_tokens_seen': 5351931904, 'completed': '69.22% (2_552 / 3_687)', 'remaining time': '9:40:29', 'throughput': '8545.32', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 
16:12:26,801 >> {'loss': 0.3894, 'grad_norm': 15.542895317077637, 'learning_rate': 3.134488447560843e-07, 'epoch': 0.04149715215622457, 'num_input_tokens_seen': 5354029056, 'completed': '69.24% (2_553 / 3_687)', 'remaining time': '9:39:58', 'throughput': '8562.72', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:12:57,223 >> {'loss': 0.7527, 'grad_norm': 25.710311889648438, 'learning_rate': 3.1310552670518987e-07, 'epoch': 0.041768375372931925, 'num_input_tokens_seen': 5356126208, 'completed': '69.27% (2_554 / 3_687)', 'remaining time': '9:39:25', 'throughput': '8616.97', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:13:26,525 >> {'loss': 0.4483, 'grad_norm': 19.23603057861328, 'learning_rate': 3.127623992979532e-07, 'epoch': 0.04203959858963927, 'num_input_tokens_seen': 5358223360, 'completed': '69.30% (2_555 / 3_687)', 'remaining time': '9:38:45', 'throughput': '8946.33', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:13:56,487 >> {'loss': 0.3604, 'grad_norm': 16.744998931884766, 'learning_rate': 3.124194628105098e-07, 'epoch': 0.04231082180634662, 'num_input_tokens_seen': 5360320512, 'completed': '69.32% (2_556 / 3_687)', 'remaining time': '9:38:09', 'throughput': '8749.22', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:14:27,419 >> {'loss': 0.5432, 'grad_norm': 19.892236709594727, 'learning_rate': 3.120767175188422e-07, 'epoch': 0.042582045023053974, 'num_input_tokens_seen': 5362417664, 'completed': '69.35% (2_557 / 3_687)', 'remaining time': '9:37:40', 'throughput': '8474.90', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:14:55,957 >> {'loss': 0.6695, 'grad_norm': 21.464439392089844, 'learning_rate': 3.1173416369877864e-07, 'epoch': 0.04285326823976132, 'num_input_tokens_seen': 5364514816, 'completed': '69.38% (2_558 / 3_687)', 'remaining time': '9:36:54', 'throughput': '9185.68', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:15:25,632 >> {'loss': 0.4755, 'grad_norm': 
18.040634155273438, 'learning_rate': 3.1139180162599346e-07, 'epoch': 0.043124491456468676, 'num_input_tokens_seen': 5366611968, 'completed': '69.41% (2_559 / 3_687)', 'remaining time': '9:36:16', 'throughput': '8834.00', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:15:57,652 >> {'loss': 0.3587, 'grad_norm': 16.184511184692383, 'learning_rate': 3.110496315760065e-07, 'epoch': 0.043395714673176024, 'num_input_tokens_seen': 5368709120, 'completed': '69.43% (2_560 / 3_687)', 'remaining time': '9:35:55', 'throughput': '8186.87', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:16:29,577 >> {'loss': 0.4959, 'grad_norm': 14.748477935791016, 'learning_rate': 3.107076538241835e-07, 'epoch': 0.04366693788988337, 'num_input_tokens_seen': 5370806272, 'completed': '69.46% (2_561 / 3_687)', 'remaining time': '9:35:34', 'throughput': '8211.19', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:16:59,308 >> {'loss': 0.3855, 'grad_norm': 15.208799362182617, 'learning_rate': 3.103658686457349e-07, 'epoch': 0.043938161106590726, 'num_input_tokens_seen': 5372903424, 'completed': '69.49% (2_562 / 3_687)', 'remaining time': '9:34:56', 'throughput': '8817.25', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:17:28,641 >> {'loss': 0.3119, 'grad_norm': 14.134151458740234, 'learning_rate': 3.1002427631571646e-07, 'epoch': 0.04420938432329807, 'num_input_tokens_seen': 5375000576, 'completed': '69.51% (2_563 / 3_687)', 'remaining time': '9:34:17', 'throughput': '8936.78', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:18:02,899 >> {'loss': 0.3008, 'grad_norm': 14.675880432128906, 'learning_rate': 3.0968287710902866e-07, 'epoch': 0.04448060754000543, 'num_input_tokens_seen': 5377097728, 'completed': '69.54% (2_564 / 3_687)', 'remaining time': '9:34:11', 'throughput': '7652.13', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:18:29,658 >> {'loss': 0.638, 'grad_norm': 18.92854881286621, 'learning_rate': 
3.0934167130041666e-07, 'epoch': 0.044751830756712775, 'num_input_tokens_seen': 5379194880, 'completed': '69.57% (2_565 / 3_687)', 'remaining time': '9:33:13', 'throughput': '9796.45', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:18:58,294 >> {'loss': 0.5573, 'grad_norm': 17.33856773376465, 'learning_rate': 3.090006591644698e-07, 'epoch': 0.04502305397342012, 'num_input_tokens_seen': 5381292032, 'completed': '69.60% (2_566 / 3_687)', 'remaining time': '9:32:29', 'throughput': '9154.22', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:19:31,905 >> {'loss': 0.2354, 'grad_norm': 10.942505836486816, 'learning_rate': 3.0865984097562183e-07, 'epoch': 0.04529427719012748, 'num_input_tokens_seen': 5383389184, 'completed': '69.62% (2_567 / 3_687)', 'remaining time': '9:32:18', 'throughput': '7799.37', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:20:04,401 >> {'loss': 0.5418, 'grad_norm': 17.489700317382812, 'learning_rate': 3.083192170081501e-07, 'epoch': 0.045565500406834825, 'num_input_tokens_seen': 5385486336, 'completed': '69.65% (2_568 / 3_687)', 'remaining time': '9:32:00', 'throughput': '8066.84', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:20:34,006 >> {'loss': 0.4361, 'grad_norm': 17.590167999267578, 'learning_rate': 3.079787875361759e-07, 'epoch': 0.04583672362354217, 'num_input_tokens_seen': 5387583488, 'completed': '69.68% (2_569 / 3_687)', 'remaining time': '9:31:22', 'throughput': '8854.78', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:21:04,350 >> {'loss': 0.4253, 'grad_norm': 20.286165237426758, 'learning_rate': 3.0763855283366386e-07, 'epoch': 0.04610794684024953, 'num_input_tokens_seen': 5389680640, 'completed': '69.70% (2_570 / 3_687)', 'remaining time': '9:30:49', 'throughput': '8639.15', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:21:34,729 >> {'loss': 0.2986, 'grad_norm': 13.312159538269043, 'learning_rate': 3.07298513174422e-07, 'epoch': 
0.046379170056956874, 'num_input_tokens_seen': 5391777792, 'completed': '69.73% (2_571 / 3_687)', 'remaining time': '9:30:17', 'throughput': '8629.23', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:22:02,937 >> {'loss': 0.5234, 'grad_norm': 17.008472442626953, 'learning_rate': 3.0695866883210143e-07, 'epoch': 0.04665039327366423, 'num_input_tokens_seen': 5393874944, 'completed': '69.76% (2_572 / 3_687)', 'remaining time': '9:29:30', 'throughput': '9292.94', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:22:30,369 >> {'loss': 0.6498, 'grad_norm': 25.539731979370117, 'learning_rate': 3.0661902008019556e-07, 'epoch': 0.046921616490371576, 'num_input_tokens_seen': 5395972096, 'completed': '69.79% (2_573 / 3_687)', 'remaining time': '9:28:39', 'throughput': '9556.44', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:23:01,782 >> {'loss': 0.8046, 'grad_norm': 23.586692810058594, 'learning_rate': 3.062795671920411e-07, 'epoch': 0.047192839707078924, 'num_input_tokens_seen': 5398069248, 'completed': '69.81% (2_574 / 3_687)', 'remaining time': '9:28:13', 'throughput': '8345.03', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:23:34,750 >> {'loss': 0.3061, 'grad_norm': 13.434320449829102, 'learning_rate': 3.059403104408166e-07, 'epoch': 0.04746406292378628, 'num_input_tokens_seen': 5400166400, 'completed': '69.84% (2_575 / 3_687)', 'remaining time': '9:27:57', 'throughput': '7951.49', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:24:05,762 >> {'loss': 0.2875, 'grad_norm': 11.639397621154785, 'learning_rate': 3.0560125009954296e-07, 'epoch': 0.047735286140493625, 'num_input_tokens_seen': 5402263552, 'completed': '69.87% (2_576 / 3_687)', 'remaining time': '9:27:29', 'throughput': '8452.79', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:24:37,984 >> {'loss': 0.5249, 'grad_norm': 23.569503784179688, 'learning_rate': 3.0526238644108335e-07, 'epoch': 0.04800650935720097, 
'num_input_tokens_seen': 5404360704, 'completed': '69.89% (2_577 / 3_687)', 'remaining time': '9:27:08', 'throughput': '8135.75', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:25:09,862 >> {'loss': 0.5019, 'grad_norm': 17.796600341796875, 'learning_rate': 3.049237197381424e-07, 'epoch': 0.04827773257390833, 'num_input_tokens_seen': 5406457856, 'completed': '69.92% (2_578 / 3_687)', 'remaining time': '9:26:45', 'throughput': '8223.15', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:25:43,848 >> {'loss': 0.5809, 'grad_norm': 18.38736915588379, 'learning_rate': 3.045852502632657e-07, 'epoch': 0.048548955790615675, 'num_input_tokens_seen': 5408555008, 'completed': '69.95% (2_579 / 3_687)', 'remaining time': '9:26:35', 'throughput': '7713.28', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:26:15,140 >> {'loss': 0.2202, 'grad_norm': 10.763379096984863, 'learning_rate': 3.042469782888409e-07, 'epoch': 0.04882017900732303, 'num_input_tokens_seen': 5410652160, 'completed': '69.98% (2_580 / 3_687)', 'remaining time': '9:26:08', 'throughput': '8377.42', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:26:44,192 >> {'loss': 0.4292, 'grad_norm': 31.74101448059082, 'learning_rate': 3.0390890408709645e-07, 'epoch': 0.04909140222403038, 'num_input_tokens_seen': 5412749312, 'completed': '70.00% (2_581 / 3_687)', 'remaining time': '9:25:28', 'throughput': '9023.21', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:27:15,432 >> {'loss': 0.3474, 'grad_norm': 14.116605758666992, 'learning_rate': 3.0357102793010145e-07, 'epoch': 0.049362625440737724, 'num_input_tokens_seen': 5414846464, 'completed': '70.03% (2_582 / 3_687)', 'remaining time': '9:25:00', 'throughput': '8391.53', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:27:46,553 >> {'loss': 0.1974, 'grad_norm': 11.441993713378906, 'learning_rate': 3.032333500897659e-07, 'epoch': 0.04963384865744508, 'num_input_tokens_seen': 5416943616, 
'completed': '70.06% (2_583 / 3_687)', 'remaining time': '9:24:32', 'throughput': '8423.28', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:28:17,946 >> {'loss': 0.4156, 'grad_norm': 17.127378463745117, 'learning_rate': 3.0289587083784e-07, 'epoch': 0.049905071874152426, 'num_input_tokens_seen': 5419040768, 'completed': '70.08% (2_584 / 3_687)', 'remaining time': '9:24:06', 'throughput': '8350.47', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:28:51,235 >> {'loss': 0.3239, 'grad_norm': 13.827837944030762, 'learning_rate': 3.0255859044591425e-07, 'epoch': 0.05017629509085978, 'num_input_tokens_seen': 5421137920, 'completed': '70.11% (2_585 / 3_687)', 'remaining time': '9:23:51', 'throughput': '7874.77', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:29:21,316 >> {'loss': 0.2857, 'grad_norm': 11.431439399719238, 'learning_rate': 3.02221509185419e-07, 'epoch': 0.05044751830756713, 'num_input_tokens_seen': 5423235072, 'completed': '70.14% (2_586 / 3_687)', 'remaining time': '9:23:16', 'throughput': '8714.61', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:29:53,827 >> {'loss': 0.3933, 'grad_norm': 23.875659942626953, 'learning_rate': 3.0188462732762457e-07, 'epoch': 0.050718741524274476, 'num_input_tokens_seen': 5425332224, 'completed': '70.17% (2_587 / 3_687)', 'remaining time': '9:22:56', 'throughput': '8063.11', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:30:24,623 >> {'loss': 0.3873, 'grad_norm': 16.398712158203125, 'learning_rate': 3.015479451436406e-07, 'epoch': 0.05098996474098183, 'num_input_tokens_seen': 5427429376, 'completed': '70.19% (2_588 / 3_687)', 'remaining time': '9:22:26', 'throughput': '8512.47', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 16:30:56,282 >> {'loss': 0.4602, 'grad_norm': 17.132213592529297, 'learning_rate': 3.0121146290441593e-07, 'epoch': 0.05126118795768918, 'num_input_tokens_seen': 5429526528, 'completed': '70.22% (2_589 / 3_687)', 
'remaining time': '9:22:01', 'throughput': '8280.25', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 16:31:26,993 >> {'loss': 0.6185, 'grad_norm': 21.008514404296875, 'learning_rate': 3.0087518088073905e-07, 'epoch': 0.051532411174396525, 'num_input_tokens_seen': 5431623680, 'completed': '70.25% (2_590 / 3_687)', 'remaining time': '9:21:30', 'throughput': '8535.84', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 16:31:55,220 >> {'loss': 0.4214, 'grad_norm': 19.426393508911133, 'learning_rate': 3.005390993432366e-07, 'epoch': 0.05180363439110388, 'num_input_tokens_seen': 5433720832, 'completed': '70.27% (2_591 / 3_687)', 'remaining time': '9:20:45', 'throughput': '9286.88', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 16:32:27,734 >> {'loss': 0.3765, 'grad_norm': 15.54585075378418, 'learning_rate': 3.00203218562374e-07, 'epoch': 0.05207485760781123, 'num_input_tokens_seen': 5435817984, 'completed': '70.30% (2_592 / 3_687)', 'remaining time': '9:20:25', 'throughput': '8062.38', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 16:33:01,412 >> {'loss': 0.2658, 'grad_norm': 15.346580505371094, 'learning_rate': 2.9986753880845596e-07, 'epoch': 0.05234608082451858, 'num_input_tokens_seen': 5437915136, 'completed': '70.33% (2_593 / 3_687)', 'remaining time': '9:20:11', 'throughput': '7783.79', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 16:33:31,572 >> {'loss': 0.2507, 'grad_norm': 11.858630180358887, 'learning_rate': 2.9953206035162433e-07, 'epoch': 0.05261730404122593, 'num_input_tokens_seen': 5440012288, 'completed': '70.36% (2_594 / 3_687)', 'remaining time': '9:19:37', 'throughput': '8692.01', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 16:34:01,306 >> {'loss': 0.3481, 'grad_norm': 14.775229454040527, 'learning_rate': 2.991967834618597e-07, 'epoch': 0.05288852725793328, 'num_input_tokens_seen': 5442109440, 'completed': '70.38% (2_595 / 3_687)', 'remaining time': '9:19:01', 'throughput': '8816.22', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 16:34:32,526 >> {'loss': 0.3183, 'grad_norm': 12.877067565917969, 'learning_rate': 2.9886170840897977e-07, 'epoch': 0.05315975047464063, 'num_input_tokens_seen': 5444206592, 'completed': '70.41% (2_596 / 3_687)', 'remaining time': '9:18:33', 'throughput': '8396.62', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 16:35:02,082 >> {'loss': 0.4371, 'grad_norm': 20.334400177001953, 'learning_rate': 2.9852683546264047e-07, 'epoch': 0.05343097369134798, 'num_input_tokens_seen': 5446303744, 'completed': '70.44% (2_597 / 3_687)', 'remaining time': '9:17:56', 'throughput': '8869.38', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 16:35:29,931 >> {'loss': 0.7361, 'grad_norm': 24.98883819580078, 'learning_rate': 2.9819216489233467e-07, 'epoch': 0.05370219690805533, 'num_input_tokens_seen': 5448400896, 'completed': '70.46% (2_598 / 3_687)', 'remaining time': '9:17:09', 'throughput': '9413.07', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 16:36:01,651 >> {'loss': 0.3259, 'grad_norm': 12.599200248718262, 'learning_rate': 2.978576969673926e-07, 'epoch': 0.05397342012476268, 'num_input_tokens_seen': 5450498048, 'completed': '70.49% (2_599 / 3_687)', 'remaining time': '9:16:44', 'throughput': '8264.43', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 16:36:31,987 >> {'loss': 0.463, 'grad_norm': 17.292268753051758, 'learning_rate': 2.9752343195698125e-07, 'epoch': 0.05424464334147003, 'num_input_tokens_seen': 5452595200, 'completed': '70.52% (2_600 / 3_687)', 'remaining time': '9:16:12', 'throughput': '8641.26', 'gpu_mem_free': '5581MB'}
/scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:689: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
  warnings.warn(
(the warning above is repeated once per rank; repeats omitted)
[INFO|trainer.py:3503] 2024-12-26 16:36:54,749 >> Saving model checkpoint to /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-2600
[INFO|configuration_utils.py:472] 2024-12-26 16:36:54,752 >> Configuration saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-2600/config.json
[INFO|configuration_utils.py:807] 2024-12-26 16:36:54,754 >> Configuration saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-2600/generation_config.json
[INFO|modeling_utils.py:2807] 2024-12-26 16:37:51,845 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 7 checkpoint shards. You can find where each parameters has been saved in the index located at /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-2600/model.safetensors.index.json.
[INFO|tokenization_utils_base.py:2684] 2024-12-26 16:37:51,849 >> tokenizer config file saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-2600/tokenizer_config.json
[INFO|tokenization_utils_base.py:2693] 2024-12-26 16:37:51,850 >> Special tokens file saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-2600/special_tokens_map.json
12/26/2024 16:41:29 - WARNING - streaming.base.dataset - Because `shuffle_block_size` was not specified, it will default to max(4_000_000 // num_canonical_nodes, 1 << 18) if num_canonical_nodes is not None, otherwise 262144. Prior to Streaming v0.7.0, `shuffle_block_size` defaulted to 262144.
[WARNING|trainer.py:868] 2024-12-26 16:41:29,149 >> Save streaming dataset state: {'epoch': 0, 'sample_in_epoch': 5200, 'num_canonical_nodes': 1, 'shuffle_seed': 42, 'initial_physical_nodes': 1}
/scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/torch/utils/checkpoint.py:1399: FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead.
  with device_autocast_ctx, torch.cpu.amp.autocast(**cpu_autocast_kwargs), recompute_context:  # type: ignore[attr-defined]
[INFO|trainer.py:175] 2024-12-26 16:42:00,038 >> {'loss': 0.3436, 'grad_norm': 18.126787185668945, 'learning_rate': 2.9718937013010444e-07, 'epoch': 0.05451586655817738, 'num_input_tokens_seen': 5454692352, 'completed': '70.55% (2_601 / 3_687)', 'remaining time': '9:42:28', 'throughput': '799.09', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 16:42:29,817 >> {'loss': 0.192, 'grad_norm': 9.295369148254395, 'learning_rate': 2.9685551175560235e-07, 'epoch': 0.05478708977488473, 'num_input_tokens_seen': 5456789504, 'completed': '70.57% (2_602 / 3_687)', 'remaining time': '9:41:42', 'throughput': '8803.07', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 16:42:56,578 >> {'loss': 0.7479, 'grad_norm': 21.22579574584961, 'learning_rate': 2.965218571021516e-07, 'epoch': 0.05505831299159208, 'num_input_tokens_seen': 5458886656, 'completed': '70.60% (2_603 / 3_687)', 'remaining time': '9:40:41', 'throughput': '9795.63', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 16:43:25,695 >> {'loss': 0.4214, 'grad_norm': 18.786785125732422, 'learning_rate': 2.9618840643826464e-07, 'epoch': 0.05532953620829943, 'num_input_tokens_seen': 5460983808, 'completed': '70.63% (2_604 / 3_687)', 'remaining time': '9:39:53', 'throughput': '9003.27', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 16:43:58,467 >> {'loss': 0.5877, 'grad_norm': 20.445255279541016, 'learning_rate': 2.958551600322899e-07, 'epoch': 0.05560075942500678, 'num_input_tokens_seen': 5463080960, 'completed': '70.65% (2_605 / 3_687)', 'remaining time': '9:39:24', 'throughput': '7998.86', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 16:44:26,754 >> {'loss': 0.5026, 'grad_norm': 19.826480865478516, 'learning_rate': 2.9552211815241156e-07, 'epoch': 0.055871982641714134, 'num_input_tokens_seen': 5465178112, 'completed': '70.68% (2_606 / 3_687)', 'remaining time': '9:38:32', 'throughput': '9267.58', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 16:45:00,850 >> {'loss': 0.35, 'grad_norm': 14.695202827453613, 'learning_rate': 2.9518928106664897e-07, 'epoch': 0.05614320585842148, 'num_input_tokens_seen': 5467275264, 'completed': '70.71% (2_607 / 3_687)', 'remaining time': '9:38:10', 'throughput': '7688.43', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 16:45:29,284 >> {'loss': 0.4947, 'grad_norm': 16.892658233642578, 'learning_rate': 2.9485664904285643e-07, 'epoch': 0.05641442907512883, 'num_input_tokens_seen': 5469372416, 'completed': '70.74% (2_608 / 3_687)', 'remaining time': '9:37:19', 'throughput': '9219.29', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 16:45:59,155 >> {'loss': 0.5193, 'grad_norm': 19.319557189941406, 'learning_rate': 2.94524222348724e-07, 'epoch': 0.05668565229183618, 'num_input_tokens_seen': 5471469568, 'completed': '70.76% (2_609 / 3_687)', 'remaining time': '9:36:36', 'throughput': '8775.84', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 16:46:30,749 >> {'loss': 0.3985, 'grad_norm': 21.24896240234375, 'learning_rate': 2.9419200125177585e-07, 'epoch': 0.05695687550854353, 'num_input_tokens_seen': 5473566720, 'completed': '70.79% (2_610 / 3_687)', 'remaining time': '9:36:01', 'throughput': '8297.36', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 16:47:03,694 >> {'loss': 0.2865, 'grad_norm': 11.423199653625488, 'learning_rate': 2.938599860193709e-07, 'epoch': 0.05722809872525088, 'num_input_tokens_seen': 5475663872, 'completed': '70.82% (2_611 / 3_687)', 'remaining time': '9:35:33', 'throughput': '7956.94', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 16:47:34,914 >> {'loss': 0.2112, 'grad_norm': 11.481456756591797, 'learning_rate': 2.935281769187025e-07, 'epoch': 0.05749932194195823, 'num_input_tokens_seen': 5477761024, 'completed': '70.84% (2_612 / 3_687)', 'remaining time': '9:34:57', 'throughput': '8396.73', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 16:48:06,460 >> {'loss': 0.4367, 'grad_norm': 19.210725784301758, 'learning_rate': 2.9319657421679757e-07, 'epoch': 0.05777054515866558, 'num_input_tokens_seen': 5479858176, 'completed': '70.87% (2_613 / 3_687)', 'remaining time': '9:34:22', 'throughput': '8309.84', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 16:48:35,589 >> {'loss': 0.3199, 'grad_norm': 17.227354049682617, 'learning_rate': 2.928651781805177e-07, 'epoch': 0.058041768375372935, 'num_input_tokens_seen': 5481955328, 'completed': '70.90% (2_614 / 3_687)', 'remaining time': '9:33:35', 'throughput': '8999.48', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 16:49:05,415 >> {'loss': 0.2538, 'grad_norm': 11.751642227172852, 'learning_rate': 2.9253398907655775e-07, 'epoch': 0.05831299159208028, 'num_input_tokens_seen': 5484052480, 'completed': '70.92% (2_615 / 3_687)', 'remaining time': '9:32:52', 'throughput': '8789.04', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 16:49:34,465 >> {'loss': 0.2633, 'grad_norm': 11.568734169006348, 'learning_rate': 2.9220300717144597e-07, 'epoch': 0.05858421480878763, 'num_input_tokens_seen': 5486149632, 'completed': '70.95% (2_616 / 3_687)', 'remaining time': '9:32:05', 'throughput': '9023.87', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 16:50:05,200 >> {'loss': 0.2719, 'grad_norm': 12.480401992797852, 'learning_rate': 2.9187223273154406e-07, 'epoch': 0.058855438025494984, 'num_input_tokens_seen': 5488246784, 'completed': '70.98% (2_617 / 3_687)', 'remaining time': '9:31:26', 'throughput': '8529.15', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 16:50:40,364 >> {'loss': 0.5558, 'grad_norm': 21.550291061401367, 'learning_rate': 2.91541666023047e-07, 'epoch': 0.05912666124220233, 'num_input_tokens_seen': 5490343936, 'completed': '71.01% (2_618 / 3_687)', 'remaining time': '9:31:09', 'throughput': '7454.83', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 16:51:13,074 >> {'loss': 0.5337, 'grad_norm': 20.7867374420166, 'learning_rate': 2.9121130731198204e-07, 'epoch': 0.059397884458909686, 'num_input_tokens_seen': 5492441088, 'completed': '71.03% (2_619 / 3_687)', 'remaining time': '9:30:40', 'throughput': '8014.35', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 16:51:45,429 >> {'loss': 0.4042, 'grad_norm': 15.338828086853027, 'learning_rate': 2.9088115686420917e-07, 'epoch': 0.059669107675617034, 'num_input_tokens_seen': 5494538240, 'completed': '71.06% (2_620 / 3_687)', 'remaining time': '9:30:10', 'throughput': '8102.04', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 16:52:14,578 >> {'loss': 0.3717, 'grad_norm': 17.88315200805664, 'learning_rate': 2.9055121494542115e-07, 'epoch': 0.05994033089232438, 'num_input_tokens_seen': 5496635392, 'completed': '71.09% (2_621 / 3_687)', 'remaining time': '9:29:24', 'throughput': '8993.32', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 16:52:47,663 >> {'loss': 0.2657, 'grad_norm': 12.472522735595703, 'learning_rate': 2.9022148182114247e-07, 'epoch': 0.060211554109031735, 'num_input_tokens_seen': 5498732544, 'completed': '71.11% (2_622 / 3_687)', 'remaining time': '9:28:57', 'throughput': '7923.40', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 16:53:18,641 >> {'loss': 0.5611, 'grad_norm': 23.858972549438477, 'learning_rate': 2.8989195775673e-07, 'epoch': 0.06048277732573908, 'num_input_tokens_seen': 5500829696, 'completed': '71.14% (2_623 / 3_687)', 'remaining time': '9:28:19', 'throughput': '8462.14', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 16:53:50,116 >> {'loss': 0.3324, 'grad_norm': 12.803933143615723, 'learning_rate': 2.89562643017372e-07, 'epoch': 0.06075400054244643, 'num_input_tokens_seen': 5502926848, 'completed': '71.17% (2_624 / 3_687)', 'remaining time': '9:27:45', 'throughput': '8328.63', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 16:54:21,986 >> {'loss': 0.4797, 'grad_norm': 26.708951950073242, 'learning_rate': 2.8923353786808857e-07, 'epoch': 0.061025223759153785, 'num_input_tokens_seen': 5505024000, 'completed': '71.20% (2_625 / 3_687)', 'remaining time': '9:27:12', 'throughput': '8225.36', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 16:54:50,022 >> {'loss': 0.8212, 'grad_norm': 25.36354637145996, 'learning_rate': 2.8890464257373105e-07, 'epoch': 0.06129644697586113, 'num_input_tokens_seen': 5507121152, 'completed': '71.22% (2_626 / 3_687)', 'remaining time': '9:26:21', 'throughput': '9350.29', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 16:55:20,167 >> {'loss': 0.3743, 'grad_norm': 13.080056190490723, 'learning_rate': 2.8857595739898164e-07, 'epoch': 0.06156767019256849, 'num_input_tokens_seen': 5509218304, 'completed': '71.25% (2_627 / 3_687)', 'remaining time': '9:25:40', 'throughput': '8696.28', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 16:55:49,537 >> {'loss': 0.652, 'grad_norm': 19.789167404174805, 'learning_rate': 2.8824748260835386e-07, 'epoch': 0.061838893409275834, 'num_input_tokens_seen': 5511315456, 'completed': '71.28% (2_628 / 3_687)', 'remaining time': '9:24:56', 'throughput': '8925.38', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 16:56:17,742 >> {'loss': 0.6427, 'grad_norm': 28.163496017456055, 'learning_rate': 2.879192184661918e-07, 'epoch': 0.06211011662598318, 'num_input_tokens_seen': 5513412608, 'completed': '71.30% (2_629 / 3_687)', 'remaining time': '9:24:06', 'throughput': '9294.18', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 16:56:51,166 >> {'loss': 0.353, 'grad_norm': 15.175286293029785, 'learning_rate': 2.8759116523666973e-07, 'epoch': 0.062381339842690536, 'num_input_tokens_seen': 5515509760, 'completed': '71.33% (2_630 / 3_687)', 'remaining time': '9:23:41', 'throughput': '7842.98', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 16:57:24,083 >> {'loss': 0.5979, 'grad_norm': 18.326745986938477, 'learning_rate': 2.872633231837923e-07, 'epoch': 0.06265256305939788, 'num_input_tokens_seen': 5517606912, 'completed': '71.36% (2_631 / 3_687)', 'remaining time': '9:23:13', 'throughput': '7963.85', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 16:57:53,263 >> {'loss': 0.2892, 'grad_norm': 14.984691619873047, 'learning_rate': 2.869356925713946e-07, 'epoch': 0.06292378627610523, 'num_input_tokens_seen': 5519704064, 'completed': '71.39% (2_632 / 3_687)', 'remaining time': '9:22:28', 'throughput': '8983.76', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 16:58:21,394 >> {'loss': 0.6904, 'grad_norm': 20.763242721557617, 'learning_rate': 2.86608273663141e-07, 'epoch': 0.06319500949281258, 'num_input_tokens_seen': 5521801216, 'completed': '71.41% (2_633 / 3_687)', 'remaining time': '9:21:39', 'throughput': '9318.82', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 16:58:49,597 >> {'loss': 0.7099, 'grad_norm': 19.28316307067871, 'learning_rate': 2.8628106672252614e-07, 'epoch': 0.06346623270951994, 'num_input_tokens_seen': 5523898368, 'completed': '71.44% (2_634 / 3_687)', 'remaining time': '9:20:50', 'throughput': '9294.68', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 16:59:23,034 >> {'loss': 0.3436, 'grad_norm': 14.820695877075195, 'learning_rate': 2.859540720128737e-07, 'epoch': 0.06373745592622729, 'num_input_tokens_seen': 5525995520, 'completed': '71.47% (2_635 / 3_687)', 'remaining time': '9:20:24', 'throughput': '7839.99', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 16:59:51,914 >> {'loss': 0.397, 'grad_norm': 17.229780197143555, 'learning_rate': 2.856272897973362e-07, 'epoch': 0.06400867914293464, 'num_input_tokens_seen': 5528092672, 'completed': '71.49% (2_636 / 3_687)', 'remaining time': '9:19:39', 'throughput': '9076.93', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:00:21,251 >> {'loss': 0.5819, 'grad_norm': 19.926528930664062, 'learning_rate': 2.853007203388958e-07, 'epoch': 0.06427990235964198, 'num_input_tokens_seen': 5530189824, 'completed': '71.52% (2_637 / 3_687)', 'remaining time': '9:18:55', 'throughput': '8935.54', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:00:51,079 >> {'loss': 0.3037, 'grad_norm': 12.162901878356934, 'learning_rate': 2.849743639003631e-07, 'epoch': 0.06455112557634933, 'num_input_tokens_seen': 5532286976, 'completed': '71.55% (2_638 / 3_687)', 'remaining time': '9:18:14', 'throughput': '8788.51', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:01:23,451 >> {'loss': 0.4711, 'grad_norm': 20.921995162963867, 'learning_rate': 2.846482207443773e-07, 'epoch': 0.06482234879305669, 'num_input_tokens_seen': 5534384128, 'completed': '71.58% (2_639 / 3_687)', 'remaining time': '9:17:44', 'throughput': '8097.88', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:01:53,497 >> {'loss': 0.3534, 'grad_norm': 16.384029388427734, 'learning_rate': 2.843222911334061e-07, 'epoch': 0.06509357200976404, 'num_input_tokens_seen': 5536481280, 'completed': '71.60% (2_640 / 3_687)', 'remaining time': '9:17:04', 'throughput': '8724.79', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:02:23,231 >> {'loss': 0.6336, 'grad_norm': 22.67833137512207, 'learning_rate': 2.839965753297452e-07, 'epoch': 0.06536479522647139, 'num_input_tokens_seen': 5538578432, 'completed': '71.63% (2_641 / 3_687)', 'remaining time': '9:16:22', 'throughput': '8816.47', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:02:53,056 >> {'loss': 0.3167, 'grad_norm': 12.454771995544434, 'learning_rate': 2.8367107359551835e-07, 'epoch': 0.06563601844317873, 'num_input_tokens_seen': 5540675584, 'completed': '71.66% (2_642 / 3_687)', 'remaining time': '9:15:41', 'throughput': '8789.46', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:03:20,404 >> {'loss': 0.5943, 'grad_norm': 18.632858276367188, 'learning_rate': 2.8334578619267683e-07, 'epoch': 0.06590724165988608, 'num_input_tokens_seen': 5542772736, 'completed': '71.68% (2_643 / 3_687)', 'remaining time': '9:14:50', 'throughput': '9585.28', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:03:50,252 >> {'loss': 0.2241, 'grad_norm': 10.916572570800781, 'learning_rate': 2.830207133829997e-07, 'epoch': 0.06617846487659344, 'num_input_tokens_seen': 5544869888, 'completed': '71.71% (2_644 / 3_687)', 'remaining time': '9:14:09', 'throughput': '8782.64', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:04:21,666 >> {'loss': 0.5303, 'grad_norm': 20.393115997314453, 'learning_rate': 2.8269585542809305e-07, 'epoch': 0.06644968809330079, 'num_input_tokens_seen': 5546967040, 'completed': '71.74% (2_645 / 3_687)', 'remaining time': '9:13:36', 'throughput': '8344.71', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:04:53,173 >> {'loss': 0.4077, 'grad_norm': 16.837371826171875, 'learning_rate': 2.8237121258939036e-07, 'epoch': 0.06672091131000814, 'num_input_tokens_seen': 5549064192, 'completed': '71.77% (2_646 / 3_687)', 'remaining time': '9:13:02', 'throughput': '8320.42', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:05:23,438 >> {'loss': 0.518, 'grad_norm': 25.16261863708496, 'learning_rate': 2.8204678512815185e-07, 'epoch': 0.06699213452671549, 'num_input_tokens_seen': 5551161344, 'completed': '71.79% (2_647 / 3_687)', 'remaining time': '9:12:23', 'throughput': '8661.38', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:05:54,318 >> {'loss': 0.572, 'grad_norm': 21.067689895629883, 'learning_rate': 2.8172257330546414e-07, 'epoch': 0.06726335774342283, 'num_input_tokens_seen': 5553258496, 'completed': '71.82% (2_648 / 3_687)', 'remaining time': '9:11:47', 'throughput': '8489.31', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:06:26,262 >> {'loss': 0.3443, 'grad_norm': 16.49931526184082, 'learning_rate': 2.8139857738224055e-07, 'epoch': 0.06753458096013018, 'num_input_tokens_seen': 5555355648, 'completed': '71.85% (2_649 / 3_687)', 'remaining time': '9:11:16', 'throughput': '8206.20', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:06:56,983 >> {'loss': 0.3556, 'grad_norm': 15.077832221984863, 'learning_rate': 2.81074797619221e-07, 'epoch': 0.06780580417683754, 'num_input_tokens_seen': 5557452800, 'completed': '71.87% (2_650 / 3_687)', 'remaining time': '9:10:39', 'throughput': '8533.10', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:07:28,742 >> {'loss': 0.3608, 'grad_norm': 14.75187873840332, 'learning_rate': 2.8075123427697093e-07, 'epoch': 0.06807702739354489, 'num_input_tokens_seen': 5559549952, 'completed': '71.90% (2_651 / 3_687)', 'remaining time': '9:10:07', 'throughput': '8254.10', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:07:58,372 >> {'loss': 0.2241, 'grad_norm': 11.628194808959961, 'learning_rate': 2.8042788761588204e-07, 'epoch': 0.06834825061025224, 'num_input_tokens_seen': 5561647104, 'completed': '71.93% (2_652 / 3_687)', 'remaining time': '9:09:26', 'throughput': '8847.44', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:08:27,929 >> {'loss': 0.5842, 'grad_norm': 17.89313316345215, 'learning_rate': 2.8010475789617105e-07, 'epoch': 0.06861947382695958, 'num_input_tokens_seen': 5563744256, 'completed': '71.96% (2_653 / 3_687)', 'remaining time': '9:08:45', 'throughput': '8869.19', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:08:59,580 >> {'loss': 0.5279, 'grad_norm': 17.917509078979492, 'learning_rate': 2.797818453778806e-07, 'epoch': 0.06889069704366693, 'num_input_tokens_seen': 5565841408, 'completed': '71.98% (2_654 / 3_687)', 'remaining time': '9:08:12', 'throughput': '8282.30', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:09:30,158 >> {'loss': 0.3903, 'grad_norm': 14.405590057373047, 'learning_rate': 2.794591503208785e-07, 'epoch': 0.0691619202603743, 'num_input_tokens_seen': 5567938560, 'completed': '72.01% (2_655 / 3_687)', 'remaining time': '9:07:35', 'throughput': '8572.76', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:10:01,728 >> {'loss': 0.6449, 'grad_norm': 22.006885528564453, 'learning_rate': 2.791366729848574e-07, 'epoch': 0.06943314347708164, 'num_input_tokens_seen': 5570035712, 'completed': '72.04% (2_656 / 3_687)', 'remaining time': '9:07:02', 'throughput': '8303.77', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:10:33,415 >> {'loss': 0.6725, 'grad_norm': 23.890174865722656, 'learning_rate': 2.7881441362933464e-07, 'epoch': 0.06970436669378899, 'num_input_tokens_seen': 5572132864, 'completed': '72.06% (2_657 / 3_687)', 'remaining time': '9:06:30', 'throughput': '8272.78', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:11:07,508 >> {'loss': 0.5287, 'grad_norm': 19.865018844604492, 'learning_rate': 2.784923725136525e-07, 'epoch': 0.06997558991049634, 'num_input_tokens_seen': 5574230016, 'completed': '72.09% (2_658 / 3_687)', 'remaining time': '9:06:07', 'throughput': '7689.20', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:11:41,630 >> {'loss': 0.2702, 'grad_norm': 9.786247253417969, 'learning_rate': 2.781705498969773e-07, 'epoch': 0.07024681312720368, 'num_input_tokens_seen': 5576327168, 'completed': '72.12% (2_659 / 3_687)', 'remaining time': '9:05:44', 'throughput': '7682.37', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:12:15,097 >> {'loss': 0.3915, 'grad_norm': 13.455928802490234, 'learning_rate': 2.7784894603829966e-07, 'epoch': 0.07051803634391104, 'num_input_tokens_seen': 5578424320, 'completed': '72.15% (2_660 / 3_687)', 'remaining time': '9:05:19', 'throughput': '7833.11', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:12:45,687 >> {'loss': 0.4046, 'grad_norm': 15.833970069885254, 'learning_rate': 2.7752756119643416e-07, 'epoch': 0.07078925956061839, 'num_input_tokens_seen': 5580521472, 'completed': '72.17% (2_661 / 3_687)', 'remaining time': '9:04:42', 'throughput': '8569.60', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:13:16,213 >> {'loss': 0.233, 'grad_norm': 13.186774253845215, 'learning_rate': 2.77206395630019e-07, 'epoch': 0.07106048277732574, 'num_input_tokens_seen': 5582618624, 'completed': '72.20% (2_662 / 3_687)', 'remaining time': '9:04:05', 'throughput': '8587.43', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:13:46,136 >> {'loss': 0.3992, 'grad_norm': 16.846803665161133, 'learning_rate': 2.7688544959751615e-07, 'epoch': 0.07133170599403309, 'num_input_tokens_seen': 5584715776, 'completed': '72.23% (2_663 / 3_687)', 'remaining time': '9:03:25', 'throughput': '8760.67', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:14:18,342 >> {'loss': 0.454, 'grad_norm': 16.44211769104004, 'learning_rate': 2.765647233572108e-07, 'epoch': 0.07160292921074043, 'num_input_tokens_seen': 5586812928, 'completed': '72.25% (2_664 / 3_687)', 'remaining time': '9:02:55', 'throughput': '8139.54', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:14:45,410 >> {'loss': 0.6375, 'grad_norm': 21.622333526611328, 'learning_rate': 2.7624421716721086e-07, 'epoch': 0.0718741524274478, 'num_input_tokens_seen': 5588910080, 'completed': '72.28% (2_665 / 3_687)', 'remaining time': '9:02:05', 'throughput': '9684.69', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:15:14,628 >> {'loss': 0.3815, 'grad_norm': 13.752047538757324, 'learning_rate': 2.7592393128544784e-07, 'epoch': 0.07214537564415514, 'num_input_tokens_seen': 5591007232, 'completed': '72.31% (2_666 / 3_687)', 'remaining time': '9:01:23', 'throughput': '8971.92', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:15:44,049 >> {'loss': 0.3299, 'grad_norm': 15.634025573730469, 'learning_rate': 2.7560386596967553e-07, 'epoch': 0.07241659886086249, 'num_input_tokens_seen': 5593104384, 'completed': '72.34% (2_667 / 3_687)', 'remaining time': '9:00:42', 'throughput': '8910.02', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:16:14,338 >> {'loss': 0.8418, 'grad_norm': 22.741350173950195, 'learning_rate': 2.7528402147747045e-07, 'epoch': 0.07268782207756984, 'num_input_tokens_seen': 5595201536, 'completed': '72.36% (2_668 / 3_687)', 'remaining time': '9:00:04', 'throughput': '8654.78', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:16:41,645 >> {'loss': 0.5166, 'grad_norm': 18.136112213134766, 'learning_rate': 2.7496439806623124e-07, 'epoch': 0.07295904529427719, 'num_input_tokens_seen': 5597298688, 'completed': '72.39% (2_669 / 3_687)', 'remaining time': '8:59:16', 'throughput': '9600.01', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:17:12,897 >> {'loss': 0.2879, 'grad_norm': 13.299650192260742, 'learning_rate': 2.746449959931784e-07, 'epoch': 0.07323026851098453, 'num_input_tokens_seen': 5599395840, 'completed': '72.42% (2_670 / 3_687)', 'remaining time': '8:58:42', 'throughput': '8388.13', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:17:47,022 >> {'loss': 0.2716, 'grad_norm': 12.364994049072266, 'learning_rate': 2.743258155153548e-07, 'epoch': 0.0735014917276919, 'num_input_tokens_seen': 5601492992, 'completed': '72.44% (2_671 / 3_687)', 'remaining time': '8:58:19', 'throughput': '7681.95', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:18:18,421 >> {'loss': 0.6404, 'grad_norm': 22.236961364746094, 'learning_rate': 2.740068568896247e-07, 'epoch': 0.07377271494439924, 'num_input_tokens_seen': 5603590144, 'completed': '72.47% (2_672 / 3_687)', 'remaining time': '8:57:46', 'throughput': '8348.64', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:18:49,192 >> {'loss': 0.646, 'grad_norm': 18.271963119506836, 'learning_rate': 2.7368812037267387e-07, 'epoch': 0.07404393816110659, 'num_input_tokens_seen': 5605687296, 'completed': '72.50% (2_673 / 3_687)', 'remaining time': '8:57:10', 'throughput': '8519.30', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:19:17,777 >> {'loss': 0.7859, 'grad_norm': 22.82843017578125, 'learning_rate': 2.7336960622100907e-07, 'epoch': 0.07431516137781394, 'num_input_tokens_seen': 5607784448, 'completed': '72.53% (2_674 / 3_687)', 'remaining time': '8:56:26', 'throughput': '9170.45', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:19:47,968 >> {'loss': 0.1964, 'grad_norm': 11.289409637451172, 'learning_rate': 2.7305131469095906e-07, 'epoch': 0.07458638459452128, 'num_input_tokens_seen': 5609881600, 'completed': '72.55% (2_675 / 3_687)', 'remaining time': '8:55:49', 'throughput': '8683.03', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:20:17,867 >> {'loss': 0.5049, 'grad_norm': 20.22431182861328, 'learning_rate': 2.7273324603867203e-07, 'epoch': 0.07485760781122865, 'num_input_tokens_seen': 5611978752, 'completed': '72.58% (2_676 / 3_687)', 'remaining time': '8:55:10', 'throughput': '8767.54', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:20:47,863 >> {'loss': 0.6532, 'grad_norm': 21.272485733032227, 'learning_rate': 2.7241540052011787e-07, 'epoch': 0.075128831027936, 'num_input_tokens_seen': 5614075904, 'completed': '72.61% (2_677 / 3_687)', 'remaining time': '8:54:32', 'throughput': '8739.36', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:21:16,900 >> {'loss': 0.2299, 'grad_norm': 10.40255069732666, 'learning_rate': 2.720977783910865e-07, 'epoch': 0.07540005424464334, 'num_input_tokens_seen': 5616173056, 'completed': '72.63% (2_678 / 3_687)', 'remaining time': '8:53:50', 'throughput': '9027.92', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:21:46,174 >> {'loss': 0.4719, 'grad_norm': 18.323423385620117, 'learning_rate': 2.717803799071881e-07, 'epoch': 0.07567127746135069, 'num_input_tokens_seen': 5618270208, 'completed': '72.66% (2_679 / 3_687)', 'remaining time': '8:53:10', 'throughput': '8954.76', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:22:16,738 >> {'loss': 0.5637, 'grad_norm': 16.696556091308594, 'learning_rate': 2.714632053238529e-07, 'epoch': 0.07594250067805804, 'num_input_tokens_seen': 5620367360, 'completed': '72.69% (2_680 / 3_687)', 'remaining time': '8:52:34', 'throughput': '8577.07', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:22:47,589 >> {'loss': 0.3831, 'grad_norm': 17.217605590820312, 'learning_rate': 2.711462548963309e-07, 'epoch': 0.0762137238947654, 'num_input_tokens_seen': 5622464512, 'completed': '72.71% (2_681 / 3_687)', 'remaining time': '8:51:59', 'throughput': '8497.06', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:23:16,983 >> {'loss': 0.4221, 'grad_norm': 17.2362003326416, 'learning_rate': 2.708295288796918e-07, 'epoch': 0.07648494711147275, 'num_input_tokens_seen': 5624561664, 'completed': '72.74% (2_682 / 3_687)', 'remaining time': '8:51:19', 'throughput': '8918.24', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:23:46,342 >> {'loss': 0.5495, 'grad_norm': 19.473421096801758, 'learning_rate': 2.7051302752882467e-07, 'epoch': 0.07675617032818009, 'num_input_tokens_seen': 5626658816, 'completed': '72.77% (2_683 / 3_687)', 'remaining time': '8:50:39', 'throughput': '8928.96', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:24:18,509 >> {'loss': 0.4145, 'grad_norm': 18.977439880371094, 'learning_rate': 2.7019675109843777e-07, 'epoch': 0.07702739354488744, 'num_input_tokens_seen': 5628755968, 'completed': '72.80% (2_684 / 3_687)', 'remaining time': '8:50:09', 'throughput': '8149.52', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:24:49,105 >> {'loss': 0.5008, 'grad_norm': 22.364627838134766, 'learning_rate': 2.6988069984305817e-07, 'epoch': 0.07729861676159479, 'num_input_tokens_seen': 5630853120, 'completed': '72.82% (2_685 / 3_687)', 'remaining time': '8:49:33', 'throughput': '8567.95', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:25:19,892 >> {'loss': 0.5733, 'grad_norm': 19.7006893157959, 'learning_rate': 2.6956487401703207e-07, 'epoch': 0.07756983997830215, 'num_input_tokens_seen': 5632950272, 'completed': '72.85% (2_686 / 3_687)', 'remaining time': '8:48:58', 'throughput': '8514.69', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:25:50,997 >> {'loss': 0.662, 'grad_norm': 22.176889419555664, 'learning_rate': 2.692492738745241e-07, 'epoch': 0.0778410631950095, 'num_input_tokens_seen': 5635047424, 'completed': '72.88% (2_687 / 3_687)', 'remaining time': '8:48:24', 'throughput': '8427.58', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:26:17,416 >> {'loss': 0.7364, 'grad_norm': 22.010507583618164, 'learning_rate': 2.689338996695171e-07, 'epoch': 0.07811228641171684, 'num_input_tokens_seen': 5637144576, 'completed': '72.90% (2_688 / 3_687)', 'remaining time': '8:47:34', 'throughput': '9922.55', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:26:47,422 >> {'loss': 0.3851, 'grad_norm': 13.696541786193848, 'learning_rate': 2.686187516558122e-07, 'epoch': 0.07838350962842419, 'num_input_tokens_seen': 5639241728, 'completed': '72.93% (2_689 / 3_687)', 'remaining time': '8:46:57', 'throughput': '8736.51', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:27:21,578 >> {'loss': 0.571, 'grad_norm': 27.36061668395996, 'learning_rate': 2.683038300870285e-07, 'epoch': 0.07865473284513154, 'num_input_tokens_seen': 5641338880, 'completed': '72.96% (2_690 / 3_687)', 'remaining time': '8:46:33', 'throughput': '7674.77', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:27:51,715 >> {'loss': 0.3617, 'grad_norm': 14.405625343322754, 'learning_rate': 2.679891352166032e-07, 'epoch': 0.07892595606183889, 'num_input_tokens_seen': 5643436032, 'completed': '72.99% (2_691 / 3_687)', 'remaining time': '8:45:56', 'throughput': '8698.40', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 17:28:22,486 >> {'loss': 0.552, 'grad_norm': 19.312789916992188, 'learning_rate': 2.6767466729779074e-07, 'epoch': 0.07919717927854625, 'num_input_tokens_seen':
5645533184, 'completed': '73.01% (2_692 / 3_687)', 'remaining time': '8:45:22', 'throughput': '8519.22', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:28:54,365 >> {'loss': 0.5278, 'grad_norm': 23.497774124145508, 'learning_rate': 2.6736042658366255e-07, 'epoch': 0.0794684024952536, 'num_input_tokens_seen': 5647630336, 'completed': '73.04% (2_693 / 3_687)', 'remaining time': '8:44:51', 'throughput': '8223.10', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:29:25,986 >> {'loss': 0.5603, 'grad_norm': 18.922243118286133, 'learning_rate': 2.6704641332710783e-07, 'epoch': 0.07973962571196094, 'num_input_tokens_seen': 5649727488, 'completed': '73.07% (2_694 / 3_687)', 'remaining time': '8:44:19', 'throughput': '8290.15', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:30:01,160 >> {'loss': 0.4909, 'grad_norm': 18.824480056762695, 'learning_rate': 2.6673262778083246e-07, 'epoch': 0.08001084892866829, 'num_input_tokens_seen': 5651824640, 'completed': '73.09% (2_695 / 3_687)', 'remaining time': '8:43:59', 'throughput': '7452.89', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:30:29,631 >> {'loss': 0.4259, 'grad_norm': 18.664011001586914, 'learning_rate': 2.6641907019735914e-07, 'epoch': 0.08028207214537564, 'num_input_tokens_seen': 5653921792, 'completed': '73.12% (2_696 / 3_687)', 'remaining time': '8:43:16', 'throughput': '9207.36', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:30:59,482 >> {'loss': 0.2075, 'grad_norm': 12.166492462158203, 'learning_rate': 2.6610574082902704e-07, 'epoch': 0.080553295362083, 'num_input_tokens_seen': 5656018944, 'completed': '73.15% (2_697 / 3_687)', 'remaining time': '8:42:39', 'throughput': '8781.70', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:31:29,714 >> {'loss': 0.5047, 'grad_norm': 17.41253662109375, 'learning_rate': 2.657926399279918e-07, 'epoch': 0.08082451857879035, 'num_input_tokens_seen': 5658116096, 'completed': '73.18% (2_698 / 
3_687)', 'remaining time': '8:42:02', 'throughput': '8671.25', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:32:00,251 >> {'loss': 0.2702, 'grad_norm': 12.47386646270752, 'learning_rate': 2.65479767746225e-07, 'epoch': 0.0810957417954977, 'num_input_tokens_seen': 5660213248, 'completed': '73.20% (2_699 / 3_687)', 'remaining time': '8:41:27', 'throughput': '8584.30', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:32:32,871 >> {'loss': 0.2429, 'grad_norm': 15.2111234664917, 'learning_rate': 2.651671245355144e-07, 'epoch': 0.08136696501220504, 'num_input_tokens_seen': 5662310400, 'completed': '73.23% (2_700 / 3_687)', 'remaining time': '8:40:58', 'throughput': '8036.42', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:33:06,060 >> {'loss': 0.4939, 'grad_norm': 20.175670623779297, 'learning_rate': 2.6485471054746315e-07, 'epoch': 0.08163818822891239, 'num_input_tokens_seen': 5664407552, 'completed': '73.26% (2_701 / 3_687)', 'remaining time': '8:40:31', 'throughput': '7898.41', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:33:36,667 >> {'loss': 0.4021, 'grad_norm': 15.249000549316406, 'learning_rate': 2.645425260334904e-07, 'epoch': 0.08190941144561975, 'num_input_tokens_seen': 5666504704, 'completed': '73.28% (2_702 / 3_687)', 'remaining time': '8:39:56', 'throughput': '8565.01', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:34:08,792 >> {'loss': 0.7402, 'grad_norm': 23.084077835083008, 'learning_rate': 2.6423057124483015e-07, 'epoch': 0.0821806346623271, 'num_input_tokens_seen': 5668601856, 'completed': '73.31% (2_703 / 3_687)', 'remaining time': '8:39:26', 'throughput': '8159.97', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:34:38,839 >> {'loss': 0.2521, 'grad_norm': 11.628067016601562, 'learning_rate': 2.6391884643253197e-07, 'epoch': 0.08245185787903445, 'num_input_tokens_seen': 5670699008, 'completed': '73.34% (2_704 / 3_687)', 'remaining time': '8:38:49', 'throughput': 
'8724.45', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:35:13,385 >> {'loss': 0.3268, 'grad_norm': 14.505975723266602, 'learning_rate': 2.6360735184745984e-07, 'epoch': 0.0827230810957418, 'num_input_tokens_seen': 5672796160, 'completed': '73.37% (2_705 / 3_687)', 'remaining time': '8:38:27', 'throughput': '7588.22', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:35:42,883 >> {'loss': 0.4597, 'grad_norm': 16.887388229370117, 'learning_rate': 2.6329608774029285e-07, 'epoch': 0.08299430431244914, 'num_input_tokens_seen': 5674893312, 'completed': '73.39% (2_706 / 3_687)', 'remaining time': '8:37:48', 'throughput': '8886.95', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:36:13,210 >> {'loss': 0.6038, 'grad_norm': 24.89675521850586, 'learning_rate': 2.6298505436152457e-07, 'epoch': 0.0832655275291565, 'num_input_tokens_seen': 5676990464, 'completed': '73.42% (2_707 / 3_687)', 'remaining time': '8:37:12', 'throughput': '8643.87', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:36:44,602 >> {'loss': 0.3616, 'grad_norm': 16.737106323242188, 'learning_rate': 2.626742519614629e-07, 'epoch': 0.08353675074586385, 'num_input_tokens_seen': 5679087616, 'completed': '73.45% (2_708 / 3_687)', 'remaining time': '8:36:40', 'throughput': '8350.71', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:37:15,327 >> {'loss': 0.4107, 'grad_norm': 15.908254623413086, 'learning_rate': 2.623636807902298e-07, 'epoch': 0.0838079739625712, 'num_input_tokens_seen': 5681184768, 'completed': '73.47% (2_709 / 3_687)', 'remaining time': '8:36:05', 'throughput': '8532.08', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:37:46,392 >> {'loss': 0.2693, 'grad_norm': 12.272135734558105, 'learning_rate': 2.620533410977609e-07, 'epoch': 0.08407919717927854, 'num_input_tokens_seen': 5683281920, 'completed': '73.50% (2_710 / 3_687)', 'remaining time': '8:35:31', 'throughput': '8438.48', 'gpu_mem_free': '5581MB'} 
[INFO|trainer.py:175] 2024-12-26 17:38:17,307 >> {'loss': 0.5119, 'grad_norm': 18.592920303344727, 'learning_rate': 2.617432331338059e-07, 'epoch': 0.08435042039598589, 'num_input_tokens_seen': 5685379072, 'completed': '73.53% (2_711 / 3_687)', 'remaining time': '8:34:57', 'throughput': '8479.36', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:38:46,892 >> {'loss': 0.6332, 'grad_norm': 21.81186294555664, 'learning_rate': 2.614333571479279e-07, 'epoch': 0.08462164361269324, 'num_input_tokens_seen': 5687476224, 'completed': '73.56% (2_712 / 3_687)', 'remaining time': '8:34:19', 'throughput': '8860.75', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:39:18,701 >> {'loss': 0.2644, 'grad_norm': 13.03216552734375, 'learning_rate': 2.6112371338950325e-07, 'epoch': 0.0848928668294006, 'num_input_tokens_seen': 5689573376, 'completed': '73.58% (2_713 / 3_687)', 'remaining time': '8:33:48', 'throughput': '8241.34', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:39:49,431 >> {'loss': 0.5831, 'grad_norm': 21.6326904296875, 'learning_rate': 2.608143021077215e-07, 'epoch': 0.08516409004610795, 'num_input_tokens_seen': 5691670528, 'completed': '73.61% (2_714 / 3_687)', 'remaining time': '8:33:14', 'throughput': '8530.46', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:40:20,825 >> {'loss': 0.3289, 'grad_norm': 13.137306213378906, 'learning_rate': 2.6050512355158503e-07, 'epoch': 0.0854353132628153, 'num_input_tokens_seen': 5693767680, 'completed': '73.64% (2_715 / 3_687)', 'remaining time': '8:32:41', 'throughput': '8350.03', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:40:50,522 >> {'loss': 0.687, 'grad_norm': 20.455364227294922, 'learning_rate': 2.601961779699091e-07, 'epoch': 0.08570653647952264, 'num_input_tokens_seen': 5695864832, 'completed': '73.66% (2_716 / 3_687)', 'remaining time': '8:32:04', 'throughput': '8827.48', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:41:20,890 >> 
{'loss': 0.3623, 'grad_norm': 14.189478874206543, 'learning_rate': 2.5988746561132136e-07, 'epoch': 0.08597775969622999, 'num_input_tokens_seen': 5697961984, 'completed': '73.69% (2_717 / 3_687)', 'remaining time': '8:31:28', 'throughput': '8632.26', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:41:54,548 >> {'loss': 0.3914, 'grad_norm': 14.868339538574219, 'learning_rate': 2.595789867242616e-07, 'epoch': 0.08624898291293735, 'num_input_tokens_seen': 5700059136, 'completed': '73.72% (2_718 / 3_687)', 'remaining time': '8:31:03', 'throughput': '7788.38', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:42:26,035 >> {'loss': 0.3688, 'grad_norm': 16.712444305419922, 'learning_rate': 2.5927074155698203e-07, 'epoch': 0.0865202061296447, 'num_input_tokens_seen': 5702156288, 'completed': '73.75% (2_719 / 3_687)', 'remaining time': '8:30:30', 'throughput': '8325.42', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:42:54,459 >> {'loss': 0.7762, 'grad_norm': 23.333152770996094, 'learning_rate': 2.589627303575465e-07, 'epoch': 0.08679142934635205, 'num_input_tokens_seen': 5704253440, 'completed': '73.77% (2_720 / 3_687)', 'remaining time': '8:29:49', 'throughput': '9222.63', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:43:23,589 >> {'loss': 0.4798, 'grad_norm': 21.890504837036133, 'learning_rate': 2.5865495337383103e-07, 'epoch': 0.0870626525630594, 'num_input_tokens_seen': 5706350592, 'completed': '73.80% (2_721 / 3_687)', 'remaining time': '8:29:10', 'throughput': '8999.15', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:43:55,720 >> {'loss': 0.4479, 'grad_norm': 17.646825790405273, 'learning_rate': 2.5834741085352223e-07, 'epoch': 0.08733387577976674, 'num_input_tokens_seen': 5708447744, 'completed': '73.83% (2_722 / 3_687)', 'remaining time': '8:28:40', 'throughput': '8158.53', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:44:21,440 >> {'loss': 0.6757, 'grad_norm': 
20.30681800842285, 'learning_rate': 2.5804010304411914e-07, 'epoch': 0.0876050989964741, 'num_input_tokens_seen': 5710544896, 'completed': '73.85% (2_723 / 3_687)', 'remaining time': '8:27:50', 'throughput': '10192.43', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:44:50,241 >> {'loss': 0.2059, 'grad_norm': 9.638075828552246, 'learning_rate': 2.5773303019293123e-07, 'epoch': 0.08787632221318145, 'num_input_tokens_seen': 5712642048, 'completed': '73.88% (2_724 / 3_687)', 'remaining time': '8:27:11', 'throughput': '9101.85', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:45:20,378 >> {'loss': 0.1925, 'grad_norm': 11.49023723602295, 'learning_rate': 2.5742619254707905e-07, 'epoch': 0.0881475454298888, 'num_input_tokens_seen': 5714739200, 'completed': '73.91% (2_725 / 3_687)', 'remaining time': '8:26:35', 'throughput': '8698.52', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:45:49,855 >> {'loss': 0.4855, 'grad_norm': 18.779539108276367, 'learning_rate': 2.5711959035349396e-07, 'epoch': 0.08841876864659615, 'num_input_tokens_seen': 5716836352, 'completed': '73.94% (2_726 / 3_687)', 'remaining time': '8:25:57', 'throughput': '8893.03', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:46:21,969 >> {'loss': 0.2486, 'grad_norm': 11.857897758483887, 'learning_rate': 2.568132238589175e-07, 'epoch': 0.0886899918633035, 'num_input_tokens_seen': 5718933504, 'completed': '73.96% (2_727 / 3_687)', 'remaining time': '8:25:27', 'throughput': '8162.90', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:46:51,592 >> {'loss': 0.4293, 'grad_norm': 20.438987731933594, 'learning_rate': 2.565070933099019e-07, 'epoch': 0.08896121508001086, 'num_input_tokens_seen': 5721030656, 'completed': '73.99% (2_728 / 3_687)', 'remaining time': '8:24:49', 'throughput': '8849.54', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:47:20,518 >> {'loss': 0.3677, 'grad_norm': 15.0324125289917, 'learning_rate': 
2.5620119895280935e-07, 'epoch': 0.0892324382967182, 'num_input_tokens_seen': 5723127808, 'completed': '74.02% (2_729 / 3_687)', 'remaining time': '8:24:10', 'throughput': '9062.31', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:47:53,774 >> {'loss': 0.3675, 'grad_norm': 15.559118270874023, 'learning_rate': 2.5589554103381195e-07, 'epoch': 0.08950366151342555, 'num_input_tokens_seen': 5725224960, 'completed': '74.04% (2_730 / 3_687)', 'remaining time': '8:23:43', 'throughput': '7882.80', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:48:27,155 >> {'loss': 0.425, 'grad_norm': 19.351423263549805, 'learning_rate': 2.5559011979889155e-07, 'epoch': 0.0897748847301329, 'num_input_tokens_seen': 5727322112, 'completed': '74.07% (2_731 / 3_687)', 'remaining time': '8:23:17', 'throughput': '7852.93', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:48:56,957 >> {'loss': 0.3441, 'grad_norm': 13.77000904083252, 'learning_rate': 2.552849354938399e-07, 'epoch': 0.09004610794684025, 'num_input_tokens_seen': 5729419264, 'completed': '74.10% (2_732 / 3_687)', 'remaining time': '8:22:40', 'throughput': '8796.31', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:49:26,885 >> {'loss': 0.3423, 'grad_norm': 13.648260116577148, 'learning_rate': 2.549799883642573e-07, 'epoch': 0.09031733116354759, 'num_input_tokens_seen': 5731516416, 'completed': '74.13% (2_733 / 3_687)', 'remaining time': '8:22:04', 'throughput': '8759.16', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:49:57,965 >> {'loss': 0.2816, 'grad_norm': 12.919384002685547, 'learning_rate': 2.5467527865555366e-07, 'epoch': 0.09058855438025495, 'num_input_tokens_seen': 5733613568, 'completed': '74.15% (2_734 / 3_687)', 'remaining time': '8:21:31', 'throughput': '8434.34', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:50:25,769 >> {'loss': 0.6129, 'grad_norm': 20.780710220336914, 'learning_rate': 2.5437080661294785e-07, 'epoch': 
0.0908597775969623, 'num_input_tokens_seen': 5735710720, 'completed': '74.18% (2_735 / 3_687)', 'remaining time': '8:20:49', 'throughput': '9428.36', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:50:56,304 >> {'loss': 0.5003, 'grad_norm': 22.01446533203125, 'learning_rate': 2.5406657248146735e-07, 'epoch': 0.09113100081366965, 'num_input_tokens_seen': 5737807872, 'completed': '74.21% (2_736 / 3_687)', 'remaining time': '8:20:14', 'throughput': '8585.18', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:51:28,603 >> {'loss': 0.4947, 'grad_norm': 19.538734436035156, 'learning_rate': 2.5376257650594823e-07, 'epoch': 0.091402224030377, 'num_input_tokens_seen': 5739905024, 'completed': '74.23% (2_737 / 3_687)', 'remaining time': '8:19:45', 'throughput': '8115.99', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:51:59,693 >> {'loss': 0.2547, 'grad_norm': 13.193866729736328, 'learning_rate': 2.5345881893103484e-07, 'epoch': 0.09167344724708434, 'num_input_tokens_seen': 5742002176, 'completed': '74.26% (2_738 / 3_687)', 'remaining time': '8:19:12', 'throughput': '8431.76', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:52:29,821 >> {'loss': 0.7383, 'grad_norm': 23.290098190307617, 'learning_rate': 2.5315530000117973e-07, 'epoch': 0.0919446704637917, 'num_input_tokens_seen': 5744099328, 'completed': '74.29% (2_739 / 3_687)', 'remaining time': '8:18:36', 'throughput': '8701.24', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:53:04,384 >> {'loss': 0.7584, 'grad_norm': 22.168949127197266, 'learning_rate': 2.528520199606434e-07, 'epoch': 0.09221589368049905, 'num_input_tokens_seen': 5746196480, 'completed': '74.32% (2_740 / 3_687)', 'remaining time': '8:18:13', 'throughput': '7584.46', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:53:36,716 >> {'loss': 0.6092, 'grad_norm': 20.419174194335938, 'learning_rate': 2.525489790534941e-07, 'epoch': 0.0924871168972064, 'num_input_tokens_seen': 
5748293632, 'completed': '74.34% (2_741 / 3_687)', 'remaining time': '8:17:44', 'throughput': '8107.87', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:54:06,721 >> {'loss': 0.6697, 'grad_norm': 23.462411880493164, 'learning_rate': 2.5224617752360766e-07, 'epoch': 0.09275834011391375, 'num_input_tokens_seen': 5750390784, 'completed': '74.37% (2_742 / 3_687)', 'remaining time': '8:17:08', 'throughput': '8736.63', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:54:37,346 >> {'loss': 0.4207, 'grad_norm': 15.432395935058594, 'learning_rate': 2.519436156146671e-07, 'epoch': 0.0930295633306211, 'num_input_tokens_seen': 5752487936, 'completed': '74.40% (2_743 / 3_687)', 'remaining time': '8:16:34', 'throughput': '8559.85', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:55:06,918 >> {'loss': 0.5068, 'grad_norm': 17.80773162841797, 'learning_rate': 2.51641293570163e-07, 'epoch': 0.09330078654732846, 'num_input_tokens_seen': 5754585088, 'completed': '74.42% (2_744 / 3_687)', 'remaining time': '8:15:57', 'throughput': '8864.56', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:55:39,912 >> {'loss': 0.2518, 'grad_norm': 13.011022567749023, 'learning_rate': 2.513392116333922e-07, 'epoch': 0.0935720097640358, 'num_input_tokens_seen': 5756682240, 'completed': '74.45% (2_745 / 3_687)', 'remaining time': '8:15:29', 'throughput': '7945.22', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:56:12,662 >> {'loss': 0.4552, 'grad_norm': 16.054059982299805, 'learning_rate': 2.510373700474592e-07, 'epoch': 0.09384323298074315, 'num_input_tokens_seen': 5758779392, 'completed': '74.48% (2_746 / 3_687)', 'remaining time': '8:15:01', 'throughput': '8004.46', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:56:46,456 >> {'loss': 0.2891, 'grad_norm': 13.27901840209961, 'learning_rate': 2.5073576905527407e-07, 'epoch': 0.0941144561974505, 'num_input_tokens_seen': 5760876544, 'completed': '74.51% (2_747 / 3_687)', 
'remaining time': '8:14:35', 'throughput': '7757.00', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:57:19,158 >> {'loss': 0.5392, 'grad_norm': 18.663637161254883, 'learning_rate': 2.5043440889955434e-07, 'epoch': 0.09438567941415785, 'num_input_tokens_seen': 5762973696, 'completed': '74.53% (2_748 / 3_687)', 'remaining time': '8:14:07', 'throughput': '8016.29', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:57:51,842 >> {'loss': 0.4573, 'grad_norm': 17.319080352783203, 'learning_rate': 2.501332898228232e-07, 'epoch': 0.09465690263086521, 'num_input_tokens_seen': 5765070848, 'completed': '74.56% (2_749 / 3_687)', 'remaining time': '8:13:38', 'throughput': '8020.58', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:58:23,418 >> {'loss': 0.3974, 'grad_norm': 17.932674407958984, 'learning_rate': 2.4983241206740945e-07, 'epoch': 0.09492812584757256, 'num_input_tokens_seen': 5767168000, 'completed': '74.59% (2_750 / 3_687)', 'remaining time': '8:13:06', 'throughput': '8301.81', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:58:55,031 >> {'loss': 0.5203, 'grad_norm': 16.8289737701416, 'learning_rate': 2.495317758754483e-07, 'epoch': 0.0951993490642799, 'num_input_tokens_seen': 5769265152, 'completed': '74.61% (2_751 / 3_687)', 'remaining time': '8:12:35', 'throughput': '8292.30', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:59:24,619 >> {'loss': 0.4871, 'grad_norm': 18.582918167114258, 'learning_rate': 2.492313814888802e-07, 'epoch': 0.09547057228098725, 'num_input_tokens_seen': 5771362304, 'completed': '74.64% (2_752 / 3_687)', 'remaining time': '8:11:58', 'throughput': '8859.98', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 17:59:55,997 >> {'loss': 0.3114, 'grad_norm': 13.084208488464355, 'learning_rate': 2.4893122914945124e-07, 'epoch': 0.0957417954976946, 'num_input_tokens_seen': 5773459456, 'completed': '74.67% (2_753 / 3_687)', 'remaining time': '8:11:26', 'throughput': 
'8354.37', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:00:23,545 >> {'loss': 0.4209, 'grad_norm': 23.180992126464844, 'learning_rate': 2.4863131909871247e-07, 'epoch': 0.09601301871440195, 'num_input_tokens_seen': 5775556608, 'completed': '74.69% (2_754 / 3_687)', 'remaining time': '8:10:44', 'throughput': '9515.89', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:00:53,515 >> {'loss': 0.5382, 'grad_norm': 19.62271499633789, 'learning_rate': 2.483316515780202e-07, 'epoch': 0.09628424193110931, 'num_input_tokens_seen': 5777653760, 'completed': '74.72% (2_755 / 3_687)', 'remaining time': '8:10:08', 'throughput': '8746.89', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:01:23,989 >> {'loss': 0.2602, 'grad_norm': 11.799657821655273, 'learning_rate': 2.480322268285354e-07, 'epoch': 0.09655546514781665, 'num_input_tokens_seen': 5779750912, 'completed': '74.75% (2_756 / 3_687)', 'remaining time': '8:09:34', 'throughput': '8602.22', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:01:55,289 >> {'loss': 0.3326, 'grad_norm': 14.03824234008789, 'learning_rate': 2.4773304509122374e-07, 'epoch': 0.096826688364524, 'num_input_tokens_seen': 5781848064, 'completed': '74.78% (2_757 / 3_687)', 'remaining time': '8:09:01', 'throughput': '8375.25', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:02:25,775 >> {'loss': 0.4274, 'grad_norm': 18.534460067749023, 'learning_rate': 2.474341066068553e-07, 'epoch': 0.09709791158123135, 'num_input_tokens_seen': 5783945216, 'completed': '74.80% (2_758 / 3_687)', 'remaining time': '8:08:27', 'throughput': '8598.64', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:02:56,914 >> {'loss': 0.681, 'grad_norm': 22.347103118896484, 'learning_rate': 2.4713541161600434e-07, 'epoch': 0.0973691347979387, 'num_input_tokens_seen': 5786042368, 'completed': '74.83% (2_759 / 3_687)', 'remaining time': '8:07:55', 'throughput': '8418.54', 'gpu_mem_free': '5581MB'} 
[INFO|trainer.py:175] 2024-12-26 18:03:28,715 >> {'loss': 0.4139, 'grad_norm': 17.200597763061523, 'learning_rate': 2.4683696035904926e-07, 'epoch': 0.09764035801464606, 'num_input_tokens_seen': 5788139520, 'completed': '74.86% (2_760 / 3_687)', 'remaining time': '8:07:24', 'throughput': '8243.27', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:03:58,169 >> {'loss': 0.2853, 'grad_norm': 24.39967918395996, 'learning_rate': 2.465387530761724e-07, 'epoch': 0.0979115812313534, 'num_input_tokens_seen': 5790236672, 'completed': '74.88% (2_761 / 3_687)', 'remaining time': '8:06:47', 'throughput': '8900.24', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:04:29,305 >> {'loss': 0.3234, 'grad_norm': 15.185997009277344, 'learning_rate': 2.462407900073594e-07, 'epoch': 0.09818280444806075, 'num_input_tokens_seen': 5792333824, 'completed': '74.91% (2_762 / 3_687)', 'remaining time': '8:06:14', 'throughput': '8419.18', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:04:57,991 >> {'loss': 0.6504, 'grad_norm': 19.6497745513916, 'learning_rate': 2.459430713923995e-07, 'epoch': 0.0984540276647681, 'num_input_tokens_seen': 5794430976, 'completed': '74.94% (2_763 / 3_687)', 'remaining time': '8:05:35', 'throughput': '9138.32', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:05:29,654 >> {'loss': 0.2229, 'grad_norm': 11.902987480163574, 'learning_rate': 2.4564559747088573e-07, 'epoch': 0.09872525088147545, 'num_input_tokens_seen': 5796528128, 'completed': '74.97% (2_764 / 3_687)', 'remaining time': '8:05:04', 'throughput': '8279.38', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:05:59,983 >> {'loss': 0.4959, 'grad_norm': 34.26702880859375, 'learning_rate': 2.4534836848221355e-07, 'epoch': 0.09899647409818281, 'num_input_tokens_seen': 5798625280, 'completed': '74.99% (2_765 / 3_687)', 'remaining time': '8:04:30', 'throughput': '8643.14', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:06:29,296 >> 
{'loss': 0.4531, 'grad_norm': 16.864425659179688, 'learning_rate': 2.450513846655817e-07, 'epoch': 0.09926769731489016, 'num_input_tokens_seen': 5800722432, 'completed': '75.02% (2_766 / 3_687)', 'remaining time': '8:03:52', 'throughput': '8943.10', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:07:00,561 >> {'loss': 0.4869, 'grad_norm': 15.229866981506348, 'learning_rate': 2.4475464625999113e-07, 'epoch': 0.0995389205315975, 'num_input_tokens_seen': 5802819584, 'completed': '75.05% (2_767 / 3_687)', 'remaining time': '8:03:20', 'throughput': '8384.67', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:07:30,503 >> {'loss': 0.3666, 'grad_norm': 13.635716438293457, 'learning_rate': 2.444581535042459e-07, 'epoch': 0.09981014374830485, 'num_input_tokens_seen': 5804916736, 'completed': '75.07% (2_768 / 3_687)', 'remaining time': '8:02:45', 'throughput': '8754.99', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:07:59,331 >> {'loss': 0.9287, 'grad_norm': 24.428754806518555, 'learning_rate': 2.441619066369519e-07, 'epoch': 0.1000813669650122, 'num_input_tokens_seen': 5807013888, 'completed': '75.10% (2_769 / 3_687)', 'remaining time': '8:02:07', 'throughput': '9093.18', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:08:29,486 >> {'loss': 0.5323, 'grad_norm': 19.71752166748047, 'learning_rate': 2.438659058965175e-07, 'epoch': 0.10035259018171956, 'num_input_tokens_seen': 5809111040, 'completed': '75.13% (2_770 / 3_687)', 'remaining time': '8:01:32', 'throughput': '8693.27', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:08:59,885 >> {'loss': 0.3992, 'grad_norm': 17.558313369750977, 'learning_rate': 2.435701515211527e-07, 'epoch': 0.10062381339842691, 'num_input_tokens_seen': 5811208192, 'completed': '75.16% (2_771 / 3_687)', 'remaining time': '8:00:58', 'throughput': '8623.62', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:09:29,614 >> {'loss': 0.3217, 'grad_norm': 15.69237995147705, 
'learning_rate': 2.4327464374886955e-07, 'epoch': 0.10089503661513426, 'num_input_tokens_seen': 5813305344, 'completed': '75.18% (2_772 / 3_687)', 'remaining time': '8:00:22', 'throughput': '8817.77', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:10:01,646 >> {'loss': 0.459, 'grad_norm': 20.842071533203125, 'learning_rate': 2.4297938281748124e-07, 'epoch': 0.1011662598318416, 'num_input_tokens_seen': 5815402496, 'completed': '75.21% (2_773 / 3_687)', 'remaining time': '7:59:51', 'throughput': '8183.63', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:10:35,297 >> {'loss': 0.5564, 'grad_norm': 19.98575782775879, 'learning_rate': 2.4268436896460267e-07, 'epoch': 0.10143748304854895, 'num_input_tokens_seen': 5817499648, 'completed': '75.24% (2_774 / 3_687)', 'remaining time': '7:59:25', 'throughput': '7790.57', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:11:02,943 >> {'loss': 0.6737, 'grad_norm': 18.332754135131836, 'learning_rate': 2.423896024276497e-07, 'epoch': 0.10170870626525631, 'num_input_tokens_seen': 5819596800, 'completed': '75.26% (2_775 / 3_687)', 'remaining time': '7:58:44', 'throughput': '9481.58', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:11:35,904 >> {'loss': 0.553, 'grad_norm': 22.827144622802734, 'learning_rate': 2.4209508344383926e-07, 'epoch': 0.10197992948196366, 'num_input_tokens_seen': 5821693952, 'completed': '75.29% (2_776 / 3_687)', 'remaining time': '7:58:16', 'throughput': '7953.08', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:12:04,781 >> {'loss': 0.6379, 'grad_norm': 21.232194900512695, 'learning_rate': 2.4180081225018906e-07, 'epoch': 0.10225115269867101, 'num_input_tokens_seen': 5823791104, 'completed': '75.32% (2_777 / 3_687)', 'remaining time': '7:57:39', 'throughput': '9077.98', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:12:33,553 >> {'loss': 0.7381, 'grad_norm': 19.167123794555664, 'learning_rate': 2.4150678908351744e-07, 
'epoch': 0.10252237591537836, 'num_input_tokens_seen': 5825888256, 'completed': '75.35% (2_778 / 3_687)', 'remaining time': '7:57:00', 'throughput': '9111.04', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:13:05,701 >> {'loss': 0.3415, 'grad_norm': 15.559260368347168, 'learning_rate': 2.4121301418044264e-07, 'epoch': 0.1027935991320857, 'num_input_tokens_seen': 5827985408, 'completed': '75.37% (2_779 / 3_687)', 'remaining time': '7:56:31', 'throughput': '8154.47', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:13:38,719 >> {'loss': 0.274, 'grad_norm': 12.264792442321777, 'learning_rate': 2.409194877773839e-07, 'epoch': 0.10306482234879305, 'num_input_tokens_seen': 5830082560, 'completed': '75.40% (2_780 / 3_687)', 'remaining time': '7:56:03', 'throughput': '7939.36', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:14:08,781 >> {'loss': 0.4599, 'grad_norm': 17.90496826171875, 'learning_rate': 2.4062621011056006e-07, 'epoch': 0.10333604556550041, 'num_input_tokens_seen': 5832179712, 'completed': '75.43% (2_781 / 3_687)', 'remaining time': '7:55:28', 'throughput': '8720.05', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:14:39,447 >> {'loss': 0.3293, 'grad_norm': 12.02820110321045, 'learning_rate': 2.4033318141598977e-07, 'epoch': 0.10360726878220776, 'num_input_tokens_seen': 5834276864, 'completed': '75.45% (2_782 / 3_687)', 'remaining time': '7:54:54', 'throughput': '8548.38', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:15:09,316 >> {'loss': 0.4602, 'grad_norm': 15.775694847106934, 'learning_rate': 2.4004040192949155e-07, 'epoch': 0.1038784919989151, 'num_input_tokens_seen': 5836374016, 'completed': '75.48% (2_783 / 3_687)', 'remaining time': '7:54:19', 'throughput': '8776.50', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:15:41,623 >> {'loss': 0.3823, 'grad_norm': 18.71192741394043, 'learning_rate': 2.3974787188668284e-07, 'epoch': 0.10414971521562245, 
'num_input_tokens_seen': 5838471168, 'completed': '75.51% (2_784 / 3_687)', 'remaining time': '7:53:50', 'throughput': '8114.09', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:16:10,598 >> {'loss': 0.5613, 'grad_norm': 20.695791244506836, 'learning_rate': 2.3945559152298085e-07, 'epoch': 0.1044209384323298, 'num_input_tokens_seen': 5840568320, 'completed': '75.54% (2_785 / 3_687)', 'remaining time': '7:53:12', 'throughput': '9047.41', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:16:42,921 >> {'loss': 0.3668, 'grad_norm': 14.568918228149414, 'learning_rate': 2.3916356107360174e-07, 'epoch': 0.10469216164903716, 'num_input_tokens_seen': 5842665472, 'completed': '75.56% (2_786 / 3_687)', 'remaining time': '7:52:43', 'throughput': '8109.93', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:17:13,728 >> {'loss': 0.518, 'grad_norm': 18.7004337310791, 'learning_rate': 2.388717807735605e-07, 'epoch': 0.10496338486574451, 'num_input_tokens_seen': 5844762624, 'completed': '75.59% (2_787 / 3_687)', 'remaining time': '7:52:10', 'throughput': '8509.39', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:17:45,508 >> {'loss': 0.2456, 'grad_norm': 11.687546730041504, 'learning_rate': 2.385802508576706e-07, 'epoch': 0.10523460808245186, 'num_input_tokens_seen': 5846859776, 'completed': '75.62% (2_788 / 3_687)', 'remaining time': '7:51:39', 'throughput': '8248.78', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:18:19,778 >> {'loss': 0.2584, 'grad_norm': 14.417143821716309, 'learning_rate': 2.382889715605447e-07, 'epoch': 0.1055058312991592, 'num_input_tokens_seen': 5848956928, 'completed': '75.64% (2_789 / 3_687)', 'remaining time': '7:51:14', 'throughput': '7649.25', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:18:50,384 >> {'loss': 0.2735, 'grad_norm': 14.506068229675293, 'learning_rate': 2.3799794311659286e-07, 'epoch': 0.10577705451586655, 'num_input_tokens_seen': 5851054080, 'completed': 
'75.67% (2_790 / 3_687)', 'remaining time': '7:50:40', 'throughput': '8565.33', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:19:22,342 >> {'loss': 0.5316, 'grad_norm': 30.506330490112305, 'learning_rate': 2.3770716576002383e-07, 'epoch': 0.10604827773257391, 'num_input_tokens_seen': 5853151232, 'completed': '75.70% (2_791 / 3_687)', 'remaining time': '7:50:10', 'throughput': '8202.76', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:19:51,492 >> {'loss': 0.5209, 'grad_norm': 18.837438583374023, 'learning_rate': 2.3741663972484416e-07, 'epoch': 0.10631950094928126, 'num_input_tokens_seen': 5855248384, 'completed': '75.73% (2_792 / 3_687)', 'remaining time': '7:49:33', 'throughput': '8992.73', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:20:21,917 >> {'loss': 0.2876, 'grad_norm': 13.7747163772583, 'learning_rate': 2.3712636524485817e-07, 'epoch': 0.10659072416598861, 'num_input_tokens_seen': 5857345536, 'completed': '75.75% (2_793 / 3_687)', 'remaining time': '7:48:59', 'throughput': '8616.25', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:20:51,666 >> {'loss': 0.2466, 'grad_norm': 13.69284439086914, 'learning_rate': 2.3683634255366773e-07, 'epoch': 0.10686194738269596, 'num_input_tokens_seen': 5859442688, 'completed': '75.78% (2_794 / 3_687)', 'remaining time': '7:48:24', 'throughput': '8811.87', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:21:19,182 >> {'loss': 0.5751, 'grad_norm': 23.49444007873535, 'learning_rate': 2.3654657188467203e-07, 'epoch': 0.1071331705994033, 'num_input_tokens_seen': 5861539840, 'completed': '75.81% (2_795 / 3_687)', 'remaining time': '7:47:43', 'throughput': '9526.86', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:21:51,632 >> {'loss': 0.4901, 'grad_norm': 32.023529052734375, 'learning_rate': 2.3625705347106748e-07, 'epoch': 0.10740439381611067, 'num_input_tokens_seen': 5863636992, 'completed': '75.83% (2_796 / 3_687)', 'remaining time': 
'7:47:14', 'throughput': '8078.47', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:22:22,652 >> {'loss': 0.4605, 'grad_norm': 18.634899139404297, 'learning_rate': 2.3596778754584752e-07, 'epoch': 0.10767561703281801, 'num_input_tokens_seen': 5865734144, 'completed': '75.86% (2_797 / 3_687)', 'remaining time': '7:46:42', 'throughput': '8450.76', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:22:54,990 >> {'loss': 0.4397, 'grad_norm': 17.947357177734375, 'learning_rate': 2.356787743418023e-07, 'epoch': 0.10794684024952536, 'num_input_tokens_seen': 5867831296, 'completed': '75.89% (2_798 / 3_687)', 'remaining time': '7:46:12', 'throughput': '8106.30', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:23:24,920 >> {'loss': 0.2498, 'grad_norm': 10.587057113647461, 'learning_rate': 2.3539001409151867e-07, 'epoch': 0.10821806346623271, 'num_input_tokens_seen': 5869928448, 'completed': '75.92% (2_799 / 3_687)', 'remaining time': '7:45:37', 'throughput': '8758.46', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:23:54,857 >> {'loss': 0.4346, 'grad_norm': 15.883231163024902, 'learning_rate': 2.351015070273799e-07, 'epoch': 0.10848928668294006, 'num_input_tokens_seen': 5872025600, 'completed': '75.94% (2_800 / 3_687)', 'remaining time': '7:45:03', 'throughput': '8756.70', 'gpu_mem_free': '5581MB'} /scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:689: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
warnings.warn( [INFO|trainer.py:3503] 2024-12-26 18:24:17,913 >> Saving model checkpoint to /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-2800 [INFO|configuration_utils.py:472] 2024-12-26 18:24:17,918 >> Configuration saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-2800/config.json [INFO|configuration_utils.py:807] 2024-12-26 18:24:17,920 >> Configuration saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-2800/generation_config.json [INFO|modeling_utils.py:2807] 2024-12-26 18:25:14,975 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 7 checkpoint shards. You can find where each parameters has been saved in the index located at /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-2800/model.safetensors.index.json. [INFO|tokenization_utils_base.py:2684] 2024-12-26 18:25:14,978 >> tokenizer config file saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-2800/tokenizer_config.json [INFO|tokenization_utils_base.py:2693] 2024-12-26 18:25:14,979 >> Special tokens file saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-2800/special_tokens_map.json 
[WARNING|trainer.py:868] 2024-12-26 18:28:57,214 >> Save streaming dataset state: {'epoch': 0, 'sample_in_epoch': 5600, 'num_canonical_nodes': 1, 'shuffle_seed': 42, 'initial_physical_nodes': 1} /scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/torch/utils/checkpoint.py:1399: FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead. with device_autocast_ctx, torch.cpu.amp.autocast(**cpu_autocast_kwargs), recompute_context: # type: ignore[attr-defined] [INFO|trainer.py:175] 2024-12-26 18:29:31,102 >> {'loss': 0.7427, 'grad_norm': 22.052234649658203, 'learning_rate': 2.3481325338156568e-07, 'epoch': 0.1087605098996474, 'num_input_tokens_seen': 5874122752, 'completed': '75.97% (2_801 / 3_687)', 'remaining time': '7:55:44', 'throughput': '779.62', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:30:00,266 >> {'loss': 0.2588, 'grad_norm': 15.02530288696289, 'learning_rate': 2.3452525338605126e-07, 'epoch': 0.10903173311635476, 'num_input_tokens_seen': 5876219904, 'completed': '76.00% (2_802 / 3_687)', 'remaining time': '7:55:06', 'throughput': '8988.74', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:30:32,029 >> {'loss': 0.2948, 'grad_norm': 15.054925918579102, 'learning_rate': 2.3423750727260813e-07, 'epoch': 0.10930295633306211, 'num_input_tokens_seen': 5878317056, 'completed': '76.02% (2_803 / 3_687)', 'remaining time': '7:54:32', 'throughput': '8253.02', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:31:01,816 >> {'loss': 0.2361, 'grad_norm': 12.671406745910645, 'learning_rate': 2.3395001527280343e-07, 'epoch': 0.10957417954976946, 'num_input_tokens_seen': 5880414208, 'completed': '76.05% (2_804 / 
3_687)', 'remaining time': '7:53:55', 'throughput': '8800.85', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:31:32,391 >> {'loss': 0.663, 'grad_norm': 20.93232536315918, 'learning_rate': 2.3366277761799993e-07, 'epoch': 0.10984540276647681, 'num_input_tokens_seen': 5882511360, 'completed': '76.08% (2_805 / 3_687)', 'remaining time': '7:53:19', 'throughput': '8573.53', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:32:04,969 >> {'loss': 0.5558, 'grad_norm': 18.05968475341797, 'learning_rate': 2.333757945393557e-07, 'epoch': 0.11011662598318415, 'num_input_tokens_seen': 5884608512, 'completed': '76.11% (2_806 / 3_687)', 'remaining time': '7:52:48', 'throughput': '8046.67', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:32:38,098 >> {'loss': 0.4106, 'grad_norm': 15.030449867248535, 'learning_rate': 2.3308906626782333e-07, 'epoch': 0.11038784919989152, 'num_input_tokens_seen': 5886705664, 'completed': '76.13% (2_807 / 3_687)', 'remaining time': '7:52:18', 'throughput': '7912.83', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:33:12,940 >> {'loss': 0.5442, 'grad_norm': 20.000356674194336, 'learning_rate': 2.3280259303415115e-07, 'epoch': 0.11065907241659886, 'num_input_tokens_seen': 5888802816, 'completed': '76.16% (2_808 / 3_687)', 'remaining time': '7:51:51', 'throughput': '7523.97', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:33:43,006 >> {'loss': 0.1988, 'grad_norm': 10.596423149108887, 'learning_rate': 2.3251637506888178e-07, 'epoch': 0.11093029563330621, 'num_input_tokens_seen': 5890899968, 'completed': '76.19% (2_809 / 3_687)', 'remaining time': '7:51:14', 'throughput': '8718.92', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:34:11,572 >> {'loss': 0.3359, 'grad_norm': 13.657599449157715, 'learning_rate': 2.3223041260235265e-07, 'epoch': 0.11120151885001356, 'num_input_tokens_seen': 5892997120, 'completed': '76.21% (2_810 / 3_687)', 'remaining time': '7:50:34', 
'throughput': '9176.67', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:34:42,907 >> {'loss': 0.218, 'grad_norm': 12.095483779907227, 'learning_rate': 2.3194470586469543e-07, 'epoch': 0.1114727420667209, 'num_input_tokens_seen': 5895094272, 'completed': '76.24% (2_811 / 3_687)', 'remaining time': '7:50:00', 'throughput': '8365.98', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:35:12,547 >> {'loss': 0.3819, 'grad_norm': 16.444936752319336, 'learning_rate': 2.3165925508583598e-07, 'epoch': 0.11174396528342827, 'num_input_tokens_seen': 5897191424, 'completed': '76.27% (2_812 / 3_687)', 'remaining time': '7:49:23', 'throughput': '8844.15', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:35:43,628 >> {'loss': 0.5105, 'grad_norm': 20.455156326293945, 'learning_rate': 2.313740604954943e-07, 'epoch': 0.11201518850013562, 'num_input_tokens_seen': 5899288576, 'completed': '76.30% (2_813 / 3_687)', 'remaining time': '7:48:48', 'throughput': '8434.30', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:36:16,322 >> {'loss': 0.3829, 'grad_norm': 15.869861602783203, 'learning_rate': 2.3108912232318413e-07, 'epoch': 0.11228641171684296, 'num_input_tokens_seen': 5901385728, 'completed': '76.32% (2_814 / 3_687)', 'remaining time': '7:48:17', 'throughput': '8018.01', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:36:47,693 >> {'loss': 0.4095, 'grad_norm': 13.574760437011719, 'learning_rate': 2.3080444079821284e-07, 'epoch': 0.11255763493355031, 'num_input_tokens_seen': 5903482880, 'completed': '76.35% (2_815 / 3_687)', 'remaining time': '7:47:43', 'throughput': '8356.17', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:37:19,980 >> {'loss': 0.8766, 'grad_norm': 27.635112762451172, 'learning_rate': 2.305200161496813e-07, 'epoch': 0.11282885815025766, 'num_input_tokens_seen': 5905580032, 'completed': '76.38% (2_816 / 3_687)', 'remaining time': '7:47:11', 'throughput': '8119.33', 'gpu_mem_free': 
'5581MB'} [INFO|trainer.py:175] 2024-12-26 18:37:48,792 >> {'loss': 0.6796, 'grad_norm': 20.717670440673828, 'learning_rate': 2.3023584860648364e-07, 'epoch': 0.11310008136696502, 'num_input_tokens_seen': 5907677184, 'completed': '76.40% (2_817 / 3_687)', 'remaining time': '7:46:32', 'throughput': '9098.27', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:38:19,693 >> {'loss': 0.469, 'grad_norm': 17.95741081237793, 'learning_rate': 2.2995193839730727e-07, 'epoch': 0.11337130458367237, 'num_input_tokens_seen': 5909774336, 'completed': '76.43% (2_818 / 3_687)', 'remaining time': '7:45:57', 'throughput': '8483.38', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:38:48,933 >> {'loss': 0.3544, 'grad_norm': 15.287071228027344, 'learning_rate': 2.2966828575063196e-07, 'epoch': 0.11364252780037971, 'num_input_tokens_seen': 5911871488, 'completed': '76.46% (2_819 / 3_687)', 'remaining time': '7:45:19', 'throughput': '8965.35', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:39:19,664 >> {'loss': 0.4254, 'grad_norm': 17.377723693847656, 'learning_rate': 2.2938489089473065e-07, 'epoch': 0.11391375101708706, 'num_input_tokens_seen': 5913968640, 'completed': '76.48% (2_820 / 3_687)', 'remaining time': '7:44:44', 'throughput': '8530.21', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:39:51,255 >> {'loss': 0.7153, 'grad_norm': 21.972118377685547, 'learning_rate': 2.2910175405766896e-07, 'epoch': 0.11418497423379441, 'num_input_tokens_seen': 5916065792, 'completed': '76.51% (2_821 / 3_687)', 'remaining time': '7:44:10', 'throughput': '8298.11', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:40:21,827 >> {'loss': 0.4351, 'grad_norm': 21.45004653930664, 'learning_rate': 2.288188754673045e-07, 'epoch': 0.11445619745050176, 'num_input_tokens_seen': 5918162944, 'completed': '76.54% (2_822 / 3_687)', 'remaining time': '7:43:35', 'throughput': '8574.52', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 
18:40:52,063 >> {'loss': 0.6523, 'grad_norm': 20.235944747924805, 'learning_rate': 2.2853625535128735e-07, 'epoch': 0.11472742066720912, 'num_input_tokens_seen': 5920260096, 'completed': '76.57% (2_823 / 3_687)', 'remaining time': '7:42:59', 'throughput': '8670.11', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:41:22,605 >> {'loss': 0.2849, 'grad_norm': 12.907526016235352, 'learning_rate': 2.2825389393705922e-07, 'epoch': 0.11499864388391647, 'num_input_tokens_seen': 5922357248, 'completed': '76.59% (2_824 / 3_687)', 'remaining time': '7:42:23', 'throughput': '8583.00', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:41:54,885 >> {'loss': 0.5741, 'grad_norm': 19.846851348876953, 'learning_rate': 2.2797179145185384e-07, 'epoch': 0.11526986710062381, 'num_input_tokens_seen': 5924454400, 'completed': '76.62% (2_825 / 3_687)', 'remaining time': '7:41:52', 'throughput': '8120.94', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:42:27,929 >> {'loss': 0.2806, 'grad_norm': 14.583490371704102, 'learning_rate': 2.2768994812269666e-07, 'epoch': 0.11554109031733116, 'num_input_tokens_seen': 5926551552, 'completed': '76.65% (2_826 / 3_687)', 'remaining time': '7:41:21', 'throughput': '7933.14', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:43:02,206 >> {'loss': 0.3765, 'grad_norm': 16.440160751342773, 'learning_rate': 2.2740836417640432e-07, 'epoch': 0.11581231353403851, 'num_input_tokens_seen': 5928648704, 'completed': '76.67% (2_827 / 3_687)', 'remaining time': '7:40:53', 'throughput': '7647.84', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:43:33,541 >> {'loss': 0.6938, 'grad_norm': 22.054208755493164, 'learning_rate': 2.271270398395851e-07, 'epoch': 0.11608353675074587, 'num_input_tokens_seen': 5930745856, 'completed': '76.70% (2_828 / 3_687)', 'remaining time': '7:40:20', 'throughput': '8365.95', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:44:01,804 >> {'loss': 0.406, 'grad_norm': 
13.32007122039795, 'learning_rate': 2.2684597533863793e-07, 'epoch': 0.11635475996745322, 'num_input_tokens_seen': 5932843008, 'completed': '76.73% (2_829 / 3_687)', 'remaining time': '7:39:40', 'throughput': '9275.04', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:44:34,218 >> {'loss': 0.2882, 'grad_norm': 13.135971069335938, 'learning_rate': 2.2656517089975286e-07, 'epoch': 0.11662598318416056, 'num_input_tokens_seen': 5934940160, 'completed': '76.76% (2_830 / 3_687)', 'remaining time': '7:39:08', 'throughput': '8087.37', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:45:05,186 >> {'loss': 0.4794, 'grad_norm': 23.185508728027344, 'learning_rate': 2.2628462674891062e-07, 'epoch': 0.11689720640086791, 'num_input_tokens_seen': 5937037312, 'completed': '76.78% (2_831 / 3_687)', 'remaining time': '7:38:33', 'throughput': '8464.94', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:45:34,348 >> {'loss': 0.4259, 'grad_norm': 16.763158798217773, 'learning_rate': 2.2600434311188253e-07, 'epoch': 0.11716842961757526, 'num_input_tokens_seen': 5939134464, 'completed': '76.81% (2_832 / 3_687)', 'remaining time': '7:37:55', 'throughput': '8989.21', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:46:07,055 >> {'loss': 0.5011, 'grad_norm': 28.8845272064209, 'learning_rate': 2.2572432021423018e-07, 'epoch': 0.11743965283428262, 'num_input_tokens_seen': 5941231616, 'completed': '76.84% (2_833 / 3_687)', 'remaining time': '7:37:24', 'throughput': '8015.09', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:46:34,006 >> {'loss': 0.6964, 'grad_norm': 21.761306762695312, 'learning_rate': 2.2544455828130528e-07, 'epoch': 0.11771087605098997, 'num_input_tokens_seen': 5943328768, 'completed': '76.86% (2_834 / 3_687)', 'remaining time': '7:36:42', 'throughput': '9726.40', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:47:06,145 >> {'loss': 0.5045, 'grad_norm': 18.940256118774414, 'learning_rate': 
2.251650575382498e-07, 'epoch': 0.11798209926769732, 'num_input_tokens_seen': 5945425920, 'completed': '76.89% (2_835 / 3_687)', 'remaining time': '7:36:10', 'throughput': '8156.64', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:47:33,536 >> {'loss': 0.4833, 'grad_norm': 16.255231857299805, 'learning_rate': 2.2488581820999481e-07, 'epoch': 0.11825332248440466, 'num_input_tokens_seen': 5947523072, 'completed': '76.92% (2_836 / 3_687)', 'remaining time': '7:35:29', 'throughput': '9570.51', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:48:03,458 >> {'loss': 0.678, 'grad_norm': 19.86233139038086, 'learning_rate': 2.2460684052126195e-07, 'epoch': 0.11852454570111201, 'num_input_tokens_seen': 5949620224, 'completed': '76.95% (2_837 / 3_687)', 'remaining time': '7:34:52', 'throughput': '8760.84', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:48:31,465 >> {'loss': 0.5081, 'grad_norm': 17.203176498413086, 'learning_rate': 2.243281246965617e-07, 'epoch': 0.11879576891781937, 'num_input_tokens_seen': 5951717376, 'completed': '76.97% (2_838 / 3_687)', 'remaining time': '7:34:12', 'throughput': '9360.03', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:49:03,004 >> {'loss': 0.2694, 'grad_norm': 11.493597984313965, 'learning_rate': 2.24049670960194e-07, 'epoch': 0.11906699213452672, 'num_input_tokens_seen': 5953814528, 'completed': '77.00% (2_839 / 3_687)', 'remaining time': '7:33:39', 'throughput': '8311.83', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:49:31,947 >> {'loss': 0.3385, 'grad_norm': 15.567400932312012, 'learning_rate': 2.2377147953624776e-07, 'epoch': 0.11933821535123407, 'num_input_tokens_seen': 5955911680, 'completed': '77.03% (2_840 / 3_687)', 'remaining time': '7:33:01', 'throughput': '9057.28', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:50:01,155 >> {'loss': 0.5221, 'grad_norm': 20.829069137573242, 'learning_rate': 2.2349355064860104e-07, 'epoch': 
0.11960943856794141, 'num_input_tokens_seen': 5958008832, 'completed': '77.05% (2_841 / 3_687)', 'remaining time': '7:32:23', 'throughput': '8974.98', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:50:32,793 >> {'loss': 0.6818, 'grad_norm': 25.062725067138672, 'learning_rate': 2.2321588452092006e-07, 'epoch': 0.11988066178464876, 'num_input_tokens_seen': 5960105984, 'completed': '77.08% (2_842 / 3_687)', 'remaining time': '7:31:50', 'throughput': '8285.73', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:51:03,908 >> {'loss': 0.2983, 'grad_norm': 12.187406539916992, 'learning_rate': 2.229384813766602e-07, 'epoch': 0.12015188500135611, 'num_input_tokens_seen': 5962203136, 'completed': '77.11% (2_843 / 3_687)', 'remaining time': '7:31:16', 'throughput': '8425.06', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:51:33,847 >> {'loss': 0.3099, 'grad_norm': 12.41606616973877, 'learning_rate': 2.2266134143906496e-07, 'epoch': 0.12042310821806347, 'num_input_tokens_seen': 5964300288, 'completed': '77.14% (2_844 / 3_687)', 'remaining time': '7:30:40', 'throughput': '8755.82', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:52:09,364 >> {'loss': 0.3705, 'grad_norm': 15.00600528717041, 'learning_rate': 2.2238446493116572e-07, 'epoch': 0.12069433143477082, 'num_input_tokens_seen': 5966397440, 'completed': '77.16% (2_845 / 3_687)', 'remaining time': '7:30:15', 'throughput': '7380.76', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:52:40,364 >> {'loss': 0.5625, 'grad_norm': 19.592304229736328, 'learning_rate': 2.2210785207578275e-07, 'epoch': 0.12096555465147817, 'num_input_tokens_seen': 5968494592, 'completed': '77.19% (2_846 / 3_687)', 'remaining time': '7:29:41', 'throughput': '8456.41', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:53:10,416 >> {'loss': 0.2395, 'grad_norm': 11.538047790527344, 'learning_rate': 2.2183150309552308e-07, 'epoch': 0.12123677786818551, 'num_input_tokens_seen': 
5970591744, 'completed': '77.22% (2_847 / 3_687)', 'remaining time': '7:29:05', 'throughput': '8723.00', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:53:41,883 >> {'loss': 0.4497, 'grad_norm': 17.946090698242188, 'learning_rate': 2.2155541821278196e-07, 'epoch': 0.12150800108489286, 'num_input_tokens_seen': 5972688896, 'completed': '77.24% (2_848 / 3_687)', 'remaining time': '7:28:31', 'throughput': '8330.62', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:54:13,443 >> {'loss': 0.2957, 'grad_norm': 13.891773223876953, 'learning_rate': 2.2127959764974203e-07, 'epoch': 0.12177922430160022, 'num_input_tokens_seen': 5974786048, 'completed': '77.27% (2_849 / 3_687)', 'remaining time': '7:27:58', 'throughput': '8306.25', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:54:44,101 >> {'loss': 0.4703, 'grad_norm': 17.93677520751953, 'learning_rate': 2.2100404162837317e-07, 'epoch': 0.12205044751830757, 'num_input_tokens_seen': 5976883200, 'completed': '77.30% (2_850 / 3_687)', 'remaining time': '7:27:24', 'throughput': '8550.60', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:55:13,925 >> {'loss': 0.4836, 'grad_norm': 21.524486541748047, 'learning_rate': 2.2072875037043232e-07, 'epoch': 0.12232167073501492, 'num_input_tokens_seen': 5978980352, 'completed': '77.33% (2_851 / 3_687)', 'remaining time': '7:26:47', 'throughput': '8789.69', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:55:44,352 >> {'loss': 0.3129, 'grad_norm': 12.988191604614258, 'learning_rate': 2.204537240974636e-07, 'epoch': 0.12259289395172226, 'num_input_tokens_seen': 5981077504, 'completed': '77.35% (2_852 / 3_687)', 'remaining time': '7:26:12', 'throughput': '8615.60', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:56:19,415 >> {'loss': 0.5311, 'grad_norm': 20.595354080200195, 'learning_rate': 2.2017896303079743e-07, 'epoch': 0.12286411716842961, 'num_input_tokens_seen': 5983174656, 'completed': '77.38% (2_853 / 
3_687)', 'remaining time': '7:25:46', 'throughput': '7476.24', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:56:51,543 >> {'loss': 0.4373, 'grad_norm': 22.666582107543945, 'learning_rate': 2.1990446739155128e-07, 'epoch': 0.12313534038513697, 'num_input_tokens_seen': 5985271808, 'completed': '77.41% (2_854 / 3_687)', 'remaining time': '7:25:14', 'throughput': '8159.41', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:57:23,739 >> {'loss': 0.2112, 'grad_norm': 11.09825611114502, 'learning_rate': 2.1963023740062864e-07, 'epoch': 0.12340656360184432, 'num_input_tokens_seen': 5987368960, 'completed': '77.43% (2_855 / 3_687)', 'remaining time': '7:24:42', 'throughput': '8142.20', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:57:55,716 >> {'loss': 0.2525, 'grad_norm': 11.390056610107422, 'learning_rate': 2.1935627327871948e-07, 'epoch': 0.12367778681855167, 'num_input_tokens_seen': 5989466112, 'completed': '77.46% (2_856 / 3_687)', 'remaining time': '7:24:10', 'throughput': '8198.01', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:58:25,786 >> {'loss': 0.4179, 'grad_norm': 12.461858749389648, 'learning_rate': 2.1908257524629963e-07, 'epoch': 0.12394901003525902, 'num_input_tokens_seen': 5991563264, 'completed': '77.49% (2_857 / 3_687)', 'remaining time': '7:23:34', 'throughput': '8717.54', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:58:59,853 >> {'loss': 0.5047, 'grad_norm': 17.467121124267578, 'learning_rate': 2.1880914352363106e-07, 'epoch': 0.12422023325196636, 'num_input_tokens_seen': 5993660416, 'completed': '77.52% (2_858 / 3_687)', 'remaining time': '7:23:06', 'throughput': '7694.94', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:59:27,186 >> {'loss': 0.5304, 'grad_norm': 18.427640914916992, 'learning_rate': 2.1853597833076088e-07, 'epoch': 0.12449145646867373, 'num_input_tokens_seen': 5995757568, 'completed': '77.54% (2_859 / 3_687)', 'remaining time': '7:22:25', 
'throughput': '9590.93', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 18:59:56,849 >> {'loss': 0.4055, 'grad_norm': 13.977510452270508, 'learning_rate': 2.1826307988752212e-07, 'epoch': 0.12476267968538107, 'num_input_tokens_seen': 5997854720, 'completed': '77.57% (2_860 / 3_687)', 'remaining time': '7:21:49', 'throughput': '8837.52', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 19:00:26,866 >> {'loss': 0.3931, 'grad_norm': 15.963011741638184, 'learning_rate': 2.179904484135331e-07, 'epoch': 0.1250339029020884, 'num_input_tokens_seen': 5999951872, 'completed': '77.60% (2_861 / 3_687)', 'remaining time': '7:21:13', 'throughput': '8733.12', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 19:00:55,903 >> {'loss': 0.4642, 'grad_norm': 17.05299186706543, 'learning_rate': 2.177180841281974e-07, 'epoch': 0.12530512611879577, 'num_input_tokens_seen': 6002049024, 'completed': '77.62% (2_862 / 3_687)', 'remaining time': '7:20:35', 'throughput': '9027.74', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 19:01:25,831 >> {'loss': 0.425, 'grad_norm': 15.496081352233887, 'learning_rate': 2.1744598725070347e-07, 'epoch': 0.12557634933550313, 'num_input_tokens_seen': 6004146176, 'completed': '77.65% (2_863 / 3_687)', 'remaining time': '7:20:00', 'throughput': '8759.29', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 19:01:56,515 >> {'loss': 0.5045, 'grad_norm': 18.021099090576172, 'learning_rate': 2.1717415800002425e-07, 'epoch': 0.12584757255221046, 'num_input_tokens_seen': 6006243328, 'completed': '77.68% (2_864 / 3_687)', 'remaining time': '7:19:25', 'throughput': '8543.22', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 19:02:24,797 >> {'loss': 0.5096, 'grad_norm': 16.96639633178711, 'learning_rate': 2.1690259659491768e-07, 'epoch': 0.12611879576891782, 'num_input_tokens_seen': 6008340480, 'completed': '77.71% (2_865 / 3_687)', 'remaining time': '7:18:47', 'throughput': '9269.20', 'gpu_mem_free': 
'5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:02:54,454 >> {'loss': 0.3753, 'grad_norm': 14.744760513305664, 'learning_rate': 2.166313032539261e-07, 'epoch': 0.12639001898562516, 'num_input_tokens_seen': 6010437632, 'completed': '77.73% (2_866 / 3_687)', 'remaining time': '7:18:10', 'throughput': '8839.08', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:03:21,949 >> {'loss': 0.4849, 'grad_norm': 16.412687301635742, 'learning_rate': 2.1636027819537605e-07, 'epoch': 0.12666124220233252, 'num_input_tokens_seen': 6012534784, 'completed': '77.76% (2_867 / 3_687)', 'remaining time': '7:17:30', 'throughput': '9534.35', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:03:54,229 >> {'loss': 0.2672, 'grad_norm': 14.01508617401123, 'learning_rate': 2.160895216373781e-07, 'epoch': 0.12693246541903988, 'num_input_tokens_seen': 6014631936, 'completed': '77.79% (2_868 / 3_687)', 'remaining time': '7:16:59', 'throughput': '8120.86', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:04:24,807 >> {'loss': 0.2023, 'grad_norm': 9.786606788635254, 'learning_rate': 2.15819033797827e-07, 'epoch': 0.12720368863574721, 'num_input_tokens_seen': 6016729088, 'completed': '77.81% (2_869 / 3_687)', 'remaining time': '7:16:24', 'throughput': '8572.84', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:04:55,462 >> {'loss': 0.4128, 'grad_norm': 15.194939613342285, 'learning_rate': 2.1554881489440092e-07, 'epoch': 0.12747491185245458, 'num_input_tokens_seen': 6018826240, 'completed': '77.84% (2_870 / 3_687)', 'remaining time': '7:15:50', 'throughput': '8551.66', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:05:25,466 >> {'loss': 0.5414, 'grad_norm': 24.29368019104004, 'learning_rate': 2.1527886514456178e-07, 'epoch': 0.1277461350691619, 'num_input_tokens_seen': 6020923392, 'completed': '77.87% (2_871 / 3_687)', 'remaining time': '7:15:14', 'throughput': '8736.83', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:05:55,799 >> {'loss': 0.3288, 'grad_norm': 14.61825180053711, 'learning_rate': 2.150091847655549e-07, 'epoch': 0.12801735828586927, 'num_input_tokens_seen': 6023020544, 'completed': '77.90% (2_872 / 3_687)', 'remaining time': '7:14:40', 'throughput': '8642.21', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:06:29,150 >> {'loss': 0.1956, 'grad_norm': 10.37586784362793, 'learning_rate': 2.1473977397440878e-07, 'epoch': 0.12828858150257663, 'num_input_tokens_seen': 6025117696, 'completed': '77.92% (2_873 / 3_687)', 'remaining time': '7:14:10', 'throughput': '7860.21', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:07:00,823 >> {'loss': 0.5343, 'grad_norm': 17.528772354125977, 'learning_rate': 2.14470632987935e-07, 'epoch': 0.12855980471928397, 'num_input_tokens_seen': 6027214848, 'completed': '77.95% (2_874 / 3_687)', 'remaining time': '7:13:37', 'throughput': '8276.60', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:07:31,595 >> {'loss': 0.193, 'grad_norm': 13.622241020202637, 'learning_rate': 2.1420176202272805e-07, 'epoch': 0.12883102793599133, 'num_input_tokens_seen': 6029312000, 'completed': '77.98% (2_875 / 3_687)', 'remaining time': '7:13:03', 'throughput': '8518.92', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:08:01,357 >> {'loss': 0.448, 'grad_norm': 17.937719345092773, 'learning_rate': 2.1393316129516482e-07, 'epoch': 0.12910225115269866, 'num_input_tokens_seen': 6031409152, 'completed': '78.00% (2_876 / 3_687)', 'remaining time': '7:12:27', 'throughput': '8807.81', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:08:35,938 >> {'loss': 0.3603, 'grad_norm': 19.698455810546875, 'learning_rate': 2.1366483102140497e-07, 'epoch': 0.12937347436940602, 'num_input_tokens_seen': 6033506304, 'completed': '78.03% (2_877 / 3_687)', 'remaining time': '7:12:00', 'throughput': '7580.62', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:09:06,362 >> {'loss': 0.3772, 'grad_norm': 15.03511905670166, 'learning_rate': 2.1339677141739074e-07, 'epoch': 0.12964469758611338, 'num_input_tokens_seen': 6035603456, 'completed': '78.06% (2_878 / 3_687)', 'remaining time': '7:11:25', 'throughput': '8616.31', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:09:35,607 >> {'loss': 0.5342, 'grad_norm': 17.960357666015625, 'learning_rate': 2.1312898269884616e-07, 'epoch': 0.12991592080282072, 'num_input_tokens_seen': 6037700608, 'completed': '78.09% (2_879 / 3_687)', 'remaining time': '7:10:48', 'throughput': '8963.76', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:10:07,282 >> {'loss': 0.2405, 'grad_norm': 12.057799339294434, 'learning_rate': 2.128614650812775e-07, 'epoch': 0.13018714401952808, 'num_input_tokens_seen': 6039797760, 'completed': '78.11% (2_880 / 3_687)', 'remaining time': '7:10:16', 'throughput': '8276.01', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:10:34,951 >> {'loss': 0.6078, 'grad_norm': 27.220239639282227, 'learning_rate': 2.125942187799726e-07, 'epoch': 0.1304583672362354, 'num_input_tokens_seen': 6041894912, 'completed': '78.14% (2_881 / 3_687)', 'remaining time': '7:09:37', 'throughput': '9474.39', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:11:04,133 >> {'loss': 0.7583, 'grad_norm': 27.639814376831055, 'learning_rate': 2.123272440100012e-07, 'epoch': 0.13072959045294277, 'num_input_tokens_seen': 6043992064, 'completed': '78.17% (2_882 / 3_687)', 'remaining time': '7:09:00', 'throughput': '8983.03', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:11:32,430 >> {'loss': 0.6885, 'grad_norm': 22.158348083496094, 'learning_rate': 2.1206054098621434e-07, 'epoch': 0.13100081366965013, 'num_input_tokens_seen': 6046089216, 'completed': '78.19% (2_883 / 3_687)', 'remaining time': '7:08:22', 'throughput': '9264.21', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:12:00,554 >> {'loss': 0.7587, 'grad_norm': 26.35175895690918, 'learning_rate': 2.117941099232446e-07, 'epoch': 0.13127203688635747, 'num_input_tokens_seen': 6048186368, 'completed': '78.22% (2_884 / 3_687)', 'remaining time': '7:07:44', 'throughput': '9320.83', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:12:33,875 >> {'loss': 0.2805, 'grad_norm': 14.008932113647461, 'learning_rate': 2.1152795103550547e-07, 'epoch': 0.13154326010306483, 'num_input_tokens_seen': 6050283520, 'completed': '78.25% (2_885 / 3_687)', 'remaining time': '7:07:14', 'throughput': '7867.33', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:13:02,363 >> {'loss': 0.6467, 'grad_norm': 21.59432029724121, 'learning_rate': 2.1126206453719157e-07, 'epoch': 0.13181448331977216, 'num_input_tokens_seen': 6052380672, 'completed': '78.28% (2_886 / 3_687)', 'remaining time': '7:06:36', 'throughput': '9201.79', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:13:34,334 >> {'loss': 0.3664, 'grad_norm': 16.01569938659668, 'learning_rate': 2.1099645064227817e-07, 'epoch': 0.13208570653647952, 'num_input_tokens_seen': 6054477824, 'completed': '78.30% (2_887 / 3_687)', 'remaining time': '7:06:04', 'throughput': '8199.55', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:14:03,763 >> {'loss': 0.2808, 'grad_norm': 15.261370658874512, 'learning_rate': 2.1073110956452126e-07, 'epoch': 0.13235692975318689, 'num_input_tokens_seen': 6056574976, 'completed': '78.33% (2_888 / 3_687)', 'remaining time': '7:05:28', 'throughput': '8907.71', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:14:33,084 >> {'loss': 0.3063, 'grad_norm': 14.421086311340332, 'learning_rate': 2.1046604151745723e-07, 'epoch': 0.13262815296989422, 'num_input_tokens_seen': 6058672128, 'completed': '78.36% (2_889 / 3_687)', 'remaining time': '7:04:52', 'throughput': '8940.32', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:15:03,289 >> {'loss': 0.3161, 'grad_norm': 15.327587127685547, 'learning_rate': 2.1020124671440274e-07, 'epoch': 0.13289937618660158, 'num_input_tokens_seen': 6060769280, 'completed': '78.38% (2_890 / 3_687)', 'remaining time': '7:04:17', 'throughput': '8678.77', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:15:31,065 >> {'loss': 0.3972, 'grad_norm': 18.59784507751465, 'learning_rate': 2.0993672536845458e-07, 'epoch': 0.13317059940330891, 'num_input_tokens_seen': 6062866432, 'completed': '78.41% (2_891 / 3_687)', 'remaining time': '7:03:38', 'throughput': '9437.86', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:16:00,771 >> {'loss': 0.4275, 'grad_norm': 16.773000717163086, 'learning_rate': 2.0967247769248968e-07, 'epoch': 0.13344182262001628, 'num_input_tokens_seen': 6064963584, 'completed': '78.44% (2_892 / 3_687)', 'remaining time': '7:03:03', 'throughput': '8824.60', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:16:32,133 >> {'loss': 0.2471, 'grad_norm': 14.18702220916748, 'learning_rate': 2.0940850389916398e-07, 'epoch': 0.13371304583672364, 'num_input_tokens_seen': 6067060736, 'completed': '78.46% (2_893 / 3_687)', 'remaining time': '7:02:30', 'throughput': '8358.68', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:17:01,047 >> {'loss': 0.176, 'grad_norm': 11.325303077697754, 'learning_rate': 2.09144804200914e-07, 'epoch': 0.13398426905343097, 'num_input_tokens_seen': 6069157888, 'completed': '78.49% (2_894 / 3_687)', 'remaining time': '7:01:53', 'throughput': '9066.42', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:17:31,656 >> {'loss': 0.2562, 'grad_norm': 13.870182037353516, 'learning_rate': 2.0888137880995514e-07, 'epoch': 0.13425549227013833, 'num_input_tokens_seen': 6071255040, 'completed': '78.52% (2_895 / 3_687)', 'remaining time': '7:01:19', 'throughput': '8564.24', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:18:00,591 >> {'loss': 0.5646, 'grad_norm': 21.369096755981445, 'learning_rate': 2.086182279382821e-07, 'epoch': 0.13452671548684567, 'num_input_tokens_seen': 6073352192, 'completed': '78.55% (2_896 / 3_687)', 'remaining time': '7:00:43', 'throughput': '9059.81', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:18:32,738 >> {'loss': 0.4004, 'grad_norm': 17.719070434570312, 'learning_rate': 2.083553517976686e-07, 'epoch': 0.13479793870355303, 'num_input_tokens_seen': 6075449344, 'completed': '78.57% (2_897 / 3_687)', 'remaining time': '7:00:11', 'throughput': '8154.46', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:19:00,311 >> {'loss': 0.4522, 'grad_norm': 16.208087921142578, 'learning_rate': 2.0809275059966764e-07, 'epoch': 0.13506916192026036, 'num_input_tokens_seen': 6077546496, 'completed': '78.60% (2_898 / 3_687)', 'remaining time': '6:59:32', 'throughput': '9507.11', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:19:29,282 >> {'loss': 0.2981, 'grad_norm': 11.811439514160156, 'learning_rate': 2.0783042455561023e-07, 'epoch': 0.13534038513696772, 'num_input_tokens_seen': 6079643648, 'completed': '78.63% (2_899 / 3_687)', 'remaining time': '6:58:56', 'throughput': '9048.53', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:19:59,900 >> {'loss': 0.3378, 'grad_norm': 14.665522575378418, 'learning_rate': 2.0756837387660676e-07, 'epoch': 0.13561160835367508, 'num_input_tokens_seen': 6081740800, 'completed': '78.65% (2_900 / 3_687)', 'remaining time': '6:58:22', 'throughput': '8561.97', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:20:29,510 >> {'loss': 0.3911, 'grad_norm': 18.123260498046875, 'learning_rate': 2.073065987735454e-07, 'epoch': 0.13588283157038242, 'num_input_tokens_seen': 6083837952, 'completed': '78.68% (2_901 / 3_687)', 'remaining time': '6:57:46', 'throughput': '8853.24', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:21:03,560 >> {'loss': 0.4768, 'grad_norm': 15.937180519104004, 'learning_rate': 2.070450994570928e-07, 'epoch': 0.13615405478708978, 'num_input_tokens_seen': 6085935104, 'completed': '78.71% (2_902 / 3_687)', 'remaining time': '6:57:18', 'throughput': '7698.78', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:21:34,393 >> {'loss': 0.4538, 'grad_norm': 18.534635543823242, 'learning_rate': 2.0678387613769397e-07, 'epoch': 0.1364252780037971, 'num_input_tokens_seen': 6088032256, 'completed': '78.74% (2_903 / 3_687)', 'remaining time': '6:56:44', 'throughput': '8501.92', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:22:05,097 >> {'loss': 0.5692, 'grad_norm': 19.72084617614746, 'learning_rate': 2.0652292902557117e-07, 'epoch': 0.13669650122050447, 'num_input_tokens_seen': 6090129408, 'completed': '78.76% (2_904 / 3_687)', 'remaining time': '6:56:11', 'throughput': '8537.69', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:22:32,669 >> {'loss': 0.7825, 'grad_norm': 23.22869110107422, 'learning_rate': 2.0626225833072487e-07, 'epoch': 0.13696772443721184, 'num_input_tokens_seen': 6092226560, 'completed': '78.79% (2_905 / 3_687)', 'remaining time': '6:55:32', 'throughput': '9507.66', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:23:03,522 >> {'loss': 0.3674, 'grad_norm': 16.263364791870117, 'learning_rate': 2.0600186426293282e-07, 'epoch': 0.13723894765391917, 'num_input_tokens_seen': 6094323712, 'completed': '78.82% (2_906 / 3_687)', 'remaining time': '6:54:59', 'throughput': '8496.57', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:23:34,527 >> {'loss': 0.2301, 'grad_norm': 15.390429496765137, 'learning_rate': 2.0574174703175034e-07, 'epoch': 0.13751017087062653, 'num_input_tokens_seen': 6096420864, 'completed': '78.84% (2_907 / 3_687)', 'remaining time': '6:54:25', 'throughput': '8455.06', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:24:04,719 >> {'loss': 0.4732, 'grad_norm': 19.23081398010254, 'learning_rate': 2.0548190684650981e-07, 'epoch': 0.13778139408733386, 'num_input_tokens_seen': 6098518016, 'completed': '78.87% (2_908 / 3_687)', 'remaining time': '6:53:51', 'throughput': '8682.56', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:24:34,651 >> {'loss': 0.5119, 'grad_norm': 16.774734497070312, 'learning_rate': 2.052223439163208e-07, 'epoch': 0.13805261730404123, 'num_input_tokens_seen': 6100615168, 'completed': '78.90% (2_909 / 3_687)', 'remaining time': '6:53:16', 'throughput': '8758.04', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:25:05,610 >> {'loss': 0.2929, 'grad_norm': 11.67673110961914, 'learning_rate': 2.049630584500696e-07, 'epoch': 0.1383238405207486, 'num_input_tokens_seen': 6102712320, 'completed': '78.93% (2_910 / 3_687)', 'remaining time': '6:52:43', 'throughput': '8467.41', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:25:37,512 >> {'loss': 0.5621, 'grad_norm': 22.138757705688477, 'learning_rate': 2.0470405065641938e-07, 'epoch': 0.13859506373745592, 'num_input_tokens_seen': 6104809472, 'completed': '78.95% (2_911 / 3_687)', 'remaining time': '6:52:11', 'throughput': '8216.97', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:26:11,496 >> {'loss': 0.5335, 'grad_norm': 18.90143585205078, 'learning_rate': 2.0444532074380973e-07, 'epoch': 0.13886628695416328, 'num_input_tokens_seen': 6106906624, 'completed': '78.98% (2_912 / 3_687)', 'remaining time': '6:51:42', 'throughput': '7713.94', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:26:43,528 >> {'loss': 0.237, 'grad_norm': 13.576014518737793, 'learning_rate': 2.0418686892045654e-07, 'epoch': 0.13913751017087062, 'num_input_tokens_seen': 6109003776, 'completed': '79.01% (2_913 / 3_687)', 'remaining time': '6:51:11', 'throughput': '8183.71', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:27:13,620 >> {'loss': 0.4888, 'grad_norm': 16.365047454833984, 'learning_rate': 2.039286953943521e-07, 'epoch': 0.13940873338757798, 'num_input_tokens_seen': 6111100928, 'completed': '79.03% (2_914 / 3_687)', 'remaining time': '6:50:36', 'throughput': '8711.57', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:27:47,281 >> {'loss': 0.3311, 'grad_norm': 15.33917236328125, 'learning_rate': 2.0367080037326472e-07, 'epoch': 0.13967995660428534, 'num_input_tokens_seen': 6113198080, 'completed': '79.06% (2_915 / 3_687)', 'remaining time': '6:50:07', 'throughput': '7787.56', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:28:19,379 >> {'loss': 0.3541, 'grad_norm': 14.788158416748047, 'learning_rate': 2.0341318406473833e-07, 'epoch': 0.13995117982099267, 'num_input_tokens_seen': 6115295232, 'completed': '79.09% (2_916 / 3_687)', 'remaining time': '6:49:35', 'throughput': '8167.04', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:28:52,124 >> {'loss': 0.4563, 'grad_norm': 18.039945602416992, 'learning_rate': 2.031558466760927e-07, 'epoch': 0.14022240303770003, 'num_input_tokens_seen': 6117392384, 'completed': '79.12% (2_917 / 3_687)', 'remaining time': '6:49:05', 'throughput': '8005.62', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:29:22,730 >> {'loss': 0.5077, 'grad_norm': 20.71190071105957, 'learning_rate': 2.0289878841442325e-07, 'epoch': 0.14049362625440737, 'num_input_tokens_seen': 6119489536, 'completed': '79.14% (2_918 / 3_687)', 'remaining time': '6:48:31', 'throughput': '8565.21', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:29:51,206 >> {'loss': 0.4001, 'grad_norm': 19.41553497314453, 'learning_rate': 2.0264200948660076e-07, 'epoch': 0.14076484947111473, 'num_input_tokens_seen': 6121586688, 'completed': '79.17% (2_919 / 3_687)', 'remaining time': '6:47:54', 'throughput': '9205.86', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:30:24,492 >> {'loss': 0.4873, 'grad_norm': 17.448558807373047, 'learning_rate': 2.0238551009927125e-07, 'epoch': 0.1410360726878221, 'num_input_tokens_seen': 6123683840, 'completed': '79.20% (2_920 / 3_687)', 'remaining time': '6:47:24', 'throughput': '7875.40', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:30:55,045 >> {'loss': 0.3583, 'grad_norm': 14.663477897644043, 'learning_rate': 2.021292904588553e-07, 'epoch': 0.14130729590452942, 'num_input_tokens_seen': 6125780992, 'completed': '79.22% (2_921 / 3_687)', 'remaining time': '6:46:50', 'throughput': '8580.08', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:31:27,821 >> {'loss': 0.4864, 'grad_norm': 19.37281608581543, 'learning_rate': 2.0187335077154905e-07, 'epoch': 0.14157851912123678, 'num_input_tokens_seen': 6127878144, 'completed': '79.25% (2_922 / 3_687)', 'remaining time': '6:46:20', 'throughput': '7998.03', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:31:55,873 >> {'loss': 0.5179, 'grad_norm': 16.566606521606445, 'learning_rate': 2.0161769124332278e-07, 'epoch': 0.14184974233794412, 'num_input_tokens_seen': 6129975296, 'completed': '79.28% (2_923 / 3_687)', 'remaining time': '6:45:42', 'throughput': '9344.93', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:32:27,244 >> {'loss': 0.3427, 'grad_norm': 14.152243614196777, 'learning_rate': 2.013623120799217e-07, 'epoch': 0.14212096555465148, 'num_input_tokens_seen': 6132072448, 'completed': '79.31% (2_924 / 3_687)', 'remaining time': '6:45:10', 'throughput': '8356.15', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:32:56,600 >> {'loss': 0.4143, 'grad_norm': 17.58519744873047, 'learning_rate': 2.0110721348686523e-07, 'epoch': 0.14239218877135884, 'num_input_tokens_seen': 6134169600, 'completed': '79.33% (2_925 / 3_687)', 'remaining time': '6:44:34', 'throughput': '8929.94', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:33:27,012 >> {'loss': 0.1861, 'grad_norm': 11.517704010009766, 'learning_rate': 2.0085239566944702e-07, 'epoch': 0.14266341198806617, 'num_input_tokens_seen': 6136266752, 'completed': '79.36% (2_926 / 3_687)', 'remaining time': '6:44:00', 'throughput': '8619.63', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:33:57,431 >> {'loss': 0.4736, 'grad_norm': 19.695545196533203, 'learning_rate': 2.0059785883273463e-07, 'epoch': 0.14293463520477354, 'num_input_tokens_seen': 6138363904, 'completed': '79.39% (2_927 / 3_687)', 'remaining time': '6:43:27', 'throughput': '8617.89', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:34:25,902 >> {'loss': 0.4326, 'grad_norm': 17.48176383972168, 'learning_rate': 2.003436031815698e-07, 'epoch': 0.14320585842148087, 'num_input_tokens_seen': 6140461056, 'completed': '79.41% (2_928 / 3_687)', 'remaining time': '6:42:50', 'throughput': '9207.38', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:34:58,495 >> {'loss': 0.3545, 'grad_norm': 15.76534652709961, 'learning_rate': 2.0008962892056762e-07, 'epoch': 0.14347708163818823, 'num_input_tokens_seen': 6142558208, 'completed': '79.44% (2_929 / 3_687)', 'remaining time': '6:42:19', 'throughput': '8043.00', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:35:31,461 >> {'loss': 0.4996, 'grad_norm': 16.841144561767578, 'learning_rate': 1.998359362541171e-07, 'epoch': 0.1437483048548956, 'num_input_tokens_seen': 6144655360, 'completed': '79.47% (2_930 / 3_687)', 'remaining time': '6:41:49', 'throughput': '7951.97', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:36:03,012 >> {'loss': 0.4388, 'grad_norm': 20.11952781677246, 'learning_rate': 1.9958252538638032e-07, 'epoch': 0.14401952807160293, 'num_input_tokens_seen': 6146752512, 'completed': '79.50% (2_931 / 3_687)', 'remaining time': '6:41:17', 'throughput': '8308.62', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:36:37,226 >> {'loss': 0.2301, 'grad_norm': 14.645740509033203, 'learning_rate': 1.9932939652129283e-07, 'epoch': 0.1442907512883103, 'num_input_tokens_seen': 6148849664, 'completed': '79.52% (2_932 / 3_687)', 'remaining time': '6:40:48', 'throughput': '7661.78', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:37:04,417 >> {'loss': 0.651, 'grad_norm': 20.628215789794922, 'learning_rate': 1.990765498625629e-07, 'epoch': 0.14456197450501762, 'num_input_tokens_seen': 6150946816, 'completed': '79.55% (2_933 / 3_687)', 'remaining time': '6:40:10', 'throughput': '9640.83', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:37:36,396 >> {'loss': 0.3269, 'grad_norm': 13.291131019592285, 'learning_rate': 1.988239856136719e-07, 'epoch': 0.14483319772172498, 'num_input_tokens_seen': 6153043968, 'completed': '79.58% (2_934 / 3_687)', 'remaining time': '6:39:38', 'throughput': '8197.49', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:38:05,632 >> {'loss': 0.4404, 'grad_norm': 16.319238662719727, 'learning_rate': 1.9857170397787415e-07, 'epoch': 0.14510442093843234, 'num_input_tokens_seen': 6155141120, 'completed': '79.60% (2_935 / 3_687)', 'remaining time': '6:39:02', 'throughput': '8966.33', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:38:35,815 >> {'loss': 0.3358, 'grad_norm': 14.976720809936523, 'learning_rate': 1.9831970515819625e-07, 'epoch': 0.14537564415513968, 'num_input_tokens_seen': 6157238272, 'completed': '79.63% (2_936 / 3_687)', 'remaining time': '6:38:28', 'throughput': '8685.15', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:39:08,846 >> {'loss': 0.2542, 'grad_norm': 10.872328758239746, 'learning_rate': 1.980679893574373e-07, 'epoch': 0.14564686737184704, 'num_input_tokens_seen': 6159335424, 'completed': '79.66% (2_937 / 3_687)', 'remaining time': '6:37:58', 'throughput': '7936.28', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:39:42,529 >> {'loss': 0.4478, 'grad_norm': 19.26199722290039, 'learning_rate': 1.9781655677816838e-07, 'epoch': 0.14591809058855437, 'num_input_tokens_seen': 6161432576, 'completed': '79.69% (2_938 / 3_687)', 'remaining time': '6:37:29', 'throughput': '7782.69', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:40:12,490 >> {'loss': 0.3474, 'grad_norm': 12.88478946685791, 'learning_rate': 1.97565407622733e-07, 'epoch': 0.14618931380526173, 'num_input_tokens_seen': 6163529728, 'completed': '79.71% (2_939 / 3_687)', 'remaining time': '6:36:54', 'throughput': '8749.56', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:40:40,896 >> {'loss': 0.6245, 'grad_norm': 21.489870071411133, 'learning_rate': 1.9731454209324644e-07, 'epoch': 0.14646053702196907, 'num_input_tokens_seen': 6165626880, 'completed': '79.74% (2_940 / 3_687)', 'remaining time': '6:36:18', 'throughput': '9228.41', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:41:11,158 >> {'loss': 0.5171, 'grad_norm': 18.655330657958984, 'learning_rate': 1.9706396039159568e-07, 'epoch': 0.14673176023867643, 'num_input_tokens_seen': 6167724032, 'completed': '79.77% (2_941 / 3_687)', 'remaining time': '6:35:44', 'throughput': '8662.57', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:41:45,419 >> {'loss': 0.309, 'grad_norm': 15.826581001281738, 'learning_rate': 1.9681366271943945e-07, 'epoch': 0.1470029834553838, 'num_input_tokens_seen': 6169821184, 'completed': '79.79% (2_942 / 3_687)', 'remaining time': '6:35:15', 'throughput': '7651.42', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:42:16,273 >> {'loss': 0.2265, 'grad_norm': 16.819740295410156, 'learning_rate': 1.965636492782077e-07, 'epoch': 0.14727420667209112, 'num_input_tokens_seen': 6171918336, 'completed': '79.82% (2_943 / 3_687)', 'remaining time': '6:34:42', 'throughput': '8496.12', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:42:47,644 >> {'loss': 0.8497, 'grad_norm': 27.977834701538086, 'learning_rate': 1.963139202691017e-07, 'epoch': 0.14754542988879848, 'num_input_tokens_seen': 6174015488, 'completed': '79.85% (2_944 / 3_687)', 'remaining time': '6:34:10', 'throughput': '8356.32', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:43:17,975 >> {'loss': 0.7042, 'grad_norm': 29.797565460205078, 'learning_rate': 1.9606447589309397e-07, 'epoch': 0.14781665310550582, 'num_input_tokens_seen': 6176112640, 'completed': '79.88% (2_945 / 3_687)', 'remaining time': '6:33:36', 'throughput': '8642.89', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:43:47,585 >> {'loss': 0.2392, 'grad_norm': 14.23031234741211, 'learning_rate': 1.9581531635092773e-07, 'epoch': 0.14808787632221318, 'num_input_tokens_seen': 6178209792, 'completed': '79.90% (2_946 / 3_687)', 'remaining time': '6:33:01', 'throughput': '8853.16', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:44:18,154 >> {'loss': 0.3154, 'grad_norm': 14.467376708984375, 'learning_rate': 1.9556644184311728e-07, 'epoch': 0.14835909953892054, 'num_input_tokens_seen': 6180306944, 'completed': '79.93% (2_947 / 3_687)', 'remaining time': '6:32:27', 'throughput': '8575.44', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:44:46,217 >> {'loss': 0.4227, 'grad_norm': 17.21703338623047, 'learning_rate': 1.9531785256994716e-07, 'epoch': 0.14863032275562787, 'num_input_tokens_seen': 6182404096, 'completed': '79.96% (2_948 / 3_687)', 'remaining time': '6:31:51', 'throughput': '9341.41', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:45:17,866 >> {'loss': 0.5356, 'grad_norm': 19.657089233398438, 'learning_rate': 1.9506954873147276e-07, 'epoch': 0.14890154597233524, 'num_input_tokens_seen': 6184501248, 'completed': '79.98% (2_949 / 3_687)', 'remaining time': '6:31:19', 'throughput': '8282.61', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:45:48,313 >> {'loss': 0.4391, 'grad_norm': 16.794424057006836, 'learning_rate': 1.9482153052751921e-07, 'epoch': 0.14917276918904257, 'num_input_tokens_seen': 6186598400, 'completed': '80.01% (2_950 / 3_687)', 'remaining time': '6:30:45', 'throughput': '8609.86', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:46:18,600 >> {'loss': 0.5088, 'grad_norm': 18.118986129760742, 'learning_rate': 1.9457379815768245e-07, 'epoch': 0.14944399240574993, 'num_input_tokens_seen': 6188695552, 'completed': '80.04% (2_951 / 3_687)', 'remaining time': '6:30:11', 'throughput': '8655.53', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:46:48,827 >> {'loss': 0.4736, 'grad_norm': 17.5322265625, 'learning_rate': 1.9432635182132795e-07, 'epoch': 0.1497152156224573, 'num_input_tokens_seen': 6190792704, 'completed': '80.07% (2_952 / 3_687)', 'remaining time': '6:29:37', 'throughput': '8672.35', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:47:19,244 >> {'loss': 0.3153, 'grad_norm': 15.88696575164795, 'learning_rate': 1.9407919171759112e-07, 'epoch': 0.14998643883916463, 'num_input_tokens_seen': 6192889856, 'completed': '80.09% (2_953 / 3_687)', 'remaining time': '6:29:03', 'throughput': '8618.32', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:47:49,526 >> {'loss': 0.4207, 'grad_norm': 30.527063369750977, 'learning_rate': 1.9383231804537704e-07, 'epoch': 0.150257662055872, 'num_input_tokens_seen': 6194987008, 'completed': '80.12% (2_954 / 3_687)', 'remaining time': '6:28:30', 'throughput': '8656.78', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:48:19,850 >> {'loss': 0.3108, 'grad_norm': 14.228474617004395, 'learning_rate': 1.9358573100336034e-07, 'epoch': 0.15052888527257932, 'num_input_tokens_seen': 6197084160, 'completed': '80.15% (2_955 / 3_687)', 'remaining time': '6:27:56', 'throughput': '8644.79', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:48:50,308 >> {'loss': 0.3895, 'grad_norm': 17.981746673583984, 'learning_rate': 1.933394307899846e-07, 'epoch': 0.15080010848928668, 'num_input_tokens_seen': 6199181312, 'completed': '80.17% (2_956 / 3_687)', 'remaining time': '6:27:22', 'throughput': '8606.74', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:49:21,039 >> {'loss': 0.7074, 'grad_norm': 27.902082443237305, 'learning_rate': 1.9309341760346304e-07, 'epoch': 0.15107133170599404, 'num_input_tokens_seen': 6201278464, 'completed': '80.20% (2_957 / 3_687)', 'remaining time': '6:26:49', 'throughput': '8530.23', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:49:51,459 >> {'loss': 0.2475, 'grad_norm': 14.604863166809082, 'learning_rate': 1.9284769164177767e-07, 'epoch': 0.15134255492270138, 'num_input_tokens_seen': 6203375616, 'completed': '80.23% (2_958 / 3_687)', 'remaining time': '6:26:16', 'throughput': '8617.62', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:50:21,406 >> {'loss': 0.4611, 'grad_norm': 23.595659255981445, 'learning_rate': 1.9260225310267925e-07, 'epoch': 0.15161377813940874, 'num_input_tokens_seen': 6205472768, 'completed': '80.25% (2_959 / 3_687)', 'remaining time': '6:25:41', 'throughput': '8753.54', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:50:51,611 >> {'loss': 0.5274, 'grad_norm': 20.186174392700195, 'learning_rate': 1.9235710218368784e-07, 'epoch': 0.15188500135611607, 'num_input_tokens_seen': 6207569920, 'completed': '80.28% (2_960 / 3_687)', 'remaining time': '6:25:08', 'throughput': '8678.82', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:51:21,747 >> {'loss': 0.4078, 'grad_norm': 15.750053405761719, 'learning_rate': 1.9211223908209114e-07, 'epoch': 0.15215622457282343, 'num_input_tokens_seen': 6209667072, 'completed': '80.31% (2_961 / 3_687)', 'remaining time': '6:24:34', 'throughput': '8698.71', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:51:52,284 >> {'loss': 0.3896, 'grad_norm': 17.78439712524414, 'learning_rate': 1.9186766399494581e-07, 'epoch': 0.1524274477895308, 'num_input_tokens_seen': 6211764224, 'completed': '80.34% (2_962 / 3_687)', 'remaining time': '6:24:00', 'throughput': '8584.41', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:52:22,370 >> {'loss': 0.3784, 'grad_norm': 15.00125503540039, 'learning_rate': 1.9162337711907657e-07, 'epoch': 0.15269867100623813, 'num_input_tokens_seen': 6213861376, 'completed': '80.36% (2_963 / 3_687)', 'remaining time': '6:23:26', 'throughput': '8713.29', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:52:54,120 >> {'loss': 0.1549, 'grad_norm': 10.61737060546875, 'learning_rate': 1.9137937865107606e-07, 'epoch': 0.1529698942229455, 'num_input_tokens_seen': 6215958528, 'completed': '80.39% (2_964 / 3_687)', 'remaining time': '6:22:54', 'throughput': '8256.34', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:53:24,209 >> {'loss': 0.2978, 'grad_norm': 15.642570495605469, 'learning_rate': 1.9113566878730515e-07, 'epoch': 0.15324111743965282, 'num_input_tokens_seen': 6218055680, 'completed': '80.42% (2_965 / 3_687)', 'remaining time': '6:22:21', 'throughput': '8712.48', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:53:53,645 >> {'loss': 0.9192, 'grad_norm': 24.957725524902344, 'learning_rate': 1.9089224772389223e-07, 'epoch': 0.15351234065636019, 'num_input_tokens_seen': 6220152832, 'completed': '80.44% (2_966 / 3_687)', 'remaining time': '6:21:46', 'throughput': '8905.47', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:54:26,441 >> {'loss': 0.2818, 'grad_norm': 12.823304176330566, 'learning_rate': 1.9064911565673328e-07, 'epoch': 0.15378356387306755, 'num_input_tokens_seen': 6222249984, 'completed': '80.47% (2_967 / 3_687)', 'remaining time': '6:21:15', 'throughput': '7993.10', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:54:59,358 >> {'loss': 0.4243, 'grad_norm': 18.57845687866211, 'learning_rate': 1.9040627278149168e-07, 'epoch': 0.15405478708977488, 'num_input_tokens_seen': 6224347136, 'completed': '80.50% (2_968 / 3_687)', 'remaining time': '6:20:45', 'throughput': '7963.74', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:55:27,239 >> {'loss': 0.901, 'grad_norm': 27.991865158081055, 'learning_rate': 1.9016371929359824e-07, 'epoch': 0.15432601030648224, 'num_input_tokens_seen': 6226444288, 'completed': '80.53% (2_969 / 3_687)', 'remaining time': '6:20:08', 'throughput': '9402.38', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:55:57,741 >> {'loss': 0.6081, 'grad_norm': 40.3478889465332, 'learning_rate': 1.8992145538825066e-07, 'epoch': 0.15459723352318958, 'num_input_tokens_seen': 6228541440, 'completed': '80.55% (2_970 / 3_687)', 'remaining time': '6:19:35', 'throughput': '8594.33', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:56:29,255 >> {'loss': 0.4356, 'grad_norm': 16.422760009765625, 'learning_rate': 1.8967948126041383e-07, 'epoch': 0.15486845673989694, 'num_input_tokens_seen': 6230638592, 'completed': '80.58% (2_971 / 3_687)', 'remaining time': '6:19:03', 'throughput': '8318.32', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:56:58,028 >> {'loss': 0.4776, 'grad_norm': 17.511781692504883, 'learning_rate': 1.8943779710481938e-07, 'epoch': 0.1551396799566043, 'num_input_tokens_seen': 6232735744, 'completed': '80.61% (2_972 / 3_687)', 'remaining time': '6:18:27', 'throughput': '9110.80', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:57:28,535 >> {'loss': 0.8041, 'grad_norm': 30.01336097717285, 'learning_rate': 1.891964031159653e-07, 'epoch': 0.15541090317331163, 'num_input_tokens_seen': 6234832896, 'completed': '80.63% (2_973 / 3_687)', 'remaining time': '6:17:54', 'throughput': '8592.94', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:58:00,225 >> {'loss': 0.5671, 'grad_norm': 21.783706665039062, 'learning_rate': 1.8895529948811638e-07, 'epoch': 0.155682126390019, 'num_input_tokens_seen': 6236930048, 'completed': '80.66% (2_974 / 3_687)', 'remaining time': '6:17:22', 'throughput': '8271.99', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:58:30,729 >> {'loss': 0.411, 'grad_norm': 20.175209045410156, 'learning_rate': 1.8871448641530353e-07, 'epoch': 0.15595334960672633, 'num_input_tokens_seen': 6239027200, 'completed': '80.69% (2_975 / 3_687)', 'remaining time': '6:16:49', 'throughput': '8593.90', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:59:01,076 >> {'loss': 0.4136, 'grad_norm': 14.450340270996094, 'learning_rate': 1.8847396409132423e-07, 'epoch': 0.1562245728234337, 'num_input_tokens_seen': 6241124352, 'completed': '80.72% (2_976 / 3_687)', 'remaining time': '6:16:15', 'throughput': '8638.18', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 19:59:31,471 >> {'loss': 0.4456, 'grad_norm': 16.498634338378906, 'learning_rate': 1.8823373270974164e-07, 'epoch': 0.15649579604014105, 'num_input_tokens_seen': 6243221504, 'completed': '80.74% (2_977 / 3_687)', 'remaining time': '6:15:42', 'throughput': '8624.47', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 20:00:02,923 >> {'loss': 0.2795, 'grad_norm': 16.770217895507812, 'learning_rate': 1.8799379246388463e-07, 'epoch': 0.15676701925684838, 'num_input_tokens_seen': 6245318656, 'completed': '80.77% (2_978 / 3_687)', 'remaining time': '6:15:10', 'throughput': '8334.74', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 20:00:31,555 >> {'loss': 0.8234, 'grad_norm': 25.7247371673584, 'learning_rate': 1.8775414354684804e-07, 'epoch': 0.15703824247355574, 'num_input_tokens_seen': 6247415808, 'completed': '80.80% (2_979 / 3_687)', 'remaining time': '6:14:34', 'throughput': '9155.77', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 20:01:03,210 >> {'loss': 0.1999, 'grad_norm': 19.52436065673828, 'learning_rate': 1.875147861514923e-07, 'epoch': 0.15730946569026308, 'num_input_tokens_seen': 6249512960, 'completed': '80.82% (2_980 / 3_687)', 'remaining time': '6:14:02', 'throughput': '8281.22', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 20:01:36,465 >> {'loss': 0.3077, 'grad_norm': 13.377266883850098, 'learning_rate': 1.8727572047044308e-07, 'epoch': 0.15758068890697044, 'num_input_tokens_seen': 6251610112, 'completed': '80.85% (2_981 / 3_687)', 'remaining time': '6:13:33', 'throughput': '7882.99', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 20:02:11,370 >> {'loss': 0.3646, 'grad_norm': 15.139887809753418, 'learning_rate': 1.8703694669609133e-07, 'epoch': 0.15785191212367777, 'num_input_tokens_seen': 6253707264, 'completed': '80.88% (2_982 / 3_687)', 'remaining time': '6:13:05', 'throughput': '7510.17', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 20:02:43,622 >> {'loss': 0.4268, 'grad_norm': 16.056617736816406, 'learning_rate': 1.8679846502059306e-07, 'epoch': 0.15812313534038513, 'num_input_tokens_seen': 6255804416, 'completed': '80.91% (2_983 / 3_687)', 'remaining time': '6:12:33', 'throughput': '8127.86', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 20:03:14,052 >> {'loss': 0.3879, 'grad_norm': 15.632225036621094, 'learning_rate': 1.8656027563586934e-07, 'epoch': 0.1583943585570925, 'num_input_tokens_seen': 6257901568, 'completed': '80.93% (2_984 / 3_687)', 'remaining time': '6:12:00', 'throughput': '8614.84', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 20:03:45,896 >> {'loss': 0.1322, 'grad_norm': 9.523789405822754, 'learning_rate': 1.863223787336059e-07, 'epoch': 0.15866558177379983, 'num_input_tokens_seen': 6259998720, 'completed': '80.96% (2_985 / 3_687)', 'remaining time': '6:11:28', 'throughput': '8231.96', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 20:04:18,348 >> {'loss': 0.5607, 'grad_norm': 19.751140594482422, 'learning_rate': 1.8608477450525308e-07, 'epoch': 0.1589368049905072, 'num_input_tokens_seen': 6262095872, 'completed': '80.99% (2_986 / 3_687)', 'remaining time': '6:10:58', 'throughput': '8078.07', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 20:04:51,833 >> {'loss': 0.2927, 'grad_norm': 13.901773452758789, 'learning_rate': 1.8584746314202574e-07, 'epoch': 0.15920802820721452, 'num_input_tokens_seen': 6264193024, 'completed': '81.01% (2_987 / 3_687)', 'remaining time': '6:10:28', 'throughput': '7828.56', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 20:05:20,359 >> {'loss': 0.5888, 'grad_norm': 19.8528995513916, 'learning_rate': 1.8561044483490301e-07, 'epoch': 0.1594792514239219, 'num_input_tokens_seen': 6266290176, 'completed': '81.04% (2_988 / 3_687)', 'remaining time': '6:09:52', 'throughput': '9189.73', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 20:05:48,702 >> {'loss': 0.4884, 'grad_norm': 15.824024200439453, 'learning_rate': 1.8537371977462835e-07, 'epoch': 0.15975047464062925, 'num_input_tokens_seen': 6268387328, 'completed': '81.07% (2_989 / 3_687)', 'remaining time': '6:09:16', 'throughput': '9248.95', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 20:06:18,721 >> {'loss': 0.6673, 'grad_norm': 23.57025718688965, 'learning_rate': 1.8513728815170885e-07, 'epoch': 0.16002169785733658, 'num_input_tokens_seen': 6270484480, 'completed': '81.10% (2_990 / 3_687)', 'remaining time': '6:08:43', 'throughput': '8732.58', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 20:06:50,792 >> {'loss': 0.546, 'grad_norm': 24.425518035888672, 'learning_rate': 1.8490115015641566e-07, 'epoch': 0.16029292107404394, 'num_input_tokens_seen': 6272581632, 'completed': '81.12% (2_991 / 3_687)', 'remaining time': '6:08:11', 'throughput': '8173.85', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 20:07:19,184 >> {'loss': 0.8573, 'grad_norm': 27.48299217224121, 'learning_rate': 1.846653059787839e-07, 'epoch': 0.16056414429075128, 'num_input_tokens_seen': 6274678784, 'completed': '81.15% (2_992 / 3_687)', 'remaining time': '6:07:36', 'throughput': '9233.07', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 20:07:48,649 >> {'loss': 0.7799, 'grad_norm': 26.690526962280273, 'learning_rate': 1.8442975580861185e-07, 'epoch': 0.16083536750745864, 'num_input_tokens_seen': 6276775936, 'completed': '81.18% (2_993 / 3_687)', 'remaining time': '6:07:01', 'throughput': '8896.87', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 20:08:24,138 >> {'loss': 0.3413, 'grad_norm': 17.080846786499023, 'learning_rate': 1.8419449983546136e-07, 'epoch': 0.161106590724166, 'num_input_tokens_seen':
6278873088, 'completed': '81.20% (2_994 / 3_687)', 'remaining time': '6:06:34', 'throughput': '7386.60', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:08:53,205 >> {'loss': 0.285, 'grad_norm': 17.67878532409668, 'learning_rate': 1.8395953824865762e-07, 'epoch': 0.16137781394087333, 'num_input_tokens_seen': 6280970240, 'completed': '81.23% (2_995 / 3_687)', 'remaining time': '6:05:59', 'throughput': '9018.56', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:09:23,842 >> {'loss': 0.468, 'grad_norm': 23.767698287963867, 'learning_rate': 1.8372487123728844e-07, 'epoch': 0.1616490371575807, 'num_input_tokens_seen': 6283067392, 'completed': '81.26% (2_996 / 3_687)', 'remaining time': '6:05:26', 'throughput': '8556.53', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:09:51,764 >> {'loss': 0.4408, 'grad_norm': 16.759511947631836, 'learning_rate': 1.8349049899020514e-07, 'epoch': 0.16192026037428803, 'num_input_tokens_seen': 6285164544, 'completed': '81.29% (2_997 / 3_687)', 'remaining time': '6:04:50', 'throughput': '9388.28', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:10:21,049 >> {'loss': 0.3345, 'grad_norm': 13.694295883178711, 'learning_rate': 1.832564216960215e-07, 'epoch': 0.1621914835909954, 'num_input_tokens_seen': 6287261696, 'completed': '81.31% (2_998 / 3_687)', 'remaining time': '6:04:15', 'throughput': '8951.63', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:10:50,254 >> {'loss': 0.3348, 'grad_norm': 15.297809600830078, 'learning_rate': 1.8302263954311408e-07, 'epoch': 0.16246270680770275, 'num_input_tokens_seen': 6289358848, 'completed': '81.34% (2_999 / 3_687)', 'remaining time': '6:03:41', 'throughput': '8975.92', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:11:19,341 >> {'loss': 0.7217, 'grad_norm': 20.984760284423828, 'learning_rate': 1.827891527196218e-07, 'epoch': 0.16273393002441008, 'num_input_tokens_seen': 6291456000, 'completed': '81.37% (3_000 / 3_687)', 
'remaining time': '6:03:06', 'throughput': '9012.58', 'gpu_mem_free': '5581MB'}
/scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:689: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict . Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
  warnings.warn(
[INFO|trainer.py:3503] 2024-12-26 20:11:41,851 >> Saving model checkpoint to /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-3000
[INFO|configuration_utils.py:472] 2024-12-26 20:11:41,854 >> Configuration saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-3000/config.json
[INFO|configuration_utils.py:807] 2024-12-26 20:11:41,855 >> Configuration saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-3000/generation_config.json
[INFO|modeling_utils.py:2807] 2024-12-26 20:12:38,584 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 7 checkpoint shards. You can find where each parameter has been saved in the index located at /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-3000/model.safetensors.index.json.
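The shard index named above (`model.safetensors.index.json`) is a plain JSON file whose `weight_map` maps each parameter name to the shard file holding it; a minimal stdlib sketch of reading one and grouping parameters by shard (the two-shard index below is a made-up stand-in, not the real 7-shard file from this checkpoint):

```python
import json
from collections import defaultdict
from pathlib import Path

# Hypothetical, abridged stand-in for checkpoint-3000/model.safetensors.index.json;
# the real index maps every parameter to one of 7 shard files.
index = {
    "weight_map": {
        "model.embed_tokens.weight": "model-00001-of-00007.safetensors",
        "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
        "lm_head.weight": "model-00007-of-00007.safetensors",
    },
}
path = Path("model.safetensors.index.json")
path.write_text(json.dumps(index))

# Group parameter names by the shard file that stores them.
shards = defaultdict(list)
for param, shard in json.loads(path.read_text())["weight_map"].items():
    shards[shard].append(param)

for shard in sorted(shards):
    print(shard, len(shards[shard]))
```

This is how `from_pretrained` locates each tensor at load time without reading every shard.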
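The records above advance `num_input_tokens_seen` by exactly 2,097,152 tokens per step, and the logged `throughput` appears to be tokens per second per device: dividing the per-step token delta by the wall-clock gap between two consecutive records and by the device count reproduces the logged figure. A small sanity check against steps 2_971 and 2_972, assuming a world size of 8 GPUs (the world size is not stated anywhere in the log, so that divisor is an inference):

```python
from ast import literal_eval
from datetime import datetime

# Two consecutive records copied from the log, with unrelated fields elided.
line_a = "[INFO|trainer.py:175] 2024-12-26 19:56:29,255 >> {'num_input_tokens_seen': 6230638592}"
line_b = "[INFO|trainer.py:175] 2024-12-26 19:56:58,028 >> {'num_input_tokens_seen': 6232735744, 'throughput': '9110.80'}"

def parse(line: str):
    """Split a trainer log line into its timestamp and dict-literal payload."""
    prefix, payload = line.split(" >> ", 1)
    stamp = datetime.strptime(prefix.split("] ", 1)[1], "%Y-%m-%d %H:%M:%S,%f")
    return stamp, literal_eval(payload)

t_a, rec_a = parse(line_a)
t_b, rec_b = parse(line_b)

world_size = 8  # assumption: not recorded in the log itself
tokens = rec_b["num_input_tokens_seen"] - rec_a["num_input_tokens_seen"]  # 2_097_152
per_device = tokens / world_size / (t_b - t_a).total_seconds()
print(f"{per_device:.2f}")  # close to the logged '9110.80'
```

The small residual difference is expected, since the trainer times the step internally rather than from log timestamps.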
[INFO|tokenization_utils_base.py:2684] 2024-12-26 20:12:38,588 >> tokenizer config file saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-3000/tokenizer_config.json
[INFO|tokenization_utils_base.py:2693] 2024-12-26 20:12:38,589 >> Special tokens file saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-3000/special_tokens_map.json
[WARNING|trainer.py:868] 2024-12-26 20:16:14,925 >> Save streaming dataset state: {'epoch': 0, 'sample_in_epoch': 6000, 'num_canonical_nodes': 1, 'shuffle_seed': 42, 'initial_physical_nodes': 1}
/scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/torch/utils/checkpoint.py:1399: FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead.
  with device_autocast_ctx, torch.cpu.amp.autocast(**cpu_autocast_kwargs), recompute_context:  # type: ignore[attr-defined]
[INFO|trainer.py:175] 2024-12-26 20:16:45,750 >> {'loss': 0.2591, 'grad_norm': 16.016870498657227, 'learning_rate': 1.8255596141344605e-07, 'epoch': 0.16300515324111745, 'num_input_tokens_seen': 6293553152, 'completed': '81.39% (3_001 / 3_687)', 'remaining time': '6:08:11', 'throughput': '803.11', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 20:17:18,819 >> {'loss': 0.4054, 'grad_norm': 15.872649192810059, 'learning_rate': 1.823230658122502e-07, 'epoch': 0.16327637645782478, 'num_input_tokens_seen': 6295650304, 'completed': '81.42% (3_002 / 3_687)', 'remaining time': '6:07:40', 'throughput': '7927.17', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 20:17:48,801 >> {'loss': 0.4199, 'grad_norm': 17.43962287902832, 'learning_rate': 1.820904661034599e-07, 'epoch': 0.16354759967453214, 'num_input_tokens_seen': 6297747456, 'completed': '81.45% (3_003 / 3_687)', 'remaining time': '6:07:05', 'throughput': '8743.47', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 20:18:19,459 >> {'loss': 0.3451, 'grad_norm': 16.44698715209961, 'learning_rate': 1.8185816247426245e-07, 'epoch': 0.1638188228912395, 'num_input_tokens_seen': 6299844608, 'completed': '81.48% (3_004 / 3_687)', 'remaining time': '6:06:31', 'throughput': '8550.63', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 20:18:53,436 >> {'loss': 0.6525, 'grad_norm': 22.08724021911621, 'learning_rate': 1.8162615511160701e-07, 'epoch': 0.16409004610794684, 'num_input_tokens_seen': 6301941760, 'completed': '81.50% (3_005 / 3_687)', 'remaining time': '6:06:01', 'throughput': '7715.16', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 20:19:25,744 >> {'loss': 0.297, 'grad_norm': 13.724467277526855, 'learning_rate': 1.8139444420220438e-07, 'epoch': 0.1643612693246542, 'num_input_tokens_seen': 6304038912, 'completed': '81.53% (3_006 / 3_687)', 'remaining
time': '6:05:29', 'throughput': '8114.01', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:19:55,740 >> {'loss': 0.8337, 'grad_norm': 24.55242156982422, 'learning_rate': 1.8116302993252636e-07, 'epoch': 0.16463249254136153, 'num_input_tokens_seen': 6306136064, 'completed': '81.56% (3_007 / 3_687)', 'remaining time': '6:04:54', 'throughput': '8739.40', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:20:25,010 >> {'loss': 0.2984, 'grad_norm': 12.129997253417969, 'learning_rate': 1.809319124888067e-07, 'epoch': 0.1649037157580689, 'num_input_tokens_seen': 6308233216, 'completed': '81.58% (3_008 / 3_687)', 'remaining time': '6:04:18', 'throughput': '8955.89', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:20:55,710 >> {'loss': 0.2444, 'grad_norm': 12.80477523803711, 'learning_rate': 1.8070109205703972e-07, 'epoch': 0.16517493897477625, 'num_input_tokens_seen': 6310330368, 'completed': '81.61% (3_009 / 3_687)', 'remaining time': '6:03:45', 'throughput': '8539.04', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:21:26,455 >> {'loss': 0.5638, 'grad_norm': 21.404075622558594, 'learning_rate': 1.80470568822981e-07, 'epoch': 0.1654461621914836, 'num_input_tokens_seen': 6312427520, 'completed': '81.64% (3_010 / 3_687)', 'remaining time': '6:03:11', 'throughput': '8526.42', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:21:55,892 >> {'loss': 0.8146, 'grad_norm': 24.48054313659668, 'learning_rate': 1.8024034297214686e-07, 'epoch': 0.16571738540819095, 'num_input_tokens_seen': 6314524672, 'completed': '81.67% (3_011 / 3_687)', 'remaining time': '6:02:36', 'throughput': '8905.07', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:22:27,645 >> {'loss': 0.4131, 'grad_norm': 16.312585830688477, 'learning_rate': 1.8001041468981442e-07, 'epoch': 0.16598860862489828, 'num_input_tokens_seen': 6316621824, 'completed': '81.69% (3_012 / 3_687)', 'remaining time': '6:02:03', 'throughput': '8255.82', 
'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:22:58,988 >> {'loss': 0.4172, 'grad_norm': 14.948566436767578, 'learning_rate': 1.79780784161021e-07, 'epoch': 0.16625983184160564, 'num_input_tokens_seen': 6318718976, 'completed': '81.72% (3_013 / 3_687)', 'remaining time': '6:01:30', 'throughput': '8363.62', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:23:28,739 >> {'loss': 0.6344, 'grad_norm': 23.79063606262207, 'learning_rate': 1.7955145157056456e-07, 'epoch': 0.166531055058313, 'num_input_tokens_seen': 6320816128, 'completed': '81.75% (3_014 / 3_687)', 'remaining time': '6:00:55', 'throughput': '8811.27', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:23:59,202 >> {'loss': 0.565, 'grad_norm': 19.02938461303711, 'learning_rate': 1.793224171030033e-07, 'epoch': 0.16680227827502034, 'num_input_tokens_seen': 6322913280, 'completed': '81.77% (3_015 / 3_687)', 'remaining time': '6:00:21', 'throughput': '8605.47', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:24:28,835 >> {'loss': 0.4183, 'grad_norm': 16.11639976501465, 'learning_rate': 1.7909368094265532e-07, 'epoch': 0.1670735014917277, 'num_input_tokens_seen': 6325010432, 'completed': '81.80% (3_016 / 3_687)', 'remaining time': '5:59:46', 'throughput': '8846.16', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:24:59,411 >> {'loss': 0.2774, 'grad_norm': 14.722761154174805, 'learning_rate': 1.788652432735991e-07, 'epoch': 0.16734472470843503, 'num_input_tokens_seen': 6327107584, 'completed': '81.83% (3_017 / 3_687)', 'remaining time': '5:59:12', 'throughput': '8573.67', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:25:29,634 >> {'loss': 0.2382, 'grad_norm': 11.639079093933105, 'learning_rate': 1.7863710427967222e-07, 'epoch': 0.1676159479251424, 'num_input_tokens_seen': 6329204736, 'completed': '81.86% (3_018 / 3_687)', 'remaining time': '5:58:38', 'throughput': '8673.61', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 
2024-12-26 20:26:04,296 >> {'loss': 0.5038, 'grad_norm': 18.875831604003906, 'learning_rate': 1.784092641444725e-07, 'epoch': 0.16788717114184976, 'num_input_tokens_seen': 6331301888, 'completed': '81.88% (3_019 / 3_687)', 'remaining time': '5:58:08', 'throughput': '7562.83', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:26:36,709 >> {'loss': 0.5799, 'grad_norm': 21.543594360351562, 'learning_rate': 1.7818172305135683e-07, 'epoch': 0.1681583943585571, 'num_input_tokens_seen': 6333399040, 'completed': '81.91% (3_020 / 3_687)', 'remaining time': '5:57:36', 'throughput': '8087.59', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:27:04,377 >> {'loss': 0.4241, 'grad_norm': 17.74955177307129, 'learning_rate': 1.7795448118344174e-07, 'epoch': 0.16842961757526445, 'num_input_tokens_seen': 6335496192, 'completed': '81.94% (3_021 / 3_687)', 'remaining time': '5:56:59', 'throughput': '9474.78', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:27:35,683 >> {'loss': 0.4974, 'grad_norm': 18.136749267578125, 'learning_rate': 1.7772753872360286e-07, 'epoch': 0.16870084079197178, 'num_input_tokens_seen': 6337593344, 'completed': '81.96% (3_022 / 3_687)', 'remaining time': '5:56:26', 'throughput': '8373.53', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:28:11,427 >> {'loss': 0.4158, 'grad_norm': 16.745140075683594, 'learning_rate': 1.7750089585447473e-07, 'epoch': 0.16897206400867915, 'num_input_tokens_seen': 6339690496, 'completed': '81.99% (3_023 / 3_687)', 'remaining time': '5:55:58', 'throughput': '7333.96', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:28:41,573 >> {'loss': 0.3088, 'grad_norm': 13.123845100402832, 'learning_rate': 1.77274552758451e-07, 'epoch': 0.16924328722538648, 'num_input_tokens_seen': 6341787648, 'completed': '82.02% (3_024 / 3_687)', 'remaining time': '5:55:24', 'throughput': '8695.81', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:29:14,265 >> {'loss': 0.351, 
'grad_norm': 13.37322998046875, 'learning_rate': 1.770485096176839e-07, 'epoch': 0.16951451044209384, 'num_input_tokens_seen': 6343884800, 'completed': '82.05% (3_025 / 3_687)', 'remaining time': '5:54:52', 'throughput': '8018.56', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:29:42,851 >> {'loss': 0.6836, 'grad_norm': 20.165077209472656, 'learning_rate': 1.7682276661408448e-07, 'epoch': 0.1697857336588012, 'num_input_tokens_seen': 6345981952, 'completed': '82.07% (3_026 / 3_687)', 'remaining time': '5:54:16', 'throughput': '9170.50', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:30:14,072 >> {'loss': 0.4345, 'grad_norm': 15.843926429748535, 'learning_rate': 1.7659732392932202e-07, 'epoch': 0.17005695687550854, 'num_input_tokens_seen': 6348079104, 'completed': '82.10% (3_027 / 3_687)', 'remaining time': '5:53:43', 'throughput': '8396.43', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:30:46,788 >> {'loss': 0.3054, 'grad_norm': 12.77824878692627, 'learning_rate': 1.7637218174482422e-07, 'epoch': 0.1703281800922159, 'num_input_tokens_seen': 6350176256, 'completed': '82.13% (3_028 / 3_687)', 'remaining time': '5:53:11', 'throughput': '8012.74', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:31:17,865 >> {'loss': 0.4641, 'grad_norm': 15.227442741394043, 'learning_rate': 1.7614734024177707e-07, 'epoch': 0.17059940330892323, 'num_input_tokens_seen': 6352273408, 'completed': '82.15% (3_029 / 3_687)', 'remaining time': '5:52:38', 'throughput': '8435.18', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:31:47,511 >> {'loss': 0.649, 'grad_norm': 20.998775482177734, 'learning_rate': 1.7592279960112424e-07, 'epoch': 0.1708706265256306, 'num_input_tokens_seen': 6354370560, 'completed': '82.18% (3_030 / 3_687)', 'remaining time': '5:52:03', 'throughput': '8842.42', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:32:19,201 >> {'loss': 0.3258, 'grad_norm': 14.684011459350586, 'learning_rate': 
1.7569856000356758e-07, 'epoch': 0.17114184974233795, 'num_input_tokens_seen': 6356467712, 'completed': '82.21% (3_031 / 3_687)', 'remaining time': '5:51:31', 'throughput': '8272.04', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:32:48,979 >> {'loss': 0.6493, 'grad_norm': 22.120227813720703, 'learning_rate': 1.754746216295665e-07, 'epoch': 0.1714130729590453, 'num_input_tokens_seen': 6358564864, 'completed': '82.23% (3_032 / 3_687)', 'remaining time': '5:50:56', 'throughput': '8803.42', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:33:19,451 >> {'loss': 0.6875, 'grad_norm': 25.33750343322754, 'learning_rate': 1.7525098465933834e-07, 'epoch': 0.17168429617575265, 'num_input_tokens_seen': 6360662016, 'completed': '82.26% (3_033 / 3_687)', 'remaining time': '5:50:22', 'throughput': '8602.65', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:33:50,059 >> {'loss': 0.2934, 'grad_norm': 14.21183967590332, 'learning_rate': 1.750276492728575e-07, 'epoch': 0.17195551939245998, 'num_input_tokens_seen': 6362759168, 'completed': '82.29% (3_034 / 3_687)', 'remaining time': '5:49:49', 'throughput': '8564.82', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:34:21,806 >> {'loss': 0.337, 'grad_norm': 13.687496185302734, 'learning_rate': 1.748046156498557e-07, 'epoch': 0.17222674260916734, 'num_input_tokens_seen': 6364856320, 'completed': '82.32% (3_035 / 3_687)', 'remaining time': '5:49:16', 'throughput': '8257.26', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:34:51,432 >> {'loss': 0.5263, 'grad_norm': 16.938383102416992, 'learning_rate': 1.74581883969822e-07, 'epoch': 0.1724979658258747, 'num_input_tokens_seen': 6366953472, 'completed': '82.34% (3_036 / 3_687)', 'remaining time': '5:48:41', 'throughput': '8848.33', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:35:20,050 >> {'loss': 0.4494, 'grad_norm': 15.311517715454102, 'learning_rate': 1.7435945441200232e-07, 'epoch': 0.17276918904258204, 
'num_input_tokens_seen': 6369050624, 'completed': '82.37% (3_037 / 3_687)', 'remaining time': '5:48:06', 'throughput': '9160.16', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:35:50,030 >> {'loss': 0.3876, 'grad_norm': 15.489421844482422, 'learning_rate': 1.7413732715539954e-07, 'epoch': 0.1730404122592894, 'num_input_tokens_seen': 6371147776, 'completed': '82.40% (3_038 / 3_687)', 'remaining time': '5:47:31', 'throughput': '8743.77', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:36:19,298 >> {'loss': 0.6567, 'grad_norm': 21.3696231842041, 'learning_rate': 1.7391550237877326e-07, 'epoch': 0.17331163547599673, 'num_input_tokens_seen': 6373244928, 'completed': '82.42% (3_039 / 3_687)', 'remaining time': '5:46:56', 'throughput': '8956.88', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:36:46,506 >> {'loss': 0.4457, 'grad_norm': 16.473154067993164, 'learning_rate': 1.7369398026063958e-07, 'epoch': 0.1735828586927041, 'num_input_tokens_seen': 6375342080, 'completed': '82.45% (3_040 / 3_687)', 'remaining time': '5:46:19', 'throughput': '9634.83', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:37:14,600 >> {'loss': 0.6155, 'grad_norm': 17.838611602783203, 'learning_rate': 1.7347276097927105e-07, 'epoch': 0.17385408190941146, 'num_input_tokens_seen': 6377439232, 'completed': '82.48% (3_041 / 3_687)', 'remaining time': '5:45:43', 'throughput': '9330.85', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:37:48,680 >> {'loss': 0.4533, 'grad_norm': 17.602170944213867, 'learning_rate': 1.732518447126966e-07, 'epoch': 0.1741253051261188, 'num_input_tokens_seen': 6379536384, 'completed': '82.51% (3_042 / 3_687)', 'remaining time': '5:45:13', 'throughput': '7692.06', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:38:17,794 >> {'loss': 0.4754, 'grad_norm': 25.11285400390625, 'learning_rate': 1.730312316387012e-07, 'epoch': 0.17439652834282615, 'num_input_tokens_seen': 6381633536, 'completed': 
'82.53% (3_043 / 3_687)', 'remaining time': '5:44:38', 'throughput': '9003.89', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:38:47,169 >> {'loss': 0.4263, 'grad_norm': 15.209177017211914, 'learning_rate': 1.7281092193482582e-07, 'epoch': 0.17466775155953349, 'num_input_tokens_seen': 6383730688, 'completed': '82.56% (3_044 / 3_687)', 'remaining time': '5:44:03', 'throughput': '8924.05', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:39:17,918 >> {'loss': 0.5498, 'grad_norm': 27.183820724487305, 'learning_rate': 1.7259091577836737e-07, 'epoch': 0.17493897477624085, 'num_input_tokens_seen': 6385827840, 'completed': '82.59% (3_045 / 3_687)', 'remaining time': '5:43:29', 'throughput': '8525.53', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:39:45,895 >> {'loss': 0.5436, 'grad_norm': 17.119733810424805, 'learning_rate': 1.723712133463785e-07, 'epoch': 0.1752101979929482, 'num_input_tokens_seen': 6387924992, 'completed': '82.61% (3_046 / 3_687)', 'remaining time': '5:42:53', 'throughput': '9369.91', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:40:15,799 >> {'loss': 0.2955, 'grad_norm': 14.52611255645752, 'learning_rate': 1.7215181481566727e-07, 'epoch': 0.17548142120965554, 'num_input_tokens_seen': 6390022144, 'completed': '82.64% (3_047 / 3_687)', 'remaining time': '5:42:19', 'throughput': '8766.12', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:40:49,063 >> {'loss': 0.2517, 'grad_norm': 11.77927303314209, 'learning_rate': 1.719327203627971e-07, 'epoch': 0.1757526444263629, 'num_input_tokens_seen': 6392119296, 'completed': '82.67% (3_048 / 3_687)', 'remaining time': '5:41:48', 'throughput': '7880.75', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:41:19,507 >> {'loss': 0.3709, 'grad_norm': 16.277713775634766, 'learning_rate': 1.7171393016408715e-07, 'epoch': 0.17602386764307024, 'num_input_tokens_seen': 6394216448, 'completed': '82.70% (3_049 / 3_687)', 'remaining time': 
'5:41:14', 'throughput': '8610.53', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:41:52,237 >> {'loss': 0.3434, 'grad_norm': 18.34269142150879, 'learning_rate': 1.7149544439561135e-07, 'epoch': 0.1762950908597776, 'num_input_tokens_seen': 6396313600, 'completed': '82.72% (3_050 / 3_687)', 'remaining time': '5:40:43', 'throughput': '8009.51', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:42:23,425 >> {'loss': 0.3524, 'grad_norm': 15.846407890319824, 'learning_rate': 1.7127726323319874e-07, 'epoch': 0.17656631407648496, 'num_input_tokens_seen': 6398410752, 'completed': '82.75% (3_051 / 3_687)', 'remaining time': '5:40:10', 'throughput': '8405.26', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:42:54,820 >> {'loss': 0.2759, 'grad_norm': 13.641855239868164, 'learning_rate': 1.7105938685243339e-07, 'epoch': 0.1768375372931923, 'num_input_tokens_seen': 6400507904, 'completed': '82.78% (3_052 / 3_687)', 'remaining time': '5:39:37', 'throughput': '8349.72', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:43:24,277 >> {'loss': 0.506, 'grad_norm': 16.64573097229004, 'learning_rate': 1.7084181542865355e-07, 'epoch': 0.17710876050989965, 'num_input_tokens_seen': 6402605056, 'completed': '82.80% (3_053 / 3_687)', 'remaining time': '5:39:02', 'throughput': '8899.14', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:43:53,896 >> {'loss': 0.5754, 'grad_norm': 21.56814193725586, 'learning_rate': 1.7062454913695264e-07, 'epoch': 0.177379983726607, 'num_input_tokens_seen': 6404702208, 'completed': '82.83% (3_054 / 3_687)', 'remaining time': '5:38:28', 'throughput': '8850.63', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:44:24,071 >> {'loss': 0.3337, 'grad_norm': 13.835966110229492, 'learning_rate': 1.7040758815217827e-07, 'epoch': 0.17765120694331435, 'num_input_tokens_seen': 6406799360, 'completed': '82.86% (3_055 / 3_687)', 'remaining time': '5:37:54', 'throughput': '8687.40', 'gpu_mem_free': 
'5581MB'} [INFO|trainer.py:175] 2024-12-26 20:44:54,261 >> {'loss': 0.625, 'grad_norm': 19.247650146484375, 'learning_rate': 1.7019093264893236e-07, 'epoch': 0.1779224301600217, 'num_input_tokens_seen': 6408896512, 'completed': '82.89% (3_056 / 3_687)', 'remaining time': '5:37:20', 'throughput': '8683.21', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:45:22,436 >> {'loss': 0.3558, 'grad_norm': 13.838505744934082, 'learning_rate': 1.6997458280157113e-07, 'epoch': 0.17819365337672904, 'num_input_tokens_seen': 6410993664, 'completed': '82.91% (3_057 / 3_687)', 'remaining time': '5:36:44', 'throughput': '9304.14', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:45:51,328 >> {'loss': 0.5823, 'grad_norm': 19.8231143951416, 'learning_rate': 1.6975853878420457e-07, 'epoch': 0.1784648765934364, 'num_input_tokens_seen': 6413090816, 'completed': '82.94% (3_058 / 3_687)', 'remaining time': '5:36:09', 'throughput': '9073.38', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:46:22,965 >> {'loss': 0.3463, 'grad_norm': 16.521181106567383, 'learning_rate': 1.6954280077069676e-07, 'epoch': 0.17873609981014374, 'num_input_tokens_seen': 6415187968, 'completed': '82.97% (3_059 / 3_687)', 'remaining time': '5:35:37', 'throughput': '8285.85', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:46:54,274 >> {'loss': 0.4292, 'grad_norm': 20.04188346862793, 'learning_rate': 1.6932736893466536e-07, 'epoch': 0.1790073230268511, 'num_input_tokens_seen': 6417285120, 'completed': '82.99% (3_060 / 3_687)', 'remaining time': '5:35:04', 'throughput': '8372.78', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 20:47:27,124 >> {'loss': 0.3455, 'grad_norm': 15.362913131713867, 'learning_rate': 1.691122434494818e-07, 'epoch': 0.17927854624355846, 'num_input_tokens_seen': 6419382272, 'completed': '83.02% (3_061 / 3_687)', 'remaining time': '5:34:33', 'throughput': '7980.16', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 
20:47:56,352 >> {'loss': 0.5172, 'grad_norm': 17.9241943359375, 'learning_rate': 1.688974244882709e-07, 'epoch': 0.1795497694602658, 'num_input_tokens_seen': 6421479424, 'completed': '83.05% (3_062 / 3_687)', 'remaining time': '5:33:58', 'throughput': '8968.79', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 20:48:28,825 >> {'loss': 0.4723, 'grad_norm': 16.651180267333984, 'learning_rate': 1.686829122239108e-07, 'epoch': 0.17982099267697316, 'num_input_tokens_seen': 6423576576, 'completed': '83.08% (3_063 / 3_687)', 'remaining time': '5:33:26', 'throughput': '8072.68', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 20:48:58,210 >> {'loss': 0.5919, 'grad_norm': 21.878660202026367, 'learning_rate': 1.684687068290326e-07, 'epoch': 0.1800922158936805, 'num_input_tokens_seen': 6425673728, 'completed': '83.10% (3_064 / 3_687)', 'remaining time': '5:32:52', 'throughput': '8921.06', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 20:49:27,462 >> {'loss': 0.3433, 'grad_norm': 16.2543888092041, 'learning_rate': 1.682548084760208e-07, 'epoch': 0.18036343911038785, 'num_input_tokens_seen': 6427770880, 'completed': '83.13% (3_065 / 3_687)', 'remaining time': '5:32:17', 'throughput': '8961.63', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 20:50:00,821 >> {'loss': 0.3266, 'grad_norm': 14.593696594238281, 'learning_rate': 1.6804121733701277e-07, 'epoch': 0.18063466232709519, 'num_input_tokens_seen': 6429868032, 'completed': '83.16% (3_066 / 3_687)', 'remaining time': '5:31:46', 'throughput': '7858.15', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 20:50:28,267 >> {'loss': 1.0168, 'grad_norm': 30.589252471923828, 'learning_rate': 1.678279335838983e-07, 'epoch': 0.18090588554380255, 'num_input_tokens_seen': 6431965184, 'completed': '83.18% (3_067 / 3_687)', 'remaining time': '5:31:10', 'throughput': '9551.24', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 20:50:59,994 >> {'loss': 0.2938, 'grad_norm': 16.187679290771484, 'learning_rate': 1.676149573883202e-07, 'epoch': 0.1811771087605099, 'num_input_tokens_seen': 6434062336, 'completed': '83.21% (3_068 / 3_687)', 'remaining time': '5:30:37', 'throughput': '8262.69', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 20:51:29,936 >> {'loss': 0.4911, 'grad_norm': 20.22064208984375, 'learning_rate': 1.674022889216737e-07, 'epoch': 0.18144833197721724, 'num_input_tokens_seen': 6436159488, 'completed': '83.24% (3_069 / 3_687)', 'remaining time': '5:30:03', 'throughput': '8755.03', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 20:52:00,835 >> {'loss': 0.3566, 'grad_norm': 15.452803611755371, 'learning_rate': 1.67189928355106e-07, 'epoch': 0.1817195551939246, 'num_input_tokens_seen': 6438256640, 'completed': '83.27% (3_070 / 3_687)', 'remaining time': '5:29:30', 'throughput': '8483.81', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 20:52:27,848 >> {'loss': 0.4931, 'grad_norm': 21.338388442993164, 'learning_rate': 1.66977875859517e-07, 'epoch': 0.18199077841063194, 'num_input_tokens_seen': 6440353792, 'completed': '83.29% (3_071 / 3_687)', 'remaining time': '5:28:54', 'throughput': '9704.49', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 20:52:57,315 >> {'loss': 0.3819, 'grad_norm': 15.379291534423828, 'learning_rate': 1.6676613160555846e-07, 'epoch': 0.1822620016273393, 'num_input_tokens_seen': 6442450944, 'completed': '83.32% (3_072 / 3_687)', 'remaining time': '5:28:19', 'throughput': '8896.25', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 20:53:25,427 >> {'loss': 0.5989, 'grad_norm': 18.543752670288086, 'learning_rate': 1.665546957636339e-07, 'epoch': 0.18253322484404666, 'num_input_tokens_seen': 6444548096, 'completed': '83.35% (3_073 / 3_687)', 'remaining time': '5:27:44', 'throughput': '9324.98', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 20:53:56,277 >> {'loss': 0.3388, 'grad_norm': 13.839360237121582, 'learning_rate': 1.6634356850389926e-07, 'epoch': 0.182804448060754, 'num_input_tokens_seen': 6446645248, 'completed': '83.37% (3_074 / 3_687)', 'remaining time': '5:27:11', 'throughput': '8497.21', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 20:54:27,434 >> {'loss': 0.4291, 'grad_norm': 14.323488235473633, 'learning_rate': 1.6613274999626134e-07, 'epoch': 0.18307567127746135, 'num_input_tokens_seen': 6448742400, 'completed': '83.40% (3_075 / 3_687)', 'remaining time': '5:26:38', 'throughput': '8413.61', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 20:54:56,018 >> {'loss': 0.5632, 'grad_norm': 19.329299926757812, 'learning_rate': 1.65922240410379e-07, 'epoch': 0.1833468944941687, 'num_input_tokens_seen': 6450839552, 'completed': '83.43% (3_076 / 3_687)', 'remaining time': '5:26:03', 'throughput': '9170.94', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 20:55:30,364 >> {'loss': 0.3609, 'grad_norm': 14.438604354858398, 'learning_rate': 1.6571203991566234e-07, 'epoch': 0.18361811771087605, 'num_input_tokens_seen': 6452936704, 'completed': '83.46% (3_077 / 3_687)', 'remaining time': '5:25:33', 'throughput': '7632.49', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 20:56:00,668 >> {'loss': 0.2715, 'grad_norm': 16.972644805908203, 'learning_rate': 1.6550214868127276e-07, 'epoch': 0.1838893409275834, 'num_input_tokens_seen': 6455033856, 'completed': '83.48% (3_078 / 3_687)', 'remaining time': '5:24:59', 'throughput': '8650.52', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 20:56:33,428 >> {'loss': 0.5057, 'grad_norm': 19.990781784057617, 'learning_rate': 1.6529256687612264e-07, 'epoch': 0.18416056414429074, 'num_input_tokens_seen': 6457131008, 'completed': '83.51% (3_079 / 3_687)', 'remaining time': '5:24:28', 'throughput': '8001.95', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 20:57:05,885 >> {'loss': 0.32, 'grad_norm': 12.420730590820312, 'learning_rate': 1.6508329466887548e-07, 'epoch': 0.1844317873609981, 'num_input_tokens_seen': 6459228160, 'completed': '83.54% (3_080 / 3_687)', 'remaining time': '5:23:56', 'throughput': '8076.59', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 20:57:36,242 >> {'loss': 0.531, 'grad_norm': 18.54897689819336, 'learning_rate': 1.6487433222794566e-07, 'epoch': 0.18470301057770544, 'num_input_tokens_seen': 6461325312, 'completed': '83.56% (3_081 / 3_687)', 'remaining time': '5:23:23', 'throughput': '8635.39', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 20:58:08,086 >> {'loss': 0.2632, 'grad_norm': 13.507473945617676, 'learning_rate': 1.6466567972149806e-07, 'epoch': 0.1849742337944128, 'num_input_tokens_seen': 6463422464, 'completed': '83.59% (3_082 / 3_687)', 'remaining time': '5:22:50', 'throughput': '8232.32', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 20:58:39,022 >> {'loss': 0.3308, 'grad_norm': 13.425512313842773, 'learning_rate': 1.6445733731744842e-07, 'epoch': 0.18524545701112016, 'num_input_tokens_seen': 6465519616, 'completed': '83.62% (3_083 / 3_687)', 'remaining time': '5:22:17', 'throughput': '8473.68', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 20:59:13,507 >> {'loss': 0.3104, 'grad_norm': 14.376126289367676, 'learning_rate': 1.642493051834627e-07, 'epoch': 0.1855166802278275, 'num_input_tokens_seen': 6467616768, 'completed': '83.65% (3_084 / 3_687)', 'remaining time': '5:21:48', 'throughput': '7601.57', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 20:59:42,931 >> {'loss': 0.3902, 'grad_norm': 13.848085403442383, 'learning_rate': 1.6404158348695729e-07, 'epoch': 0.18578790344453486, 'num_input_tokens_seen': 6469713920, 'completed': '83.67% (3_085 / 3_687)', 'remaining time': '5:21:13', 'throughput': '8909.23', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:00:11,886 >> {'loss': 0.5748, 'grad_norm': 17.45004653930664, 'learning_rate': 1.6383417239509878e-07, 'epoch': 0.1860591266612422, 'num_input_tokens_seen': 6471811072, 'completed': '83.70% (3_086 / 3_687)', 'remaining time': '5:20:39', 'throughput': '9053.71', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:00:41,329 >> {'loss': 0.6891, 'grad_norm': 19.52586555480957, 'learning_rate': 1.6362707207480347e-07, 'epoch': 0.18633034987794955, 'num_input_tokens_seen': 6473908224, 'completed': '83.73% (3_087 / 3_687)', 'remaining time': '5:20:04', 'throughput': '8904.27', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:01:12,553 >> {'loss': 0.6888, 'grad_norm': 24.07430076599121, 'learning_rate': 1.6342028269273802e-07, 'epoch': 0.18660157309465691, 'num_input_tokens_seen': 6476005376, 'completed': '83.75% (3_088 / 3_687)', 'remaining time': '5:19:32', 'throughput': '8394.75', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:01:43,411 >> {'loss': 0.6213, 'grad_norm': 19.478736877441406, 'learning_rate': 1.6321380441531852e-07, 'epoch': 0.18687279631136425, 'num_input_tokens_seen': 6478102528, 'completed': '83.78% (3_089 / 3_687)', 'remaining time': '5:18:59', 'throughput': '8495.12', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:02:15,067 >> {'loss': 0.3439, 'grad_norm': 13.465045928955078, 'learning_rate': 1.63007637408711e-07, 'epoch': 0.1871440195280716, 'num_input_tokens_seen': 6480199680, 'completed': '83.81% (3_090 / 3_687)', 'remaining time': '5:18:26', 'throughput': '8281.19', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:02:44,175 >> {'loss': 0.4287, 'grad_norm': 16.045684814453125, 'learning_rate': 1.6280178183883081e-07, 'epoch': 0.18741524274477894, 'num_input_tokens_seen': 6482296832, 'completed': '83.84% (3_091 / 3_687)', 'remaining time': '5:17:52', 'throughput': '9005.88', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:03:13,848 >> {'loss': 0.4006, 'grad_norm': 18.508325576782227, 'learning_rate': 1.6259623787134258e-07, 'epoch': 0.1876864659614863, 'num_input_tokens_seen': 6484393984, 'completed': '83.86% (3_092 / 3_687)', 'remaining time': '5:17:18', 'throughput': '8834.23', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:03:44,061 >> {'loss': 0.392, 'grad_norm': 16.57149314880371, 'learning_rate': 1.6239100567166026e-07, 'epoch': 0.18795768917819367, 'num_input_tokens_seen': 6486491136, 'completed': '83.89% (3_093 / 3_687)', 'remaining time': '5:16:44', 'throughput': '8676.62', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:04:14,989 >> {'loss': 0.3385, 'grad_norm': 13.103677749633789, 'learning_rate': 1.6218608540494693e-07, 'epoch': 0.188228912394901, 'num_input_tokens_seen': 6488588288, 'completed': '83.92% (3_094 / 3_687)', 'remaining time': '5:16:11', 'throughput': '8475.86', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:04:44,447 >> {'loss': 0.3766, 'grad_norm': 17.10627555847168, 'learning_rate': 1.619814772361147e-07, 'epoch': 0.18850013561160836, 'num_input_tokens_seen': 6490685440, 'completed': '83.94% (3_095 / 3_687)', 'remaining time': '5:15:37', 'throughput': '8898.98', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:05:16,183 >> {'loss': 0.5857, 'grad_norm': 18.120052337646484, 'learning_rate': 1.6177718132982441e-07, 'epoch': 0.1887713588283157, 'num_input_tokens_seen': 6492782592, 'completed': '83.97% (3_096 / 3_687)', 'remaining time': '5:15:05', 'throughput': '8260.20', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:05:46,533 >> {'loss': 0.5742, 'grad_norm': 19.000185012817383, 'learning_rate': 1.6157319785048555e-07, 'epoch': 0.18904258204502306, 'num_input_tokens_seen': 6494879744, 'completed': '84.00% (3_097 / 3_687)', 'remaining time': '5:14:32', 'throughput': '8637.33', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:06:20,099 >> {'loss': 0.3558, 'grad_norm': 18.840848922729492, 'learning_rate': 1.6136952696225634e-07, 'epoch': 0.18931380526173042, 'num_input_tokens_seen': 6496976896, 'completed': '84.02% (3_098 / 3_687)', 'remaining time': '5:14:01', 'throughput': '7809.73', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:06:56,160 >> {'loss': 0.5141, 'grad_norm': 19.136682510375977, 'learning_rate': 1.6116616882904332e-07, 'epoch': 0.18958502847843775, 'num_input_tokens_seen': 6499074048, 'completed': '84.05% (3_099 / 3_687)', 'remaining time': '5:13:32', 'throughput': '7269.58', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:07:31,817 >> {'loss': 0.434, 'grad_norm': 17.346790313720703, 'learning_rate': 1.6096312361450142e-07, 'epoch': 0.1898562516951451, 'num_input_tokens_seen': 6501171200, 'completed': '84.08% (3_100 / 3_687)', 'remaining time': '5:13:04', 'throughput': '7351.78', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:08:01,295 >> {'loss': 0.4427, 'grad_norm': 15.207001686096191, 'learning_rate': 1.6076039148203373e-07, 'epoch': 0.19012747491185245, 'num_input_tokens_seen': 6503268352, 'completed': '84.11% (3_101 / 3_687)', 'remaining time': '5:12:29', 'throughput': '8892.78', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:08:30,704 >> {'loss': 0.253, 'grad_norm': 10.525275230407715, 'learning_rate': 1.6055797259479125e-07, 'epoch': 0.1903986981285598, 'num_input_tokens_seen': 6505365504, 'completed': '84.13% (3_102 / 3_687)', 'remaining time': '5:11:55', 'throughput': '8913.87', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:09:04,044 >> {'loss': 0.6012, 'grad_norm': 27.057727813720703, 'learning_rate': 1.6035586711567318e-07, 'epoch': 0.19066992134526717, 'num_input_tokens_seen': 6507462656, 'completed': '84.16% (3_103 / 3_687)', 'remaining time': '5:11:24', 'throughput': '7862.73', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:09:32,001 >> {'loss': 0.673, 'grad_norm': 29.3339786529541, 'learning_rate': 1.601540752073261e-07, 'epoch': 0.1909411445619745, 'num_input_tokens_seen': 6509559808, 'completed': '84.19% (3_104 / 3_687)', 'remaining time': '5:10:49', 'throughput': '9376.76', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:10:05,277 >> {'loss': 0.2948, 'grad_norm': 12.424835205078125, 'learning_rate': 1.5995259703214445e-07, 'epoch': 0.19121236777868186, 'num_input_tokens_seen': 6511656960, 'completed': '84.21% (3_105 / 3_687)', 'remaining time': '5:10:18', 'throughput': '7877.80', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:10:32,928 >> {'loss': 0.3906, 'grad_norm': 14.519540786743164, 'learning_rate': 1.5975143275227039e-07, 'epoch': 0.1914835909953892, 'num_input_tokens_seen': 6513754112, 'completed': '84.24% (3_106 / 3_687)', 'remaining time': '5:09:43', 'throughput': '9480.42', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:11:04,446 >> {'loss': 0.4539, 'grad_norm': 18.084651947021484, 'learning_rate': 1.5955058252959318e-07, 'epoch': 0.19175481421209656, 'num_input_tokens_seen': 6515851264, 'completed': '84.27% (3_107 / 3_687)', 'remaining time': '5:09:10', 'throughput': '8317.29', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:11:32,669 >> {'loss': 0.5936, 'grad_norm': 23.673160552978516, 'learning_rate': 1.5935004652574947e-07, 'epoch': 0.1920260374288039, 'num_input_tokens_seen': 6517948416, 'completed': '84.30% (3_108 / 3_687)', 'remaining time': '5:08:35', 'throughput': '9288.41', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:12:02,814 >> {'loss': 0.2686, 'grad_norm': 13.00130844116211, 'learning_rate': 1.591498249021231e-07, 'epoch': 0.19229726064551125, 'num_input_tokens_seen': 6520045568, 'completed': '84.32% (3_109 / 3_687)', 'remaining time': '5:08:02', 'throughput': '8696.02', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:12:35,445 >> {'loss': 0.6705, 'grad_norm': 21.13102912902832, 'learning_rate': 1.5894991781984456e-07, 'epoch': 0.19256848386221861, 'num_input_tokens_seen': 6522142720, 'completed': '84.35% (3_110 / 3_687)', 'remaining time': '5:07:30', 'throughput': '8033.49', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:13:05,713 >> {'loss': 0.1552, 'grad_norm': 10.354960441589355, 'learning_rate': 1.587503254397916e-07, 'epoch': 0.19283970707892595, 'num_input_tokens_seen': 6524239872, 'completed': '84.38% (3_111 / 3_687)', 'remaining time': '5:06:57', 'throughput': '8660.94', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:13:35,292 >> {'loss': 0.2692, 'grad_norm': 16.40614128112793, 'learning_rate': 1.585510479225886e-07, 'epoch': 0.1931109302956333, 'num_input_tokens_seen': 6526337024, 'completed': '84.40% (3_112 / 3_687)', 'remaining time': '5:06:23', 'throughput': '8862.51', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:14:04,560 >> {'loss': 0.3284, 'grad_norm': 14.614860534667969, 'learning_rate': 1.5835208542860648e-07, 'epoch': 0.19338215351234064, 'num_input_tokens_seen': 6528434176, 'completed': '84.43% (3_113 / 3_687)', 'remaining time': '5:05:49', 'throughput': '8956.54', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:14:37,002 >> {'loss': 0.3218, 'grad_norm': 11.804844856262207, 'learning_rate': 1.581534381179627e-07, 'epoch': 0.193653376729048, 'num_input_tokens_seen': 6530531328, 'completed': '84.46% (3_114 / 3_687)', 'remaining time': '5:05:17', 'throughput': '8080.53', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:15:06,927 >> {'loss': 0.5768, 'grad_norm': 19.765899658203125, 'learning_rate': 1.5795510615052104e-07, 'epoch': 0.19392459994575537, 'num_input_tokens_seen': 6532628480, 'completed': '84.49% (3_115 / 3_687)', 'remaining time': '5:04:44', 'throughput': '8759.86', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:15:39,626 >> {'loss': 0.4025, 'grad_norm': 18.440826416015625, 'learning_rate': 1.5775708968589155e-07, 'epoch': 0.1941958231624627, 'num_input_tokens_seen': 6534725632, 'completed': '84.51% (3_116 / 3_687)', 'remaining time': '5:04:12', 'throughput': '8016.96', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:16:09,896 >> {'loss': 0.4297, 'grad_norm': 27.12670135498047, 'learning_rate': 1.575593888834303e-07, 'epoch': 0.19446704637917006, 'num_input_tokens_seen': 6536822784, 'completed': '84.54% (3_117 / 3_687)', 'remaining time': '5:03:39', 'throughput': '8660.12', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:16:38,024 >> {'loss': 0.7779, 'grad_norm': 22.037416458129883, 'learning_rate': 1.5736200390223942e-07, 'epoch': 0.1947382695958774, 'num_input_tokens_seen': 6538919936, 'completed': '84.57% (3_118 / 3_687)', 'remaining time': '5:03:04', 'throughput': '9319.74', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:17:07,006 >> {'loss': 0.3269, 'grad_norm': 15.226699829101562, 'learning_rate': 1.5716493490116684e-07, 'epoch': 0.19500949281258476, 'num_input_tokens_seen': 6541017088, 'completed': '84.59% (3_119 / 3_687)', 'remaining time': '5:02:30', 'throughput': '9045.14', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:17:37,504 >> {'loss': 0.4225, 'grad_norm': 19.05426788330078, 'learning_rate': 1.5696818203880624e-07, 'epoch': 0.19528071602929212, 'num_input_tokens_seen': 6543114240, 'completed': '84.62% (3_120 / 3_687)', 'remaining time': '5:01:57', 'throughput': '8595.27', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:18:06,060 >> {'loss': 0.561, 'grad_norm': 15.81242847442627, 'learning_rate': 1.5677174547349655e-07, 'epoch': 0.19555193924599945, 'num_input_tokens_seen': 6545211392, 'completed': '84.65% (3_121 / 3_687)', 'remaining time': '5:01:22', 'throughput': '9180.00', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:18:35,783 >> {'loss': 0.5197, 'grad_norm': 24.574951171875, 'learning_rate': 1.565756253633228e-07, 'epoch': 0.1958231624627068, 'num_input_tokens_seen': 6547308544, 'completed': '84.68% (3_122 / 3_687)', 'remaining time': '5:00:48', 'throughput': '8819.61', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:19:07,112 >> {'loss': 0.4524, 'grad_norm': 21.462337493896484, 'learning_rate': 1.5637982186611481e-07, 'epoch': 0.19609438567941415, 'num_input_tokens_seen': 6549405696, 'completed': '84.70% (3_123 / 3_687)', 'remaining time': '5:00:16', 'throughput': '8367.44', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:19:36,680 >> {'loss': 0.2997, 'grad_norm': 18.190275192260742, 'learning_rate': 1.5618433513944787e-07, 'epoch': 0.1963656088961215, 'num_input_tokens_seen': 6551502848, 'completed': '84.73% (3_124 / 3_687)', 'remaining time': '4:59:42', 'throughput': '8865.82', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:20:07,741 >> {'loss': 0.5017, 'grad_norm': 18.96411895751953, 'learning_rate': 1.5598916534064216e-07, 'epoch': 0.19663683211282887, 'num_input_tokens_seen': 6553600000, 'completed': '84.76% (3_125 / 3_687)', 'remaining time': '4:59:09', 'throughput': '8439.73', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:20:39,566 >> {'loss': 0.4162, 'grad_norm': 16.543214797973633, 'learning_rate': 1.557943126267631e-07, 'epoch': 0.1969080553295362, 'num_input_tokens_seen': 6555697152, 'completed': '84.78% (3_126 / 3_687)', 'remaining time': '4:58:37', 'throughput': '8236.95', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:21:08,790 >> {'loss': 0.287, 'grad_norm': 13.697303771972656, 'learning_rate': 1.5559977715462044e-07, 'epoch': 0.19717927854624356, 'num_input_tokens_seen': 6557794304, 'completed': '84.81% (3_127 / 3_687)', 'remaining time': '4:58:03', 'throughput': '8970.21', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:21:40,339 >> {'loss': 0.6185, 'grad_norm': 19.32088279724121, 'learning_rate': 1.5540555908076914e-07, 'epoch': 0.1974505017629509, 'num_input_tokens_seen': 6559891456, 'completed': '84.84% (3_128 / 3_687)', 'remaining time': '4:57:31', 'throughput': '8310.21', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:22:12,235 >> {'loss': 0.4217, 'grad_norm': 21.070152282714844, 'learning_rate': 1.5521165856150841e-07, 'epoch': 0.19772172497965826, 'num_input_tokens_seen': 6561988608, 'completed': '84.87% (3_129 / 3_687)', 'remaining time': '4:56:59', 'throughput': '8217.75', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:22:43,007 >> {'loss': 0.5169, 'grad_norm': 27.660669326782227, 'learning_rate': 1.55018075752882e-07, 'epoch': 0.19799294819636562, 'num_input_tokens_seen': 6564085760, 'completed': '84.89% (3_130 / 3_687)', 'remaining time': '4:56:26', 'throughput': '8518.82', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:23:12,218 >> {'loss': 0.4193, 'grad_norm': 17.022554397583008, 'learning_rate': 1.5482481081067825e-07, 'epoch': 0.19826417141307295, 'num_input_tokens_seen': 6566182912, 'completed': '84.92% (3_131 / 3_687)', 'remaining time': '4:55:52', 'throughput': '8974.11', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:23:41,642 >> {'loss': 0.4522, 'grad_norm': 18.43861198425293, 'learning_rate': 1.546318638904291e-07, 'epoch': 0.19853539462978032, 'num_input_tokens_seen': 6568280064, 'completed': '84.95% (3_132 / 3_687)', 'remaining time': '4:55:18', 'throughput': '8909.18', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:24:14,799 >> {'loss': 0.2211, 'grad_norm': 15.052275657653809, 'learning_rate': 1.5443923514741103e-07, 'epoch': 0.19880661784648765, 'num_input_tokens_seen': 6570377216, 'completed': '84.97% (3_133 / 3_687)', 'remaining time': '4:54:47', 'throughput': '7906.15', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:24:47,465 >> {'loss': 0.3405, 'grad_norm': 21.32815933227539, 'learning_rate': 1.5424692473664429e-07, 'epoch': 0.199077841063195, 'num_input_tokens_seen': 6572474368, 'completed': '85.00% (3_134 / 3_687)', 'remaining time': '4:54:16', 'throughput': '8025.08', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:25:17,069 >> {'loss': 0.6306, 'grad_norm': 20.877336502075195, 'learning_rate': 1.5405493281289302e-07, 'epoch': 0.19934906427990237, 'num_input_tokens_seen': 6574571520, 'completed': '85.03% (3_135 / 3_687)', 'remaining time': '4:53:42', 'throughput': '8855.03', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:25:48,191 >> {'loss': 0.6523, 'grad_norm': 21.885692596435547, 'learning_rate': 1.5386325953066494e-07, 'epoch': 0.1996202874966097, 'num_input_tokens_seen': 6576668672, 'completed': '85.06% (3_136 / 3_687)', 'remaining time': '4:53:10', 'throughput': '8423.15', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:26:19,131 >> {'loss': 0.3159, 'grad_norm': 13.137401580810547, 'learning_rate': 1.5367190504421157e-07, 'epoch': 0.19989151071331707, 'num_input_tokens_seen': 6578765824, 'completed': '85.08% (3_137 / 3_687)', 'remaining time': '4:52:37', 'throughput': '8472.58', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:26:49,849 >> {'loss': 0.3431, 'grad_norm': 15.956317901611328, 'learning_rate': 1.5348086950752753e-07, 'epoch': 0.2001627339300244, 'num_input_tokens_seen': 6580862976, 'completed': '85.11% (3_138 / 3_687)', 'remaining time': '4:52:04', 'throughput': '8533.93', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:27:20,374 >> {'loss': 0.4469, 'grad_norm': 16.24814796447754, 'learning_rate': 1.532901530743511e-07, 'epoch': 0.20043395714673176, 'num_input_tokens_seen': 6582960128, 'completed': '85.14% (3_139 / 3_687)', 'remaining time': '4:51:31', 'throughput': '8587.83', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:27:50,529 >> {'loss': 0.4047, 'grad_norm': 13.850639343261719, 'learning_rate': 1.530997558981635e-07, 'epoch': 0.20070518036343912, 'num_input_tokens_seen': 6585057280, 'completed': '85.16% (3_140 / 3_687)', 'remaining time': '4:50:58', 'throughput': '8693.13', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:28:23,013 >> {'loss': 0.7333, 'grad_norm': 21.258054733276367, 'learning_rate': 1.5290967813218919e-07, 'epoch': 0.20097640358014646, 'num_input_tokens_seen': 6587154432, 'completed': '85.19% (3_141 / 3_687)', 'remaining time': '4:50:27', 'throughput': '8069.98', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:28:58,984 >> {'loss': 0.3701, 'grad_norm': 24.493776321411133, 'learning_rate': 1.527199199293955e-07, 'epoch': 0.20124762679685382, 'num_input_tokens_seen': 6589251584, 'completed': '85.22% (3_142 / 3_687)', 'remaining time': '4:49:58', 'throughput': '7287.59', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:29:31,387 >> {'loss': 0.7751, 'grad_norm': 25.23069190979004, 'learning_rate': 1.525304814424927e-07, 'epoch': 0.20151885001356115, 'num_input_tokens_seen': 6591348736, 'completed': '85.25% (3_143 / 3_687)', 'remaining time': '4:49:26', 'throughput': '8090.27', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:30:03,194 >> {'loss': 0.3786, 'grad_norm': 16.206256866455078, 'learning_rate': 1.5234136282393356e-07, 'epoch': 0.2017900732302685, 'num_input_tokens_seen': 6593445888, 'completed': '85.27% (3_144 / 3_687)', 'remaining time': '4:48:54', 'throughput': '8241.70', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:30:32,039 >> {'loss': 0.5419, 'grad_norm': 18.445281982421875, 'learning_rate': 1.5215256422591357e-07, 'epoch': 0.20206129644697587, 'num_input_tokens_seen': 6595543040, 'completed': '85.30% (3_145 / 3_687)', 'remaining time': '4:48:20', 'throughput': '9087.80', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:31:04,195 >> {'loss': 0.225, 'grad_norm': 13.487834930419922, 'learning_rate': 1.5196408580037058e-07, 'epoch': 0.2023325196636832, 'num_input_tokens_seen': 6597640192, 'completed': '85.33% (3_146 / 3_687)', 'remaining time': '4:47:48', 'throughput': '8152.29', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:31:34,091 >> {'loss': 0.2773, 'grad_norm': 18.042003631591797, 'learning_rate': 1.51775927698985e-07, 'epoch': 0.20260374288039057, 'num_input_tokens_seen': 6599737344, 'completed': '85.35% (3_147 / 3_687)', 'remaining time': '4:47:15', 'throughput': '8768.60', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:32:04,517 >> {'loss': 0.3948, 'grad_norm': 14.105890274047852, 'learning_rate': 1.5158809007317926e-07, 'epoch': 0.2028749660970979, 'num_input_tokens_seen': 6601834496, 'completed': '85.38% (3_148 / 3_687)', 'remaining time': '4:46:42', 'throughput': '8615.68', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:32:33,763 >> {'loss': 0.7842, 'grad_norm': 22.028852462768555, 'learning_rate': 1.514005730741181e-07, 'epoch': 0.20314618931380526, 'num_input_tokens_seen': 6603931648, 'completed': '85.41% (3_149 / 3_687)', 'remaining time': '4:46:08', 'throughput': '8963.43', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:33:03,119 >> {'loss': 0.4541, 'grad_norm': 18.55408477783203, 'learning_rate': 1.512133768527077e-07, 'epoch': 0.20341741253051263, 'num_input_tokens_seen': 6606028800, 'completed': '85.44% (3_150 / 3_687)', 'remaining time': '4:45:34', 'throughput': '8929.93', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:33:32,847 >> {'loss': 0.3687, 'grad_norm': 15.917207717895508, 'learning_rate': 1.510265015595967e-07, 'epoch': 0.20368863574721996, 'num_input_tokens_seen': 6608125952, 'completed': '85.46% (3_151 / 3_687)', 'remaining time': '4:45:01', 'throughput': '8818.16', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:34:02,276 >> {'loss': 0.6256, 'grad_norm': 25.865598678588867, 'learning_rate': 1.5083994734517518e-07, 'epoch': 0.20395985896392732, 'num_input_tokens_seen': 6610223104, 'completed': '85.49% (3_152 / 3_687)', 'remaining time': '4:44:27', 'throughput': '8907.68', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:34:31,254 >> {'loss': 0.4382, 'grad_norm': 17.131990432739258, 'learning_rate': 1.5065371435957484e-07, 'epoch': 0.20423108218063465, 'num_input_tokens_seen': 6612320256, 'completed': '85.52% (3_153 / 3_687)', 'remaining time': '4:43:53', 'throughput': '9046.07', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:35:02,496 >> {'loss': 0.7498, 'grad_norm': 24.471385955810547, 'learning_rate': 1.5046780275266897e-07, 'epoch': 0.20450230539734202, 'num_input_tokens_seen': 6614417408, 'completed': '85.54% (3_154 / 3_687)', 'remaining time': '4:43:21', 'throughput': '8390.86', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:35:35,883 >> {'loss': 0.4527, 'grad_norm': 17.423646926879883, 'learning_rate': 1.5028221267407207e-07, 'epoch': 0.20477352861404935, 'num_input_tokens_seen': 6616514560, 'completed': '85.57% (3_155 / 3_687)', 'remaining time': '4:42:50', 'throughput': '7852.13', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:36:05,624 >> {'loss': 0.2501, 'grad_norm': 13.580658912658691, 'learning_rate': 1.5009694427314007e-07, 'epoch': 0.2050447518307567, 'num_input_tokens_seen': 6618611712, 'completed': '85.60% (3_156 / 3_687)', 'remaining time': '4:42:17', 'throughput': '8813.55', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:36:38,590 >> {'loss': 0.33, 'grad_norm': 16.855371475219727, 'learning_rate': 1.4991199769896983e-07, 'epoch': 0.20531597504746407, 'num_input_tokens_seen': 6620708864, 'completed': '85.63% (3_157 / 3_687)', 'remaining time': '4:41:45', 'throughput': '7952.03', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:37:09,248 >> {'loss': 0.3885, 'grad_norm': 14.530333518981934, 'learning_rate': 1.497273731003994e-07, 'epoch': 0.2055871982641714, 'num_input_tokens_seen': 6622806016, 'completed': '85.65% (3_158 / 3_687)', 'remaining time': '4:41:13', 'throughput': '8550.67', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:37:40,136 >> {'loss': 0.3511, 'grad_norm': 13.189663887023926, 'learning_rate': 1.4954307062600758e-07, 'epoch': 0.20585842148087877, 'num_input_tokens_seen': 6624903168, 'completed': '85.68% (3_159 / 3_687)', 'remaining time': '4:40:40', 'throughput': '8486.80', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:38:10,012 >> {'loss': 0.6974, 'grad_norm': 21.233369827270508, 'learning_rate': 1.4935909042411412e-07, 'epoch': 0.2061296446975861, 'num_input_tokens_seen': 6627000320, 'completed': '85.71% (3_160 / 3_687)', 'remaining time': '4:40:07', 'throughput': '8774.44', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:38:38,842 >> {'loss': 0.498, 'grad_norm': 19.892419815063477, 'learning_rate': 1.4917543264277901e-07, 'epoch': 0.20640086791429346, 'num_input_tokens_seen': 6629097472, 'completed': '85.73% (3_161 / 3_687)', 'remaining time': '4:39:33', 'throughput': '9092.89', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:39:08,510 >> {'loss': 0.4197, 'grad_norm': 14.9336519241333, 'learning_rate': 1.4899209742980317e-07, 'epoch': 0.20667209113100082, 'num_input_tokens_seen': 6631194624, 'completed': '85.76% (3_162 / 3_687)', 'remaining time': '4:38:59', 'throughput': '8835.87', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:39:38,679 >> {'loss': 0.5148, 'grad_norm': 19.203628540039062, 'learning_rate': 1.488090849327279e-07, 'epoch': 0.20694331434770816, 'num_input_tokens_seen': 6633291776, 'completed': '85.79% (3_163 / 3_687)', 'remaining time': '4:38:26', 'throughput': '8689.10', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:40:09,128 >> {'loss': 0.3568, 'grad_norm': 14.907687187194824, 'learning_rate': 1.4862639529883463e-07, 'epoch': 0.20721453756441552, 'num_input_tokens_seen': 6635388928, 'completed': '85.82% (3_164 / 3_687)', 'remaining time': '4:37:53', 'throughput': '8609.18', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:40:39,554 >> {'loss': 0.3888, 'grad_norm': 16.64046287536621, 'learning_rate': 1.4844402867514503e-07, 'epoch': 0.20748576078112285, 'num_input_tokens_seen': 6637486080, 'completed': '85.84% (3_165 / 3_687)', 'remaining time': '4:37:21', 'throughput': '8615.78', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:41:13,119 >> {'loss': 0.2793, 'grad_norm': 12.00853157043457, 'learning_rate': 1.4826198520842093e-07, 'epoch': 0.2077569839978302, 'num_input_tokens_seen': 6639583232, 'completed': '85.87% (3_166 / 3_687)', 'remaining time': '4:36:50', 'throughput': '7810.24', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:41:44,109 >> {'loss': 0.4134, 'grad_norm': 15.106518745422363, 'learning_rate': 1.4808026504516374e-07, 'epoch': 0.20802820721453757, 'num_input_tokens_seen': 6641680384, 'completed': '85.90% (3_167 / 3_687)', 'remaining time': '4:36:17', 'throughput': '8458.87', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:42:13,338 >> {'loss': 0.5532, 'grad_norm': 18.982995986938477, 'learning_rate': 1.4789886833161506e-07, 'epoch': 0.2082994304312449, 'num_input_tokens_seen': 6643777536, 'completed': '85.92% (3_168 / 3_687)', 'remaining time': '4:35:44', 'throughput': '8968.79', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:42:45,922 >> {'loss': 0.3146, 'grad_norm': 12.5213041305542, 'learning_rate': 1.477177952137561e-07, 'epoch': 0.20857065364795227, 'num_input_tokens_seen': 6645874688, 'completed': '85.95% (3_169 / 3_687)', 'remaining time': '4:35:12', 'throughput': '8045.06', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:43:20,034 >> {'loss': 0.5158, 'grad_norm': 18.02132797241211, 'learning_rate': 1.4753704583730754e-07, 'epoch': 0.2088418768646596, 'num_input_tokens_seen': 6647971840, 'completed': '85.98% (3_170 / 3_687)', 'remaining time': '4:34:42', 'throughput': '7684.82', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:43:52,122 >> {'loss': 0.3841, 'grad_norm': 16.98950958251953, 'learning_rate': 1.4735662034772962e-07, 'epoch': 0.20911310008136696, 'num_input_tokens_seen': 6650068992, 'completed': '86.00% (3_171 / 3_687)', 'remaining time': '4:34:10', 'throughput': '8169.60', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:44:22,021 >> {'loss': 0.327, 'grad_norm': 15.854743003845215, 'learning_rate': 1.47176518890222e-07, 'epoch': 0.20938432329807433, 'num_input_tokens_seen': 6652166144, 'completed': '86.03% (3_172 / 3_687)', 'remaining time': '4:33:37', 'throughput': '8767.53', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:44:53,648 >> {'loss': 0.3407, 'grad_norm': 14.455223083496094, 'learning_rate': 1.4699674160972337e-07, 'epoch': 0.20965554651478166, 'num_input_tokens_seen': 6654263296, 'completed': '86.06% (3_173 / 3_687)', 'remaining time': '4:33:05', 'throughput': '8288.66', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:45:22,674 >> {'loss': 0.2415, 'grad_norm': 12.59441089630127, 'learning_rate': 1.4681728865091165e-07, 'epoch': 0.20992676973148902, 'num_input_tokens_seen': 6656360448, 'completed': '86.09% (3_174 / 3_687)', 'remaining time': '4:32:31', 'throughput': '9031.41', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:45:51,101 >> {'loss': 0.529, 'grad_norm': 21.2336483001709, 'learning_rate': 1.466381601582038e-07, 'epoch': 0.21019799294819635, 'num_input_tokens_seen': 6658457600, 'completed': '86.11% (3_175 / 3_687)', 'remaining time': '4:31:57', 'throughput': '9221.59', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:46:21,161 >> {'loss': 0.2731, 'grad_norm': 15.553447723388672, 'learning_rate': 1.4645935627575562e-07, 'epoch': 0.21046921616490372, 'num_input_tokens_seen': 6660554752, 'completed': '86.14% (3_176 / 3_687)', 'remaining time': '4:31:24', 'throughput': '8720.72', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:46:48,845 >> {'loss': 0.626, 'grad_norm': 19.533782958984375, 'learning_rate': 1.462808771474617e-07, 'epoch': 0.21074043938161108, 'num_input_tokens_seen': 6662651904, 'completed': '86.17% (3_177 / 3_687)', 'remaining time': '4:30:49', 'throughput': '9469.21', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:47:19,602 >> {'loss': 0.2152, 'grad_norm': 11.384428024291992, 'learning_rate': 1.4610272291695503e-07, 'epoch': 0.2110116625983184, 'num_input_tokens_seen': 6664749056, 'completed': '86.19% (3_178 / 3_687)', 'remaining time': '4:30:17', 'throughput': '8522.97', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:47:50,938 >> {'loss': 0.3965, 'grad_norm': 17.49107551574707, 'learning_rate': 1.4592489372760764e-07, 'epoch': 0.21128288581502577, 'num_input_tokens_seen': 6666846208, 'completed': '86.22% (3_179 / 3_687)', 'remaining time': '4:29:45', 'throughput': '8365.56', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:48:22,962 >> {'loss': 0.3181, 'grad_norm': 14.759407043457031, 'learning_rate': 1.4574738972252953e-07, 'epoch': 0.2115541090317331, 'num_input_tokens_seen': 6668943360, 'completed': '86.25% (3_180 / 3_687)', 'remaining time': '4:29:13', 'throughput': '8185.88', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:48:52,024 >> {'loss': 0.4154, 'grad_norm': 15.087119102478027, 'learning_rate': 1.4557021104456923e-07, 'epoch': 0.21182533224844047, 'num_input_tokens_seen': 6671040512, 'completed': '86.28% (3_181 / 3_687)', 'remaining time': '4:28:39', 'throughput': '9020.01', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:49:24,807 >> {'loss': 0.3897, 'grad_norm': 17.55137062072754, 'learning_rate': 1.4539335783631346e-07, 'epoch': 0.21209655546514783, 'num_input_tokens_seen': 6673137664, 'completed': '86.30% (3_182 / 3_687)', 'remaining time': '4:28:08', 'throughput': '7996.34', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:49:56,738 >> {'loss': 0.4131, 'grad_norm': 14.304892539978027, 'learning_rate': 1.4521683024008687e-07, 'epoch': 0.21236777868185516, 'num_input_tokens_seen': 6675234816, 'completed': '86.33% (3_183 / 3_687)', 'remaining time': '4:27:36', 'throughput': '8209.92', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:50:27,463 >> {'loss': 0.4752, 'grad_norm': 17.37781524658203, 'learning_rate': 1.450406283979521e-07, 'epoch': 0.21263900189856252, 'num_input_tokens_seen': 6677331968, 'completed': '86.36% (3_184 / 3_687)', 'remaining time': '4:27:03', 'throughput': '8531.87', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:50:59,529 >> {'loss': 0.2446, 'grad_norm': 11.948087692260742, 'learning_rate': 1.4486475245170977e-07, 'epoch': 0.21291022511526986, 'num_input_tokens_seen': 6679429120, 'completed': '86.38% (3_185 / 3_687)', 'remaining time': '4:26:32', 'throughput': '8175.14', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:51:31,145 >> {'loss': 0.3559, 'grad_norm': 17.137893676757812, 'learning_rate': 1.446892025428981e-07, 'epoch': 0.21318144833197722, 'num_input_tokens_seen': 6681526272, 'completed': '86.41% (3_186 / 3_687)', 'remaining time': '4:26:00', 'throughput': '8291.53', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:52:01,342 >> {'loss': 0.1706, 'grad_norm': 8.884380340576172, 'learning_rate': 1.4451397881279298e-07, 'epoch': 0.21345267154868458, 'num_input_tokens_seen': 6683623424, 'completed': '86.44% (3_187 / 3_687)', 'remaining time': '4:25:27', 'throughput': '8680.97', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:52:31,091 >> {'loss': 0.3805, 'grad_norm': 16.802011489868164, 'learning_rate': 1.4433908140240792e-07, 'epoch': 0.21372389476539191, 'num_input_tokens_seen': 6685720576, 'completed': '86.47% (3_188 / 3_687)', 'remaining time': '4:24:54', 'throughput': '8811.96', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:53:02,121 >> {'loss': 0.3829, 'grad_norm': 20.294599533081055, 'learning_rate': 1.4416451045249354e-07, 'epoch': 0.21399511798209928, 'num_input_tokens_seen': 6687817728, 'completed': '86.49% (3_189 / 3_687)', 'remaining time': '4:24:21', 'throughput': '8447.97', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 21:53:34,203 >> {'loss': 0.3615, 'grad_norm': 16.205463409423828, 'learning_rate': 1.43990266103538e-07, 'epoch': 0.2142663411988066, 'num_input_tokens_seen': 6689914880, 'completed': '86.52% (3_190 / 3_687)', 'remaining time': '4:23:50', 'throughput': '8171.23',
'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 21:54:04,489 >> {'loss': 0.3264, 'grad_norm': 17.31773567199707, 'learning_rate': 1.4381634849576644e-07, 'epoch': 0.21453756441551397, 'num_input_tokens_seen': 6692012032, 'completed': '86.55% (3_191 / 3_687)', 'remaining time': '4:23:17', 'throughput': '8655.45', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 21:54:35,020 >> {'loss': 0.3105, 'grad_norm': 13.5274658203125, 'learning_rate': 1.4364275776914112e-07, 'epoch': 0.21480878763222133, 'num_input_tokens_seen': 6694109184, 'completed': '86.57% (3_192 / 3_687)', 'remaining time': '4:22:44', 'throughput': '8586.17', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 21:55:07,619 >> {'loss': 0.3113, 'grad_norm': 12.372538566589355, 'learning_rate': 1.4346949406336136e-07, 'epoch': 0.21508001084892867, 'num_input_tokens_seen': 6696206336, 'completed': '86.60% (3_193 / 3_687)', 'remaining time': '4:22:13', 'throughput': '8041.62', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 21:55:37,670 >> {'loss': 0.2954, 'grad_norm': 12.951509475708008, 'learning_rate': 1.4329655751786318e-07, 'epoch': 0.21535123406563603, 'num_input_tokens_seen': 6698303488, 'completed': '86.63% (3_194 / 3_687)', 'remaining time': '4:21:40', 'throughput': '8723.08', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 21:56:08,015 >> {'loss': 0.4798, 'grad_norm': 16.507877349853516, 'learning_rate': 1.4312394827181925e-07, 'epoch': 0.21562245728234336, 'num_input_tokens_seen': 6700400640, 'completed': '86.66% (3_195 / 3_687)', 'remaining time': '4:21:07', 'throughput': '8639.03', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 21:56:41,187 >> {'loss': 0.3777, 'grad_norm': 15.880234718322754, 'learning_rate': 1.4295166646413898e-07, 'epoch': 0.21589368049905072, 'num_input_tokens_seen': 6702497792, 'completed': '86.68% (3_196 / 3_687)', 'remaining time': '4:20:36', 'throughput': '7902.60', 'gpu_mem_free': '5581MB'} 
[INFO|trainer.py:175] 2024-12-26 21:57:11,356 >> {'loss': 0.3242, 'grad_norm': 15.432538986206055, 'learning_rate': 1.4277971223346825e-07, 'epoch': 0.21616490371575806, 'num_input_tokens_seen': 6704594944, 'completed': '86.71% (3_197 / 3_687)', 'remaining time': '4:20:03', 'throughput': '8689.15', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 21:57:43,327 >> {'loss': 0.4032, 'grad_norm': 18.914257049560547, 'learning_rate': 1.4260808571818917e-07, 'epoch': 0.21643612693246542, 'num_input_tokens_seen': 6706692096, 'completed': '86.74% (3_198 / 3_687)', 'remaining time': '4:19:31', 'throughput': '8199.23', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 21:58:15,547 >> {'loss': 0.3723, 'grad_norm': 17.466224670410156, 'learning_rate': 1.4243678705642027e-07, 'epoch': 0.21670735014917278, 'num_input_tokens_seen': 6708789248, 'completed': '86.76% (3_199 / 3_687)', 'remaining time': '4:19:00', 'throughput': '8136.13', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 21:58:42,988 >> {'loss': 0.4615, 'grad_norm': 19.855295181274414, 'learning_rate': 1.4226581638601635e-07, 'epoch': 0.2169785733658801, 'num_input_tokens_seen': 6710886400, 'completed': '86.79% (3_200 / 3_687)', 'remaining time': '4:18:25', 'throughput': '9553.02', 'gpu_mem_free': '5581MB'} /scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:689: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
warnings.warn(
[INFO|trainer.py:3503] 2024-12-26 21:59:06,009 >> Saving model checkpoint to /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-3200
[INFO|configuration_utils.py:472] 2024-12-26 21:59:06,012 >> Configuration saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-3200/config.json
[INFO|configuration_utils.py:807] 2024-12-26 21:59:06,014 >> Configuration saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-3200/generation_config.json
[INFO|modeling_utils.py:2807] 2024-12-26 22:00:00,600 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 7 checkpoint shards. You can find where each parameters has been saved in the index located at /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-3200/model.safetensors.index.json.
[INFO|tokenization_utils_base.py:2684] 2024-12-26 22:00:00,604 >> tokenizer config file saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-3200/tokenizer_config.json
[INFO|tokenization_utils_base.py:2693] 2024-12-26 22:00:00,605 >> Special tokens file saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-3200/special_tokens_map.json
[WARNING|trainer.py:868] 2024-12-26 22:03:38,362 >> Save streaming dataset state: {'epoch': 0, 'sample_in_epoch': 6400, 'num_canonical_nodes': 1, 'shuffle_seed': 42, 'initial_physical_nodes': 1}
/scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/torch/utils/checkpoint.py:1399: FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead. with device_autocast_ctx, torch.cpu.amp.autocast(**cpu_autocast_kwargs), recompute_context: # type: ignore[attr-defined]
[INFO|trainer.py:175] 2024-12-26 22:04:11,264 >> {'loss': 0.5931, 'grad_norm': 20.14664077758789, 'learning_rate': 1.4209517384456786e-07, 'epoch': 0.21724979658258747, 'num_input_tokens_seen': 6712983552, 'completed': '86.82% (3_201 / 3_687)', 'remaining time': '4:20:53', 'throughput': '798.55', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 22:04:43,013 >> {'loss': 0.3161, 'grad_norm': 11.748592376708984, 'learning_rate': 1.4192485956940153e-07, 'epoch': 0.2175210197992948, 'num_input_tokens_seen': 6715080704, 'completed': '86.85% (3_202 / 3_687)', 'remaining time': '4:20:21', 'throughput': '8256.72', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 22:05:15,795 >> {'loss': 0.2212, 'grad_norm': 10.327018737792969, 'learning_rate': 1.4175487369757973e-07, 'epoch': 0.21779224301600217, 'num_input_tokens_seen': 6717177856, 'completed': '86.87% (3_203 / 3_687)', 'remaining time': '4:19:49', 'throughput': '7996.70', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 22:05:45,393 >> {'loss': 0.5347, 'grad_norm': 18.968761444091797, 'learning_rate': 1.4158521636590079e-07, 'epoch': 0.21806346623270953, 'num_input_tokens_seen': 6719275008, 'completed': '86.90% (3_204 /
3_687)', 'remaining time': '4:19:15', 'throughput': '8856.67', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 22:06:13,516 >> {'loss': 0.6773, 'grad_norm': 20.50962257385254, 'learning_rate': 1.4141588771089838e-07, 'epoch': 0.21833468944941686, 'num_input_tokens_seen': 6721372160, 'completed': '86.93% (3_205 / 3_687)', 'remaining time': '4:18:40', 'throughput': '9321.50', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 22:06:44,712 >> {'loss': 0.2704, 'grad_norm': 11.771401405334473, 'learning_rate': 1.412468878688418e-07, 'epoch': 0.21860591266612422, 'num_input_tokens_seen': 6723469312, 'completed': '86.95% (3_206 / 3_687)', 'remaining time': '4:18:08', 'throughput': '8402.90', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 22:07:14,889 >> {'loss': 0.2279, 'grad_norm': 11.137237548828125, 'learning_rate': 1.410782169757356e-07, 'epoch': 0.21887713588283156, 'num_input_tokens_seen': 6725566464, 'completed': '86.98% (3_207 / 3_687)', 'remaining time': '4:17:34', 'throughput': '8687.04', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 22:07:45,137 >> {'loss': 0.3856, 'grad_norm': 14.032715797424316, 'learning_rate': 1.4090987516731977e-07, 'epoch': 0.21914835909953892, 'num_input_tokens_seen': 6727663616, 'completed': '87.01% (3_208 / 3_687)', 'remaining time': '4:17:01', 'throughput': '8666.32', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 22:08:18,116 >> {'loss': 0.341, 'grad_norm': 13.834157943725586, 'learning_rate': 1.4074186257906934e-07, 'epoch': 0.21941958231624628, 'num_input_tokens_seen': 6729760768, 'completed': '87.04% (3_209 / 3_687)', 'remaining time': '4:16:29', 'throughput': '7948.84', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 22:08:46,630 >> {'loss': 0.4968, 'grad_norm': 16.78160285949707, 'learning_rate': 1.4057417934619453e-07, 'epoch': 0.21969080553295361, 'num_input_tokens_seen': 6731857920, 'completed': '87.06% (3_210 / 3_687)', 'remaining time': '4:15:55', 
'throughput': '9193.63', 'gpu_mem_free': '5581MB'}
/scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/transformers/utils/hub.py:127: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
warnings.warn(
/scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/transformers/utils/import_utils.py:575: FutureWarning: `is_torch_tpu_available` is deprecated and will be removed in 4.41.0. Please use the `is_torch_xla_available` instead.
warnings.warn(
12/26/2024 22:45:58 - WARNING - __main__ - Process rank: 7, device: cuda:7, n_gpu: 1, distributed training: True, fp16 training: False, bf16 training: True
12/26/2024 22:45:58 - WARNING - __main__ - Process rank: 5, device: cuda:5, n_gpu: 1, distributed training: True, fp16 training: False, bf16 training: True
12/26/2024 22:45:58 - WARNING - __main__ - Process rank: 3, device: cuda:3, n_gpu: 1, distributed training: True, fp16 training: False, bf16 training: True
12/26/2024 22:46:00 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: True, fp16 training: False, bf16 training: True
12/26/2024 22:46:00 - INFO - __main__ - Additional arguments ScriptArguments(model_family='llama', model_name_or_path='/scratch3/workspace/ctpham_umass_edu-ft/models/Llama-3-8B-ProLong-512k-Base/', config_overrides=None, config_overrides_json='', config_name='/scratch3/workspace/ctpham_umass_edu-ft/models/Llama-3-8B-ProLong-512k-Base/', tokenizer_name='/scratch3/workspace/ctpham_umass_edu-ft/models/Llama-3-8B-ProLong-512k-Base/', cache_dir=None, use_fast_tokenizer=False, model_revision='main', use_auth_token=False, tokenized_mds_train=['/work/pi_miyyer_umass_edu/ctpham/BookClaim-dev/data/ft/bookclaim_balanced_pack_complete'], tokenized_mds_validation=[], tokenized_mds_test=[], token_scaled_loss=True)
12/26/2024 22:46:00 - INFO - __main__ - Checkpoint detected, resuming training at
/scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-3200. To avoid this behavior, change the `--output_dir` or add `--overwrite_output_dir` to train from scratch.
[INFO|tokenization_utils_base.py:2267] 2024-12-26 22:46:00,024 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2267] 2024-12-26 22:46:00,024 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2267] 2024-12-26 22:46:00,024 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2267] 2024-12-26 22:46:00,024 >> loading file tokenizer_config.json
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|configuration_utils.py:731] 2024-12-26 22:46:00,354 >> loading configuration file /scratch3/workspace/ctpham_umass_edu-ft/models/Llama-3-8B-ProLong-512k-Base/config.json
[INFO|configuration_utils.py:800] 2024-12-26 22:46:00,355 >> Model config LlamaConfig {
  "_name_or_path": "/scratch3/workspace/ctpham_umass_edu-ft/models/Llama-3-8B-ProLong-512k-Base/",
  "architectures": [ "LlamaForCausalLM" ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128009,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 524288,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pad_token_id": 0,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 128000000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "float32",
  "transformers_version": "4.44.2",
  "use_cache": true,
  "vocab_size": 128256
}
12/26/2024 22:46:00 - INFO - __main__ - Loaded tokenizer - CPU Memory usage: 630.45 MB RSS
12/26/2024 22:46:00 - INFO - __main__ - GPU 0: 4/81920 MB
12/26/2024 22:46:00 - INFO - __main__ - GPU 1: 423/81920 MB
12/26/2024 22:46:00 - INFO - __main__ - GPU 2: 423/81920 MB
12/26/2024 22:46:00 - INFO - __main__ - GPU 3: 423/81920 MB
12/26/2024 22:46:00 - INFO - __main__ - GPU 4: 423/81920 MB
12/26/2024 22:46:00 - INFO - __main__ - GPU 5: 423/81920 MB
12/26/2024 22:46:00 - INFO - __main__ - GPU 6: 423/81920 MB
12/26/2024 22:46:00 - INFO - __main__ - GPU 7: 423/81920 MB
[INFO|modeling_utils.py:3675] 2024-12-26 22:46:00,440 >> loading weights file /scratch3/workspace/ctpham_umass_edu-ft/models/Llama-3-8B-ProLong-512k-Base/model.safetensors.index.json
[INFO|configuration_utils.py:1038] 2024-12-26 22:46:00,441 >> Generate config GenerationConfig {
  "bos_token_id": 128000,
  "eos_token_id": 128009,
  "pad_token_id": 0
}
All model checkpoint weights were used when initializing LlamaForCausalLM.
[INFO|modeling_utils.py:4515] 2024-12-26 22:46:01,329 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at /scratch3/workspace/ctpham_umass_edu-ft/models/Llama-3-8B-ProLong-512k-Base/. If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[INFO|modeling_utils.py:4003] 2024-12-26 22:46:01,332 >> Generation config file not found, using a generation config created from the model config.
12/26/2024 22:46:01 - INFO - __main__ - Model: LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(128256, 4096, padding_idx=0)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
          (distributed_attn_func): DistributedAttention()
          (distributed_varlen_attn_func): DistributedAttention()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm()
  )
  (lm_head): Linear(in_features=4096, out_features=128256, bias=False)
)
12/26/2024 22:46:01 - INFO - __main__ - Loaded model - CPU Memory usage: 655.82 MB RSS
12/26/2024 22:46:01 - INFO - __main__ - GPU 0: 4/81920 MB
12/26/2024 22:46:01 - INFO - __main__ - GPU 1: 423/81920 MB
12/26/2024 22:46:01 - INFO - __main__ - GPU 2: 423/81920 MB
12/26/2024 22:46:01 - INFO - __main__ - GPU 3: 425/81920 MB
12/26/2024 22:46:01 - INFO - __main__ - GPU 4: 423/81920 MB
12/26/2024 22:46:01 - INFO - __main__ - GPU 5: 425/81920 MB
12/26/2024 22:46:01 - INFO - __main__ - GPU 6: 423/81920 MB
12/26/2024 22:46:01 - INFO - __main__ - GPU 7: 425/81920 MB
12/26/2024 22:46:05 - INFO - training.dataset - Loading datasets for training
12/26/2024 22:46:05 - WARNING - streaming.base.dataset - Because `predownload` was not specified, it will default to 8*batch_size if batch_size is not None, otherwise 64. Prior to Streaming v0.7.0, `predownload` defaulted to max(batch_size, 256 * batch_size // num_canonical_nodes).
12/26/2024 22:46:05 - INFO - __main__ - Loaded training dataset
[INFO|trainer.py:648] 2024-12-26 22:46:06,218 >> Using auto half precision backend
[INFO|trainer.py:262] 2024-12-26 22:46:06,218 >> Using world as sequence parallel group
12/26/2024 22:46:06 - INFO - __main__ - Trainer created
12/26/2024 22:46:06 - INFO - __main__ - Training/evaluation parameters TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.95, adam_epsilon=1e-08, auto_find_batch_size=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, cuda_empty_cache=True, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=1, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=False, ddp_timeout=1800, debug=[], deepspeed=None, disable_tqdm=True, dispatch_batches=None, do_eval=False, do_predict=False, do_train=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False,
fp16_opt_level=O1, fsdp=[, ], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=2, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=None, hub_private_repo=False, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=1e-06, length_column_name=length, load_best_model_at_end=False, local_rank=0, log_level=info, log_level_replica=warning, log_on_each_node=True, logging_dir=/scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/runs/Dec26_22-45-54_gpu021, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lr_scheduler_kwargs={}, lr_scheduler_type=cosine, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, min_lr_ratio=0.1, mp_parameters=, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=1.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=8, per_device_train_batch_size=1, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=200, save_strategy=steps, 
save_total_limit=None, seed=42, seq_parallel_size=8, skip_memory_metrics=True, split_batches=None, streaming_dataset=True, tf32=None, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_mps_device=False, warmup_ratio=0.05, warmup_steps=0, weight_decay=0.1, )
12/26/2024 22:46:06 - INFO - __main__ - *** Train ***
[WARNING|trainer.py:795] 2024-12-26 22:46:06,365 >> Use streaming dataloader for train
[INFO|trainer.py:2526] 2024-12-26 22:46:19,219 >> Loading model from /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-3200.
/scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:689: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict . Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
  warnings.warn(
/scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/accelerate/utils/fsdp_utils.py:126: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  state_dict = torch.load(input_model_file)
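The `torch.load` FutureWarning above recommends opting in to `weights_only=True`. A minimal sketch of the safer call (the file name and contents here are illustrative, not from this run):

```python
import os
import tempfile

import torch

# Illustrative checkpoint: a plain dict of tensors, as in a model state_dict.
path = os.path.join(tempfile.mkdtemp(), "state.pt")
torch.save({"w": torch.ones(2, 2)}, path)

# weights_only=True restricts unpickling to tensors and other allowlisted
# types, so a malicious pickle payload cannot execute arbitrary code.
state_dict = torch.load(path, weights_only=True)
print(sorted(state_dict))  # ['w']
```

For checkpoints that contain non-tensor objects, `torch.serialization.add_safe_globals` can allowlist specific classes instead of falling back to `weights_only=False`.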
[WARNING|trainer.py:901] 2024-12-26 22:47:04,757 >> Load optimizer state from /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-3200
/scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/accelerate/utils/fsdp_utils.py:209: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  optim_state = torch.load(input_optimizer_file)
[WARNING|trainer.py:921] 2024-12-26 22:48:30,482 >> Load scheduler state from /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-3200
[WARNING|trainer.py:942] 2024-12-26 22:48:30,484 >> Resume streaming dataset state from /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-3200: {'epoch': 0, 'sample_in_epoch': 6400, 'num_canonical_nodes': 1, 'shuffle_seed': 42, 'initial_physical_nodes': 1}
/scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/transformers/trainer.py:2833: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  checkpoint_rng_state = torch.load(rng_file)
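The resumed stream's `sample_in_epoch: 6400` is consistent with the other numbers in this log, under the assumption that the 8 sequence-parallel ranks (`seq_parallel_size=8`) replicate each sample. A quick sanity check (variable names are mine, not the trainer's):

```python
total_train_batch = 16   # per-device 1 x 8 GPUs x grad-accum 2 (from the log)
seq_parallel_size = 8    # ranks in a sequence-parallel group share each sample
global_step = 3200       # resuming from checkpoint-3200

# Only total_train_batch / seq_parallel_size samples per step are unique.
unique_samples_per_step = total_train_batch // seq_parallel_size  # 2
print(global_step * unique_samples_per_step)  # 6400 == 'sample_in_epoch'
```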
12/26/2024 22:48:30 - WARNING - streaming.base.dataset - Because `shuffle_block_size` was not specified, it will default to max(4_000_000 // num_canonical_nodes, 1 << 18) if num_canonical_nodes is not None, otherwise 262144. Prior to Streaming v0.7.0, `shuffle_block_size` defaulted to 262144.
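Plugging this run's resumed `num_canonical_nodes = 1` into the default quoted by the streaming warning gives the effective shuffle block size (a sketch of the arithmetic, not code from the `streaming` library):

```python
num_canonical_nodes = 1  # from the resumed dataset state in this log
shuffle_block_size = max(4_000_000 // num_canonical_nodes, 1 << 18)
print(shuffle_block_size)  # 4000000, vs. the pre-v0.7.0 default of 262144
```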
[INFO|trainer.py:2134] 2024-12-26 22:48:35,230 >> ***** Running training *****
[INFO|trainer.py:2135] 2024-12-26 22:48:35,230 >> Num examples = 7,374
[INFO|trainer.py:2136] 2024-12-26 22:48:35,230 >> Num Epochs = 1
[INFO|trainer.py:2137] 2024-12-26 22:48:35,230 >> Instantaneous batch size per device = 1
[INFO|trainer.py:2140] 2024-12-26 22:48:35,230 >> Total train batch size (w. parallel, distributed & accumulation) = 16
[INFO|trainer.py:2141] 2024-12-26 22:48:35,230 >> Gradient Accumulation steps = 2
[INFO|trainer.py:2142] 2024-12-26 22:48:35,230 >> Total optimization steps = 3,687
[INFO|trainer.py:2143] 2024-12-26 22:48:35,232 >> Number of trainable parameters = 1,003,782,656
[INFO|trainer.py:2165] 2024-12-26 22:48:35,239 >> Continuing training from checkpoint, will skip to saved global_step
[INFO|trainer.py:2166] 2024-12-26 22:48:35,239 >> Continuing training from epoch 0
[INFO|trainer.py:2167] 2024-12-26 22:48:35,239 >> Continuing training from global step 3200
[INFO|integration_utils.py:807] 2024-12-26 22:48:35,240 >> Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
wandb: Currently logged in as: chtmp223. Use `wandb login --relogin` to force relogin
wandb: wandb version 0.19.1 is available! To upgrade, please run:
wandb: $ pip install wandb --upgrade
wandb: Tracking run with wandb version 0.17.3
wandb: Run data is saved locally in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/wandb/run-20241226_224836-fta6xses
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_
wandb: ⭐️ View project at https://wandb.ai/chtmp223/prolong
wandb: 🚀 View run at https://wandb.ai/chtmp223/prolong/runs/fta6xses
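The banner's step count follows from the example count and the effective batch, again assuming the 8-way sequence-parallel replication divides the total batch of 16 (names are mine, not the trainer's):

```python
import math

num_examples = 7374        # 'Num examples' from the banner
total_train_batch = 16     # 'Total train batch size' from the banner
seq_parallel_size = 8      # ranks sharing each sample

samples_per_optim_step = total_train_batch // seq_parallel_size  # 2 unique
print(math.ceil(num_examples / samples_per_optim_step))  # 3687 total steps
```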
This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  checkpoint_rng_state = torch.load(rng_file)
12/26/2024 22:48:44 - WARNING - streaming.base.dataset - Because `shuffle_block_size` was not specified, it will default to max(4_000_000 // num_canonical_nodes, 1 << 18) if num_canonical_nodes is not None, otherwise 262144. Prior to Streaming v0.7.0, `shuffle_block_size` defaulted to 262144.
12/26/2024 22:48:44 - WARNING - streaming.base.dataset - The `replication` arg has been set to 8 and training is resuming from sample 6400. Make sure you are accounting for sample replication when using StreamingDataset's `state_dict` method for deterministic resumption. Otherwise, you will resume training from the wrong sample.
/scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/torch/utils/checkpoint.py:1399: FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead.
  with device_autocast_ctx, torch.cpu.amp.autocast(**cpu_autocast_kwargs), recompute_context:  # type: ignore[attr-defined]
[INFO|trainer.py:175] 2024-12-26 22:49:29,622 >> {'loss': 0.4811, 'grad_norm': 16.647418975830078, 'learning_rate': 1.4209517384456786e-07, 'epoch': 0.00027122321670735016, 'num_input_tokens_seen': 6712983552, 'completed': '86.82% (3_201 / 3_687)', 'remaining time': '5:59:50', 'throughput': '2950.43', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 22:50:00,672 >> {'loss': 0.2869, 'grad_norm': 10.773716926574707, 'learning_rate': 1.4192485956940153e-07, 'epoch': 0.0005424464334147003, 'num_input_tokens_seen': 6715080704, 'completed': '86.85% (3_202 / 3_687)', 'remaining time': '5:05:02', 'throughput': '8442.54', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 22:50:33,348 >> {'loss': 0.2319, 'grad_norm': 10.768502235412598, 'learning_rate': 1.4175487369757973e-07, 'epoch': 0.0008136696501220504, 'num_input_tokens_seen': 6717177856, 'completed': '86.87% (3_203 / 3_687)', 'remaining time': '4:50:48', 'throughput': '8022.68', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 22:51:02,632 >> {'loss': 0.5739, 'grad_norm': 20.341934204101562, 'learning_rate': 1.4158521636590079e-07, 'epoch': 0.0010848928668294006, 'num_input_tokens_seen': 6719275008, 'completed': '86.90% (3_204 / 3_687)', 'remaining time': '4:36:35', 'throughput': '8951.76', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 22:51:30,494 >> {'loss': 0.7022, 'grad_norm': 21.294525146484375, 'learning_rate': 1.4141588771089838e-07, 'epoch': 0.0013561160835367507, 'num_input_tokens_seen':
6721372160, 'completed': '86.93% (3_205 / 3_687)', 'remaining time': '4:25:34', 'throughput': '9408.45', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 22:52:01,341 >> {'loss': 0.2761, 'grad_norm': 12.000500679016113, 'learning_rate': 1.412468878688418e-07, 'epoch': 0.0016273393002441008, 'num_input_tokens_seen': 6723469312, 'completed': '86.95% (3_206 / 3_687)', 'remaining time': '4:22:04', 'throughput': '8498.40', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 22:52:31,213 >> {'loss': 0.2413, 'grad_norm': 11.831374168395996, 'learning_rate': 1.410782169757356e-07, 'epoch': 0.001898562516951451, 'num_input_tokens_seen': 6725566464, 'completed': '86.98% (3_207 / 3_687)', 'remaining time': '4:18:18', 'throughput': '8775.65', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 22:53:01,257 >> {'loss': 0.4252, 'grad_norm': 15.416332244873047, 'learning_rate': 1.4090987516731977e-07, 'epoch': 0.0021697857336588013, 'num_input_tokens_seen': 6727663616, 'completed': '87.01% (3_208 / 3_687)', 'remaining time': '4:15:31', 'throughput': '8725.14', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 22:53:34,118 >> {'loss': 0.3768, 'grad_norm': 15.296663284301758, 'learning_rate': 1.4074186257906934e-07, 'epoch': 0.0024410089503661514, 'num_input_tokens_seen': 6729760768, 'completed': '87.04% (3_209 / 3_687)', 'remaining time': '4:15:44', 'throughput': '7977.37', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 22:54:02,485 >> {'loss': 0.5308, 'grad_norm': 17.990812301635742, 'learning_rate': 1.4057417934619453e-07, 'epoch': 0.0027122321670735015, 'num_input_tokens_seen': 6731857920, 'completed': '87.06% (3_210 / 3_687)', 'remaining time': '4:12:14', 'throughput': '9241.13', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 22:54:32,121 >> {'loss': 0.498, 'grad_norm': 20.814661026000977, 'learning_rate': 1.404068256036403e-07, 'epoch': 0.0029834553837808516, 'num_input_tokens_seen': 6733955072, 'completed': '87.09% (3_211 / 3_687)', 'remaining time': '4:10:12', 'throughput': '8845.50', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 22:55:03,519 >> {'loss': 0.363, 'grad_norm': 17.336063385009766, 'learning_rate': 1.4023980148608667e-07, 'epoch': 0.0032546786004882017, 'num_input_tokens_seen': 6736052224, 'completed': '87.12% (3_212 / 3_687)', 'remaining time': '4:09:35', 'throughput': '8349.15', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 22:55:35,176 >> {'loss': 0.1552, 'grad_norm': 9.499157905578613, 'learning_rate': 1.4007310712794827e-07, 'epoch': 0.003525901817195552, 'num_input_tokens_seen': 6738149376, 'completed': '87.14% (3_213 / 3_687)', 'remaining time': '4:09:08', 'throughput': '8280.79', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 22:56:05,122 >> {'loss': 0.5238, 'grad_norm': 22.045595169067383, 'learning_rate': 1.3990674266337442e-07, 'epoch': 0.003797125033902902, 'num_input_tokens_seen': 6740246528, 'completed': '87.17% (3_214 / 3_687)', 'remaining time': '4:07:43', 'throughput': '8753.64', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 22:56:35,183 >> {'loss': 0.311, 'grad_norm': 12.480164527893066, 'learning_rate': 1.3974070822624883e-07, 'epoch': 0.0040683482506102524, 'num_input_tokens_seen': 6742343680, 'completed': '87.20% (3_215 / 3_687)', 'remaining time': '4:06:28', 'throughput': '8720.47', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 22:57:02,588 >> {'loss': 0.508, 'grad_norm': 17.63022232055664, 'learning_rate': 1.3957500395018977e-07, 'epoch': 0.0043395714673176026, 'num_input_tokens_seen': 6744440832, 'completed': '87.23% (3_216 / 3_687)', 'remaining time': '4:04:01', 'throughput': '9565.68', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 22:57:33,988 >> {'loss': 0.5619, 'grad_norm': 19.386255264282227, 'learning_rate': 1.394096299685497e-07, 'epoch': 0.004610794684024953, 'num_input_tokens_seen': 6746537984, 'completed': '87.25% (3_217 / 3_687)', 'remaining time': '4:03:39', 'throughput': '8348.53', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 22:58:01,686 >> {'loss': 0.371, 'grad_norm': 13.394591331481934, 'learning_rate': 1.3924458641441532e-07, 'epoch': 0.004882017900732303, 'num_input_tokens_seen': 6748635136, 'completed': '87.28% (3_218 / 3_687)', 'remaining time': '4:01:39', 'throughput': '9464.24', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 22:58:29,502 >> {'loss': 0.5936, 'grad_norm': 18.7851505279541, 'learning_rate': 1.3907987342060725e-07, 'epoch': 0.005153241117439653, 'num_input_tokens_seen': 6750732288, 'completed': '87.31% (3_219 / 3_687)', 'remaining time': '3:59:52', 'throughput': '9424.37', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 22:59:00,471 >> {'loss': 0.5704, 'grad_norm': 20.079437255859375, 'learning_rate': 1.389154911196805e-07, 'epoch': 0.005424464334147003, 'num_input_tokens_seen': 6752829440, 'completed': '87.33% (3_220 / 3_687)', 'remaining time': '3:59:26', 'throughput': '8464.78', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 22:59:34,469 >> {'loss': 0.2784, 'grad_norm': 11.2096586227417, 'learning_rate': 1.3875143964392355e-07, 'epoch': 0.005695687550854353, 'num_input_tokens_seen': 6754926592, 'completed': '87.36% (3_221 / 3_687)', 'remaining time': '4:00:07', 'throughput': '7710.38', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:00:05,679 >> {'loss': 0.8026, 'grad_norm': 23.216882705688477, 'learning_rate': 1.3858771912535877e-07, 'epoch': 0.005966910767561703, 'num_input_tokens_seen': 6757023744, 'completed': '87.39% (3_222 / 3_687)', 'remaining time': '3:59:42', 'throughput': '8399.47', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:00:38,265 >> {'loss': 0.1695, 'grad_norm': 10.511091232299805, 'learning_rate': 1.3842432969574238e-07, 'epoch': 0.006238133984269053, 'num_input_tokens_seen': 6759120896, 'completed': '87.42% (3_223 / 3_687)', 'remaining time': '3:59:45', 'throughput': '8044.70', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:01:08,722 >> {'loss': 0.3329, 'grad_norm': 14.655221939086914, 'learning_rate': 1.382612714865637e-07, 'epoch': 0.006509357200976403, 'num_input_tokens_seen': 6761218048, 'completed': '87.44% (3_224 / 3_687)', 'remaining time': '3:59:03', 'throughput': '8607.07', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:01:37,601 >> {'loss': 0.5104, 'grad_norm': 18.49831771850586, 'learning_rate': 1.3809854462904596e-07, 'epoch': 0.0067805804176837535, 'num_input_tokens_seen': 6763315200, 'completed': '87.47% (3_225 / 3_687)', 'remaining time': '3:57:54', 'throughput': '9077.28', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:02:10,234 >> {'loss': 0.1723, 'grad_norm': 9.733176231384277, 'learning_rate': 1.3793614925414542e-07, 'epoch': 0.007051803634391104, 'num_input_tokens_seen': 6765412352, 'completed': '87.50% (3_226 / 3_687)', 'remaining time': '3:57:53', 'throughput': '8033.07', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:02:40,106 >> {'loss': 0.3004, 'grad_norm': 16.505550384521484, 'learning_rate': 1.3777408549255182e-07, 'epoch': 0.007323026851098454, 'num_input_tokens_seen': 6767509504, 'completed': '87.52% (3_227 / 3_687)', 'remaining time': '3:57:04', 'throughput': '8775.47', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:03:12,476 >> {'loss': 0.1774, 'grad_norm': 12.750898361206055, 'learning_rate': 1.3761235347468784e-07, 'epoch': 0.007594250067805804, 'num_input_tokens_seen': 6769606656, 'completed': '87.55% (3_228 / 3_687)', 'remaining time': '3:56:57', 'throughput': '8098.56', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:03:42,131 >> {'loss': 0.2317, 'grad_norm': 11.027116775512695, 'learning_rate': 1.3745095333070926e-07, 'epoch': 0.007865473284513154, 'num_input_tokens_seen': 6771703808, 'completed': '87.58% (3_229 / 3_687)', 'remaining time': '3:56:05', 'throughput': '8839.70', 'gpu_mem_free': '5581MB'}
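Each `[INFO|trainer.py:175]` record above prints its metrics as a Python dict literal after the `>>` marker. A minimal stdlib sketch for pulling fields out of one record (an assumption for illustration: the payload always parses as a literal dict, which these lines suggest but the trainer does not guarantee):

```python
import ast
import re

# One record copied from the log above (shortened for readability).
line = ("[INFO|trainer.py:175] 2024-12-26 23:03:42,131 >> "
        "{'loss': 0.2317, 'grad_norm': 11.027116775512695, "
        "'completed': '87.58% (3_229 / 3_687)', 'throughput': '8839.70'}")

def parse_record(line: str) -> dict:
    """Extract the metrics dict from a trainer progress line."""
    payload = re.search(r">> (\{.*\})", line).group(1)
    # The trainer prints a plain dict repr, so literal_eval is sufficient.
    return ast.literal_eval(payload)

record = parse_record(line)
print(record["loss"], record["throughput"])  # → 0.2317 8839.70
```

Note that `throughput` and `completed` are logged as strings, so they need a further cast before plotting.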
[INFO|trainer.py:175] 2024-12-26 23:04:13,288 >> {'loss': 0.9626, 'grad_norm': 26.27062225341797, 'learning_rate': 1.3728988519050476e-07, 'epoch': 0.008136696501220505, 'num_input_tokens_seen': 6773800960, 'completed': '87.61% (3_230 / 3_687)', 'remaining time': '3:55:37', 'throughput': '8413.50', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:04:43,869 >> {'loss': 0.4503, 'grad_norm': 15.877495765686035, 'learning_rate': 1.3712914918369595e-07, 'epoch': 0.008407919717927854, 'num_input_tokens_seen': 6775898112, 'completed': '87.63% (3_231 / 3_687)', 'remaining time': '3:55:01', 'throughput': '8572.27', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:05:14,486 >> {'loss': 0.4794, 'grad_norm': 21.709930419921875, 'learning_rate': 1.369687454396369e-07, 'epoch': 0.008679142934635205, 'num_input_tokens_seen': 6777995264, 'completed': '87.66% (3_232 / 3_687)', 'remaining time': '3:54:26', 'throughput': '8561.88', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:05:45,739 >> {'loss': 0.2759, 'grad_norm': 12.499728202819824, 'learning_rate': 1.3680867408741457e-07, 'epoch': 0.008950366151342554, 'num_input_tokens_seen': 6780092416, 'completed': '87.69% (3_233 / 3_687)', 'remaining time': '3:54:00', 'throughput': '8387.78', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:06:16,170 >> {'loss': 0.5445, 'grad_norm': 22.60646629333496, 'learning_rate': 1.366489352558483e-07, 'epoch': 0.009221589368049905, 'num_input_tokens_seen': 6782189568, 'completed': '87.71% (3_234 / 3_687)', 'remaining time': '3:53:22', 'throughput': '8614.45', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:06:47,190 >> {'loss': 0.4128, 'grad_norm': 17.723960876464844, 'learning_rate': 1.3648952907348957e-07, 'epoch': 0.009492812584757255, 'num_input_tokens_seen': 6784286720, 'completed': '87.74% (3_235 / 3_687)', 'remaining time': '3:52:53', 'throughput': '8450.82', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:07:17,213 >> {'loss': 0.4623, 'grad_norm': 16.3974552154541, 'learning_rate': 1.3633045566862268e-07, 'epoch': 0.009764035801464606, 'num_input_tokens_seen': 6786383872, 'completed': '87.77% (3_236 / 3_687)', 'remaining time': '3:52:11', 'throughput': '8731.53', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:07:49,215 >> {'loss': 0.3435, 'grad_norm': 13.922554016113281, 'learning_rate': 1.3617171516926382e-07, 'epoch': 0.010035259018171955, 'num_input_tokens_seen': 6788481024, 'completed': '87.79% (3_237 / 3_687)', 'remaining time': '3:51:53', 'throughput': '8191.38', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:08:19,959 >> {'loss': 0.2045, 'grad_norm': 10.876972198486328, 'learning_rate': 1.3601330770316122e-07, 'epoch': 0.010306482234879306, 'num_input_tokens_seen': 6790578176, 'completed': '87.82% (3_238 / 3_687)', 'remaining time': '3:51:20', 'throughput': '8526.63', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:08:51,516 >> {'loss': 0.3805, 'grad_norm': 17.26404571533203, 'learning_rate': 1.3585523339779523e-07, 'epoch': 0.010577705451586655, 'num_input_tokens_seen': 6792675328, 'completed': '87.85% (3_239 / 3_687)', 'remaining time': '3:50:57', 'throughput': '8307.15', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:09:18,847 >> {'loss': 0.8073, 'grad_norm': 24.898168563842773, 'learning_rate': 1.356974923803781e-07, 'epoch': 0.010848928668294006, 'num_input_tokens_seen': 6794772480, 'completed': '87.88% (3_240 / 3_687)', 'remaining time': '3:49:46', 'throughput': '9591.36', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:09:49,228 >> {'loss': 0.4327, 'grad_norm': 16.236896514892578, 'learning_rate': 1.3554008477785367e-07, 'epoch': 0.011120151885001357, 'num_input_tokens_seen': 6796869632, 'completed': '87.90% (3_241 / 3_687)', 'remaining time': '3:49:10', 'throughput': '8628.57', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:10:20,604 >> {'loss': 0.3856, 'grad_norm': 18.53495216369629, 'learning_rate': 1.3538301071689756e-07, 'epoch': 0.011391375101708706, 'num_input_tokens_seen': 6798966784, 'completed': '87.93% (3_242 / 3_687)', 'remaining time': '3:48:45', 'throughput': '8354.97', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:10:53,185 >> {'loss': 0.4498, 'grad_norm': 18.698938369750977, 'learning_rate': 1.3522627032391715e-07, 'epoch': 0.011662598318416057, 'num_input_tokens_seen': 6801063936, 'completed': '87.96% (3_243 / 3_687)', 'remaining time': '3:48:32', 'throughput': '8045.75', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:11:23,574 >> {'loss': 0.3556, 'grad_norm': 14.774138450622559, 'learning_rate': 1.3506986372505098e-07, 'epoch': 0.011933821535123406, 'num_input_tokens_seen': 6803161088, 'completed': '87.98% (3_244 / 3_687)', 'remaining time': '3:47:56', 'throughput': '8626.48', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:11:52,119 >> {'loss': 0.5173, 'grad_norm': 22.23430633544922, 'learning_rate': 1.3491379104616938e-07, 'epoch': 0.012205044751830757, 'num_input_tokens_seen': 6805258240, 'completed': '88.01% (3_245 / 3_687)', 'remaining time': '3:47:02', 'throughput': '9183.38', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:12:24,604 >> {'loss': 0.2426, 'grad_norm': 12.263484954833984, 'learning_rate': 1.3475805241287345e-07, 'epoch': 0.012476267968538107, 'num_input_tokens_seen': 6807355392, 'completed': '88.04% (3_246 / 3_687)', 'remaining time': '3:46:47', 'throughput': '8069.73', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:12:55,140 >> {'loss': 0.6469, 'grad_norm': 22.76350212097168, 'learning_rate': 1.3460264795049577e-07, 'epoch': 0.012747491185245458, 'num_input_tokens_seen': 6809452544, 'completed': '88.07% (3_247 / 3_687)', 'remaining time': '3:46:13', 'throughput': '8584.84', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:13:26,573 >> {'loss': 0.3359, 'grad_norm': 16.658710479736328, 'learning_rate': 1.344475777841e-07, 'epoch': 0.013018714401952807, 'num_input_tokens_seen': 6811549696, 'completed': '88.09% (3_248 / 3_687)', 'remaining time': '3:45:48', 'throughput': '8339.60', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:13:57,189 >> {'loss': 0.2698, 'grad_norm': 12.79448127746582, 'learning_rate': 1.3429284203848074e-07, 'epoch': 0.013289937618660158, 'num_input_tokens_seen': 6813646848, 'completed': '88.12% (3_249 / 3_687)', 'remaining time': '3:45:15', 'throughput': '8562.41', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:14:23,179 >> {'loss': 1.1975, 'grad_norm': 31.765995025634766, 'learning_rate': 1.3413844083816334e-07, 'epoch': 0.013561160835367507, 'num_input_tokens_seen': 6815744000, 'completed': '88.15% (3_250 / 3_687)', 'remaining time': '3:44:01', 'throughput': '10086.35', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:14:52,957 >> {'loss': 0.3363, 'grad_norm': 17.021753311157227, 'learning_rate': 1.3398437430740403e-07, 'epoch': 0.013832384052074858, 'num_input_tokens_seen': 6817841152, 'completed': '88.17% (3_251 / 3_687)', 'remaining time': '3:43:22', 'throughput': '8803.26', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:15:22,292 >> {'loss': 0.3068, 'grad_norm': 14.463712692260742, 'learning_rate': 1.3383064257018978e-07, 'epoch': 0.014103607268782207, 'num_input_tokens_seen': 6819938304, 'completed': '88.20% (3_252 / 3_687)', 'remaining time': '3:42:40', 'throughput': '8936.36', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:15:53,474 >> {'loss': 0.2803, 'grad_norm': 16.731229782104492, 'learning_rate': 1.3367724575023786e-07, 'epoch': 0.014374830485489558, 'num_input_tokens_seen': 6822035456, 'completed': '88.23% (3_253 / 3_687)', 'remaining time': '3:42:13', 'throughput': '8406.82', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:16:26,929 >> {'loss': 0.6333, 'grad_norm': 22.050737380981445, 'learning_rate': 1.3352418397099638e-07,
'epoch': 0.014646053702196907, 'num_input_tokens_seen': 6824132608, 'completed': '88.26% (3_254 / 3_687)', 'remaining time': '3:42:04', 'throughput': '7835.67', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:16:58,090 >> {'loss': 0.4512, 'grad_norm': 25.376235961914062, 'learning_rate': 1.333714573556435e-07, 'epoch': 0.014917276918904258, 'num_input_tokens_seen': 6826229760, 'completed': '88.28% (3_255 / 3_687)', 'remaining time': '3:41:36', 'throughput': '8412.54', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:17:28,800 >> {'loss': 0.4892, 'grad_norm': 16.40929412841797, 'learning_rate': 1.332190660270879e-07, 'epoch': 0.015188500135611608, 'num_input_tokens_seen': 6828326912, 'completed': '88.31% (3_256 / 3_687)', 'remaining time': '3:41:05', 'throughput': '8536.29', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:17:59,705 >> {'loss': 0.4432, 'grad_norm': 17.429452896118164, 'learning_rate': 1.3306701010796832e-07, 'epoch': 0.015459723352318959, 'num_input_tokens_seen': 6830424064, 'completed': '88.34% (3_257 / 3_687)', 'remaining time': '3:40:35', 'throughput': '8482.06', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:18:33,118 >> {'loss': 0.5687, 'grad_norm': 18.253334045410156, 'learning_rate': 1.3291528972065347e-07, 'epoch': 0.015730946569026308, 'num_input_tokens_seen': 6832521216, 'completed': '88.36% (3_258 / 3_687)', 'remaining time': '3:40:24', 'throughput': '7845.71', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:19:02,517 >> {'loss': 0.3613, 'grad_norm': 15.493392944335938, 'learning_rate': 1.327639049872422e-07, 'epoch': 0.01600216978573366, 'num_input_tokens_seen': 6834618368, 'completed': '88.39% (3_259 / 3_687)', 'remaining time': '3:39:43', 'throughput': '8916.78', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:19:31,151 >> {'loss': 0.3888, 'grad_norm': 17.719694137573242, 'learning_rate': 1.326128560295631e-07, 'epoch': 0.01627339300244101, 'num_input_tokens_seen': 6836715520, 'completed': '88.42% (3_260 / 3_687)', 'remaining time': '3:38:57', 'throughput': '9154.90', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:20:00,452 >> {'loss': 0.4548, 'grad_norm': 17.160886764526367, 'learning_rate': 1.324621429691748e-07, 'epoch': 0.01654461621914836, 'num_input_tokens_seen': 6838812672, 'completed': '88.45% (3_261 / 3_687)', 'remaining time': '3:38:16', 'throughput': '8946.57', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:20:32,069 >> {'loss': 0.4992, 'grad_norm': 19.88508415222168, 'learning_rate': 1.3231176592736528e-07, 'epoch': 0.016815839435855708, 'num_input_tokens_seen': 6840909824, 'completed': '88.47% (3_262 / 3_687)', 'remaining time': '3:37:51', 'throughput': '8291.26', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:21:00,047 >> {'loss': 0.5841, 'grad_norm': 20.55489730834961, 'learning_rate': 1.3216172502515248e-07, 'epoch': 0.01708706265256306, 'num_input_tokens_seen': 6843006976, 'completed': '88.50% (3_263 / 3_687)', 'remaining time': '3:37:01', 'throughput': '9369.50', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:21:30,416 >> {'loss': 0.5585, 'grad_norm': 19.093385696411133, 'learning_rate': 1.320120203832833e-07, 'epoch': 0.01735828586927041, 'num_input_tokens_seen': 6845104128, 'completed': '88.53% (3_264 / 3_687)', 'remaining time': '3:36:28', 'throughput': '8632.15', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:22:00,442 >> {'loss': 0.4794, 'grad_norm': 22.638851165771484, 'learning_rate': 1.3186265212223459e-07, 'epoch': 0.01762950908597776, 'num_input_tokens_seen': 6847201280, 'completed': '88.55% (3_265 / 3_687)', 'remaining time': '3:35:53', 'throughput': '8730.52', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:22:30,524 >> {'loss': 0.4148, 'grad_norm': 18.409488677978516, 'learning_rate': 1.3171362036221202e-07, 'epoch': 0.01790073230268511, 'num_input_tokens_seen': 6849298432, 'completed': '88.58% (3_266 / 3_687)', 'remaining time': '3:35:19', 'throughput': '8714.32', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:23:01,765 >> {'loss': 0.7068, 'grad_norm': 25.409086227416992, 'learning_rate': 1.3156492522315088e-07, 'epoch': 0.01817195551939246, 'num_input_tokens_seen': 6851395584, 'completed': '88.61% (3_267 / 3_687)', 'remaining time': '3:34:51', 'throughput': '8390.97', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:23:28,673 >> {'loss': 0.3872, 'grad_norm': 17.42267608642578, 'learning_rate': 1.314165668247153e-07, 'epoch': 0.01844317873609981, 'num_input_tokens_seen': 6853492736, 'completed': '88.64% (3_268 / 3_687)', 'remaining time': '3:33:57', 'throughput': '9742.27', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:23:57,903 >> {'loss': 0.6012, 'grad_norm': 20.826908111572266, 'learning_rate': 1.3126854528629843e-07, 'epoch': 0.01871440195280716, 'num_input_tokens_seen': 6855589888, 'completed': '88.66% (3_269 / 3_687)', 'remaining time': '3:33:18', 'throughput': '8968.22', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:24:27,896 >> {'loss': 0.3769, 'grad_norm': 27.992067337036133, 'learning_rate': 1.3112086072702238e-07, 'epoch': 0.01898562516951451, 'num_input_tokens_seen': 6857687040, 'completed': '88.69% (3_270 / 3_687)', 'remaining time': '3:32:44', 'throughput': '8740.35', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:25:02,157 >> {'loss': 0.4274, 'grad_norm': 17.23652458190918, 'learning_rate': 1.3097351326573814e-07, 'epoch': 0.01925684838622186, 'num_input_tokens_seen': 6859784192, 'completed': '88.72% (3_271 / 3_687)', 'remaining time': '3:32:35', 'throughput': '7651.36', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:25:29,240 >> {'loss': 0.5901, 'grad_norm': 27.117645263671875, 'learning_rate': 1.3082650302102524e-07, 'epoch': 0.01952807160292921, 'num_input_tokens_seen': 6861881344, 'completed': '88.74% (3_272 / 3_687)', 'remaining time': '3:31:43', 'throughput': '9679.27', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:25:59,179 >> {'loss': 0.1871, 'grad_norm': 10.570221900939941, 'learning_rate': 1.30679830111192e-07, 'epoch': 0.019799294819636562, 'num_input_tokens_seen': 6863978496, 'completed': '88.77% (3_273 / 3_687)', 'remaining time': '3:31:09', 'throughput': '8755.80', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:26:30,374 >> {'loss': 0.4029, 'grad_norm': 16.369604110717773, 'learning_rate': 1.3053349465427518e-07, 'epoch': 0.02007051803634391, 'num_input_tokens_seen': 6866075648, 'completed': '88.80% (3_274 / 3_687)', 'remaining time': '3:30:42', 'throughput': '8403.58', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:27:00,865 >> {'loss': 0.3059, 'grad_norm': 16.295419692993164, 'learning_rate': 1.303874967680399e-07, 'epoch': 0.02034174125305126, 'num_input_tokens_seen': 6868172800, 'completed': '88.83% (3_275 / 3_687)', 'remaining time': '3:30:10', 'throughput': '8597.23', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:27:30,494 >> {'loss': 0.2834, 'grad_norm': 13.2958345413208, 'learning_rate': 1.302418365699798e-07, 'epoch': 0.02061296446975861, 'num_input_tokens_seen': 6870269952, 'completed': '88.85% (3_276 / 3_687)', 'remaining time': '3:29:34', 'throughput': '8847.55', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:27:59,873 >> {'loss': 0.8627, 'grad_norm': 24.707290649414062, 'learning_rate': 1.300965141773167e-07, 'epoch': 0.020884187686465962, 'num_input_tokens_seen': 6872367104, 'completed': '88.88% (3_277 / 3_687)', 'remaining time': '3:28:57', 'throughput': '8923.03', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:28:33,542 >> {'loss': 0.2876, 'grad_norm': 13.416788101196289, 'learning_rate': 1.2995152970700044e-07, 'epoch': 0.02115541090317331, 'num_input_tokens_seen': 6874464256, 'completed': '88.91% (3_278 / 3_687)', 'remaining time': '3:28:43', 'throughput': '7785.85', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:29:03,868 >> {'loss': 0.5197, 'grad_norm': 18.29564666748047, 'learning_rate': 1.2980688327570906e-07, 'epoch': 0.02142663411988066, 'num_input_tokens_seen': 6876561408, 'completed': '88.93% (3_279 / 3_687)', 'remaining time': '3:28:11', 'throughput': '8644.10', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:29:33,909 >> {'loss': 0.336, 'grad_norm': 14.38173770904541, 'learning_rate': 1.296625749998485e-07, 'epoch': 0.021697857336588012, 'num_input_tokens_seen': 6878658560, 'completed': '88.96% (3_280 / 3_687)', 'remaining time': '3:27:37', 'throughput': '8726.39', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:30:04,575 >> {'loss': 0.228, 'grad_norm': 11.88624382019043, 'learning_rate': 1.2951860499555253e-07, 'epoch': 0.021969080553295363, 'num_input_tokens_seen': 6880755712, 'completed': '88.99% (3_281 / 3_687)', 'remaining time': '3:27:07', 'throughput': '8548.27', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:30:36,595 >> {'loss': 0.5354, 'grad_norm': 20.323486328125, 'learning_rate': 1.2937497337868274e-07, 'epoch': 0.022240303770002714, 'num_input_tokens_seen': 6882852864, 'completed': '89.02% (3_282 / 3_687)', 'remaining time': '3:26:43', 'throughput': '8186.89', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:31:06,997 >> {'loss': 0.4723, 'grad_norm': 18.997339248657227, 'learning_rate': 1.2923168026482843e-07, 'epoch': 0.02251152698671006, 'num_input_tokens_seen': 6884950016, 'completed': '89.04% (3_283 / 3_687)', 'remaining time': '3:26:12', 'throughput': '8622.59', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:31:38,089 >> {'loss': 0.3251, 'grad_norm': 16.928192138671875, 'learning_rate': 1.290887257693064e-07, 'epoch': 0.022782750203417412, 'num_input_tokens_seen': 6887047168, 'completed': '89.07% (3_284 / 3_687)', 'remaining time': '3:25:43', 'throughput': '8431.25', 'gpu_mem_free': '5581MB'}
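Between any two consecutive records, `num_input_tokens_seen` advances by a constant 2,097,152 tokens per optimization step, and the logged `throughput` is consistent with that delta divided by the step wall time split over 8 devices (the 8-way split is an inference from the numbers, not something the log states). A quick stdlib sketch of that arithmetic using the first two records of this resumed run:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S,%f"

# Timestamps and token counters copied from steps 3_201 and 3_202 above.
t1 = datetime.strptime("2024-12-26 22:49:29,622", FMT)
t2 = datetime.strptime("2024-12-26 22:50:00,672", FMT)
tokens_per_step = 6715080704 - 6712983552  # 2,097,152 tokens per step

step_seconds = (t2 - t1).total_seconds()
per_device = tokens_per_step / step_seconds / 8  # assumed 8-device split
print(f"{per_device:.2f} tokens/s/device")       # ~8442, matching the log's 8442.54
```

The first record's much lower throughput (2950.43) reflects resume overhead being folded into the first measured step, not an actual slowdown in steady-state training.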
[INFO|trainer.py:175] 2024-12-26 23:32:08,159 >> {'loss': 0.4975, 'grad_norm': 18.173126220703125, 'learning_rate': 1.2894611000716103e-07, 'epoch': 0.023053973420124763, 'num_input_tokens_seen': 6889144320, 'completed': '89.10% (3_285 / 3_687)', 'remaining time': '3:25:10', 'throughput': '8717.88', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:32:37,316 >> {'loss': 0.4848, 'grad_norm': 17.16127586364746, 'learning_rate': 1.28803833093164e-07, 'epoch': 0.023325196636832114, 'num_input_tokens_seen': 6891241472, 'completed': '89.12% (3_286 / 3_687)', 'remaining time': '3:24:33', 'throughput': '8990.79', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:33:10,006 >> {'loss': 0.299, 'grad_norm': 14.565631866455078, 'learning_rate': 1.286618951418144e-07, 'epoch': 0.023596419853539462, 'num_input_tokens_seen': 6893338624, 'completed': '89.15% (3_287 / 3_687)', 'remaining time': '3:24:11', 'throughput': '8018.98', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:33:38,635 >> {'loss': 0.4612, 'grad_norm': 16.098228454589844, 'learning_rate': 1.285202962673385e-07, 'epoch': 0.023867643070246813, 'num_input_tokens_seen': 6895435776, 'completed': '89.18% (3_288 / 3_687)', 'remaining time': '3:23:32', 'throughput': '9156.74', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:34:08,059 >> {'loss': 0.5771, 'grad_norm': 19.208843231201172, 'learning_rate': 1.2837903658368965e-07, 'epoch': 0.024138866286954164, 'num_input_tokens_seen': 6897532928, 'completed': '89.21% (3_289 / 3_687)', 'remaining time': '3:22:56', 'throughput': '8909.17', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:34:37,164 >> {'loss': 0.309, 'grad_norm': 14.503118515014648, 'learning_rate': 1.2823811620454833e-07, 'epoch': 0.024410089503661515, 'num_input_tokens_seen': 6899630080, 'completed': '89.23% (3_290 / 3_687)', 'remaining time': '3:22:19', 'throughput': '9006.60', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:35:11,994 >> {'loss': 0.5335, 'grad_norm': 18.937856674194336, 'learning_rate': 1.2809753524332188e-07, 'epoch': 0.024681312720368862, 'num_input_tokens_seen': 6901727232, 'completed': '89.26% (3_291 / 3_687)', 'remaining time': '3:22:07', 'throughput': '7526.60', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:35:43,549 >> {'loss': 0.5826, 'grad_norm': 20.493385314941406, 'learning_rate': 1.2795729381314434e-07, 'epoch': 0.024952535937076213, 'num_input_tokens_seen': 6903824384, 'completed': '89.29% (3_292 / 3_687)', 'remaining time': '3:21:40', 'throughput': '8307.46', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:36:13,615 >> {'loss': 0.5127, 'grad_norm': 18.82896614074707, 'learning_rate': 1.2781739202687694e-07, 'epoch': 0.025223759153783564, 'num_input_tokens_seen': 6905921536, 'completed': '89.31% (3_293 / 3_687)', 'remaining time': '3:21:07', 'throughput': '8718.87', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:36:44,668 >> {'loss': 0.7663, 'grad_norm': 22.182172775268555, 'learning_rate': 1.2767782999710715e-07, 'epoch': 0.025494982370490915, 'num_input_tokens_seen': 6908018688, 'completed': '89.34% (3_294 / 3_687)', 'remaining time': '3:20:38', 'throughput': '8441.89', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:37:12,030 >> {'loss': 0.6303, 'grad_norm': 21.46929359436035, 'learning_rate': 1.2753860783614925e-07, 'epoch': 0.025766205587198263, 'num_input_tokens_seen': 6910115840, 'completed': '89.37% (3_295 / 3_687)', 'remaining time': '3:19:54', 'throughput': '9580.58', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:37:45,946 >> {'loss': 0.3882, 'grad_norm': 16.43645668029785, 'learning_rate': 1.2739972565604396e-07, 'epoch': 0.026037428803905614, 'num_input_tokens_seen': 6912212992, 'completed': '89.40% (3_296 / 3_687)', 'remaining time': '3:19:37', 'throughput': '7729.23', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:38:14,481 >> {'loss': 0.5901, 'grad_norm': 19.76049041748047, 'learning_rate': 1.2726118356855834e-07, 'epoch': 0.026308652020612965, 'num_input_tokens_seen': 6914310144, 'completed': '89.42% (3_297 / 3_687)', 'remaining time': '3:18:58', 'throughput': '9186.77', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:38:43,702 >> {'loss': 0.4005, 'grad_norm': 18.00474739074707, 'learning_rate': 1.271229816851857e-07, 'epoch': 0.026579875237320316, 'num_input_tokens_seen': 6916407296, 'completed': '89.45% (3_298 / 3_687)', 'remaining time': '3:18:22', 'throughput': '8970.87', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:39:15,507 >> {'loss': 0.3887, 'grad_norm': 19.077592849731445, 'learning_rate': 1.2698512011714578e-07, 'epoch': 0.026851098454027666, 'num_input_tokens_seen': 6918504448, 'completed': '89.48% (3_299 / 3_687)', 'remaining time': '3:17:56', 'throughput': '8242.33', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:39:46,906 >> {'loss': 0.4026, 'grad_norm': 20.036828994750977, 'learning_rate': 1.2684759897538422e-07, 'epoch': 0.027122321670735014, 'num_input_tokens_seen': 6920601600, 'completed': '89.50% (3_300 / 3_687)', 'remaining time': '3:17:28', 'throughput': '8348.67', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:40:15,591 >> {'loss': 0.5011, 'grad_norm': 19.472291946411133, 'learning_rate': 1.2671041837057287e-07, 'epoch': 0.027393544887442365, 'num_input_tokens_seen': 6922698752, 'completed': '89.53% (3_301 / 3_687)', 'remaining time': '3:16:50', 'throughput': '9138.81', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:40:46,194 >> {'loss': 0.4298, 'grad_norm': 21.726613998413086, 'learning_rate': 1.2657357841310941e-07, 'epoch': 0.027664768104149716, 'num_input_tokens_seen': 6924795904, 'completed': '89.56% (3_302 / 3_687)', 'remaining time': '3:16:20', 'throughput': '8566.08', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:41:17,170 >> {'loss': 0.3162, 'grad_norm': 14.84683895111084, 'learning_rate': 1.2643707921311765e-07, 'epoch': 0.027935991320857067, 'num_input_tokens_seen': 6926893056, 'completed': '89.59% (3_303 / 3_687)', 'remaining time': '3:15:51', 'throughput': '8462.69', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:41:49,120 >> {'loss': 0.2217, 'grad_norm': 12.949082374572754, 'learning_rate': 1.2630092088044664e-07, 'epoch': 0.028207214537564414, 'num_input_tokens_seen': 6928990208, 'completed': '89.61% (3_304 / 3_687)', 'remaining time': '3:15:25', 'throughput': '8204.94', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:42:18,172 >> {'loss': 0.335, 'grad_norm': 15.199170112609863, 'learning_rate': 1.2616510352467158e-07, 'epoch': 0.028478437754271765, 'num_input_tokens_seen': 6931087360, 'completed': '89.64% (3_305 / 3_687)', 'remaining time': '3:14:49', 'throughput': '9023.09', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:42:48,787 >> {'loss': 0.7631, 'grad_norm': 24.491796493530273, 'learning_rate': 1.2602962725509307e-07, 'epoch': 0.028749660970979116, 'num_input_tokens_seen': 6933184512, 'completed': '89.67% (3_306 / 3_687)', 'remaining time': '3:14:18', 'throughput': '8562.76', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:43:17,621 >> {'loss': 0.4094, 'grad_norm': 18.071081161499023, 'learning_rate': 1.2589449218073728e-07, 'epoch': 0.029020884187686467, 'num_input_tokens_seen': 6935281664, 'completed': '89.69% (3_307 / 3_687)', 'remaining time': '3:13:41', 'throughput': '9091.52', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:43:48,824 >> {'loss': 0.3208, 'grad_norm': 13.082253456115723, 'learning_rate': 1.2575969841035578e-07, 'epoch': 0.029292107404393815, 'num_input_tokens_seen': 6937378816, 'completed': '89.72% (3_308 / 3_687)', 'remaining time': '3:13:13', 'throughput': '8401.16', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-26 23:44:17,886 >> {'loss': 0.6206, 'grad_norm': 20.97207260131836, 'learning_rate': 1.256252460524255e-07,
'epoch': 0.029563330621101166, 'num_input_tokens_seen': 6939475968, 'completed': '89.75% (3_309 / 3_687)', 'remaining time': '3:12:37', 'throughput': '9020.17', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 23:44:47,840 >> {'loss': 0.6038, 'grad_norm': 20.7148494720459, 'learning_rate': 1.2549113521514844e-07, 'epoch': 0.029834553837808517, 'num_input_tokens_seen': 6941573120, 'completed': '89.77% (3_310 / 3_687)', 'remaining time': '3:12:04', 'throughput': '8751.45', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 23:45:18,938 >> {'loss': 0.3072, 'grad_norm': 13.58342170715332, 'learning_rate': 1.2535736600645212e-07, 'epoch': 0.030105777054515868, 'num_input_tokens_seen': 6943670272, 'completed': '89.80% (3_311 / 3_687)', 'remaining time': '3:11:35', 'throughput': '8429.56', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 23:45:52,040 >> {'loss': 0.4495, 'grad_norm': 18.122154235839844, 'learning_rate': 1.2522393853398868e-07, 'epoch': 0.030377000271223215, 'num_input_tokens_seen': 6945767424, 'completed': '89.83% (3_312 / 3_687)', 'remaining time': '3:11:13', 'throughput': '7919.72', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 23:46:25,744 >> {'loss': 0.1881, 'grad_norm': 11.082249641418457, 'learning_rate': 1.2509085290513563e-07, 'epoch': 0.030648223487930566, 'num_input_tokens_seen': 6947864576, 'completed': '89.86% (3_313 / 3_687)', 'remaining time': '3:10:53', 'throughput': '7777.59', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 23:46:54,488 >> {'loss': 0.3816, 'grad_norm': 17.34503173828125, 'learning_rate': 1.2495810922699517e-07, 'epoch': 0.030919446704637917, 'num_input_tokens_seen': 6949961728, 'completed': '89.88% (3_314 / 3_687)', 'remaining time': '3:10:16', 'throughput': '9119.76', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 23:47:25,522 >> {'loss': 0.3201, 'grad_norm': 13.956634521484375, 'learning_rate': 1.2482570760639439e-07, 'epoch': 0.031190669921345268, 
'num_input_tokens_seen': 6952058880, 'completed': '89.91% (3_315 / 3_687)', 'remaining time': '3:09:47', 'throughput': '8446.99', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 23:47:55,824 >> {'loss': 0.2497, 'grad_norm': 13.51434326171875, 'learning_rate': 1.24693648149885e-07, 'epoch': 0.031461893138052616, 'num_input_tokens_seen': 6954156032, 'completed': '89.94% (3_316 / 3_687)', 'remaining time': '3:09:15', 'throughput': '8651.21', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 23:48:26,114 >> {'loss': 0.2518, 'grad_norm': 16.2774600982666, 'learning_rate': 1.2456193096374338e-07, 'epoch': 0.03173311635475997, 'num_input_tokens_seen': 6956253184, 'completed': '89.96% (3_317 / 3_687)', 'remaining time': '3:08:44', 'throughput': '8654.41', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 23:48:56,669 >> {'loss': 0.5708, 'grad_norm': 21.98696517944336, 'learning_rate': 1.244305561539707e-07, 'epoch': 0.03200433957146732, 'num_input_tokens_seen': 6958350336, 'completed': '89.99% (3_318 / 3_687)', 'remaining time': '3:08:13', 'throughput': '8579.33', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 23:49:25,117 >> {'loss': 0.4993, 'grad_norm': 20.506793975830078, 'learning_rate': 1.242995238262923e-07, 'epoch': 0.032275562788174665, 'num_input_tokens_seen': 6960447488, 'completed': '90.02% (3_319 / 3_687)', 'remaining time': '3:07:36', 'throughput': '9214.87', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 23:49:56,849 >> {'loss': 0.4024, 'grad_norm': 14.927400588989258, 'learning_rate': 1.2416883408615809e-07, 'epoch': 0.03254678600488202, 'num_input_tokens_seen': 6962544640, 'completed': '90.05% (3_320 / 3_687)', 'remaining time': '3:07:09', 'throughput': '8261.14', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 23:50:28,393 >> {'loss': 0.3532, 'grad_norm': 16.304428100585938, 'learning_rate': 1.2403848703874202e-07, 'epoch': 0.03281800922158937, 'num_input_tokens_seen': 6964641792, 'completed': 
'90.07% (3_321 / 3_687)', 'remaining time': '3:06:41', 'throughput': '8310.50', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 23:51:00,116 >> {'loss': 0.447, 'grad_norm': 15.736098289489746, 'learning_rate': 1.2390848278894264e-07, 'epoch': 0.03308923243829672, 'num_input_tokens_seen': 6966738944, 'completed': '90.10% (3_322 / 3_687)', 'remaining time': '3:06:14', 'throughput': '8263.42', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 23:51:31,186 >> {'loss': 0.4226, 'grad_norm': 20.3814697265625, 'learning_rate': 1.2377882144138222e-07, 'epoch': 0.03336045565500407, 'num_input_tokens_seen': 6968836096, 'completed': '90.13% (3_323 / 3_687)', 'remaining time': '3:05:44', 'throughput': '8437.25', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 23:52:01,353 >> {'loss': 0.5159, 'grad_norm': 20.88442039489746, 'learning_rate': 1.2364950310040744e-07, 'epoch': 0.033631678871711417, 'num_input_tokens_seen': 6970933248, 'completed': '90.15% (3_324 / 3_687)', 'remaining time': '3:05:12', 'throughput': '8689.88', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 23:52:31,130 >> {'loss': 0.2405, 'grad_norm': 13.660783767700195, 'learning_rate': 1.235205278700887e-07, 'epoch': 0.03390290208841877, 'num_input_tokens_seen': 6973030400, 'completed': '90.18% (3_325 / 3_687)', 'remaining time': '3:04:39', 'throughput': '8803.42', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 23:52:59,309 >> {'loss': 0.7742, 'grad_norm': 22.85435676574707, 'learning_rate': 1.233918958542204e-07, 'epoch': 0.03417412530512612, 'num_input_tokens_seen': 6975127552, 'completed': '90.21% (3_326 / 3_687)', 'remaining time': '3:04:02', 'throughput': '9303.03', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 23:53:33,033 >> {'loss': 0.3336, 'grad_norm': 13.575337409973145, 'learning_rate': 1.2326360715632069e-07, 'epoch': 0.034445348521833466, 'num_input_tokens_seen': 6977224704, 'completed': '90.24% (3_327 / 3_687)', 'remaining time': 
'3:03:40', 'throughput': '7773.24', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 23:54:06,715 >> {'loss': 0.6159, 'grad_norm': 24.44587516784668, 'learning_rate': 1.231356618796314e-07, 'epoch': 0.03471657173854082, 'num_input_tokens_seen': 6979321856, 'completed': '90.26% (3_328 / 3_687)', 'remaining time': '3:03:18', 'throughput': '7782.77', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 23:54:39,724 >> {'loss': 0.659, 'grad_norm': 23.5541934967041, 'learning_rate': 1.2300806012711798e-07, 'epoch': 0.03498779495524817, 'num_input_tokens_seen': 6981419008, 'completed': '90.29% (3_329 / 3_687)', 'remaining time': '3:02:54', 'throughput': '7941.59', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 23:55:09,824 >> {'loss': 0.1895, 'grad_norm': 9.155745506286621, 'learning_rate': 1.228808020014696e-07, 'epoch': 0.03525901817195552, 'num_input_tokens_seen': 6983516160, 'completed': '90.32% (3_330 / 3_687)', 'remaining time': '3:02:22', 'throughput': '8709.31', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 23:55:40,717 >> {'loss': 0.4427, 'grad_norm': 16.62866973876953, 'learning_rate': 1.227538876050987e-07, 'epoch': 0.03553024138866287, 'num_input_tokens_seen': 6985613312, 'completed': '90.34% (3_331 / 3_687)', 'remaining time': '3:01:52', 'throughput': '8485.52', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 23:56:14,057 >> {'loss': 0.357, 'grad_norm': 15.631593704223633, 'learning_rate': 1.226273170401411e-07, 'epoch': 0.03580146460537022, 'num_input_tokens_seen': 6987710464, 'completed': '90.37% (3_332 / 3_687)', 'remaining time': '3:01:28', 'throughput': '7862.70', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 23:56:43,358 >> {'loss': 0.4486, 'grad_norm': 20.449111938476562, 'learning_rate': 1.2250109040845589e-07, 'epoch': 0.03607268782207757, 'num_input_tokens_seen': 6989807616, 'completed': '90.40% (3_333 / 3_687)', 'remaining time': '3:00:54', 'throughput': '8946.53', 'gpu_mem_free': 
'5581MB'} [INFO|trainer.py:175] 2024-12-26 23:57:15,404 >> {'loss': 0.2722, 'grad_norm': 19.299829483032227, 'learning_rate': 1.2237520781162567e-07, 'epoch': 0.03634391103878492, 'num_input_tokens_seen': 6991904768, 'completed': '90.43% (3_334 / 3_687)', 'remaining time': '3:00:27', 'throughput': '8180.38', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 23:57:47,547 >> {'loss': 0.4216, 'grad_norm': 17.439783096313477, 'learning_rate': 1.222496693509559e-07, 'epoch': 0.03661513425549227, 'num_input_tokens_seen': 6994001920, 'completed': '90.45% (3_335 / 3_687)', 'remaining time': '3:00:00', 'throughput': '8155.48', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 23:58:18,221 >> {'loss': 0.3902, 'grad_norm': 16.625341415405273, 'learning_rate': 1.2212447512747506e-07, 'epoch': 0.03688635747219962, 'num_input_tokens_seen': 6996099072, 'completed': '90.48% (3_336 / 3_687)', 'remaining time': '2:59:30', 'throughput': '8546.05', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 23:58:48,019 >> {'loss': 0.2277, 'grad_norm': 12.145569801330566, 'learning_rate': 1.2199962524193492e-07, 'epoch': 0.03715758068890697, 'num_input_tokens_seen': 6998196224, 'completed': '90.51% (3_337 / 3_687)', 'remaining time': '2:58:57', 'throughput': '8797.42', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 23:59:19,728 >> {'loss': 0.4426, 'grad_norm': 19.74405860900879, 'learning_rate': 1.218751197948096e-07, 'epoch': 0.03742880390561432, 'num_input_tokens_seen': 7000293376, 'completed': '90.53% (3_338 / 3_687)', 'remaining time': '2:58:29', 'throughput': '8267.22', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-26 23:59:52,429 >> {'loss': 0.5911, 'grad_norm': 20.95228385925293, 'learning_rate': 1.2175095888629657e-07, 'epoch': 0.03770002712232167, 'num_input_tokens_seen': 7002390528, 'completed': '90.56% (3_339 / 3_687)', 'remaining time': '2:58:03', 'throughput': '8016.27', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 
00:00:22,247 >> {'loss': 0.6537, 'grad_norm': 23.399641036987305, 'learning_rate': 1.2162714261631575e-07, 'epoch': 0.03797125033902902, 'num_input_tokens_seen': 7004487680, 'completed': '90.59% (3_340 / 3_687)', 'remaining time': '2:57:30', 'throughput': '8791.53', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:00:55,347 >> {'loss': 0.2545, 'grad_norm': 11.986372947692871, 'learning_rate': 1.215036710845097e-07, 'epoch': 0.03824247355573637, 'num_input_tokens_seen': 7006584832, 'completed': '90.62% (3_341 / 3_687)', 'remaining time': '2:57:05', 'throughput': '7919.68', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:01:27,156 >> {'loss': 0.386, 'grad_norm': 16.912973403930664, 'learning_rate': 1.213805443902437e-07, 'epoch': 0.03851369677244372, 'num_input_tokens_seen': 7008681984, 'completed': '90.64% (3_342 / 3_687)', 'remaining time': '2:56:37', 'throughput': '8241.40', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:01:56,291 >> {'loss': 0.3066, 'grad_norm': 18.482542037963867, 'learning_rate': 1.212577626326054e-07, 'epoch': 0.038784919989151075, 'num_input_tokens_seen': 7010779136, 'completed': '90.67% (3_343 / 3_687)', 'remaining time': '2:56:03', 'throughput': '8997.43', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:02:27,454 >> {'loss': 0.3742, 'grad_norm': 16.53171157836914, 'learning_rate': 1.2113532591040488e-07, 'epoch': 0.03905614320585842, 'num_input_tokens_seen': 7012876288, 'completed': '90.70% (3_344 / 3_687)', 'remaining time': '2:55:33', 'throughput': '8412.06', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:02:59,989 >> {'loss': 0.3583, 'grad_norm': 18.05655860900879, 'learning_rate': 1.2101323432217454e-07, 'epoch': 0.03932736642256577, 'num_input_tokens_seen': 7014973440, 'completed': '90.72% (3_345 / 3_687)', 'remaining time': '2:55:07', 'throughput': '8057.17', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:03:30,295 >> {'loss': 0.3726, 'grad_norm': 
16.21353530883789, 'learning_rate': 1.208914879661691e-07, 'epoch': 0.039598589639273124, 'num_input_tokens_seen': 7017070592, 'completed': '90.75% (3_346 / 3_687)', 'remaining time': '2:54:35', 'throughput': '8649.94', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:03:59,925 >> {'loss': 0.2253, 'grad_norm': 11.822895050048828, 'learning_rate': 1.2077008694036527e-07, 'epoch': 0.03986981285598047, 'num_input_tokens_seen': 7019167744, 'completed': '90.78% (3_347 / 3_687)', 'remaining time': '2:54:02', 'throughput': '8847.39', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:04:31,153 >> {'loss': 0.4391, 'grad_norm': 19.47376823425293, 'learning_rate': 1.2064903134246221e-07, 'epoch': 0.04014103607268782, 'num_input_tokens_seen': 7021264896, 'completed': '90.81% (3_348 / 3_687)', 'remaining time': '2:53:32', 'throughput': '8394.44', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:04:58,965 >> {'loss': 0.6866, 'grad_norm': 18.84421730041504, 'learning_rate': 1.2052832126988053e-07, 'epoch': 0.040412259289395173, 'num_input_tokens_seen': 7023362048, 'completed': '90.83% (3_349 / 3_687)', 'remaining time': '2:52:55', 'throughput': '9425.65', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:05:28,132 >> {'loss': 0.5646, 'grad_norm': 18.107643127441406, 'learning_rate': 1.2040795681976338e-07, 'epoch': 0.04068348250610252, 'num_input_tokens_seen': 7025459200, 'completed': '90.86% (3_350 / 3_687)', 'remaining time': '2:52:21', 'throughput': '8987.78', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:05:58,877 >> {'loss': 0.5655, 'grad_norm': 18.47949981689453, 'learning_rate': 1.2028793808897537e-07, 'epoch': 0.040954705722809875, 'num_input_tokens_seen': 7027556352, 'completed': '90.89% (3_351 / 3_687)', 'remaining time': '2:51:50', 'throughput': '8526.14', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:06:31,699 >> {'loss': 0.3471, 'grad_norm': 18.35427474975586, 'learning_rate': 
1.20168265174103e-07, 'epoch': 0.04122592893951722, 'num_input_tokens_seen': 7029653504, 'completed': '90.91% (3_352 / 3_687)', 'remaining time': '2:51:24', 'throughput': '7986.90', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:07:02,601 >> {'loss': 0.2921, 'grad_norm': 13.998270034790039, 'learning_rate': 1.2004893817145456e-07, 'epoch': 0.04149715215622457, 'num_input_tokens_seen': 7031750656, 'completed': '90.94% (3_353 / 3_687)', 'remaining time': '2:50:54', 'throughput': '8483.07', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:07:31,674 >> {'loss': 0.7065, 'grad_norm': 26.075870513916016, 'learning_rate': 1.1992995717705982e-07, 'epoch': 0.041768375372931925, 'num_input_tokens_seen': 7033847808, 'completed': '90.97% (3_354 / 3_687)', 'remaining time': '2:50:20', 'throughput': '9016.92', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:08:00,922 >> {'loss': 0.2938, 'grad_norm': 14.8284912109375, 'learning_rate': 1.198113222866702e-07, 'epoch': 0.04203959858963927, 'num_input_tokens_seen': 7035944960, 'completed': '91.00% (3_355 / 3_687)', 'remaining time': '2:49:46', 'throughput': '8962.55', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:08:30,806 >> {'loss': 0.3828, 'grad_norm': 15.237571716308594, 'learning_rate': 1.1969303359575845e-07, 'epoch': 0.04231082180634662, 'num_input_tokens_seen': 7038042112, 'completed': '91.02% (3_356 / 3_687)', 'remaining time': '2:49:14', 'throughput': '8772.05', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:09:01,751 >> {'loss': 0.3513, 'grad_norm': 14.325973510742188, 'learning_rate': 1.1957509119951885e-07, 'epoch': 0.042582045023053974, 'num_input_tokens_seen': 7040139264, 'completed': '91.05% (3_357 / 3_687)', 'remaining time': '2:48:43', 'throughput': '8471.32', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:09:29,657 >> {'loss': 0.277, 'grad_norm': 14.475112915039062, 'learning_rate': 1.1945749519286694e-07, 'epoch': 
0.04285326823976132, 'num_input_tokens_seen': 7042236416, 'completed': '91.08% (3_358 / 3_687)', 'remaining time': '2:48:07', 'throughput': '9393.85', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:10:00,057 >> {'loss': 0.383, 'grad_norm': 16.315767288208008, 'learning_rate': 1.193402456704396e-07, 'epoch': 0.043124491456468676, 'num_input_tokens_seen': 7044333568, 'completed': '91.10% (3_359 / 3_687)', 'remaining time': '2:47:36', 'throughput': '8623.22', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:10:29,603 >> {'loss': 0.4815, 'grad_norm': 19.562829971313477, 'learning_rate': 1.1922334272659474e-07, 'epoch': 0.043395714673176024, 'num_input_tokens_seen': 7046430720, 'completed': '91.13% (3_360 / 3_687)', 'remaining time': '2:47:03', 'throughput': '8872.31', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:11:02,217 >> {'loss': 0.4433, 'grad_norm': 19.108549118041992, 'learning_rate': 1.1910678645541137e-07, 'epoch': 0.04366693788988337, 'num_input_tokens_seen': 7048527872, 'completed': '91.16% (3_361 / 3_687)', 'remaining time': '2:46:36', 'throughput': '8037.80', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:11:34,448 >> {'loss': 0.2735, 'grad_norm': 12.979419708251953, 'learning_rate': 1.1899057695068954e-07, 'epoch': 0.043938161106590726, 'num_input_tokens_seen': 7050625024, 'completed': '91.19% (3_362 / 3_687)', 'remaining time': '2:46:09', 'throughput': '8133.37', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:12:06,137 >> {'loss': 0.5001, 'grad_norm': 20.514404296875, 'learning_rate': 1.1887471430595028e-07, 'epoch': 0.04420938432329807, 'num_input_tokens_seen': 7052722176, 'completed': '91.21% (3_363 / 3_687)', 'remaining time': '2:45:40', 'throughput': '8272.51', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:12:34,639 >> {'loss': 0.492, 'grad_norm': 20.933198928833008, 'learning_rate': 1.187591986144354e-07, 'epoch': 0.04448060754000543, 'num_input_tokens_seen': 
7054819328, 'completed': '91.24% (3_364 / 3_687)', 'remaining time': '2:45:05', 'throughput': '9197.23', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:13:02,589 >> {'loss': 0.6521, 'grad_norm': 21.284048080444336, 'learning_rate': 1.1864402996910748e-07, 'epoch': 0.044751830756712775, 'num_input_tokens_seen': 7056916480, 'completed': '91.27% (3_365 / 3_687)', 'remaining time': '2:44:29', 'throughput': '9378.92', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:13:31,958 >> {'loss': 0.4218, 'grad_norm': 30.391963958740234, 'learning_rate': 1.1852920846265002e-07, 'epoch': 0.04502305397342012, 'num_input_tokens_seen': 7059013632, 'completed': '91.29% (3_366 / 3_687)', 'remaining time': '2:43:56', 'throughput': '8925.92', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:14:00,115 >> {'loss': 0.5948, 'grad_norm': 20.589599609375, 'learning_rate': 1.1841473418746685e-07, 'epoch': 0.04529427719012748, 'num_input_tokens_seen': 7061110784, 'completed': '91.32% (3_367 / 3_687)', 'remaining time': '2:43:21', 'throughput': '9310.21', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:14:30,671 >> {'loss': 0.3326, 'grad_norm': 16.451263427734375, 'learning_rate': 1.1830060723568256e-07, 'epoch': 0.045565500406834825, 'num_input_tokens_seen': 7063207936, 'completed': '91.35% (3_368 / 3_687)', 'remaining time': '2:42:50', 'throughput': '8579.22', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:14:59,363 >> {'loss': 0.4692, 'grad_norm': 18.889476776123047, 'learning_rate': 1.1818682769914226e-07, 'epoch': 0.04583672362354217, 'num_input_tokens_seen': 7065305088, 'completed': '91.38% (3_369 / 3_687)', 'remaining time': '2:42:16', 'throughput': '9136.34', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:15:29,835 >> {'loss': 0.5904, 'grad_norm': 22.5378475189209, 'learning_rate': 1.1807339566941123e-07, 'epoch': 0.04610794684024953, 'num_input_tokens_seen': 7067402240, 'completed': '91.40% (3_370 / 
3_687)', 'remaining time': '2:41:45', 'throughput': '8602.92', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:15:58,723 >> {'loss': 0.668, 'grad_norm': 18.752378463745117, 'learning_rate': 1.1796031123777546e-07, 'epoch': 0.046379170056956874, 'num_input_tokens_seen': 7069499392, 'completed': '91.43% (3_371 / 3_687)', 'remaining time': '2:41:11', 'throughput': '9074.35', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:16:28,695 >> {'loss': 0.3436, 'grad_norm': 14.500882148742676, 'learning_rate': 1.178475744952408e-07, 'epoch': 0.04665039327366423, 'num_input_tokens_seen': 7071596544, 'completed': '91.46% (3_372 / 3_687)', 'remaining time': '2:40:39', 'throughput': '8746.43', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:16:58,156 >> {'loss': 0.3595, 'grad_norm': 16.3090877532959, 'learning_rate': 1.1773518553253352e-07, 'epoch': 0.046921616490371576, 'num_input_tokens_seen': 7073693696, 'completed': '91.48% (3_373 / 3_687)', 'remaining time': '2:40:06', 'throughput': '8897.90', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:17:28,832 >> {'loss': 0.4885, 'grad_norm': 20.062055587768555, 'learning_rate': 1.1762314444010002e-07, 'epoch': 0.047192839707078924, 'num_input_tokens_seen': 7075790848, 'completed': '91.51% (3_374 / 3_687)', 'remaining time': '2:39:36', 'throughput': '8545.51', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:17:59,420 >> {'loss': 0.1721, 'grad_norm': 13.69767951965332, 'learning_rate': 1.1751145130810675e-07, 'epoch': 0.04746406292378628, 'num_input_tokens_seen': 7077888000, 'completed': '91.54% (3_375 / 3_687)', 'remaining time': '2:39:05', 'throughput': '8570.24', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:18:29,328 >> {'loss': 0.3988, 'grad_norm': 15.124690055847168, 'learning_rate': 1.1740010622644008e-07, 'epoch': 0.047735286140493625, 'num_input_tokens_seen': 7079985152, 'completed': '91.56% (3_376 / 3_687)', 'remaining time': '2:38:34', 
'throughput': '8764.89', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:19:01,777 >> {'loss': 0.53, 'grad_norm': 22.62101936340332, 'learning_rate': 1.1728910928470629e-07, 'epoch': 0.04800650935720097, 'num_input_tokens_seen': 7082082304, 'completed': '91.59% (3_377 / 3_687)', 'remaining time': '2:38:06', 'throughput': '8078.67', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:19:32,715 >> {'loss': 0.2442, 'grad_norm': 12.13560962677002, 'learning_rate': 1.1717846057223143e-07, 'epoch': 0.04827773257390833, 'num_input_tokens_seen': 7084179456, 'completed': '91.62% (3_378 / 3_687)', 'remaining time': '2:37:36', 'throughput': '8473.31', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:20:04,305 >> {'loss': 0.548, 'grad_norm': 22.917356491088867, 'learning_rate': 1.1706816017806142e-07, 'epoch': 0.048548955790615675, 'num_input_tokens_seen': 7086276608, 'completed': '91.65% (3_379 / 3_687)', 'remaining time': '2:37:07', 'throughput': '8298.33', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:20:32,850 >> {'loss': 0.4848, 'grad_norm': 26.45726776123047, 'learning_rate': 1.1695820819096186e-07, 'epoch': 0.04882017900732303, 'num_input_tokens_seen': 7088373760, 'completed': '91.67% (3_380 / 3_687)', 'remaining time': '2:36:33', 'throughput': '9183.52', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:21:03,414 >> {'loss': 0.8083, 'grad_norm': 25.90103530883789, 'learning_rate': 1.1684860469941785e-07, 'epoch': 0.04909140222403038, 'num_input_tokens_seen': 7090470912, 'completed': '91.70% (3_381 / 3_687)', 'remaining time': '2:36:02', 'throughput': '8576.76', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:21:36,860 >> {'loss': 0.5493, 'grad_norm': 22.642337799072266, 'learning_rate': 1.1673934979163417e-07, 'epoch': 0.049362625440737724, 'num_input_tokens_seen': 7092568064, 'completed': '91.73% (3_382 / 3_687)', 'remaining time': '2:35:37', 'throughput': '7837.87', 'gpu_mem_free': 
'5581MB'} [INFO|trainer.py:175] 2024-12-27 00:22:06,823 >> {'loss': 0.2542, 'grad_norm': 11.539298057556152, 'learning_rate': 1.1663044355553493e-07, 'epoch': 0.04963384865744508, 'num_input_tokens_seen': 7094665216, 'completed': '91.75% (3_383 / 3_687)', 'remaining time': '2:35:05', 'throughput': '8748.95', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:22:36,809 >> {'loss': 0.4264, 'grad_norm': 19.434553146362305, 'learning_rate': 1.1652188607876379e-07, 'epoch': 0.049905071874152426, 'num_input_tokens_seen': 7096762368, 'completed': '91.78% (3_384 / 3_687)', 'remaining time': '2:34:33', 'throughput': '8742.24', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:23:02,851 >> {'loss': 0.8053, 'grad_norm': 28.15933609008789, 'learning_rate': 1.1641367744868372e-07, 'epoch': 0.05017629509085978, 'num_input_tokens_seen': 7098859520, 'completed': '91.81% (3_385 / 3_687)', 'remaining time': '2:33:55', 'throughput': '10066.30', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:23:32,419 >> {'loss': 0.3073, 'grad_norm': 23.65896224975586, 'learning_rate': 1.1630581775237681e-07, 'epoch': 0.05044751830756713, 'num_input_tokens_seen': 7100956672, 'completed': '91.84% (3_386 / 3_687)', 'remaining time': '2:33:23', 'throughput': '8865.69', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:24:04,447 >> {'loss': 0.2546, 'grad_norm': 14.19971752166748, 'learning_rate': 1.1619830707664446e-07, 'epoch': 0.050718741524274476, 'num_input_tokens_seen': 7103053824, 'completed': '91.86% (3_387 / 3_687)', 'remaining time': '2:32:55', 'throughput': '8184.81', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:24:39,092 >> {'loss': 0.3369, 'grad_norm': 17.83209800720215, 'learning_rate': 1.160911455080073e-07, 'epoch': 0.05098996474098183, 'num_input_tokens_seen': 7105150976, 'completed': '91.89% (3_388 / 3_687)', 'remaining time': '2:32:31', 'throughput': '7566.58', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 
00:25:09,609 >> {'loss': 0.5246, 'grad_norm': 21.761837005615234, 'learning_rate': 1.1598433313270472e-07, 'epoch': 0.05126118795768918, 'num_input_tokens_seen': 7107248128, 'completed': '91.92% (3_389 / 3_687)', 'remaining time': '2:32:00', 'throughput': '8590.11', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:25:40,741 >> {'loss': 0.4256, 'grad_norm': 26.923656463623047, 'learning_rate': 1.1587787003669523e-07, 'epoch': 0.051532411174396525, 'num_input_tokens_seen': 7109345280, 'completed': '91.94% (3_390 / 3_687)', 'remaining time': '2:31:30', 'throughput': '8420.43', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:26:10,647 >> {'loss': 0.4239, 'grad_norm': 16.93385124206543, 'learning_rate': 1.1577175630565646e-07, 'epoch': 0.05180363439110388, 'num_input_tokens_seen': 7111442432, 'completed': '91.97% (3_391 / 3_687)', 'remaining time': '2:30:58', 'throughput': '8765.52', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:26:40,387 >> {'loss': 0.7451, 'grad_norm': 28.331945419311523, 'learning_rate': 1.1566599202498456e-07, 'epoch': 0.05207485760781123, 'num_input_tokens_seen': 7113539584, 'completed': '92.00% (3_392 / 3_687)', 'remaining time': '2:30:26', 'throughput': '8814.62', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:27:09,319 >> {'loss': 0.3069, 'grad_norm': 13.455109596252441, 'learning_rate': 1.1556057727979468e-07, 'epoch': 0.05234608082451858, 'num_input_tokens_seen': 7115636736, 'completed': '92.03% (3_393 / 3_687)', 'remaining time': '2:29:53', 'throughput': '9060.69', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:27:39,104 >> {'loss': 0.5828, 'grad_norm': 22.329641342163086, 'learning_rate': 1.1545551215492055e-07, 'epoch': 0.05261730404122593, 'num_input_tokens_seen': 7117733888, 'completed': '92.05% (3_394 / 3_687)', 'remaining time': '2:29:22', 'throughput': '8801.09', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:28:11,269 >> {'loss': 0.4792, 
'grad_norm': 23.203813552856445, 'learning_rate': 1.1535079673491458e-07, 'epoch': 0.05288852725793328, 'num_input_tokens_seen': 7119831040, 'completed': '92.08% (3_395 / 3_687)', 'remaining time': '2:28:53', 'throughput': '8149.98', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:28:44,557 >> {'loss': 0.2758, 'grad_norm': 12.3374662399292, 'learning_rate': 1.1524643110404778e-07, 'epoch': 0.05315975047464063, 'num_input_tokens_seen': 7121928192, 'completed': '92.11% (3_396 / 3_687)', 'remaining time': '2:28:27', 'throughput': '7874.97', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:29:17,614 >> {'loss': 0.3412, 'grad_norm': 16.85381507873535, 'learning_rate': 1.1514241534630963e-07, 'epoch': 0.05343097369134798, 'num_input_tokens_seen': 7124025344, 'completed': '92.13% (3_397 / 3_687)', 'remaining time': '2:28:00', 'throughput': '7930.11', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:29:47,655 >> {'loss': 0.429, 'grad_norm': 18.195323944091797, 'learning_rate': 1.1503874954540804e-07, 'epoch': 0.05370219690805533, 'num_input_tokens_seen': 7126122496, 'completed': '92.16% (3_398 / 3_687)', 'remaining time': '2:27:28', 'throughput': '8726.41', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:30:16,878 >> {'loss': 0.3129, 'grad_norm': 15.976231575012207, 'learning_rate': 1.1493543378476927e-07, 'epoch': 0.05397342012476268, 'num_input_tokens_seen': 7128219648, 'completed': '92.19% (3_399 / 3_687)', 'remaining time': '2:26:56', 'throughput': '8970.47', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 00:30:49,412 >> {'loss': 0.4292, 'grad_norm': 15.415371894836426, 'learning_rate': 1.1483246814753798e-07, 'epoch': 0.05424464334147003, 'num_input_tokens_seen': 7130316800, 'completed': '92.22% (3_400 / 3_687)', 'remaining time': '2:26:28', 'throughput': '8057.41', 'gpu_mem_free': '5581MB'} 
/scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:689: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
  warnings.warn(
[INFO|trainer.py:3503] 2024-12-27 00:31:13,593 >> Saving model checkpoint to /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-3400
[INFO|configuration_utils.py:472] 2024-12-27 00:31:13,597 >> Configuration saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-3400/config.json
[INFO|configuration_utils.py:807] 2024-12-27 00:31:13,598 >> Configuration saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-3400/generation_config.json
[INFO|modeling_utils.py:2807] 2024-12-27 00:32:10,534 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 7 checkpoint shards. You can find where each parameters has been saved in the index located at /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-3400/model.safetensors.index.json.
[INFO|tokenization_utils_base.py:2684] 2024-12-27 00:32:10,537 >> tokenizer config file saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-3400/tokenizer_config.json
[INFO|tokenization_utils_base.py:2693] 2024-12-27 00:32:10,538 >> Special tokens file saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-3400/special_tokens_map.json
12/27/2024 00:35:42 - WARNING - streaming.base.dataset - Because `shuffle_block_size` was not specified, it will default to max(4_000_000 // num_canonical_nodes, 1 << 18) if num_canonical_nodes is not None, otherwise 262144. Prior to Streaming v0.7.0, `shuffle_block_size` defaulted to 262144.
[WARNING|trainer.py:868] 2024-12-27 00:35:42,146 >> Save streaming dataset state: {'epoch': 0, 'sample_in_epoch': 6800, 'num_canonical_nodes': 1, 'shuffle_seed': 42, 'initial_physical_nodes': 1}
/scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/torch/utils/checkpoint.py:1399: FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead.
  with device_autocast_ctx, torch.cpu.amp.autocast(**cpu_autocast_kwargs), recompute_context: # type: ignore[attr-defined]
[INFO|trainer.py:175] 2024-12-27 00:36:12,802 >> {'loss': 0.3157, 'grad_norm': 12.561007499694824, 'learning_rate': 1.1472985271657697e-07, 'epoch': 0.05451586655817738, 'num_input_tokens_seen': 7132413952, 'completed': '92.24% (3_401 / 3_687)', 'remaining time': '2:32:54', 'throughput': '810.61', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 00:36:44,201 >> {'loss': 0.515, 'grad_norm': 19.007686614990234, 'learning_rate': 1.1462758757446728e-07, 'epoch': 0.05478708977488473, 'num_input_tokens_seen': 7134511104, 'completed': '92.27% (3_402 / 3_687)', 'remaining time': '2:32:21', 'throughput': '8348.69', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 00:37:15,910 >> {'loss': 0.2947, 'grad_norm': 14.344964027404785, 'learning_rate': 1.1452567280350789e-07, 'epoch': 0.05505831299159208, 'num_input_tokens_seen': 7136608256, 'completed': '92.30% (3_403 / 3_687)', 'remaining time': '2:31:48', 'throughput': '8267.27', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 00:37:50,171 >> {'loss': 0.2962, 'grad_norm': 14.677167892456055, 'learning_rate': 1.1442410848571602e-07, 'epoch': 0.05532953620829943, 'num_input_tokens_seen': 7138705408, 'completed': '92.32% (3_404 / 3_687)', 'remaining time': '2:31:19', 'throughput': '7651.38', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 00:38:20,440 >> {'loss': 0.3201, 'grad_norm': 15.091597557067871, 'learning_rate': 1.1432289470282683e-07, 'epoch': 0.05560075942500678, 'num_input_tokens_seen': 7140802560, 'completed': '92.35% (3_405 / 3_687)', 'remaining time': '2:30:44', 'throughput': '8660.52', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 00:38:54,015 >> {'loss': 0.3313, 'grad_norm': 16.691532135009766, 'learning_rate': 1.1422203153629312e-07, 'epoch': 0.055871982641714134, 'num_input_tokens_seen': 7142899712, 'completed': '92.38% (3_406 / 3_687)', 'remaining time': '2:30:14', 'throughput': '7807.86', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 00:39:25,438 >> {'loss': 0.3258, 'grad_norm': 16.004104614257812, 'learning_rate': 1.1412151906728589e-07, 'epoch': 0.05614320585842148, 'num_input_tokens_seen': 7144996864, 'completed': '92.41% (3_407 / 3_687)', 'remaining time': '2:29:41', 'throughput': '8342.39', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 00:39:52,536 >> {'loss': 0.6703, 'grad_norm': 20.203420639038086, 'learning_rate': 1.1402135737669372e-07, 'epoch': 0.05641442907512883, 'num_input_tokens_seen': 7147094016, 'completed': '92.43% (3_408 / 3_687)', 'remaining time': '2:29:03', 'throughput': '9673.66', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 00:40:22,566 >> {'loss': 0.4107, 'grad_norm': 16.784305572509766, 'learning_rate': 1.1392154654512289e-07, 'epoch': 0.05668565229183618, 'num_input_tokens_seen': 7149191168, 'completed': '92.46% (3_409 / 3_687)', 'remaining time': '2:28:28', 'throughput': '8729.37', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 00:40:56,850 >> {'loss': 0.5004, 'grad_norm': 18.371356964111328, 'learning_rate': 1.1382208665289742e-07, 'epoch': 0.05695687550854353, 'num_input_tokens_seen': 7151288320, 'completed': '92.49% (3_410 / 3_687)', 'remaining time': '2:27:59', 'throughput': '7646.25', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 00:41:22,503 >> {'loss': 0.7699, 'grad_norm': 28.510942459106445, 'learning_rate': 1.1372297778005883e-07, 'epoch': 0.05722809872525088, 'num_input_tokens_seen': 7153385472, 'completed': '92.51% (3_411 / 3_687)', 'remaining time': '2:27:18', 'throughput': '10218.95', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 00:41:53,660 >> {'loss': 0.399, 'grad_norm': 20.71713638305664, 'learning_rate': 1.1362422000636609e-07, 'epoch': 0.05749932194195823, 'num_input_tokens_seen': 7155482624, 'completed': '92.54% (3_412 / 3_687)', 'remaining time': '2:26:45', 'throughput': '8413.57', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 00:42:23,310 >> {'loss': 0.4794, 'grad_norm': 17.85915184020996, 'learning_rate': 1.135258134112958e-07, 'epoch': 0.05777054515866558, 'num_input_tokens_seen': 7157579776, 'completed': '92.57% (3_413 / 3_687)', 'remaining time': '2:26:10', 'throughput': '8841.49', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 00:42:56,610 >> {'loss': 0.4149, 'grad_norm': 16.834300994873047, 'learning_rate': 1.1342775807404177e-07, 'epoch': 0.058041768375372935, 'num_input_tokens_seen': 7159676928, 'completed': '92.60% (3_414 / 3_687)', 'remaining time': '2:25:40', 'throughput': '7872.09', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 00:43:27,351 >> {'loss': 0.3177, 'grad_norm': 14.176383018493652, 'learning_rate': 1.1333005407351516e-07, 'epoch': 0.05831299159208028, 'num_input_tokens_seen': 7161774080, 'completed': '92.62% (3_415 / 3_687)', 'remaining time': '2:25:06', 'throughput': '8527.67', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 00:43:54,966 >> {'loss': 0.6466, 'grad_norm': 20.5920352935791, 'learning_rate': 1.1323270148834461e-07, 'epoch': 0.05858421480878763, 'num_input_tokens_seen': 7163871232, 'completed': '92.65% (3_416 / 3_687)', 'remaining time': '2:24:29', 'throughput': '9492.50', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 00:44:24,163 >> {'loss': 0.5934, 'grad_norm': 20.29216194152832, 'learning_rate': 1.1313570039687571e-07, 'epoch': 0.058855438025494984, 'num_input_tokens_seen': 7165968384, 'completed': '92.68% (3_417 / 3_687)', 'remaining time': '2:23:53', 'throughput': '8978.71', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 00:44:56,252 >> {'loss': 0.4848, 'grad_norm': 19.53738784790039, 'learning_rate': 1.1303905087717111e-07, 'epoch': 0.05912666124220233, 'num_input_tokens_seen': 7168065536, 'completed': '92.70% (3_418 / 3_687)', 'remaining time': '2:23:21', 'throughput': '8169.13', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 00:45:25,334 >> {'loss': 0.3478, 'grad_norm': 16.945344924926758, 'learning_rate': 1.1294275300701085e-07, 'epoch': 0.059397884458909686, 'num_input_tokens_seen': 7170162688, 'completed': '92.73% (3_419 / 3_687)', 'remaining time': '2:22:46', 'throughput': '9013.97', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 00:45:56,730 >> {'loss': 0.4514, 'grad_norm': 17.19941520690918, 'learning_rate': 1.1284680686389163e-07, 'epoch': 0.059669107675617034, 'num_input_tokens_seen': 7172259840, 'completed': '92.76% (3_420 / 3_687)', 'remaining time': '2:22:13', 'throughput': '8349.66', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 00:46:26,053 >> {'loss': 0.3855, 'grad_norm': 14.423097610473633, 'learning_rate': 1.1275121252502738e-07, 'epoch': 0.05994033089232438, 'num_input_tokens_seen': 7174356992, 'completed': '92.79% (3_421 / 3_687)', 'remaining time': '2:21:38', 'throughput': '8939.77', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 00:47:00,381 >> {'loss': 0.3488, 'grad_norm': 13.27149772644043, 'learning_rate': 1.1265597006734872e-07, 'epoch': 0.060211554109031735, 'num_input_tokens_seen': 7176454144, 'completed': '92.81% (3_422 / 3_687)', 'remaining time': '2:21:09', 'throughput': '7636.57', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 00:47:28,882 >> {'loss': 0.4548, 'grad_norm': 17.755043029785156, 'learning_rate': 1.1256107956750319e-07, 'epoch': 0.06048277732573908, 'num_input_tokens_seen': 7178551296, 'completed': '92.84% (3_423 / 3_687)', 'remaining time': '2:20:33', 'throughput': '9197.63', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 00:47:57,825 >> {'loss': 0.186, 'grad_norm': 15.190084457397461, 'learning_rate': 1.1246654110185501e-07, 'epoch': 0.06075400054244643, 'num_input_tokens_seen': 7180648448, 'completed': '92.87% (3_424 / 3_687)', 'remaining time': '2:19:57', 'throughput': '9057.29', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 00:48:26,252 >> {'loss': 0.852, 'grad_norm': 23.48501968383789, 'learning_rate': 1.1237235474648516e-07, 'epoch': 0.061025223759153785, 'num_input_tokens_seen': 7182745600, 'completed': '92.89% (3_425 / 3_687)', 'remaining time': '2:19:21', 'throughput': '9221.51', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 00:48:58,597 >> {'loss': 0.5567, 'grad_norm': 18.804317474365234, 'learning_rate': 1.1227852057719125e-07, 'epoch': 0.06129644697586113, 'num_input_tokens_seen': 7184842752, 'completed': '92.92% (3_426 / 3_687)', 'remaining time': '2:18:50', 'throughput': '8104.73', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 00:49:29,201 >> {'loss': 0.4162, 'grad_norm': 18.396987915039062, 'learning_rate': 1.121850386694875e-07, 'epoch': 0.06156767019256849, 'num_input_tokens_seen': 7186939904, 'completed': '92.95% (3_427 / 3_687)', 'remaining time': '2:18:17', 'throughput': '8565.70', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 00:50:00,561 >> {'loss': 0.5276, 'grad_norm': 18.709394454956055, 'learning_rate': 1.1209190909860453e-07, 'epoch': 0.061838893409275834, 'num_input_tokens_seen': 7189037056, 'completed': '92.98% (3_428 / 3_687)', 'remaining time': '2:17:44', 'throughput': '8359.23', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 00:50:30,064 >> {'loss': 0.5129, 'grad_norm': 25.96376609802246, 'learning_rate': 1.119991319394894e-07, 'epoch': 0.06211011662598318, 'num_input_tokens_seen': 7191134208, 'completed': '93.00% (3_429 / 3_687)', 'remaining time': '2:17:09', 'throughput': '8885.18', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 00:50:59,554 >> {'loss': 0.6664, 'grad_norm': 24.267688751220703, 'learning_rate': 1.1190670726680579e-07, 'epoch': 0.062381339842690536, 'num_input_tokens_seen': 7193231360, 'completed': '93.03% (3_430 / 3_687)', 'remaining time': '2:16:35', 'throughput': '8889.27', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 00:51:31,994 >> {'loss': 0.2624, 'grad_norm': 12.63032054901123, 'learning_rate': 1.1181463515493336e-07, 'epoch': 0.06265256305939788, 'num_input_tokens_seen': 7195328512, 'completed': '93.06% (3_431 / 3_687)', 'remaining time': '2:16:04', 'throughput': '8081.04', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 00:52:00,887 >> {'loss': 0.3512, 'grad_norm': 18.082197189331055, 'learning_rate': 1.1172291567796846e-07, 'epoch': 0.06292378627610523, 'num_input_tokens_seen': 7197425664, 'completed': '93.08% (3_432 / 3_687)', 'remaining time': '2:15:28', 'throughput': '9072.89', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 00:52:31,061 >> {'loss': 0.3531, 'grad_norm': 13.270282745361328, 'learning_rate': 1.1163154890972333e-07, 'epoch': 0.06319500949281258, 'num_input_tokens_seen': 7199522816, 'completed': '93.11% (3_433 / 3_687)', 'remaining time': '2:14:55', 'throughput': '8687.63', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 00:53:01,901 >> {'loss': 0.2167, 'grad_norm': 11.08203411102295, 'learning_rate': 1.1154053492372654e-07, 'epoch': 0.06346623270951994, 'num_input_tokens_seen': 7201619968, 'completed': '93.14% (3_434 / 3_687)', 'remaining time': '2:14:22', 'throughput': '8500.12', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 00:53:31,000 >> {'loss': 0.5286, 'grad_norm': 18.60203742980957, 'learning_rate': 1.1144987379322254e-07, 'epoch': 0.06373745592622729, 'num_input_tokens_seen': 7203717120, 'completed': '93.17% (3_435 / 3_687)', 'remaining time': '2:13:47', 'throughput': '9008.77', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 00:53:58,432 >> {'loss': 0.7746, 'grad_norm': 22.605010986328125, 'learning_rate': 1.1135956559117207e-07, 'epoch': 0.06400867914293464, 'num_input_tokens_seen': 7205814272, 'completed': '93.19% (3_436 / 3_687)', 'remaining time': '2:13:10', 'throughput': '9556.19', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 00:54:29,459 >> {'loss': 0.1999, 'grad_norm': 10.375788688659668, 'learning_rate': 1.1126961039025168e-07, 'epoch': 0.06427990235964198, 'num_input_tokens_seen': 7207911424, 'completed': '93.22% (3_437 / 3_687)', 'remaining time': '2:12:38', 'throughput': '8448.71', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 00:54:58,658 >> {'loss': 0.3321, 'grad_norm': 16.99294662475586, 'learning_rate': 1.111800082628539e-07, 'epoch': 0.06455112557634933, 'num_input_tokens_seen': 7210008576, 'completed': '93.25% (3_438 / 3_687)', 'remaining time': '2:12:03', 'throughput': '8977.87', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 00:55:26,427 >> {'loss': 0.517, 'grad_norm': 18.679296493530273, 'learning_rate': 1.1109075928108715e-07, 'epoch': 0.06482234879305669, 'num_input_tokens_seen': 7212105728, 'completed': '93.27% (3_439 / 3_687)', 'remaining time': '2:11:27', 'throughput': '9440.38', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 00:55:55,179 >> {'loss': 0.8827, 'grad_norm': 25.95256233215332, 'learning_rate': 1.1100186351677567e-07, 'epoch': 0.06509357200976404, 'num_input_tokens_seen': 7214202880, 'completed': '93.30% (3_440 / 3_687)', 'remaining time': '2:10:52', 'throughput': '9117.18', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 00:56:25,579 >> {'loss': 0.412, 'grad_norm': 15.406830787658691, 'learning_rate': 1.1091332104145921e-07, 'epoch': 0.06536479522647139, 'num_input_tokens_seen': 7216300032, 'completed': '93.33% (3_441 / 3_687)', 'remaining time': '2:10:19', 'throughput': '8623.27', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 00:56:56,435 >> {'loss': 0.4441, 'grad_norm': 17.281946182250977, 'learning_rate': 1.1082513192639353e-07, 'epoch': 0.06563601844317873, 'num_input_tokens_seen': 7218397184, 'completed': '93.36% (3_442 / 3_687)', 'remaining time': '2:09:46', 'throughput': '8495.69', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 00:57:29,832 >> {'loss': 0.2542, 'grad_norm': 11.579797744750977, 'learning_rate': 1.1073729624254984e-07, 'epoch': 0.06590724165988608, 'num_input_tokens_seen': 7220494336, 'completed': '93.38% (3_443 / 3_687)', 'remaining time': '2:09:16', 'throughput': '7849.39', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 00:57:57,847 >> {'loss': 0.6309, 'grad_norm': 20.977197647094727, 'learning_rate': 1.1064981406061494e-07, 'epoch': 0.06617846487659344, 'num_input_tokens_seen': 7222591488, 'completed': '93.41% (3_444 / 3_687)', 'remaining time': '2:08:40', 'throughput': '9357.08', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 00:58:26,629 >> {'loss': 0.4147, 'grad_norm': 15.974722862243652, 'learning_rate': 1.1056268545099117e-07, 'epoch': 0.06644968809330079, 'num_input_tokens_seen': 7224688640, 'completed': '93.44% (3_445 / 3_687)', 'remaining time': '2:08:06', 'throughput': '9107.96', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 00:58:59,415 >> {'loss': 0.2354, 'grad_norm': 11.669343948364258, 'learning_rate': 1.1047591048379635e-07, 'epoch': 0.06672091131000814, 'num_input_tokens_seen': 7226785792, 'completed': '93.46% (3_446 / 3_687)', 'remaining time': '2:07:35', 'throughput': '7995.72', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 00:59:29,219 >> {'loss': 0.6374, 'grad_norm': 24.988540649414062, 'learning_rate': 1.1038948922886355e-07, 'epoch': 0.06699213452671549, 'num_input_tokens_seen': 7228882944, 'completed': '93.49% (3_447 / 3_687)', 'remaining time': '2:07:01', 'throughput': '8795.65', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 00:59:57,588 >> {'loss': 0.4621, 'grad_norm': 17.950305938720703, 'learning_rate': 1.1030342175574144e-07, 'epoch': 0.06726335774342283, 'num_input_tokens_seen': 7230980096, 'completed': '93.52% (3_448 / 3_687)', 'remaining time': '2:06:26', 'throughput': '9240.56', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 01:00:29,593 >> {'loss': 0.5804, 'grad_norm': 23.91762351989746, 'learning_rate': 1.1021770813369378e-07, 'epoch': 0.06753458096013018, 'num_input_tokens_seen': 7233077248, 'completed': '93.54% (3_449 / 3_687)', 'remaining time': '2:05:55', 'throughput': '8190.71', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 01:01:05,483 >> {'loss': 0.2842, 'grad_norm': 17.67766571044922, 'learning_rate': 1.1013234843169967e-07, 'epoch': 0.06780580417683754, 'num_input_tokens_seen': 7235174400, 'completed': '93.57% (3_450 / 3_687)', 'remaining time': '2:05:27', 'throughput': '7303.90', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 01:01:35,954 >> {'loss': 0.2676, 'grad_norm': 14.409975051879883, 'learning_rate': 1.100473427184534e-07, 'epoch': 0.06807702739354489, 'num_input_tokens_seen': 7237271552, 'completed': '93.60% (3_451 / 3_687)', 'remaining time': '2:04:54', 'throughput': '8603.32', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 01:02:07,171 >> {'loss': 0.553, 'grad_norm': 25.468711853027344, 'learning_rate': 1.0996269106236425e-07, 'epoch': 0.06834825061025224, 'num_input_tokens_seen': 7239368704, 'completed': '93.63% (3_452 / 3_687)', 'remaining time': '2:04:22', 'throughput': '8397.42', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 01:02:37,923 >> {'loss': 0.3635, 'grad_norm': 18.466487884521484, 'learning_rate': 1.0987839353155661e-07, 'epoch': 0.06861947382695958, 'num_input_tokens_seen': 7241465856, 'completed': '93.65% (3_453 / 3_687)', 'remaining time': '2:03:49', 'throughput': '8524.38', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 01:03:09,528 >> {'loss': 0.4023, 'grad_norm': 15.085163116455078, 'learning_rate': 1.0979445019387e-07, 'epoch': 0.06889069704366693, 'num_input_tokens_seen': 7243563008, 'completed': '93.68% (3_454 / 3_687)', 'remaining time': '2:03:17', 'throughput': '8294.36', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 01:03:42,269 >> {'loss': 0.3546, 'grad_norm': 15.840180397033691, 'learning_rate': 1.0971086111685883e-07, 'epoch': 0.0691619202603743, 'num_input_tokens_seen': 7245660160, 'completed': '93.71% (3_455 / 3_687)', 'remaining time': '2:02:46', 'throughput': '8006.79', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 01:04:13,413 >> {'loss': 0.5338, 'grad_norm': 20.410505294799805, 'learning_rate': 1.0962762636779235e-07, 'epoch': 0.06943314347708164, 'num_input_tokens_seen': 7247757312, 'completed': '93.73% (3_456 / 3_687)', 'remaining time': '2:02:14', 'throughput': '8416.94', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 01:04:44,839 >> {'loss': 0.6095, 'grad_norm': 23.19679832458496, 'learning_rate': 1.0954474601365482e-07, 'epoch': 0.06970436669378899, 'num_input_tokens_seen': 7249854464, 'completed': '93.76% (3_457 / 3_687)', 'remaining time': '2:01:42', 'throughput': '8341.81', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 01:05:15,354 >> {'loss': 0.4222, 'grad_norm': 14.401229858398438, 'learning_rate': 1.094622201211451e-07, 'epoch': 0.06997558991049634, 'num_input_tokens_seen': 7251951616, 'completed': '93.79% (3_458 / 3_687)', 'remaining time': '2:01:09', 'throughput': '8590.48', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 01:05:48,017 >> {'loss': 0.4721, 'grad_norm': 17.72669792175293, 'learning_rate': 1.0938004875667689e-07, 'epoch': 0.07024681312720368, 'num_input_tokens_seen': 7254048768, 'completed': '93.82% (3_459 / 3_687)', 'remaining time': '2:00:38', 'throughput': '8025.83', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 01:06:17,257 >> {'loss': 0.3393, 'grad_norm': 15.16402530670166, 'learning_rate': 1.0929823198637866e-07, 'epoch': 0.07051803634391104, 'num_input_tokens_seen': 7256145920, 'completed': '93.84% (3_460 / 3_687)', 'remaining time': '2:00:04', 'throughput': '8965.29', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 01:06:47,291 >> {'loss': 0.5465, 'grad_norm': 17.32206153869629, 'learning_rate': 1.0921676987609335e-07, 'epoch': 0.07078925956061839, 'num_input_tokens_seen': 7258243072, 'completed': '93.87% (3_461 / 3_687)', 'remaining time': '1:59:31', 'throughput': '8728.02', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 01:07:16,248 >> {'loss': 0.6731, 'grad_norm': 20.354280471801758, 'learning_rate': 1.0913566249137865e-07, 'epoch': 0.07106048277732574, 'num_input_tokens_seen': 7260340224, 'completed': '93.90% (3_462 / 3_687)', 'remaining time': '1:58:57', 'throughput': '9053.03', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 01:07:45,396 >> {'loss': 0.5397, 'grad_norm': 17.26376724243164, 'learning_rate': 1.0905490989750656e-07, 'epoch': 0.07133170599403309, 'num_input_tokens_seen': 7262437376, 'completed': '93.92% (3_463 / 3_687)', 'remaining time': '1:58:23', 'throughput': '8993.47', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 01:08:14,756 >> {'loss': 0.6668, 'grad_norm': 22.16339683532715, 'learning_rate': 1.0897451215946378e-07, 'epoch': 0.07160292921074043, 'num_input_tokens_seen': 7264534528, 'completed': '93.95% (3_464 / 3_687)', 'remaining time': '1:57:49', 'throughput': '8928.51', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 01:08:46,748 >> {'loss': 0.5578, 'grad_norm': 25.3452091217041, 'learning_rate': 1.0889446934195141e-07, 'epoch': 0.0718741524274478, 'num_input_tokens_seen': 7266631680, 'completed': '93.98% (3_465 / 3_687)', 'remaining time': '1:57:18', 'throughput': '8194.28', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 01:09:16,567 >> {'loss': 0.4272, 'grad_norm': 20.26123809814453, 'learning_rate': 1.0881478150938475e-07, 'epoch': 0.07214537564415514, 'num_input_tokens_seen': 7268728832, 'completed': '94.01% (3_466 / 3_687)', 'remaining time': '1:56:45', 'throughput': '8791.19', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 01:09:47,748 >> {'loss': 0.3703, 'grad_norm': 15.355372428894043, 'learning_rate': 1.0873544872589361e-07, 'epoch': 0.07241659886086249, 'num_input_tokens_seen': 7270825984, 'completed': '94.03% (3_467 / 3_687)', 'remaining time': '1:56:12', 'throughput': '8406.90', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 01:10:21,328 >> {'loss': 0.4287, 'grad_norm': 19.550086975097656, 'learning_rate': 1.08656471055322e-07, 'epoch': 0.07268782207756984, 'num_input_tokens_seen': 7272923136, 'completed': '94.06% (3_468 / 3_687)', 'remaining time': '1:55:42', 'throughput': '7806.75', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 01:10:47,794 >> {'loss': 0.8543, 'grad_norm': 23.889299392700195, 'learning_rate': 1.0857784856122812e-07, 'epoch': 0.07295904529427719, 'num_input_tokens_seen': 7275020288, 'completed': '94.09% (3_469 / 3_687)', 'remaining time': '1:55:06', 'throughput': '9904.90', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 01:11:18,362 >> {'loss': 0.3986, 'grad_norm': 17.23611831665039, 'learning_rate': 1.084995813068843e-07, 'epoch': 0.07323026851098453, 'num_input_tokens_seen': 7277117440, 'completed': '94.11% (3_470 / 3_687)', 'remaining time': '1:54:34', 'throughput': '8575.68', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 01:11:47,843 >> {'loss': 0.5305, 'grad_norm': 20.09700584411621, 'learning_rate': 1.0842166935527716e-07, 'epoch': 0.0735014917276919, 'num_input_tokens_seen': 7279214592, 'completed': '94.14% (3_471 / 3_687)', 'remaining time': '1:54:00', 'throughput': '8891.88', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 01:12:18,086 >> {'loss': 0.3567, 'grad_norm': 17.18355941772461, 'learning_rate': 1.0834411276910715e-07, 'epoch': 0.07377271494439924, 'num_input_tokens_seen': 7281311744, 'completed': '94.17% (3_472 / 3_687)', 'remaining time': '1:53:27', 'throughput': '8668.04', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 01:12:51,765 >> {'loss': 0.5759, 'grad_norm': 20.5414981842041, 'learning_rate': 1.0826691161078895e-07, 'epoch': 0.07404393816110659, 'num_input_tokens_seen': 7283408896, 'completed': '94.20% (3_473 / 3_687)', 'remaining time': '1:52:57', 'throughput': '7783.48', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 01:13:22,557 >> {'loss': 0.3052, 'grad_norm': 14.221673965454102, 'learning_rate': 1.0819006594245114e-07, 'epoch': 0.07431516137781394, 'num_input_tokens_seen': 7285506048, 'completed': '94.22% (3_474 / 3_687)', 'remaining time': '1:52:25', 'throughput': '8513.50', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 01:13:53,467 >> {'loss': 0.669, 'grad_norm': 21.631595611572266, 'learning_rate': 1.0811357582593613e-07, 'epoch': 0.07458638459452128, 'num_input_tokens_seen': 7287603200, 'completed': '94.25% (3_475 / 3_687)', 'remaining time': '1:51:53', 'throughput': '8480.87', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 01:14:22,952 >> {'loss': 0.8402, 'grad_norm': 23.90924644470215, 'learning_rate': 1.0803744132280025e-07, 'epoch': 0.07485760781122865, 'num_input_tokens_seen': 7289700352, 'completed': '94.28% (3_476 / 3_687)', 'remaining time': '1:51:19', 'throughput': '8890.87', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 01:14:54,838 >> {'loss': 0.5147, 'grad_norm': 19.435346603393555, 'learning_rate': 1.0796166249431371e-07, 'epoch': 0.075128831027936, 'num_input_tokens_seen': 7291797504, 'completed': '94.30% (3_477 / 3_687)', 'remaining time': '1:50:48', 'throughput': '8221.27', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 01:15:27,446 >> {'loss': 0.5814, 'grad_norm': 20.027788162231445, 'learning_rate': 1.0788623940146032e-07, 'epoch': 0.07540005424464334, 'num_input_tokens_seen': 7293894656, 'completed': '94.33% (3_478 / 3_687)', 'remaining time': '1:50:17', 'throughput': '8039.15', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 01:15:58,066 >> {'loss': 0.3769, 'grad_norm': 15.927900314331055, 'learning_rate': 1.0781117210493781e-07, 'epoch': 0.07567127746135069, 'num_input_tokens_seen': 7295991808, 'completed': '94.36% (3_479 / 3_687)', 'remaining time': '1:49:45', 'throughput': '8561.27', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 01:16:29,607 >> {'loss': 0.6378, 'grad_norm': 29.270769119262695, 'learning_rate': 1.0773646066515748e-07, 'epoch': 0.07594250067805804, 'num_input_tokens_seen': 7298088960, 'completed': '94.39% (3_480 / 3_687)', 'remaining time': '1:49:13', 'throughput': '8311.15', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 01:17:01,099 >> {'loss': 0.2933, 'grad_norm': 15.728423118591309, 'learning_rate': 1.0766210514224419e-07, 'epoch': 0.0762137238947654, 'num_input_tokens_seen': 7300186112, 'completed': '94.41% (3_481 / 3_687)', 'remaining time': '1:48:41', 'throughput': '8324.17', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 01:17:30,277 >> {'loss': 0.4516, 'grad_norm': 19.27528953552246, 'learning_rate': 1.0758810559603651e-07, 'epoch': 0.07648494711147275, 'num_input_tokens_seen': 7302283264, 'completed': '94.44% (3_482 / 3_687)', 'remaining time': '1:48:08', 'throughput': '8984.39', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 01:18:00,742 >> {'loss': 0.3451, 'grad_norm': 16.80342674255371, 'learning_rate': 1.0751446208608642e-07, 'epoch': 0.07675617032818009, 'num_input_tokens_seen': 7304380416, 'completed': '94.47% (3_483 / 3_687)', 'remaining time': '1:47:35', 'throughput': '8604.69', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 01:18:29,217 >> {'loss': 0.4333, 'grad_norm': 14.538717269897461, 'learning_rate': 1.0744117467165938e-07, 'epoch': 0.07702739354488744, 'num_input_tokens_seen': 7306477568, 'completed': '94.49% (3_484 / 3_687)', 'remaining time': '1:47:01', 'throughput': '9206.17', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 01:18:59,805 >> {'loss': 0.3792, 'grad_norm': 15.928664207458496, 'learning_rate': 1.0736824341173442e-07, 'epoch': 0.07729861676159479, 'num_input_tokens_seen': 7308574720, 'completed': '94.52% (3_485 / 3_687)', 'remaining time': '1:46:29', 'throughput': '8570.07', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 01:19:31,010 >> {'loss': 0.4084, 'grad_norm': 17.85283088684082, 'learning_rate': 1.0729566836500373e-07, 'epoch': 0.07756983997830215, 'num_input_tokens_seen': 7310671872, 'completed': '94.55% (3_486 / 3_687)', 'remaining time': '1:45:57', 'throughput': '8400.80', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 01:20:00,542 >> {'loss': 0.4754, 'grad_norm': 25.531211853027344, 'learning_rate': 1.07223449589873e-07, 'epoch': 0.0778410631950095, 'num_input_tokens_seen': 7312769024, 'completed': '94.58% (3_487 / 3_687)', 'remaining time': '1:45:24', 'throughput': '8876.49', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 01:20:30,239 >> {'loss': 0.3349, 'grad_norm': 12.72594165802002, 'learning_rate': 1.0715158714446109e-07, 'epoch': 0.07811228641171684, 'num_input_tokens_seen': 7314866176, 'completed': '94.60% (3_488 / 3_687)', 'remaining time': '1:44:51', 'throughput': '8827.27', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 01:20:58,112 >> {'loss': 0.3707, 'grad_norm': 17.275711059570312, 'learning_rate': 1.0708008108660026e-07, 'epoch': 0.07838350962842419, 'num_input_tokens_seen': 7316963328, 'completed': '94.63% (3_489 / 3_687)', 'remaining time': '1:44:17', 'throughput': '9404.95', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 01:21:28,194 >> {'loss': 0.2781, 'grad_norm': 13.209538459777832, 'learning_rate': 1.0700893147383582e-07, 'epoch': 0.07865473284513154, 'num_input_tokens_seen': 7319060480, 'completed': '94.66% (3_490 / 3_687)', 'remaining time': '1:43:44', 'throughput': '8714.47', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 01:22:00,613 >> {'loss': 0.5252, 'grad_norm': 17.69983673095703, 'learning_rate': 1.069381383634263e-07, 'epoch': 0.07892595606183889, 'num_input_tokens_seen': 7321157632, 'completed': '94.68% (3_491 / 3_687)', 'remaining time': '1:43:13', 'throughput': '8085.98', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 01:22:32,099 >> {'loss': 0.4176, 'grad_norm': 16.224292755126953, 'learning_rate': 1.0686770181234322e-07, 'epoch':
0.07919717927854625, 'num_input_tokens_seen': 7323254784, 'completed': '94.71% (3_492 / 3_687)', 'remaining time': '1:42:41', 'throughput': '8325.79', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:23:01,919 >> {'loss': 0.3494, 'grad_norm': 15.399229049682617, 'learning_rate': 1.0679762187727129e-07, 'epoch': 0.0794684024952536, 'num_input_tokens_seen': 7325351936, 'completed': '94.74% (3_493 / 3_687)', 'remaining time': '1:42:09', 'throughput': '8790.78', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:23:35,335 >> {'loss': 0.2833, 'grad_norm': 14.10446548461914, 'learning_rate': 1.0672789861460818e-07, 'epoch': 0.07973962571196094, 'num_input_tokens_seen': 7327449088, 'completed': '94.77% (3_494 / 3_687)', 'remaining time': '1:41:38', 'throughput': '7845.03', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:24:04,508 >> {'loss': 0.57, 'grad_norm': 20.31482696533203, 'learning_rate': 1.0665853208046449e-07, 'epoch': 0.08001084892866829, 'num_input_tokens_seen': 7329546240, 'completed': '94.79% (3_495 / 3_687)', 'remaining time': '1:41:05', 'throughput': '8985.86', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:24:35,533 >> {'loss': 0.3401, 'grad_norm': 18.91710090637207, 'learning_rate': 1.0658952233066381e-07, 'epoch': 0.08028207214537564, 'num_input_tokens_seen': 7331643392, 'completed': '94.82% (3_496 / 3_687)', 'remaining time': '1:40:33', 'throughput': '8449.39', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:25:05,667 >> {'loss': 0.5459, 'grad_norm': 18.957792282104492, 'learning_rate': 1.0652086942074255e-07, 'epoch': 0.080553295362083, 'num_input_tokens_seen': 7333740544, 'completed': '94.85% (3_497 / 3_687)', 'remaining time': '1:40:00', 'throughput': '8699.35', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:25:37,265 >> {'loss': 0.2876, 'grad_norm': 16.166566848754883, 'learning_rate': 1.0645257340594988e-07, 'epoch': 0.08082451857879035, 'num_input_tokens_seen': 
7335837696, 'completed': '94.87% (3_498 / 3_687)', 'remaining time': '1:39:29', 'throughput': '8296.11', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:26:10,330 >> {'loss': 0.3123, 'grad_norm': 14.657668113708496, 'learning_rate': 1.06384634341248e-07, 'epoch': 0.0810957417954977, 'num_input_tokens_seen': 7337934848, 'completed': '94.90% (3_499 / 3_687)', 'remaining time': '1:38:58', 'throughput': '7928.17', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:26:39,653 >> {'loss': 0.3603, 'grad_norm': 16.774099349975586, 'learning_rate': 1.0631705228131149e-07, 'epoch': 0.08136696501220504, 'num_input_tokens_seen': 7340032000, 'completed': '94.93% (3_500 / 3_687)', 'remaining time': '1:38:25', 'throughput': '8939.90', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:27:10,835 >> {'loss': 0.261, 'grad_norm': 18.32398223876953, 'learning_rate': 1.0624982728052795e-07, 'epoch': 0.08163818822891239, 'num_input_tokens_seen': 7342129152, 'completed': '94.96% (3_501 / 3_687)', 'remaining time': '1:37:53', 'throughput': '8406.86', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:27:42,525 >> {'loss': 0.3176, 'grad_norm': 13.998818397521973, 'learning_rate': 1.0618295939299752e-07, 'epoch': 0.08190941144561975, 'num_input_tokens_seen': 7344226304, 'completed': '94.98% (3_502 / 3_687)', 'remaining time': '1:37:22', 'throughput': '8272.22', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:28:12,430 >> {'loss': 0.6581, 'grad_norm': 20.430179595947266, 'learning_rate': 1.0611644867253284e-07, 'epoch': 0.0821806346623271, 'num_input_tokens_seen': 7346323456, 'completed': '95.01% (3_503 / 3_687)', 'remaining time': '1:36:49', 'throughput': '8765.98', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:28:46,128 >> {'loss': 0.637, 'grad_norm': 22.212148666381836, 'learning_rate': 1.0605029517265918e-07, 'epoch': 0.08245185787903445, 'num_input_tokens_seen': 7348420608, 'completed': '95.04% (3_504 / 3_687)', 
'remaining time': '1:36:19', 'throughput': '7779.06', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:29:19,065 >> {'loss': 0.3632, 'grad_norm': 18.4092960357666, 'learning_rate': 1.0598449894661445e-07, 'epoch': 0.0827230810957418, 'num_input_tokens_seen': 7350517760, 'completed': '95.06% (3_505 / 3_687)', 'remaining time': '1:35:48', 'throughput': '7958.99', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:29:48,761 >> {'loss': 0.6235, 'grad_norm': 19.119739532470703, 'learning_rate': 1.0591906004734895e-07, 'epoch': 0.08299430431244914, 'num_input_tokens_seen': 7352614912, 'completed': '95.09% (3_506 / 3_687)', 'remaining time': '1:35:16', 'throughput': '8827.48', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:30:18,683 >> {'loss': 0.8699, 'grad_norm': 22.81473159790039, 'learning_rate': 1.0585397852752544e-07, 'epoch': 0.0832655275291565, 'num_input_tokens_seen': 7354712064, 'completed': '95.12% (3_507 / 3_687)', 'remaining time': '1:34:43', 'throughput': '8761.04', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:30:49,338 >> {'loss': 0.568, 'grad_norm': 17.07512855529785, 'learning_rate': 1.0578925443951895e-07, 'epoch': 0.08353675074586385, 'num_input_tokens_seen': 7356809216, 'completed': '95.15% (3_508 / 3_687)', 'remaining time': '1:34:11', 'throughput': '8551.34', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:31:21,236 >> {'loss': 0.2377, 'grad_norm': 10.896271705627441, 'learning_rate': 1.0572488783541702e-07, 'epoch': 0.0838079739625712, 'num_input_tokens_seen': 7358906368, 'completed': '95.17% (3_509 / 3_687)', 'remaining time': '1:33:39', 'throughput': '8218.28', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:31:50,724 >> {'loss': 0.3153, 'grad_norm': 15.978910446166992, 'learning_rate': 1.0566087876701941e-07, 'epoch': 0.08407919717927854, 'num_input_tokens_seen': 7361003520, 'completed': '95.20% (3_510 / 3_687)', 'remaining time': '1:33:07', 'throughput': 
'8889.69', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:32:18,333 >> {'loss': 0.5863, 'grad_norm': 21.776466369628906, 'learning_rate': 1.0559722728583825e-07, 'epoch': 0.08435042039598589, 'num_input_tokens_seen': 7363100672, 'completed': '95.23% (3_511 / 3_687)', 'remaining time': '1:32:33', 'throughput': '9495.04', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:32:49,973 >> {'loss': 0.3144, 'grad_norm': 11.799039840698242, 'learning_rate': 1.0553393344309775e-07, 'epoch': 0.08462164361269324, 'num_input_tokens_seen': 7365197824, 'completed': '95.25% (3_512 / 3_687)', 'remaining time': '1:32:01', 'throughput': '8285.08', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:33:21,362 >> {'loss': 0.3156, 'grad_norm': 14.083271980285645, 'learning_rate': 1.054709972897344e-07, 'epoch': 0.0848928668294006, 'num_input_tokens_seen': 7367294976, 'completed': '95.28% (3_513 / 3_687)', 'remaining time': '1:31:30', 'throughput': '8351.46', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:33:48,461 >> {'loss': 0.7161, 'grad_norm': 20.98029899597168, 'learning_rate': 1.0540841887639698e-07, 'epoch': 0.08516409004610795, 'num_input_tokens_seen': 7369392128, 'completed': '95.31% (3_514 / 3_687)', 'remaining time': '1:30:56', 'throughput': '9673.88', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:34:20,847 >> {'loss': 0.3341, 'grad_norm': 16.61429786682129, 'learning_rate': 1.0534619825344596e-07, 'epoch': 0.0854353132628153, 'num_input_tokens_seen': 7371489280, 'completed': '95.33% (3_515 / 3_687)', 'remaining time': '1:30:25', 'throughput': '8094.29', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:34:52,847 >> {'loss': 0.3035, 'grad_norm': 13.247276306152344, 'learning_rate': 1.052843354709543e-07, 'epoch': 0.08570653647952264, 'num_input_tokens_seen': 7373586432, 'completed': '95.36% (3_516 / 3_687)', 'remaining time': '1:29:53', 'throughput': '8191.83', 'gpu_mem_free': '5581MB'} 
[INFO|trainer.py:175] 2024-12-27 01:35:25,374 >> {'loss': 0.3785, 'grad_norm': 17.826738357543945, 'learning_rate': 1.0522283057870675e-07, 'epoch': 0.08597775969622999, 'num_input_tokens_seen': 7375683584, 'completed': '95.39% (3_517 / 3_687)', 'remaining time': '1:29:22', 'throughput': '8059.46', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:35:59,898 >> {'loss': 0.4728, 'grad_norm': 21.997249603271484, 'learning_rate': 1.0516168362620013e-07, 'epoch': 0.08624898291293735, 'num_input_tokens_seen': 7377780736, 'completed': '95.42% (3_518 / 3_687)', 'remaining time': '1:28:52', 'throughput': '7592.94', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:36:31,380 >> {'loss': 0.4232, 'grad_norm': 18.08944320678711, 'learning_rate': 1.0510089466264321e-07, 'epoch': 0.0865202061296447, 'num_input_tokens_seen': 7379877888, 'completed': '95.44% (3_519 / 3_687)', 'remaining time': '1:28:21', 'throughput': '8326.93', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:37:03,713 >> {'loss': 0.2725, 'grad_norm': 12.958785057067871, 'learning_rate': 1.0504046373695648e-07, 'epoch': 0.08679142934635205, 'num_input_tokens_seen': 7381975040, 'completed': '95.47% (3_520 / 3_687)', 'remaining time': '1:27:50', 'throughput': '8107.54', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:37:32,534 >> {'loss': 0.5077, 'grad_norm': 18.204368591308594, 'learning_rate': 1.0498039089777265e-07, 'epoch': 0.0870626525630594, 'num_input_tokens_seen': 7384072192, 'completed': '95.50% (3_521 / 3_687)', 'remaining time': '1:27:17', 'throughput': '9095.51', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:38:05,347 >> {'loss': 0.4001, 'grad_norm': 15.504716873168945, 'learning_rate': 1.0492067619343594e-07, 'epoch': 0.08733387577976674, 'num_input_tokens_seen': 7386169344, 'completed': '95.52% (3_522 / 3_687)', 'remaining time': '1:26:46', 'throughput': '7989.19', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:38:36,212 
>> {'loss': 0.196, 'grad_norm': 12.572887420654297, 'learning_rate': 1.0486131967200254e-07, 'epoch': 0.0876050989964741, 'num_input_tokens_seen': 7388266496, 'completed': '95.55% (3_523 / 3_687)', 'remaining time': '1:26:14', 'throughput': '8493.26', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:39:06,299 >> {'loss': 0.2463, 'grad_norm': 12.989893913269043, 'learning_rate': 1.048023213812403e-07, 'epoch': 0.08787632221318145, 'num_input_tokens_seen': 7390363648, 'completed': '95.58% (3_524 / 3_687)', 'remaining time': '1:25:42', 'throughput': '8712.73', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:39:37,569 >> {'loss': 0.4123, 'grad_norm': 18.88071632385254, 'learning_rate': 1.0474368136862876e-07, 'epoch': 0.0881475454298888, 'num_input_tokens_seen': 7392460800, 'completed': '95.61% (3_525 / 3_687)', 'remaining time': '1:25:10', 'throughput': '8383.23', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:40:06,314 >> {'loss': 0.6547, 'grad_norm': 24.264707565307617, 'learning_rate': 1.0468539968135922e-07, 'epoch': 0.08841876864659615, 'num_input_tokens_seen': 7394557952, 'completed': '95.63% (3_526 / 3_687)', 'remaining time': '1:24:37', 'throughput': '9119.80', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:40:37,022 >> {'loss': 0.4053, 'grad_norm': 17.752891540527344, 'learning_rate': 1.046274763663345e-07, 'epoch': 0.0886899918633035, 'num_input_tokens_seen': 7396655104, 'completed': '95.66% (3_527 / 3_687)', 'remaining time': '1:24:05', 'throughput': '8536.48', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:41:07,756 >> {'loss': 0.5896, 'grad_norm': 21.44291114807129, 'learning_rate': 1.045699114701691e-07, 'epoch': 0.08896121508001086, 'num_input_tokens_seen': 7398752256, 'completed': '95.69% (3_528 / 3_687)', 'remaining time': '1:23:33', 'throughput': '8529.49', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:41:37,878 >> {'loss': 0.6174, 'grad_norm': 27.565635681152344, 
'learning_rate': 1.0451270503918906e-07, 'epoch': 0.0892324382967182, 'num_input_tokens_seen': 7400849408, 'completed': '95.71% (3_529 / 3_687)', 'remaining time': '1:23:01', 'throughput': '8702.86', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:42:07,157 >> {'loss': 0.6561, 'grad_norm': 22.987125396728516, 'learning_rate': 1.0445585711943205e-07, 'epoch': 0.08950366151342555, 'num_input_tokens_seen': 7402946560, 'completed': '95.74% (3_530 / 3_687)', 'remaining time': '1:22:28', 'throughput': '8953.24', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:42:36,518 >> {'loss': 0.5277, 'grad_norm': 17.22407341003418, 'learning_rate': 1.0439936775664699e-07, 'epoch': 0.0897748847301329, 'num_input_tokens_seen': 7405043712, 'completed': '95.77% (3_531 / 3_687)', 'remaining time': '1:21:56', 'throughput': '8928.35', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:43:08,551 >> {'loss': 0.3715, 'grad_norm': 17.74474334716797, 'learning_rate': 1.043432369962943e-07, 'epoch': 0.09004610794684025, 'num_input_tokens_seen': 7407140864, 'completed': '95.80% (3_532 / 3_687)', 'remaining time': '1:21:24', 'throughput': '8183.52', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:43:40,438 >> {'loss': 0.4398, 'grad_norm': 14.838858604431152, 'learning_rate': 1.0428746488354606e-07, 'epoch': 0.09031733116354759, 'num_input_tokens_seen': 7409238016, 'completed': '95.82% (3_533 / 3_687)', 'remaining time': '1:20:53', 'throughput': '8221.07', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:44:11,061 >> {'loss': 0.3574, 'grad_norm': 15.398836135864258, 'learning_rate': 1.0423205146328548e-07, 'epoch': 0.09058855438025495, 'num_input_tokens_seen': 7411335168, 'completed': '95.85% (3_534 / 3_687)', 'remaining time': '1:20:21', 'throughput': '8560.44', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:44:39,619 >> {'loss': 0.539, 'grad_norm': 20.782089233398438, 'learning_rate': 1.0417699678010708e-07, 'epoch': 
0.0908597775969623, 'num_input_tokens_seen': 7413432320, 'completed': '95.88% (3_535 / 3_687)', 'remaining time': '1:19:48', 'throughput': '9179.11', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:45:06,201 >> {'loss': 0.595, 'grad_norm': 23.51567840576172, 'learning_rate': 1.0412230087831689e-07, 'epoch': 0.09113100081366965, 'num_input_tokens_seen': 7415529472, 'completed': '95.90% (3_536 / 3_687)', 'remaining time': '1:19:15', 'throughput': '9861.77', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:45:36,343 >> {'loss': 0.4471, 'grad_norm': 20.366703033447266, 'learning_rate': 1.0406796380193203e-07, 'epoch': 0.091402224030377, 'num_input_tokens_seen': 7417626624, 'completed': '95.93% (3_537 / 3_687)', 'remaining time': '1:18:43', 'throughput': '8697.16', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:46:04,836 >> {'loss': 0.5784, 'grad_norm': 18.491153717041016, 'learning_rate': 1.0401398559468098e-07, 'epoch': 0.09167344724708434, 'num_input_tokens_seen': 7419723776, 'completed': '95.96% (3_538 / 3_687)', 'remaining time': '1:18:10', 'throughput': '9200.03', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:46:33,784 >> {'loss': 0.4378, 'grad_norm': 17.398160934448242, 'learning_rate': 1.0396036630000324e-07, 'epoch': 0.0919446704637917, 'num_input_tokens_seen': 7421820928, 'completed': '95.99% (3_539 / 3_687)', 'remaining time': '1:17:37', 'throughput': '9055.68', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:47:02,858 >> {'loss': 0.3753, 'grad_norm': 17.634761810302734, 'learning_rate': 1.0390710596104965e-07, 'epoch': 0.09221589368049905, 'num_input_tokens_seen': 7423918080, 'completed': '96.01% (3_540 / 3_687)', 'remaining time': '1:17:05', 'throughput': '9016.74', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:47:34,612 >> {'loss': 0.5528, 'grad_norm': 19.80194664001465, 'learning_rate': 1.0385420462068206e-07, 'epoch': 0.0924871168972064, 'num_input_tokens_seen': 
7426015232, 'completed': '96.04% (3_541 / 3_687)', 'remaining time': '1:16:33', 'throughput': '8255.27', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:48:05,965 >> {'loss': 0.2905, 'grad_norm': 13.204157829284668, 'learning_rate': 1.0380166232147354e-07, 'epoch': 0.09275834011391375, 'num_input_tokens_seen': 7428112384, 'completed': '96.07% (3_542 / 3_687)', 'remaining time': '1:16:02', 'throughput': '8361.13', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:48:37,201 >> {'loss': 0.3773, 'grad_norm': 13.604270935058594, 'learning_rate': 1.0374947910570805e-07, 'epoch': 0.0930295633306211, 'num_input_tokens_seen': 7430209536, 'completed': '96.09% (3_543 / 3_687)', 'remaining time': '1:15:30', 'throughput': '8392.27', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:49:09,647 >> {'loss': 0.3638, 'grad_norm': 15.047932624816895, 'learning_rate': 1.0369765501538067e-07, 'epoch': 0.09330078654732846, 'num_input_tokens_seen': 7432306688, 'completed': '96.12% (3_544 / 3_687)', 'remaining time': '1:14:59', 'throughput': '8079.45', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:49:40,287 >> {'loss': 0.3592, 'grad_norm': 16.72162628173828, 'learning_rate': 1.0364619009219743e-07, 'epoch': 0.0935720097640358, 'num_input_tokens_seen': 7434403840, 'completed': '96.15% (3_545 / 3_687)', 'remaining time': '1:14:27', 'throughput': '8555.51', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:50:10,435 >> {'loss': 0.3736, 'grad_norm': 21.374954223632812, 'learning_rate': 1.0359508437757544e-07, 'epoch': 0.09384323298074315, 'num_input_tokens_seen': 7436500992, 'completed': '96.18% (3_546 / 3_687)', 'remaining time': '1:13:55', 'throughput': '8695.26', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:50:38,520 >> {'loss': 0.4029, 'grad_norm': 16.41029930114746, 'learning_rate': 1.0354433791264255e-07, 'epoch': 0.0941144561974505, 'num_input_tokens_seen': 7438598144, 'completed': '96.20% (3_547 / 
3_687)', 'remaining time': '1:13:23', 'throughput': '9334.11', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:51:10,631 >> {'loss': 0.6386, 'grad_norm': 30.30008888244629, 'learning_rate': 1.0349395073823768e-07, 'epoch': 0.09438567941415785, 'num_input_tokens_seen': 7440695296, 'completed': '96.23% (3_548 / 3_687)', 'remaining time': '1:12:51', 'throughput': '8163.61', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:51:39,787 >> {'loss': 0.3967, 'grad_norm': 15.258624076843262, 'learning_rate': 1.0344392289491038e-07, 'epoch': 0.09465690263086521, 'num_input_tokens_seen': 7442792448, 'completed': '96.26% (3_549 / 3_687)', 'remaining time': '1:12:19', 'throughput': '8991.17', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:52:07,172 >> {'loss': 0.7916, 'grad_norm': 21.32552719116211, 'learning_rate': 1.0339425442292118e-07, 'epoch': 0.09492812584757256, 'num_input_tokens_seen': 7444889600, 'completed': '96.28% (3_550 / 3_687)', 'remaining time': '1:11:46', 'throughput': '9572.38', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:52:38,062 >> {'loss': 0.5297, 'grad_norm': 17.294597625732422, 'learning_rate': 1.0334494536224146e-07, 'epoch': 0.0951993490642799, 'num_input_tokens_seen': 7446986752, 'completed': '96.31% (3_551 / 3_687)', 'remaining time': '1:11:14', 'throughput': '8486.35', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:53:07,966 >> {'loss': 0.509, 'grad_norm': 17.55919647216797, 'learning_rate': 1.0329599575255321e-07, 'epoch': 0.09547057228098725, 'num_input_tokens_seen': 7449083904, 'completed': '96.34% (3_552 / 3_687)', 'remaining time': '1:10:42', 'throughput': '8766.19', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:53:37,838 >> {'loss': 0.3171, 'grad_norm': 15.990652084350586, 'learning_rate': 1.0324740563324923e-07, 'epoch': 0.0957417954976946, 'num_input_tokens_seen': 7451181056, 'completed': '96.37% (3_553 / 3_687)', 'remaining time': '1:10:10', 
'throughput': '8775.72', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:54:03,456 >> {'loss': 0.7525, 'grad_norm': 24.02924156188965, 'learning_rate': 1.0319917504343297e-07, 'epoch': 0.09601301871440195, 'num_input_tokens_seen': 7453278208, 'completed': '96.39% (3_554 / 3_687)', 'remaining time': '1:09:37', 'throughput': '10232.71', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:54:36,451 >> {'loss': 0.3421, 'grad_norm': 15.403027534484863, 'learning_rate': 1.0315130402191866e-07, 'epoch': 0.09628424193110931, 'num_input_tokens_seen': 7455375360, 'completed': '96.42% (3_555 / 3_687)', 'remaining time': '1:09:06', 'throughput': '7944.92', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:55:06,177 >> {'loss': 0.4427, 'grad_norm': 16.647611618041992, 'learning_rate': 1.0310379260723094e-07, 'epoch': 0.09655546514781665, 'num_input_tokens_seen': 7457472512, 'completed': '96.45% (3_556 / 3_687)', 'remaining time': '1:08:34', 'throughput': '8818.85', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:55:33,856 >> {'loss': 0.6526, 'grad_norm': 20.491514205932617, 'learning_rate': 1.0305664083760532e-07, 'epoch': 0.096826688364524, 'num_input_tokens_seen': 7459569664, 'completed': '96.47% (3_557 / 3_687)', 'remaining time': '1:08:01', 'throughput': '9470.65', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:56:04,087 >> {'loss': 0.5078, 'grad_norm': 15.219748497009277, 'learning_rate': 1.0300984875098772e-07, 'epoch': 0.09709791158123135, 'num_input_tokens_seen': 7461666816, 'completed': '96.50% (3_558 / 3_687)', 'remaining time': '1:07:29', 'throughput': '8671.36', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:56:35,199 >> {'loss': 0.4535, 'grad_norm': 18.49822235107422, 'learning_rate': 1.0296341638503458e-07, 'epoch': 0.0973691347979387, 'num_input_tokens_seen': 7463763968, 'completed': '96.53% (3_559 / 3_687)', 'remaining time': '1:06:58', 'throughput': '8426.05', 'gpu_mem_free': 
'5581MB'} [INFO|trainer.py:175] 2024-12-27 01:57:04,881 >> {'loss': 0.3632, 'grad_norm': 17.297460556030273, 'learning_rate': 1.029173437771129e-07, 'epoch': 0.09764035801464606, 'num_input_tokens_seen': 7465861120, 'completed': '96.56% (3_560 / 3_687)', 'remaining time': '1:06:26', 'throughput': '8831.75', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:57:35,678 >> {'loss': 0.7611, 'grad_norm': 22.80646514892578, 'learning_rate': 1.0287163096430024e-07, 'epoch': 0.0979115812313534, 'num_input_tokens_seen': 7467958272, 'completed': '96.58% (3_561 / 3_687)', 'remaining time': '1:05:54', 'throughput': '8511.84', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:58:05,996 >> {'loss': 0.338, 'grad_norm': 15.242216110229492, 'learning_rate': 1.0282627798338444e-07, 'epoch': 0.09818280444806075, 'num_input_tokens_seen': 7470055424, 'completed': '96.61% (3_562 / 3_687)', 'remaining time': '1:05:22', 'throughput': '8646.59', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:58:36,342 >> {'loss': 0.5657, 'grad_norm': 17.453556060791016, 'learning_rate': 1.0278128487086387e-07, 'epoch': 0.0984540276647681, 'num_input_tokens_seen': 7472152576, 'completed': '96.64% (3_563 / 3_687)', 'remaining time': '1:04:51', 'throughput': '8638.58', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:59:08,978 >> {'loss': 0.3696, 'grad_norm': 16.23352813720703, 'learning_rate': 1.0273665166294735e-07, 'epoch': 0.09872525088147545, 'num_input_tokens_seen': 7474249728, 'completed': '96.66% (3_564 / 3_687)', 'remaining time': '1:04:20', 'throughput': '8032.21', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 01:59:39,229 >> {'loss': 0.3618, 'grad_norm': 14.19936466217041, 'learning_rate': 1.0269237839555398e-07, 'epoch': 0.09899647409818281, 'num_input_tokens_seen': 7476346880, 'completed': '96.69% (3_565 / 3_687)', 'remaining time': '1:03:48', 'throughput': '8665.70', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 
02:00:10,370 >> {'loss': 0.2983, 'grad_norm': 11.769580841064453, 'learning_rate': 1.0264846510431307e-07, 'epoch': 0.09926769731489016, 'num_input_tokens_seen': 7478444032, 'completed': '96.72% (3_566 / 3_687)', 'remaining time': '1:03:17', 'throughput': '8417.84', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:00:40,958 >> {'loss': 0.8378, 'grad_norm': 22.436574935913086, 'learning_rate': 1.0260491182456453e-07, 'epoch': 0.0995389205315975, 'num_input_tokens_seen': 7480541184, 'completed': '96.75% (3_567 / 3_687)', 'remaining time': '1:02:45', 'throughput': '8570.24', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:01:11,959 >> {'loss': 0.3932, 'grad_norm': 21.576202392578125, 'learning_rate': 1.0256171859135826e-07, 'epoch': 0.09981014374830485, 'num_input_tokens_seen': 7482638336, 'completed': '96.77% (3_568 / 3_687)', 'remaining time': '1:02:13', 'throughput': '8456.18', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:01:42,427 >> {'loss': 0.413, 'grad_norm': 15.524735450744629, 'learning_rate': 1.0251888543945458e-07, 'epoch': 0.1000813669650122, 'num_input_tokens_seen': 7484735488, 'completed': '96.80% (3_569 / 3_687)', 'remaining time': '1:01:42', 'throughput': '8603.66', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:02:13,056 >> {'loss': 0.4167, 'grad_norm': 14.626888275146484, 'learning_rate': 1.0247641240332397e-07, 'epoch': 0.10035259018171956, 'num_input_tokens_seen': 7486832640, 'completed': '96.83% (3_570 / 3_687)', 'remaining time': '1:01:10', 'throughput': '8558.67', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:02:46,602 >> {'loss': 0.42, 'grad_norm': 14.8865327835083, 'learning_rate': 1.0243429951714714e-07, 'epoch': 0.10062381339842691, 'num_input_tokens_seen': 7488929792, 'completed': '96.85% (3_571 / 3_687)', 'remaining time': '1:00:39', 'throughput': '7814.54', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:03:16,668 >> {'loss': 0.4622, 'grad_norm': 
17.109973907470703, 'learning_rate': 1.023925468148149e-07, 'epoch': 0.10089503661513426, 'num_input_tokens_seen': 7491026944, 'completed': '96.88% (3_572 / 3_687)', 'remaining time': '1:00:08', 'throughput': '8719.00', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:03:49,290 >> {'loss': 0.1671, 'grad_norm': 8.667223930358887, 'learning_rate': 1.023511543299283e-07, 'epoch': 0.1011662598318416, 'num_input_tokens_seen': 7493124096, 'completed': '96.91% (3_573 / 3_687)', 'remaining time': '0:59:37', 'throughput': '8035.69', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:04:19,317 >> {'loss': 0.4523, 'grad_norm': 17.68228530883789, 'learning_rate': 1.0231012209579831e-07, 'epoch': 0.10143748304854895, 'num_input_tokens_seen': 7495221248, 'completed': '96.94% (3_574 / 3_687)', 'remaining time': '0:59:05', 'throughput': '8730.29', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:04:52,487 >> {'loss': 0.5904, 'grad_norm': 21.331649780273438, 'learning_rate': 1.0226945014544624e-07, 'epoch': 0.10170870626525631, 'num_input_tokens_seen': 7497318400, 'completed': '96.96% (3_575 / 3_687)', 'remaining time': '0:58:34', 'throughput': '7903.20', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:05:25,917 >> {'loss': 0.6652, 'grad_norm': 19.670902252197266, 'learning_rate': 1.0222913851160335e-07, 'epoch': 0.10197992948196366, 'num_input_tokens_seen': 7499415552, 'completed': '96.99% (3_576 / 3_687)', 'remaining time': '0:58:03', 'throughput': '7841.42', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:05:55,193 >> {'loss': 0.425, 'grad_norm': 15.4076509475708, 'learning_rate': 1.0218918722671074e-07, 'epoch': 0.10225115269867101, 'num_input_tokens_seen': 7501512704, 'completed': '97.02% (3_577 / 3_687)', 'remaining time': '0:57:31', 'throughput': '8954.21', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:06:24,888 >> {'loss': 0.4974, 'grad_norm': 19.192432403564453, 'learning_rate': 
1.0214959632291984e-07, 'epoch': 0.10252237591537836, 'num_input_tokens_seen': 7503609856, 'completed': '97.04% (3_578 / 3_687)', 'remaining time': '0:56:59', 'throughput': '8828.05', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:06:55,719 >> {'loss': 0.3527, 'grad_norm': 15.295207977294922, 'learning_rate': 1.0211036583209181e-07, 'epoch': 0.1027935991320857, 'num_input_tokens_seen': 7505707008, 'completed': '97.07% (3_579 / 3_687)', 'remaining time': '0:56:28', 'throughput': '8502.58', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:07:26,287 >> {'loss': 0.2886, 'grad_norm': 13.086195945739746, 'learning_rate': 1.0207149578579789e-07, 'epoch': 0.10306482234879305, 'num_input_tokens_seen': 7507804160, 'completed': '97.10% (3_580 / 3_687)', 'remaining time': '0:55:56', 'throughput': '8575.61', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:07:55,531 >> {'loss': 0.4335, 'grad_norm': 18.579437255859375, 'learning_rate': 1.0203298621531923e-07, 'epoch': 0.10333604556550041, 'num_input_tokens_seen': 7509901312, 'completed': '97.13% (3_581 / 3_687)', 'remaining time': '0:55:24', 'throughput': '8964.22', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:08:24,762 >> {'loss': 0.4943, 'grad_norm': 18.218303680419922, 'learning_rate': 1.0199483715164687e-07, 'epoch': 0.10360726878220776, 'num_input_tokens_seen': 7511998464, 'completed': '97.15% (3_582 / 3_687)', 'remaining time': '0:54:52', 'throughput': '8967.88', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:08:54,437 >> {'loss': 0.8784, 'grad_norm': 27.71494483947754, 'learning_rate': 1.0195704862548167e-07, 'epoch': 0.1038784919989151, 'num_input_tokens_seen': 7514095616, 'completed': '97.18% (3_583 / 3_687)', 'remaining time': '0:54:20', 'throughput': '8833.90', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:09:24,844 >> {'loss': 0.3978, 'grad_norm': 17.35464096069336, 'learning_rate': 1.0191962066723448e-07, 'epoch': 
0.10414971521562245, 'num_input_tokens_seen': 7516192768, 'completed': '97.21% (3_584 / 3_687)', 'remaining time': '0:53:49', 'throughput': '8621.19', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:09:55,397 >> {'loss': 0.251, 'grad_norm': 11.098041534423828, 'learning_rate': 1.0188255330702583e-07, 'epoch': 0.1044209384323298, 'num_input_tokens_seen': 7518289920, 'completed': '97.23% (3_585 / 3_687)', 'remaining time': '0:53:17', 'throughput': '8579.99', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:10:25,275 >> {'loss': 0.588, 'grad_norm': 20.41391372680664, 'learning_rate': 1.0184584657468615e-07, 'epoch': 0.10469216164903716, 'num_input_tokens_seen': 7520387072, 'completed': '97.26% (3_586 / 3_687)', 'remaining time': '0:52:46', 'throughput': '8773.75', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:10:55,430 >> {'loss': 0.2949, 'grad_norm': 15.097143173217773, 'learning_rate': 1.018095004997556e-07, 'epoch': 0.10496338486574451, 'num_input_tokens_seen': 7522484224, 'completed': '97.29% (3_587 / 3_687)', 'remaining time': '0:52:14', 'throughput': '8693.19', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:11:26,858 >> {'loss': 0.3288, 'grad_norm': 14.162942886352539, 'learning_rate': 1.0177351511148414e-07, 'epoch': 0.10523460808245186, 'num_input_tokens_seen': 7524581376, 'completed': '97.31% (3_588 / 3_687)', 'remaining time': '0:51:43', 'throughput': '8341.23', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:11:58,937 >> {'loss': 0.4861, 'grad_norm': 17.567161560058594, 'learning_rate': 1.0173789043883147e-07, 'epoch': 0.1055058312991592, 'num_input_tokens_seen': 7526678528, 'completed': '97.34% (3_589 / 3_687)', 'remaining time': '0:51:11', 'throughput': '8171.64', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:12:28,293 >> {'loss': 0.3948, 'grad_norm': 14.005763053894043, 'learning_rate': 1.0170262651046687e-07, 'epoch': 0.10577705451586655, 'num_input_tokens_seen': 
7528775680, 'completed': '97.37% (3_590 / 3_687)', 'remaining time': '0:50:40', 'throughput': '8929.79', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:12:56,660 >> {'loss': 0.5062, 'grad_norm': 17.03684425354004, 'learning_rate': 1.0166772335476951e-07, 'epoch': 0.10604827773257391, 'num_input_tokens_seen': 7530872832, 'completed': '97.40% (3_591 / 3_687)', 'remaining time': '0:50:08', 'throughput': '9241.44', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:13:28,459 >> {'loss': 0.4414, 'grad_norm': 17.549358367919922, 'learning_rate': 1.0163318099982808e-07, 'epoch': 0.10631950094928126, 'num_input_tokens_seen': 7532969984, 'completed': '97.42% (3_592 / 3_687)', 'remaining time': '0:49:36', 'throughput': '8243.68', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:13:56,592 >> {'loss': 0.4222, 'grad_norm': 16.84699821472168, 'learning_rate': 1.0159899947344094e-07, 'epoch': 0.10659072416598861, 'num_input_tokens_seen': 7535067136, 'completed': '97.45% (3_593 / 3_687)', 'remaining time': '0:49:04', 'throughput': '9317.97', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:14:27,317 >> {'loss': 0.3399, 'grad_norm': 13.90577507019043, 'learning_rate': 1.0156517880311614e-07, 'epoch': 0.10686194738269596, 'num_input_tokens_seen': 7537164288, 'completed': '97.48% (3_594 / 3_687)', 'remaining time': '0:48:33', 'throughput': '8531.90', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:14:59,411 >> {'loss': 0.3014, 'grad_norm': 14.83264446258545, 'learning_rate': 1.0153171901607118e-07, 'epoch': 0.1071331705994033, 'num_input_tokens_seen': 7539261440, 'completed': '97.50% (3_595 / 3_687)', 'remaining time': '0:48:02', 'throughput': '8168.05', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:15:27,885 >> {'loss': 0.6559, 'grad_norm': 21.388154983520508, 'learning_rate': 1.0149862013923329e-07, 'epoch': 0.10740439381611067, 'num_input_tokens_seen': 7541358592, 'completed': '97.53% (3_596 / 
3_687)', 'remaining time': '0:47:30', 'throughput': '9206.40', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:15:58,226 >> {'loss': 0.4451, 'grad_norm': 16.71360206604004, 'learning_rate': 1.0146588219923917e-07, 'epoch': 0.10767561703281801, 'num_input_tokens_seen': 7543455744, 'completed': '97.56% (3_597 / 3_687)', 'remaining time': '0:46:58', 'throughput': '8640.00', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:16:31,023 >> {'loss': 0.3744, 'grad_norm': 12.09107780456543, 'learning_rate': 1.0143350522243509e-07, 'epoch': 0.10794684024952536, 'num_input_tokens_seen': 7545552896, 'completed': '97.59% (3_598 / 3_687)', 'remaining time': '0:46:27', 'throughput': '7993.05', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:17:02,181 >> {'loss': 0.4641, 'grad_norm': 16.030540466308594, 'learning_rate': 1.0140148923487675e-07, 'epoch': 0.10821806346623271, 'num_input_tokens_seen': 7547650048, 'completed': '97.61% (3_599 / 3_687)', 'remaining time': '0:45:56', 'throughput': '8413.42', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:17:31,836 >> {'loss': 0.477, 'grad_norm': 18.602020263671875, 'learning_rate': 1.0136983426232945e-07, 'epoch': 0.10848928668294006, 'num_input_tokens_seen': 7549747200, 'completed': '97.64% (3_600 / 3_687)', 'remaining time': '0:45:24', 'throughput': '8839.50', 'gpu_mem_free': '5581MB'} /scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:689: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
warnings.warn( [INFO|trainer.py:3503] 2024-12-27 02:17:56,022 >> Saving model checkpoint to /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-3600 [INFO|configuration_utils.py:472] 2024-12-27 02:17:56,025 >> Configuration saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-3600/config.json [INFO|configuration_utils.py:807] 2024-12-27 02:17:56,026 >> Configuration saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-3600/generation_config.json [INFO|modeling_utils.py:2807] 2024-12-27 02:18:51,019 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 7 checkpoint shards. You can find where each parameters has been saved in the index located at /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-3600/model.safetensors.index.json. [INFO|tokenization_utils_base.py:2684] 2024-12-27 02:18:51,023 >> tokenizer config file saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-3600/tokenizer_config.json [INFO|tokenization_utils_base.py:2693] 2024-12-27 02:18:51,024 >> Special tokens file saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-3600/special_tokens_map.json
[WARNING|trainer.py:868] 2024-12-27 02:22:29,797 >> Save streaming dataset state: {'epoch': 0, 'sample_in_epoch': 7200, 'num_canonical_nodes': 1, 'shuffle_seed': 42, 'initial_physical_nodes': 1} /scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/torch/utils/checkpoint.py:1399: FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead. with device_autocast_ctx, torch.cpu.amp.autocast(**cpu_autocast_kwargs), recompute_context: # type: ignore[attr-defined] [INFO|trainer.py:175] 2024-12-27 02:23:00,865 >> {'loss': 0.255, 'grad_norm': 11.135970115661621, 'learning_rate': 1.0133854033026789e-07, 'epoch': 0.1087605098996474, 'num_input_tokens_seen': 7551844352, 'completed': '97.67% (3_601 / 3_687)', 'remaining time': '0:45:57', 'throughput': '796.72', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:23:30,825 >> {'loss': 0.2572, 'grad_norm': 16.609188079833984, 'learning_rate': 1.0130760746387622e-07, 'epoch': 0.10903173311635476, 'num_input_tokens_seen': 7553941504, 'completed': '97.69% (3_602 / 3_687)', 'remaining time': '0:45:24', 'throughput': '8749.87', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:24:00,469 >> {'loss': 0.3303, 'grad_norm': 13.517497062683105, 'learning_rate': 1.0127703568804805e-07, 'epoch': 0.10930295633306211, 'num_input_tokens_seen': 7556038656, 'completed': '97.72% (3_603 / 3_687)', 'remaining time': '0:44:52', 'throughput': '8843.08', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:24:30,186 >> {'loss': 0.3366, 'grad_norm': 16.163911819458008, 'learning_rate': 1.0124682502738638e-07, 'epoch': 0.10957417954976946, 'num_input_tokens_seen': 7558135808, 'completed': '97.75% (3_604 /
3_687)', 'remaining time': '0:44:19', 'throughput': '8821.48', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:25:03,574 >> {'loss': 0.5456, 'grad_norm': 19.574230194091797, 'learning_rate': 1.0121697550620365e-07, 'epoch': 0.10984540276647681, 'num_input_tokens_seen': 7560232960, 'completed': '97.78% (3_605 / 3_687)', 'remaining time': '0:43:47', 'throughput': '7851.37', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:25:34,402 >> {'loss': 0.4387, 'grad_norm': 17.653837203979492, 'learning_rate': 1.0118748714852156e-07, 'epoch': 0.11011662598318415, 'num_input_tokens_seen': 7562330112, 'completed': '97.80% (3_606 / 3_687)', 'remaining time': '0:43:15', 'throughput': '8503.49', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:26:04,835 >> {'loss': 0.1869, 'grad_norm': 13.368509292602539, 'learning_rate': 1.011583599780712e-07, 'epoch': 0.11038784919989152, 'num_input_tokens_seen': 7564427264, 'completed': '97.83% (3_607 / 3_687)', 'remaining time': '0:42:43', 'throughput': '8613.62', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:26:35,939 >> {'loss': 0.6204, 'grad_norm': 21.617624282836914, 'learning_rate': 1.01129594018293e-07, 'epoch': 0.11065907241659886, 'num_input_tokens_seen': 7566524416, 'completed': '97.86% (3_608 / 3_687)', 'remaining time': '0:42:10', 'throughput': '8428.03', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:27:08,843 >> {'loss': 0.4534, 'grad_norm': 17.730438232421875, 'learning_rate': 1.0110118929233682e-07, 'epoch': 0.11093029563330621, 'num_input_tokens_seen': 7568621568, 'completed': '97.88% (3_609 / 3_687)', 'remaining time': '0:41:38', 'throughput': '7966.92', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:27:37,951 >> {'loss': 0.6147, 'grad_norm': 20.859384536743164, 'learning_rate': 1.0107314582306156e-07, 'epoch': 0.11120151885001356, 'num_input_tokens_seen': 7570718720, 'completed': '97.91% (3_610 / 3_687)', 'remaining time': '0:41:06', 
'throughput': '9006.02', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:28:07,281 >> {'loss': 0.5003, 'grad_norm': 21.62270164489746, 'learning_rate': 1.0104546363303566e-07, 'epoch': 0.1114727420667209, 'num_input_tokens_seen': 7572815872, 'completed': '97.94% (3_611 / 3_687)', 'remaining time': '0:40:33', 'throughput': '8937.55', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:28:35,194 >> {'loss': 0.649, 'grad_norm': 22.07628631591797, 'learning_rate': 1.0101814274453661e-07, 'epoch': 0.11174396528342827, 'num_input_tokens_seen': 7574913024, 'completed': '97.97% (3_612 / 3_687)', 'remaining time': '0:40:01', 'throughput': '9391.57', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:29:10,780 >> {'loss': 0.4658, 'grad_norm': 19.241708755493164, 'learning_rate': 1.0099118317955127e-07, 'epoch': 0.11201518850013562, 'num_input_tokens_seen': 7577010176, 'completed': '97.99% (3_613 / 3_687)', 'remaining time': '0:39:29', 'throughput': '7366.63', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:29:39,809 >> {'loss': 0.5632, 'grad_norm': 17.93199348449707, 'learning_rate': 1.0096458495977564e-07, 'epoch': 0.11228641171684296, 'num_input_tokens_seen': 7579107328, 'completed': '98.02% (3_614 / 3_687)', 'remaining time': '0:38:57', 'throughput': '9030.27', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:30:12,372 >> {'loss': 0.3877, 'grad_norm': 18.892309188842773, 'learning_rate': 1.0093834810661498e-07, 'epoch': 0.11255763493355031, 'num_input_tokens_seen': 7581204480, 'completed': '98.05% (3_615 / 3_687)', 'remaining time': '0:38:25', 'throughput': '8050.48', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:30:42,843 >> {'loss': 0.3783, 'grad_norm': 14.152824401855469, 'learning_rate': 1.0091247264118372e-07, 'epoch': 0.11282885815025766, 'num_input_tokens_seen': 7583301632, 'completed': '98.07% (3_616 / 3_687)', 'remaining time': '0:37:52', 'throughput': '8603.01', 'gpu_mem_free': 
'5581MB'} [INFO|trainer.py:175] 2024-12-27 02:31:16,783 >> {'loss': 0.261, 'grad_norm': 11.564390182495117, 'learning_rate': 1.0088695858430539e-07, 'epoch': 0.11310008136696502, 'num_input_tokens_seen': 7585398784, 'completed': '98.10% (3_617 / 3_687)', 'remaining time': '0:37:21', 'throughput': '7723.71', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:31:47,859 >> {'loss': 0.486, 'grad_norm': 23.13407325744629, 'learning_rate': 1.0086180595651278e-07, 'epoch': 0.11337130458367237, 'num_input_tokens_seen': 7587495936, 'completed': '98.13% (3_618 / 3_687)', 'remaining time': '0:36:49', 'throughput': '8435.43', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:32:15,859 >> {'loss': 0.554, 'grad_norm': 17.028270721435547, 'learning_rate': 1.0083701477804778e-07, 'epoch': 0.11364252780037971, 'num_input_tokens_seen': 7589593088, 'completed': '98.16% (3_619 / 3_687)', 'remaining time': '0:36:16', 'throughput': '9362.46', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:32:47,885 >> {'loss': 0.5359, 'grad_norm': 17.975419998168945, 'learning_rate': 1.0081258506886134e-07, 'epoch': 0.11391375101708706, 'num_input_tokens_seen': 7591690240, 'completed': '98.18% (3_620 / 3_687)', 'remaining time': '0:35:44', 'throughput': '8185.41', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:33:18,482 >> {'loss': 0.3641, 'grad_norm': 14.600398063659668, 'learning_rate': 1.0078851684861357e-07, 'epoch': 0.11418497423379441, 'num_input_tokens_seen': 7593787392, 'completed': '98.21% (3_621 / 3_687)', 'remaining time': '0:35:12', 'throughput': '8567.60', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:33:47,741 >> {'loss': 0.2436, 'grad_norm': 14.766218185424805, 'learning_rate': 1.0076481013667376e-07, 'epoch': 0.11445619745050176, 'num_input_tokens_seen': 7595884544, 'completed': '98.24% (3_622 / 3_687)', 'remaining time': '0:34:39', 'throughput': '8959.42', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 
02:34:19,246 >> {'loss': 0.2984, 'grad_norm': 15.354504585266113, 'learning_rate': 1.0074146495212001e-07, 'epoch': 0.11472742066720912, 'num_input_tokens_seen': 7597981696, 'completed': '98.26% (3_623 / 3_687)', 'remaining time': '0:34:07', 'throughput': '8320.73', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:34:50,298 >> {'loss': 0.7206, 'grad_norm': 19.848752975463867, 'learning_rate': 1.0071848131373972e-07, 'epoch': 0.11499864388391647, 'num_input_tokens_seen': 7600078848, 'completed': '98.29% (3_624 / 3_687)', 'remaining time': '0:33:35', 'throughput': '8442.15', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:35:20,431 >> {'loss': 0.3718, 'grad_norm': 13.409833908081055, 'learning_rate': 1.0069585924002924e-07, 'epoch': 0.11526986710062381, 'num_input_tokens_seen': 7602176000, 'completed': '98.32% (3_625 / 3_687)', 'remaining time': '0:33:03', 'throughput': '8699.47', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:35:49,435 >> {'loss': 0.7582, 'grad_norm': 25.440654754638672, 'learning_rate': 1.0067359874919395e-07, 'epoch': 0.11554109031733116, 'num_input_tokens_seen': 7604273152, 'completed': '98.35% (3_626 / 3_687)', 'remaining time': '0:32:30', 'throughput': '9038.33', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:36:21,647 >> {'loss': 0.2706, 'grad_norm': 12.968038558959961, 'learning_rate': 1.0065169985914826e-07, 'epoch': 0.11581231353403851, 'num_input_tokens_seen': 7606370304, 'completed': '98.37% (3_627 / 3_687)', 'remaining time': '0:31:58', 'throughput': '8138.10', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:36:50,396 >> {'loss': 0.6234, 'grad_norm': 21.27517318725586, 'learning_rate': 1.0063016258751553e-07, 'epoch': 0.11608353675074587, 'num_input_tokens_seen': 7608467456, 'completed': '98.40% (3_628 / 3_687)', 'remaining time': '0:31:26', 'throughput': '9118.07', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:37:19,223 >> {'loss': 0.4209, 
'grad_norm': 17.9497127532959, 'learning_rate': 1.0060898695162816e-07, 'epoch': 0.11635475996745322, 'num_input_tokens_seen': 7610564608, 'completed': '98.43% (3_629 / 3_687)', 'remaining time': '0:30:54', 'throughput': '9093.76', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:37:50,120 >> {'loss': 0.4628, 'grad_norm': 18.859006881713867, 'learning_rate': 1.005881729685275e-07, 'epoch': 0.11662598318416056, 'num_input_tokens_seen': 7612661760, 'completed': '98.45% (3_630 / 3_687)', 'remaining time': '0:30:22', 'throughput': '8484.41', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:38:17,474 >> {'loss': 0.8165, 'grad_norm': 22.217741012573242, 'learning_rate': 1.0056772065496387e-07, 'epoch': 0.11689720640086791, 'num_input_tokens_seen': 7614758912, 'completed': '98.48% (3_631 / 3_687)', 'remaining time': '0:29:49', 'throughput': '9583.54', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:38:49,953 >> {'loss': 0.6362, 'grad_norm': 22.35140037536621, 'learning_rate': 1.005476300273965e-07, 'epoch': 0.11716842961757526, 'num_input_tokens_seen': 7616856064, 'completed': '98.51% (3_632 / 3_687)', 'remaining time': '0:29:17', 'throughput': '8071.29', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:39:18,441 >> {'loss': 0.4338, 'grad_norm': 16.316476821899414, 'learning_rate': 1.0052790110199348e-07, 'epoch': 0.11743965283428262, 'num_input_tokens_seen': 7618953216, 'completed': '98.54% (3_633 / 3_687)', 'remaining time': '0:28:45', 'throughput': '9201.66', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:39:50,009 >> {'loss': 0.6887, 'grad_norm': 23.569665908813477, 'learning_rate': 1.0050853389463205e-07, 'epoch': 0.11771087605098997, 'num_input_tokens_seen': 7621050368, 'completed': '98.56% (3_634 / 3_687)', 'remaining time': '0:28:13', 'throughput': '8304.13', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:40:23,057 >> {'loss': 0.3326, 'grad_norm': 12.867155075073242, 
'learning_rate': 1.0048952842089805e-07, 'epoch': 0.11798209926769732, 'num_input_tokens_seen': 7623147520, 'completed': '98.59% (3_635 / 3_687)', 'remaining time': '0:27:41', 'throughput': '7932.15', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:40:52,436 >> {'loss': 0.3111, 'grad_norm': 14.628716468811035, 'learning_rate': 1.0047088469608648e-07, 'epoch': 0.11825332248440466, 'num_input_tokens_seen': 7625244672, 'completed': '98.62% (3_636 / 3_687)', 'remaining time': '0:27:09', 'throughput': '8922.82', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:41:21,817 >> {'loss': 0.2856, 'grad_norm': 12.584244728088379, 'learning_rate': 1.00452602735201e-07, 'epoch': 0.11852454570111201, 'num_input_tokens_seen': 7627341824, 'completed': '98.64% (3_637 / 3_687)', 'remaining time': '0:26:36', 'throughput': '8922.31', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:41:50,631 >> {'loss': 0.6433, 'grad_norm': 19.87360954284668, 'learning_rate': 1.0043468255295435e-07, 'epoch': 0.11879576891781937, 'num_input_tokens_seen': 7629438976, 'completed': '98.67% (3_638 / 3_687)', 'remaining time': '0:26:04', 'throughput': '9098.03', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:42:19,868 >> {'loss': 0.5521, 'grad_norm': 17.3902530670166, 'learning_rate': 1.0041712416376795e-07, 'epoch': 0.11906699213452672, 'num_input_tokens_seen': 7631536128, 'completed': '98.70% (3_639 / 3_687)', 'remaining time': '0:25:32', 'throughput': '8966.06', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:42:53,594 >> {'loss': 0.2688, 'grad_norm': 13.591824531555176, 'learning_rate': 1.0039992758177211e-07, 'epoch': 0.11933821535123407, 'num_input_tokens_seen': 7633633280, 'completed': '98.73% (3_640 / 3_687)', 'remaining time': '0:25:00', 'throughput': '7772.79', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:43:24,110 >> {'loss': 0.2729, 'grad_norm': 18.85526466369629, 'learning_rate': 1.0038309282080596e-07, 'epoch': 
0.11960943856794141, 'num_input_tokens_seen': 7635730432, 'completed': '98.75% (3_641 / 3_687)', 'remaining time': '0:24:28', 'throughput': '8590.33', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:43:55,822 >> {'loss': 0.5371, 'grad_norm': 16.387048721313477, 'learning_rate': 1.0036661989441755e-07, 'epoch': 0.11988066178464876, 'num_input_tokens_seen': 7637827584, 'completed': '98.78% (3_642 / 3_687)', 'remaining time': '0:23:56', 'throughput': '8266.25', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:44:25,091 >> {'loss': 0.4256, 'grad_norm': 16.59734344482422, 'learning_rate': 1.0035050881586364e-07, 'epoch': 0.12015188500135611, 'num_input_tokens_seen': 7639924736, 'completed': '98.81% (3_643 / 3_687)', 'remaining time': '0:23:24', 'throughput': '8956.38', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:44:56,059 >> {'loss': 0.3089, 'grad_norm': 15.033208847045898, 'learning_rate': 1.0033475959810974e-07, 'epoch': 0.12042310821806347, 'num_input_tokens_seen': 7642021888, 'completed': '98.83% (3_644 / 3_687)', 'remaining time': '0:22:52', 'throughput': '8465.17', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:45:28,401 >> {'loss': 0.4245, 'grad_norm': 15.11539077758789, 'learning_rate': 1.0031937225383036e-07, 'epoch': 0.12069433143477082, 'num_input_tokens_seen': 7644119040, 'completed': '98.86% (3_645 / 3_687)', 'remaining time': '0:22:20', 'throughput': '8105.23', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:46:01,817 >> {'loss': 0.2949, 'grad_norm': 13.645261764526367, 'learning_rate': 1.0030434679540853e-07, 'epoch': 0.12096555465147817, 'num_input_tokens_seen': 7646216192, 'completed': '98.89% (3_646 / 3_687)', 'remaining time': '0:21:48', 'throughput': '7845.00', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:46:31,593 >> {'loss': 0.5613, 'grad_norm': 21.710878372192383, 'learning_rate': 1.0028968323493623e-07, 'epoch': 0.12123677786818551, 'num_input_tokens_seen': 
7648313344, 'completed': '98.92% (3_647 / 3_687)', 'remaining time': '0:21:16', 'throughput': '8803.95', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:47:02,639 >> {'loss': 0.6029, 'grad_norm': 18.812923431396484, 'learning_rate': 1.0027538158421413e-07, 'epoch': 0.12150800108489286, 'num_input_tokens_seen': 7650410496, 'completed': '98.94% (3_648 / 3_687)', 'remaining time': '0:20:44', 'throughput': '8443.55', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:47:33,548 >> {'loss': 0.6727, 'grad_norm': 18.887786865234375, 'learning_rate': 1.002614418547516e-07, 'epoch': 0.12177922430160022, 'num_input_tokens_seen': 7652507648, 'completed': '98.97% (3_649 / 3_687)', 'remaining time': '0:20:12', 'throughput': '8481.22', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:48:02,107 >> {'loss': 0.5487, 'grad_norm': 19.344097137451172, 'learning_rate': 1.0024786405776686e-07, 'epoch': 0.12205044751830757, 'num_input_tokens_seen': 7654604800, 'completed': '99.00% (3_650 / 3_687)', 'remaining time': '0:19:40', 'throughput': '9179.08', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:48:31,889 >> {'loss': 0.4229, 'grad_norm': 17.132566452026367, 'learning_rate': 1.0023464820418676e-07, 'epoch': 0.12232167073501492, 'num_input_tokens_seen': 7656701952, 'completed': '99.02% (3_651 / 3_687)', 'remaining time': '0:19:08', 'throughput': '8802.18', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:49:03,667 >> {'loss': 0.312, 'grad_norm': 13.882312774658203, 'learning_rate': 1.00221794304647e-07, 'epoch': 0.12259289395172226, 'num_input_tokens_seen': 7658799104, 'completed': '99.05% (3_652 / 3_687)', 'remaining time': '0:18:36', 'throughput': '8249.06', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:49:37,109 >> {'loss': 0.3204, 'grad_norm': 16.40239143371582, 'learning_rate': 1.0020930236949182e-07, 'epoch': 0.12286411716842961, 'num_input_tokens_seen': 7660896256, 'completed': '99.08% (3_653 / 
3_687)', 'remaining time': '0:18:04', 'throughput': '7838.81', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:50:08,908 >> {'loss': 0.277, 'grad_norm': 13.31047248840332, 'learning_rate': 1.0019717240877424e-07, 'epoch': 0.12313534038513697, 'num_input_tokens_seen': 7662993408, 'completed': '99.10% (3_654 / 3_687)', 'remaining time': '0:17:32', 'throughput': '8243.65', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:50:36,327 >> {'loss': 0.7039, 'grad_norm': 21.159040451049805, 'learning_rate': 1.001854044322561e-07, 'epoch': 0.12340656360184432, 'num_input_tokens_seen': 7665090560, 'completed': '99.13% (3_655 / 3_687)', 'remaining time': '0:17:00', 'throughput': '9560.66', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:51:06,430 >> {'loss': 0.4774, 'grad_norm': 42.46569061279297, 'learning_rate': 1.0017399844940774e-07, 'epoch': 0.12367778681855167, 'num_input_tokens_seen': 7667187712, 'completed': '99.16% (3_656 / 3_687)', 'remaining time': '0:16:28', 'throughput': '8708.26', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:51:38,916 >> {'loss': 0.3837, 'grad_norm': 16.749025344848633, 'learning_rate': 1.0016295446940827e-07, 'epoch': 0.12394901003525902, 'num_input_tokens_seen': 7669284864, 'completed': '99.19% (3_657 / 3_687)', 'remaining time': '0:15:56', 'throughput': '8069.64', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:52:08,263 >> {'loss': 0.6936, 'grad_norm': 19.79422378540039, 'learning_rate': 1.001522725011455e-07, 'epoch': 0.12422023325196636, 'num_input_tokens_seen': 7671382016, 'completed': '99.21% (3_658 / 3_687)', 'remaining time': '0:15:24', 'throughput': '8932.45', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:52:38,045 >> {'loss': 0.415, 'grad_norm': 19.400150299072266, 'learning_rate': 1.0014195255321583e-07, 'epoch': 0.12449145646867373, 'num_input_tokens_seen': 7673479168, 'completed': '99.24% (3_659 / 3_687)', 'remaining time': '0:14:52', 
'throughput': '8802.04', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:53:10,450 >> {'loss': 0.3181, 'grad_norm': 12.765349388122559, 'learning_rate': 1.0013199463392433e-07, 'epoch': 0.12476267968538107, 'num_input_tokens_seen': 7675576320, 'completed': '99.27% (3_660 / 3_687)', 'remaining time': '0:14:20', 'throughput': '8089.64', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:53:41,165 >> {'loss': 0.2617, 'grad_norm': 15.518762588500977, 'learning_rate': 1.0012239875128484e-07, 'epoch': 0.1250339029020884, 'num_input_tokens_seen': 7677673472, 'completed': '99.29% (3_661 / 3_687)', 'remaining time': '0:13:48', 'throughput': '8534.81', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:54:12,866 >> {'loss': 0.4696, 'grad_norm': 18.00017738342285, 'learning_rate': 1.0011316491301973e-07, 'epoch': 0.12530512611879577, 'num_input_tokens_seen': 7679770624, 'completed': '99.32% (3_662 / 3_687)', 'remaining time': '0:13:16', 'throughput': '8269.24', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:54:42,946 >> {'loss': 0.5292, 'grad_norm': 19.197542190551758, 'learning_rate': 1.0010429312656006e-07, 'epoch': 0.12557634933550313, 'num_input_tokens_seen': 7681867776, 'completed': '99.35% (3_663 / 3_687)', 'remaining time': '0:12:44', 'throughput': '8714.78', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:55:10,925 >> {'loss': 0.6401, 'grad_norm': 20.32849884033203, 'learning_rate': 1.000957833990454e-07, 'epoch': 0.12584757255221046, 'num_input_tokens_seen': 7683964928, 'completed': '99.38% (3_664 / 3_687)', 'remaining time': '0:12:12', 'throughput': '9369.32', 'gpu_mem_free': '5581MB'} [INFO|trainer.py:175] 2024-12-27 02:55:41,776 >> {'loss': 0.2865, 'grad_norm': 13.460699081420898, 'learning_rate': 1.0008763573732421e-07, 'epoch': 0.12611879576891782, 'num_input_tokens_seen': 7686062080, 'completed': '99.40% (3_665 / 3_687)', 'remaining time': '0:11:40', 'throughput': '8497.41', 'gpu_mem_free': 
'5581MB'}
[INFO|trainer.py:175] 2024-12-27 02:56:15,950 >> {'loss': 0.4368, 'grad_norm': 16.389801025390625, 'learning_rate': 1.0007985014795331e-07, 'epoch': 0.12639001898562516, 'num_input_tokens_seen': 7688159232, 'completed': '99.43% (3_666 / 3_687)', 'remaining time': '0:11:09', 'throughput': '7670.68', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 02:56:43,479 >> {'loss': 0.3929, 'grad_norm': 16.63237762451172, 'learning_rate': 1.0007242663719824e-07, 'epoch': 0.12666124220233252, 'num_input_tokens_seen': 7690256384, 'completed': '99.46% (3_667 / 3_687)', 'remaining time': '0:10:37', 'throughput': '9522.31', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 02:57:12,450 >> {'loss': 0.5468, 'grad_norm': 22.51099967956543, 'learning_rate': 1.0006536521103325e-07, 'epoch': 0.12693246541903988, 'num_input_tokens_seen': 7692353536, 'completed': '99.48% (3_668 / 3_687)', 'remaining time': '0:10:05', 'throughput': '9048.62', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 02:57:43,463 >> {'loss': 0.5824, 'grad_norm': 21.233259201049805, 'learning_rate': 1.0005866587514106e-07, 'epoch': 0.12720368863574721, 'num_input_tokens_seen': 7694450688, 'completed': '99.51% (3_669 / 3_687)', 'remaining time': '0:09:33', 'throughput': '8452.85', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 02:58:15,137 >> {'loss': 0.5925, 'grad_norm': 18.1636962890625, 'learning_rate': 1.0005232863491297e-07, 'epoch': 0.12747491185245458, 'num_input_tokens_seen': 7696547840, 'completed': '99.54% (3_670 / 3_687)', 'remaining time': '0:09:01', 'throughput': '8276.26', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 02:58:44,815 >> {'loss': 0.5679, 'grad_norm': 20.759021759033203, 'learning_rate': 1.0004635349544907e-07, 'epoch': 0.1277461350691619, 'num_input_tokens_seen': 7698644992, 'completed': '99.57% (3_671 / 3_687)', 'remaining time': '0:08:29', 'throughput': '8832.95', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 02:59:19,781 >> {'loss': 0.3588, 'grad_norm': 16.30278778076172, 'learning_rate': 1.0004074046155789e-07, 'epoch': 0.12801735828586927, 'num_input_tokens_seen': 7700742144, 'completed': '99.59% (3_672 / 3_687)', 'remaining time': '0:07:57', 'throughput': '7496.95', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 02:59:51,484 >> {'loss': 0.8094, 'grad_norm': 23.158729553222656, 'learning_rate': 1.000354895377565e-07, 'epoch': 0.12828858150257663, 'num_input_tokens_seen': 7702839296, 'completed': '99.62% (3_673 / 3_687)', 'remaining time': '0:07:25', 'throughput': '8268.80', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 03:00:19,949 >> {'loss': 0.5029, 'grad_norm': 20.49749755859375, 'learning_rate': 1.0003060072827073e-07, 'epoch': 0.12855980471928397, 'num_input_tokens_seen': 7704936448, 'completed': '99.65% (3_674 / 3_687)', 'remaining time': '0:06:53', 'throughput': '9209.34', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 03:00:52,456 >> {'loss': 0.253, 'grad_norm': 12.774348258972168, 'learning_rate': 1.0002607403703492e-07, 'epoch': 0.12883102793599133, 'num_input_tokens_seen': 7707033600, 'completed': '99.67% (3_675 / 3_687)', 'remaining time': '0:06:22', 'throughput': '8064.36', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 03:01:21,919 >> {'loss': 0.7074, 'grad_norm': 22.22560691833496, 'learning_rate': 1.000219094676919e-07, 'epoch': 0.12910225115269866, 'num_input_tokens_seen': 7709130752, 'completed': '99.70% (3_676 / 3_687)', 'remaining time': '0:05:50', 'throughput': '8897.37', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 03:01:52,710 >> {'loss': 0.3633, 'grad_norm': 15.757925033569336, 'learning_rate': 1.0001810702359326e-07, 'epoch': 0.12937347436940602, 'num_input_tokens_seen': 7711227904, 'completed': '99.73% (3_677 / 3_687)', 'remaining time': '0:05:18', 'throughput': '8513.68', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 03:02:23,978 >> {'loss': 0.3336, 'grad_norm': 14.108972549438477, 'learning_rate': 1.0001466670779896e-07, 'epoch': 0.12964469758611338, 'num_input_tokens_seen': 7713325056, 'completed': '99.76% (3_678 / 3_687)', 'remaining time': '0:04:46', 'throughput': '8383.73', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 03:02:55,941 >> {'loss': 0.1844, 'grad_norm': 9.747844696044922, 'learning_rate': 1.000115885230777e-07, 'epoch': 0.12991592080282072, 'num_input_tokens_seen': 7715422208, 'completed': '99.78% (3_679 / 3_687)', 'remaining time': '0:04:14', 'throughput': '8201.45', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 03:03:25,571 >> {'loss': 0.3654, 'grad_norm': 13.866641998291016, 'learning_rate': 1.0000887247190662e-07, 'epoch': 0.13018714401952808, 'num_input_tokens_seen': 7717519360, 'completed': '99.81% (3_680 / 3_687)', 'remaining time': '0:03:42', 'throughput': '8847.39', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 03:03:56,115 >> {'loss': 0.7339, 'grad_norm': 20.240041732788086, 'learning_rate': 1.000065185564716e-07, 'epoch': 0.1304583672362354, 'num_input_tokens_seen': 7719616512, 'completed': '99.84% (3_681 / 3_687)', 'remaining time': '0:03:10', 'throughput': '8582.40', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 03:04:23,232 >> {'loss': 0.52, 'grad_norm': 17.532907485961914, 'learning_rate': 1.0000452677866691e-07, 'epoch': 0.13072959045294277, 'num_input_tokens_seen': 7721713664, 'completed': '99.86% (3_682 / 3_687)', 'remaining time': '0:02:39', 'throughput': '9667.18', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 03:04:53,517 >> {'loss': 0.5367, 'grad_norm': 20.656099319458008, 'learning_rate': 1.0000289714009542e-07, 'epoch': 0.13100081366965013, 'num_input_tokens_seen': 7723810816, 'completed': '99.89% (3_683 / 3_687)', 'remaining time': '0:02:07', 'throughput': '8655.91', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 03:05:24,750 >> {'loss': 0.4698, 'grad_norm': 19.092506408691406, 'learning_rate': 1.000016296420687e-07, 'epoch': 0.13127203688635747, 'num_input_tokens_seen': 7725907968, 'completed': '99.92% (3_684 / 3_687)', 'remaining time': '0:01:35', 'throughput': '8393.15', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 03:05:55,306 >> {'loss': 0.5093, 'grad_norm': 19.04652976989746, 'learning_rate': 1.0000072428560674e-07, 'epoch': 0.13154326010306483, 'num_input_tokens_seen': 7728005120, 'completed': '99.95% (3_685 / 3_687)', 'remaining time': '0:01:03', 'throughput': '8579.09', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 03:06:28,355 >> {'loss': 0.409, 'grad_norm': 15.921603202819824, 'learning_rate': 1.000001810714381e-07, 'epoch': 0.13181448331977216, 'num_input_tokens_seen': 7730102272, 'completed': '99.97% (3_686 / 3_687)', 'remaining time': '0:00:31', 'throughput': '7932.01', 'gpu_mem_free': '5581MB'}
[INFO|trainer.py:175] 2024-12-27 03:06:57,970 >> {'loss': 0.3447, 'grad_norm': 14.163284301757812, 'learning_rate': 1e-07, 'epoch': 0.13208570653647952, 'num_input_tokens_seen': 7732199424, 'completed': '100.00% (3_687 / 3_687)', 'remaining time': '0:00:00', 'throughput': '8851.63', 'gpu_mem_free': '5581MB'}
/scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:689: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict . Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
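The tail of learning-rate values above is consistent with a cosine schedule decaying from a 1e-6 peak (matching the `lr-1e-6` in the run name) down to a 1e-7 floor, with roughly 5% linear warmup (185 of the 3_687 steps). Peak, floor, and warmup count are inferred from the logged values, not stated anywhere in the log; a sketch that reproduces the last few entries:

```python
import math

def cosine_min_lr(step, total=3687, warmup=185, peak=1e-6, floor=1e-7):
    # Linear warmup to the peak, then cosine decay from peak down to the floor.
    if step < warmup:
        return peak * step / warmup
    progress = (step - warmup) / (total - warmup)
    return floor + (peak - floor) * 0.5 * (1 + math.cos(math.pi * progress))

# Logged values for comparison:
#   step 3_685 -> 1.0000072428560674e-07
#   step 3_686 -> 1.000001810714381e-07
#   step 3_687 -> 1e-07
print(cosine_min_lr(3686))
```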
warnings.warn(
[INFO|trainer.py:3503] 2024-12-27 03:07:22,100 >> Saving model checkpoint to /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-3687
[INFO|configuration_utils.py:472] 2024-12-27 03:07:22,103 >> Configuration saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-3687/config.json
[INFO|configuration_utils.py:807] 2024-12-27 03:07:22,104 >> Configuration saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-3687/generation_config.json
[INFO|modeling_utils.py:2807] 2024-12-27 03:08:16,745 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 7 checkpoint shards. You can find where each parameters has been saved in the index located at /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-3687/model.safetensors.index.json.
[INFO|tokenization_utils_base.py:2684] 2024-12-27 03:08:16,748 >> tokenizer config file saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-3687/tokenizer_config.json
[INFO|tokenization_utils_base.py:2693] 2024-12-27 03:08:16,750 >> Special tokens file saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-3687/special_tokens_map.json
[WARNING|trainer.py:868] 2024-12-27 03:11:47,997 >> Save streaming dataset state: {'epoch': 0, 'sample_in_epoch': 7374, 'num_canonical_nodes': 1, 'shuffle_seed': 42, 'initial_physical_nodes': 1}
[INFO|trainer.py:2394] 2024-12-27 03:11:48,390 >> Training completed. Do not forget to share your model on huggingface.co/models =)
[INFO|trainer.py:175] 2024-12-27 03:11:48,392 >> {'train_runtime': 15793.1588, 'train_samples_per_second': 0.467, 'train_steps_per_second': 0.233, 'train_loss': 0.059379862848258066, 'epoch': 0.13208570653647952, 'num_input_tokens_seen': 7732199424, 'completed': '100.00% (3_687 / 3_687)', 'remaining time': '0:00:00', 'throughput': '0.00', 'gpu_mem_free': '5581MB'}
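The final summary numbers can be sanity-checked against one another using only values printed in the log; a quick script, plain arithmetic with no outside assumptions:

```python
runtime = 15793.1588      # 'train_runtime' in seconds (= 4:23:13.15)
steps = 3687              # total optimizer steps ('100.00% (3_687 / 3_687)')
tokens = 7732199424       # final 'num_input_tokens_seen'

steps_per_second = steps / runtime
tokens_per_step = tokens / steps

print(round(steps_per_second, 3))   # matches the logged 0.233
print(tokens_per_step)              # 2_097_152.0, i.e. exactly 2**21 tokens per step
```

That the token count divides evenly into 2^21 tokens per step is consistent with the fixed-length packed sequences this run trains on.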
warnings.warn(
[INFO|trainer.py:3503] 2024-12-27 03:12:13,301 >> Saving model checkpoint to /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_
[INFO|configuration_utils.py:472] 2024-12-27 03:12:13,315 >> Configuration saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/config.json
[INFO|configuration_utils.py:807] 2024-12-27 03:12:13,316 >> Configuration saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/generation_config.json
[INFO|modeling_utils.py:2807] 2024-12-27 03:13:09,634 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 7 checkpoint shards. You can find where each parameters has been saved in the index located at /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/model.safetensors.index.json.
[INFO|tokenization_utils_base.py:2684] 2024-12-27 03:13:09,638 >> tokenizer config file saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/tokenizer_config.json
[INFO|tokenization_utils_base.py:2693] 2024-12-27 03:13:09,639 >> Special tokens file saved in /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/special_tokens_map.json
***** train metrics *****
  epoch                    =     0.1321
  num_input_tokens_seen    = 7732199424
  train_loss               =     0.0594
  train_runtime            = 4:23:13.15
  train_samples_per_second =      0.467
  train_steps_per_second   =      0.233
[rank5]:[W1227 03:13:11.657057302 ProcessGroupNCCL.cpp:1168] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group.
This constraint has always been present, but this warning has only been added since PyTorch 2.4 (function operator()) [rank3]:[W1227 03:13:11.766298830 ProcessGroupNCCL.cpp:1168] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present, but this warning has only been added since PyTorch 2.4 (function operator()) [rank7]:[W1227 03:13:11.766528445 ProcessGroupNCCL.cpp:1168] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present, but this warning has only been added since PyTorch 2.4 (function operator()) [rank1]:[W1227 03:13:12.767104815 ProcessGroupNCCL.cpp:1168] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present, but this warning has only been added since PyTorch 2.4 (function operator()) [rank4]:[W1227 03:13:12.767902396 ProcessGroupNCCL.cpp:1168] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. 
On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present, but this warning has only been added since PyTorch 2.4 (function operator()) [rank2]:[W1227 03:13:12.768622888 ProcessGroupNCCL.cpp:1168] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present, but this warning has only been added since PyTorch 2.4 (function operator()) [rank6]:[W1227 03:13:12.768986308 ProcessGroupNCCL.cpp:1168] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. 
wandb: / 0.229 MB of 0.229 MB uploaded (0.004 MB deduped)
wandb:
wandb: Run history:
wandb: train/epoch ▁▁▁▂▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
wandb: train/global_step ▁▁▁▂▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
wandb: train/grad_norm ▃▂▁▄▁▂▁▂▃▂▃▂▃▃▂▂▂▃▂▃▂▂▂▁▂▂▁▄▅▂▂▂▃▃▂▃▂█▃▂
wandb: train/learning_rate ██▇▇▇▆▆▆▆▅▅▅▄▄▄▄▃▃▃▃▃▃▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁
wandb: train/loss ▅▅▁█▂▅▂▄▄▄▅▃▄▄▃▃▄▃▁█▄▃▃▄▃▅▂▆▆▃▄▃▆▆▅▆▂▄▅▄
wandb: train/num_input_tokens_seen ▁▁▁▂▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
wandb:
wandb: Run summary:
wandb: total_flos 4.352218066950134e+19
wandb: train/epoch 0.13209
wandb: train/global_step 3687
wandb: train/grad_norm 14.16328
wandb: train/learning_rate 0.0
wandb: train/loss 0.3447
wandb: train/num_input_tokens_seen 7732199424
wandb: train_loss 0.05938
wandb: train_runtime 15793.1588
wandb: train_samples_per_second 0.467
wandb: train_steps_per_second 0.233
wandb:
wandb: 🚀 View run balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_ at:
https://wandb.ai/chtmp223/prolong/runs/fta6xses
wandb: ⭐️ View project at: https://wandb.ai/chtmp223/prolong
wandb: Synced 6 W&B file(s), 0 media file(s), 2 artifact file(s) and 0 other file(s)
wandb: Find logs at: /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/wandb/run-20241226_224836-fta6xses/logs
wandb: WARNING The new W&B backend becomes opt-out in version 0.18.0; try it out with `wandb.require("core")`! See https://wandb.me/wandb-core for more information.
/scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/transformers/utils/hub.py:127: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
/scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/transformers/utils/import_utils.py:575: FutureWarning: `is_torch_tpu_available` is deprecated and will be removed in 4.41.0. Please use the `is_torch_xla_available` instead.
warnings.warn(
12/31/2024 15:06:56 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: True, fp16 training: False, bf16 training: True
12/31/2024 15:06:56 - INFO - __main__ - Additional arguments ScriptArguments(model_family='llama', model_name_or_path='/scratch3/workspace/ctpham_umass_edu-ft/models/Llama-3-8B-ProLong-512k-Base/', config_overrides=None, config_overrides_json='', config_name='/scratch3/workspace/ctpham_umass_edu-ft/models/Llama-3-8B-ProLong-512k-Base/', tokenizer_name='/scratch3/workspace/ctpham_umass_edu-ft/models/Llama-3-8B-ProLong-512k-Base/', cache_dir=None, use_fast_tokenizer=False, model_revision='main', use_auth_token=False, tokenized_mds_train=['/work/pi_miyyer_umass_edu/ctpham/BookClaim-dev/data/ft/bookclaim_balanced_pack_complete'], tokenized_mds_validation=[], tokenized_mds_test=[], token_scaled_loss=True)
12/31/2024 15:06:56 - INFO - __main__ - Checkpoint detected, resuming training at /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-3687. To avoid this behavior, change the `--output_dir` or add `--overwrite_output_dir` to train from scratch.
[INFO|tokenization_utils_base.py:2267] 2024-12-31 15:06:56,088 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2267] 2024-12-31 15:06:56,088 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2267] 2024-12-31 15:06:56,088 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2267] 2024-12-31 15:06:56,088 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2513] 2024-12-31 15:06:56,444 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
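The "Checkpoint detected, resuming training" message comes from scanning `--output_dir` for `checkpoint-<N>` subdirectories and taking the largest `N`, as transformers' `get_last_checkpoint` does. A minimal re-implementation of that scan, shown as a sketch rather than the library's exact code:

```python
import os
import re
import tempfile

def last_checkpoint(output_dir):
    # Find the checkpoint-<N> subdirectory with the largest N, or None.
    pat = re.compile(r"^checkpoint-(\d+)$")
    best, best_step = None, -1
    for name in os.listdir(output_dir):
        m = pat.match(name)
        if m and os.path.isdir(os.path.join(output_dir, name)):
            step = int(m.group(1))
            if step > best_step:
                best, best_step = os.path.join(output_dir, name), step
    return best

# Demo with hypothetical directory names mirroring this run's layout.
with tempfile.TemporaryDirectory() as d:
    for step in (100, 3687):
        os.makedirs(os.path.join(d, f"checkpoint-{step}"))
    print(os.path.basename(last_checkpoint(d)))  # checkpoint-3687
```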
[INFO|configuration_utils.py:731] 2024-12-31 15:06:56,446 >> loading configuration file /scratch3/workspace/ctpham_umass_edu-ft/models/Llama-3-8B-ProLong-512k-Base/config.json
[INFO|configuration_utils.py:800] 2024-12-31 15:06:56,447 >> Model config LlamaConfig {
  "_name_or_path": "/scratch3/workspace/ctpham_umass_edu-ft/models/Llama-3-8B-ProLong-512k-Base/",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128009,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 524288,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pad_token_id": 0,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 128000000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "float32",
  "transformers_version": "4.44.2",
  "use_cache": true,
  "vocab_size": 128256
}
12/31/2024 15:06:56 - INFO - __main__ - Loaded tokenizer - CPU Memory usage: 638.86 MB RSS
12/31/2024 15:06:56 - INFO - __main__ - GPU 0: 5/81920 MB
[INFO|modeling_utils.py:3675] 2024-12-31 15:06:56,467 >> loading weights file /scratch3/workspace/ctpham_umass_edu-ft/models/Llama-3-8B-ProLong-512k-Base/model.safetensors.index.json
[INFO|configuration_utils.py:1038] 2024-12-31 15:06:56,472 >> Generate config GenerationConfig {
  "bos_token_id": 128000,
  "eos_token_id": 128009,
  "pad_token_id": 0
}
Loading checkpoint shards: 0%| | 0/7 [00:00
>> All model checkpoint weights were used when initializing LlamaForCausalLM.
[INFO|modeling_utils.py:4515] 2024-12-31 15:06:57,505 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at /scratch3/workspace/ctpham_umass_edu-ft/models/Llama-3-8B-ProLong-512k-Base/. If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
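The "7 checkpoint shards" message earlier in the log follows directly from this config: summing the Llama parameter tensors gives about 8.03B parameters, which at the config's `torch_dtype: float32` is about 32.1 GB, and a 5 GB per-shard cap then requires 7 shards. A back-of-the-envelope check (assumes untied input/output embeddings, which matches `tie_word_embeddings: false`):

```python
import math

# Sizes taken from the LlamaConfig printed above.
vocab, hidden, inter, layers = 128256, 4096, 14336, 32
kv_heads, heads = 8, 32
kv_dim = kv_heads * (hidden // heads)         # 1024, as in k_proj/v_proj

attn = hidden * hidden * 2 + hidden * kv_dim * 2   # q,o plus k,v projections
mlp = 3 * hidden * inter                            # gate, up, down projections
norms = 2 * hidden                                  # two RMSNorms per layer
per_layer = attn + mlp + norms

# Token embeddings + lm_head (untied) + final norm + all decoder layers.
total = vocab * hidden * 2 + hidden + layers * per_layer
print(total)                                        # 8_030_261_248 parameters

shards = math.ceil(total * 4 / 5_000_000_000)       # float32 bytes, 5 GB per shard
print(shards)                                       # 7, matching the log
```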
[INFO|modeling_utils.py:4003] 2024-12-31 15:06:57,509 >> Generation config file not found, using a generation config created from the model config.
12/31/2024 15:06:57 - INFO - __main__ - Model: LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(128256, 4096, padding_idx=0)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
          (distributed_attn_func): DistributedAttention()
          (distributed_varlen_attn_func): DistributedAttention()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm()
  )
  (lm_head): Linear(in_features=4096, out_features=128256, bias=False)
)
12/31/2024 15:06:57 - INFO - __main__ - Loaded model - CPU Memory usage: 663.36 MB RSS
12/31/2024 15:06:57 - INFO - __main__ - GPU 0: 5/81920 MB
12/31/2024 15:06:58 - INFO - training.dataset - Loading datasets for training
12/31/2024 15:06:58 - WARNING - streaming.base.dataset - Because `predownload` was not specified, it will default to 8*batch_size if batch_size is not None, otherwise 64. Prior to Streaming v0.7.0, `predownload` defaulted to max(batch_size, 256 * batch_size // num_canonical_nodes).
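The streaming warning spells out its own default rule for `predownload`; encoding it directly makes the behavior change easy to see. This mirrors the message's description, not necessarily the library's exact implementation:

```python
def default_predownload(batch_size=None):
    # Streaming v0.7.0+ default, as described in the warning above.
    return 8 * batch_size if batch_size is not None else 64

def legacy_predownload(batch_size, num_canonical_nodes):
    # Pre-v0.7.0 default, per the same warning.
    return max(batch_size, 256 * batch_size // num_canonical_nodes)

print(default_predownload(16))      # 128
print(default_predownload())        # 64
print(legacy_predownload(16, 1))    # 4096
```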
12/31/2024 15:06:58 - INFO - __main__ - Loaded training dataset
> /work/pi_miyyer_umass_edu/ctpham/BookClaim-dev/prolong-final/finetune.py(229)main()
-> print(tokenizer.decode(train_dataset[0]['input_ids']))
(Pdb) <|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are an expert at verifying claims from fictional narratives.<|eot_id|><|start_header_id|>user<|end_header_id|>

You are provided with a context and a statement. Your task is to carefully read the context and then determine whether the statement is TRUE or FALSE. Answer TRUE if the statement is true in its entirety based on the context provided. Answer FALSE if any part of the statement is false based on the context provided.

STORY OF THE DOOR

Mr. Utterson the lawyer was a man of a rugged countenance that was never lighted by a smile; cold, scanty and embarrassed in discourse; backward in sentiment; lean, long, dusty, dreary and yet somehow lovable. At friendly meetings, and when the wine was to his taste, something eminently human beaconed from his eye; something indeed which never found its way into his talk, but which spoke not only in these silent symbols of the after-dinner face, but more often and loudly in the acts of his life. He was austere with himself; drank gin when he was alone, to mortify a taste for vintages; and though he enjoyed the theatre, had not crossed the doors of one for twenty years. But he had an approved tolerance for others; sometimes wondering, almost with envy, at the high pressure of spirits involved in their misdeeds; and in any extremity inclined to help rather than to reprove. “I incline to Cain’s heresy,” he used to say quaintly: “I let my brother go to the devil in his own way.” In this character, it was frequently his fortune to be the last reputable acquaintance and the last good influence in the lives of downgoing men. And to such as these, so long as they came about his chambers, he never marked a shade of change in his demeanour.
No doubt the feat was easy to Mr. Utterson; for he was undemonstrative at the best, and even his friendship seemed to be founded in a similar catholicity of good-nature. It is the mark of a modest man to accept his friendly circle ready-made from the hands of opportunity; and that was the lawyer’s way. His friends were those of his own blood or those whom he had known the longest; his affections, like ivy, were the growth of time, they implied no aptness in the object. Hence, no doubt the bond that united him to Mr. Richard Enfield, his distant kinsman, the well-known man about town. It was a nut to crack for many, what these two could see in each other, or what subject they could find in common. It was reported by those who encountered them in their Sunday walks, that they said nothing, looked singularly dull and would hail with obvious relief the appearance of a friend. For all that, the two men put the greatest store by these excursions, counted them the chief jewel of each week, and not only set aside occasions of pleasure, but even resisted the calls of business, that they might enjoy them uninterrupted. It chanced on one of these rambles that their way led them down a by-street in a busy quarter of London. The street was small and what is called quiet, but it drove a thriving trade on the weekdays. The inhabitants were all doing well, it seemed and all emulously hoping to do better still, and laying out the surplus of their grains in coquetry; so that the shop fronts stood along that thoroughfare with an air of invitation, like rows of smiling saleswomen. Even on Sunday, when it veiled its more florid charms and lay comparatively empty of passage, the street shone out in contrast to its dingy neighbourhood, like a fire in a forest; and with its freshly painted shutters, well-polished brasses, and general cleanliness and gaiety of note, instantly caught and pleased the eye of the passenger. 
Two doors from one corner, on the left hand going east the line was broken by the entry of a court; and just at that point a certain sinister block of building thrust forward its gable on the street. It was two storeys high; showed no window, nothing but a door on the lower storey and a blind forehead of discoloured wall on the upper; and bore in every feature, the marks of prolonged and sordid negligence. The door, which was equipped with neither bell nor knocker, was blistered and distained. Tramps slouched into the recess and struck matches on the panels; children kept shop upon the steps; the schoolboy had tried his knife on the mouldings; and for close on a generation, no one had appeared to drive away these random visitors or to repair their ravages. Mr. Enfield and the lawyer were on the other side of the by-street; but when they came abreast of the entry, the former lifted up his cane and pointed. “Did you ever remark that door?” he asked; and when his companion had replied in the affirmative, “It is connected in my mind,” added he, “with a very odd story.” “Indeed?” said Mr. Utterson, with a slight change of voice, “and what was that?” “Well, it was this way,” returned Mr. Enfield: “I was coming home from some place at the end of the world, about three o’clock of a black winter morning, and my way lay through a part of town where there was literally nothing to be seen but lamps. Street after street and all the folks asleep—street after street, all lighted up as if for a procession and all as empty as a church—till at last I got into that state of mind when a man listens and listens and begins to long for the sight of a policeman. All at once, I saw two figures: one a little man who was stumping along eastward at a good walk, and the other a girl of maybe eight or ten who was running as hard as she was able down a cross street. 
Well, sir, the two ran into one another naturally enough at the corner; and then came the horrible part of the thing; for the man trampled calmly over the child’s body and left her screaming on the ground. It sounds nothing to hear, but it was hellish to see. It wasn’t like a man; it was like some damned Juggernaut. I gave a few halloa, took to my heels, collared my gentleman, and brought him back to where there was already quite a group about the screaming child. He was perfectly cool and made no resistance, but gave me one look, so ugly that it brought out the sweat on me like running. The people who had turned out were the girl’s own family; and pretty soon, the doctor, for whom she had been sent put in his appearance. Well, the child was not much the worse, more frightened, according to the sawbones; and there you might have supposed would be an end to it. But there was one curious circumstance. I had taken a loathing to my gentleman at first sight. So had the child’s family, which was only natural. But the doctor’s case was what struck me. He was the usual cut and dry apothecary, of no particular age and colour, with a strong Edinburgh accent and about as emotional as a bagpipe. Well, sir, he was like the rest of us; every time he looked at my prisoner, I saw that sawbones turn sick and white with the desire to kill him. I knew what was in his mind, just as he knew what was in mine; and killing being out of the question, we did the next best. We told the man we could and would make such a scandal out of this as should make his name stink from one end of London to the other. If he had any friends or any credit, we undertook that he should lose them. And all the time, as we were pitching it in red hot, we were keeping the women off him as best we could for they were as wild as harpies. 
I never saw a circle of such hateful faces; and there was the man in the middle, with a kind of black sneering coolness—frightened too, I could see that—but carrying it off, sir, really like Satan. ‘If you choose to make capital out of this accident,’ said he, ‘I am naturally helpless. No gentleman but wishes to avoid a scene,’ says he. ‘Name your figure.’ Well, we screwed him up to a hundred pounds for the child’s family; he would have clearly liked to stick out; but there was something about the lot of us that meant mischief, and at last he struck. The next thing was to get the money; and where do you think he carried us but to that place with the door?—whipped out a key, went in, and presently came back with the matter of ten pounds in gold and a cheque for the balance on Coutts’s, drawn payable to bearer and signed with a name that I can’t mention, though it’s one of the points of my story, but it was a name at least very well known and often printed. The figure was stiff; but the signature was good for more than that if it was only genuine. I took the liberty of pointing out to my gentleman that the whole business looked apocryphal, and that a man does not, in real life, walk into a cellar door at four in the morning and come out with another man’s cheque for close upon a hundred pounds. But he was quite easy and sneering. ‘Set your mind at rest,’ says he, ‘I will stay with you till the banks open and cash the cheque myself.’ So we all set off, the doctor, and the child’s father, and our friend and myself, and passed the rest of the night in my chambers; and next day, when we had breakfasted, went in a body to the bank. I gave in the cheque myself, and said I had every reason to believe it was a forgery. Not a bit of it. The cheque was genuine.” “Tut-tut!” said Mr. Utterson. “I see you feel as I do,” said Mr. Enfield. “Yes, it’s a bad story. 
For my man was a fellow that nobody could have to do with, a really damnable man; and the person that drew the cheque is the very pink of the proprieties, celebrated too, and (what makes it worse) one of your fellows who do what they call good. Blackmail, I suppose; an honest man paying through the nose for some of the capers of his youth. Black Mail House is what I call the place with the door, in consequence. Though even that, you know, is far from explaining all,” he added, and with the words fell into a vein of musing. From this he was recalled by Mr. Utterson asking rather suddenly: “And you don’t know if the drawer of the cheque lives there?” “A likely place, isn’t it?” returned Mr. Enfield. “But I happen to have noticed his address; he lives in some square or other.” “And you never asked about the—place with the door?” said Mr. Utterson. “No, sir; I had a delicacy,” was the reply. “I feel very strongly about putting questions; it partakes too much of the style of the day of judgment. You start a question, and it’s like starting a stone. You sit quietly on the top of a hill; and away the stone goes, starting others; and presently some bland old bird (the last you would have thought of) is knocked on the head in his own back garden and the family have to change their name. No sir, I make it a rule of mine: the more it looks like Queer Street, the less I ask.” “A very good rule, too,” said the lawyer. “But I have studied the place for myself,” continued Mr. Enfield. “It seems scarcely a house. There is no other door, and nobody goes in or out of that one but, once in a great while, the gentleman of my adventure. There are three windows looking on the court on the first floor; none below; the windows are always shut but they’re clean. And then there is a chimney which is generally smoking; so somebody must live there. 
And yet it’s not so sure; for the buildings are so packed together about the court, that it’s hard to say where one ends and another begins.” The pair walked on again for a while in silence; and then “Enfield,” said Mr. Utterson, “that’s a good rule of yours.” “Yes, I think it is,” returned Enfield. “But for all that,” continued the lawyer, “there’s one point I want to ask. I want to ask the name of that man who walked over the child.” “Well,” said Mr. Enfield, “I can’t see what harm it would do. It was a man of the name of Hyde.” “Hm,” said Mr. Utterson. “What sort of a man is he to see?” “He is not easy to describe. There is something wrong with his appearance; something displeasing, something down-right detestable. I never saw a man I so disliked, and yet I scarce know why. He must be deformed somewhere; he gives a strong feeling of deformity, although I couldn’t specify the point. He’s an extraordinary looking man, and yet I really can name nothing out of the way. No, sir; I can make no hand of it; I can’t describe him. And it’s not want of memory; for I declare I can see him this moment.” Mr. Utterson again walked some way in silence and obviously under a weight of consideration. “You are sure he used a key?” he inquired at last. “My dear sir...” began Enfield, surprised out of himself. “Yes, I know,” said Utterson; “I know it must seem strange. The fact is, if I do not ask you the name of the other party, it is because I know it already. You see, Richard, your tale has gone home. If you have been inexact in any point you had better correct it.” “I think you might have warned me,” returned the other with a touch of sullenness. “But I have been pedantically exact, as you call it. The fellow had a key; and what’s more, he has it still. I saw him use it not a week ago.” Mr. Utterson sighed deeply but said never a word; and the young man presently resumed. “Here is another lesson to say nothing,” said he. “I am ashamed of my long tongue. 
Let us make a bargain never to refer to this again.” “With all my heart,” said the lawyer. “I shake hands on that, Richard.”

SEARCH FOR MR. HYDE

That evening Mr. Utterson came home to his bachelor house in sombre spirits and sat down to dinner without relish. It was his custom of a Sunday, when this meal was over, to sit close by the fire, a volume of some dry divinity on his reading desk, until the clock of the neighbouring church rang out the hour of twelve, when he would go soberly and gratefully to bed. On this night however, as soon as the cloth was taken away, he took up a candle and went into his business room. There he opened his safe, took from the most private part of it a document endorsed on the envelope as Dr. Jekyll’s Will and sat down with a clouded brow to study its contents. The will was holograph, for Mr. Utterson though he took charge of it now that it was made, had refused to lend the least assistance in the making of it; it provided not only that, in case of the decease of Henry Jekyll, M.D., D.C.L., LL.D., F.R.S., etc., all his possessions were to pass into the hands of his “friend and benefactor Edward Hyde,” but that in case of Dr. Jekyll’s “disappearance or unexplained absence for any period exceeding three calendar months,” the said Edward Hyde should step into the said Henry Jekyll’s shoes without further delay and free from any burthen or obligation beyond the payment of a few small sums to the members of the doctor’s household. This document had long been the lawyer’s eyesore. It offended him both as a lawyer and as a lover of the sane and customary sides of life, to whom the fanciful was the immodest. And hitherto it was his ignorance of Mr. Hyde that had swelled his indignation; now, by a sudden turn, it was his knowledge. It was already bad enough when the name was but a name of which he could learn no more.
It was worse when it began to be clothed upon with detestable attributes; and out of the shifting, insubstantial mists that had so long baffled his eye, there leaped up the sudden, definite presentment of a fiend. “I thought it was madness,” he said, as he replaced the obnoxious paper in the safe, “and now I begin to fear it is disgrace.” With that he blew out his candle, put on a greatcoat, and set forth in the direction of Cavendish Square, that citadel of medicine, where his friend, the great Dr. Lanyon, had his house and received his crowding patients. “If anyone knows, it will be Lanyon,” he had thought. The solemn butler knew and welcomed him; he was subjected to no stage of delay, but ushered direct from the door to the dining-room where Dr. Lanyon sat alone over his wine. This was a hearty, healthy, dapper, red-faced gentleman, with a shock of hair prematurely white, and a boisterous and decided manner. At sight of Mr. Utterson, he sprang up from his chair and welcomed him with both hands. The geniality, as was the way of the man, was somewhat theatrical to the eye; but it reposed on genuine feeling. For these two were old friends, old mates both at school and college, both thorough respecters of themselves and of each other, and what does not always follow, men who thoroughly enjoyed each other’s company. After a little rambling talk, the lawyer led up to the subject which so disagreeably preoccupied his mind. “I suppose, Lanyon,” said he, “you and I must be the two oldest friends that Henry Jekyll has?” “I wish the friends were younger,” chuckled Dr. Lanyon. “But I suppose we are. And what of that? I see little of him now.” “Indeed?” said Utterson. “I thought you had a bond of common interest.” “We had,” was the reply. “But it is more than ten years since Henry Jekyll became too fanciful for me.
He began to go wrong, wrong in mind; and though of course I continue to take an interest in him for old sake’s sake, as they say, I see and I have seen devilish little of the man. Such unscientific balderdash,” added the doctor, flushing suddenly purple, “would have estranged Damon and Pythias.” This little spirit of temper was somewhat of a relief to Mr. Utterson. “They have only differed on some point of science,” he thought; and being a man of no scientific passions (except in the matter of conveyancing), he even added: “It is nothing worse than that!” He gave his friend a few seconds to recover his composure, and then approached the question he had come to put. “Did you ever come across a _protégé_ of his—one Hyde?” he asked. “Hyde?” repeated Lanyon. “No. Never heard of him. Since my time.” That was the amount of information that the lawyer carried back with him to the great, dark bed on which he tossed to and fro, until the small hours of the morning began to grow large. It was a night of little ease to his toiling mind, toiling in mere darkness and besieged by questions. Six o’clock struck on the bells of the church that was so conveniently near to Mr. Utterson’s dwelling, and still he was digging at the problem. Hitherto it had touched him on the intellectual side alone; but now his imagination also was engaged, or rather enslaved; and as he lay and tossed in the gross darkness of the night and the curtained room, Mr. Enfield’s tale went by before his mind in a scroll of lighted pictures. He would be aware of the great field of lamps of a nocturnal city; then of the figure of a man walking swiftly; then of a child running from the doctor’s; and then these met, and that human Juggernaut trod the child down and passed on regardless of her screams. 
Or else he would see a room in a rich house, where his friend lay asleep, dreaming and smiling at his dreams; and then the door of that room would be opened, the curtains of the bed plucked apart, the sleeper recalled, and lo! there would stand by his side a figure to whom power was given, and even at that dead hour, he must rise and do its bidding. The figure in these two phases haunted the lawyer all night; and if at any time he dozed over, it was but to see it glide more stealthily through sleeping houses, or move the more swiftly and still the more swiftly, even to dizziness, through wider labyrinths of lamplighted city, and at every street corner crush a child and leave her screaming. And still the figure had no face by which he might know it; even in his dreams, it had no face, or one that baffled him and melted before his eyes; and thus it was that there sprang up and grew apace in the lawyer’s mind a singularly strong, almost an inordinate, curiosity to behold the features of the real Mr. Hyde. If he could but once set eyes on him, he thought the mystery would lighten and perhaps roll altogether away, as was the habit of mysterious things when well examined. He might see a reason for his friend’s strange preference or bondage (call it which you please) and even for the startling clause of the will. At least it would be a face worth seeing: the face of a man who was without bowels of mercy: a face which had but to show itself to raise up, in the mind of the unimpressionable Enfield, a spirit of enduring hatred. From that time forward, Mr. Utterson began to haunt the door in the by-street of shops. In the morning before office hours, at noon when business was plenty and time scarce, at night under the face of the fogged city moon, by all lights and at all hours of solitude or concourse, the lawyer was to be found on his chosen post. “If he be Mr. Hyde,” he had thought, “I shall be Mr. Seek.” And at last his patience was rewarded. 
It was a fine dry night; frost in the air; the streets as clean as a ballroom floor; the lamps, unshaken by any wind, drawing a regular pattern of light and shadow. By ten o’clock, when the shops were closed, the by-street was very solitary and, in spite of the low growl of London from all round, very silent. Small sounds carried far; domestic sounds out of the houses were clearly audible on either side of the roadway; and the rumour of the approach of any passenger preceded him by a long time. Mr. Utterson had been some minutes at his post, when he was aware of an odd light footstep drawing near. In the course of his nightly patrols, he had long grown accustomed to the quaint effect with which the footfalls of a single person, while he is still a great way off, suddenly spring out distinct from the vast hum and clatter of the city. Yet his attention had never before been so sharply and decisively arrested; and it was with a strong, superstitious prevision of success that he withdrew into the entry of the court. The steps drew swiftly nearer, and swelled out suddenly louder as they turned the end of the street. The lawyer, looking forth from the entry, could soon see what manner of man he had to deal with. He was small and very plainly dressed and the look of him, even at that distance, went somehow strongly against the watcher’s inclination. But he made straight for the door, crossing the roadway to save time; and as he came, he drew a key from his pocket like one approaching home. Mr. Utterson stepped out and touched him on the shoulder as he passed. “Mr. Hyde, I think?” Mr. Hyde shrank back with a hissing intake of the breath. But his fear was only momentary; and though he did not look the lawyer in the face, he answered coolly enough: “That is my name. What do you want?” “I see you are going in,” returned the lawyer. “I am an old friend of Dr. Jekyll’s—Mr. 
Utterson of Gaunt Street—you must have heard of my name; and meeting you so conveniently, I thought you might admit me.” “You will not find Dr. Jekyll; he is from home,” replied Mr. Hyde, blowing in the key. And then suddenly, but still without looking up, “How did you know me?” he asked. “On your side,” said Mr. Utterson “will you do me a favour?” “With pleasure,” replied the other. “What shall it be?” “Will you let me see your face?” asked the lawyer. Mr. Hyde appeared to hesitate, and then, as if upon some sudden reflection, fronted about with an air of defiance; and the pair stared at each other pretty fixedly for a few seconds. “Now I shall know you again,” said Mr. Utterson. “It may be useful.” “Yes,” returned Mr. Hyde, “It is as well we have met; and _à propos_, you should have my address.” And he gave a number of a street in Soho. “Good God!” thought Mr. Utterson, “can he, too, have been thinking of the will?” But he kept his feelings to himself and only grunted in acknowledgment of the address. “And now,” said the other, “how did you know me?” “By description,” was the reply. “Whose description?” “We have common friends,” said Mr. Utterson. “Common friends,” echoed Mr. Hyde, a little hoarsely. “Who are they?” “Jekyll, for instance,” said the lawyer. “He never told you,” cried Mr. Hyde, with a flush of anger. “I did not think you would have lied.” “Come,” said Mr. Utterson, “that is not fitting language.” The other snarled aloud into a savage laugh; and the next moment, with extraordinary quickness, he had unlocked the door and disappeared into the house. The lawyer stood awhile when Mr. Hyde had left him, the picture of disquietude. Then he began slowly to mount the street, pausing every step or two and putting his hand to his brow like a man in mental perplexity. The problem he was thus debating as he walked, was one of a class that is rarely solved. Mr. 
Hyde was pale and dwarfish, he gave an impression of deformity without any nameable malformation, he had a displeasing smile, he had borne himself to the lawyer with a sort of murderous mixture of timidity and boldness, and he spoke with a husky, whispering and somewhat broken voice; all these were points against him, but not all of these together could explain the hitherto unknown disgust, loathing and fear with which Mr. Utterson regarded him. “There must be something else,” said the perplexed gentleman. “There _is_ something more, if I could find a name for it. God bless me, the man seems hardly human! Something troglodytic, shall we say? or can it be the old story of Dr. Fell? or is it the mere radiance of a foul soul that thus transpires through, and transfigures, its clay continent? The last, I think; for, O my poor old Harry Jekyll, if ever I read Satan’s signature upon a face, it is on that of your new friend.” Round the corner from the by-street, there was a square of ancient, handsome houses, now for the most part decayed from their high estate and let in flats and chambers to all sorts and conditions of men; map-engravers, architects, shady lawyers and the agents of obscure enterprises. One house, however, second from the corner, was still occupied entire; and at the door of this, which wore a great air of wealth and comfort, though it was now plunged in darkness except for the fanlight, Mr. Utterson stopped and knocked. A well-dressed, elderly servant opened the door. “Is Dr. Jekyll at home, Poole?” asked the lawyer. “I will see, Mr. Utterson,” said Poole, admitting the visitor, as he spoke, into a large, low-roofed, comfortable hall paved with flags, warmed (after the fashion of a country house) by a bright, open fire, and furnished with costly cabinets of oak. “Will you wait here by the fire, sir? or shall I give you a light in the dining-room?” “Here, thank you,” said the lawyer, and he drew near and leaned on the tall fender. 
This hall, in which he was now left alone, was a pet fancy of his friend the doctor’s; and Utterson himself was wont to speak of it as the pleasantest room in London. But tonight there was a shudder in his blood; the face of Hyde sat heavy on his memory; he felt (what was rare with him) a nausea and distaste of life; and in the gloom of his spirits, he seemed to read a menace in the flickering of the firelight on the polished cabinets and the uneasy starting of the shadow on the roof. He was ashamed of his relief, when Poole presently returned to announce that Dr. Jekyll was gone out. “I saw Mr. Hyde go in by the old dissecting room, Poole,” he said. “Is that right, when Dr. Jekyll is from home?” “Quite right, Mr. Utterson, sir,” replied the servant. “Mr. Hyde has a key.” “Your master seems to repose a great deal of trust in that young man, Poole,” resumed the other musingly. “Yes, sir, he does indeed,” said Poole. “We have all orders to obey him.” “I do not think I ever met Mr. Hyde?” asked Utterson. “O, dear no, sir. He never _dines_ here,” replied the butler. “Indeed we see very little of him on this side of the house; he mostly comes and goes by the laboratory.” “Well, good-night, Poole.” “Good-night, Mr. Utterson.” And the lawyer set out homeward with a very heavy heart. “Poor Harry Jekyll,” he thought, “my mind misgives me he is in deep waters! He was wild when he was young; a long while ago to be sure; but in the law of God, there is no statute of limitations. Ay, it must be that; the ghost of some old sin, the cancer of some concealed disgrace: punishment coming, _pede claudo_, years after memory has forgotten and self-love condoned the fault.” And the lawyer, scared by the thought, brooded awhile on his own past, groping in all the corners of memory, lest by chance some Jack-in-the-Box of an old iniquity should leap to light there.
His past was fairly blameless; few men could read the rolls of their life with less apprehension; yet he was humbled to the dust by the many ill things he had done, and raised up again into a sober and fearful gratitude by the many he had come so near to doing yet avoided. And then by a return on his former subject, he conceived a spark of hope. “This Master Hyde, if he were studied,” thought he, “must have secrets of his own; black secrets, by the look of him; secrets compared to which poor Jekyll’s worst would be like sunshine. Things cannot continue as they are. It turns me cold to think of this creature stealing like a thief to Harry’s bedside; poor Harry, what a wakening! And the danger of it; for if this Hyde suspects the existence of the will, he may grow impatient to inherit. Ay, I must put my shoulders to the wheel—if Jekyll will but let me,” he added, “if Jekyll will only let me.” For once more he saw before his mind’s eye, as clear as transparency, the strange clauses of the will.

DR. JEKYLL WAS QUITE AT EASE

A fortnight later, by excellent good fortune, the doctor gave one of his pleasant dinners to some five or six old cronies, all intelligent, reputable men and all judges of good wine; and Mr. Utterson so contrived that he remained behind after the others had departed. This was no new arrangement, but a thing that had befallen many scores of times. Where Utterson was liked, he was liked well. Hosts loved to detain the dry lawyer, when the light-hearted and loose-tongued had already their foot on the threshold; they liked to sit a while in his unobtrusive company, practising for solitude, sobering their minds in the man’s rich silence after the expense and strain of gaiety. To this rule, Dr. Jekyll was no exception; and as he now sat on the opposite side of the fire—a large, well-made, smooth-faced man of fifty, with something of a slyish cast perhaps, but every mark of capacity and kindness—you could see by his looks that he cherished for Mr.
Utterson a sincere and warm affection. “I have been wanting to speak to you, Jekyll,” began the latter. “You know that will of yours?” A close observer might have gathered that the topic was distasteful; but the doctor carried it off gaily. “My poor Utterson,” said he, “you are unfortunate in such a client. I never saw a man so distressed as you were by my will; unless it were that hide-bound pedant, Lanyon, at what he called my scientific heresies. O, I know he’s a good fellow—you needn’t frown—an excellent fellow, and I always mean to see more of him; but a hide-bound pedant for all that; an ignorant, blatant pedant. I was never more disappointed in any man than Lanyon.” “You know I never approved of it,” pursued Utterson, ruthlessly disregarding the fresh topic. “My will? Yes, certainly, I know that,” said the doctor, a trifle sharply. “You have told me so.” “Well, I tell you so again,” continued the lawyer. “I have been learning something of young Hyde.” The large handsome face of Dr. Jekyll grew pale to the very lips, and there came a blackness about his eyes. “I do not care to hear more,” said he. “This is a matter I thought we had agreed to drop.” “What I heard was abominable,” said Utterson. “It can make no change. You do not understand my position,” returned the doctor, with a certain incoherency of manner. “I am painfully situated, Utterson; my position is a very strange—a very strange one. It is one of those affairs that cannot be mended by talking.” “Jekyll,” said Utterson, “you know me: I am a man to be trusted. Make a clean breast of this in confidence; and I make no doubt I can get you out of it.” “My good Utterson,” said the doctor, “this is very good of you, this is downright good of you, and I cannot find words to thank you in. 
I believe you fully; I would trust you before any man alive, ay, before myself, if I could make the choice; but indeed it isn’t what you fancy; it is not as bad as that; and just to put your good heart at rest, I will tell you one thing: the moment I choose, I can be rid of Mr. Hyde. I give you my hand upon that; and I thank you again and again; and I will just add one little word, Utterson, that I’m sure you’ll take in good part: this is a private matter, and I beg of you to let it sleep.” Utterson reflected a little, looking in the fire. “I have no doubt you are perfectly right,” he said at last, getting to his feet. “Well, but since we have touched upon this business, and for the last time I hope,” continued the doctor, “there is one point I should like you to understand. I have really a very great interest in poor Hyde. I know you have seen him; he told me so; and I fear he was rude. But I do sincerely take a great, a very great interest in that young man; and if I am taken away, Utterson, I wish you to promise me that you will bear with him and get his rights for him. I think you would, if you knew all; and it would be a weight off my mind if you would promise.” “I can’t pretend that I shall ever like him,” said the lawyer. “I don’t ask that,” pleaded Jekyll, laying his hand upon the other’s arm; “I only ask for justice; I only ask you to help him for my sake, when I am no longer here.” Utterson heaved an irrepressible sigh. “Well,” said he, “I promise.”

THE CAREW MURDER CASE

Nearly a year later, in the month of October, 18—, London was startled by a crime of singular ferocity and rendered all the more notable by the high position of the victim. The details were few and startling. A maid servant living alone in a house not far from the river, had gone upstairs to bed about eleven.
Although a fog rolled over the city in the small hours, the early part of the night was cloudless, and the lane, which the maid’s window overlooked, was brilliantly lit by the full moon. It seems she was romantically given, for she sat down upon her box, which stood immediately under the window, and fell into a dream of musing. Never (she used to say, with streaming tears, when she narrated that experience), never had she felt more at peace with all men or thought more kindly of the world. And as she so sat she became aware of an aged beautiful gentleman with white hair, drawing near along the lane; and advancing to meet him, another and very small gentleman, to whom at first she paid less attention. When they had come within speech (which was just under the maid’s eyes) the older man bowed and accosted the other with a very pretty manner of politeness. It did not seem as if the subject of his address were of great importance; indeed, from his pointing, it sometimes appeared as if he were only inquiring his way; but the moon shone on his face as he spoke, and the girl was pleased to watch it, it seemed to breathe such an innocent and old-world kindness of disposition, yet with something high too, as of a well-founded self-content. Presently her eye wandered to the other, and she was surprised to recognise in him a certain Mr. Hyde, who had once visited her master and for whom she had conceived a dislike. He had in his hand a heavy cane, with which he was trifling; but he answered never a word, and seemed to listen with an ill-contained impatience. And then all of a sudden he broke out in a great flame of anger, stamping with his foot, brandishing the cane, and carrying on (as the maid described it) like a madman. The old gentleman took a step back, with the air of one very much surprised and a trifle hurt; and at that Mr. Hyde broke out of all bounds and clubbed him to the earth. 
And next moment, with ape-like fury, he was trampling his victim under foot and hailing down a storm of blows, under which the bones were audibly shattered and the body jumped upon the roadway. At the horror of these sights and sounds, the maid fainted. It was two o’clock when she came to herself and called for the police. The murderer was gone long ago; but there lay his victim in the middle of the lane, incredibly mangled. The stick with which the deed had been done, although it was of some rare and very tough and heavy wood, had broken in the middle under the stress of this insensate cruelty; and one splintered half had rolled in the neighbouring gutter—the other, without doubt, had been carried away by the murderer. A purse and gold watch were found upon the victim: but no cards or papers, except a sealed and stamped envelope, which he had been probably carrying to the post, and which bore the name and address of Mr. Utterson. This was brought to the lawyer the next morning, before he was out of bed; and he had no sooner seen it and been told the circumstances, than he shot out a solemn lip. “I shall say nothing till I have seen the body,” said he; “this may be very serious. Have the kindness to wait while I dress.” And with the same grave countenance he hurried through his breakfast and drove to the police station, whither the body had been carried. As soon as he came into the cell, he nodded. “Yes,” said he, “I recognise him. I am sorry to say that this is Sir Danvers Carew.” “Good God, sir,” exclaimed the officer, “is it possible?” And the next moment his eye lighted up with professional ambition. “This will make a deal of noise,” he said. “And perhaps you can help us to the man.” And he briefly narrated what the maid had seen, and showed the broken stick. Mr. 
Utterson had already quailed at the name of Hyde; but when the stick was laid before him, he could doubt no longer; broken and battered as it was, he recognised it for one that he had himself presented many years before to Henry Jekyll. “Is this Mr. Hyde a person of small stature?” he inquired. “Particularly small and particularly wicked-looking, is what the maid calls him,” said the officer. Mr. Utterson reflected; and then, raising his head, “If you will come with me in my cab,” he said, “I think I can take you to his house.” It was by this time about nine in the morning, and the first fog of the season. A great chocolate-coloured pall lowered over heaven, but the wind was continually charging and routing these embattled vapours; so that as the cab crawled from street to street, Mr. Utterson beheld a marvelous number of degrees and hues of twilight; for here it would be dark like the back-end of evening; and there would be a glow of a rich, lurid brown, like the light of some strange conflagration; and here, for a moment, the fog would be quite broken up, and a haggard shaft of daylight would glance in between the swirling wreaths. The dismal quarter of Soho seen under these changing glimpses, with its muddy ways, and slatternly passengers, and its lamps, which had never been extinguished or had been kindled afresh to combat this mournful reinvasion of darkness, seemed, in the lawyer’s eyes, like a district of some city in a nightmare. The thoughts of his mind, besides, were of the gloomiest dye; and when he glanced at the companion of his drive, he was conscious of some touch of that terror of the law and the law’s officers, which may at times assail the most honest. 
As the cab drew up before the address indicated, the fog lifted a little and showed him a dingy street, a gin palace, a low French eating house, a shop for the retail of penny numbers and twopenny salads, many ragged children huddled in the doorways, and many women of many different nationalities passing out, key in hand, to have a morning glass; and the next moment the fog settled down again upon that part, as brown as umber, and cut him off from his blackguardly surroundings. This was the home of Henry Jekyll’s favourite; of a man who was heir to a quarter of a million sterling. An ivory-faced and silvery-haired old woman opened the door. She had an evil face, smoothed by hypocrisy: but her manners were excellent. Yes, she said, this was Mr. Hyde’s, but he was not at home; he had been in that night very late, but he had gone away again in less than an hour; there was nothing strange in that; his habits were very irregular, and he was often absent; for instance, it was nearly two months since she had seen him till yesterday. “Very well, then, we wish to see his rooms,” said the lawyer; and when the woman began to declare it was impossible, “I had better tell you who this person is,” he added. “This is Inspector Newcomen of Scotland Yard.” A flash of odious joy appeared upon the woman’s face. “Ah!” said she, “he is in trouble! What has he done?” Mr. Utterson and the inspector exchanged glances. “He don’t seem a very popular character,” observed the latter. “And now, my good woman, just let me and this gentleman have a look about us.” In the whole extent of the house, which but for the old woman remained otherwise empty, Mr. Hyde had only used a couple of rooms; but these were furnished with luxury and good taste. A closet was filled with wine; the plate was of silver, the napery elegant; a good picture hung upon the walls, a gift (as Utterson supposed) from Henry Jekyll, who was much of a connoisseur; and the carpets were of many plies and agreeable in colour. 
At this moment, however, the rooms bore every mark of having been recently and hurriedly ransacked; clothes lay about the floor, with their pockets inside out; lock-fast drawers stood open; and on the hearth there lay a pile of grey ashes, as though many papers had been burned. From these embers the inspector disinterred the butt end of a green cheque book, which had resisted the action of the fire; the other half of the stick was found behind the door; and as this clinched his suspicions, the officer declared himself delighted. A visit to the bank, where several thousand pounds were found to be lying to the murderer’s credit, completed his gratification. “You may depend upon it, sir,” he told Mr. Utterson: “I have him in my hand. He must have lost his head, or he never would have left the stick or, above all, burned the cheque book. Why, money’s life to the man. We have nothing to do but wait for him at the bank, and get out the handbills.” This last, however, was not so easy of accomplishment; for Mr. Hyde had numbered few familiars—even the master of the servant maid had only seen him twice; his family could nowhere be traced; he had never been photographed; and the few who could describe him differed widely, as common observers will. Only on one point were they agreed; and that was the haunting sense of unexpressed deformity with which the fugitive impressed his beholders.

INCIDENT OF THE LETTER

It was late in the afternoon, when Mr. Utterson found his way to Dr. Jekyll’s door, where he was at once admitted by Poole, and carried down by the kitchen offices and across a yard which had once been a garden, to the building which was indifferently known as the laboratory or dissecting rooms. The doctor had bought the house from the heirs of a celebrated surgeon; and his own tastes being rather chemical than anatomical, had changed the destination of the block at the bottom of the garden.
It was the first time that the lawyer had been received in that part of his friend’s quarters; and he eyed the dingy, windowless structure with curiosity, and gazed round with a distasteful sense of strangeness as he crossed the theatre, once crowded with eager students and now lying gaunt and silent, the tables laden with chemical apparatus, the floor strewn with crates and littered with packing straw, and the light falling dimly through the foggy cupola. At the further end, a flight of stairs mounted to a door covered with red baize; and through this, Mr. Utterson was at last received into the doctor’s cabinet. It was a large room fitted round with glass presses, furnished, among other things, with a cheval-glass and a business table, and looking out upon the court by three dusty windows barred with iron. The fire burned in the grate; a lamp was set lighted on the chimney shelf, for even in the houses the fog began to lie thickly; and there, close up to the warmth, sat Dr. Jekyll, looking deathly sick. He did not rise to meet his visitor, but held out a cold hand and bade him welcome in a changed voice. “And now,” said Mr. Utterson, as soon as Poole had left them, “you have heard the news?” The doctor shuddered. “They were crying it in the square,” he said. “I heard them in my dining-room.” “One word,” said the lawyer. “Carew was my client, but so are you, and I want to know what I am doing. You have not been mad enough to hide this fellow?” “Utterson, I swear to God,” cried the doctor, “I swear to God I will never set eyes on him again. I bind my honour to you that I am done with him in this world. It is all at an end. And indeed he does not want my help; you do not know him as I do; he is safe, he is quite safe; mark my words, he will never more be heard of.” The lawyer listened gloomily; he did not like his friend’s feverish manner. “You seem pretty sure of him,” said he; “and for your sake, I hope you may be right. 
If it came to a trial, your name might appear.” “I am quite sure of him,” replied Jekyll; “I have grounds for certainty that I cannot share with any one. But there is one thing on which you may advise me. I have—I have received a letter; and I am at a loss whether I should show it to the police. I should like to leave it in your hands, Utterson; you would judge wisely, I am sure; I have so great a trust in you.” “You fear, I suppose, that it might lead to his detection?” asked the lawyer. “No,” said the other. “I cannot say that I care what becomes of Hyde; I am quite done with him. I was thinking of my own character, which this hateful business has rather exposed.” Utterson ruminated awhile; he was surprised at his friend’s selfishness, and yet relieved by it. “Well,” said he, at last, “let me see the letter.” The letter was written in an odd, upright hand and signed “Edward Hyde”: and it signified, briefly enough, that the writer’s benefactor, Dr. Jekyll, whom he had long so unworthily repaid for a thousand generosities, need labour under no alarm for his safety, as he had means of escape on which he placed a sure dependence. The lawyer liked this letter well enough; it put a better colour on the intimacy than he had looked for; and he blamed himself for some of his past suspicions. “Have you the envelope?” he asked. “I burned it,” replied Jekyll, “before I thought what I was about. But it bore no postmark. The note was handed in.” “Shall I keep this and sleep upon it?” asked Utterson. “I wish you to judge for me entirely,” was the reply. “I have lost confidence in myself.” “Well, I shall consider,” returned the lawyer. “And now one word more: it was Hyde who dictated the terms in your will about that disappearance?” The doctor seemed seized with a qualm of faintness; he shut his mouth tight and nodded. “I knew it,” said Utterson. “He meant to murder you. 
You had a fine escape.” “I have had what is far more to the purpose,” returned the doctor solemnly: “I have had a lesson—O God, Utterson, what a lesson I have had!” And he covered his face for a moment with his hands. On his way out, the lawyer stopped and had a word or two with Poole. “By the bye,” said he, “there was a letter handed in to-day: what was the messenger like?” But Poole was positive nothing had come except by post; “and only circulars by that,” he added. This news sent off the visitor with his fears renewed. Plainly the letter had come by the laboratory door; possibly, indeed, it had been written in the cabinet; and if that were so, it must be differently judged, and handled with the more caution. The newsboys, as he went, were crying themselves hoarse along the footways: “Special edition. Shocking murder of an M.P.” That was the funeral oration of one friend and client; and he could not help a certain apprehension lest the good name of another should be sucked down in the eddy of the scandal. It was, at least, a ticklish decision that he had to make; and self-reliant as he was by habit, he began to cherish a longing for advice. It was not to be had directly; but perhaps, he thought, it might be fished for. Presently after, he sat on one side of his own hearth, with Mr. Guest, his head clerk, upon the other, and midway between, at a nicely calculated distance from the fire, a bottle of a particular old wine that had long dwelt unsunned in the foundations of his house. The fog still slept on the wing above the drowned city, where the lamps glimmered like carbuncles; and through the muffle and smother of these fallen clouds, the procession of the town’s life was still rolling in through the great arteries with a sound as of a mighty wind. But the room was gay with firelight. 
In the bottle the acids were long ago resolved; the imperial dye had softened with time, as the colour grows richer in stained windows; and the glow of hot autumn afternoons on hillside vineyards, was ready to be set free and to disperse the fogs of London. Insensibly the lawyer melted. There was no man from whom he kept fewer secrets than Mr. Guest; and he was not always sure that he kept as many as he meant. Guest had often been on business to the doctor’s; he knew Poole; he could scarce have failed to hear of Mr. Hyde’s familiarity about the house; he might draw conclusions: was it not as well, then, that he should see a letter which put that mystery to right? and above all since Guest, being a great student and critic of handwriting, would consider the step natural and obliging? The clerk, besides, was a man of counsel; he could scarce read so strange a document without dropping a remark; and by that remark Mr. Utterson might shape his future course. “This is a sad business about Sir Danvers,” he said. “Yes, sir, indeed. It has elicited a great deal of public feeling,” returned Guest. “The man, of course, was mad.” “I should like to hear your views on that,” replied Utterson. “I have a document here in his handwriting; it is between ourselves, for I scarce know what to do about it; it is an ugly business at the best. But there it is; quite in your way: a murderer’s autograph.” Guest’s eyes brightened, and he sat down at once and studied it with passion. “No sir,” he said: “not mad; but it is an odd hand.” “And by all accounts a very odd writer,” added the lawyer. Just then the servant entered with a note. “Is that from Dr. Jekyll, sir?” inquired the clerk. “I thought I knew the writing. Anything private, Mr. Utterson?” “Only an invitation to dinner. Why? Do you want to see it?” “One moment. I thank you, sir;” and the clerk laid the two sheets of paper alongside and sedulously compared their contents. 
“Thank you, sir,” he said at last, returning both; “it’s a very interesting autograph.” There was a pause, during which Mr. Utterson struggled with himself. “Why did you compare them, Guest?” he inquired suddenly. “Well, sir,” returned the clerk, “there’s a rather singular resemblance; the two hands are in many points identical: only differently sloped.” “Rather quaint,” said Utterson. “It is, as you say, rather quaint,” returned Guest. “I wouldn’t speak of this note, you know,” said the master. “No, sir,” said the clerk. “I understand.” But no sooner was Mr. Utterson alone that night, than he locked the note into his safe, where it reposed from that time forward. “What!” he thought. “Henry Jekyll forge for a murderer!” And his blood ran cold in his veins.

INCIDENT OF DR. LANYON

Time ran on; thousands of pounds were offered in reward, for the death of Sir Danvers was resented as a public injury; but Mr. Hyde had disappeared out of the ken of the police as though he had never existed. Much of his past was unearthed, indeed, and all disreputable: tales came out of the man’s cruelty, at once so callous and violent; of his vile life, of his strange associates, of the hatred that seemed to have surrounded his career; but of his present whereabouts, not a whisper. From the time he had left the house in Soho on the morning of the murder, he was simply blotted out; and gradually, as time drew on, Mr. Utterson began to recover from the hotness of his alarm, and to grow more at quiet with himself. The death of Sir Danvers was, to his way of thinking, more than paid for by the disappearance of Mr. Hyde. Now that that evil influence had been withdrawn, a new life began for Dr. Jekyll. He came out of his seclusion, renewed relations with his friends, became once more their familiar guest and entertainer; and whilst he had always been known for charities, he was now no less distinguished for religion. 
He was busy, he was much in the open air, he did good; his face seemed to open and brighten, as if with an inward consciousness of service; and for more than two months, the doctor was at peace. On the 8th of January Utterson had dined at the doctor’s with a small party; Lanyon had been there; and the face of the host had looked from one to the other as in the old days when the trio were inseparable friends. On the 12th, and again on the 14th, the door was shut against the lawyer. “The doctor was confined to the house,” Poole said, “and saw no one.” On the 15th, he tried again, and was again refused; and having now been used for the last two months to see his friend almost daily, he found this return of solitude to weigh upon his spirits. The fifth night he had in Guest to dine with him; and the sixth he betook himself to Dr. Lanyon’s. There at least he was not denied admittance; but when he came in, he was shocked at the change which had taken place in the doctor’s appearance. He had his death-warrant written legibly upon his face. The rosy man had grown pale; his flesh had fallen away; he was visibly balder and older; and yet it was not so much these tokens of a swift physical decay that arrested the lawyer’s notice, as a look in the eye and quality of manner that seemed to testify to some deep-seated terror of the mind. It was unlikely that the doctor should fear death; and yet that was what Utterson was tempted to suspect. “Yes,” he thought; “he is a doctor, he must know his own state and that his days are counted; and the knowledge is more than he can bear.” And yet when Utterson remarked on his ill looks, it was with an air of great firmness that Lanyon declared himself a doomed man. “I have had a shock,” he said, “and I shall never recover. It is a question of weeks. Well, life has been pleasant; I liked it; yes, sir, I used to like it. I sometimes think if we knew all, we should be more glad to get away.” “Jekyll is ill, too,” observed Utterson. 
“Have you seen him?” But Lanyon’s face changed, and he held up a trembling hand. “I wish to see or hear no more of Dr. Jekyll,” he said in a loud, unsteady voice. “I am quite done with that person; and I beg that you will spare me any allusion to one whom I regard as dead.” “Tut, tut!” said Mr. Utterson; and then after a considerable pause, “Can’t I do anything?” he inquired. “We are three very old friends, Lanyon; we shall not live to make others.” “Nothing can be done,” returned Lanyon; “ask himself.” “He will not see me,” said the lawyer. “I am not surprised at that,” was the reply. “Some day, Utterson, after I am dead, you may perhaps come to learn the right and wrong of this. I cannot tell you. And in the meantime, if you can sit and talk with me of other things, for God’s sake, stay and do so; but if you cannot keep clear of this accursed topic, then in God’s name, go, for I cannot bear it.” As soon as he got home, Utterson sat down and wrote to Jekyll, complaining of his exclusion from the house, and asking the cause of this unhappy break with Lanyon; and the next day brought him a long answer, often very pathetically worded, and sometimes darkly mysterious in drift. The quarrel with Lanyon was incurable. “I do not blame our old friend,” Jekyll wrote, “but I share his view that we must never meet. I mean from henceforth to lead a life of extreme seclusion; you must not be surprised, nor must you doubt my friendship, if my door is often shut even to you. You must suffer me to go my own dark way. I have brought on myself a punishment and a danger that I cannot name. If I am the chief of sinners, I am the chief of sufferers also. 
I could not think that this earth contained a place for sufferings and terrors so unmanning; and you can do but one thing, Utterson, to lighten this destiny, and that is to respect my silence.” Utterson was amazed; the dark influence of Hyde had been withdrawn, the doctor had returned to his old tasks and amities; a week ago, the prospect had smiled with every promise of a cheerful and an honoured age; and now in a moment, friendship, and peace of mind, and the whole tenor of his life were wrecked. So great and unprepared a change pointed to madness; but in view of Lanyon’s manner and words, there must lie for it some deeper ground. A week afterwards Dr. Lanyon took to his bed, and in something less than a fortnight he was dead. The night after the funeral, at which he had been sadly affected, Utterson locked the door of his business room, and sitting there by the light of a melancholy candle, drew out and set before him an envelope addressed by the hand and sealed with the seal of his dead friend. “PRIVATE: for the hands of G. J. Utterson ALONE, and in case of his predecease _to be destroyed unread_,” so it was emphatically superscribed; and the lawyer dreaded to behold the contents. “I have buried one friend to-day,” he thought: “what if this should cost me another?” And then he condemned the fear as a disloyalty, and broke the seal. Within there was another enclosure, likewise sealed, and marked upon the cover as “not to be opened till the death or disappearance of Dr. Henry Jekyll.” Utterson could not trust his eyes. Yes, it was disappearance; here again, as in the mad will which he had long ago restored to its author, here again were the idea of a disappearance and the name of Henry Jekyll bracketted. But in the will, that idea had sprung from the sinister suggestion of the man Hyde; it was set there with a purpose all too plain and horrible. Written by the hand of Lanyon, what should it mean? 
A great curiosity came on the trustee, to disregard the prohibition and dive at once to the bottom of these mysteries; but professional honour and faith to his dead friend were stringent obligations; and the packet slept in the inmost corner of his private safe. It is one thing to mortify curiosity, another to conquer it; and it may be doubted if, from that day forth, Utterson desired the society of his surviving friend with the same eagerness. He thought of him kindly; but his thoughts were disquieted and fearful. He went to call indeed; but he was perhaps relieved to be denied admittance; perhaps, in his heart, he preferred to speak with Poole upon the doorstep and surrounded by the air and sounds of the open city, rather than to be admitted into that house of voluntary bondage, and to sit and speak with its inscrutable recluse. Poole had, indeed, no very pleasant news to communicate. The doctor, it appeared, now more than ever confined himself to the cabinet over the laboratory, where he would sometimes even sleep; he was out of spirits, he had grown very silent, he did not read; it seemed as if he had something on his mind. Utterson became so used to the unvarying character of these reports, that he fell off little by little in the frequency of his visits.

INCIDENT AT THE WINDOW

It chanced on Sunday, when Mr. Utterson was on his usual walk with Mr. Enfield, that their way lay once again through the by-street; and that when they came in front of the door, both stopped to gaze on it. “Well,” said Enfield, “that story’s at an end at least. We shall never see more of Mr. Hyde.” “I hope not,” said Utterson. “Did I ever tell you that I once saw him, and shared your feeling of repulsion?” “It was impossible to do the one without the other,” returned Enfield. “And by the way, what an ass you must have thought me, not to know that this was a back way to Dr. Jekyll’s! 
It was partly your own fault that I found it out, even when I did.” “So you found it out, did you?” said Utterson. “But if that be so, we may step into the court and take a look at the windows. To tell you the truth, I am uneasy about poor Jekyll; and even outside, I feel as if the presence of a friend might do him good.” The court was very cool and a little damp, and full of premature twilight, although the sky, high up overhead, was still bright with sunset. The middle one of the three windows was half-way open; and sitting close beside it, taking the air with an infinite sadness of mien, like some disconsolate prisoner, Utterson saw Dr. Jekyll. “What! Jekyll!” he cried. “I trust you are better.” “I am very low, Utterson,” replied the doctor drearily, “very low. It will not last long, thank God.” “You stay too much indoors,” said the lawyer. “You should be out, whipping up the circulation like Mr. Enfield and me. (This is my cousin—Mr. Enfield—Dr. Jekyll.) Come now; get your hat and take a quick turn with us.” “You are very good,” sighed the other. “I should like to very much; but no, no, no, it is quite impossible; I dare not. But indeed, Utterson, I am very glad to see you; this is really a great pleasure; I would ask you and Mr. Enfield up, but the place is really not fit.” “Why, then,” said the lawyer, good-naturedly, “the best thing we can do is to stay down here and speak with you from where we are.” “That is just what I was about to venture to propose,” returned the doctor with a smile. But the words were hardly uttered, before the smile was struck out of his face and succeeded by an expression of such abject terror and despair, as froze the very blood of the two gentlemen below. They saw it but for a glimpse, for the window was instantly thrust down; but that glimpse had been sufficient, and they turned and left the court without a word. 
In silence, too, they traversed the by-street; and it was not until they had come into a neighbouring thoroughfare, where even upon a Sunday there were still some stirrings of life, that Mr. Utterson at last turned and looked at his companion. They were both pale; and there was an answering horror in their eyes. “God forgive us, God forgive us,” said Mr. Utterson. But Mr. Enfield only nodded his head very seriously, and walked on once more in silence.

THE LAST NIGHT

Mr. Utterson was sitting by his fireside one evening after dinner, when he was surprised to receive a visit from Poole. “Bless me, Poole, what brings you here?” he cried; and then taking a second look at him, “What ails you?” he added; “is the doctor ill?” “Mr. Utterson,” said the man, “there is something wrong.” “Take a seat, and here is a glass of wine for you,” said the lawyer. “Now, take your time, and tell me plainly what you want.” “You know the doctor’s ways, sir,” replied Poole, “and how he shuts himself up. Well, he’s shut up again in the cabinet; and I don’t like it, sir—I wish I may die if I like it. Mr. Utterson, sir, I’m afraid.” “Now, my good man,” said the lawyer, “be explicit. What are you afraid of?” “I’ve been afraid for about a week,” returned Poole, doggedly disregarding the question, “and I can bear it no more.” The man’s appearance amply bore out his words; his manner was altered for the worse; and except for the moment when he had first announced his terror, he had not once looked the lawyer in the face. Even now, he sat with the glass of wine untasted on his knee, and his eyes directed to a corner of the floor. “I can bear it no more,” he repeated. “Come,” said the lawyer, “I see you have some good reason, Poole; I see there is something seriously amiss. Try to tell me what it is.” “I think there’s been foul play,” said Poole, hoarsely. “Foul play!” cried the lawyer, a good deal frightened and rather inclined to be irritated in consequence. “What foul play! 
What does the man mean?” “I daren’t say, sir,” was the answer; “but will you come along with me and see for yourself?” Mr. Utterson’s only answer was to rise and get his hat and greatcoat; but he observed with wonder the greatness of the relief that appeared upon the butler’s face, and perhaps with no less, that the wine was still untasted when he set it down to follow. It was a wild, cold, seasonable night of March, with a pale moon, lying on her back as though the wind had tilted her, and flying wrack of the most diaphanous and lawny texture. The wind made talking difficult, and flecked the blood into the face. It seemed to have swept the streets unusually bare of passengers, besides; for Mr. Utterson thought he had never seen that part of London so deserted. He could have wished it otherwise; never in his life had he been conscious of so sharp a wish to see and touch his fellow-creatures; for struggle as he might, there was borne in upon his mind a crushing anticipation of calamity. The square, when they got there, was full of wind and dust, and the thin trees in the garden were lashing themselves along the railing. Poole, who had kept all the way a pace or two ahead, now pulled up in the middle of the pavement, and in spite of the biting weather, took off his hat and mopped his brow with a red pocket-handkerchief. But for all the hurry of his coming, these were not the dews of exertion that he wiped away, but the moisture of some strangling anguish; for his face was white and his voice, when he spoke, harsh and broken. “Well, sir,” he said, “here we are, and God grant there be nothing wrong.” “Amen, Poole,” said the lawyer. Thereupon the servant knocked in a very guarded manner; the door was opened on the chain; and a voice asked from within, “Is that you, Poole?” “It’s all right,” said Poole. 
“Open the door.” The hall, when they entered it, was brightly lighted up; the fire was built high; and about the hearth the whole of the servants, men and women, stood huddled together like a flock of sheep. At the sight of Mr. Utterson, the housemaid broke into hysterical whimpering; and the cook, crying out “Bless God! it’s Mr. Utterson,” ran forward as if to take him in her arms. “What, what? Are you all here?” said the lawyer peevishly. “Very irregular, very unseemly; your master would be far from pleased.” “They’re all afraid,” said Poole. Blank silence followed, no one protesting; only the maid lifted her voice and now wept loudly. “Hold your tongue!” Poole said to her, with a ferocity of accent that testified to his own jangled nerves; and indeed, when the girl had so suddenly raised the note of her lamentation, they had all started and turned towards the inner door with faces of dreadful expectation. “And now,” continued the butler, addressing the knife-boy, “reach me a candle, and we’ll get this through hands at once.” And then he begged Mr. Utterson to follow him, and led the way to the back garden. “Now, sir,” said he, “you come as gently as you can. I want you to hear, and I don’t want you to be heard. And see here, sir, if by any chance he was to ask you in, don’t go.” Mr. Utterson’s nerves, at this unlooked-for termination, gave a jerk that nearly threw him from his balance; but he recollected his courage and followed the butler into the laboratory building through the surgical theatre, with its lumber of crates and bottles, to the foot of the stair. Here Poole motioned him to stand on one side and listen; while he himself, setting down the candle and making a great and obvious call on his resolution, mounted the steps and knocked with a somewhat uncertain hand on the red baize of the cabinet door. “Mr. Utterson, sir, asking to see you,” he called; and even as he did so, once more violently signed to the lawyer to give ear. 
A voice answered from within: “Tell him I cannot see anyone,” it said complainingly. “Thank you, sir,” said Poole, with a note of something like triumph in his voice; and taking up his candle, he led Mr. Utterson back across the yard and into the great kitchen, where the fire was out and the beetles were leaping on the floor. “Sir,” he said, looking Mr. Utterson in the eyes, “Was that my master’s voice?” “It seems much changed,” replied the lawyer, very pale, but giving look for look. “Changed? Well, yes, I think so,” said the butler. “Have I been twenty years in this man’s house, to be deceived about his voice? No, sir; master’s made away with; he was made away with eight days ago, when we heard him cry out upon the name of God; and _who’s_ in there instead of him, and _why_ it stays there, is a thing that cries to Heaven, Mr. Utterson!” “This is a very strange tale, Poole; this is rather a wild tale my man,” said Mr. Utterson, biting his finger. “Suppose it were as you suppose, supposing Dr. Jekyll to have been—well, murdered, what could induce the murderer to stay? That won’t hold water; it doesn’t commend itself to reason.” “Well, Mr. Utterson, you are a hard man to satisfy, but I’ll do it yet,” said Poole. “All this last week (you must know) him, or it, whatever it is that lives in that cabinet, has been crying night and day for some sort of medicine and cannot get it to his mind. It was sometimes his way—the master’s, that is—to write his orders on a sheet of paper and throw it on the stair. We’ve had nothing else this week back; nothing but papers, and a closed door, and the very meals left there to be smuggled in when nobody was looking. Well, sir, every day, ay, and twice and thrice in the same day, there have been orders and complaints, and I have been sent flying to all the wholesale chemists in town. Every time I brought the stuff back, there would be another paper telling me to return it, because it was not pure, and another order to a different firm. 
This drug is wanted bitter bad, sir, whatever for.” “Have you any of these papers?” asked Mr. Utterson. Poole felt in his pocket and handed out a crumpled note, which the lawyer, bending nearer to the candle, carefully examined. Its contents ran thus: “Dr. Jekyll presents his compliments to Messrs. Maw. He assures them that their last sample is impure and quite useless for his present purpose. In the year 18—, Dr. J. purchased a somewhat large quantity from Messrs. M. He now begs them to search with most sedulous care, and should any of the same quality be left, forward it to him at once. Expense is no consideration. The importance of this to Dr. J. can hardly be exaggerated.” So far the letter had run composedly enough, but here with a sudden splutter of the pen, the writer’s emotion had broken loose. “For God’s sake,” he added, “find me some of the old.” “This is a strange note,” said Mr. Utterson; and then sharply, “How do you come to have it open?” “The man at Maw’s was main angry, sir, and he threw it back to me like so much dirt,” returned Poole. “This is unquestionably the doctor’s hand, do you know?” resumed the lawyer. “I thought it looked like it,” said the servant rather sulkily; and then, with another voice, “But what matters hand of write?” he said. “I’ve seen him!” “Seen him?” repeated Mr. Utterson. “Well?” “That’s it!” said Poole. “It was this way. I came suddenly into the theatre from the garden. It seems he had slipped out to look for this drug or whatever it is; for the cabinet door was open, and there he was at the far end of the room digging among the crates. He looked up when I came in, gave a kind of cry, and whipped upstairs into the cabinet. It was but for one minute that I saw him, but the hair stood upon my head like quills. Sir, if that was my master, why had he a mask upon his face? If it was my master, why did he cry out like a rat, and run from me? I have served him long enough. 
And then...” The man paused and passed his hand over his face. “These are all very strange circumstances,” said Mr. Utterson, “but I think I begin to see daylight. Your master, Poole, is plainly seized with one of those maladies that both torture and deform the sufferer; hence, for aught I know, the alteration of his voice; hence the mask and the avoidance of his friends; hence his eagerness to find this drug, by means of which the poor soul retains some hope of ultimate recovery—God grant that he be not deceived! There is my explanation; it is sad enough, Poole, ay, and appalling to consider; but it is plain and natural, hangs well together, and delivers us from all exorbitant alarms.” “Sir,” said the butler, turning to a sort of mottled pallor, “that thing was not my master, and there’s the truth. My master”—here he looked round him and began to whisper—“is a tall, fine build of a man, and this was more of a dwarf.” Utterson attempted to protest. “O, sir,” cried Poole, “do you think I do not know my master after twenty years? Do you think I do not know where his head comes to in the cabinet door, where I saw him every morning of my life? No, sir, that thing in the mask was never Dr. Jekyll—God knows what it was, but it was never Dr. Jekyll; and it is the belief of my heart that there was murder done.” “Poole,” replied the lawyer, “if you say that, it will become my duty to make certain. Much as I desire to spare your master’s feelings, much as I am puzzled by this note which seems to prove him to be still alive, I shall consider it my duty to break in that door.” “Ah, Mr. Utterson, that’s talking!” cried the butler. “And now comes the second question,” resumed Utterson: “Who is going to do it?” “Why, you and me, sir,” was the undaunted reply. 
“That’s very well said,” returned the lawyer; “and whatever comes of it, I shall make it my business to see you are no loser.” “There is an axe in the theatre,” continued Poole; “and you might take the kitchen poker for yourself.” The lawyer took that rude but weighty instrument into his hand, and balanced it. “Do you know, Poole,” he said, looking up, “that you and I are about to place ourselves in a position of some peril?” “You may say so, sir, indeed,” returned the butler. “It is well, then that we should be frank,” said the other. “We both think more than we have said; let us make a clean breast. This masked figure that you saw, did you recognise it?” “Well, sir, it went so quick, and the creature was so doubled up, that I could hardly swear to that,” was the answer. “But if you mean, was it Mr. Hyde?—why, yes, I think it was! You see, it was much of the same bigness; and it had the same quick, light way with it; and then who else could have got in by the laboratory door? You have not forgot, sir, that at the time of the murder he had still the key with him? But that’s not all. I don’t know, Mr. Utterson, if you ever met this Mr. Hyde?” “Yes,” said the lawyer, “I once spoke with him.” “Then you must know as well as the rest of us that there was something queer about that gentleman—something that gave a man a turn—I don’t know rightly how to say it, sir, beyond this: that you felt in your marrow kind of cold and thin.” “I own I felt something of what you describe,” said Mr. Utterson. “Quite so, sir,” returned Poole. “Well, when that masked thing like a monkey jumped from among the chemicals and whipped into the cabinet, it went down my spine like ice. O, I know it’s not evidence, Mr. Utterson; I’m book-learned enough for that; but a man has his feelings, and I give you my bible-word it was Mr. Hyde!” “Ay, ay,” said the lawyer. “My fears incline to the same point. Evil, I fear, founded—evil was sure to come—of that connection. 
Ay truly, I believe you; I believe poor Harry is killed; and I believe his murderer (for what purpose, God alone can tell) is still lurking in his victim’s room. Well, let our name be vengeance. Call Bradshaw.”

The footman came at the summons, very white and nervous.

“Pull yourself together, Bradshaw,” said the lawyer. “This suspense, I know, is telling upon all of you; but it is now our intention to make an end of it. Poole, here, and I are going to force our way into the cabinet. If all is well, my shoulders are broad enough to bear the blame. Meanwhile, lest anything should really be amiss, or any malefactor seek to escape by the back, you and the boy must go round the corner with a pair of good sticks and take your post at the laboratory door. We give you ten minutes to get to your stations.”

As Bradshaw left, the lawyer looked at his watch. “And now, Poole, let us get to ours,” he said; and taking the poker under his arm, led the way into the yard. The scud had banked over the moon, and it was now quite dark. The wind, which only broke in puffs and draughts into that deep well of building, tossed the light of the candle to and fro about their steps, until they came into the shelter of the theatre, where they sat down silently to wait. London hummed solemnly all around; but nearer at hand, the stillness was only broken by the sounds of a footfall moving to and fro along the cabinet floor.

“So it will walk all day, sir,” whispered Poole; “ay, and the better part of the night. Only when a new sample comes from the chemist, there’s a bit of a break. Ah, it’s an ill conscience that’s such an enemy to rest! Ah, sir, there’s blood foully shed in every step of it! But hark again, a little closer—put your heart in your ears, Mr. Utterson, and tell me, is that the doctor’s foot?”

The steps fell lightly and oddly, with a certain swing, for all they went so slowly; it was different indeed from the heavy creaking tread of Henry Jekyll. Utterson sighed.
“Is there never anything else?” he asked.

Poole nodded. “Once,” he said. “Once I heard it weeping!”

“Weeping? how that?” said the lawyer, conscious of a sudden chill of horror.

“Weeping like a woman or a lost soul,” said the butler. “I came away with that upon my heart, that I could have wept too.”

But now the ten minutes drew to an end. Poole disinterred the axe from under a stack of packing straw; the candle was set upon the nearest table to light them to the attack; and they drew near with bated breath to where that patient foot was still going up and down, up and down, in the quiet of the night.

“Jekyll,” cried Utterson, with a loud voice, “I demand to see you.” He paused a moment, but there came no reply. “I give you fair warning, our suspicions are aroused, and I must and shall see you,” he resumed; “if not by fair means, then by foul—if not of your consent, then by brute force!”

“Utterson,” said the voice, “for God’s sake, have mercy!”

“Ah, that’s not Jekyll’s voice—it’s Hyde’s!” cried Utterson. “Down with the door, Poole!”

Poole swung the axe over his shoulder; the blow shook the building, and the red baize door leaped against the lock and hinges. A dismal screech, as of mere animal terror, rang from the cabinet. Up went the axe again, and again the panels crashed and the frame bounded; four times the blow fell; but the wood was tough and the fittings were of excellent workmanship; and it was not until the fifth, that the lock burst and the wreck of the door fell inwards on the carpet. The besiegers, appalled by their own riot and the stillness that had succeeded, stood back a little and peered in.
There lay the cabinet before their eyes in the quiet lamplight, a good fire glowing and chattering on the hearth, the kettle singing its thin strain, a drawer or two open, papers neatly set forth on the business table, and nearer the fire, the things laid out for tea; the quietest room, you would have said, and, but for the glazed presses full of chemicals, the most commonplace that night in London. Right in the middle there lay the body of a man sorely contorted and still twitching. They drew near on tiptoe, turned it on its back and beheld the face of Edward Hyde. He was dressed in clothes far too large for him, clothes of the doctor’s bigness; the cords of his face still moved with a semblance of life, but life was quite gone; and by the crushed phial in the hand and the strong smell of kernels that hung upon the air, Utterson knew that he was looking on the body of a self-destroyer.

“We have come too late,” he said sternly, “whether to save or punish. Hyde is gone to his account; and it only remains for us to find the body of your master.”

The far greater proportion of the building was occupied by the theatre, which filled almost the whole ground storey and was lighted from above, and by the cabinet, which formed an upper storey at one end and looked upon the court. A corridor joined the theatre to the door on the by-street; and with this the cabinet communicated separately by a second flight of stairs. There were besides a few dark closets and a spacious cellar. All these they now thoroughly examined. Each closet needed but a glance, for all were empty, and all, by the dust that fell from their doors, had stood long unopened. The cellar, indeed, was filled with crazy lumber, mostly dating from the times of the surgeon who was Jekyll’s predecessor; but even as they opened the door they were advertised of the uselessness of further search, by the fall of a perfect mat of cobweb which had for years sealed up the entrance.
Nowhere was there any trace of Henry Jekyll, dead or alive. Poole stamped on the flags of the corridor.

“He must be buried here,” he said, hearkening to the sound.

“Or he may have fled,” said Utterson, and he turned to examine the door in the by-street. It was locked; and lying near by on the flags, they found the key, already stained with rust.

“This does not look like use,” observed the lawyer.

“Use!” echoed Poole. “Do you not see, sir, it is broken? much as if a man had stamped on it.”

“Ay,” continued Utterson, “and the fractures, too, are rusty.” The two men looked at each other with a scare. “This is beyond me, Poole,” said the lawyer. “Let us go back to the cabinet.”

They mounted the stair in silence, and still with an occasional awestruck glance at the dead body, proceeded more thoroughly to examine the contents of the cabinet. At one table, there were traces of chemical work, various measured heaps of some white salt being laid on glass saucers, as though for an experiment in which the unhappy man had been prevented.

“That is the same drug that I was always bringing him,” said Poole; and even as he spoke, the kettle with a startling noise boiled over.

This brought them to the fireside, where the easy-chair was drawn cosily up, and the tea things stood ready to the sitter’s elbow, the very sugar in the cup. There were several books on a shelf; one lay beside the tea things open, and Utterson was amazed to find it a copy of a pious work, for which Jekyll had several times expressed a great esteem, annotated, in his own hand with startling blasphemies. Next, in the course of their review of the chamber, the searchers came to the cheval-glass, into whose depths they looked with an involuntary horror. But it was so turned as to show them nothing but the rosy glow playing on the roof, the fire sparkling in a hundred repetitions along the glazed front of the presses, and their own pale and fearful countenances stooping to look in.
“This glass has seen some strange things, sir,” whispered Poole.

“And surely none stranger than itself,” echoed the lawyer in the same tones. “For what did Jekyll”—he caught himself up at the word with a start, and then conquering the weakness—“what could Jekyll want with it?” he said.

“You may say that!” said Poole.

Next they turned to the business table. On the desk, among the neat array of papers, a large envelope was uppermost, and bore, in the doctor’s hand, the name of Mr. Utterson. The lawyer unsealed it, and several enclosures fell to the floor. The first was a will, drawn in the same eccentric terms as the one which he had returned six months before, to serve as a testament in case of death and as a deed of gift in case of disappearance; but in place of the name of Edward Hyde, the lawyer, with indescribable amazement read the name of Gabriel John Utterson. He looked at Poole, and then back at the paper, and last of all at the dead malefactor stretched upon the carpet.

“My head goes round,” he said. “He has been all these days in possession; he had no cause to like me; he must have raged to see himself displaced; and he has not destroyed this document.”

He caught up the next paper; it was a brief note in the doctor’s hand and dated at the top. “O Poole!” the lawyer cried, “he was alive and here this day. He cannot have been disposed of in so short a space; he must be still alive, he must have fled! And then, why fled? and how? and in that case, can we venture to declare this suicide? O, we must be careful. I foresee that we may yet involve your master in some dire catastrophe.”

“Why don’t you read it, sir?” asked Poole.

“Because I fear,” replied the lawyer solemnly.
“God grant I have no cause for it!” And with that he brought the paper to his eyes and read as follows:

“My dear Utterson,—When this shall fall into your hands, I shall have disappeared, under what circumstances I have not the penetration to foresee, but my instinct and all the circumstances of my nameless situation tell me that the end is sure and must be early. Go then, and first read the narrative which Lanyon warned me he was to place in your hands; and if you care to hear more, turn to the confession of

“Your unworthy and unhappy friend,

“HENRY JEKYLL.”

“There was a third enclosure?” asked Utterson.

“Here, sir,” said Poole, and gave into his hands a considerable packet sealed in several places.

The lawyer put it in his pocket. “I would say nothing of this paper. If your master has fled or is dead, we may at least save his credit. It is now ten; I must go home and read these documents in quiet; but I shall be back before midnight, when we shall send for the police.”

They went out, locking the door of the theatre behind them; and Utterson, once more leaving the servants gathered about the fire in the hall, trudged back to his office to read the two narratives in which this mystery was now to be explained.

DR. LANYON’S NARRATIVE

On the ninth of January, now four days ago, I received by the evening delivery a registered envelope, addressed in the hand of my colleague and old school companion, Henry Jekyll. I was a good deal surprised by this; for we were by no means in the habit of correspondence; I had seen the man, dined with him, indeed, the night before; and I could imagine nothing in our intercourse that should justify formality of registration. The contents increased my wonder; for this is how the letter ran:

“10th December, 18—.

“Dear Lanyon,—You are one of my oldest friends; and although we may have differed at times on scientific questions, I cannot remember, at least on my side, any break in our affection.
There was never a day when, if you had said to me, ‘Jekyll, my life, my honour, my reason, depend upon you,’ I would not have sacrificed my left hand to help you. Lanyon, my life, my honour, my reason, are all at your mercy; if you fail me to-night, I am lost. You might suppose, after this preface, that I am going to ask you for something dishonourable to grant. Judge for yourself.

“I want you to postpone all other engagements for to-night—ay, even if you were summoned to the bedside of an emperor; to take a cab, unless your carriage should be actually at the door; and with this letter in your hand for consultation, to drive straight to my house. Poole, my butler, has his orders; you will find him waiting your arrival with a locksmith. The door of my cabinet is then to be forced; and you are to go in alone; to open the glazed press (letter E) on the left hand, breaking the lock if it be shut; and to draw out, _with all its contents as they stand_, the fourth drawer from the top or (which is the same thing) the third from the bottom. In my extreme distress of mind, I have a morbid fear of misdirecting you; but even if I am in error, you may know the right drawer by its contents: some powders, a phial and a paper book. This drawer I beg of you to carry back with you to Cavendish Square exactly as it stands.

“That is the first part of the service: now for the second. You should be back, if you set out at once on the receipt of this, long before midnight; but I will leave you that amount of margin, not only in the fear of one of those obstacles that can neither be prevented nor foreseen, but because an hour when your servants are in bed is to be preferred for what will then remain to do. At midnight, then, I have to ask you to be alone in your consulting room, to admit with your own hand into the house a man who will present himself in my name, and to place in his hands the drawer that you will have brought with you from my cabinet.
Then you will have played your part and earned my gratitude completely. Five minutes afterwards, if you insist upon an explanation, you will have understood that these arrangements are of capital importance; and that by the neglect of one of them, fantastic as they must appear, you might have charged your conscience with my death or the shipwreck of my reason.

“Confident as I am that you will not trifle with this appeal, my heart sinks and my hand trembles at the bare thought of such a possibility. Think of me at this hour, in a strange place, labouring under a blackness of distress that no fancy can exaggerate, and yet well aware that, if you will but punctually serve me, my troubles will roll away like a story that is told. Serve me, my dear Lanyon and save

“Your friend,

“H.J.

“P.S.—I had already sealed this up when a fresh terror struck upon my soul. It is possible that the post-office may fail me, and this letter not come into your hands until to-morrow morning. In that case, dear Lanyon, do my errand when it shall be most convenient for you in the course of the day; and once more expect my messenger at midnight. It may then already be too late; and if that night passes without event, you will know that you have seen the last of Henry Jekyll.”

Upon the reading of this letter, I made sure my colleague was insane; but till that was proved beyond the possibility of doubt, I felt bound to do as he requested. The less I understood of this farrago, the less I was in a position to judge of its importance; and an appeal so worded could not be set aside without a grave responsibility. I rose accordingly from table, got into a hansom, and drove straight to Jekyll’s house. The butler was awaiting my arrival; he had received by the same post as mine a registered letter of instruction, and had sent at once for a locksmith and a carpenter. The tradesmen came while we were yet speaking; and we moved in a body to old Dr.
Denman’s surgical theatre, from which (as you are doubtless aware) Jekyll’s private cabinet is most conveniently entered. The door was very strong, the lock excellent; the carpenter avowed he would have great trouble and have to do much damage, if force were to be used; and the locksmith was near despair. But this last was a handy fellow, and after two hours’ work, the door stood open. The press marked E was unlocked; and I took out the drawer, had it filled up with straw and tied in a sheet, and returned with it to Cavendish Square.

Here I proceeded to examine its contents. The powders were neatly enough made up, but not with the nicety of the dispensing chemist; so that it was plain they were of Jekyll’s private manufacture; and when I opened one of the wrappers I found what seemed to me a simple crystalline salt of a white colour. The phial, to which I next turned my attention, might have been about half full of a blood-red liquor, which was highly pungent to the sense of smell and seemed to me to contain phosphorus and some volatile ether. At the other ingredients I could make no guess. The book was an ordinary version book and contained little but a series of dates. These covered a period of many years, but I observed that the entries ceased nearly a year ago and quite abruptly. Here and there a brief remark was appended to a date, usually no more than a single word: “double” occurring perhaps six times in a total of several hundred entries; and once very early in the list and followed by several marks of exclamation, “total failure!!!”

All this, though it whetted my curiosity, told me little that was definite. Here were a phial of some salt, and the record of a series of experiments that had led (like too many of Jekyll’s investigations) to no end of practical usefulness. How could the presence of these articles in my house affect either the honour, the sanity, or the life of my flighty colleague?
If his messenger could go to one place, why could he not go to another? And even granting some impediment, why was this gentleman to be received by me in secret? The more I reflected the more convinced I grew that I was dealing with a case of cerebral disease; and though I dismissed my servants to bed, I loaded an old revolver, that I might be found in some posture of self-defence.

Twelve o’clock had scarce rung out over London, ere the knocker sounded very gently on the door. I went myself at the summons, and found a small man crouching against the pillars of the portico.

“Are you come from Dr. Jekyll?” I asked.

He told me “yes” by a constrained gesture; and when I had bidden him enter, he did not obey me without a searching backward glance into the darkness of the square. There was a policeman not far off, advancing with his bull’s eye open; and at the sight, I thought my visitor started and made greater haste.

These particulars struck me, I confess, disagreeably; and as I followed him into the bright light of the consulting room, I kept my hand ready on my weapon. Here, at last, I had a chance of clearly seeing him. I had never set eyes on him before, so much was certain. He was small, as I have said; I was struck besides with the shocking expression of his face, with his remarkable combination of great muscular activity and great apparent debility of constitution, and—last but not least—with the odd, subjective disturbance caused by his neighbourhood. This bore some resemblance to incipient rigour, and was accompanied by a marked sinking of the pulse. At the time, I set it down to some idiosyncratic, personal distaste, and merely wondered at the acuteness of the symptoms; but I have since had reason to believe the cause to lie much deeper in the nature of man, and to turn on some nobler hinge than the principle of hatred.
This person (who had thus, from the first moment of his entrance, struck in me what I can only describe as a disgustful curiosity) was dressed in a fashion that would have made an ordinary person laughable; his clothes, that is to say, although they were of rich and sober fabric, were enormously too large for him in every measurement—the trousers hanging on his legs and rolled up to keep them from the ground, the waist of the coat below his haunches, and the collar sprawling wide upon his shoulders. Strange to relate, this ludicrous accoutrement was far from moving me to laughter. Rather, as there was something abnormal and misbegotten in the very essence of the creature that now faced me—something seizing, surprising and revolting—this fresh disparity seemed but to fit in with and to reinforce it; so that to my interest in the man’s nature and character, there was added a curiosity as to his origin, his life, his fortune and status in the world.

These observations, though they have taken so great a space to be set down in, were yet the work of a few seconds. My visitor was, indeed, on fire with sombre excitement.

“Have you got it?” he cried. “Have you got it?” And so lively was his impatience that he even laid his hand upon my arm and sought to shake me.

I put him back, conscious at his touch of a certain icy pang along my blood. “Come, sir,” said I. “You forget that I have not yet the pleasure of your acquaintance. Be seated, if you please.” And I showed him an example, and sat down myself in my customary seat and with as fair an imitation of my ordinary manner to a patient, as the lateness of the hour, the nature of my preoccupations, and the horror I had of my visitor, would suffer me to muster.

“I beg your pardon, Dr. Lanyon,” he replied civilly enough. “What you say is very well founded; and my impatience has shown its heels to my politeness. I come here at the instance of your colleague, Dr.
Henry Jekyll, on a piece of business of some moment; and I understood...” He paused and put his hand to his throat, and I could see, in spite of his collected manner, that he was wrestling against the approaches of the hysteria—“I understood, a drawer...”

But here I took pity on my visitor’s suspense, and some perhaps on my own growing curiosity.

“There it is, sir,” said I, pointing to the drawer, where it lay on the floor behind a table and still covered with the sheet.

He sprang to it, and then paused, and laid his hand upon his heart; I could hear his teeth grate with the convulsive action of his jaws; and his face was so ghastly to see that I grew alarmed both for his life and reason.

“Compose yourself,” said I.

He turned a dreadful smile to me, and as if with the decision of despair, plucked away the sheet. At sight of the contents, he uttered one loud sob of such immense relief that I sat petrified. And the next moment, in a voice that was already fairly well under control, “Have you a graduated glass?” he asked.

I rose from my place with something of an effort and gave him what he asked.

He thanked me with a smiling nod, measured out a few minims of the red tincture and added one of the powders. The mixture, which was at first of a reddish hue, began, in proportion as the crystals melted, to brighten in colour, to effervesce audibly, and to throw off small fumes of vapour. Suddenly and at the same moment, the ebullition ceased and the compound changed to a dark purple, which faded again more slowly to a watery green. My visitor, who had watched these metamorphoses with a keen eye, smiled, set down the glass upon the table, and then turned and looked upon me with an air of scrutiny.

“And now,” said he, “to settle what remains. Will you be wise? will you be guided? will you suffer me to take this glass in my hand and to go forth from your house without further parley? or has the greed of curiosity too much command of you?
Think before you answer, for it shall be done as you decide. As you decide, you shall be left as you were before, and neither richer nor wiser, unless the sense of service rendered to a man in mortal distress may be counted as a kind of riches of the soul. Or, if you shall so prefer to choose, a new province of knowledge and new avenues to fame and power shall be laid open to you, here, in this room, upon the instant; and your sight shall be blasted by a prodigy to stagger the unbelief of Satan.”

“Sir,” said I, affecting a coolness that I was far from truly possessing, “you speak enigmas, and you will perhaps not wonder that I hear you with no very strong impression of belief. But I have gone too far in the way of inexplicable services to pause before I see the end.”

“It is well,” replied my visitor. “Lanyon, you remember your vows: what follows is under the seal of our profession. And now, you who have so long been bound to the most narrow and material views, you who have denied the virtue of transcendental medicine, you who have derided your superiors—behold!”

He put the glass to his lips and drank at one gulp. A cry followed; he reeled, staggered, clutched at the table and held on, staring with injected eyes, gasping with open mouth; and as I looked there came, I thought, a change—he seemed to swell—his face became suddenly black and the features seemed to melt and alter—and the next moment, I had sprung to my feet and leaped back against the wall, my arms raised to shield me from that prodigy, my mind submerged in terror.

“O God!” I screamed, and “O God!” again and again; for there before my eyes—pale and shaken, and half fainting, and groping before him with his hands, like a man restored from death—there stood Henry Jekyll!

What he told me in the next hour, I cannot bring my mind to set on paper.
I saw what I saw, I heard what I heard, and my soul sickened at it; and yet now when that sight has faded from my eyes, I ask myself if I believe it, and I cannot answer. My life is shaken to its roots; sleep has left me; the deadliest terror sits by me at all hours of the day and night; and I feel that my days are numbered, and that I must die; and yet I shall die incredulous. As for the moral turpitude that man unveiled to me, even with tears of penitence, I cannot, even in memory, dwell on it without a start of horror. I will say but one thing, Utterson, and that (if you can bring your mind to credit it) will be more than enough. The creature who crept into my house that night was, on Jekyll’s own confession, known by the name of Hyde and hunted for in every corner of the land as the murderer of Carew.

HASTIE LANYON.

HENRY JEKYLL’S FULL STATEMENT OF THE CASE

I was born in the year 18— to a large fortune, endowed besides with excellent parts, inclined by nature to industry, fond of the respect of the wise and good among my fellowmen, and thus, as might have been supposed, with every guarantee of an honourable and distinguished future. And indeed the worst of my faults was a certain impatient gaiety of disposition, such as has made the happiness of many, but such as I found it hard to reconcile with my imperious desire to carry my head high, and wear a more than commonly grave countenance before the public. Hence it came about that I concealed my pleasures; and that when I reached years of reflection, and began to look round me and take stock of my progress and position in the world, I stood already committed to a profound duplicity of life. Many a man would have even blazoned such irregularities as I was guilty of; but from the high views that I had set before me, I regarded and hid them with an almost morbid sense of shame.
It was thus rather the exacting nature of my aspirations than any particular degradation in my faults, that made me what I was, and, with even a deeper trench than in the majority of men, severed in me those provinces of good and ill which divide and compound man’s dual nature. In this case, I was driven to reflect deeply and inveterately on that hard law of life, which lies at the root of religion and is one of the most plentiful springs of distress. Though so profound a double-dealer, I was in no sense a hypocrite; both sides of me were in dead earnest; I was no more myself when I laid aside restraint and plunged in shame, than when I laboured, in the eye of day, at the furtherance of knowledge or the relief of sorrow and suffering. And it chanced that the direction of my scientific studies, which led wholly towards the mystic and the transcendental, reacted and shed a strong light on this consciousness of the perennial war among my members. With every day, and from both sides of my intelligence, the moral and the intellectual, I thus drew steadily nearer to that truth, by whose partial discovery I have been doomed to such a dreadful shipwreck: that man is not truly one, but truly two. I say two, because the state of my own knowledge does not pass beyond that point. Others will follow, others will outstrip me on the same lines; and I hazard the guess that man will be ultimately known for a mere polity of multifarious, incongruous and independent denizens. I, for my part, from the nature of my life, advanced infallibly in one direction and in one direction only. 
It was on the moral side, and in my own person, that I learned to recognise the thorough and primitive duality of man; I saw that, of the two natures that contended in the field of my consciousness, even if I could rightly be said to be either, it was only because I was radically both; and from an early date, even before the course of my scientific discoveries had begun to suggest the most naked possibility of such a miracle, I had learned to dwell with pleasure, as a beloved daydream, on the thought of the separation of these elements. If each, I told myself, could be housed in separate identities, life would be relieved of all that was unbearable; the unjust might go his way, delivered from the aspirations and remorse of his more upright twin; and the just could walk steadfastly and securely on his upward path, doing the good things in which he found his pleasure, and no longer exposed to disgrace and penitence by the hands of this extraneous evil. It was the curse of mankind that these incongruous faggots were thus bound together—that in the agonised womb of consciousness, these polar twins should be continuously struggling. How, then, were they dissociated?

I was so far in my reflections when, as I have said, a side light began to shine upon the subject from the laboratory table. I began to perceive more deeply than it has ever yet been stated, the trembling immateriality, the mistlike transience, of this seemingly so solid body in which we walk attired. Certain agents I found to have the power to shake and pluck back that fleshly vestment, even as a wind might toss the curtains of a pavilion. For two good reasons, I will not enter deeply into this scientific branch of my confession. First, because I have been made to learn that the doom and burthen of our life is bound for ever on man’s shoulders, and when the attempt is made to cast it off, it but returns upon us with more unfamiliar and more awful pressure. Second, because, as my narrative will make, alas!
too evident, my discoveries were incomplete. Enough then, that I not only recognised my natural body from the mere aura and effulgence of certain of the powers that made up my spirit, but managed to compound a drug by which these powers should be dethroned from their supremacy, and a second form and countenance substituted, none the less natural to me because they were the expression, and bore the stamp of lower elements in my soul.

I hesitated long before I put this theory to the test of practice. I knew well that I risked death; for any drug that so potently controlled and shook the very fortress of identity, might, by the least scruple of an overdose or at the least inopportunity in the moment of exhibition, utterly blot out that immaterial tabernacle which I looked to it to change. But the temptation of a discovery so singular and profound at last overcame the suggestions of alarm. I had long since prepared my tincture; I purchased at once, from a firm of wholesale chemists, a large quantity of a particular salt which I knew, from my experiments, to be the last ingredient required; and late one accursed night, I compounded the elements, watched them boil and smoke together in the glass, and when the ebullition had subsided, with a strong glow of courage, drank off the potion.

The most racking pangs succeeded: a grinding in the bones, deadly nausea, and a horror of the spirit that cannot be exceeded at the hour of birth or death. Then these agonies began swiftly to subside, and I came to myself as if out of a great sickness. There was something strange in my sensations, something indescribably new and, from its very novelty, incredibly sweet. I felt younger, lighter, happier in body; within I was conscious of a heady recklessness, a current of disordered sensual images running like a millrace in my fancy, a solution of the bonds of obligation, an unknown but not an innocent freedom of the soul.
I knew myself, at the first breath of this new life, to be more wicked, tenfold more wicked, sold a slave to my original evil; and the thought, in that moment, braced and delighted me like wine. I stretched out my hands, exulting in the freshness of these sensations; and in the act, I was suddenly aware that I had lost in stature. There was no mirror, at that date, in my room; that which stands beside me as I write, was brought there later on and for the very purpose of these transformations. The night however, was far gone into the morning—the morning, black as it was, was nearly ripe for the conception of the day—the inmates of my house were locked in the most rigorous hours of slumber; and I determined, flushed as I was with hope and triumph, to venture in my new shape as far as to my bedroom. I crossed the yard, wherein the constellations looked down upon me, I could have thought, with wonder, the first creature of that sort that their unsleeping vigilance had yet disclosed to them; I stole through the corridors, a stranger in my own house; and coming to my room, I saw for the first time the appearance of Edward Hyde. I must here speak by theory alone, saying not that which I know, but that which I suppose to be most probable. The evil side of my nature, to which I had now transferred the stamping efficacy, was less robust and less developed than the good which I had just deposed. Again, in the course of my life, which had been, after all, nine tenths a life of effort, virtue and control, it had been much less exercised and much less exhausted. And hence, as I think, it came about that Edward Hyde was so much smaller, slighter and younger than Henry Jekyll. Even as good shone upon the countenance of the one, evil was written broadly and plainly on the face of the other. Evil besides (which I must still believe to be the lethal side of man) had left on that body an imprint of deformity and decay. 
And yet when I looked upon that ugly idol in the glass, I was conscious of no repugnance, rather of a leap of welcome. This, too, was myself. It seemed natural and human. In my eyes it bore a livelier image of the spirit, it seemed more express and single, than the imperfect and divided countenance I had been hitherto accustomed to call mine. And in so far I was doubtless right. I have observed that when I wore the semblance of Edward Hyde, none could come near to me at first without a visible misgiving of the flesh. This, as I take it, was because all human beings, as we meet them, are commingled out of good and evil: and Edward Hyde, alone in the ranks of mankind, was pure evil. I lingered but a moment at the mirror: the second and conclusive experiment had yet to be attempted; it yet remained to be seen if I had lost my identity beyond redemption and must flee before daylight from a house that was no longer mine; and hurrying back to my cabinet, I once more prepared and drank the cup, once more suffered the pangs of dissolution, and came to myself once more with the character, the stature and the face of Henry Jekyll. That night I had come to the fatal cross-roads. Had I approached my discovery in a more noble spirit, had I risked the experiment while under the empire of generous or pious aspirations, all must have been otherwise, and from these agonies of death and birth, I had come forth an angel instead of a fiend. The drug had no discriminating action; it was neither diabolical nor divine; it but shook the doors of the prisonhouse of my disposition; and like the captives of Philippi, that which stood within ran forth. At that time my virtue slumbered; my evil, kept awake by ambition, was alert and swift to seize the occasion; and the thing that was projected was Edward Hyde. 
Hence, although I had now two characters as well as two appearances, one was wholly evil, and the other was still the old Henry Jekyll, that incongruous compound of whose reformation and improvement I had already learned to despair. The movement was thus wholly toward the worse. Even at that time, I had not conquered my aversions to the dryness of a life of study. I would still be merrily disposed at times; and as my pleasures were (to say the least) undignified, and I was not only well known and highly considered, but growing towards the elderly man, this incoherency of my life was daily growing more unwelcome. It was on this side that my new power tempted me until I fell in slavery. I had but to drink the cup, to doff at once the body of the noted professor, and to assume, like a thick cloak, that of Edward Hyde. I smiled at the notion; it seemed to me at the time to be humorous; and I made my preparations with the most studious care. I took and furnished that house in Soho, to which Hyde was tracked by the police; and engaged as a housekeeper a creature whom I knew well to be silent and unscrupulous. On the other side, I announced to my servants that a Mr. Hyde (whom I described) was to have full liberty and power about my house in the square; and to parry mishaps, I even called and made myself a familiar object, in my second character. I next drew up that will to which you so much objected; so that if anything befell me in the person of Dr. Jekyll, I could enter on that of Edward Hyde without pecuniary loss. And thus fortified, as I supposed, on every side, I began to profit by the strange immunities of my position. Men have before hired bravos to transact their crimes, while their own person and reputation sat under shelter. I was the first that ever did so for his pleasures.
I was the first that could plod in the public eye with a load of genial respectability, and in a moment, like a schoolboy, strip off these lendings and spring headlong into the sea of liberty. But for me, in my impenetrable mantle, the safety was complete. Think of it—I did not even exist! Let me but escape into my laboratory door, give me but a second or two to mix and swallow the draught that I had always standing ready; and whatever he had done, Edward Hyde would pass away like the stain of breath upon a mirror; and there in his stead, quietly at home, trimming the midnight lamp in his study, a man who could afford to laugh at suspicion, would be Henry Jekyll. The pleasures which I made haste to seek in my disguise were, as I have said, undignified; I would scarce use a harder term. But in the hands of Edward Hyde, they soon began to turn toward the monstrous. When I would come back from these excursions, I was often plunged into a kind of wonder at my vicarious depravity. This familiar that I called out of my own soul, and sent forth alone to do his good pleasure, was a being inherently malign and villainous; his every act and thought centered on self; drinking pleasure with bestial avidity from any degree of torture to another; relentless like a man of stone. Henry Jekyll stood at times aghast before the acts of Edward Hyde; but the situation was apart from ordinary laws, and insidiously relaxed the grasp of conscience. It was Hyde, after all, and Hyde alone, that was guilty. Jekyll was no worse; he woke again to his good qualities seemingly unimpaired; he would even make haste, where it was possible, to undo the evil done by Hyde. And thus his conscience slumbered. Into the details of the infamy at which I thus connived (for even now I can scarce grant that I committed it) I have no design of entering; I mean but to point out the warnings and the successive steps with which my chastisement approached. 
I met with one accident which, as it brought on no consequence, I shall no more than mention. An act of cruelty to a child aroused against me the anger of a passer-by, whom I recognised the other day in the person of your kinsman; the doctor and the child’s family joined him; there were moments when I feared for my life; and at last, in order to pacify their too just resentment, Edward Hyde had to bring them to the door, and pay them in a cheque drawn in the name of Henry Jekyll. But this danger was easily eliminated from the future, by opening an account at another bank in the name of Edward Hyde himself; and when, by sloping my own hand backward, I had supplied my double with a signature, I thought I sat beyond the reach of fate. Some two months before the murder of Sir Danvers, I had been out for one of my adventures, had returned at a late hour, and woke the next day in bed with somewhat odd sensations. It was in vain I looked about me; in vain I saw the decent furniture and tall proportions of my room in the square; in vain that I recognised the pattern of the bed curtains and the design of the mahogany frame; something still kept insisting that I was not where I was, that I had not wakened where I seemed to be, but in the little room in Soho where I was accustomed to sleep in the body of Edward Hyde. I smiled to myself, and in my psychological way, began lazily to inquire into the elements of this illusion, occasionally, even as I did so, dropping back into a comfortable morning doze. I was still so engaged when, in one of my more wakeful moments, my eyes fell upon my hand. Now the hand of Henry Jekyll (as you have often remarked) was professional in shape and size; it was large, firm, white and comely. But the hand which I now saw, clearly enough, in the yellow light of a mid-London morning, lying half shut on the bedclothes, was lean, corded, knuckly, of a dusky pallor and thickly shaded with a swart growth of hair. It was the hand of Edward Hyde. 
I must have stared upon it for near half a minute, sunk as I was in the mere stupidity of wonder, before terror woke up in my breast as sudden and startling as the crash of cymbals; and bounding from my bed I rushed to the mirror. At the sight that met my eyes, my blood was changed into something exquisitely thin and icy. Yes, I had gone to bed Henry Jekyll, I had awakened Edward Hyde. How was this to be explained? I asked myself; and then, with another bound of terror—how was it to be remedied? It was well on in the morning; the servants were up; all my drugs were in the cabinet—a long journey down two pairs of stairs, through the back passage, across the open court and through the anatomical theatre, from where I was then standing horror-struck. It might indeed be possible to cover my face; but of what use was that, when I was unable to conceal the alteration in my stature? And then with an overpowering sweetness of relief, it came back upon my mind that the servants were already used to the coming and going of my second self. I had soon dressed, as well as I was able, in clothes of my own size: had soon passed through the house, where Bradshaw stared and drew back at seeing Mr. Hyde at such an hour and in such a strange array; and ten minutes later, Dr. Jekyll had returned to his own shape and was sitting down, with a darkened brow, to make a feint of breakfasting. Small indeed was my appetite. This inexplicable incident, this reversal of my previous experience, seemed, like the Babylonian finger on the wall, to be spelling out the letters of my judgment; and I began to reflect more seriously than ever before on the issues and possibilities of my double existence. 
That part of me which I had the power of projecting, had lately been much exercised and nourished; it had seemed to me of late as though the body of Edward Hyde had grown in stature, as though (when I wore that form) I were conscious of a more generous tide of blood; and I began to spy a danger that, if this were much prolonged, the balance of my nature might be permanently overthrown, the power of voluntary change be forfeited, and the character of Edward Hyde become irrevocably mine. The power of the drug had not been always equally displayed. Once, very early in my career, it had totally failed me; since then I had been obliged on more than one occasion to double, and once, with infinite risk of death, to treble the amount; and these rare uncertainties had cast hitherto the sole shadow on my contentment. Now, however, and in the light of that morning’s accident, I was led to remark that whereas, in the beginning, the difficulty had been to throw off the body of Jekyll, it had of late gradually but decidedly transferred itself to the other side. All things therefore seemed to point to this; that I was slowly losing hold of my original and better self, and becoming slowly incorporated with my second and worse. Between these two, I now felt I had to choose. My two natures had memory in common, but all other faculties were most unequally shared between them. Jekyll (who was composite) now with the most sensitive apprehensions, now with a greedy gusto, projected and shared in the pleasures and adventures of Hyde; but Hyde was indifferent to Jekyll, or but remembered him as the mountain bandit remembers the cavern in which he conceals himself from pursuit. Jekyll had more than a father’s interest; Hyde had more than a son’s indifference. To cast in my lot with Jekyll, was to die to those appetites which I had long secretly indulged and had of late begun to pamper. 
To cast it in with Hyde, was to die to a thousand interests and aspirations, and to become, at a blow and forever, despised and friendless. The bargain might appear unequal; but there was still another consideration in the scales; for while Jekyll would suffer smartingly in the fires of abstinence, Hyde would be not even conscious of all that he had lost. Strange as my circumstances were, the terms of this debate are as old and commonplace as man; much the same inducements and alarms cast the die for any tempted and trembling sinner; and it fell out with me, as it falls with so vast a majority of my fellows, that I chose the better part and was found wanting in the strength to keep to it. Yes, I preferred the elderly and discontented doctor, surrounded by friends and cherishing honest hopes; and bade a resolute farewell to the liberty, the comparative youth, the light step, leaping impulses and secret pleasures, that I had enjoyed in the disguise of Hyde. I made this choice perhaps with some unconscious reservation, for I neither gave up the house in Soho, nor destroyed the clothes of Edward Hyde, which still lay ready in my cabinet. For two months, however, I was true to my determination; for two months, I led a life of such severity as I had never before attained to, and enjoyed the compensations of an approving conscience. But time began at last to obliterate the freshness of my alarm; the praises of conscience began to grow into a thing of course; I began to be tortured with throes and longings, as of Hyde struggling after freedom; and at last, in an hour of moral weakness, I once again compounded and swallowed the transforming draught. 
I do not suppose that, when a drunkard reasons with himself upon his vice, he is once out of five hundred times affected by the dangers that he runs through his brutish, physical insensibility; neither had I, long as I had considered my position, made enough allowance for the complete moral insensibility and insensate readiness to evil, which were the leading characters of Edward Hyde. Yet it was by these that I was punished. My devil had been long caged, he came out roaring. I was conscious, even when I took the draught, of a more unbridled, a more furious propensity to ill. It must have been this, I suppose, that stirred in my soul that tempest of impatience with which I listened to the civilities of my unhappy victim; I declare, at least, before God, no man morally sane could have been guilty of that crime upon so pitiful a provocation; and that I struck in no more reasonable spirit than that in which a sick child may break a plaything. But I had voluntarily stripped myself of all those balancing instincts by which even the worst of us continues to walk with some degree of steadiness among temptations; and in my case, to be tempted, however slightly, was to fall. Instantly the spirit of hell awoke in me and raged. With a transport of glee, I mauled the unresisting body, tasting delight from every blow; and it was not till weariness had begun to succeed, that I was suddenly, in the top fit of my delirium, struck through the heart by a cold thrill of terror. A mist dispersed; I saw my life to be forfeit; and fled from the scene of these excesses, at once glorying and trembling, my lust of evil gratified and stimulated, my love of life screwed to the topmost peg. 
I ran to the house in Soho, and (to make assurance doubly sure) destroyed my papers; thence I set out through the lamplit streets, in the same divided ecstasy of mind, gloating on my crime, light-headedly devising others in the future, and yet still hastening and still hearkening in my wake for the steps of the avenger. Hyde had a song upon his lips as he compounded the draught, and as he drank it, pledged the dead man. The pangs of transformation had not done tearing him, before Henry Jekyll, with streaming tears of gratitude and remorse, had fallen upon his knees and lifted his clasped hands to God. The veil of self-indulgence was rent from head to foot. I saw my life as a whole: I followed it up from the days of childhood, when I had walked with my father’s hand, and through the self-denying toils of my professional life, to arrive again and again, with the same sense of unreality, at the damned horrors of the evening. I could have screamed aloud; I sought with tears and prayers to smother down the crowd of hideous images and sounds with which my memory swarmed against me; and still, between the petitions, the ugly face of my iniquity stared into my soul. As the acuteness of this remorse began to die away, it was succeeded by a sense of joy. The problem of my conduct was solved. Hyde was thenceforth impossible; whether I would or not, I was now confined to the better part of my existence; and O, how I rejoiced to think of it! with what willing humility I embraced anew the restrictions of natural life! with what sincere renunciation I locked the door by which I had so often gone and come, and ground the key under my heel! The next day, came the news that the murder had not been overlooked, that the guilt of Hyde was patent to the world, and that the victim was a man high in public estimation. It was not only a crime, it had been a tragic folly. 
I think I was glad to know it; I think I was glad to have my better impulses thus buttressed and guarded by the terrors of the scaffold. Jekyll was now my city of refuge; let but Hyde peep out an instant, and the hands of all men would be raised to take and slay him. I resolved in my future conduct to redeem the past; and I can say with honesty that my resolve was fruitful of some good. You know yourself how earnestly, in the last months of the last year, I laboured to relieve suffering; you know that much was done for others, and that the days passed quietly, almost happily for myself. Nor can I truly say that I wearied of this beneficent and innocent life; I think instead that I daily enjoyed it more completely; but I was still cursed with my duality of purpose; and as the first edge of my penitence wore off, the lower side of me, so long indulged, so recently chained down, began to growl for licence. Not that I dreamed of resuscitating Hyde; the bare idea of that would startle me to frenzy: no, it was in my own person that I was once more tempted to trifle with my conscience; and it was as an ordinary secret sinner that I at last fell before the assaults of temptation. There comes an end to all things; the most capacious measure is filled at last; and this brief condescension to my evil finally destroyed the balance of my soul. And yet I was not alarmed; the fall seemed natural, like a return to the old days before I had made my discovery. It was a fine, clear, January day, wet under foot where the frost had melted, but cloudless overhead; and the Regent’s Park was full of winter chirrupings and sweet with spring odours. I sat in the sun on a bench; the animal within me licking the chops of memory; the spiritual side a little drowsed, promising subsequent penitence, but not yet moved to begin. After all, I reflected, I was like my neighbours; and then I smiled, comparing myself with other men, comparing my active good-will with the lazy cruelty of their neglect. 
And at the very moment of that vainglorious thought, a qualm came over me, a horrid nausea and the most deadly shuddering. These passed away, and left me faint; and then as in its turn faintness subsided, I began to be aware of a change in the temper of my thoughts, a greater boldness, a contempt of danger, a solution of the bonds of obligation. I looked down; my clothes hung formlessly on my shrunken limbs; the hand that lay on my knee was corded and hairy. I was once more Edward Hyde. A moment before I had been safe of all men’s respect, wealthy, beloved—the cloth laying for me in the dining-room at home; and now I was the common quarry of mankind, hunted, houseless, a known murderer, thrall to the gallows. My reason wavered, but it did not fail me utterly. I have more than once observed that in my second character, my faculties seemed sharpened to a point and my spirits more tensely elastic; thus it came about that, where Jekyll perhaps might have succumbed, Hyde rose to the importance of the moment. My drugs were in one of the presses of my cabinet; how was I to reach them? That was the problem that (crushing my temples in my hands) I set myself to solve. The laboratory door I had closed. If I sought to enter by the house, my own servants would consign me to the gallows. I saw I must employ another hand, and thought of Lanyon. How was he to be reached? how persuaded? Supposing that I escaped capture in the streets, how was I to make my way into his presence? and how should I, an unknown and displeasing visitor, prevail on the famous physician to rifle the study of his colleague, Dr. Jekyll? Then I remembered that of my original character, one part remained to me: I could write my own hand; and once I had conceived that kindling spark, the way that I must follow became lighted up from end to end. Thereupon, I arranged my clothes as best I could, and summoning a passing hansom, drove to an hotel in Portland Street, the name of which I chanced to remember. 
At my appearance (which was indeed comical enough, however tragic a fate these garments covered) the driver could not conceal his mirth. I gnashed my teeth upon him with a gust of devilish fury; and the smile withered from his face—happily for him—yet more happily for myself, for in another instant I had certainly dragged him from his perch. At the inn, as I entered, I looked about me with so black a countenance as made the attendants tremble; not a look did they exchange in my presence; but obsequiously took my orders, led me to a private room, and brought me wherewithal to write. Hyde in danger of his life was a creature new to me; shaken with inordinate anger, strung to the pitch of murder, lusting to inflict pain. Yet the creature was astute; mastered his fury with a great effort of the will; composed his two important letters, one to Lanyon and one to Poole; and that he might receive actual evidence of their being posted, sent them out with directions that they should be registered. Thenceforward, he sat all day over the fire in the private room, gnawing his nails; there he dined, sitting alone with his fears, the waiter visibly quailing before his eye; and thence, when the night was fully come, he set forth in the corner of a closed cab, and was driven to and fro about the streets of the city. He, I say—I cannot say, I. That child of Hell had nothing human; nothing lived in him but fear and hatred. And when at last, thinking the driver had begun to grow suspicious, he discharged the cab and ventured on foot, attired in his misfitting clothes, an object marked out for observation, into the midst of the nocturnal passengers, these two base passions raged within him like a tempest. He walked fast, hunted by his fears, chattering to himself, skulking through the less frequented thoroughfares, counting the minutes that still divided him from midnight. Once a woman spoke to him, offering, I think, a box of lights. He smote her in the face, and she fled. 
When I came to myself at Lanyon’s, the horror of my old friend perhaps affected me somewhat: I do not know; it was at least but a drop in the sea to the abhorrence with which I looked back upon these hours. A change had come over me. It was no longer the fear of the gallows, it was the horror of being Hyde that racked me. I received Lanyon’s condemnation partly in a dream; it was partly in a dream that I came home to my own house and got into bed. I slept after the prostration of the day, with a stringent and profound slumber which not even the nightmares that wrung me could avail to break. I awoke in the morning shaken, weakened, but refreshed. I still hated and feared the thought of the brute that slept within me, and I had not of course forgotten the appalling dangers of the day before; but I was once more at home, in my own house and close to my drugs; and gratitude for my escape shone so strong in my soul that it almost rivalled the brightness of hope. I was stepping leisurely across the court after breakfast, drinking the chill of the air with pleasure, when I was seized again with those indescribable sensations that heralded the change; and I had but the time to gain the shelter of my cabinet, before I was once again raging and freezing with the passions of Hyde. It took on this occasion a double dose to recall me to myself; and alas! six hours after, as I sat looking sadly in the fire, the pangs returned, and the drug had to be re-administered. In short, from that day forth it seemed only by a great effort as of gymnastics, and only under the immediate stimulation of the drug, that I was able to wear the countenance of Jekyll. At all hours of the day and night, I would be taken with the premonitory shudder; above all, if I slept, or even dozed for a moment in my chair, it was always as Hyde that I awakened. 
Under the strain of this continually impending doom and by the sleeplessness to which I now condemned myself, ay, even beyond what I had thought possible to man, I became, in my own person, a creature eaten up and emptied by fever, languidly weak both in body and mind, and solely occupied by one thought: the horror of my other self. But when I slept, or when the virtue of the medicine wore off, I would leap almost without transition (for the pangs of transformation grew daily less marked) into the possession of a fancy brimming with images of terror, a soul boiling with causeless hatreds, and a body that seemed not strong enough to contain the raging energies of life. The powers of Hyde seemed to have grown with the sickliness of Jekyll. And certainly the hate that now divided them was equal on each side. With Jekyll, it was a thing of vital instinct. He had now seen the full deformity of that creature that shared with him some of the phenomena of consciousness, and was co-heir with him to death: and beyond these links of community, which in themselves made the most poignant part of his distress, he thought of Hyde, for all his energy of life, as of something not only hellish but inorganic. This was the shocking thing; that the slime of the pit seemed to utter cries and voices; that the amorphous dust gesticulated and sinned; that what was dead, and had no shape, should usurp the offices of life. And this again, that that insurgent horror was knit to him closer than a wife, closer than an eye; lay caged in his flesh, where he heard it mutter and felt it struggle to be born; and at every hour of weakness, and in the confidence of slumber, prevailed against him, and deposed him out of life. The hatred of Hyde for Jekyll was of a different order. 
His terror of the gallows drove him continually to commit temporary suicide, and return to his subordinate station of a part instead of a person; but he loathed the necessity, he loathed the despondency into which Jekyll was now fallen, and he resented the dislike with which he was himself regarded. Hence the ape-like tricks that he would play me, scrawling in my own hand blasphemies on the pages of my books, burning the letters and destroying the portrait of my father; and indeed, had it not been for his fear of death, he would long ago have ruined himself in order to involve me in the ruin. But his love of life is wonderful; I go further: I, who sicken and freeze at the mere thought of him, when I recall the abjection and passion of this attachment, and when I know how he fears my power to cut him off by suicide, I find it in my heart to pity him. It is useless, and the time awfully fails me, to prolong this description; no one has ever suffered such torments, let that suffice; and yet even to these, habit brought—no, not alleviation—but a certain callousness of soul, a certain acquiescence of despair; and my punishment might have gone on for years, but for the last calamity which has now fallen, and which has finally severed me from my own face and nature. My provision of the salt, which had never been renewed since the date of the first experiment, began to run low. I sent out for a fresh supply and mixed the draught; the ebullition followed, and the first change of colour, not the second; I drank it and it was without efficiency. You will learn from Poole how I have had London ransacked; it was in vain; and I am now persuaded that my first supply was impure, and that it was that unknown impurity which lent efficacy to the draught. About a week has passed, and I am now finishing this statement under the influence of the last of the old powders. 
This, then, is the last time, short of a miracle, that Henry Jekyll can think his own thoughts or see his own face (now how sadly altered!) in the glass. Nor must I delay too long to bring my writing to an end; for if my narrative has hitherto escaped destruction, it has been by a combination of great prudence and great good luck. Should the throes of change take me in the act of writing it, Hyde will tear it in pieces; but if some time shall have elapsed after I have laid it by, his wonderful selfishness and circumscription to the moment will probably save it once again from the action of his ape-like spite. And indeed the doom that is closing on us both has already changed and crushed him. Half an hour from now, when I shall again and forever reindue that hated personality, I know how I shall sit shuddering and weeping in my chair, or continue, with the most strained and fearstruck ecstasy of listening, to pace up and down this room (my last earthly refuge) and give ear to every sound of menace. Will Hyde die upon the scaffold? or will he find courage to release himself at the last moment? God knows; I am careless; this is my true hour of death, and what is to follow concerns another than myself. Here then, as I lay down the pen and proceed to seal up my confession, I bring the life of that unhappy Henry Jekyll to an end.

The similarity between Hyde's and Jekyll's handwriting, noticed by Mr. Guest when examining Hyde's letter, is a result of Jekyll's dual nature as revealed in his account of developing a theory about man's dual nature and creating a potion to transform into Hyde.

Based on the context provided, is the above statement TRUE or FALSE? First provide an explanation of your decision-making process, and then provide your final answer. Use the following format:
YOUR EXPLANATION
YOUR ANSWER

Here are the relevant details from the text:
- Chapter 5: Mr.
Utterson takes the letter home and discusses it with his head clerk, Mr. Guest, who notices similarities between Hyde's handwriting and Jekyll's.
- Chapter 10: Jekyll describes his early life, his reputation, and the development of his theory about the dual nature of man.
- Relationship: Evidence of Jekyll and Hyde's connection

In Chapter 5, Mr. Guest notices the similarity between Hyde's and Jekyll's handwriting when examining a letter. Chapter 10 reveals that Jekyll developed a theory about the dual nature of man and created a potion to transform into Hyde. This claim connects the observed handwriting similarity with the truth about Jekyll and Hyde being the same person, providing physical evidence of their connection.

True

You are an expert at verifying claims from fictional narratives.

You are provided with a context and a statement. Your task is to carefully read the context and then determine whether the statement is TRUE or FALSE. Answer TRUE if the statement is true in its entirety based on the context provided. Answer FALSE if any part of the statement is false based on the context provided.

DR. SHEPPARD AT THE BREAKFAST TABLE

Mrs. Ferrars died on the night of the 16th–17th September—a Thursday. I was sent for at eight o’clock on the morning of Friday the 17th. There was nothing to be done. She had been dead some hours. It was just a few minutes after nine when I reached home once more. I opened the front door with my latch-key, and purposely delayed a few moments in the hall, hanging up my hat and the light overcoat that I had deemed a wise precaution against the chill of an early autumn morning. To tell the truth, I was considerably upset and worried. I am not going to pretend that at that moment I foresaw the events of the next few weeks. I emphatically did not do so.
But my instinct told me that there were stirring times ahead. From the dining-room on my left there came the rattle of tea-cups and the short, dry cough of my sister Caroline. “Is that you, James?” she called. An unnecessary question, since who else could it be? To tell the truth, it was precisely my sister Caroline who was the cause of my few minutes’ delay. The motto of the mongoose family, so Mr. Kipling tells us, is: “Go and find out.” If Caroline ever adopts a crest, I should certainly suggest a mongoose rampant. One might omit the first part of the motto. Caroline can do any amount of finding out by sitting placidly at home. I don’t know how she manages it, but there it is. I suspect that the servants and the tradesmen constitute her Intelligence Corps. When she goes out, it is not to gather in information, but to spread it. At that, too, she is amazingly expert. It was really this last named trait of hers which was causing me these pangs of indecision. Whatever I told Caroline now concerning the demise of Mrs. Ferrars would be common knowledge all over the village within the space of an hour and a half. As a professional man, I naturally aim at discretion. Therefore I have got into the habit of continually withholding all information possible from my sister. She usually finds out just the same, but I have the moral satisfaction of knowing that I am in no way to blame. Mrs. Ferrars’ husband died just over a year ago, and Caroline has constantly asserted, without the least foundation for the assertion, that his wife poisoned him. She scorns my invariable rejoinder that Mr. Ferrars died of acute gastritis, helped on by habitual over-indulgence in alcoholic beverages. The symptoms of gastritis and arsenical poisoning are not, I agree, unlike, but Caroline bases her accusation on quite different lines. “You’ve only got to look at her,” I have heard her say. Mrs. 
Ferrars, though not in her first youth, was a very attractive woman, and her clothes, though simple, always seemed to fit her very well, but all the same, lots of women buy their clothes in Paris and have not, on that account, necessarily poisoned their husbands. As I stood hesitating in the hall, with all this passing through my mind, Caroline’s voice came again, with a sharper note in it. “What on earth are you doing out there, James? Why don’t you come and get your breakfast?” “Just coming, my dear,” I said hastily. “I’ve been hanging up my overcoat.” “You could have hung up half a dozen overcoats in this time.” She was quite right. I could have. I walked into the dining-room, gave Caroline the accustomed peck on the cheek, and sat down to eggs and bacon. The bacon was rather cold. “You’ve had an early call,” remarked Caroline. “Yes,” I said. “King’s Paddock. Mrs. Ferrars.” “I know,” said my sister. “How did you know?” “Annie told me.” Annie is the house parlormaid. A nice girl, but an inveterate talker. There was a pause. I continued to eat eggs and bacon. My sister’s nose, which is long and thin, quivered a little at the tip, as it always does when she is interested or excited over anything. “Well?” she demanded. “A bad business. Nothing to be done. Must have died in her sleep.” “I know,” said my sister again. This time I was annoyed. “You can’t know,” I snapped. “I didn’t know myself until I got there, and I haven’t mentioned it to a soul yet. If that girl Annie knows, she must be a clairvoyant.” “It wasn’t Annie who told me. It was the milkman. He had it from the Ferrars’ cook.” As I say, there is no need for Caroline to go out to get information. She sits at home, and it comes to her. My sister continued: “What did she die of? Heart failure?” “Didn’t the milkman tell you that?” I inquired sarcastically. Sarcasm is wasted on Caroline. She takes it seriously and answers accordingly. “He didn’t know,” she explained. 
After all, Caroline was bound to hear sooner or later. She might as well hear from me. “She died of an overdose of veronal. She’s been taking it lately for sleeplessness. Must have taken too much.” “Nonsense,” said Caroline immediately. “She took it on purpose. Don’t tell me!” It is odd how, when you have a secret belief of your own which you do not wish to acknowledge, the voicing of it by some one else will rouse you to a fury of denial. I burst immediately into indignant speech. “There you go again,” I said. “Rushing along without rhyme or reason. Why on earth should Mrs. Ferrars wish to commit suicide? A widow, fairly young still, very well off, good health, and nothing to do but enjoy life. It’s absurd.” “Not at all. Even you must have noticed how different she has been looking lately. It’s been coming on for the last six months. She’s looked positively hag-ridden. And you have just admitted that she hasn’t been able to sleep.” “What is your diagnosis?” I demanded coldly. “An unfortunate love affair, I suppose?” My sister shook her head. “_Remorse_,” she said, with great gusto. “Remorse?” “Yes. You never would believe me when I told you she poisoned her husband. I’m more than ever convinced of it now.” “I don’t think you’re very logical,” I objected. “Surely if a woman committed a crime like murder, she’d be sufficiently cold-blooded to enjoy the fruits of it without any weak-minded sentimentality such as repentance.” Caroline shook her head. “There probably are women like that—but Mrs. Ferrars wasn’t one of them. She was a mass of nerves. An overmastering impulse drove her on to get rid of her husband because she was the sort of person who simply can’t endure suffering of any kind, and there’s no doubt that the wife of a man like Ashley Ferrars must have had to suffer a good deal——” I nodded. “And ever since she’s been haunted by what she did. I can’t help feeling sorry for her.” I don’t think Caroline ever felt sorry for Mrs. Ferrars whilst she was alive. 
Now that she has gone where (presumably) Paris frocks can no longer be worn, Caroline is prepared to indulge in the softer emotions of pity and comprehension. I told her firmly that her whole idea was nonsense. I was all the more firm because I secretly agreed with some part, at least, of what she had said. But it is all wrong that Caroline should arrive at the truth simply by a kind of inspired guesswork. I wasn’t going to encourage that sort of thing. She will go round the village airing her views, and every one will think that she is doing so on medical data supplied by me. Life is very trying. “Nonsense,” said Caroline, in reply to my strictures. “You’ll see. Ten to one she’s left a letter confessing everything.” “She didn’t leave a letter of any kind,” I said sharply, and not seeing where the admission was going to land me. “Oh!” said Caroline. “So you _did_ inquire about that, did you? I believe, James, that in your heart of hearts, you think very much as I do. You’re a precious old humbug.” “One always has to take the possibility of suicide into consideration,” I said repressively. “Will there be an inquest?” “There may be. It all depends. If I am able to declare myself absolutely satisfied that the overdose was taken accidentally, an inquest might be dispensed with.” “And are you absolutely satisfied?” asked my sister shrewdly. I did not answer, but got up from table.

WHO’S WHO IN KING’S ABBOT

Before I proceed further with what I said to Caroline and what Caroline said to me, it might be as well to give some idea of what I should describe as our local geography. Our village, King’s Abbot, is, I imagine, very much like any other village. Our big town is Cranchester, nine miles away. We have a large railway station, a small post office, and two rival “General Stores.” Able-bodied men are apt to leave the place early in life, but we are rich in unmarried ladies and retired military officers.
Our hobbies and recreations can be summed up in the one word, “gossip.” There are only two houses of any importance in King’s Abbot. One is King’s Paddock, left to Mrs. Ferrars by her late husband. The other, Fernly Park, is owned by Roger Ackroyd. Ackroyd has always interested me by being a man more impossibly like a country squire than any country squire could really be. He reminds one of the red-faced sportsmen who always appeared early in the first act of an old-fashioned musical comedy, the setting being the village green. They usually sang a song about going up to London. Nowadays we have revues, and the country squire has died out of musical fashion. Of course, Ackroyd is not really a country squire. He is an immensely successful manufacturer of (I think) wagon wheels. He is a man of nearly fifty years of age, rubicund of face and genial of manner. He is hand and glove with the vicar, subscribes liberally to parish funds (though rumor has it that he is extremely mean in personal expenditure), encourages cricket matches, Lads’ Clubs, and Disabled Soldiers’ Institutes. He is, in fact, the life and soul of our peaceful village of King’s Abbot. Now when Roger Ackroyd was a lad of twenty-one, he fell in love with, and married, a beautiful woman some five or six years his senior. Her name was Paton, and she was a widow with one child. The history of the marriage was short and painful. To put it bluntly, Mrs. Ackroyd was a dipsomaniac. She succeeded in drinking herself into her grave four years after her marriage. In the years that followed, Ackroyd showed no disposition to make a second matrimonial adventure. His wife’s child by her first marriage was only seven years old when his mother died. He is now twenty-five. Ackroyd has always regarded him as his own son, and has brought him up accordingly, but he has been a wild lad and a continual source of worry and trouble to his stepfather. Nevertheless we are all very fond of Ralph Paton in King’s Abbot. 
He is such a good-looking youngster for one thing. As I said before, we are ready enough to gossip in our village. Everybody noticed from the first that Ackroyd and Mrs. Ferrars got on very well together. After her husband’s death, the intimacy became more marked. They were always seen about together, and it was freely conjectured that at the end of her period of mourning, Mrs. Ferrars would become Mrs. Roger Ackroyd. It was felt, indeed, that there was a certain fitness in the thing. Roger Ackroyd’s wife had admittedly died of drink. Ashley Ferrars had been a drunkard for many years before his death. It was only fitting that these two victims of alcoholic excess should make up to each other for all that they had previously endured at the hands of their former spouses. The Ferrars only came to live here just over a year ago, but a halo of gossip has surrounded Ackroyd for many years past. All the time that Ralph Paton was growing up to manhood, a series of lady housekeepers presided over Ackroyd’s establishment, and each in turn was regarded with lively suspicion by Caroline and her cronies. It is not too much to say that for at least fifteen years the whole village has confidently expected Ackroyd to marry one of his housekeepers. The last of them, a redoubtable lady called Miss Russell, has reigned undisputed for five years, twice as long as any of her predecessors. It is felt that but for the advent of Mrs. Ferrars, Ackroyd could hardly have escaped. That—and one other factor—the unexpected arrival of a widowed sister-in-law with her daughter from Canada. Mrs. Cecil Ackroyd, widow of Ackroyd’s ne’er-do-well younger brother, has taken up her residence at Fernly Park, and has succeeded, according to Caroline, in putting Miss Russell in her proper place. 
I don’t know exactly what a “proper place” constitutes—it sounds chilly and unpleasant—but I know that Miss Russell goes about with pinched lips, and what I can only describe as an acid smile, and that she professes the utmost sympathy for “poor Mrs. Ackroyd—dependent on the charity of her husband’s brother. The bread of charity is so bitter, is it not? _I_ should be quite miserable if I did not work for my living.” I don’t know what Mrs. Cecil Ackroyd thought of the Ferrars affair when it came on the tapis. It was clearly to her advantage that Ackroyd should remain unmarried. She was always very charming—not to say gushing—to Mrs. Ferrars when they met. Caroline says that proves less than nothing. Such have been our preoccupations in King’s Abbot for the last few years. We have discussed Ackroyd and his affairs from every standpoint. Mrs. Ferrars has fitted into her place in the scheme. Now there has been a rearrangement of the kaleidoscope. From a mild discussion of probable wedding presents, we have been jerked into the midst of tragedy. Revolving these and sundry other matters in my mind, I went mechanically on my round. I had no cases of special interest to attend, which was, perhaps, as well, for my thoughts returned again and again to the mystery of Mrs. Ferrars’s death. Had she taken her own life? Surely, if she had done so, she would have left some word behind to say what she contemplated doing? Women, in my experience, if they once reach the determination to commit suicide, usually wish to reveal the state of mind that led to the fatal action. They covet the limelight. When had I last seen her? Not for over a week. Her manner then had been normal enough considering—well—considering everything. Then I suddenly remembered that I had seen her, though not to speak to, only yesterday. She had been walking with Ralph Paton, and I had been surprised because I had had no idea that he was likely to be in King’s Abbot. 
I thought, indeed, that he had quarreled finally with his stepfather. Nothing had been seen of him down here for nearly six months. They had been walking along, side by side, their heads close together, and she had been talking very earnestly. I think I can safely say that it was at this moment that a foreboding of the future first swept over me. Nothing tangible as yet—but a vague premonition of the way things were setting. That earnest _tête-à-tête_ between Ralph Paton and Mrs. Ferrars the day before struck me disagreeably. I was still thinking of it when I came face to face with Roger Ackroyd. “Sheppard!” he exclaimed. “Just the man I wanted to get hold of. This is a terrible business.” “You’ve heard then?” He nodded. He had felt the blow keenly, I could see. His big red cheeks seemed to have fallen in, and he looked a positive wreck of his usual jolly, healthy self. “It’s worse than you know,” he said quietly. “Look here, Sheppard, I’ve got to talk to you. Can you come back with me now?” “Hardly. I’ve got three patients to see still, and I must be back by twelve to see my surgery patients.” “Then this afternoon—no, better still, dine to-night. At 7.30? Will that suit you?” “Yes—I can manage that all right. What’s wrong? Is it Ralph?” I hardly knew why I said that—except, perhaps, that it had so often been Ralph. Ackroyd stared blankly at me as though he hardly understood. I began to realize that there must be something very wrong indeed somewhere. I had never seen Ackroyd so upset before. “Ralph?” he said vaguely. “Oh! no, it’s not Ralph. Ralph’s in London——Damn! Here’s old Miss Ganett coming. I don’t want to have to talk to her about this ghastly business. See you to-night, Sheppard. Seven-thirty.” I nodded, and he hurried away, leaving me wondering. Ralph in London? But he had certainly been in King’s Abbot the preceding afternoon. 
He must have gone back to town last night or early this morning, and yet Ackroyd’s manner had conveyed quite a different impression. He had spoken as though Ralph had not been near the place for months. I had no time to puzzle the matter out further. Miss Ganett was upon me, thirsting for information. Miss Ganett has all the characteristics of my sister Caroline, but she lacks that unerring aim in jumping to conclusions which lends a touch of greatness to Caroline’s maneuvers. Miss Ganett was breathless and interrogatory. Wasn’t it sad about poor dear Mrs. Ferrars? A lot of people were saying she had been a confirmed drug-taker for years. So wicked the way people went about saying things. And yet, the worst of it was, there was usually a grain of truth somewhere in these wild statements. No smoke without fire! They were saying too that Mr. Ackroyd had found out about it, and had broken off the engagement—because there _was_ an engagement. She, Miss Ganett, had proof positive of that. Of course _I_ must know all about it—doctors always did—but they never tell? And all this with a sharp beady eye on me to see how I reacted to these suggestions. Fortunately long association with Caroline has led me to preserve an impassive countenance, and to be ready with small non-committal remarks. On this occasion I congratulated Miss Ganett on not joining in ill-natured gossip. Rather a neat counterattack, I thought. It left her in difficulties, and before she could pull herself together, I had passed on. I went home thoughtful, to find several patients waiting for me in the surgery. I had dismissed the last of them, as I thought, and was just contemplating a few minutes in the garden before lunch when I perceived one more patient waiting for me. She rose and came towards me as I stood somewhat surprised. I don’t know why I should have been, except that there is a suggestion of cast iron about Miss Russell, a something that is above the ills of the flesh. 
Ackroyd’s housekeeper is a tall woman, handsome but forbidding in appearance. She has a stern eye, and lips that shut tightly, and I feel that if I were an under housemaid or a kitchenmaid I should run for my life whenever I heard her coming. “Good morning, Dr. Sheppard,” said Miss Russell. “I should be much obliged if you would take a look at my knee.” I took a look, but, truth to tell, I was very little wiser when I had done so. Miss Russell’s account of vague pains was so unconvincing that with a woman of less integrity of character I should have suspected a trumped-up tale. It did cross my mind for one moment that Miss Russell might have deliberately invented this affection of the knee in order to pump me on the subject of Mrs. Ferrars’s death, but I soon saw that there, at least, I had misjudged her. She made a brief reference to the tragedy, nothing more. Yet she certainly seemed disposed to linger and chat. “Well, thank you very much for this bottle of liniment, doctor,” she said at last. “Not that I believe it will do the least good.” I didn’t think it would either, but I protested in duty bound. After all, it couldn’t do any harm, and one must stick up for the tools of one’s trade. “I don’t believe in all these drugs,” said Miss Russell, her eyes sweeping over my array of bottles disparagingly. “Drugs do a lot of harm. Look at the cocaine habit.” “Well, as far as that goes——” “It’s very prevalent in high society.” I’m sure Miss Russell knows far more about high society than I do. I didn’t attempt to argue with her. “Just tell me this, doctor,” said Miss Russell. “Suppose you are really a slave of the drug habit. Is there any cure?” One cannot answer a question like that offhand. I gave her a short lecture on the subject, and she listened with close attention. I still suspected her of seeking information about Mrs. Ferrars. “Now, veronal, for instance——” I proceeded. But, strangely enough, she didn’t seem interested in veronal. 
Instead she changed the subject, and asked me if it was true that there were certain poisons so rare as to baffle detection. “Ah!” I said. “You’ve been reading detective stories.” She admitted that she had. “The essence of a detective story,” I said, “is to have a rare poison—if possible something from South America, that nobody has ever heard of—something that one obscure tribe of savages use to poison their arrows with. Death is instantaneous, and Western science is powerless to detect it. That is the kind of thing you mean?” “Yes. Is there really such a thing?” I shook my head regretfully. “I’m afraid there isn’t. There’s _curare_, of course.” I told her a good deal about curare, but she seemed to have lost interest once more. She asked me if I had any in my poison cupboard, and when I replied in the negative I fancy I fell in her estimation. She said she must be getting back, and I saw her out at the surgery door just as the luncheon gong went. I should never have suspected Miss Russell of a fondness for detective stories. It pleases me very much to think of her stepping out of the housekeeper’s room to rebuke a delinquent housemaid, and then returning to a comfortable perusal of _The Mystery of the Seventh Death_, or something of the kind.

THE MAN WHO GREW VEGETABLE MARROWS

I told Caroline at lunch time that I should be dining at Fernly. She expressed no objection—on the contrary—— “Excellent,” she said. “You’ll hear all about it. By the way, what is the trouble with Ralph?” “With Ralph?” I said, surprised; “there isn’t any.” “Then why is he staying at the Three Boars instead of at Fernly Park?” I did not for a minute question Caroline’s statement that Ralph Paton was staying at the local inn. That Caroline said so was enough for me. “Ackroyd told me he was in London,” I said. In the surprise of the moment I departed from my valuable rule of never parting with information. “Oh!” said Caroline. I could see her nose twitching as she worked on this.
“He arrived at the Three Boars yesterday morning,” she said. “And he’s still there. Last night he was out with a girl.” That did not surprise me in the least. Ralph, I should say, is out with a girl most nights of his life. But I did rather wonder that he chose to indulge in the pastime in King’s Abbot instead of in the gay metropolis. “One of the barmaids?” I asked. “No. That’s just it. He went out to meet her. I don’t know who she is.” (Bitter for Caroline to have to admit such a thing.) “But I can guess,” continued my indefatigable sister. I waited patiently. “His cousin.” “Flora Ackroyd?” I exclaimed in surprise. Flora Ackroyd is, of course, no relation whatever really to Ralph Paton, but Ralph has been looked upon for so long as practically Ackroyd’s own son, that cousinship is taken for granted. “Flora Ackroyd,” said my sister. “But why not go to Fernly if he wanted to see her?” “Secretly engaged,” said Caroline, with immense enjoyment. “Old Ackroyd won’t hear of it, and they have to meet this way.” I saw a good many flaws in Caroline’s theory, but I forbore to point them out to her. An innocent remark about our new neighbor created a diversion. The house next door, The Larches, has recently been taken by a stranger. To Caroline’s extreme annoyance, she has not been able to find out anything about him, except that he is a foreigner. The Intelligence Corps has proved a broken reed. Presumably the man has milk and vegetables and joints of meat and occasional whitings just like everybody else, but none of the people who make it their business to supply these things seem to have acquired any information. His name, apparently, is Mr. Porrott—a name which conveys an odd feeling of unreality. The one thing we do know about him is that he is interested in the growing of vegetable marrows. But that is certainly not the sort of information that Caroline is after. 
She wants to know where he comes from, what he does, whether he is married, what his wife was, or is, like, whether he has children, what his mother’s maiden name was—and so on. Somebody very like Caroline must have invented the questions on passports, I think. “My dear Caroline,” I said. “There’s no doubt at all about what the man’s profession has been. He’s a retired hairdresser. Look at that mustache of his.” Caroline dissented. She said that if the man was a hairdresser, he would have wavy hair—not straight. All hairdressers did. I cited several hairdressers personally known to me who had straight hair, but Caroline refused to be convinced. “I can’t make him out at all,” she said in an aggrieved voice. “I borrowed some garden tools the other day, and he was most polite, but I couldn’t get anything out of him. I asked him point blank at last whether he was a Frenchman, and he said he wasn’t—and somehow I didn’t like to ask him any more.” I began to be more interested in our mysterious neighbor. A man who is capable of shutting up Caroline and sending her, like the Queen of Sheba, empty away must be something of a personality. “I believe,” said Caroline, “that he’s got one of those new vacuum cleaners——” I saw a meditated loan and the opportunity of further questioning gleaming from her eye. I seized the chance to escape into the garden. I am rather fond of gardening. I was busily exterminating dandelion roots when a shout of warning sounded from close by and a heavy body whizzed by my ear and fell at my feet with a repellant squelch. It was a vegetable marrow! I looked up angrily. Over the wall, to my left, there appeared a face. An egg-shaped head, partially covered with suspiciously black hair, two immense mustaches, and a pair of watchful eyes. It was our mysterious neighbor, Mr. Porrott. He broke at once into fluent apologies. “I demand of you a thousand pardons, monsieur. I am without defense. For some months now I cultivate the marrows. 
This morning suddenly I enrage myself with these marrows. I send them to promenade themselves—alas! not only mentally but physically. I seize the biggest. I hurl him over the wall. Monsieur, I am ashamed. I prostrate myself.” Before such profuse apologies, my anger was forced to melt. After all, the wretched vegetable hadn’t hit me. But I sincerely hoped that throwing large vegetables over walls was not our new friend’s hobby. Such a habit could hardly endear him to us as a neighbor. The strange little man seemed to read my thoughts. “Ah! no,” he exclaimed. “Do not disquiet yourself. It is not with me a habit. But can you figure to yourself, monsieur, that a man may work towards a certain object, may labor and toil to attain a certain kind of leisure and occupation, and then find that, after all, he yearns for the old busy days, and the old occupations that he thought himself so glad to leave?” “Yes,” I said slowly. “I fancy that that is a common enough occurrence. I myself am perhaps an instance. A year ago I came into a legacy—enough to enable me to realize a dream. I have always wanted to travel, to see the world. Well, that was a year ago, as I said, and—I am still here.” My little neighbor nodded. “The chains of habit. We work to attain an object, and the object gained, we find that what we miss is the daily toil. And mark you, monsieur, my work was interesting work. The most interesting work there is in the world.” “Yes?” I said encouragingly. For the moment the spirit of Caroline was strong within me. “The study of human nature, monsieur!” “Just so,” I said kindly. Clearly a retired hairdresser. Who knows the secrets of human nature better than a hairdresser? “Also, I had a friend—a friend who for many years never left my side. Occasionally of an imbecility to make one afraid, nevertheless he was very dear to me. Figure to yourself that I miss even his stupidity. 
His _naïveté_, his honest outlook, the pleasure of delighting and surprising him by my superior gifts—all these I miss more than I can tell you.” “He died?” I asked sympathetically. “Not so. He lives and flourishes—but on the other side of the world. He is now in the Argentine.” “In the Argentine,” I said enviously. I have always wanted to go to South America. I sighed, and then looked up to find Mr. Porrott eyeing me sympathetically. He seemed an understanding little man. “You will go there, yes?” he asked. I shook my head with a sigh. “I could have gone,” I said, “a year ago. But I was foolish—and worse than foolish—greedy. I risked the substance for the shadow.” “I comprehend,” said Mr. Porrott. “You speculated?” I nodded mournfully, but in spite of myself I felt secretly entertained. This ridiculous little man was so portentously solemn. “Not the Porcupine Oilfields?” he asked suddenly. I stared. “I thought of them, as a matter of fact, but in the end I plumped for a gold mine in Western Australia.” My neighbor was regarding me with a strange expression which I could not fathom. “It is Fate,” he said at last. “What is Fate?” I asked irritably. “That I should live next to a man who seriously considers Porcupine Oilfields, and also West Australian Gold Mines. Tell me, have you also a penchant for auburn hair?” I stared at him open-mouthed, and he burst out laughing. “No, no, it is not the insanity that I suffer from. Make your mind easy. It was a foolish question that I put to you there, for, see you, my friend of whom I spoke was a young man, a man who thought all women good, and most of them beautiful. But you are a man of middle age, a doctor, a man who knows the folly and the vanity of most things in this life of ours. Well, well, we are neighbors. I beg of you to accept and present to your excellent sister my best marrow.” He stooped, and with a flourish produced an immense specimen of the tribe, which I duly accepted in the spirit in which it was offered. 
“Indeed,” said the little man cheerfully, “this has not been a wasted morning. I have made the acquaintance of a man who in some ways resembles my far-off friend. By the way, I should like to ask you a question. You doubtless know every one in this tiny village. Who is the young man with the very dark hair and eyes, and the handsome face. He walks with his head flung back, and an easy smile on his lips?” The description left me in no doubt. “That must be Captain Ralph Paton,” I said slowly. “I have not seen him about here before?” “No, he has not been here for some time. But he is the son—adopted son, rather—of Mr. Ackroyd of Fernly Park.” My neighbor made a slight gesture of impatience. “Of course, I should have guessed. Mr. Ackroyd spoke of him many times.” “You know Mr. Ackroyd?” I said, slightly surprised. “Mr. Ackroyd knew me in London—when I was at work there. I have asked him to say nothing of my profession down here.” “I see,” I said, rather amused by this patent snobbery, as I thought it. But the little man went on with an almost grandiloquent smirk. “One prefers to remain incognito. I am not anxious for notoriety. I have not even troubled to correct the local version of my name.” “Indeed,” I said, not knowing quite what to say. “Captain Ralph Paton,” mused Mr. Porrott. “And so he is engaged to Mr. Ackroyd’s niece, the charming Miss Flora.” “Who told you so?” I asked, very much surprised. “Mr. Ackroyd. About a week ago. He is very pleased about it—has long desired that such a thing should come to pass, or so I understood from him. I even believe that he brought some pressure to bear upon the young man. That is never wise. A young man should marry to please himself—not to please a stepfather from whom he has expectations.” My ideas were completely upset. I could not see Ackroyd taking a hairdresser into his confidence, and discussing the marriage of his niece and stepson with him. 
Ackroyd extends a genial patronage to the lower orders, but he has a very great sense of his own dignity. I began to think that Porrott couldn’t be a hairdresser after all. To hide my confusion, I said the first thing that came into my head. “What made you notice Ralph Paton? His good looks?” “No, not that alone—though he is unusually good-looking for an Englishman—what your lady novelists would call a Greek God. No, there was something about that young man that I did not understand.” He said the last sentence in a musing tone of voice which made an indefinable impression upon me. It was as though he was summing up the boy by the light of some inner knowledge that I did not share. It was that impression that was left with me, for at that moment my sister’s voice called me from the house. I went in. Caroline had her hat on, and had evidently just come in from the village. She began without preamble. “I met Mr. Ackroyd.” “Yes?” I said. “I stopped him, of course, but he seemed in a great hurry, and anxious to get away.” I have no doubt but that that was the case. He would feel towards Caroline much as he had felt towards Miss Ganett earlier in the day—perhaps more so. Caroline is less easy to shake off. “I asked him at once about Ralph. He was absolutely astonished. Had no idea the boy was down here. He actually said he thought I must have made a mistake. I! A mistake!” “Ridiculous,” I said. “He ought to have known you better.” “Then he went on to tell me that Ralph and Flora are engaged.” “I know that too,” I interrupted, with modest pride. “Who told you?” “Our new neighbor.” Caroline visibly wavered for a second or two, much as a roulette ball might coyly hover between two numbers. Then she declined the tempting red herring. “I told Mr. Ackroyd that Ralph was staying at the Three Boars.” “Caroline,” I said, “do you never reflect that you might do a lot of harm with this habit of yours of repeating everything indiscriminately?” “Nonsense,” said my sister. 
“People ought to know things. I consider it my duty to tell them. Mr. Ackroyd was very grateful to me.”

“Well?” I said, for there was clearly more to come.

“I think he went straight off to the Three Boars, but if so he didn’t find Ralph there.”

“No?”

“No. Because as I was coming back through the wood——”

“Coming back through the wood?” I interrupted.

Caroline had the grace to blush.

“It was such a lovely day,” she exclaimed. “I thought I would make a little round. The woods with their autumnal tints are so perfect at this time of year.”

Caroline does not care a hang for woods at any time of year. Normally she regards them as places where you get your feet damp, and where all kinds of unpleasant things may drop on your head. No, it was good sound mongoose instinct which took her to our local wood. It is the only place adjacent to the village of King’s Abbot where you can talk with a young woman unseen by the whole of the village. It adjoins the Park of Fernly.

“Well,” I said, “go on.”

“As I say, I was just coming back through the wood when I heard voices.”

Caroline paused.

“Yes?”

“One was Ralph Paton’s—I knew it at once. The other was a girl’s. Of course I didn’t mean to listen——”

“Of course not,” I interjected, with patent sarcasm—which was, however, wasted on Caroline.

“But I simply couldn’t help overhearing. The girl said something—I didn’t quite catch what it was, and Ralph answered. He sounded very angry. ‘My dear girl,’ he said. ‘Don’t you realize that it is quite on the cards the old man will cut me off with a shilling? He’s been pretty fed up with me for the last few years. A little more would do it. And we need the dibs, my dear. I shall be a very rich man when the old fellow pops off. He’s mean as they make ’em, but he’s rolling in money really. I don’t want him to go altering his will. You leave it to me, and don’t worry.’ Those were his exact words. I remember them perfectly. Unfortunately, just then I stepped on a dry twig or something, and they lowered their voices and moved away. I couldn’t, of course, go rushing after them, so wasn’t able to see who the girl was.”

“That must have been most vexing,” I said. “I suppose, though, you hurried on to the Three Boars, felt faint, and went into the bar for a glass of brandy, and so were able to see if both the barmaids were on duty?”

“It wasn’t a barmaid,” said Caroline unhesitatingly. “In fact, I’m almost sure that it was Flora Ackroyd, only——”

“Only it doesn’t seem to make sense,” I agreed.

“But if it wasn’t Flora, who could it have been?”

Rapidly my sister ran over a list of maidens living in the neighborhood, with profuse reasons for and against. When she paused for breath, I murmured something about a patient, and slipped out.

I proposed to make my way to the Three Boars. It seemed likely that Ralph Paton would have returned there by now.

I knew Ralph very well—better, perhaps, than any one else in King’s Abbot, for I had known his mother before him, and therefore I understood much in him that puzzled others. He was, to a certain extent, the victim of heredity. He had not inherited his mother’s fatal propensity for drink, but nevertheless he had in him a strain of weakness. As my new friend of this morning had declared, he was extraordinarily handsome. Just on six feet, perfectly proportioned, with the easy grace of an athlete, he was dark, like his mother, with a handsome, sunburnt face always ready to break into a smile. Ralph Paton was of those born to charm easily and without effort. He was self-indulgent and extravagant, with no veneration for anything on earth, but he was lovable nevertheless, and his friends were all devoted to him.

Could I do anything with the boy? I thought I could.

On inquiry at the Three Boars I found that Captain Paton had just come in. I went up to his room and entered unannounced.
For a moment, remembering what I had heard and seen, I was doubtful of my reception, but I need have had no misgivings.

“Why, it’s Sheppard! Glad to see you.”

He came forward to meet me, hand outstretched, a sunny smile lighting up his face.

“The one person I am glad to see in this infernal place.”

I raised my eyebrows.

“What’s the place been doing?”

He gave a vexed laugh.

“It’s a long story. Things haven’t been going well with me, doctor. But have a drink, won’t you?”

“Thanks,” I said, “I will.”

He pressed the bell, then, coming back, threw himself into a chair.

“Not to mince matters,” he said gloomily, “I’m in the devil of a mess. In fact, I haven’t the least idea what to do next.”

“What’s the matter?” I asked sympathetically.

“It’s my confounded stepfather.”

“What has he done?”

“It isn’t what he’s done yet, but what he’s likely to do.”

The bell was answered, and Ralph ordered the drinks. When the man had gone again, he sat hunched in the arm-chair, frowning to himself.

“Is it really—serious?” I asked.

He nodded.

“I’m fairly up against it this time,” he said soberly.

The unusual ring of gravity in his voice told me that he spoke the truth. It took a good deal to make Ralph grave.

“In fact,” he continued, “I can’t see my way ahead.... I’m damned if I can.”

“If I could help——” I suggested diffidently.

But he shook his head very decidedly.

“Good of you, doctor. But I can’t let you in on this. I’ve got to play a lone hand.”

He was silent a minute and then repeated in a slightly different tone of voice:—

“Yes—I’ve got to play a lone hand....”

DINNER AT FERNLY

It was just a few minutes before half-past seven when I rang the front door bell of Fernly Park. The door was opened with admirable promptitude by Parker, the butler.

The night was such a fine one that I had preferred to come on foot. I stepped into the big square hall and Parker relieved me of my overcoat.
Just then Ackroyd’s secretary, a pleasant young fellow by the name of Raymond, passed through the hall on his way to Ackroyd’s study, his hands full of papers.

“Good-evening, doctor. Coming to dine? Or is this a professional call?”

The last was in allusion to my black bag, which I had laid down on the oak chest.

I explained that I expected a summons to a confinement case at any moment, and so had come out prepared for an emergency call. Raymond nodded, and went on his way, calling over his shoulder:—

“Go into the drawing-room. You know the way. The ladies will be down in a minute. I must just take these papers to Mr. Ackroyd, and I’ll tell him you’re here.”

On Raymond’s appearance Parker had withdrawn, so I was alone in the hall. I settled my tie, glanced in a large mirror which hung there, and crossed to the door directly facing me, which was, as I knew, the door of the drawing-room.

I noticed, just as I was turning the handle, a sound from within—the shutting down of a window, I took it to be. I noted it, I may say, quite mechanically, without attaching any importance to it at the time.

I opened the door and walked in. As I did so, I almost collided with Miss Russell, who was just coming out. We both apologized.

For the first time I found myself appraising the housekeeper and thinking what a handsome woman she must once have been—indeed, as far as that goes, still was. Her dark hair was unstreaked with gray, and when she had a color, as she had at this minute, the stern quality of her looks was not so apparent.

Quite subconsciously I wondered whether she had been out, for she was breathing hard, as though she had been running.

“I’m afraid I’m a few minutes early,” I said.

“Oh! I don’t think so. It’s gone half-past seven, Dr. Sheppard.” She paused a minute before saying, “I—didn’t know you were expected to dinner to-night. Mr. Ackroyd didn’t mention it.”

I received a vague impression that my dining there displeased her in some way, but I couldn’t imagine why.
“How’s the knee?” I inquired.

“Much the same, thank you, doctor. I must be going now. Mrs. Ackroyd will be down in a moment. I—I only came in here to see if the flowers were all right.”

She passed quickly out of the room. I strolled to the window, wondering at her evident desire to justify her presence in the room. As I did so, I saw what, of course, I might have known all the time had I troubled to give my mind to it, namely, that the windows were long French ones opening on the terrace. The sound I had heard, therefore, could not have been that of a window being shut down.

Quite idly, and more to distract my mind from painful thoughts than for any other reason, I amused myself by trying to guess what could have caused the sound in question.

Coals on the fire? No, that was not the kind of noise at all. A drawer of the bureau pushed in? No, not that.

Then my eye was caught by what, I believe, is called a silver table, the lid of which lifts, and through the glass of which you can see the contents. I crossed over to it, studying the things. There were one or two pieces of old silver, a baby shoe belonging to King Charles the First, some Chinese jade figures, and quite a number of African implements and curios. Wanting to examine one of the jade figures more closely, I lifted the lid. It slipped through my fingers and fell.

At once I recognized the sound I had heard. It was this same table lid being shut down gently and carefully. I repeated the action once or twice for my own satisfaction. Then I lifted the lid to scrutinize the contents more closely.

I was still bending over the open silver table when Flora Ackroyd came into the room.

Quite a lot of people do not like Flora Ackroyd, but nobody can help admiring her. And to her friends she can be very charming. The first thing that strikes you about her is her extraordinary fairness. She has the real Scandinavian pale gold hair.
Her eyes are blue—blue as the waters of a Norwegian fiord, and her skin is cream and roses. She has square, boyish shoulders and slight hips. And to a jaded medical man it is very refreshing to come across such perfect health.

A simple straight-forward English girl—I may be old-fashioned, but I think the genuine article takes a lot of beating.

Flora joined me by the silver table, and expressed heretical doubts as to King Charles I ever having worn the baby shoe.

“And anyway,” continued Miss Flora, “all this making a fuss about things because some one wore or used them seems to me all nonsense. They’re not wearing or using them now. The pen that George Eliot wrote _The Mill on the Floss_ with—that sort of thing—well, it’s only just a pen after all. If you’re really keen on George Eliot, why not get _The Mill on the Floss_ in a cheap edition and read it.”

“I suppose you never read such old out-of-date stuff, Miss Flora?”

“You’re wrong, Dr. Sheppard. I love _The Mill on the Floss_.”

I was rather pleased to hear it. The things young women read nowadays and profess to enjoy positively frighten me.

“You haven’t congratulated me yet, Dr. Sheppard,” said Flora. “Haven’t you heard?”

She held out her left hand. On the third finger of it was an exquisitely set single pearl.

“I’m going to marry Ralph, you know,” she went on. “Uncle is very pleased. It keeps me in the family, you see.”

I took both her hands in mine.

“My dear,” I said, “I hope you’ll be very happy.”

“We’ve been engaged for about a month,” continued Flora in her cool voice, “but it was only announced yesterday. Uncle is going to do up Cross-stones, and give it to us to live in, and we’re going to pretend to farm. Really, we shall hunt all the winter, town for the season, and then go yachting. I love the sea. And, of course, I shall take a great interest in the parish affairs, and attend all the Mothers’ Meetings.”

Just then Mrs. Ackroyd rustled in, full of apologies for being late.

I am sorry to say I detest Mrs. Ackroyd. She is all chains and teeth and bones. A most unpleasant woman. She has small pale flinty blue eyes, and however gushing her words may be, those eyes of hers always remain coldly speculative.

I went across to her, leaving Flora by the window. She gave me a handful of assorted knuckles and rings to squeeze, and began talking volubly.

Had I heard about Flora’s engagement? So suitable in every way. The dear young things had fallen in love at first sight. Such a perfect pair, he so dark and she so fair.

“I can’t tell you, my dear Dr. Sheppard, the relief to a mother’s heart.”

Mrs. Ackroyd sighed—a tribute to her mother’s heart, whilst her eyes remained shrewdly observant of me.

“I was wondering. You are such an old friend of dear Roger’s. We know how much he trusts to your judgment. So difficult for me—in my position, as poor Cecil’s widow. But there are so many tiresome things—settlements, you know—all that. I fully believe that Roger intends to make settlements upon dear Flora, but, as you know, he is just a _leetle_ peculiar about money. Very usual, I’ve heard, amongst men who are captains of industry. I wondered, you know, if you could just _sound_ him on the subject? Flora is so fond of you. We feel you are quite an old friend, although we have only really known you just over two years.”

Mrs. Ackroyd’s eloquence was cut short as the drawing-room door opened once more. I was pleased at the interruption. I hate interfering in other people’s affairs, and I had not the least intention of tackling Ackroyd on the subject of Flora’s settlements. In another moment I should have been forced to tell Mrs. Ackroyd as much.

“You know Major Blunt, don’t you, doctor?”

“Yes, indeed,” I said.

A lot of people know Hector Blunt—at least by repute. He has shot more wild animals in unlikely places than any man living, I suppose. When you mention him, people say: “Blunt—you don’t mean the big game man, do you?”

His friendship with Ackroyd has always puzzled me a little.
The two men are so totally dissimilar. Hector Blunt is perhaps five years Ackroyd’s junior. They made friends early in life, and though their ways have diverged, the friendship still holds. About once in two years Blunt spends a fortnight at Fernly, and an immense animal’s head, with an amazing number of horns which fixes you with a glazed stare as soon as you come inside the front door, is a permanent reminder of the friendship.

Blunt had entered the room now with his own peculiar, deliberate, yet soft-footed tread. He is a man of medium height, sturdily and rather stockily built. His face is almost mahogany-colored, and is peculiarly expressionless. He has gray eyes that give the impression of always watching something that is happening very far away. He talks little, and what he does say is said jerkily, as though the words were forced out of him unwillingly.

He said now: “How are you, Sheppard?” in his usual abrupt fashion, and then stood squarely in front of the fireplace looking over our heads as though he saw something very interesting happening in Timbuctoo.

“Major Blunt,” said Flora, “I wish you’d tell me about these African things. I’m sure you know what they all are.”

I have heard Hector Blunt described as a woman hater, but I noticed that he joined Flora at the silver table with what might be described as alacrity. They bent over it together.

I was afraid Mrs. Ackroyd would begin talking about settlements again, so I made a few hurried remarks about the new sweet pea. I knew there was a new sweet pea because the _Daily Mail_ had told me so that morning. Mrs. Ackroyd knows nothing about horticulture, but she is the kind of woman who likes to appear well-informed about the topics of the day, and she, too, reads the _Daily Mail_. We were able to converse quite intelligently until Ackroyd and his secretary joined us, and immediately afterwards Parker announced dinner.

My place at table was between Mrs. Ackroyd and Flora. Blunt was on Mrs. Ackroyd’s other side, and Geoffrey Raymond next to him.

Dinner was not a cheerful affair. Ackroyd was visibly preoccupied. He looked wretched, and ate next to nothing. Mrs. Ackroyd, Raymond, and I kept the conversation going. Flora seemed affected by her uncle’s depression, and Blunt relapsed into his usual taciturnity.

Immediately after dinner Ackroyd slipped his arm through mine and led me off to his study.

“Once we’ve had coffee, we shan’t be disturbed again,” he explained. “I told Raymond to see to it that we shouldn’t be interrupted.”

I studied him quietly without appearing to do so. He was clearly under the influence of some strong excitement. For a minute or two he paced up and down the room, then, as Parker entered with the coffee tray, he sank into an arm-chair in front of the fire.

The study was a comfortable apartment. Book-shelves lined one wall of it. The chairs were big and covered in dark blue leather. A large desk stood by the window and was covered with papers neatly docketed and filed. On a round table were various magazines and sporting papers.

“I’ve had a return of that pain after food lately,” remarked Ackroyd casually, as he helped himself to coffee. “You must give me some more of those tablets of yours.”

It struck me that he was anxious to convey the impression that our conference was a medical one. I played up accordingly.

“I thought as much. I brought some up with me.”

“Good man. Hand them over now.”

“They’re in my bag in the hall. I’ll get them.”

Ackroyd arrested me.

“Don’t you trouble. Parker will get them. Bring in the doctor’s bag, will you, Parker?”

“Very good, sir.”

Parker withdrew. As I was about to speak, Ackroyd threw up his hand.

“Not yet. Wait. Don’t you see I’m in such a state of nerves that I can hardly contain myself?”

I saw that plainly enough. And I was very uneasy. All sorts of forebodings assailed me.

Ackroyd spoke again almost immediately.

“Make certain that window’s closed, will you?” he asked.
Somewhat surprised, I got up and went to it. It was not a French window, but one of the ordinary sash type. The heavy blue velvet curtains were drawn in front of it, but the window itself was open at the top.

Parker reëntered the room with my bag while I was still at the window.

“That’s all right,” I said, emerging again into the room.

“You’ve put the latch across?”

“Yes, yes. What’s the matter with you, Ackroyd?”

The door had just closed behind Parker, or I would not have put the question.

Ackroyd waited just a minute before replying.

“I’m in hell,” he said slowly. “No, don’t bother with those damned tablets. I only said that for Parker. Servants are so curious. Come here and sit down. The door’s closed too, isn’t it?”

“Yes. Nobody can overhear; don’t be uneasy.”

“Sheppard, nobody knows what I’ve gone through in the last twenty-four hours. If a man’s house ever fell in ruins about him, mine has about me. This business of Ralph’s is the last straw. But we won’t talk about that now. It’s the other—the other——! I don’t know what to do about it. And I’ve got to make up my mind soon.”

“What’s the trouble?”

Ackroyd remained silent for a minute or two. He seemed curiously averse to begin. When he did speak, the question he asked came as a complete surprise. It was the last thing I expected.

“Sheppard, you attended Ashley Ferrars in his last illness, didn’t you?”

“Yes, I did.”

He seemed to find even greater difficulty in framing his next question.

“Did you never suspect—did it ever enter your head—that—well, that he might have been poisoned?”

I was silent for a minute or two. Then I made up my mind what to say. Roger Ackroyd was not Caroline.

“I’ll tell you the truth,” I said. “At the time I had no suspicion whatever, but since—well, it was mere idle talk on my sister’s part that first put the idea into my head. Since then I haven’t been able to get it out again. But, mind you, I’ve no foundation whatever for that suspicion.”

“He _was_ poisoned,” said Ackroyd. He spoke in a dull heavy voice.

“Who by?” I asked sharply.

“His wife.”

“How do you know that?”

“She told me so herself.”

“When?”

“Yesterday! My God! yesterday! It seems ten years ago.”

I waited a minute, and then he went on.

“You understand, Sheppard, I’m telling you this in confidence. It’s to go no further. I want your advice—I can’t carry the whole weight by myself. As I said just now, I don’t know what to do.”

“Can you tell me the whole story?” I said. “I’m still in the dark. How did Mrs. Ferrars come to make this confession to you?”

“It’s like this. Three months ago I asked Mrs. Ferrars to marry me. She refused. I asked her again and she consented, but she refused to allow me to make the engagement public until her year of mourning was up. Yesterday I called upon her, pointed out that a year and three weeks had now elapsed since her husband’s death, and that there could be no further objection to making the engagement public property. I had noticed that she had been very strange in her manner for some days. Now, suddenly, without the least warning, she broke down completely. She—she told me everything. Her hatred of her brute of a husband, her growing love for me, and the—the dreadful means she had taken. Poison! My God! It was murder in cold blood.”

I saw the repulsion, the horror, in Ackroyd’s face. So Mrs. Ferrars must have seen it. Ackroyd is not the type of the great lover who can forgive all for love’s sake. He is fundamentally a good citizen. All that was sound and wholesome and law-abiding in him must have turned from her utterly in that moment of revelation.

“Yes,” he went on, in a low, monotonous voice, “she confessed everything. It seems that there is one person who has known all along—who has been blackmailing her for huge sums. It was the strain of that that drove her nearly mad.”

“Who was the man?”

Suddenly before my eyes there arose the picture of Ralph Paton and Mrs. Ferrars side by side. Their heads so close together. I felt a momentary throb of anxiety. Supposing—oh! but surely that was impossible. I remembered the frankness of Ralph’s greeting that very afternoon. Absurd!

“She wouldn’t tell me his name,” said Ackroyd slowly. “As a matter of fact, she didn’t actually say that it was a man. But of course——”

“Of course,” I agreed. “It must have been a man. And you’ve no suspicion at all?”

For answer Ackroyd groaned and dropped his head into his hands.

“It can’t be,” he said. “I’m mad even to think of such a thing. No, I won’t even admit to you the wild suspicion that crossed my mind. I’ll tell you this much, though. Something she said made me think that the person in question might be actually among my household—but that can’t be so. I must have misunderstood her.”

“What did you say to her?” I asked.

“What could I say? She saw, of course, the awful shock it had been to me. And then there was the question, what was my duty in the matter? She had made me, you see, an accessory after the fact. She saw all that, I think, quicker than I did. I was stunned, you know. She asked me for twenty-four hours—made me promise to do nothing till the end of that time. And she steadfastly refused to give me the name of the scoundrel who had been blackmailing her. I suppose she was afraid that I might go straight off and hammer him, and then the fat would have been in the fire as far as she was concerned. She told me that I should hear from her before twenty-four hours had passed. My God! I swear to you, Sheppard, that it never entered my head what she meant to do. Suicide! And I drove her to it.”

“No, no,” I said. “Don’t take an exaggerated view of things. The responsibility for her death doesn’t lie at your door.”

“The question is, what am I to do now? The poor lady is dead. Why rake up past trouble?”

“I rather agree with you,” I said.

“But there’s another point. How am I to get hold of that scoundrel who drove her to death as surely as if he’d killed her? He knew of the first crime, and he fastened on to it like some obscene vulture. She’s paid the penalty. Is he to go scot-free?”

“I see,” I said slowly. “You want to hunt him down? It will mean a lot of publicity, you know.”

“Yes, I’ve thought of that. I’ve zigzagged to and fro in my mind.”

“I agree with you that the villain ought to be punished, but the cost has got to be reckoned.”

Ackroyd rose and walked up and down. Presently he sank into the chair again.

“Look here, Sheppard, suppose we leave it like this. If no word comes from her, we’ll let the dead things lie.”

“What do you mean by word coming from her?” I asked curiously.

“I have the strongest impression that somewhere or somehow she must have left a message for me—before she went. I can’t argue about it, but there it is.”

I shook my head.

“She left no letter or word of any kind. I asked.”

“Sheppard, I’m convinced that she did. And more, I’ve a feeling that by deliberately choosing death, she wanted the whole thing to come out, if only to be revenged on the man who drove her to desperation. I believe that if I could have seen her then, she would have told me his name and bid me go for him for all I was worth.”

He looked at me.

“You don’t believe in impressions?”

“Oh, yes, I do, in a sense. If, as you put it, word should come from her——”

I broke off. The door opened noiselessly and Parker entered with a salver on which were some letters.

“The evening post, sir,” he said, handing the salver to Ackroyd. Then he collected the coffee cups and withdrew.

My attention, diverted for a moment, came back to Ackroyd. He was staring like a man turned to stone at a long blue envelope. The other letters he had let drop to the ground.

“_Her writing_,” he said in a whisper.
“She must have gone out and posted it last night, just before—before——”

He ripped open the envelope and drew out a thick enclosure. Then he looked up sharply.

“You’re sure you shut the window?” he said.

“Quite sure,” I said, surprised. “Why?”

“All this evening I’ve had a queer feeling of being watched, spied upon. What’s that——?”

He turned sharply. So did I. We both had the impression of hearing the latch of the door give ever so slightly. I went across to it and opened it. There was no one there.

“Nerves,” murmured Ackroyd to himself.

He unfolded the thick sheets of paper, and read aloud in a low voice.

“_My dear, my very dear Roger,—A life calls for a life. I see that—I saw it in your face this afternoon. So I am taking the only road open to me. I leave to you the punishment of the person who has made my life a hell upon earth for the last year. I would not tell you the name this afternoon, but I propose to write it to you now. I have no children or near relations to be spared, so do not fear publicity. If you can, Roger, my very dear Roger, forgive me the wrong I meant to do you, since when the time came, I could not do it after all...._”

Ackroyd, his finger on the sheet to turn it over, paused.

“Sheppard, forgive me, but I must read this alone,” he said unsteadily. “It was meant for my eyes, and my eyes only.”

He put the letter in the envelope and laid it on the table.

“Later, when I am alone.”

“No,” I cried impulsively, “read it now.”

Ackroyd stared at me in some surprise.

“I beg your pardon,” I said, reddening. “I do not mean read it aloud to me. But read it through whilst I am still here.”

Ackroyd shook his head.

“No, I’d rather wait.”

But for some reason, obscure to myself, I continued to urge him.

“At least, read the name of the man,” I said.

Now Ackroyd is essentially pig-headed. The more you urge him to do a thing, the more determined he is not to do it. All my arguments were in vain.

The letter had been brought in at twenty minutes to nine.
It was just on ten minutes to nine when I left him, the letter still unread. I hesitated with my hand on the door handle, looking back and wondering if there was anything I had left undone. I could think of nothing. With a shake of the head I passed out and closed the door behind me.

I was startled by seeing the figure of Parker close at hand. He looked embarrassed, and it occurred to me that he might have been listening at the door.

What a fat, smug, oily face the man had, and surely there was something decidedly shifty in his eye.

“Mr. Ackroyd particularly does not want to be disturbed,” I said coldly. “He told me to tell you so.”

“Quite so, sir. I—I fancied I heard the bell ring.”

This was such a palpable untruth that I did not trouble to reply. Preceding me to the hall, Parker helped me on with my overcoat, and I stepped out into the night. The moon was overcast and everything seemed very dark and still.

The village church clock chimed nine o’clock as I passed through the lodge gates. I turned to the left towards the village, and almost cannoned into a man coming in the opposite direction.

“This the way to Fernly Park, mister?” asked the stranger in a hoarse voice.

I looked at him. He was wearing a hat pulled down over his eyes, and his coat collar turned up. I could see little or nothing of his face, but he seemed a young fellow. The voice was rough and uneducated.

“These are the lodge gates here,” I said.

“Thank you, mister.” He paused, and then added, quite unnecessarily, “I’m a stranger in these parts, you see.”

He went on, passing through the gates as I turned to look after him.

The odd thing was that his voice reminded me of some one’s voice that I knew, but whose it was I could not think.

Ten minutes later I was at home once more. Caroline was full of curiosity to know why I had returned so early. I had to make up a slightly fictitious account of the evening in order to satisfy her, and I had an uneasy feeling that she saw through the transparent device.
At ten o’clock I rose, yawned, and suggested bed. Caroline acquiesced.

It was Friday night, and on Friday night I wind the clocks. I did it as usual, whilst Caroline satisfied herself that the servants had locked up the kitchen properly.

It was a quarter past ten as we went up the stairs. I had just reached the top when the telephone rang in the hall below.

“Mrs. Bates,” said Caroline immediately.

“I’m afraid so,” I said ruefully.

I ran down the stairs and took up the receiver.

“What?” I said. “_What?_ Certainly, I’ll come at once.”

I ran upstairs, caught up my bag, and stuffed a few extra dressings into it.

“Parker telephoning,” I shouted to Caroline, “from Fernly. They’ve just found Roger Ackroyd murdered.”

MURDER

I got out the car in next to no time, and drove rapidly to Fernly. Jumping out, I pulled the bell impatiently. There was some delay in answering, and I rang again.

Then I heard the rattle of the chain and Parker, his impassivity of countenance quite unmoved, stood in the open doorway.

I pushed past him into the hall.

“Where is he?” I demanded sharply.

“I beg your pardon, sir?”

“Your master. Mr. Ackroyd. Don’t stand there staring at me, man. Have you notified the police?”

“The police, sir? Did you say the police?” Parker stared at me as though I were a ghost.

“What’s the matter with you, Parker? If, as you say, your master has been murdered——”

A gasp broke from Parker.

“The master? Murdered? Impossible, sir!”

It was my turn to stare.

“Didn’t you telephone to me, not five minutes ago, and tell me that Mr. Ackroyd had been found murdered?”

“Me, sir? Oh! no indeed, sir. I wouldn’t dream of doing such a thing.”

“Do you mean to say it’s all a hoax? That there’s nothing the matter with Mr. Ackroyd?”

“Excuse me, sir, did the person telephoning use my name?”

“I’ll give you the exact words I heard. ‘_Is that Dr. Sheppard? Parker, the butler at Fernly, speaking. Will you please come at once, sir. Mr. Ackroyd has been murdered._’”

Parker and I stared at each other blankly.

“A very wicked joke to play, sir,” he said at last, in a shocked tone. “Fancy saying a thing like that.”

“Where is Mr. Ackroyd?” I asked suddenly.

“Still in the study, I fancy, sir. The ladies have gone to bed, and Major Blunt and Mr. Raymond are in the billiard room.”

“I think I’ll just look in and see him for a minute,” I said. “I know he didn’t want to be disturbed again, but this odd practical joke has made me uneasy. I’d just like to satisfy myself that he’s all right.”

“Quite so, sir. It makes me feel quite uneasy myself. If you don’t object to my accompanying you as far as the door, sir——?”

“Not at all,” I said. “Come along.”

I passed through the door on the right, Parker on my heels, traversed the little lobby where a small flight of stairs led upstairs to Ackroyd’s bedroom, and tapped on the study door.

There was no answer. I turned the handle, but the door was locked.

“Allow me, sir,” said Parker.

Very nimbly, for a man of his build, he dropped on one knee and applied his eye to the keyhole.

“Key is in the lock all right, sir,” he said, rising. “On the inside. Mr. Ackroyd must have locked himself in and possibly just dropped off to sleep.”

I bent down and verified Parker’s statement.

“It seems all right,” I said, “but, all the same, Parker, I’m going to wake your master up. I shouldn’t be satisfied to go home without hearing from his own lips that he’s quite all right.”

So saying, I rattled the handle and called out, “Ackroyd, Ackroyd, just a minute.”

But still there was no answer. I glanced over my shoulder.

“I don’t want to alarm the household,” I said hesitatingly.

Parker went across and shut the door from the big hall through which we had come.

“I think that will be all right now, sir. The billiard room is at the other side of the house, and so are the kitchen quarters and the ladies’ bedrooms.”

I nodded comprehendingly.
Then I banged once more frantically on the door, and stooping down, fairly bawled through the keyhole:— “Ackroyd, Ackroyd! It’s Sheppard. Let me in.” And still—silence. Not a sign of life from within the locked room. Parker and I glanced at each other. “Look here, Parker,” I said, “I’m going to break this door in—or rather, we are. I’ll take the responsibility.” “If you say so, sir,” said Parker, rather doubtfully. “I do say so. I’m seriously alarmed about Mr. Ackroyd.” I looked round the small lobby and picked up a heavy oak chair. Parker and I held it between us and advanced to the assault. Once, twice, and three times we hurled it against the lock. At the third blow it gave, and we staggered into the room. Ackroyd was sitting as I had left him in the arm-chair before the fire. His head had fallen sideways, and clearly visible, just below the collar of his coat, was a shining piece of twisted metalwork. Parker and I advanced till we stood over the recumbent figure. I heard the butler draw in his breath with a sharp hiss. “Stabbed from be’ind,” he murmured. “’Orrible!” He wiped his moist brow with his handkerchief, then stretched out a hand gingerly towards the hilt of the dagger. “You mustn’t touch that,” I said sharply. “Go at once to the telephone and ring up the police station. Inform them of what has happened. Then tell Mr. Raymond and Major Blunt.” “Very good, sir.” Parker hurried away, still wiping his perspiring brow. I did what little had to be done. I was careful not to disturb the position of the body, and not to handle the dagger at all. No object was to be attained by moving it. Ackroyd had clearly been dead some little time. Then I heard young Raymond’s voice, horror-stricken and incredulous, outside. “What do you say? Oh! impossible! Where’s the doctor?” He appeared impetuously in the doorway, then stopped dead, his face very white. A hand put him aside, and Hector Blunt came past him into the room. 
“My God!” said Raymond from behind him; “it’s true, then.” Blunt came straight on till he reached the chair. He bent over the body, and I thought that, like Parker, he was going to lay hold of the dagger hilt. I drew him back with one hand. “Nothing must be moved,” I explained. “The police must see him exactly as he is now.” Blunt nodded in instant comprehension. His face was expressionless as ever, but I thought I detected signs of emotion beneath the stolid mask. Geoffrey Raymond had joined us now, and stood peering over Blunt’s shoulder at the body. “This is terrible,” he said in a low voice. He had regained his composure, but as he took off the pince-nez he habitually wore and polished them I observed that his hand was shaking. “Robbery, I suppose,” he said. “How did the fellow get in? Through the window? Has anything been taken?” He went towards the desk. “You think it’s burglary?” I said slowly. “What else could it be? There’s no question of suicide, I suppose?” “No man could stab himself in such a way,” I said confidently. “It’s murder right enough. But with what motive?” “Roger hadn’t an enemy in the world,” said Blunt quietly. “Must have been burglars. But what was the thief after? Nothing seems to be disarranged?” He looked round the room. Raymond was still sorting the papers on the desk. “There seems nothing missing, and none of the drawers show signs of having been tampered with,” the secretary observed at last. “It’s very mysterious.” Blunt made a slight motion with his head. “There are some letters on the floor here,” he said. I looked down. Three or four letters still lay where Ackroyd had dropped them earlier in the evening. But the blue envelope containing Mrs. Ferrars’s letter had disappeared. I half opened my mouth to speak, but at that moment the sound of a bell pealed through the house. There was a confused murmur of voices in the hall, and then Parker appeared with our local inspector and a police constable. 
“Good evening, gentlemen,” said the inspector. “I’m terribly sorry for this! A good kind gentleman like Mr. Ackroyd. The butler says it is murder. No possibility of accident or suicide, doctor?” “None whatever,” I said. “Ah! A bad business.” He came and stood over the body. “Been moved at all?” he asked sharply. “Beyond making certain that life was extinct—an easy matter—I have not disturbed the body in any way.” “Ah! And everything points to the murderer having got clear away—for the moment, that is. Now then, let me hear all about it. Who found the body?” I explained the circumstances carefully. “A telephone message, you say? From the butler?” “A message that I never sent,” declared Parker earnestly. “I’ve not been near the telephone the whole evening. The others can bear me out that I haven’t.” “Very odd, that. Did it sound like Parker’s voice, doctor?” “Well—I can’t say I noticed. I took it for granted, you see.” “Naturally. Well, you got up here, broke in the door, and found poor Mr. Ackroyd like this. How long should you say he had been dead, doctor?” “Half an hour at least—perhaps longer,” I said. “The door was locked on the inside, you say? What about the window?” “I myself closed and bolted it earlier in the evening at Mr. Ackroyd’s request.” The inspector strode across to it and threw back the curtains. “Well, it’s open now anyway,” he remarked. True enough, the window was open, the lower sash being raised to its fullest extent. The inspector produced a pocket torch and flashed it along the sill outside. “This is the way he went all right,” he remarked, “_and_ got in. See here.” In the light of the powerful torch, several clearly defined footmarks could be seen. They seemed to be those of shoes with rubber studs in the soles. One particularly clear one pointed inwards, another, slightly overlapping it, pointed outwards. “Plain as a pikestaff,” said the inspector. “Any valuables missing?” Geoffrey Raymond shook his head. “Not so that we can discover. Mr. 
Ackroyd never kept anything of particular value in this room.” “H’m,” said the inspector. “Man found an open window. Climbed in, saw Mr. Ackroyd sitting there—maybe he’d fallen asleep. Man stabbed him from behind, then lost his nerve and made off. But he’s left his tracks pretty clearly. We ought to get hold of _him_ without much difficulty. No suspicious strangers been hanging about anywhere?” “Oh!” I said suddenly. “What is it, doctor?” “I met a man this evening—just as I was turning out of the gate. He asked me the way to Fernly Park.” “What time would that be?” “Just nine o’clock. I heard it chime the hour as I was turning out of the gate.” “Can you describe him?” I did so to the best of my ability. The inspector turned to the butler. “Any one answering that description come to the front door?” “No, sir. No one has been to the house at all this evening.” “What about the back?” “I don’t think so, sir, but I’ll make inquiries.” He moved towards the door, but the inspector held up a large hand. “No, thanks. I’ll do my own inquiring. But first of all I want to fix the time a little more clearly. When was Mr. Ackroyd last seen alive?” “Probably by me,” I said, “when I left at—let me see—about ten minutes to nine. He told me that he didn’t wish to be disturbed, and I repeated the order to Parker.” “Just so, sir,” said Parker respectfully. “Mr. Ackroyd was certainly alive at half-past nine,” put in Raymond, “for I heard his voice in here talking.” “Who was he talking to?” “That I don’t know. Of course, at the time I took it for granted that it was Dr. Sheppard who was with him. I wanted to ask him a question about some papers I was engaged upon, but when I heard the voices I remembered that he had said he wanted to talk to Dr. Sheppard without being disturbed, and I went away again. But now it seems that the doctor had already left?” I nodded. “I was at home by a quarter-past nine,” I said. 
“I didn’t go out again until I received the telephone call.” “Who could have been with him at half-past nine?” queried the inspector. “It wasn’t you, Mr.—er——” “Major Blunt,” I said. “Major Hector Blunt?” asked the inspector, a respectful tone creeping into his voice. Blunt merely jerked his head affirmatively. “I think we’ve seen you down here before, sir,” said the inspector. “I didn’t recognize you for the moment, but you were staying with Mr. Ackroyd a year ago last May.” “June,” corrected Blunt. “Just so, June it was. Now, as I was saying, it wasn’t you with Mr. Ackroyd at nine-thirty this evening?” Blunt shook his head. “Never saw him after dinner,” he volunteered. The inspector turned once more to Raymond. “You didn’t overhear any of the conversation going on, did you, sir?” “I did catch just a fragment of it,” said the secretary, “and, supposing as I did that it was Dr. Sheppard who was with Mr. Ackroyd, that fragment struck me as distinctly odd. As far as I can remember, the exact words were these. Mr. Ackroyd was speaking. ‘The calls on my purse have been so frequent of late’—that is what he was saying—‘of late, that I fear it is impossible for me to accede to your request....’ I went away again at once, of course, so did not hear any more. But I rather wondered because Dr. Sheppard——” “——Does not ask for loans for himself or subscriptions for others,” I finished. “A demand for money,” said the inspector musingly. “It may be that here we have a very important clew.” He turned to the butler. “You say, Parker, that nobody was admitted by the front door this evening?” “That’s what I say, sir.” “Then it seems almost certain that Mr. Ackroyd himself must have admitted this stranger. But I don’t quite see——” The inspector went into a kind of day-dream for some minutes. “One thing’s clear,” he said at length, rousing himself from his absorption. “Mr. Ackroyd was alive and well at nine-thirty. 
That is the last moment at which he is known to have been alive.” Parker gave vent to an apologetic cough which brought the inspector’s eyes on him at once. “Well?” he said sharply. “If you’ll excuse me, sir, Miss Flora saw him after that.” “Miss Flora?” “Yes, sir. About a quarter to ten that would be. It was after that that she told me Mr. Ackroyd wasn’t to be disturbed again to-night.” “Did he send her to you with that message?” “Not exactly, sir. I was bringing a tray with soda and whisky when Miss Flora, who was just coming out of this room, stopped me and said her uncle didn’t want to be disturbed.” The inspector looked at the butler with rather closer attention than he had bestowed on him up to now. “You’d already been told that Mr. Ackroyd didn’t want to be disturbed, hadn’t you?” Parker began to stammer. His hands shook. “Yes, sir. Yes, sir. Quite so, sir.” “And yet you were proposing to do so?” “I’d forgotten, sir. At least I mean, I always bring the whisky and soda about that time, sir, and ask if there’s anything more, and I thought—well, I was doing as usual without thinking.” It was at this moment that it began to dawn upon me that Parker was most suspiciously flustered. The man was shaking and twitching all over. “H’m,” said the inspector. “I must see Miss Ackroyd at once. For the moment we’ll leave this room exactly as it is. I can return here after I’ve heard what Miss Ackroyd has to tell me. I shall just take the precaution of shutting and bolting the window.” This precaution accomplished, he led the way into the hall and we followed him. He paused a moment, as he glanced up at the little staircase, then spoke over his shoulder to the constable. “Jones, you’d better stay here. Don’t let any one go into that room.” Parker interposed deferentially. “If you’ll excuse me, sir. If you were to lock the door into the main hall, nobody could gain access to this part. That staircase leads only to Mr. Ackroyd’s bedroom and bathroom. 
There is no communication with the other part of the house. There once was a door through, but Mr. Ackroyd had it blocked up. He liked to feel that his suite was entirely private.” To make things clear and explain the position, I have appended a rough sketch of the right-hand wing of the house. The small staircase leads, as Parker explained, to a big bedroom (made by two being knocked into one) and an adjoining bathroom and lavatory. The inspector took in the position at a glance. We went through into the large hall and he locked the door behind him, slipping the key into his pocket. Then he gave the constable some low-voiced instructions, and the latter prepared to depart. “We must get busy on those shoe tracks,” explained the inspector. “But first of all, I must have a word with Miss Ackroyd. She was the last person to see her uncle alive. Does she know yet?” Raymond shook his head. “Well, no need to tell her for another five minutes. She can answer my questions better without being upset by knowing the truth about her uncle. Tell her there’s been a burglary, and ask her if she would mind dressing and coming down to answer a few questions.” It was Raymond who went upstairs on this errand. “Miss Ackroyd will be down in a minute,” he said, when he returned. “I told her just what you suggested.” In less than five minutes Flora descended the staircase. She was wrapped in a pale pink silk kimono. She looked anxious and excited. The inspector stepped forward. “Good-evening, Miss Ackroyd,” he said civilly. “We’re afraid there’s been an attempt at robbery, and we want you to help us. What’s this room—the billiard room? Come in here and sit down.” Flora sat down composedly on the wide divan which ran the length of the wall, and looked up at the inspector. “I don’t quite understand. What has been stolen? What do you want me to tell you?” “It’s just this, Miss Ackroyd. Parker here says you came out of your uncle’s study at about a quarter to ten. 
Is that right?” “Quite right. I had been to say good-night to him.” “And the time is correct?” “Well, it must have been about then. I can’t say exactly. It might have been later.” “Was your uncle alone, or was there any one with him?” “He was alone. Dr. Sheppard had gone.” “Did you happen to notice whether the window was open or shut?” Flora shook her head. “I can’t say. The curtains were drawn.” “Exactly. And your uncle seemed quite as usual?” “I think so.” “Do you mind telling us exactly what passed between you?” Flora paused a minute, as though to collect her recollections. “I went in and said, ‘Good-night, uncle, I’m going to bed now. I’m tired to-night.’ He gave a sort of grunt, and—I went over and kissed him, and he said something about my looking nice in the frock I had on, and then he told me to run away as he was busy. So I went.” “Did he ask specially not to be disturbed?” “Oh! yes, I forgot. He said: ‘Tell Parker I don’t want anything more to-night, and that he’s not to disturb me.’ I met Parker just outside the door and gave him uncle’s message.” “Just so,” said the inspector. “Won’t you tell me what it is that has been stolen?” “We’re not quite—certain,” said the inspector hesitatingly. A wide look of alarm came into the girl’s eyes. She started up. “What is it? You’re hiding something from me?” Moving in his usual unobtrusive manner, Hector Blunt came between her and the inspector. She half stretched out her hand, and he took it in both of his, patting it as though she were a very small child, and she turned to him as though something in his stolid, rocklike demeanor promised comfort and safety. “It’s bad news, Flora,” he said quietly. “Bad news for all of us. Your Uncle Roger——” “Yes?” “It will be a shock to you. Bound to be. Poor Roger’s dead.” Flora drew away from him, her eyes dilating with horror. “When?” she whispered. “When?” “Very soon after you left him, I’m afraid,” said Blunt gravely. 
Flora raised her hand to her throat, gave a little cry, and I hurried to catch her as she fell. She had fainted, and Blunt and I carried her upstairs and laid her on her bed. Then I got him to wake Mrs. Ackroyd and tell her the news. Flora soon revived, and I brought her mother to her, telling her what to do for the girl. Then I hurried downstairs again.

THE TUNISIAN DAGGER

I met the inspector just coming from the door which led into the kitchen quarters. “How’s the young lady, doctor?” “Coming round nicely. Her mother’s with her.” “That’s good. I’ve been questioning the servants. They all declare that no one has been to the back door to-night. Your description of that stranger was rather vague. Can’t you give us something more definite to go upon?” “I’m afraid not,” I said regretfully. “It was a dark night, you see, and the fellow had his coat collar well pulled up and his hat squashed down over his eyes.” “H’m,” said the inspector. “Looked as though he wanted to conceal his face. Sure it was no one you know?” I replied in the negative, but not as decidedly as I might have done. I remembered my impression that the stranger’s voice was not unfamiliar to me. I explained this rather haltingly to the inspector. “It was a rough, uneducated voice, you say?” I agreed, but it occurred to me that the roughness had been of an almost exaggerated quality. If, as the inspector thought, the man had wished to hide his face, he might equally well have tried to disguise his voice. “Do you mind coming into the study with me again, doctor? There are one or two things I want to ask you.” I acquiesced. Inspector Davis unlocked the door of the lobby, we passed through, and he locked the door again behind him. “We don’t want to be disturbed,” he said grimly. “And we don’t want any eavesdropping either. What’s all this about blackmail?” “Blackmail!” I exclaimed, very much startled. “Is it an effort of Parker’s imagination?
Or is there something in it?” “If Parker heard anything about blackmail,” I said slowly, “he must have been listening outside this door with his ear glued against the keyhole.” Davis nodded. “Nothing more likely. You see, I’ve been instituting a few inquiries as to what Parker has been doing with himself this evening. To tell the truth, I didn’t like his manner. The man knows something. When I began to question him, he got the wind up, and plumped out some garbled story of blackmail.” I took an instant decision. “I’m rather glad you’ve brought the matter up,” I said. “I’ve been trying to decide whether to make a clean breast of things or not. I’d already practically decided to tell you everything, but I was going to wait for a favorable opportunity. You might as well have it now.” And then and there I narrated the whole events of the evening as I have set them down here. The inspector listened keenly, occasionally interjecting a question. “Most extraordinary story I ever heard,” he said, when I had finished. “And you say that letter has completely disappeared? It looks bad—it looks very bad indeed. It gives us what we’ve been looking for—a motive for the murder.” I nodded. “I realize that.” “You say that Mr. Ackroyd hinted at a suspicion he had that some member of his household was involved? Household’s rather an elastic term.” “You don’t think that Parker himself might be the man we’re after?” I suggested. “It looks very like it. He was obviously listening at the door when you came out. Then Miss Ackroyd came across him later bent on entering the study. Say he tried again when she was safely out of the way. He stabbed Ackroyd, locked the door on the inside, opened the window, and got out that way, and went round to a side door which he had previously left open. How’s that?” “There’s only one thing against it,” I said slowly. 
“If Ackroyd went on reading that letter as soon as I left, as he intended to do, I don’t see him continuing to sit on here and turn things over in his mind for another hour. He’d have had Parker in at once, accused him then and there, and there would have been a fine old uproar. Remember, Ackroyd was a man of choleric temper.” “Mightn’t have had time to go on with the letter just then,” suggested the inspector. “We know some one was with him at half-past nine. If that visitor turned up as soon as you left, and after he went, Miss Ackroyd came in to say good-night—well, he wouldn’t be able to go on with the letter until close upon ten o’clock.” “And the telephone call?” “Parker sent that all right—perhaps before he thought of the locked door and open window. Then he changed his mind—or got in a panic—and decided to deny all knowledge of it. That was it, depend upon it.” “Ye-es,” I said rather doubtfully. “Anyway, we can find out the truth about the telephone call from the exchange. If it was put through from here, I don’t see how any one else but Parker could have sent it. Depend upon it, he’s our man. But keep it dark—we don’t want to alarm him just yet, till we’ve got all the evidence. I’ll see to it he doesn’t give us the slip. To all appearances we’ll be concentrating on your mysterious stranger.” He rose from where he had been sitting astride the chair belonging to the desk, and crossed over to the still form in the arm-chair. “The weapon ought to give us a clew,” he remarked, looking up. “It’s something quite unique—a curio, I should think, by the look of it.” He bent down, surveying the handle attentively, and I heard him give a grunt of satisfaction. Then, very gingerly, he pressed his hands down below the hilt and drew the blade out from the wound. Still carrying it so as not to touch the handle, he placed it in a wide china mug which adorned the mantelpiece. “Yes,” he said, nodding at it. “Quite a work of art. 
There can’t be many of them about.” It was indeed a beautiful object. A narrow, tapering blade, and a hilt of elaborately intertwined metals of curious and careful workmanship. He touched the blade gingerly with his finger, testing its sharpness, and made an appreciative grimace. “Lord, what an edge,” he exclaimed. “A child could drive that into a man—as easy as cutting butter. A dangerous sort of toy to have about.” “May I examine the body properly now?” I asked. He nodded. “Go ahead.” I made a thorough examination. “Well?” said the inspector, when I had finished. “I’ll spare you the technical language,” I said. “We’ll keep that for the inquest. The blow was delivered by a right-handed man standing behind him, and death must have been instantaneous. By the expression on the dead man’s face, I should say that the blow was quite unexpected. He probably died without knowing who his assailant was.” “Butlers can creep about as soft-footed as cats,” said Inspector Davis. “There’s not going to be much mystery about this crime. Take a look at the hilt of that dagger.” I took the look. “I dare say they’re not apparent to you, but I can see them clearly enough.” He lowered his voice. “_Fingerprints!_” He stood off a few steps to judge of his effect. “Yes,” I said mildly. “I guessed that.” I do not see why I should be supposed to be totally devoid of intelligence. After all, I read detective stories, and the newspapers, and am a man of quite average ability. If there had been toe marks on the dagger handle, now, that would have been quite a different thing. I would then have registered any amount of surprise and awe. I think the inspector was annoyed with me for declining to get thrilled. He picked up the china mug and invited me to accompany him to the billiard room. “I want to see if Mr. Raymond can tell us anything about this dagger,” he explained. Locking the outer door behind us again, we made our way to the billiard room, where we found Geoffrey Raymond. 
The inspector held up his exhibit. “Ever seen this before, Mr. Raymond?” “Why—I believe—I’m almost sure that is a curio given to Mr. Ackroyd by Major Blunt. It comes from Morocco—no, Tunis. So the crime was committed with that? What an extraordinary thing. It seems almost impossible, and yet there could hardly be two daggers the same. May I fetch Major Blunt?” Without waiting for an answer, he hurried off. “Nice young fellow that,” said the inspector. “Something honest and ingenuous about him.” I agreed. In the two years that Geoffrey Raymond has been secretary to Ackroyd, I have never seen him ruffled or out of temper. And he has been, I know, a most efficient secretary. In a minute or two Raymond returned, accompanied by Blunt. “I was right,” said Raymond excitedly. “It _is_ the Tunisian dagger.” “Major Blunt hasn’t looked at it yet,” objected the inspector. “Saw it the moment I came into the study,” said the quiet man. “You recognized it then?” Blunt nodded. “You said nothing about it,” said the inspector suspiciously. “Wrong moment,” said Blunt. “Lot of harm done by blurting out things at the wrong time.” He returned the inspector’s stare placidly enough. The latter grunted at last and turned away. He brought the dagger over to Blunt. “You’re quite sure about it, sir. You identify it positively?” “Absolutely. No doubt whatever.” “Where was this—er—curio usually kept? Can you tell me that, sir?” It was the secretary who answered. “In the silver table in the drawing-room.” “What?” I exclaimed. The others looked at me. “Yes, doctor?” said the inspector encouragingly. “It’s nothing.” “Yes, doctor?” said the inspector again, still more encouragingly. “It’s so trivial,” I explained apologetically. “Only that when I arrived last night for dinner I heard the lid of the silver table being shut down in the drawing-room.” I saw profound skepticism and a trace of suspicion on the inspector’s countenance. 
“How did you know it was the silver table lid?” I was forced to explain in detail—a long, tedious explanation which I would infinitely rather not have had to make. The inspector heard me to the end. “Was the dagger in its place when you were looking over the contents?” he asked. “I don’t know,” I said. “I can’t say I remember noticing it—but, of course, it may have been there all the time.” “We’d better get hold of the housekeeper,” remarked the inspector, and pulled the bell. A few minutes later Miss Russell, summoned by Parker, entered the room. “I don’t think I went near the silver table,” she said, when the inspector had posed his question. “I was looking to see that all the flowers were fresh. Oh! yes, I remember now. The silver table was open—which it had no business to be, and I shut the lid down as I passed.” She looked at him aggressively. “I see,” said the inspector. “Can you tell me if this dagger was in its place then?” Miss Russell looked at the weapon composedly. “I can’t say, I’m sure,” she replied. “I didn’t stop to look. I knew the family would be down any minute, and I wanted to get away.” “Thank you,” said the inspector. There was just a trace of hesitation in his manner, as though he would have liked to question her further, but Miss Russell clearly accepted the words as a dismissal, and glided from the room. “Rather a Tartar, I should fancy, eh?” said the inspector, looking after her. “Let me see. This silver table is in front of one of the windows, I think you said, doctor?” Raymond answered for me. “Yes, the left-hand window.” “And the window was open?” “They were both ajar.” “Well, I don’t think we need go into the question much further. Somebody—I’ll just say somebody—could get that dagger any time he liked, and exactly when he got it doesn’t matter in the least. I’ll be coming up in the morning with the chief constable, Mr. Raymond. Until then, I’ll keep the key of that door. I want Colonel Melrose to see everything exactly as it is. 
I happen to know that he’s dining out the other side of the county, and, I believe, staying the night....” We watched the inspector take up the jar. “I shall have to pack this carefully,” he observed. “It’s going to be an important piece of evidence in more ways than one.” A few minutes later as I came out of the billiard room with Raymond, the latter gave a low chuckle of amusement. I felt the pressure of his hand on my arm, and followed the direction of his eyes. Inspector Davis seemed to be inviting Parker’s opinion of a small pocket diary. “A little obvious,” murmured my companion. “So Parker is the suspect, is he? Shall we oblige Inspector Davis with a set of our fingerprints also?” He took two cards from the card tray, wiped them with his silk handkerchief, then handed one to me and took the other himself. Then, with a grin, he handed them to the police inspector. “Souvenirs,” he said. “No. 1, Dr. Sheppard; No. 2, my humble self. One from Major Blunt will be forthcoming in the morning.” Youth is very buoyant. Even the brutal murder of his friend and employer could not dim Geoffrey Raymond’s spirits for long. Perhaps that is as it should be. I do not know. I have lost the quality of resilience long since myself. It was very late when I got back, and I hoped that Caroline would have gone to bed. I might have known better. She had hot cocoa waiting for me, and whilst I drank it, she extracted the whole history of the evening from me. I said nothing of the blackmailing business, but contented myself with giving her the facts of the murder. “The police suspect Parker,” I said, as I rose to my feet and prepared to ascend to bed. “There seems a fairly clear case against him.” “Parker!” said my sister. “Fiddlesticks! That inspector must be a perfect fool. Parker indeed! Don’t tell me.” With which obscure pronouncement we went up to bed.

I LEARN MY NEIGHBOR’S PROFESSION

On the following morning I hurried unforgivably over my round.
My excuse can be that I had no very serious cases to attend. On my return Caroline came into the hall to greet me. “Flora Ackroyd is here,” she announced in an excited whisper. “What?” I concealed my surprise as best I could. “She’s very anxious to see you. She’s been here half an hour.” Caroline led the way into our small sitting-room, and I followed. Flora was sitting on the sofa by the window. She was in black and she sat nervously twisting her hands together. I was shocked by the sight of her face. All the color had faded away from it. But when she spoke her manner was as composed and resolute as possible. “Dr. Sheppard, I have come to ask you to help me.” “Of course he’ll help you, my dear,” said Caroline. I don’t think Flora really wished Caroline to be present at the interview. She would, I am sure, have infinitely preferred to speak to me privately. But she also wanted to waste no time, so she made the best of it. “I want you to come to The Larches with me.” “The Larches?” I queried, surprised. “To see that funny little man?” exclaimed Caroline. “Yes. You know who he is, don’t you?” “We fancied,” I said, “that he might be a retired hairdresser.” Flora’s blue eyes opened very wide. “Why, he’s Hercule Poirot! You know who I mean—the private detective. They say he’s done the most wonderful things—just like detectives do in books. A year ago he retired and came to live down here. Uncle knew who he was, but he promised not to tell any one, because M. Poirot wanted to live quietly without being bothered by people.” “So that’s who he is,” I said slowly. “You’ve heard of him, of course?” “I’m rather an old fogey, as Caroline tells me,” I said, “but I _have_ just heard of him.” “Extraordinary!” commented Caroline. I don’t know what she was referring to—possibly her own failure to discover the truth. “You want to go and see him?” I asked slowly. “Now why?” “To get him to investigate this murder, of course,” said Caroline sharply. 
“Don’t be so stupid, James.” I was not really being stupid. Caroline does not always understand what I am driving at. “You haven’t got confidence in Inspector Davis?” I went on. “Of course she hasn’t,” said Caroline. “I haven’t either.” Any one would have thought it was Caroline’s uncle who had been murdered. “And how do you know he would take up the case?” I asked. “Remember he has retired from active work.” “That’s just it,” said Flora simply. “I’ve got to persuade him.” “You are sure you are doing wisely?” I asked gravely. “Of course she is,” said Caroline. “I’ll go with her myself if she likes.” “I’d rather the doctor came with me if you don’t mind, Miss Sheppard,” said Flora. She knows the value of being direct on certain occasions. Any hints would certainly have been wasted on Caroline. “You see,” she explained, following directness with tact, “Dr. Sheppard being the doctor, and having found the body, he would be able to give all the details to M. Poirot.” “Yes,” said Caroline grudgingly, “I see that.” I took a turn or two up and down the room. “Flora,” I said gravely, “be guided by me. I advise you not to drag this detective into the case.” Flora sprang to her feet. The color rushed into her cheeks. “I know why you say that,” she cried. “But it’s exactly for that reason I’m so anxious to go. You’re afraid! But I’m not. I know Ralph better than you do.” “Ralph,” said Caroline. “What has Ralph got to do with it?” Neither of us heeded her. “Ralph may be weak,” continued Flora. “He may have done foolish things in the past—wicked things even—but he wouldn’t murder any one.” “No, no,” I exclaimed. “I never thought it of him.” “Then why did you go to the Three Boars last night?” demanded Flora, “on your way home—after uncle’s body was found?” I was momentarily silenced. I had hoped that that visit of mine would remain unnoticed. “How did you know about that?” I countered. “I went there this morning,” said Flora. 
“I heard from the servants that Ralph was staying there——” I interrupted her. “You had no idea that he was in King’s Abbot?” “No. I was astounded. I couldn’t understand it. I went there and asked for him. They told me, what I suppose they told you last night, that he went out at about nine o’clock yesterday evening—and—and never came back.” Her eyes met mine defiantly, and as though answering something in my look, she burst out:— “Well, why shouldn’t he? He might have gone—anywhere. He may even have gone back to London.” “Leaving his luggage behind?” I asked gently. Flora stamped her foot. “I don’t care. There must be a simple explanation.” “And that’s why you want to go to Hercule Poirot? Isn’t it better to leave things as they are? The police don’t suspect Ralph in the least, remember. They’re working on quite another tack.” “But that’s just _it_,” cried the girl. “They _do_ suspect him. A man from Cranchester turned up this morning—Inspector Raglan, a horrid, weaselly little man. I found he had been to the Three Boars this morning before me. They told me all about his having been there, and the questions he had asked. He must think Ralph did it.” “That’s a change of mind from last night, if so,” I said slowly. “He doesn’t believe in Davis’s theory that it was Parker then?” “Parker indeed,” said my sister, and snorted. Flora came forward and laid her hand on my arm. “Oh! Dr. Sheppard, let us go at once to this M. Poirot. He will find out the truth.” “My dear Flora,” I said gently, laying my hand on hers. “Are you quite sure it is the truth we want?” She looked at me, nodding her head gravely. “You’re not sure,” she said. “I am. I know Ralph better than you do.” “Of course he didn’t do it,” said Caroline, who had been keeping silent with great difficulty. “Ralph may be extravagant, but he’s a dear boy, and has the nicest manners.” I wanted to tell Caroline that large numbers of murderers have had nice manners, but the presence of Flora restrained me. 
Since the girl was determined, I was forced to give in to her and we started at once, getting away before my sister was able to fire off any more pronouncements beginning with her favorite words, “Of course.” An old woman with an immense Breton cap opened the door of The Larches to us. M. Poirot was at home, it seemed. We were ushered into a little sitting-room arranged with formal precision, and there, after the lapse of a minute or so, my friend of yesterday came to us. “Monsieur le docteur,” he said, smiling. “Mademoiselle.” He bowed to Flora. “Perhaps,” I began, “you have heard of the tragedy which occurred last night.” His face grew grave. “But certainly I have heard. It is horrible. I offer mademoiselle all my sympathy. In what way can I serve you?” “Miss Ackroyd,” I said, “wants you to—to——” “To find the murderer,” said Flora in a clear voice. “I see,” said the little man. “But the police will do that, will they not?” “They might make a mistake,” said Flora. “They are on their way to make a mistake now, I think. Please, M. Poirot, won’t you help us? If—if it is a question of money——” Poirot held up his hand. “Not that, I beg of you, mademoiselle. Not that I do not care for money.” His eyes showed a momentary twinkle. “Money, it means much to me and always has done. No, if I go into this, you must understand one thing clearly. _I shall go through with it to the end._ The good dog, he does not leave the scent, remember! You may wish that, after all, you had left it to the local police.” “I want the truth,” said Flora, looking him straight in the eyes. “All the truth?” “All the truth.” “Then I accept,” said the little man quietly. “And I hope you will not regret those words. Now, tell me all the circumstances.” “Dr. Sheppard had better tell you,” said Flora. “He knows more than I do.” Thus enjoined, I plunged into a careful narrative, embodying all the facts I have previously set down. 
Poirot listened carefully, inserting a question here and there, but for the most part sitting in silence, his eyes on the ceiling. I brought my story to a close with the departure of the inspector and myself from Fernly Park the previous night. “And now,” said Flora, as I finished, “tell him all about Ralph.” I hesitated, but her imperious glance drove me on. “You went to this inn—this Three Boars—last night on your way home?” asked Poirot, as I brought my tale to a close. “Now exactly why was that?” I paused a moment to choose my words carefully. “I thought some one ought to inform the young man of his uncle’s death. It occurred to me after I had left Fernly that possibly no one but myself and Mr. Ackroyd were aware that he was staying in the village.” Poirot nodded. “Quite so. That was your only motive in going there, eh?” “That was my only motive,” I said stiffly. “It was not to—shall we say—reassure yourself about _ce jeune homme_?” “Reassure myself?” “I think, M. le docteur, that you know very well what I mean, though you pretend not to do so. I suggest that it would have been a relief to you if you had found that Captain Paton had been at home all the evening.” “Not at all,” I said sharply. The little detective shook his head at me gravely. “You have not the trust in me of Miss Flora,” he said. “But no matter. What we have to look at is this—Captain Paton is missing, under circumstances which call for an explanation. I will not hide from you that the matter looks grave. Still, it may admit of a perfectly simple explanation.” “That’s just what I keep saying,” cried Flora eagerly. Poirot touched no more upon that theme. Instead he suggested an immediate visit to the local police. He thought it better for Flora to return home, and for me to be the one to accompany him there and introduce him to the officer in charge of the case. We carried out this plan forthwith. We found Inspector Davis outside the police station looking very glum indeed. 
With him was Colonel Melrose, the Chief Constable, and another man whom, from Flora’s description of “weaselly,” I had no difficulty in recognizing as Inspector Raglan from Cranchester. I know Melrose fairly well, and I introduced Poirot to him and explained the situation. The chief constable was clearly vexed, and Inspector Raglan looked as black as thunder. Davis, however, seemed slightly exhilarated by the sight of his superior officer’s annoyance. “The case is going to be plain as a pikestaff,” said Raglan. “Not the least need for amateurs to come butting in. You’d think any fool would have seen the way things were last night, and then we shouldn’t have lost twelve hours.” He directed a vengeful glance at poor Davis, who received it with perfect stolidity. “Mr. Ackroyd’s family must, of course, do what they see fit,” said Colonel Melrose. “But we cannot have the official investigation hampered in any way. I know M. Poirot’s great reputation, of course,” he added courteously. “The police can’t advertise themselves, worse luck,” said Raglan. It was Poirot who saved the situation. “It is true that I have retired from the world,” he said. “I never intended to take up a case again. Above all things, I have a horror of publicity. I must beg, that in the case of my being able to contribute something to the solution of the mystery, my name may not be mentioned.” Inspector Raglan’s face lightened a little. “I’ve heard of some very remarkable successes of yours,” observed the colonel, thawing. “I have had much experience,” said Poirot quietly. “But most of my successes have been obtained by the aid of the police. I admire enormously your English police. If Inspector Raglan permits me to assist him, I shall be both honored and flattered.” The inspector’s countenance became still more gracious. Colonel Melrose drew me aside. “From all I hear, this little fellow’s done some really remarkable things,” he murmured. 
“We’re naturally anxious not to have to call in Scotland Yard. Raglan seems very sure of himself, but I’m not quite certain that I agree with him. You see, I—er—know the parties concerned better than he does. This fellow doesn’t seem out after kudos, does he? Would work in with us unobtrusively, eh?” “To the greater glory of Inspector Raglan,” I said solemnly. “Well, well,” said Colonel Melrose breezily in a louder voice, “we must put you wise to the latest developments, M. Poirot.” “I thank you,” said Poirot. “My friend, Dr. Sheppard, said something of the butler being suspected?” “That’s all bunkum,” said Raglan instantly. “These high-class servants get in such a funk that they act suspiciously for nothing at all.” “The fingerprints?” I hinted. “Nothing like Parker’s.” He gave a faint smile, and added: “And yours and Mr. Raymond’s don’t fit either, doctor.” “What about those of Captain Ralph Paton?” asked Poirot quietly. I felt a secret admiration for the way he took the bull by the horns. I saw a look of respect creep into the inspector’s eye. “I see you don’t let the grass grow under your feet, Mr. Poirot. It will be a pleasure to work with you, I’m sure. We’re going to take that young gentleman’s fingerprints as soon as we can lay hands upon him.” “I can’t help thinking you’re mistaken, inspector,” said Colonel Melrose warmly. “I’ve known Ralph Paton from a boy upward. He’d never stoop to murder.” “Maybe not,” said the inspector tonelessly. “What have you got against him?” I asked. “Went out just on nine o’clock last night. Was seen in neighborhood of Fernly Park somewhere about nine-thirty. Not been seen since. Believed to be in serious money difficulties. I’ve got a pair of his shoes here—shoes with rubber studs in them. He had two pairs, almost exactly alike. I’m going up now to compare them with those footmarks. The constable is up there seeing that no one tampers with them.” “We’ll go at once,” said Colonel Melrose. “You and M. 
Poirot will accompany us, will you not?” We assented, and all drove up in the colonel’s car. The inspector was anxious to get at once to the footmarks, and asked to be put down at the lodge. About half-way up the drive, on the right, a path branched off which led round to the terrace and the window of Ackroyd’s study. “Would you like to go with the inspector, M. Poirot?” asked the chief constable, “or would you prefer to examine the study?” Poirot chose the latter alternative. Parker opened the door to us. His manner was smug and deferential, and he seemed to have recovered from his panic of the night before. Colonel Melrose took a key from his pocket, and unlocking the door which led into the lobby, he ushered us through into the study. “Except for the removal of the body, M. Poirot, this room is exactly as it was last night.” “And the body was found—where?” As precisely as possible, I described Ackroyd’s position. The arm-chair still stood in front of the fire. Poirot went and sat down in it. “The blue letter you speak of, where was it when you left the room?” “Mr. Ackroyd had laid it down on this little table at his right hand.” Poirot nodded. “Except for that, everything was in its place?” “Yes, I think so.” “Colonel Melrose, would you be so extremely obliging as to sit down in this chair a minute. I thank you. Now, M. le docteur, will you kindly indicate to me the exact position of the dagger?” I did so, whilst the little man stood in the doorway. “The hilt of the dagger was plainly visible from the door then. Both you and Parker could see it at once?” “Yes.” Poirot went next to the window. “The electric light was on, of course, when you discovered the body?” he asked over his shoulder. I assented, and joined him where he was studying the marks on the window-sill. “The rubber studs are the same pattern as those in Captain Paton’s shoes,” he said quietly. Then he came back once more to the middle of the room. 
His eye traveled round, searching everything in the room with a quick, trained glance. “Are you a man of good observation, Dr. Sheppard?” he asked at last. “I think so,” I said, surprised. “There was a fire in the grate, I see. When you broke the door down and found Mr. Ackroyd dead, how was the fire? Was it low?” I gave a vexed laugh. “I—I really can’t say. I didn’t notice. Perhaps Mr. Raymond or Major Blunt——” The little man opposite me shook his head with a faint smile. “One must always proceed with method. I made an error of judgment in asking you that question. To each man his own knowledge. You could tell me the details of the patient’s appearance—nothing there would escape you. If I wanted information about the papers on that desk, Mr. Raymond would have noticed anything there was to see. To find out about the fire, I must ask the man whose business it is to observe such things. You permit——” He moved swiftly to the fireplace and rang the bell. After a lapse of a minute or two Parker appeared. “The bell rang, sir,” he said hesitatingly. “Come in, Parker,” said Colonel Melrose. “This gentleman wants to ask you something.” Parker transferred a respectful attention to Poirot. “Parker,” said the little man, “when you broke down the door with Dr. Sheppard last night, and found your master dead, what was the state of the fire?” Parker replied without a pause. “It had burned very low, sir. It was almost out.” “Ah!” said Poirot. The exclamation sounded almost triumphant. He went on:— “Look round you, my good Parker. Is this room exactly as it was then?” The butler’s eye swept round. It came to rest on the windows. “The curtains were drawn, sir, and the electric light was on.” Poirot nodded approval. “Anything else?” “Yes, sir, this chair was drawn out a little more.” He indicated a big grandfather chair to the left of the door between it and the window. I append a plan of the room with the chair in question marked with an X. “Just show me,” said Poirot. 
[Plan of the study, with the grandfather chair in question marked X between the door and the window; also shown: the chair in which Ackroyd was found before the fireplace, the chair in which Sheppard sat, the small table, the desk and chair, and the fireplace.]

The butler drew the chair in question out a good two feet from the wall, turning it so that the seat faced the door. “_Voilà ce qui est curieux_,” murmured Poirot. “No one would want to sit in a chair in such a position, I fancy. Now who pushed it back into place again, I wonder? Did you, my friend?” “No, sir,” said Parker. “I was too upset with seeing the master and all.” Poirot looked across at me. “Did you, doctor?” I shook my head. “It was back in position when I arrived with the police, sir,” put in Parker. “I’m sure of that.” “Curious,” said Poirot again. “Raymond or Blunt must have pushed it back,” I suggested. “Surely it isn’t important?” “It is completely unimportant,” said Poirot. “That is why it is so interesting,” he added softly. “Excuse me a minute,” said Colonel Melrose. He left the room with Parker. “Do you think Parker is speaking the truth?” I asked. “About the chair, yes. Otherwise I do not know. You will find, M. le docteur, if you have much to do with cases of this kind, that they all resemble each other in one thing.” “What is that?” I asked curiously. “Every one concerned in them has something to hide.” “Have I?” I asked, smiling. Poirot looked at me attentively. “I think you have,” he said quietly. “But——” “Have you told me everything known to you about this young man Paton?” He smiled as I grew red. “Oh! do not fear. I will not press you. I shall learn it in good time.” “I wish you’d tell me something of your methods,” I said hastily, to cover my confusion. “The point about the fire, for instance?” “Oh! that was very simple. You leave Mr.
Ackroyd at—ten minutes to nine, was it not?” “Yes, exactly, I should say.” “The window is then closed and bolted and the door unlocked. At a quarter past ten when the body is discovered, the door is locked and the window is open. Who opened it? Clearly only Mr. Ackroyd himself could have done so, and for one of two reasons. Either because the room became unbearably hot (but since the fire was nearly out and there was a sharp drop in temperature last night, that cannot be the reason), or because he admitted some one that way. And if he admitted some one that way, it must have been some one well known to him, since he had previously shown himself uneasy on the subject of that same window.” “It sounds very simple,” I said. “Everything is simple, if you arrange the facts methodically. We are concerned now with the personality of the person who was with him at nine-thirty last night. Everything goes to show that that was the individual admitted by the window, and though Mr. Ackroyd was seen alive later by Miss Flora, we cannot approach a solution of the mystery until we know who that visitor was. The window may have been left open after his departure and so afforded entrance to the murderer, or the same person may have returned a second time. Ah! here is the colonel who returns.” Colonel Melrose entered with an animated manner. “That telephone call has been traced at last,” he said. “It did not come from here. It was put through to Dr. Sheppard at 10.15 last night from a public call office at King’s Abbot station. And at 10.23 the night mail leaves for Liverpool.”

INSPECTOR RAGLAN IS CONFIDENT

We looked at each other. “You’ll have inquiries made at the station, of course?” I said. “Naturally, but I’m not over sanguine as to the result. You know what that station is like.” I did. King’s Abbot is a mere village, but its station happens to be an important junction. Most of the big expresses stop there, and trains are shunted, re-sorted, and made up.
It has two or three public telephone boxes. At that time of night three local trains come in close upon each other, to catch the connection with the express for the north which comes in at 10.19 and leaves at 10.23. The whole place is in a bustle, and the chances of one particular person being noticed telephoning or getting into the express are very small indeed. “But why telephone at all?” demanded Melrose. “That is what I find so extraordinary. There seems no rhyme or reason in the thing.” Poirot carefully straightened a china ornament on one of the bookcases. “Be sure there was a reason,” he said over his shoulder. “But what reason could it be?” “When we know that, we shall know everything. This case is very curious and very interesting.” There was something almost indescribable in the way he said those last words. I felt that he was looking at the case from some peculiar angle of his own, and what he saw I could not tell. He went to the window and stood there, looking out. “You say it was nine o’clock, Dr. Sheppard, when you met this stranger outside the gate?” He asked the question without turning round. “Yes,” I replied. “I heard the church clock chime the hour.” “How long would it take him to reach the house—to reach this window, for instance?” “Five minutes at the outside. Two or three minutes only if he took the path at the right of the drive and came straight here.” “But to do that he would have to know the way. How can I explain myself?—it would mean that he had been here before—that he knew his surroundings.” “That is true,” replied Colonel Melrose. “We could find out, doubtless, if Mr. Ackroyd had received any strangers during the past week?” “Young Raymond could tell us that,” I said. “Or Parker,” suggested Colonel Melrose. “_Ou tous les deux_,” suggested Poirot, smiling. Colonel Melrose went in search of Raymond, and I rang the bell once more for Parker. 
Colonel Melrose returned almost immediately, accompanied by the young secretary, whom he introduced to Poirot. Geoffrey Raymond was fresh and debonair as ever. He seemed surprised and delighted to make Poirot’s acquaintance. “No idea you’d been living among us incognito, M. Poirot,” he said. “It will be a great privilege to watch you at work——Hallo, what’s this?” Poirot had been standing just to the left of the door. Now he moved aside suddenly, and I saw that while my back was turned he must have swiftly drawn out the arm-chair till it stood in the position Parker had indicated. “Want me to sit in the chair whilst you take a blood test?” asked Raymond good-humoredly. “What’s the idea?” “M. Raymond, this chair was pulled out—so—last night when Mr. Ackroyd was found killed. Some one moved it back again into place. Did you do so?” The secretary’s reply came without a second’s hesitation. “No, indeed I didn’t. I don’t even remember that it was in that position, but it must have been if you say so. Anyway, somebody else must have moved it back to its proper place. Have they destroyed a clew in doing so? Too bad!” “It is of no consequence,” said the detective. “Of no consequence whatever. What I really want to ask you is this, M. Raymond: Did any stranger come to see Mr. Ackroyd during this past week?” The secretary reflected for a minute or two, knitting his brows, and during the pause Parker appeared in answer to the bell. “No,” said Raymond at last. “I can’t remember any one. Can you, Parker?” “I beg your pardon, sir?” “Any stranger coming to see Mr. Ackroyd this week?” The butler reflected for a minute or two. “There was the young man who came on Wednesday, sir,” he said at last. “From Curtis and Troute, I understood he was.” Raymond moved this aside with an impatient hand. “Oh! yes, I remember, but that is not the kind of stranger this gentleman means.” He turned to Poirot. “Mr. Ackroyd had some idea of purchasing a dictaphone,” he explained. 
“It would have enabled us to get through a lot more work in a limited time. The firm in question sent down their representative, but nothing came of it. Mr. Ackroyd did not make up his mind to purchase.” Poirot turned to the butler. “Can you describe this young man to me, my good Parker?” “He was fair-haired, sir, and short. Very neatly dressed in a blue serge suit. A very presentable young man, sir, for his station in life.” Poirot turned to me. “The man you met outside the gate, doctor, was tall, was he not?” “Yes,” I said. “Somewhere about six feet, I should say.” “There is nothing in that, then,” declared the Belgian. “I thank you, Parker.” The butler spoke to Raymond. “Mr. Hammond has just arrived, sir,” he said. “He is anxious to know if he can be of any service, and he would be glad to have a word with you.” “I’ll come at once,” said the young man. He hurried out. Poirot looked inquiringly at the chief constable. “The family solicitor, M. Poirot,” said the latter. “It is a busy time for this young M. Raymond,” murmured M. Poirot. “He has the air efficient, that one.” “I believe Mr. Ackroyd considered him a most able secretary.” “He has been here—how long?” “Just on two years, I fancy.” “His duties he fulfills punctiliously. Of that I am sure. In what manner does he amuse himself? Does he go in for _le sport_?” “Private secretaries haven’t much time for that sort of thing,” said Colonel Melrose, smiling. “Raymond plays golf, I believe. And tennis in the summer time.” “He does not attend the courses—I should say the running of the horses?” “Race meetings? No, I don’t think he’s interested in racing.” Poirot nodded and seemed to lose interest. He glanced slowly round the study. “I have seen, I think, all that there is to be seen here.” I, too, looked round. “If those walls could speak,” I murmured. Poirot shook his head. “A tongue is not enough,” he said. “They would have to have also eyes and ears. 
But do not be too sure that these dead things”—he touched the top of the bookcase as he spoke—“are always dumb. To me they speak sometimes—chairs, tables—they have their message!” He turned away towards the door. “What message?” I cried. “What have they said to you to-day?” He looked over his shoulder and raised one eyebrow quizzically. “An opened window,” he said. “A locked door. A chair that apparently moved itself. To all three I say, ‘Why?’ and I find no answer.” He shook his head, puffed out his chest, and stood blinking at us. He looked ridiculously full of his own importance. It crossed my mind to wonder whether he was really any good as a detective. Had his big reputation been built up on a series of lucky chances? I think the same thought must have occurred to Colonel Melrose, for he frowned. “Anything more you want to see, M. Poirot?” he inquired brusquely. “You would perhaps be so kind as to show me the silver table from which the weapon was taken? After that, I will trespass on your kindness no longer.” We went to the drawing-room, but on the way the constable waylaid the colonel, and after a muttered conversation the latter excused himself and left us together. I showed Poirot the silver table, and after raising the lid once or twice and letting it fall, he pushed open the window and stepped out on the terrace. I followed him. Inspector Raglan had just turned the corner of the house, and was coming towards us. His face looked grim and satisfied. “So there you are, M. Poirot,” he said. “Well, this isn’t going to be much of a case. I’m sorry, too. A nice enough young fellow gone wrong.” Poirot’s face fell, and he spoke very mildly. “I’m afraid I shall not be able to be of much aid to you, then?” “Next time, perhaps,” said the inspector soothingly. “Though we don’t have murders every day in this quiet little corner of the world.” Poirot’s gaze took on an admiring quality. “You have been of a marvelous promptness,” he observed. 
“How exactly did you go to work, if I may ask?” “Certainly,” said the inspector. “To begin with—method. That’s what I always say—method!” “Ah!” cried the other. “That, too, is my watchword. Method, order, and the little gray cells.” “The cells?” said the inspector, staring. “The little gray cells of the brain,” explained the Belgian. “Oh, of course; well, we all use them, I suppose.” “In a greater or lesser degree,” murmured Poirot. “And there are, too, differences in quality. Then there is the psychology of a crime. One must study that.” “Ah!” said the inspector, “you’ve been bitten with all this psychoanalysis stuff? Now, I’m a plain man——” “Mrs. Raglan would not agree, I am sure, to that,” said Poirot, making him a little bow. Inspector Raglan, a little taken aback, bowed. “You don’t understand,” he said, grinning broadly. “Lord, what a lot of difference language makes. I’m telling you how I set to work. First of all, method. Mr. Ackroyd was last seen alive at a quarter to ten by his niece, Miss Flora Ackroyd. That’s fact number one, isn’t it?” “If you say so.” “Well, it is. At half-past ten, the doctor here says that Mr. Ackroyd has been dead at least half an hour. You stick to that, doctor?” “Certainly,” I said. “Half an hour or longer.” “Very good. That gives us exactly a quarter of an hour in which the crime must have been committed. I make a list of every one in the house, and work through it, setting down opposite their names where they were and what they were doing between the hour of 9.45 and 10 p.m.” He handed a sheet of paper to Poirot. I read it over his shoulder. It ran as follows, written in a neat script:— _Major Blunt.—In billiard room with Mr. Raymond. (Latter confirms.)_ _Mr. Raymond.—Billiard room. (See above.)_ _Mrs. Ackroyd.—9.45 watching billiard match. Went up to bed 9.55. (Raymond and Blunt watched her up staircase.)_ _Miss Ackroyd.—Went straight from her uncle’s room upstairs. 
(Confirmed by Parker, also housemaid, Elsie Dale.)_ _Servants_:— _Parker.—Went straight to butler’s pantry. (Confirmed by housekeeper, Miss Russell, who came down to speak to him about something at 9.47, and remained at least ten minutes.)_ _Miss Russell.—As above. Spoke to housemaid, Elsie Dale, upstairs at 9.45._ _Ursula Bourne (parlormaid).—In her own room until 9.55. Then in Servants’ Hall._ _Mrs. Cooper (cook).—In Servants’ Hall._ _Gladys Jones (second housemaid).—In Servants’ Hall._ _Elsie Dale.—Upstairs in bedroom. Seen there by Miss Russell and Miss Flora Ackroyd._ _Mary Thripp (kitchenmaid).—Servants’ Hall._ “The cook has been here seven years, the parlormaid eighteen months, and Parker just over a year. The others are new. Except for something fishy about Parker, they all seem quite all right.” “A very complete list,” said Poirot, handing it back to him. “I am quite sure that Parker did not do the murder,” he added gravely. “So is my sister,” I struck in. “And she’s usually right.” Nobody paid any attention to my interpolation. “That disposes pretty effectually of the household,” continued the inspector. “Now we come to a very grave point. The woman at the lodge—Mary Black—was pulling the curtains last night when she saw Ralph Paton turn in at the gate and go up towards the house.” “She is sure of that?” I asked sharply. “Quite sure. She knows him well by sight. He went past very quickly and turned off by the path to the right, which is a short cut to the terrace.” “And what time was that?” asked Poirot, who had sat with an immovable face. “Exactly twenty-five minutes past nine,” said the inspector gravely. There was a silence. Then the inspector spoke again. “It’s all clear enough. It fits in without a flaw. At twenty-five minutes past nine, Captain Paton is seen passing the lodge; at nine-thirty or thereabouts, Mr. Geoffrey Raymond hears some one in here asking for money and Mr. Ackroyd refusing. What happens next? 
Captain Paton leaves the same way—through the window. He walks along the terrace, angry and baffled. He comes to the open drawing-room window. Say it’s now a quarter to ten. Miss Flora Ackroyd is saying good-night to her uncle. Major Blunt, Mr. Raymond, and Mrs. Ackroyd are in the billiard room. The drawing-room is empty. He steals in, takes the dagger from the silver table, and returns to the study window. He slips off his shoes, climbs in, and—well, I don’t need to go into details. Then he slips out again and goes off. Hadn’t the nerve to go back to the inn. He makes for the station, rings up from there——” “Why?” said Poirot softly. I jumped at the interruption. The little man was leaning forward. His eyes shone with a queer green light. For a moment Inspector Raglan was taken aback by the question. “It’s difficult to say exactly why he did that,” he said at last. “But murderers do funny things. You’d know that if you were in the police force. The cleverest of them make stupid mistakes sometimes. But come along and I’ll show you those footprints.” We followed him round the corner of the terrace to the study window. At a word from Raglan a police constable produced the shoes which had been obtained from the local inn. The inspector laid them over the marks. “They’re the same,” he said confidently. “That is to say, they’re not the same pair that actually made these prints. He went away in those. This is a pair just like them, but older—see how the studs are worn down.” “Surely a great many people wear shoes with rubber studs in them?” asked Poirot. “That’s so, of course,” said the inspector. “I shouldn’t put so much stress on the footmarks if it wasn’t for everything else.” “A very foolish young man, Captain Ralph Paton,” said Poirot thoughtfully. “To leave so much evidence of his presence.” “Ah! well,” said the inspector, “it was a dry, fine night, you know. He left no prints on the terrace or on the graveled path. 
But, unluckily for him, a spring must have welled up just lately at the end of the path from the drive. See here.” A small graveled path joined the terrace a few feet away. In one spot, a few yards from its termination, the ground was wet and boggy. Crossing this wet place there were again the marks of footsteps, and amongst them the shoes with rubber studs. Poirot followed the path on a little way, the inspector by his side. “You noticed the women’s footprints?” he said suddenly. The inspector laughed. “Naturally. But several different women have walked this way—and men as well. It’s a regular short cut to the house, you see. It would be impossible to sort out all the footsteps. After all, it’s the ones on the window-sill that are really important.” Poirot nodded. “It’s no good going farther,” said the inspector, as we came in view of the drive. “It’s all graveled again here, and hard as it can be.” Again Poirot nodded, but his eyes were fixed on a small garden house—a kind of superior summer-house. It was a little to the left of the path ahead of us, and a graveled walk ran up to it. Poirot lingered about until the inspector had gone back towards the house. Then he looked at me. “You must have indeed been sent from the good God to replace my friend Hastings,” he said, with a twinkle. “I observe that you do not quit my side. How say you, Dr. Sheppard, shall we investigate that summer-house? It interests me.” He went up to the door and opened it. Inside, the place was almost dark. There were one or two rustic seats, a croquet set, and some folded deck-chairs. I was startled to observe my new friend. He had dropped to his hands and knees and was crawling about the floor. Every now and then he shook his head as though not satisfied. Finally, he sat back on his heels. “Nothing,” he murmured. “Well, perhaps it was not to be expected. But it would have meant so much——” He broke off, stiffening all over. Then he stretched out his hand to one of the rustic chairs. 
He detached something from one side of it. “What is it?” I cried. “What have you found?” He smiled, unclosing his hand so that I should see what lay in the palm of it. A scrap of stiff white cambric. I took it from him, looked at it curiously, and then handed it back. “What do you make of it, eh, my friend?” he asked, eyeing me keenly. “A scrap torn from a handkerchief,” I suggested, shrugging my shoulders. He made another dart and picked up a small quill—a goose quill by the look of it. “And that?” he cried triumphantly. “What do you make of that?” I only stared. He slipped the quill into his pocket, and looked again at the scrap of white stuff. “A fragment of a handkerchief?” he mused. “Perhaps you are right. But remember this—_a good laundry does not starch a handkerchief_.” He nodded at me triumphantly, then he put away the scrap carefully in his pocket-book.

THE GOLDFISH POND

We walked back to the house together. There was no sign of the inspector. Poirot paused on the terrace and stood with his back to the house, slowly turning his head from side to side. “_Une belle propriété_,” he said at last appreciatively. “Who inherits it?” His words gave me almost a shock. It is an odd thing, but until that moment the question of inheritance had never come into my head. Poirot watched me keenly. “It is a new idea to you, that,” he said at last. “You had not thought of it before—eh?” “No,” I said truthfully. “I wish I had.” He looked at me again curiously. “I wonder just what you mean by that,” he said thoughtfully. “Ah! no,” as I was about to speak. “_Inutile!_ You would not tell me your real thought.” “Every one has something to hide,” I quoted, smiling. “Exactly.” “You still believe that?” “More than ever, my friend. But it is not easy to hide things from Hercule Poirot. He has a knack of finding out.” He descended the steps of the Dutch garden as he spoke. “Let us walk a little,” he said over his shoulder. “The air is pleasant to-day.” I followed him.
He led me down a path to the left enclosed in yew hedges. A walk led down the middle, bordered each side with formal flower beds, and at the end was a round paved recess with a seat and a pond of goldfish. Instead of pursuing the path to the end, Poirot took another which wound up the side of a wooded slope. In one spot the trees had been cleared away, and a seat had been put. Sitting there one had a splendid view over the countryside, and one looked right down on the paved recess and the goldfish pond. “England is very beautiful,” said Poirot, his eyes straying over the prospect. Then he smiled. “And so are English girls,” he said in a lower tone. “Hush, my friend, and look at the pretty picture below us.” It was then that I saw Flora. She was moving along the path we had just left and she was humming a little snatch of song. Her step was more dancing than walking, and in spite of her black dress, there was nothing but joy in her whole attitude. She gave a sudden pirouette on her toes, and her black draperies swung out. At the same time she flung her head back and laughed outright. As she did so a man stepped out from the trees. It was Hector Blunt. The girl started. Her expression changed a little. “How you startled me—I didn’t see you.” Blunt said nothing, but stood looking at her for a minute or two in silence. “What I like about you,” said Flora, with a touch of malice, “is your cheery conversation.” I fancy that at that Blunt reddened under his tan. His voice, when he spoke, sounded different—it had a curious sort of humility in it. “Never was much of a fellow for talking. Not even when I was young.” “That was a very long time ago, I suppose,” said Flora gravely. I caught the undercurrent of laughter in her voice, but I don’t think Blunt did. “Yes,” he said simply, “it was.” “How does it feel to be Methuselah?” asked Flora. This time the laughter was more apparent, but Blunt was following out an idea of his own. 
“Remember the Johnny who sold his soul to the devil? In return for being made young again? There’s an opera about it.” “Faust, you mean?” “That’s the beggar. Rum story. Some of us would do it if we could.” “Any one would think you were creaking at the joints to hear you talk,” cried Flora, half vexed, half amused. Blunt said nothing for a minute or two. Then he looked away from Flora into the middle distance and observed to an adjacent tree trunk that it was about time he got back to Africa. “Are you going on another expedition—shooting things?” “Expect so. Usually do, you know—shoot things, I mean.” “You shot that head in the hall, didn’t you?” Blunt nodded. Then he jerked out, going rather red, as he did so:— “Care for some decent skins any time? If so, I could get ’em for you.” “Oh! please do,” cried Flora. “Will you really? You won’t forget?” “I shan’t forget,” said Hector Blunt. He added, in a sudden burst of communicativeness:— “Time I went. I’m no good in this sort of life. Haven’t got the manners for it. I’m a rough fellow, no use in society. Never remember the things one’s expected to say. Yes, time I went.” “But you’re not going at once,” cried Flora. “Not—not while we’re in all this trouble. Oh! please. If you go——” She turned away a little. “You want me to stay?” asked Blunt. He spoke deliberately but quite simply. “We all——” “I meant you personally,” said Blunt, with directness. Flora turned slowly back again and met his eyes. “I want you to stay,” she said, “if—if that makes any difference.” “It makes all the difference,” said Blunt. There was a moment’s silence. They sat down on the stone seat by the goldfish pond. It seemed as though neither of them knew quite what to say next. “It—it’s such a lovely morning,” said Flora at last. “You know, I can’t help feeling happy, in spite—in spite of everything. That’s awful, I suppose?” “Quite natural,” said Blunt. “Never saw your uncle until two years ago, did you? Can’t be expected to grieve very much. 
Much better to have no humbug about it.” “There’s something awfully consoling about you,” said Flora. “You make things so simple.” “Things are simple as a rule,” said the big game hunter. “Not always,” said Flora. Her voice had lowered itself, and I saw Blunt turn and look at her, bringing his eyes back from (apparently) the coast of Africa to do so. He evidently put his own construction on her change of tone, for he said, after a minute or two, in rather an abrupt manner:— “I say, you know, you mustn’t worry. About that young chap, I mean. Inspector’s an ass. Everybody knows—utterly absurd to think he could have done it. Man from outside. Burglar chap. That’s the only possible solution.” Flora turned to look at him. “You really think so?” “Don’t you?” said Blunt quickly. “I—oh, yes, of course.” Another silence, and then Flora burst out:— “I’m—I’ll tell you why I felt so happy this morning. However heartless you think me, I’d rather tell you. It’s because the lawyer has been—Mr. Hammond. He told us about the will. Uncle Roger has left me twenty thousand pounds. Think of it—twenty thousand beautiful pounds.” Blunt looked surprised. “Does it mean so much to you?” “Mean much to me? Why, it’s everything. Freedom—life—no more scheming and scraping and lying——” “Lying?” said Blunt, sharply interrupting. Flora seemed taken aback for a minute. “You know what I mean,” she said uncertainly. “Pretending to be thankful for all the nasty castoff things rich relations give you. Last year’s coats and skirts and hats.” “Don’t know much about ladies’ clothes; should have said you were always very well turned out.” “It’s cost me something, though,” said Flora in a low voice. “Don’t let’s talk of horrid things. I’m so happy. I’m free. Free to do what I like. Free not to——” She stopped suddenly. “Not to what?” asked Blunt quickly. “I forget now. Nothing important.” Blunt had a stick in his hand, and he thrust it into the pond, poking at something. 
“What are you doing, Major Blunt?” “There’s something bright down there. Wondered what it was—looks like a gold brooch. Now I’ve stirred up the mud and it’s gone.” “Perhaps it’s a crown,” suggested Flora. “Like the one Mélisande saw in the water.” “Mélisande,” said Blunt reflectively—“she’s in an opera, isn’t she?” “Yes, you seem to know a lot about operas.” “People take me sometimes,” said Blunt sadly. “Funny idea of pleasure—worse racket than the natives make with their tom-toms.” Flora laughed. “I remember Mélisande,” continued Blunt, “married an old chap old enough to be her father.” He threw a small piece of flint into the goldfish pond. Then, with a change of manner, he turned to Flora. “Miss Ackroyd, can I do anything? About Paton, I mean. I know how dreadfully anxious you must be.” “Thank you,” said Flora in a cold voice. “There is really nothing to be done. Ralph will be all right. I’ve got hold of the most wonderful detective in the world, and he’s going to find out all about it.” For some time I had felt uneasy as to our position. We were not exactly eavesdropping, since the two in the garden below had only to lift their heads to see us. Nevertheless, I should have drawn attention to our presence before now, had not my companion put a warning pressure on my arm. Clearly he wished me to remain silent. But now he rose briskly to his feet, clearing his throat. “I demand pardon,” he cried. “I cannot allow mademoiselle thus extravagantly to compliment me, and not draw attention to my presence. They say the listener hears no good of himself, but that is not the case this time. To spare my blushes, I must join you and apologize.” He hurried down the path with me close behind him, and joined the others by the pond. “This is M. Hercule Poirot,” said Flora. “I expect you’ve heard of him.” Poirot bowed. “I know Major Blunt by reputation,” he said politely. “I am glad to have encountered you, monsieur. 
I am in need of some information that you can give me.” Blunt looked at him inquiringly. “When did you last see M. Ackroyd alive?” “At dinner.” “And you neither saw nor heard anything of him after that?” “Didn’t see him. Heard his voice.” “How was that?” “I strolled out on the terrace——” “Pardon me, what time was this?” “About half-past nine. I was walking up and down smoking in front of the drawing-room window. I heard Ackroyd talking in his study——” Poirot stooped and removed a microscopic weed. “Surely you couldn’t hear voices in the study from that part of the terrace,” he murmured. He was not looking at Blunt, but I was, and to my intense surprise, I saw the latter flush. “Went as far as the corner,” he explained unwillingly. “Ah! indeed?” said Poirot. In the mildest manner he conveyed an impression that more was wanted. “Thought I saw—a woman disappearing into the bushes. Just a gleam of white, you know. Must have been mistaken. It was while I was standing at the corner of the terrace that I heard Ackroyd’s voice speaking to that secretary of his.” “Speaking to Mr. Geoffrey Raymond?” “Yes—that’s what I supposed at the time. Seems I was wrong.” “Mr. Ackroyd didn’t address him by name?” “Oh, no.” “Then, if I may ask, why did you think——?” Blunt explained laboriously. “Took it for granted that it _would_ be Raymond, because he had said just before I came out that he was taking some papers to Ackroyd. Never thought of it being anybody else.” “Can you remember what the words you heard were?” “Afraid I can’t. Something quite ordinary and unimportant. Only caught a scrap of it. I was thinking of something else at the time.” “It is of no importance,” murmured Poirot. “Did you move a chair back against the wall when you went into the study after the body was discovered?” “Chair? No—why should I?” Poirot shrugged his shoulders but did not answer. He turned to Flora. “There is one thing I should like to know from you, mademoiselle. 
When you were examining the things in the silver table with Dr. Sheppard, was the dagger in its place, or was it not?” Flora’s chin shot up. “Inspector Raglan has been asking me that,” she said resentfully. “I’ve told him, and I’ll tell you. I’m perfectly certain the dagger was _not_ there. He thinks it was and that Ralph sneaked it later in the evening. And—and he doesn’t believe me. He thinks I’m saying it to—to shield Ralph.” “And aren’t you?” I asked gravely. Flora stamped her foot. “You, too, Dr. Sheppard! Oh! it’s too bad.” Poirot tactfully made a diversion. “It is true what I heard you say, Major Blunt. There is something that glitters in this pond. Let us see if I can reach it.” He knelt down by the pond, baring his arm to the elbow, and lowered it in very slowly, so as not to disturb the bottom of the pond. But in spite of all his precautions the mud eddied and swirled, and he was forced to draw his arm out again empty-handed. He gazed ruefully at the mud upon his arm. I offered him my handkerchief, which he accepted with fervent protestations of thanks. Blunt looked at his watch. “Nearly lunch time,” he said. “We’d better be getting back to the house.” “You will lunch with us, M. Poirot?” asked Flora. “I should like you to meet my mother. She is—very fond of Ralph.” The little man bowed. “I shall be delighted, mademoiselle.” “And you will stay, too, won’t you, Dr. Sheppard?” I hesitated. “Oh, do!” I wanted to, so I accepted the invitation without further ceremony. We set out towards the house, Flora and Blunt walking ahead. “What hair,” said Poirot to me in a low tone, nodding towards Flora. “The real gold! They will make a pretty couple. She and the dark, handsome Captain Paton. Will they not?” I looked at him inquiringly, but he began to fuss about a few microscopic drops of water on his coat sleeve. The man reminded me in some ways of a cat. His green eyes and his finicking habits. “And all for nothing, too,” I said sympathetically. 
“I wonder what it was in the pond?” “Would you like to see?” asked Poirot. I stared at him. He nodded. “My good friend,” he said gently and reproachfully, “Hercule Poirot does not run the risk of disarranging his costume without being sure of attaining his object. To do so would be ridiculous and absurd. I am never ridiculous.” “But you brought your hand out empty,” I objected. “There are times when it is necessary to have discretion. Do you tell your patients everything—everything, doctor? I think not. Nor do you tell your excellent sister everything either, is it not so? Before showing my empty hand, I dropped what it contained into my other hand. You shall see what that was.” He held out his left hand, palm open. On it lay a little circlet of gold. A woman’s wedding ring. I took it from him. “Look inside,” commanded Poirot. I did so. Inside was an inscription in fine writing:—

_From R., March 13th._

I looked at Poirot, but he was busy inspecting his appearance in a tiny pocket glass. He paid particular attention to his mustaches, and none at all to me. I saw that he did not intend to be communicative.

THE PARLORMAID

We found Mrs. Ackroyd in the hall. With her was a small dried-up little man, with an aggressive chin and sharp gray eyes, and “lawyer” written all over him. “Mr. Hammond is staying to lunch with us,” said Mrs. Ackroyd. “You know Major Blunt, Mr. Hammond? And dear Dr. Sheppard—also a close friend of poor Roger’s. And, let me see——” She paused, surveying Hercule Poirot in some perplexity. “This is M. Poirot, mother,” said Flora. “I told you about him this morning.” “Oh! yes,” said Mrs. Ackroyd vaguely. “Of course, my dear, of course. He is to find Ralph, is he not?” “He is to find out who killed uncle,” said Flora. “Oh! my dear,” cried her mother. “Please! My poor nerves. I am a wreck this morning, a positive wreck. Such a dreadful thing to happen. I can’t help feeling that it must have been an accident of some kind.
Roger was so fond of handling queer curios. His hand must have slipped, or something.” This theory was received in polite silence. I saw Poirot edge up to the lawyer, and speak to him in a confidential undertone. They moved aside into the embrasure of the window. I joined them—then hesitated. “Perhaps I’m intruding,” I said. “Not at all,” cried Poirot heartily. “You and I, M. le docteur, we investigate this affair side by side. Without you I should be lost. I desire a little information from the good Mr. Hammond.” “You are acting on behalf of Captain Ralph Paton, I understand,” said the lawyer cautiously. Poirot shook his head. “Not so. I am acting in the interests of justice. Miss Ackroyd has asked me to investigate the death of her uncle.” Mr. Hammond seemed slightly taken aback. “I cannot seriously believe that Captain Paton can be concerned in this crime,” he said, “however strong the circumstantial evidence against him may be. The mere fact that he was hard pressed for money——” “Was he hard pressed for money?” interpolated Poirot quickly. The lawyer shrugged his shoulders. “It was a chronic condition with Ralph Paton,” he said dryly. “Money went through his hands like water. He was always applying to his stepfather.” “Had he done so of late? During the last year, for instance?” “I cannot say. Mr. Ackroyd did not mention the fact to me.” “I comprehend. Mr. Hammond, I take it that you are acquainted with the provisions of Mr. Ackroyd’s will?” “Certainly. That is my principal business here to-day.” “Then, seeing that I am acting for Miss Ackroyd, you will not object to telling me the terms of that will?” “They are quite simple. Shorn of legal phraseology, and after paying certain legacies and bequests——” “Such as——?” interrupted Poirot. Mr. Hammond seemed a little surprised. “A thousand pounds to his housekeeper, Miss Russell; fifty pounds to the cook, Emma Cooper; five hundred pounds to his secretary, Mr. Geoffrey Raymond. 
Then to various hospitals——” Poirot held up his hand. “Ah! the charitable bequests, they interest me not.” “Quite so. The income on ten thousand pounds’ worth of shares to be paid to Mrs. Cecil Ackroyd during her lifetime. Miss Flora Ackroyd inherits twenty thousand pounds outright. The residue—including this property, and the shares in Ackroyd and Son—to his adopted son, Ralph Paton.” “Mr. Ackroyd possessed a large fortune?” “A very large fortune. Captain Paton will be an exceedingly wealthy young man.” There was a silence. Poirot and the lawyer looked at each other. “Mr. Hammond,” came Mrs. Ackroyd’s voice plaintively from the fireplace. The lawyer answered the summons. Poirot took my arm and drew me right into the window. “Regard the irises,” he remarked in rather a loud voice. “Magnificent, are they not? A straight and pleasing effect.” At the same time I felt the pressure of his hand on my arm, and he added in a low tone:— “Do you really wish to aid me? To take part in this investigation?” “Yes, indeed,” I said eagerly. “There’s nothing I should like better. You don’t know what a dull old fogey’s life I lead. Never anything out of the ordinary.” “Good, we will be colleagues then. In a minute or two I fancy Major Blunt will join us. He is not happy with the good mamma. Now there are some things I want to know—but I do not wish to seem to want to know them. You comprehend? So it will be your part to ask the questions.” “What questions do you want me to ask?” I asked apprehensively. “I want you to introduce the name of Mrs. Ferrars.” “Yes?” “Speak of her in a natural fashion. Ask him if he was down here when her husband died. You understand the kind of thing I mean. And while he replies, watch his face without seeming to watch it. _C’est compris?_” There was no time for more, for at that minute, as Poirot had prophesied, Blunt left the others in his abrupt fashion and came over to us. I suggested strolling on the terrace, and he acquiesced. Poirot stayed behind. 
I stopped to examine a late rose. “How things change in the course of a day or so,” I observed. “I was up here last Wednesday, I remember, walking up and down this same terrace. Ackroyd was with me—full of spirits. And now—three days later—Ackroyd’s dead, poor fellow, Mrs. Ferrars’s dead—you knew her, didn’t you? But of course you did.” Blunt nodded his head. “Had you seen her since you’d been down this time?” “Went with Ackroyd to call. Last Tuesday, think it was. Fascinating woman—but something queer about her. Deep—one would never know what she was up to.” I looked into his steady gray eyes. Nothing there surely. I went on:— “I suppose you’d met her before.” “Last time I was here—she and her husband had just come here to live.” He paused a minute and then added: “Rum thing, she had changed a lot between then and now.” “How—changed?” I asked. “Looked ten years older.” “Were you down here when her husband died?” I asked, trying to make the question sound as casual as possible. “No. From all I heard it would be a good riddance. Uncharitable, perhaps, but the truth.” I agreed. “Ashley Ferrars was by no means a pattern husband,” I said cautiously. “Blackguard, I thought,” said Blunt. “No,” I said, “only a man with more money than was good for him.” “Oh! money! All the troubles in the world can be put down to money—or the lack of it.” “Which has been your particular trouble?” I asked. “I’ve enough for what I want. I’m one of the lucky ones.” “Indeed.” “I’m not too flush just now, as a matter of fact. Came into a legacy a year ago, and like a fool let myself be persuaded into putting it into some wild-cat scheme.” I sympathized, and narrated my own similar trouble. Then the gong pealed out, and we all went in to lunch. Poirot drew me back a little. “_Eh! bien?_” “He’s all right,” I said. “I’m sure of it.” “Nothing—disturbing?” “He had a legacy just a year ago,” I said. “But why not? Why shouldn’t he? 
I’ll swear the man is perfectly square and aboveboard.” “Without doubt, without doubt,” said Poirot soothingly. “Do not upset yourself.” He spoke as though to a fractious child. We all trooped into the dining-room. It seemed incredible that less than twenty-four hours had passed since I last sat at that table. Afterwards, Mrs. Ackroyd took me aside and sat down with me on a sofa. “I can’t help feeling a little hurt,” she murmured, producing a handkerchief of the kind obviously not meant to be cried into. “Hurt, I mean, by Roger’s lack of confidence in me. That twenty thousand pounds ought to have been left to _me_—not to Flora. A mother could be trusted to safeguard the interests of her child. A lack of trust, I call it.” “You forget, Mrs. Ackroyd,” I said, “Flora was Ackroyd’s own niece, a blood relation. It would have been different had you been his sister instead of his sister-in-law.” “As poor Cecil’s widow, I think my feelings ought to have been considered,” said the lady, touching her eye-lashes gingerly with the handkerchief. “But Roger was always most peculiar—not to say _mean_—about money matters. It has been a most difficult position for both Flora and myself. He did not even give the poor child an allowance. He would pay her bills, you know, and even that with a good deal of reluctance and asking what she wanted all those fal-lals for—so like a man—but—now I’ve forgotten what it was I was going to say! Oh, yes, not a penny we could call our own, you know. Flora resented it—yes, I must say she resented it—very strongly. Though devoted to her uncle, of course. But any girl would have resented it. Yes, I must say Roger had very strange ideas about money. He wouldn’t even buy new face towels, though I told him the old ones were in holes. And then,” proceeded Mrs. Ackroyd, with a sudden leap highly characteristic of her conversation, “to leave all that money—a thousand pounds—fancy, a thousand pounds!—to that woman.” “What woman?” “That Russell woman. 
Something very queer about her, and so I’ve always said. But Roger wouldn’t hear a word against her. Said she was a woman of great force of character, and that he admired and respected her. He was always going on about her rectitude and independence and moral worth. _I_ think there’s something fishy about her. She was certainly doing her best to marry Roger. But I soon put a stop to that. She’s always hated me. Naturally. _I_ saw through her.” I began to wonder if there was any chance of stemming Mrs. Ackroyd’s eloquence, and getting away. Mr. Hammond provided the necessary diversion by coming up to say good-by. I seized my chance and rose also. “About the inquest,” I said. “Where would you prefer it to be held? Here, or at the Three Boars?” Mrs. Ackroyd stared at me with a dropped jaw. “The inquest?” she asked, the picture of consternation. “But surely there won’t have to be an inquest?” Mr. Hammond gave a dry little cough and murmured, “Inevitable. Under the circumstances,” in two short little barks. “But surely Dr. Sheppard can arrange——” “There are limits to my powers of arrangement,” I said dryly. “If his death was an accident——” “He was murdered, Mrs. Ackroyd,” I said brutally. She gave a little cry. “No theory of accident will hold water for a minute.” Mrs. Ackroyd looked at me in distress. I had no patience with what I thought was her silly fear of unpleasantness. “If there’s an inquest, I—I shan’t have to answer questions and all that, shall I?” she asked. “I don’t know what will be necessary,” I answered. “I imagine Mr. Raymond will take the brunt of it off you. He knows all the circumstances, and can give formal evidence of identification.” The lawyer assented with a little bow. “I really don’t think there is anything to dread, Mrs. Ackroyd,” he said. “You will be spared all unpleasantness. Now, as to the question of money, have you all you need for the present? I mean,” he added, as she looked at him inquiringly, “ready money. Cash, you know.
If not, I can arrange to let you have whatever you require.” “That ought to be all right,” said Raymond, who was standing by. “Mr. Ackroyd cashed a cheque for a hundred pounds yesterday.” “A hundred pounds?” “Yes. For wages and other expenses due to-day. At the moment it is still intact.” “Where is this money? In his desk?” “No, he always kept his cash in his bedroom. In an old collar-box, to be accurate. Funny idea, wasn’t it?” “I think,” said the lawyer, “we ought to make sure the money is there before I leave.” “Certainly,” agreed the secretary. “I’ll take you up now.... Oh! I forgot. The door’s locked.” Inquiry from Parker elicited the information that Inspector Raglan was in the housekeeper’s room asking a few supplementary questions. A few minutes later the inspector joined the party in the hall, bringing the key with him. He unlocked the door and we passed into the lobby and up the small staircase. At the top of the stairs the door into Ackroyd’s bedroom stood open. Inside the room it was dark, the curtains were drawn, and the bed was turned down just as it had been last night. The inspector drew the curtains, letting in the sunlight, and Geoffrey Raymond went to the top drawer of a rosewood bureau. “He kept his money like that, in an unlocked drawer. Just fancy,” commented the inspector. The secretary flushed a little. “Mr. Ackroyd had perfect faith in the honesty of all the servants,” he said hotly. “Oh! quite so,” said the inspector hastily. Raymond opened the drawer, took out a round leather collar-box from the back of it, and opening it, drew out a thick wallet. “Here is the money,” he said, taking out a fat roll of notes. “You will find the hundred intact, I know, for Mr. Ackroyd put it in the collar-box in my presence last night when he was dressing for dinner, and of course it has not been touched since.” Mr. Hammond took the roll from him and counted it. He looked up sharply. “A hundred pounds, you said. 
But there is only sixty here.” Raymond stared at him. “Impossible,” he cried, springing forward. Taking the notes from the other’s hand, he counted them aloud. Mr. Hammond had been right. The total amounted to sixty pounds. “But—I can’t understand it,” cried the secretary, bewildered. Poirot asked a question. “You saw Mr. Ackroyd put this money away last night when he was dressing for dinner? You are sure he had not paid away any of it already?” “I’m sure he hadn’t. He even said, ‘I don’t want to take a hundred pounds down to dinner with me. Too bulgy.’” “Then the affair is very simple,” remarked Poirot. “Either he paid out that forty pounds sometime last evening, or else it has been stolen.” “That’s the matter in a nutshell,” agreed the inspector. He turned to Mrs. Ackroyd. “Which of the servants would come in here yesterday evening?” “I suppose the housemaid would turn down the bed.” “Who is she? What do you know about her?” “She’s not been here very long,” said Mrs. Ackroyd. “But she’s a nice ordinary country girl.” “I think we ought to clear this matter up,” said the inspector. “If Mr. Ackroyd paid that money away himself, it may have a bearing on the mystery of the crime. The other servants all right, as far as you know?” “Oh, I think so.” “Not missed anything before?” “No.” “None of them leaving, or anything like that?” “The parlormaid is leaving.” “When?” “She gave notice yesterday, I believe.” “To you?” “Oh, no. _I_ have nothing to do with the servants. Miss Russell attends to the household matters.” The inspector remained lost in thought for a minute or two. Then he nodded his head and remarked, “I think I’d better have a word with Miss Russell, and I’ll see the girl Dale as well.” Poirot and I accompanied him to the housekeeper’s room. Miss Russell received us with her usual sang-froid. Elsie Dale had been at Fernly five months. A nice girl, quick at her duties, and most respectable. Good references. 
The last girl in the world to take anything not belonging to her. What about the parlormaid? “She, too, was a most superior girl. Very quiet and ladylike. An excellent worker.” “Then why is she leaving?” asked the inspector. Miss Russell pursed up her lips. “It was none of my doing. I understand Mr. Ackroyd found fault with her yesterday afternoon. It was her duty to do the study, and she disarranged some of the papers on his desk, I believe. He was very annoyed about it, and she gave notice. At least, that is what I understood from her, but perhaps you’d like to see her yourselves?” The inspector assented. I had already noticed the girl when she was waiting on us at lunch. A tall girl, with a lot of brown hair rolled tightly away at the back of her neck, and very steady gray eyes. She came in answer to the housekeeper’s summons, and stood very straight with those same gray eyes fixed on us. “You are Ursula Bourne?” asked the inspector. “Yes, sir.” “I understand you are leaving?” “Yes, sir.” “Why is that?” “I disarranged some papers on Mr. Ackroyd’s desk. He was very angry about it, and I said I had better leave. He told me to go as soon as possible.” “Were you in Mr. Ackroyd’s bedroom at all last night? Tidying up or anything?” “No, sir. That is Elsie’s work. I never went near that part of the house.” “I must tell you, my girl, that a large sum of money is missing from Mr. Ackroyd’s room.” At last I saw her roused. A wave of color swept over her face. “I know nothing about any money. If you think I took it, and that that is why Mr. Ackroyd dismissed me, you are wrong.” “I’m not accusing you of taking it, my girl,” said the inspector. “Don’t flare up so.” The girl looked at him coldly. “You can search my things if you like,” she said disdainfully. “But you won’t find anything.” Poirot suddenly interposed. “It was yesterday afternoon that Mr. Ackroyd dismissed you—or you dismissed yourself, was it not?” he asked. The girl nodded. 
“How long did the interview last?” “The interview?” “Yes, the interview between you and Mr. Ackroyd in the study?” “I—I don’t know.” “Twenty minutes? Half an hour?” “Something like that.” “Not longer?” “Not longer than half an hour, certainly.” “Thank you, mademoiselle.” I looked curiously at him. He was rearranging a few objects on the table, setting them straight with precise fingers. His eyes were shining. “That’ll do,” said the inspector. Ursula Bourne disappeared. The inspector turned to Miss Russell. “How long has she been here? Have you got a copy of the reference you had with her?” Without answering the first question, Miss Russell moved to an adjacent bureau, opened one of the drawers, and took out a handful of letters clipped together with a patent fastener. She selected one and handed it to the inspector. “H’m,” said he. “Reads all right. Mrs. Richard Folliott, Marby Grange, Marby. Who’s this woman?” “Quite good county people,” said Miss Russell. “Well,” said the inspector, handing it back, “let’s have a look at the other one, Elsie Dale.” Elsie Dale was a big fair girl, with a pleasant but slightly stupid face. She answered our questions readily enough, and showed much distress and concern at the loss of the money. “I don’t think there’s anything wrong with her,” observed the inspector, after he had dismissed her. “What about Parker?” Miss Russell pursed her lips together and made no reply. “I’ve a feeling there’s something wrong about that man,” the inspector continued thoughtfully. “The trouble is that I don’t quite see when he got his opportunity. He’d be busy with his duties immediately after dinner, and he’s got a pretty good alibi all through the evening. I know, for I’ve been devoting particular attention to it. Well, thank you very much, Miss Russell. We’ll leave things as they are for the present. It’s highly probable Mr. Ackroyd paid that money away himself.” The housekeeper bade us a dry good-afternoon, and we took our leave. 
I left the house with Poirot. “I wonder,” I said, breaking the silence, “what the papers the girl disarranged could have been for Ackroyd to have got into such a state about them? I wonder if there is any clew there to the mystery.” “The secretary said there were no papers of particular importance on the desk,” said Poirot quietly. “Yes, but——” I paused. “It strikes you as odd that Ackroyd should have flown into a rage about so trivial a matter?” “Yes, it does rather.” “But was it a trivial matter?” “Of course,” I admitted, “we don’t know what those papers may have been. But Raymond certainly said——” “Leave M. Raymond out of it for a minute. What did you think of that girl?” “Which girl? The parlormaid?” “Yes, the parlormaid. Ursula Bourne.” “She seemed a nice girl,” I said hesitatingly. Poirot repeated my words, but whereas I had laid a slight stress on the fourth word, he put it on the second. “She _seemed_ a nice girl—yes.” Then, after a minute’s silence, he took something from his pocket and handed it to me. “See, my friend, I will show you something. Look there.” The paper he had handed me was that compiled by the inspector and given by him to Poirot that morning. Following the pointing finger, I saw a small cross marked in pencil opposite the name Ursula Bourne. “You may not have noticed it at the time, my good friend, but there was one person on this list whose alibi had no kind of confirmation. Ursula Bourne.” “You don’t think——” “Dr. Sheppard, I dare to think anything. Ursula Bourne may have killed Mr. Ackroyd, but I confess I can see no motive for her doing so. Can you?” He looked at me very hard—so hard that I felt uncomfortable. “Can you?” he repeated. “No motive whatsoever,” I said firmly. His gaze relaxed. He frowned and murmured to himself:— “Since the blackmailer was a man, it follows that she cannot be the blackmailer, then——” I coughed. “As far as that goes——” I began doubtfully. He spun round on me. “What? What are you going to say?” “Nothing. 
Nothing. Only that, strictly speaking, Mrs. Ferrars in her letter mentioned a _person_—she didn’t actually specify a man. But we took it for granted, Ackroyd and I, that it _was_ a man.” Poirot did not seem to be listening to me. He was muttering to himself again. “But then it is possible after all—yes, certainly it is possible—but then—ah! I must rearrange my ideas. Method, order; never have I needed them more. Everything must fit in—in its appointed place—otherwise I am on the wrong tack.” He broke off, and whirled round upon me again. “Where is Marby?” “It’s on the other side of Cranchester.” “How far away?” “Oh!—fourteen miles, perhaps.” “Would it be possible for you to go there? To-morrow, say?” “To-morrow? Let me see, that’s Sunday. Yes, I could arrange it. What do you want me to do there?” “See this Mrs. Folliott. Find out all you can about Ursula Bourne.” “Very well. But—I don’t much care for the job.” “It is not the time to make difficulties. A man’s life may hang on this.” “Poor Ralph,” I said with a sigh. “You believe him to be innocent, though?” Poirot looked at me very gravely. “Do you want to know the truth?” “Of course.” “Then you shall have it. My friend, everything points to the assumption that he is guilty.” “What!” I exclaimed. Poirot nodded. “Yes, that stupid inspector—for he is stupid—has everything pointing his way. I seek for the truth—and the truth leads me every time to Ralph Paton. Motive, opportunity, means. But I will leave no stone unturned. I promised Mademoiselle Flora. And she was very sure, that little one. But very sure indeed.”

POIROT PAYS A CALL

I was slightly nervous when I rang the bell at Marby Grange the following afternoon. I wondered very much what Poirot expected to find out. He had entrusted the job to me. Why? Was it because, as in the case of questioning Major Blunt, he wished to remain in the background? The wish, intelligible in the first case, seemed to me quite meaningless here.
My meditations were interrupted by the advent of a smart parlormaid. Yes, Mrs. Folliott was at home. I was ushered into a big drawing-room, and looked round me curiously as I waited for the mistress of the house. A large bare room, some good bits of old china, and some beautiful etchings, shabby covers and curtains. A lady’s room in every sense of the term. I turned from the inspection of a Bartolozzi on the wall as Mrs. Folliott came into the room. She was a tall woman, with untidy brown hair, and a very winning smile. “Dr. Sheppard,” she said hesitatingly. “That is my name,” I replied. “I must apologize for calling upon you like this, but I wanted some information about a parlormaid previously employed by you, Ursula Bourne.” With the utterance of the name the smile vanished from her face, and all the cordiality froze out of her manner. She looked uncomfortable and ill at ease. “Ursula Bourne?” she said hesitatingly. “Yes,” I said. “Perhaps you don’t remember the name?” “Oh, yes, of course. I—I remember perfectly.” “She left you just over a year ago, I understand?” “Yes. Yes, she did. That is quite right.” “And you were satisfied with her whilst she was with you? How long was she with you, by the way?” “Oh! a year or two—I can’t remember exactly how long. She—she is very capable. I’m sure you will find her quite satisfactory. I didn’t know she was leaving Fernly. I hadn’t the least idea of it.” “Can you tell me anything about her?” I asked. “Anything about her?” “Yes, where she comes from, who her people are—that sort of thing?” Mrs. Folliott’s face wore more than ever its frozen look. “I don’t know at all.” “Who was she with before she came to you?” “I’m afraid I don’t remember.” There was a spark of anger now underlying her nervousness. She flung up her head in a gesture that was vaguely familiar. “Is it really necessary to ask all these questions?” “Not at all,” I said, with an air of surprise and a tinge of apology in my manner. 
“I had no idea you would mind answering them. I am very sorry.” Her anger left her and she became confused again. “Oh! I don’t mind answering them. I assure you I don’t. Why should I? It—it just seemed a little odd, you know. That’s all. A little odd.” One advantage of being a medical practitioner is that you can usually tell when people are lying to you. I should have known from Mrs. Folliott’s manner, if from nothing else, that she did mind answering my questions—minded intensely. She was thoroughly uncomfortable and upset, and there was plainly some mystery in the background. I judged her to be a woman quite unused to deception of any kind, and consequently rendered acutely uneasy when forced to practice it. A child could have seen through her. But it was also clear that she had no intention of telling me anything further. Whatever the mystery centering around Ursula Bourne might be, I was not going to learn it through Mrs. Folliott. Defeated, I apologized once more for disturbing her, took my hat and departed. I went to see a couple of patients and arrived home about six o’clock. Caroline was sitting beside the wreck of tea things. She had that look of suppressed exultation on her face which I know only too well. It is a sure sign with her, of either the getting or the giving of information. I wondered which it had been. “I’ve had a very interesting afternoon,” began Caroline as I dropped into my own particular easy chair, and stretched out my feet to the inviting blaze in the fireplace. “Have you?” I asked. “Miss Ganett drop in to tea?” Miss Ganett is one of the chief of our newsmongers. “Guess again,” said Caroline with intense complacency. I guessed several times, working slowly through all the members of Caroline’s Intelligence Corps. My sister received each guess with a triumphant shake of the head. In the end she volunteered the information herself. “M. Poirot!” she said. 
“Now what do you think of that?” I thought a good many things of it, but I was careful not to say them to Caroline. “Why did he come?” I asked. “To see me, of course. He said that knowing my brother so well, he hoped he might be permitted to make the acquaintance of his charming sister—your charming sister, I’ve got mixed up, but you know what I mean.” “What did he talk about?” I asked. “He told me a lot about himself and his cases. You know that Prince Paul of Mauretania—the one who’s just married a dancer?” “Yes?” “I saw a most intriguing paragraph about her in Society Snippets the other day, hinting that she was really a Russian Grand Duchess—one of the Czar’s daughters who managed to escape from the Bolsheviks. Well, it seems that M. Poirot solved a baffling murder mystery that threatened to involve them both. Prince Paul was beside himself with gratitude.” “Did he give him an emerald tie pin the size of a plover’s egg?” I inquired sarcastically. “He didn’t mention it. Why?” “Nothing,” I said. “I thought it was always done. It is in detective fiction anyway. The super detective always has his rooms littered with rubies and pearls and emeralds from grateful Royal clients.” “It’s very interesting to hear about these things from the inside,” said my sister complacently. It would be—to Caroline. I could not but admire the ingenuity of M. Hercule Poirot, who had selected unerringly the case of all others that would most appeal to an elderly maiden lady living in a small village. “Did he tell you if the dancer was really a Grand Duchess?” I inquired. “He was not at liberty to speak,” said Caroline importantly. I wondered how far Poirot had strained the truth in talking to Caroline—probably not at all. He had conveyed his innuendoes by means of his eyebrows and his shoulders. “And after all this,” I remarked, “I suppose you were ready to eat out of his hand.” “Don’t be coarse, James. 
I don’t know where you get these vulgar expressions from.” “Probably from my only link with the outside world—my patients. Unfortunately my practice does not lie amongst Royal princes and interesting Russian émigrés.” Caroline pushed her spectacles up and looked at me. “You seem very grumpy, James. It must be your liver. A blue pill, I think, to-night.” To see me in my own home, you would never imagine that I was a doctor of medicine. Caroline does the home prescribing both for herself and me. “Damn my liver,” I said irritably. “Did you talk about the murder at all?” “Well, naturally, James. What else is there to talk about locally? I was able to set M. Poirot right upon several points. He was very grateful to me. He said I had the makings of a born detective in me—and a wonderful psychological insight into human nature.” Caroline was exactly like a cat that is full to overflowing with rich cream. She was positively purring. “He talked a lot about the little gray cells of the brain, and of their functions. His own, he says, are of the first quality.” “He would say so,” I remarked bitterly. “Modesty is certainly not his middle name.” “I wish you would not be so horribly American, James. He thought it very important that Ralph should be found as soon as possible, and induced to come forward and give an account of himself. He says that his disappearance will produce a very unfortunate impression at the inquest.” “And what did you say to that?” “I agreed with him,” said Caroline importantly. “And I was able to tell him the way people were already talking about it.” “Caroline,” I said sharply, “did you tell M. Poirot what you overheard in the wood that day?” “I did,” said Caroline complacently. I got up and began to walk about. “You realize what you’re doing, I hope,” I jerked out. “You’re putting a halter round Ralph Paton’s neck as surely as you’re sitting in that chair.” “Not at all,” said Caroline, quite unruffled. 
“I was surprised _you_ hadn’t told him.” “I took very good care not to,” I said. “I’m fond of that boy.” “So am I. That’s why I say you’re talking nonsense. I don’t believe Ralph did it, and so the truth can’t hurt him, and we ought to give M. Poirot all the help we can. Why, think, very likely Ralph was out with that identical girl on the night of the murder, and if so, he’s got a perfect alibi.” “If he’s got a perfect alibi,” I retorted, “why doesn’t he come forward and say so?” “Might get the girl into trouble,” said Caroline sapiently. “But if M. Poirot gets hold of her, and puts it to her as her duty, she’ll come forward of her own accord and clear Ralph.” “You seem to have invented a romantic fairy story of your own,” I said. “You read too many trashy novels, Caroline. I’ve always told you so.” I dropped into my chair again. “Did Poirot ask you any more questions?” I inquired. “Only about the patients you had that morning.” “The patients?” I demanded, unbelievingly. “Yes, your surgery patients. How many and who they were?” “Do you mean to say you were able to tell him that?” I demanded. Caroline is really amazing. “Why not?” asked my sister triumphantly. “I can see the path up to the surgery door perfectly from this window. And I’ve got an excellent memory, James. Much better than yours, let me tell you.” “I’m sure you have,” I murmured mechanically. My sister went on, checking the names on her fingers. “There was old Mrs. Bennett, and that boy from the farm with the bad finger, Dolly Grice to have a needle out of her finger; that American steward off the liner. Let me see—that’s four. Yes, and old George Evans with his ulcer. And lastly——” She paused significantly. “Well?” Caroline brought out her climax triumphantly. She hissed in the most approved style—aided by the fortunate number of s’s at her disposal. “_Miss Russell!_” She sat back in her chair and looked at me meaningly, and when Caroline looks at you meaningly, it is impossible to miss it. 
“I don’t know what you mean,” I said, quite untruthfully. “Why shouldn’t Miss Russell consult me about her bad knee?” “Bad knee,” said Caroline. “Fiddlesticks! No more bad knee than you and I. She was after something else.” “What?” I asked. Caroline had to admit that she didn’t know. “But depend upon it, that was what he was trying to get at, M. Poirot, I mean. There’s something fishy about that woman, and he knows it.” “Precisely the remark Mrs. Ackroyd made to me yesterday,” I said. “That there was something fishy about Miss Russell.” “Ah!” said Caroline darkly, “Mrs. Ackroyd! There’s another!” “Another what?” Caroline refused to explain her remarks. She merely nodded her head several times, rolled up her knitting, and went upstairs to don the high mauve silk blouse and the gold locket which she calls dressing for dinner. I stayed there staring into the fire and thinking over Caroline’s words. Had Poirot really come to gain information about Miss Russell, or was it only Caroline’s tortuous mind that interpreted everything according to her own ideas? There had certainly been nothing in Miss Russell’s manner that morning to arouse suspicion. At least—— I remembered her persistent conversation on the subject of drug-taking and from that she had led the conversation to poisons and poisoning. But there was nothing in that. Ackroyd had not been poisoned. Still, it was odd.... I heard Caroline’s voice, rather acid in note, calling from the top of the stairs. “James, you will be late for dinner.” I put some coal on the fire and went upstairs obediently. It is well at any price to have peace in the home.

ROUND THE TABLE

A joint inquest was held on Monday. I do not propose to give the proceedings in detail. To do so would only be to go over the same ground again and again. By arrangement with the police, very little was allowed to come out. I gave evidence as to the cause of Ackroyd’s death and the probable time.
The absence of Ralph Paton was commented on by the coroner, but not unduly stressed. Afterwards, Poirot and I had a few words with Inspector Raglan. The inspector was very grave. “It looks bad, Mr. Poirot,” he said. “I’m trying to judge the thing fair and square. I’m a local man, and I’ve seen Captain Paton many times in Cranchester. I’m not wanting him to be the guilty one—but it’s bad whichever way you look at it. If he’s innocent, why doesn’t he come forward? We’ve got evidence against him, but it’s just possible that that evidence could be explained away. Then why doesn’t he give an explanation?” A lot more lay behind the inspector’s words than I knew at the time. Ralph’s description had been wired to every port and railway station in England. The police everywhere were on the alert. His rooms in town were watched, and any houses he had been known to be in the habit of frequenting. With such a _cordon_ it seemed impossible that Ralph should be able to evade detection. He had no luggage, and, as far as any one knew, no money. “I can’t find any one who saw him at the station that night,” continued the inspector. “And yet he’s well known down here, and you’d think somebody would have noticed him. There’s no news from Liverpool either.” “You think he went to Liverpool?” queried Poirot. “Well, it’s on the cards. That telephone message from the station, just three minutes before the Liverpool express left—there ought to be something in that.” “Unless it was deliberately intended to throw you off the scent. That might just possibly be the point of the telephone message.” “That’s an idea,” said the inspector eagerly. “Do you really think that’s the explanation of the telephone call?” “My friend,” said Poirot gravely, “I do not know. But I will tell you this: I believe that when we find the explanation of that telephone call we shall find the explanation of the murder.” “You said something like that before, I remember,” I observed, looking at him curiously. 
Poirot nodded. “I always come back to it,” he said seriously. “It seems to me utterly irrelevant,” I declared. “I wouldn’t say that,” demurred the inspector. “But I must confess I think Mr. Poirot here harps on it a little too much. We’ve better clews than that. The fingerprints on the dagger, for instance.” Poirot became suddenly very foreign in manner, as he often did when excited over anything. “M. l’Inspecteur,” he said, “beware of the blind—the blind—_comment dire?_—the little street that has no end to it.” Inspector Raglan stared, but I was quicker. “You mean a blind alley?” I said. “That is it—the blind street that leads nowhere. So it may be with those fingerprints—they may lead you nowhere.” “I don’t see how that can well be,” said the police officer. “I suppose you’re hinting that they’re faked? I’ve read of such things being done, though I can’t say I’ve ever come across it in my experience. But fake or true—they’re bound to lead _somewhere_.” Poirot merely shrugged his shoulders, flinging out his arms wide. The inspector then showed us various enlarged photographs of the fingerprints, and proceeded to become technical on the subject of loops and whorls. “Come now,” he said at last, annoyed by Poirot’s detached manner, “you’ve got to admit that those prints were made by some one who was in the house that night?” “_Bien entendu_,” said Poirot, nodding his head. “Well, I’ve taken the prints of every member of the household, every one, mind you, from the old lady down to the kitchenmaid.” I don’t think Mrs. Ackroyd would enjoy being referred to as the old lady. She must spend a considerable amount on cosmetics. “Every one’s,” repeated the inspector fussily. “Including mine,” I said dryly. “Very well. None of them correspond. That leaves us two alternatives. Ralph Paton, or the mysterious stranger the doctor here tells us about. When we get hold of those two——” “Much valuable time may have been lost,” broke in Poirot. “I don’t quite get you, Mr. 
Poirot?” “You have taken the prints of every one in the house, you say,” murmured Poirot. “Is that the exact truth you are telling me there, M. l’Inspecteur?” “Certainly.” “Without overlooking any one?” “Without overlooking any one.” “The quick or the dead?” For a moment the inspector looked bewildered at what he took to be a religious observation. Then he reacted slowly. “You mean——” “The dead, M. l’Inspecteur.” The inspector still took a minute or two to understand. “I am suggesting,” said Poirot placidly, “that the fingerprints on the dagger handle are those of Mr. Ackroyd himself. It is an easy matter to verify. His body is still available.” “But why? What would be the point of it? You’re surely not suggesting suicide, Mr. Poirot?” “Ah! no. My theory is that the murderer wore gloves or wrapped something round his hand. After the blow was struck, he picked up the victim’s hand and closed it round the dagger handle.” “But why?” Poirot shrugged his shoulders again. “To make a confusing case even more confusing.” “Well,” said the inspector, “I’ll look into it. What gave you the idea in the first place?” “When you were so kind as to show me the dagger and draw attention to the fingerprints. I know very little of loops and whorls—see, I confess my ignorance frankly. But it did occur to me that the position of the prints was somewhat awkward. Not so would I have held a dagger in order to strike. Naturally, with the right hand brought up over the shoulder backwards, it would have been difficult to put it in exactly the right position.” Inspector Raglan stared at the little man. Poirot, with an air of great unconcern, flecked a speck of dust from his coat sleeve. “Well,” said the inspector, “it’s an idea. I’ll look into it all right, but don’t you be disappointed if nothing comes of it.” He endeavored to make his tone kindly and patronizing. Poirot watched him go off. Then he turned to me with twinkling eyes. 
“Another time,” he observed, “I must be more careful of his _amour propre_. And now that we are left to our own devices, what do you think, my good friend, of a little reunion of the family?” The “little reunion,” as Poirot called it, took place about half an hour later. We sat round the table in the dining-room at Fernly—Poirot at the head of the table, like the chairman of some ghastly board meeting. The servants were not present, so we were six in all. Mrs. Ackroyd, Flora, Major Blunt, young Raymond, Poirot, and myself. When every one was assembled, Poirot rose and bowed. “Messieurs, mesdames, I have called you together for a certain purpose.” He paused. “To begin with, I want to make a very special plea to mademoiselle.” “To me?” said Flora. “Mademoiselle, you are engaged to Captain Ralph Paton. If any one is in his confidence, you are. I beg you, most earnestly, if you know of his whereabouts, to persuade him to come forward. One little minute”—as Flora raised her head to speak—“say nothing till you have well reflected. Mademoiselle, his position grows daily more dangerous. If he had come forward at once, no matter how damning the facts, he might have had a chance of explaining them away. But this silence—this flight—what can it mean? Surely only one thing, knowledge of guilt. Mademoiselle, if you really believe in his innocence, persuade him to come forward before it is too late.” Flora’s face had gone very white. “Too late!” she repeated, very low. Poirot leant forward, looking at her. “See now, mademoiselle,” he said very gently, “it is Papa Poirot who asks you this. The old Papa Poirot who has much knowledge and much experience. I would not seek to entrap you, mademoiselle. Will you not trust me—and tell me where Ralph Paton is hiding?” The girl rose, and stood facing him. “M. 
Poirot,” she said in a clear voice, “I swear to you—swear solemnly—that I have no idea where Ralph is, and that I have neither seen him nor heard from him either on the day of—of the murder, or since.” She sat down again. Poirot gazed at her in silence for a minute or two, then he brought his hand down on the table with a sharp rap. “_Bien!_ That is that,” he said. His face hardened. “Now I appeal to these others who sit round this table, Mrs. Ackroyd, Major Blunt, Dr. Sheppard, Mr. Raymond. You are all friends and intimates of the missing man. If you know where Ralph Paton is hiding, speak out.” There was a long silence. Poirot looked to each in turn. “I beg of you,” he said in a low voice, “speak out.” But still there was silence, broken at last by Mrs. Ackroyd. “I must say,” she observed in a plaintive voice, “that Ralph’s absence is most peculiar—most peculiar indeed. Not to come forward at such a time. It looks, you know, as though there were something _behind_ it. I can’t help thinking, Flora dear, that it was a very fortunate thing your engagement was never formally announced.” “Mother!” cried Flora angrily. “Providence,” declared Mrs. Ackroyd. “I have a devout belief in Providence—a divinity that shapes our ends, as Shakespeare’s beautiful line runs.” “Surely you don’t make the Almighty directly responsible for thick ankles, Mrs. Ackroyd, do you?” asked Geoffrey Raymond, his irresponsible laugh ringing out. His idea was, I think, to loosen the tension, but Mrs. Ackroyd threw him a glance of reproach and took out her handkerchief. “Flora has been saved a terrible amount of notoriety and unpleasantness. Not for a moment that I think dear Ralph had anything to do with poor Roger’s death. I _don’t_ think so. But then I have a trusting heart—I always have had, ever since a child. I am loath to believe the worst of any one. But, of course, one must remember that Ralph was in several air raids as a young boy. 
The results are apparent long after, sometimes, they say. People are not responsible for their actions in the least. They lose control, you know, without being able to help it.” “Mother,” cried Flora, “you don’t think Ralph did it?” “Come, Mrs. Ackroyd,” said Blunt. “I don’t know what to think,” said Mrs. Ackroyd tearfully. “It’s all very upsetting. What would happen to the estate, I wonder, if Ralph were found guilty?” Raymond pushed his chair away from the table violently. Major Blunt remained very quiet, looking thoughtfully at her. “Like shell-shock, you know,” said Mrs. Ackroyd obstinately, “and I dare say Roger kept him very short of money—with the best intentions, of course. I can see you are all against me, but I do think it is very odd that Ralph has not come forward, and I must say I am thankful Flora’s engagement was never announced formally.” “It will be to-morrow,” said Flora in a clear voice. “Flora!” cried her mother, aghast. Flora had turned to the secretary. “Will you send the announcement to the _Morning Post_ and the _Times_, please, Mr. Raymond.” “If you are sure that it is wise, Miss Ackroyd,” he replied gravely. She turned impulsively to Blunt. “You understand,” she said. “What else can I do? As things are, I must stand by Ralph. Don’t you see that I must?” She looked very searchingly at him, and after a long pause he nodded abruptly. Mrs. Ackroyd burst out into shrill protests. Flora remained unmoved. Then Raymond spoke. “I appreciate your motives, Miss Ackroyd. But don’t you think you’re being rather precipitate? Wait a day or two.” “To-morrow,” said Flora, in a clear voice. “It’s no good, mother, going on like this. Whatever else I am, I’m not disloyal to my friends.” “M. Poirot,” Mrs. Ackroyd appealed tearfully, “can’t you say anything at all?” “Nothing to be said,” interpolated Blunt. “She’s doing the right thing. I’ll stand by her through thick and thin.” Flora held out her hand to him. “Thank you, Major Blunt,” she said. 
“Mademoiselle,” said Poirot, “will you let an old man congratulate you on your courage and your loyalty? And will you not misunderstand me if I ask you—ask you most solemnly—to postpone the announcement you speak of for at least two days more?” Flora hesitated. “I ask it in Ralph Paton’s interests as much as in yours, mademoiselle. You frown. You do not see how that can be. But I assure you that it is so. _Pas de blagues_. You put the case into my hands—you must not hamper me now.” Flora paused a few minutes before replying. “I do not like it,” she said at last, “but I will do what you say.” She sat down again at the table. “And now, messieurs et mesdames,” said Poirot rapidly, “I will continue with what I was about to say. Understand this, I mean to arrive at the truth. The truth, however ugly in itself, is always curious and beautiful to the seeker after it. I am much aged, my powers may not be what they were.” Here he clearly expected a contradiction. “In all probability this is the last case I shall ever investigate. But Hercule Poirot does not end with a failure. Messieurs et mesdames, I tell you, I mean to _know_. And I shall know—in spite of you all.” He brought out the last words provocatively, hurling them in our face as it were. I think we all flinched back a little, excepting Geoffrey Raymond, who remained good humored and imperturbable as usual. “How do you mean—in spite of us all?” he asked, with slightly raised eyebrows. “But—just that, monsieur. Every one of you in this room is concealing something from me.” He raised his hand as a faint murmur of protest arose. “Yes, yes, I know what I am saying. It may be something unimportant—trivial—which is supposed to have no bearing on the case, but there it is. _Each one of you has something to hide._ Come, now, am I right?” His glance, challenging and accusing, swept round the table. And every pair of eyes dropped before his. Yes, mine as well. “I am answered,” said Poirot, with a curious laugh. 
He got up from his seat. “I appeal to you all. Tell me the truth—the whole truth.” There was a silence. “Will no one speak?” He gave the same short laugh again. “_C’est dommage_,” he said, and went out.

THE GOOSE QUILL

That evening, at Poirot’s request, I went over to his house after dinner. Caroline saw me depart with visible reluctance. I think she would have liked to have accompanied me. Poirot greeted me hospitably. He had placed a bottle of Irish whisky (which I detest) on a small table, with a soda water siphon and a glass. He himself was engaged in brewing hot chocolate. It was a favorite beverage of his, I discovered later. He inquired politely after my sister, whom he declared to be a most interesting woman. “I’m afraid you’ve been giving her a swelled head,” I said dryly. “What about Sunday afternoon?” He laughed and twinkled. “I always like to employ the expert,” he remarked obscurely, but he refused to explain the remark. “You got all the local gossip anyway,” I remarked. “True, and untrue.” “And a great deal of valuable information,” he added quietly. “Such as——?” He shook his head. “Why not have told me the truth?” he countered. “In a place like this, all Ralph Paton’s doings were bound to be known. If your sister had not happened to pass through the wood that day somebody else would have done so.” “I suppose they would,” I said grumpily. “What about this interest of yours in my patients?” Again he twinkled. “Only one of them, doctor. Only one of them.” “The last?” I hazarded. “I find Miss Russell a study of the most interesting,” he said evasively. “Do you agree with my sister and Mrs. Ackroyd that there is something fishy about her?” I asked. “Eh? What do you say—fishy?” I explained to the best of my ability. “And they say that, do they?” “Didn’t my sister convey as much to you yesterday afternoon?” “_C’est possible._” “For no reason whatever,” I declared. “_Les femmes_,” generalized Poirot. “They are marvelous!
They invent haphazard—and by miracle they are right. Not that it is that, really. Women observe subconsciously a thousand little details, without knowing that they are doing so. Their subconscious mind adds these little things together—and they call the result intuition. Me, I am very skilled in psychology. I know these things.” He swelled his chest out importantly, looking so ridiculous, that I found it difficult not to burst out laughing. Then he took a small sip of his chocolate, and carefully wiped his mustache. “I wish you’d tell me,” I burst out, “what you really think of it all?” He put down his cup. “You wish that?” “I do.” “You have seen what I have seen. Should not our ideas be the same?” “I’m afraid you’re laughing at me,” I said stiffly. “Of course, I’ve no experience of matters of this kind.” Poirot smiled at me indulgently. “You are like the little child who wants to know the way the engine works. You wish to see the affair, not as the family doctor sees it, but with the eye of a detective who knows and cares for no one—to whom they are all strangers and all equally liable to suspicion.” “You put it very well,” I said. “So I give you then, a little lecture. The first thing is to get a clear history of what happened that evening—always bearing in mind that the person who speaks may be lying.” I raised my eyebrows. “Rather a suspicious attitude.” “But necessary—I assure you, necessary. Now first—Dr. Sheppard leaves the house at ten minutes to nine. How do I know that?” “Because I told you so.” “But you might not be speaking the truth—or the watch you went by might be wrong. But Parker also says that you left the house at ten minutes to nine. So we accept that statement and pass on. At nine o’clock you run into a man—and here we come to what we will call the Romance of the Mysterious Stranger—just outside the Park gates. How do I know that that is so?” “I told you so,” I began again, but Poirot interrupted me with a gesture of impatience. “Ah! 
but it is that you are a little stupid to-night, my friend. _You_ know that it is so—but how am _I_ to know? _Eh bien_, I am able to tell you that the Mysterious Stranger was not a hallucination on your part, because the maid of a Miss Ganett met him a few minutes before you did, and of her too he inquired the way to Fernly Park. We accept his presence, therefore, and we can be fairly sure of two things about him—that he was a stranger to the neighborhood, and that whatever his object in going to Fernly, there was no great secrecy about it, since he twice asked the way there.” “Yes,” I said, “I see that.” “Now I have made it my business to find out more about this man. He had a drink at the Three Boars, I learn, and the barmaid there says that he spoke with an American accent and mentioned having just come over from the States. Did it strike you that he had an American accent?” “Yes, I think he had,” I said, after a minute or two, during which I cast my mind back; “but a very slight one.” “_Précisément._ There is also this which, you will remember, I picked up in the summer-house?” He held out to me the little quill. I looked at it curiously. Then a memory of something I had read stirred in me. Poirot, who had been watching my face, nodded. “Yes, heroin ‘snow.’ Drug-takers carry it like this, and sniff it up the nose.” “Diamorphine hydrochloride,” I murmured mechanically. “This method of taking the drug is very common on the other side. Another proof, if we wanted one, that the man came from Canada or the States.” “What first attracted your attention to that summer-house?” I asked curiously. “My friend the inspector took it for granted that any one using that path did so as a short cut to the house, but as soon as I saw the summer-house, I realized that the same path would be taken by any one using the summer-house as a rendezvous. Now it seems fairly certain that the stranger came neither to the front nor to the back door. 
Then did some one from the house go out and meet him? If so, what could be a more convenient place than that little summer-house? I searched it with the hope that I might find some clew inside. I found two, the scrap of cambric and the quill.” “And the scrap of cambric?” I asked curiously. “What about that?” Poirot raised his eyebrows. “You do not use your little gray cells,” he remarked dryly. “The scrap of starched cambric should be obvious.” “Not very obvious to me.” I changed the subject. “Anyway,” I said, “this man went to the summer-house to meet somebody. Who was that somebody?” “Exactly the question,” said Poirot. “You will remember that Mrs. Ackroyd and her daughter came over from Canada to live here?” “Is that what you meant to-day when you accused them of hiding the truth?” “Perhaps. Now another point. What did you think of the parlormaid’s story?” “What story?” “The story of her dismissal. Does it take half an hour to dismiss a servant? Was the story of those important papers a likely one? And remember, though she says she was in her bedroom from nine-thirty until ten o’clock, there is no one to confirm her statement.” “You bewilder me,” I said. “To me it grows clearer. But tell me now your own ideas and theories.” I drew a piece of paper from my pocket. “I just scribbled down a few suggestions,” I said apologetically. “But excellent—you have method. Let us hear them.” I read out in a somewhat embarrassed voice. “To begin with, one must look at the thing logically——” “Just what my poor Hastings used to say,” interrupted Poirot, “but alas! he never did so.” “_Point No. 1._—Mr. Ackroyd was heard talking to some one at half-past nine. “_Point No. 2._—At some time during the evening Ralph Paton must have come in through the window, as evidenced by the prints of his shoes. “_Point No. 3._—Mr. Ackroyd was nervous that evening, and would only have admitted some one he knew. “_Point No. 4._—The person with Mr. Ackroyd at nine-thirty was asking for money. 
We know Ralph Paton was in a scrape. “_These four points go to show that the person with Mr. Ackroyd at nine-thirty was Ralph Paton. But we know that Mr. Ackroyd was alive at a quarter to ten, therefore it was not Ralph who killed him. Ralph left the window open. Afterwards the murderer came in that way._” “And who was the murderer?” inquired Poirot. “The American stranger. He may have been in league with Parker, and possibly in Parker we have the man who blackmailed Mrs. Ferrars. If so, Parker may have heard enough to realize the game was up, have told his accomplice so, and the latter did the crime with the dagger which Parker gave him.” “It is a theory that,” admitted Poirot. “Decidedly you have cells of a kind. But it leaves a good deal unaccounted for.” “Such as——?” “The telephone call, the pushed-out chair——” “Do you really think the latter important?” I interrupted. “Perhaps not,” admitted my friend. “It may have been pulled out by accident, and Raymond or Blunt may have shoved it into place unconsciously under the stress of emotion. Then there is the missing forty pounds.” “Given by Ackroyd to Ralph,” I suggested. “He may have reconsidered his first refusal.” “That still leaves one thing unexplained?” “What?” “Why was Blunt so certain in his own mind that it was Raymond with Mr. Ackroyd at nine-thirty?” “He explained that,” I said. “You think so? I will not press the point. Tell me instead, what were Ralph Paton’s reasons for disappearing?” “That’s rather more difficult,” I said slowly. “I shall have to speak as a medical man. Ralph’s nerves must have gone phut! If he suddenly found out that his uncle had been murdered within a few minutes of his leaving him—after, perhaps, a rather stormy interview—well, he might get the wind up and clear right out. Men have been known to do that—act guiltily when they’re perfectly innocent.” “Yes, that is true,” said Poirot. “But we must not lose sight of one thing.” “I know what you’re going to say,” I remarked: “motive. 
Ralph Paton inherits a great fortune by his uncle’s death.” “That is one motive,” agreed Poirot. “One?” “_Mais oui._ Do you realize that there are three separate motives staring us in the face? Somebody certainly stole the blue envelope and its contents. That is one motive. Blackmail! Ralph Paton may have been the man who blackmailed Mrs. Ferrars. Remember, as far as Hammond knew, Ralph Paton had not applied to his uncle for help of late. That looks as though he were being supplied with money elsewhere. Then there is the fact that he was in some—how do you say—scrape?—which he feared might get to his uncle’s ears. And finally there is the one you have just mentioned.” “Dear me,” I said, rather taken aback. “The case does seem black against him.” “Does it?” said Poirot. “That is where we disagree, you and I. Three motives—it is almost too much. I am inclined to believe that, after all, Ralph Paton is innocent.”

MRS. ACKROYD

After the evening talk I have just chronicled, the affair seemed to me to enter on a different phase. The whole thing can be divided into two parts, each clear and distinct from the other. Part I. ranges from Ackroyd’s death on the Friday evening to the following Monday night. It is the straight-forward narrative of what occurred, as presented to Hercule Poirot. I was at Poirot’s elbow the whole time. I saw what he saw. I tried my best to read his mind. As I know now, I failed in this latter task. Though Poirot showed me all his discoveries—as, for instance, the gold wedding-ring—he held back the vital and yet logical impressions that he formed. As I came to know later, this secrecy was characteristic of him. He would throw out hints and suggestions, but beyond that he would not go. As I say, up till the Monday evening, my narrative might have been that of Poirot himself. I played Watson to his Sherlock. But after Monday our ways diverged. Poirot was busy on his own account. 
I got to hear of what he was doing, because, in King’s Abbot, you get to hear of everything, but he did not take me into his confidence beforehand. And I, too, had my own preoccupations. On looking back, the thing that strikes me most is the piecemeal character of this period. Every one had a hand in the elucidation of the mystery. It was rather like a jig-saw puzzle to which every one contributed their own little piece of knowledge or discovery. But their task ended there. To Poirot alone belongs the renown of fitting those pieces into their correct place. Some of the incidents seemed at the time irrelevant and unmeaning. There was, for instance, the question of the black boots. But that comes later.... To take things strictly in chronological order, I must begin with the summons from Mrs. Ackroyd. She sent for me early on Tuesday morning, and since the summons sounded an urgent one, I hastened there, expecting to find her _in extremis_. The lady was in bed. So much did she concede to the etiquette of the situation. She gave me her bony hand, and indicated a chair drawn up to the bedside. “Well, Mrs. Ackroyd,” I said, “and what’s the matter with you?” I spoke with that kind of spurious geniality which seems to be expected of general practitioners. “I’m prostrated,” said Mrs. Ackroyd in a faint voice. “Absolutely prostrated. It’s the shock of poor Roger’s death. They say these things often aren’t felt at the _time_, you know. It’s the reaction afterwards.” It is a pity that a doctor is precluded by his profession from being able sometimes to say what he really thinks. I would have given anything to be able to answer “Bunkum!” Instead, I suggested a tonic. Mrs. Ackroyd accepted the tonic. One move in the game seemed now to be concluded. Not for a moment did I imagine that I had been sent for because of the shock occasioned by Ackroyd’s death. But Mrs. Ackroyd is totally incapable of pursuing a straight-forward course on any subject. 
She always approaches her object by tortuous means. I wondered very much why it was she had sent for me. “And then that scene—yesterday,” continued my patient. She paused as though expecting me to take up a cue. “What scene?” “Doctor, how can you? Have you forgotten? That dreadful little Frenchman—or Belgian—or whatever he is. Bullying us all like he did. It has quite upset me. Coming on top of Roger’s death.” “I’m very sorry, Mrs. Ackroyd,” I said. “I don’t know what he meant—shouting at us like he did. I should hope I know my duty too well to _dream_ of concealing anything. I have given the police _every_ assistance in my power.” Mrs. Ackroyd paused, and I said, “Quite so.” I was beginning to have a glimmering of what all the trouble was about. “No one can say that I have failed in my duty,” continued Mrs. Ackroyd. “I am sure Inspector Raglan is perfectly satisfied. Why should this little upstart of a foreigner make a fuss? A most ridiculous-looking creature he is too—just like a comic Frenchman in a revue. I can’t think why Flora insisted on bringing him into the case. She never said a word to me about it. Just went off and did it on her own. Flora is too independent. I am a woman of the world and her mother. She should have come to me for advice first.” I listened to all this in silence. “What does he think? That’s what I want to know. Does he actually imagine I’m hiding something? He—he—positively _accused_ me yesterday.” I shrugged my shoulders. “It is surely of no consequence, Mrs. Ackroyd,” I said. “Since you are not concealing anything, any remarks he may have made do not apply to you.” Mrs. Ackroyd went off at a tangent, after her usual fashion. “Servants are so tiresome,” she said. “They gossip, and talk amongst themselves. And then it gets round—and all the time there’s probably nothing in it at all.” “Have the servants been talking?” I asked. “What about?” Mrs. Ackroyd cast a very shrewd glance at me. It quite threw me off my balance. 
“I was sure _you’d_ know, doctor, if any one did. You were with M. Poirot all the time, weren’t you?” “I was.” “Then of course you know. It was that girl, Ursula Bourne, wasn’t it? Naturally—she’s leaving. She _would_ want to make all the trouble she could. Spiteful, that’s what they are. They’re all alike. Now, you being there, doctor, you must know exactly what she did say? I’m most anxious that no wrong impression should get about. After all, you don’t repeat every little detail to the police, do you? There are family matters sometimes—nothing to do with the question of the murder. But if the girl was spiteful, she may have made out all sorts of things.” I was shrewd enough to see that a very real anxiety lay behind these outpourings. Poirot had been justified in his premises. Of the six people round the table yesterday, Mrs. Ackroyd at least had had something to hide. It was for me to discover what that something might be. “If I were you, Mrs. Ackroyd,” I said brusquely, “I should make a clean breast of things.” She gave a little scream. “Oh! doctor, how can you be so abrupt. It sounds as though—as though——And I can explain everything so simply.” “Then why not do so,” I suggested. Mrs. Ackroyd took out a frilled handkerchief, and became tearful. “I thought, doctor, that you might put it to M. Poirot—explain it, you know—because it’s so difficult for a foreigner to see our point of view. And you don’t know—nobody could know—what I’ve had to contend with. A martyrdom—a long martyrdom. That’s what my life has been. I don’t like to speak ill of the dead—but there it is. Not the smallest bill, but it had all to be gone over—just as though Roger had had a few miserly hundreds a year instead of being (as Mr. Hammond told me yesterday) one of the wealthiest men in these parts.” Mrs. Ackroyd paused to dab her eyes with the frilled handkerchief. “Yes,” I said encouragingly. “You were talking about bills?” “Those dreadful bills. 
And some I didn’t like to show Roger at all. They were things a man wouldn’t understand. He would have said the things weren’t necessary. And of course they mounted up, you know, and they kept coming in——” She looked at me appealingly, as though asking me to condole with her on this striking peculiarity. “It’s a habit they have,” I agreed. “And the tone altered—became quite abusive. I assure you, doctor, I was becoming a nervous wreck. I couldn’t sleep at nights. And a dreadful fluttering round the heart. And then I got a letter from a Scotch gentleman—as a matter of fact there were two letters—both Scotch gentlemen. Mr. Bruce MacPherson was one, and the other was Colin MacDonald. Quite a coincidence.” “Hardly that,” I said dryly. “They are usually Scotch gentlemen, but I suspect a Semitic strain in their ancestry.” “Ten pounds to ten thousand on note of hand alone,” murmured Mrs. Ackroyd reminiscently. “I wrote to one of them, but it seemed there were difficulties.” She paused. I gathered that we were just coming to delicate ground. I have never known any one more difficult to bring to the point. “You see,” murmured Mrs. Ackroyd, “it’s all a question of expectations, isn’t it? Testamentary expectations. And though, of course, I expected that Roger would provide for me, I didn’t _know_. I thought that if only I could glance over a copy of his will—not in any sense of vulgar prying—but just so that I could make my own arrangements.” She glanced sideways at me. The position was now very delicate indeed. Fortunately words, ingeniously used, will serve to mask the ugliness of naked facts. “I could only tell this to you, dear Dr. Sheppard,” said Mrs. Ackroyd rapidly. “I can trust you not to misjudge me, and to represent the matter in the right light to M. Poirot. It was on Friday afternoon——” She came to a stop and swallowed uncertainly. “Yes,” I repeated encouragingly. “On Friday afternoon. Well?” “Every one was out, or so I thought. 
And I went into Roger’s study—I had some real reason for going there—I mean, there was nothing underhand about it. And as I saw all the papers heaped on the desk, it just came to me, like a flash: ‘I wonder if Roger keeps his will in one of the drawers of the desk.’ I’m so impulsive, always was, from a child. I do things on the spur of the moment. He’d left his keys—very careless of him—in the lock of the top drawer.” “I see,” I said helpfully. “So you searched the desk. Did you find the will?” Mrs. Ackroyd gave a little scream, and I realized that I had not been sufficiently diplomatic. “How dreadful it sounds. But it wasn’t at all like that really.” “Of course it wasn’t,” I said hastily. “You must forgive my unfortunate way of putting things.” “You see, men are so peculiar. In dear Roger’s place, I should not have objected to revealing the provisions of my will. But men are so secretive. One is forced to adopt little subterfuges in self-defence.” “And the result of the little subterfuge?” I asked. “That’s just what I’m telling you. As I got to the bottom drawer, Bourne came in. Most awkward. Of course I shut the drawer and stood up, and I called her attention to a few specks of dust on the surface. But I didn’t like the way she looked—quite respectful in manner, but a very nasty light in her eyes. Almost contemptuous, if you know what I mean. I never have liked that girl very much. She’s a good servant, and she says Ma’am, and doesn’t object to wearing caps and aprons (which I declare to you a lot of them do nowadays), and she can say ‘Not at home’ without scruples if she has to answer the door instead of Parker, and she doesn’t have those peculiar gurgling noises inside which so many parlormaids seem to have when they wait at table——Let me see, where was I?” “You were saying, that in spite of several valuable qualities, you never liked Bourne.” “No more I do. She’s—odd. There’s something different about her from the others. Too well educated, that’s my opinion. 
You can’t tell who are ladies and who aren’t nowadays.” “And what happened next?” I asked. “Nothing. At least, Roger came in. And I thought he was out for a walk. And he said: ‘What’s all this?’ and I said, ‘Nothing. I just came in to fetch _Punch_.’ And I took _Punch_ and went out with it. Bourne stayed behind. I heard her asking Roger if she could speak to him for a minute. I went straight up to my room, to lie down. I was very upset.” There was a pause. “You will explain to M. Poirot, won’t you? You can see for yourself what a trivial matter the whole thing was. But, of course, when he was so stern about concealing things, I thought of this at once. Bourne may have made some extraordinary story out of it, but you can explain, can’t you?” “That is all?” I said. “You have told me everything?” “Ye-es,” said Mrs. Ackroyd. “Oh! yes,” she added firmly. But I had noted the momentary hesitation, and I knew that there was still something she was keeping back. It was nothing less than a flash of sheer genius that prompted me to ask the question I did. “Mrs. Ackroyd,” I said, “was it you who left the silver table open?” I had my answer in the blush of guilt that even rouge and powder could not conceal. “How did you know?” she whispered. “It was you, then?” “Yes—I—you see—there were one or two pieces of old silver—very interesting. I had been reading up the subject and there was an illustration of quite a small piece which had fetched an immense sum at Christy’s. It looked to me just the same as the one in the silver table. I thought I would take it up to London with me when I went—and—and have it valued. Then if it really was a valuable piece, just think what a charming surprise it would have been for Roger?” I refrained from comments, accepting Mrs. Ackroyd’s story on its merits. I even forbore to ask her why it was necessary to abstract what she wanted in such a surreptitious manner. “Why did you leave the lid open?” I asked. “Did you forget?” “I was startled,” said Mrs. 
Ackroyd. “I heard footsteps coming along the terrace outside. I hastened out of the room and just got up the stairs before Parker opened the front door to you.” “That must have been Miss Russell,” I said thoughtfully. Mrs. Ackroyd had revealed to me one fact that was extremely interesting. Whether her designs upon Ackroyd’s silver had been strictly honorable I neither knew nor cared. What did interest me was the fact that Miss Russell must have entered the drawing-room by the window, and that I had not been wrong when I judged her to be out of breath with running. Where had she been? I thought of the summer-house and the scrap of cambric. “I wonder if Miss Russell has her handkerchiefs starched!” I exclaimed on the spur of the moment. Mrs. Ackroyd’s start recalled me to myself, and I rose. “You think you can explain to M. Poirot?” she asked anxiously. “Oh, certainly. Absolutely.” I got away at last, after being forced to listen to more justifications of her conduct. The parlormaid was in the hall, and it was she who helped me on with my overcoat. I observed her more closely than I had done heretofore. It was clear that she had been crying. “How is it,” I asked, “that you told us that Mr. Ackroyd sent for you on Friday to his study? I hear now that it was _you_ who asked to speak to _him_?” For a minute the girl’s eyes dropped before mine. Then she spoke. “I meant to leave in any case,” she said uncertainly. I said no more. She opened the front door for me. Just as I was passing out, she said suddenly in a low voice:— “Excuse me, sir, is there any news of Captain Paton?” I shook my head, looking at her inquiringly. “He ought to come back,” she said. “Indeed—indeed he ought to come back.” She was looking at me with appealing eyes. “Does no one know where he is?” she asked. “Do you?” I said sharply. She shook her head. “No, indeed. I know nothing. 
But any one who was a friend to him would tell him this: he ought to come back.” I lingered, thinking that perhaps the girl would say more. Her next question surprised me. “When do they think the murder was done? Just before ten o’clock?” “That is the idea,” I said. “Between a quarter to ten and the hour.” “Not earlier? Not before a quarter to ten?” I looked at her attentively. She was so clearly eager for a reply in the affirmative. “That’s out of the question,” I said. “Miss Ackroyd saw her uncle alive at a quarter to ten.” She turned away, and her whole figure seemed to droop. “A handsome girl,” I said to myself as I drove off. “An exceedingly handsome girl.” Caroline was at home. She had had a visit from Poirot and was very pleased and important about it. “I am helping him with the case,” she explained. I felt rather uneasy. Caroline is bad enough as it is. What will she be like with her detective instincts encouraged? “Are you going round the neighborhood looking for Ralph Paton’s mysterious girl?” I inquired. “I might do that on my own account,” said Caroline. “No, this is a special thing M. Poirot wants me to find out for him.” “What is it?” I asked. “He wants to know whether Ralph Paton’s boots were black or brown,” said Caroline with tremendous solemnity. I stared at her. I see now that I was unbelievably stupid about these boots. I failed altogether to grasp the point. “They were brown shoes,” I said. “I saw them.” “Not shoes, James, boots. M. Poirot wants to know whether a pair of boots Ralph had with him at the hotel were brown or black. A lot hangs on it.” Call me dense if you like. I didn’t see. “And how are you going to find out?” I asked. Caroline said there would be no difficulty about that. Our Annie’s dearest friend was Miss Ganett’s maid, Clara. And Clara was walking out with the Boots at the Three Boars. 
The whole thing was simplicity itself, and by the aid of Miss Ganett, who coöperated loyally, at once giving Clara leave of absence, the matter was rushed through at express speed. It was when we were sitting down to lunch that Caroline remarked, with would-be unconcern:— “About those boots of Ralph Paton’s.” “Well,” I said, “what about them?” “M. Poirot thought they were probably brown. He was wrong. They’re black.” And Caroline nodded her head several times. She evidently felt that she had scored a point over Poirot. I did not answer. I was puzzling over what the color of a pair of Ralph Paton’s boots had to do with the case.

GEOFFREY RAYMOND

I was to have a further proof that day of the success of Poirot’s tactics. That challenge of his had been a subtle touch born of his knowledge of human nature. A mixture of fear and guilt had wrung the truth from Mrs. Ackroyd. She was the first to react. That afternoon when I returned from seeing my patients, Caroline told me that Geoffrey Raymond had just left. “Did he want to see me?” I asked, as I hung up my coat in the hall. Caroline was hovering by my elbow. “It was M. Poirot he wanted to see,” she said. “He’d just come from The Larches. M. Poirot was out. Mr. Raymond thought that he might be here, or that you might know where he was.” “I haven’t the least idea.” “I tried to make him wait,” said Caroline, “but he said he would call back at The Larches in half an hour, and went away down the village. A great pity, because M. Poirot came in practically the minute after he left.” “Came in here?” “No, to his own house.” “How do you know?” “The side window,” said Caroline briefly. It seemed to me that we had now exhausted the topic. Caroline thought otherwise. “Aren’t you going across?” “Across where?” “To The Larches, of course.” “My dear Caroline,” I said, “what for?” “Mr. Raymond wanted to see him very particularly,” said Caroline. “You might hear what it’s all about.” I raised my eyebrows. 
“Curiosity is not my besetting sin,” I remarked coldly. “I can exist comfortably without knowing exactly what my neighbors are doing and thinking.” “Stuff and nonsense, James,” said my sister. “You want to know just as much as I do. You’re not so honest, that’s all. You always have to pretend.” “Really, Caroline,” I said, and retired into my surgery. Ten minutes later Caroline tapped at the door and entered. In her hand she held what seemed to be a pot of jam. “I wonder, James,” she said, “if you would mind taking this pot of medlar jelly across to M. Poirot? I promised it to him. He has never tasted any home-made medlar jelly.” “Why can’t Annie go?” I asked coldly. “She’s doing some mending. I can’t spare her.” Caroline and I looked at each other. “Very well,” I said, rising. “But if I take the beastly thing, I shall just leave it at the door. You understand that?” My sister raised her eyebrows. “Naturally,” she said. “Who suggested you should do anything else?” The honors were with Caroline. “If you _do_ happen to see M. Poirot,” she said, as I opened the front door, “you might tell him about the boots.” It was a most subtle parting shot. I wanted dreadfully to understand the enigma of the boots. When the old lady with the Breton cap opened the door to me, I found myself asking if M. Poirot was in, quite automatically. Poirot sprang up to meet me, with every appearance of pleasure. “Sit down, my good friend,” he said. “The big chair? This small one? The room is not too hot, no?” I thought it was stifling, but refrained from saying so. The windows were closed, and a large fire burned in the grate. “The English people, they have a mania for the fresh air,” declared Poirot. “The big air, it is all very well outside, where it belongs. Why admit it to the house? But let us not discuss such banalities. You have something for me, yes?” “Two things,” I said. “First—this—from my sister.” I handed over the pot of medlar jelly. “How kind of Mademoiselle Caroline. 
She has remembered her promise. And the second thing?” “Information—of a kind.” And I told him of my interview with Mrs. Ackroyd. He listened with interest, but not much excitement. “It clears the ground,” he said thoughtfully. “And it has a certain value as confirming the evidence of the housekeeper. She said, you remember, that she found the silver table lid open and closed it down in passing.” “What about her statement that she went into the drawing-room to see if the flowers were fresh?” “Ah! we never took that very seriously, did we, my friend? It was patently an excuse, trumped up in a hurry, by a woman who felt it urgent to explain her presence—which, by the way, you would probably never have thought of questioning. I considered it possible that her agitation might arise from the fact that she had been tampering with the silver table, but I think now that we must look for another cause.” “Yes,” I said. “Whom did she go out to meet? And why?” “You think she went to meet some one?” “I do.” Poirot nodded. “So do I,” he said thoughtfully. There was a pause. “By the way,” I said, “I’ve got a message for you from my sister. Ralph Paton’s boots were black, not brown.” I was watching him closely as I gave the message, and I fancied that I saw a momentary flicker of discomposure. If so, it passed almost immediately. “She is absolutely positive they are not brown?” “Absolutely.” “Ah!” said Poirot regretfully. “That is a pity.” And he seemed quite crestfallen. He entered into no explanations, but at once started a new subject of conversation. “The housekeeper, Miss Russell, who came to consult you on that Friday morning—is it indiscreet to ask what passed at the interview—apart from the medical details, I mean?” “Not at all,” I said. “When the professional part of the conversation was over, we talked for a few minutes about poisons, and the ease or difficulty of detecting them, and about drug-taking and drug-takers.” “With special reference to cocaine?” asked Poirot. 
“How did you know?” I asked, somewhat surprised. For answer, the little man rose and crossed the room to where newspapers were filed. He brought me a copy of the _Daily Budget_, dated Friday, 16th September, and showed me an article dealing with the smuggling of cocaine. It was a somewhat lurid article, written with an eye to picturesque effect. “That is what put cocaine into her head, my friend,” he said. I would have catechized him further, for I did not quite understand his meaning, but at that moment the door opened and Geoffrey Raymond was announced. He came in fresh and debonair as ever, and greeted us both. “How are you, doctor? M. Poirot, this is the second time I’ve been here this morning. I was anxious to catch you.” “Perhaps I’d better be off,” I suggested rather awkwardly. “Not on my account, doctor. No, it’s just this,” he went on, seating himself at a wave of invitation from Poirot, “I’ve got a confession to make.” “_En vérité_?” said Poirot, with an air of polite interest. “Oh, it’s of no consequence, really. But, as a matter of fact, my conscience has been pricking me ever since yesterday afternoon. You accused us all of keeping back something, M. Poirot. I plead guilty. I’ve had something up my sleeve.” “And what is that, M. Raymond?” “As I say, it’s nothing of consequence—just this. I was in debt—badly, and that legacy came in the nick of time. Five hundred pounds puts me on my feet again with a little to spare.” He smiled at us both with that engaging frankness that made him such a likable youngster. “You know how it is. Suspicious looking policeman—don’t like to admit you were hard up for money—think it will look bad to them. But I was a fool, really, because Blunt and I were in the billiard room from a quarter to ten onwards, so I’ve got a watertight alibi and nothing to fear. 
Still, when you thundered out that stuff about concealing things, I felt a nasty prick of conscience, and I thought I’d like to get it off my mind.” He got up again and stood smiling at us. “You are a very wise young man,” said Poirot, nodding at him with approval. “See you, when I know that any one is hiding things from me, I suspect that the thing hidden may be something very bad indeed. You have done well.” “I’m glad I’m cleared from suspicion,” laughed Raymond. “I’ll be off now.” “So that is that,” I remarked, as the door closed behind the young secretary. “Yes,” agreed Poirot. “A mere bagatelle—but if he had not been in the billiard room—who knows? After all, many crimes have been committed for the sake of less than five hundred pounds. It all depends on what sum is sufficient to break a man. A question of the relativity, is it not so? Have you reflected, my friend, that many people in that house stood to benefit by Mr. Ackroyd’s death? Mrs. Ackroyd, Miss Flora, young Mr. Raymond, the housekeeper, Miss Russell. Only one, in fact, does not, Major Blunt.” His tone in uttering that name was so peculiar that I looked up, puzzled. “I don’t quite understand you?” I said. “Two of the people I accused have given me the truth.” “You think Major Blunt has something to conceal also?” “As for that,” remarked Poirot nonchalantly, “there is a saying, is there not, that Englishmen conceal only one thing—their love? And Major Blunt, I should say, is not good at concealments.” “Sometimes,” I said, “I wonder if we haven’t rather jumped to conclusions on one point.” “What is that?” “We’ve assumed that the blackmailer of Mrs. Ferrars is necessarily the murderer of Mr. Ackroyd. Mightn’t we be mistaken?” Poirot nodded energetically. “Very good. Very good indeed. I wondered if that idea would come to you. Of course it is possible. But we must remember one point. The letter disappeared. Still, that, as you say, may not necessarily mean that the murderer took it. 
When you first found the body, Parker may have abstracted the letter unnoticed by you.” “Parker?” “Yes, Parker. I always come back to Parker—not as the murderer—no, he did not commit the murder; but who is more suitable than he as the mysterious scoundrel who terrorized Mrs. Ferrars? He may have got his information about Mr. Ferrars’s death from one of the King’s Paddock servants. At any rate, he is more likely to have come upon it than a casual guest such as Blunt, for instance.” “Parker might have taken the letter,” I admitted. “It wasn’t till later that I noticed it was gone.” “How much later? After Blunt and Raymond were in the room, or before?” “I can’t remember,” I said slowly. “I think it was before—no, afterwards. Yes, I’m almost sure it was afterwards.” “That widens the field to three,” said Poirot thoughtfully. “But Parker is the most likely. It is in my mind to try a little experiment with Parker. How say you, my friend, will you accompany me to Fernly?” I acquiesced, and we set out at once. Poirot asked to see Miss Ackroyd, and presently Flora came to us. “Mademoiselle Flora,” said Poirot, “I have to confide in you a little secret. I am not yet satisfied of the innocence of Parker. I propose to make a little experiment with your assistance. I want to reconstruct some of his actions on that night. But we must think of something to tell him—ah! I have it. I wish to satisfy myself as to whether voices in the little lobby could have been heard outside on the terrace. Now, ring for Parker, if you will be so good.” I did so, and presently the butler appeared, suave as ever. “You rang, sir?” “Yes, my good Parker. I have in mind a little experiment. I have placed Major Blunt on the terrace outside the study window. I want to see if any one there could have heard the voices of Miss Ackroyd and yourself in the lobby that night. I want to enact that little scene over again. 
Perhaps you would fetch the tray or whatever it was you were carrying?” Parker vanished, and we repaired to the lobby outside the study door. Presently we heard a chink in the outer hall, and Parker appeared in the doorway carrying a tray with a siphon, a decanter of whisky, and two glasses on it. “One moment,” cried Poirot, raising his hand and seemingly very excited. “We must have everything in order. Just as it occurred. It is a little method of mine.” “A foreign custom, sir,” said Parker. “Reconstruction of the crime they call it, do they not?” He was quite imperturbable as he stood there politely waiting on Poirot’s orders. “Ah! he knows something, the good Parker,” cried Poirot. “He has read of these things. Now, I beg you, let us have everything of the most exact. You came from the outer hall—so. Mademoiselle was—where?” “Here,” said Flora, taking up her stand just outside the study door. “Quite right, sir,” said Parker. “I had just closed the door,” continued Flora. “Yes, miss,” agreed Parker. “Your hand was still on the handle as it is now.” “Then _allez_,” said Poirot. “Play me the little comedy.” Flora stood with her hand on the door handle, and Parker came stepping through the door from the hall, bearing the tray. He stopped just inside the door. Flora spoke. “Oh! Parker. Mr. Ackroyd doesn’t want to be disturbed again to-night.” “Is that right?” she added in an undertone. “To the best of my recollection, Miss Flora,” said Parker, “but I fancy you used the word evening instead of night.” Then, raising his voice in a somewhat theatrical fashion: “Very good, miss. Shall I lock up as usual?” “Yes, please.” Parker retired through the door, Flora followed him, and started to ascend the main staircase. “Is that enough?” she asked over her shoulder. “Admirable,” declared the little man, rubbing his hands. “By the way, Parker, are you sure there were two glasses on the tray that evening? 
Who was the second one for?” “I always bring two glasses, sir,” said Parker. “Is there anything further?” “Nothing. I thank you.” Parker withdrew, dignified to the last. Poirot stood in the middle of the hall frowning. Flora came down and joined us. “Has your experiment been successful?” she asked. “I don’t quite understand, you know——” Poirot smiled admiringly at her. “It is not necessary that you should,” he said. “But tell me, were there indeed two glasses on Parker’s tray that night?” Flora wrinkled her brows a minute. “I really can’t remember,” she said. “I think there were. Is—is that the object of your experiment?” Poirot took her hand and patted it. “Put it this way,” he said. “I am always interested to see if people will speak the truth.” “And did Parker speak the truth?” “I rather think he did,” said Poirot thoughtfully. A few minutes later saw us retracing our steps to the village. “What was the point of that question about the glasses?” I asked curiously. Poirot shrugged his shoulders. “One must say something,” he remarked. “That particular question did as well as any other.” I stared at him. “At any rate, my friend,” he said more seriously, “I know now something I wanted to know. Let us leave it at that.”

AN EVENING AT MAH JONG

That night we had a little Mah Jong party. This kind of simple entertainment is very popular in King’s Abbot. The guests arrive in goloshes and waterproofs after dinner. They partake of coffee and later of cake, sandwiches, and tea. On this particular night our guests were Miss Ganett and Colonel Carter, who lives near the church. A good deal of gossip is handed round at these evenings, sometimes seriously interfering with the game in progress. We used to play bridge—chatty bridge of the worst description. We find Mah Jong much more peaceful. 
The irritated demand as to why on earth your partner did not lead a certain card is entirely done away with, and though we still express criticisms frankly, there is not the same acrimonious spirit. “Very cold evening, eh, Sheppard?” said Colonel Carter, standing with his back to the fire. Caroline had taken Miss Ganett to her own room, and was there assisting her to disentangle herself from her many wraps. “Reminds me of the Afghan passes.” “Indeed?” I said politely. “Very mysterious business this about poor Ackroyd,” continued the colonel, accepting a cup of coffee. “A deuce of a lot behind it—that’s what I say. Between you and me, Sheppard, I’ve heard the word blackmail mentioned!” The colonel gave me the look which might be tabulated “one man of the world to another.” “A woman in it, no doubt,” he said. “Depend upon it, a woman in it.” Caroline and Miss Ganett joined us at this minute. Miss Ganett drank coffee whilst Caroline got out the Mah Jong box and poured out the tiles upon the table. “Washing the tiles,” said the colonel facetiously. “That’s right—washing the tiles, as we used to say in the Shanghai Club.” It is the private opinion of both Caroline and myself that Colonel Carter has never been in the Shanghai Club in his life. More, that he has never been farther east than India, where he juggled with tins of bully beef and plum and apple jam during the Great War. But the colonel is determinedly military, and in King’s Abbot we permit people to indulge their little idiosyncrasies freely. “Shall we begin?” said Caroline. We sat round the table. For some five minutes there was complete silence, owing to the fact that there is tremendous secret competition amongst us as to who can build their wall quickest. “Go on, James,” said Caroline at last. “You’re East Wind.” I discarded a tile. 
A round or two proceeded, broken by the monotonous remarks of “Three Bamboos,” “Two Circles,” “Pung,” and frequently from Miss Ganett “Unpung,” owing to that lady’s habit of too hastily claiming tiles to which she had no right. “I saw Flora Ackroyd this morning,” said Miss Ganett. “Pung—no—Unpung. I made a mistake.” “Four Circles,” said Caroline. “Where did you see her?” “She didn’t see _me_,” said Miss Ganett, with that tremendous significance only to be met with in small villages. “Ah!” said Caroline interestedly. “Chow.” “I believe,” said Miss Ganett, temporarily diverted, “that it’s the right thing nowadays to say ‘Chee’ not ‘Chow.’” “Nonsense,” said Caroline. “I have always said ‘_Chow_.’” “In the Shanghai Club,” said Colonel Carter, “they say ‘_Chow_.’” Miss Ganett retired, crushed. “What were you saying about Flora Ackroyd?” asked Caroline, after a moment or two devoted to the game. “Was she with any one?” “Very much so,” said Miss Ganett. The eyes of the two ladies met, and seemed to exchange information. “Really,” said Caroline interestedly. “Is that it? Well, it doesn’t surprise me in the least.” “We’re waiting for you to discard, Miss Caroline,” said the colonel. He sometimes affects the pose of the bluff male, intent on the game and indifferent to gossip. But nobody is deceived. “If you ask me,” said Miss Ganett. (“Was that a Bamboo you discarded, dear? Oh! no, I see now—it was a Circle.”) “As I was saying, if you ask me, Flora’s been exceedingly lucky. Exceedingly lucky she’s been.” “How’s that, Miss Ganett?” asked the colonel. “I’ll Pung that Green Dragon. How do you make out that Miss Flora’s been lucky? Very charming girl and all that, I know.” “I mayn’t know very much about crime,” said Miss Ganett, with the air of one who knows everything there is to know, “but I can tell you one thing. The first question that’s always asked is ‘Who last saw the deceased alive?’ And the person who did is regarded with suspicion. 
Now, Flora Ackroyd last saw her uncle alive. It might have looked very nasty for her—very nasty indeed. It’s my opinion—and I give it for what it’s worth, that Ralph Paton is staying away on her account, to draw suspicion away from her.” “Come, now,” I protested mildly, “you surely can’t suggest that a young girl like Flora Ackroyd is capable of stabbing her uncle in cold blood?” “Well, I don’t know,” said Miss Ganett. “I’ve just been reading a book from the library about the underworld of Paris, and it says that some of the worst women criminals are young girls with the faces of angels.” “That’s in France,” said Caroline instantly. “Just so,” said the colonel. “Now, I’ll tell you a very curious thing—a story that was going round the Bazaars in India....” The colonel’s story was one of interminable length, and of curiously little interest. A thing that happened in India many years ago cannot compare for a moment with an event that took place in King’s Abbot the day before yesterday. It was Caroline who brought the colonel’s story to a close by fortunately going Mah Jong. After the slight unpleasantness always occasioned by my corrections of Caroline’s somewhat faulty arithmetic, we started a new hand. “East Wind passes,” said Caroline. “I’ve got an idea of my own about Ralph Paton. Three Characters. But I’m keeping it to myself for the present.” “Are you, dear?” said Miss Ganett. “Chow—I mean Pung.” “Yes,” said Caroline firmly. “Was it all right about the boots?” asked Miss Ganett. “Their being black, I mean?” “Quite all right,” said Caroline. “What was the point, do you think?” asked Miss Ganett. Caroline pursed up her lips, and shook her head with an air of knowing all about it. “Pung,” said Miss Ganett. “No—Unpung. I suppose that now the doctor’s in with M. Poirot he knows all the secrets?” “Far from it,” I said. “James is so modest,” said Caroline. “Ah! a concealed Kong.” The colonel gave vent to a whistle. For the moment gossip was forgotten. 
“Your own wind, too,” he said. “_And_ you’ve got two Pungs of Dragons. We must be careful. Miss Caroline’s out for a big hand.” We played for some minutes with no irrelevant conversation. “This M. Poirot now,” said Colonel Carter, “is he really such a great detective?” “The greatest the world has ever known,” said Caroline solemnly. “He had to come here incognito to avoid publicity.” “Chow,” said Miss Ganett. “Quite wonderful for our little village, I’m sure. By the way, Clara—my maid, you know—is great friends with Elsie, the housemaid at Fernly, and what do you think Elsie told her? That there’s been a lot of money stolen, and it’s her opinion—Elsie’s—I mean, that the parlormaid had something to do with it. She’s leaving at the month, and she’s crying a good deal at night. If you ask me, the girl is very likely in league with a _gang_. She’s always been a queer girl—she’s not friends with any of the girls round here. She goes off by herself on her days out—very unnatural, I call it, and most suspicious. I asked her once to come to our Girls’ Friendly Evenings, but she refused, and then I asked her a few questions about her home and her family—all that sort of thing, and I’m bound to say I considered her manner most impertinent. Outwardly very respectful—but she shut me up in the most barefaced way.” Miss Ganett stopped for breath, and the colonel, who was totally uninterested in the servant question, remarked that in the Shanghai Club brisk play was the invariable rule. We had a round of brisk play. “That Miss Russell,” said Caroline. “She came here pretending to consult James on Friday morning. It’s my opinion she wanted to see where the poisons were kept. Five Characters.” “Chow,” said Miss Ganett. “What an extraordinary idea! I wonder if you can be right.” “Talking of poisons,” said the colonel. “Eh—what? Haven’t I discarded? Oh! Eight Bamboos.” “Mah Jong!” said Miss Ganett. Caroline was very much annoyed. 
“One Red Dragon,” she said regretfully, “and I should have had a hand of three doubles.” “I’ve had two Red Dragons all the time,” I mentioned. “So exactly like you, James,” said Caroline reproachfully. “You’ve no conception of the spirit of the game.” I myself thought I had played rather cleverly. I should have had to pay Caroline an enormous amount if she had gone Mah Jong. Miss Ganett’s Mah Jong was of the poorest variety possible, as Caroline did not fail to point out to her. East Wind passed, and we started a new hand in silence. “What I was going to tell you just now was this,” said Caroline. “Yes?” said Miss Ganett encouragingly. “My idea about Ralph Paton, I mean.” “Yes, dear,” said Miss Ganett, still more encouragingly. “Chow!” “It’s a sign of weakness to Chow so early,” said Caroline severely. “You should go for a big hand.” “I know,” said Miss Ganett. “You were saying—about Ralph Paton, you know?” “Yes. Well, I’ve a pretty shrewd idea where he is.” We all stopped to stare at her. “This is very interesting, Miss Caroline,” said Colonel Carter. “All your own idea, eh?” “Well, not exactly. I’ll tell you about it. You know that big map of the county we have in the hall?” We all said Yes. “As M. Poirot was going out the other day, he stopped and looked at it, and he made some remark—I can’t remember exactly what it was. Something about Cranchester being the only big town anywhere near us—which is true, of course. But after he had gone—it came to me suddenly.” “What came to you?” “His meaning. Of course Ralph is in Cranchester.” It was at that moment that I knocked down the rack that held my pieces. My sister immediately reproved me for clumsiness, but half-heartedly. She was intent on her theory. “Cranchester, Miss Caroline?” said Colonel Carter. “Surely not Cranchester! It’s so near.” “That’s exactly it,” cried Caroline triumphantly. “It seems quite clear by now that he didn’t get away from here by train. He must simply have walked into Cranchester. 
And I believe he’s there still. No one would dream of his being so near at hand.” I pointed out several objections to the theory, but when once Caroline has got something firmly into her head, nothing dislodges it. “And you think M. Poirot has the same idea,” said Miss Ganett thoughtfully. “It’s a curious coincidence, but I was out for a walk this afternoon on the Cranchester road, and he passed me in a car coming from that direction.” We all looked at each other. “Why, dear me,” said Miss Ganett suddenly, “I’m Mah Jong all the time, and I never noticed it.” Caroline’s attention was distracted from her own inventive exercises. She pointed out to Miss Ganett that a hand consisting of mixed suits and too many Chows was hardly worth going Mah Jong on. Miss Ganett listened imperturbably and collected her counters. “Yes, dear, I know what you mean,” she said. “But it rather depends on what kind of a hand you have to start with, doesn’t it?” “You’ll never get the big hands if you don’t go for them,” urged Caroline. “Well, we must all play our own way, mustn’t we?” said Miss Ganett. She looked down at her counters. “After all, I’m up, so far.” Caroline, who was considerably down, said nothing. East Wind passed, and we set to once more. Annie brought in the tea things. Caroline and Miss Ganett were both slightly ruffled as is often the case during one of these festive evenings. “If you would only play a leetle quicker, dear,” said Caroline, as Miss Ganett hesitated over her discard. “The Chinese put down the tiles so quickly it sounds like little birds pattering.” For some few minutes we played like the Chinese. “You haven’t contributed much to the sum of information, Sheppard,” said Colonel Carter genially. “You’re a sly dog. Hand in glove with the great detective, and not a hint as to the way things are going.” “James is an extraordinary creature,” said Caroline. “He can _not_ bring himself to part with information.” She looked at me with some disfavor. 
“I assure you,” I said, “that I don’t know anything. Poirot keeps his own counsel.” “Wise man,” said the colonel with a chuckle. “He doesn’t give himself away. But they’re wonderful fellows, these foreign detectives. Up to all sorts of dodges, I believe.” “Pung,” said Miss Ganett, in a tone of quiet triumph. “And Mah Jong.” The situation became more strained. It was annoyance at Miss Ganett’s going Mah Jong for the third time running which prompted Caroline to say to me as we built a fresh wall:— “You are too tiresome, James. You sit there like a dead head, and say nothing at all!” “But, my dear,” I protested, “I have really nothing to say—that is, of the kind you mean.” “Nonsense,” said Caroline, as she sorted her hand. “You _must_ know something interesting.” I did not answer for a moment. I was overwhelmed and intoxicated. I had read of there being such a thing as the Perfect Winning—going Mah Jong on one’s original hand. I had never hoped to hold the hand myself. With suppressed triumph I laid my hand face upwards on the table. “As they say in the Shanghai Club,” I remarked, “Tin-ho—the Perfect Winning!” The colonel’s eyes nearly bulged out of his head. “Upon my soul,” he said. “What an extraordinary thing. I never saw that happen before!” It was then that I went on, goaded by Caroline’s gibes, and rendered reckless by my triumph. “And as to anything interesting,” I said. “What about a gold wedding ring with a date and ‘From R.’ inside.” I pass over the scene that followed. I was made to say exactly where this treasure was found. I was made to reveal the date. “March 13th,” said Caroline. “Just six months ago. Ah!” Out of the babel of excited suggestions and suppositions three theories were evolved:— 1. That of Colonel Carter: that Ralph was secretly married to Flora. The first or most simple solution. 2. That of Miss Ganett: that Roger Ackroyd had been secretly married to Mrs. Ferrars. 3. 
That of my sister: that Roger Ackroyd had married his housekeeper, Miss Russell. A fourth or super-theory was propounded by Caroline later as we went up to bed. “Mark my words,” she said suddenly, “I shouldn’t be at all surprised if Geoffrey Raymond and Flora weren’t married.” “Surely it would be ‘From G,’ not ‘From R’ then,” I suggested. “You never know. Some girls call men by their surnames. And you heard what Miss Ganett said this evening—about Flora’s carryings on.” Strictly speaking, I had not heard Miss Ganett say anything of the kind, but I respected Caroline’s knowledge of innuendoes. “How about Hector Blunt?” I hinted. “If it’s anybody——” “Nonsense,” said Caroline. “I dare say he admires her—may even be in love with her. But depend upon it a girl isn’t going to fall in love with a man old enough to be her father when there’s a good-looking young secretary about. She may encourage Major Blunt just as a blind. Girls are very artful. But there’s one thing I _do_ tell you, James Sheppard. Flora Ackroyd does not care a penny piece for Ralph Paton, and never has. You can take it from me.” I took it from her meekly.

PARKER

It occurred to me the next morning that under the exhilaration produced by Tin-ho, or the Perfect Winning, I might have been slightly indiscreet. True, Poirot had not asked me to keep the discovery of the ring to myself. On the other hand, he had said nothing about it whilst at Fernly, and as far as I knew, I was the only person aware that it had been found. I felt distinctly guilty. The fact was by now spreading through King’s Abbot like wildfire. I was expecting wholesale reproaches from Poirot any minute. The joint funeral of Mrs. Ferrars and Roger Ackroyd was fixed for eleven o’clock. It was a melancholy and impressive ceremony. All the party from Fernly were there. After it was over, Poirot, who had also been present, took me by the arm, and invited me to accompany him back to The Larches. 
He was looking very grave, and I feared that my indiscretion of the night before had got round to his ears. But it soon transpired that his thoughts were occupied by something of a totally different nature. “See you,” he said. “We must act. With your help I propose to examine a witness. We will question him, we will put such fear into him that the truth is bound to come out.” “What witness are you talking of?” I asked, very much surprised. “Parker!” said Poirot. “I asked him to be at my house this morning at twelve o’clock. He should await us there at this very minute.” “What do you think?” I ventured, glancing sideways at his face. “I know this—that I am not satisfied.” “You think that it was he who blackmailed Mrs. Ferrars?” “Either that, or——” “Well?” I said, after waiting a minute or two. “My friend, I will say this to you—I hope it was he.” The gravity of his manner, and something indefinable that tinged it, reduced me to silence. On arrival at The Larches, we were informed that Parker was already there awaiting our return. As we entered the room, the butler rose respectfully. “Good morning, Parker,” said Poirot pleasantly. “One instant, I pray of you.” He removed his overcoat and gloves. “Allow me, sir,” said Parker, and sprang forward to assist him. He deposited the articles neatly on a chair by the door. Poirot watched him with approval. “Thank you, my good Parker,” he said. “Take a seat, will you not? What I have to say may take some time.” Parker seated himself with an apologetic bend of the head. “Now what do you think I asked you to come here for this morning—eh?” Parker coughed. “I understood, sir, that you wished to ask me a few questions about my late master—private like.” “_Précisément_,” said Poirot, beaming. “Have you made many experiments in blackmail?” “Sir!” The butler sprang to his feet. “Do not excite yourself,” said Poirot placidly. “Do not play the farce of the honest, injured man. 
You know all there is to know about the blackmail, is it not so?” “Sir, I—I’ve never—never been——” “Insulted,” suggested Poirot, “in such a way before. Then why, my excellent Parker, were you so anxious to overhear the conversation in Mr. Ackroyd’s study the other evening, after you had caught the word blackmail?” “I wasn’t—I——” “Who was your last master?” rapped out Poirot suddenly. “My last master?” “Yes, the master you were with before you came to Mr. Ackroyd.” “A Major Ellerby, sir——” Poirot took the words out of his mouth. “Just so, Major Ellerby. Major Ellerby was addicted to drugs, was he not? You traveled about with him. When he was in Bermuda there was some trouble—a man was killed. Major Ellerby was partly responsible. It was hushed up. But you knew about it. How much did Major Ellerby pay you to keep your mouth shut?” Parker was staring at him open-mouthed. The man had gone to pieces, his cheeks shook flabbily. “You see, me, I have made inquiries,” said Poirot pleasantly. “It is as I say. You got a good sum then as blackmail, and Major Ellerby went on paying you until he died. Now I want to hear about your latest experiment.” Parker still stared. “It is useless to deny. Hercule Poirot _knows_. It is so, what I have said about Major Ellerby, is it not?” As though against his will, Parker nodded reluctantly once. His face was ashen pale. “But I never hurt a hair of Mr. Ackroyd’s head,” he moaned. “Honest to God, sir, I didn’t. I’ve been afraid of this coming all the time. And I tell you I didn’t—I didn’t kill him.” His voice rose almost to a scream. “I am inclined to believe you, my friend,” said Poirot. “You have not the nerve—the courage. But I must have the truth.” “I’ll tell you anything, sir, anything you want to know. It’s true that I tried to listen that night. A word or two I heard made me curious. And Mr. Ackroyd’s wanting not to be disturbed, and shutting himself up with the doctor the way he did. It’s God’s own truth what I told the police. 
I heard the word blackmail, sir, and well——” He paused. “You thought there might be something in it for you?” suggested Poirot smoothly. “Well—well, yes, I did, sir. I thought that if Mr. Ackroyd was being blackmailed, why shouldn’t I have a share of the pickings?” A very curious expression passed over Poirot’s face. He leaned forward. “Had you any reason to suppose before that night that Mr. Ackroyd was being blackmailed?” “No, indeed, sir. It was a great surprise to me. Such a regular gentleman in all his habits.” “How much did you overhear?” “Not very much, sir. There seemed what I might call a spite against me. Of course I had to attend to my duties in the pantry. And when I did creep along once or twice to the study it was no use. The first time Dr. Sheppard came out and almost caught me in the act, and another time Mr. Raymond passed me in the big hall and went that way, so I knew it was no use; and when I went with the tray, Miss Flora headed me off.” Poirot stared for a long time at the man, as if to test his sincerity. Parker returned his gaze earnestly. “I hope you believe me, sir. I’ve been afraid all along the police would rake up that old business with Major Ellerby and be suspicious of me in consequence.” “_Eh bien_,” said Poirot at last. “I am disposed to believe you. But there is one thing I must request of you—to show me your bank-book. You have a bank-book, I presume?” “Yes, sir, as a matter of fact, I have it with me now.” With no sign of confusion, he produced it from his pocket. Poirot took the slim, green-covered book and perused the entries. “Ah! I perceive you have purchased £500 of National Savings Certificates this year?” “Yes, sir. I have already over a thousand pounds saved—the result of my connection with—er—my late master, Major Ellerby. And I have had quite a little flutter on some horses this year—very successful. If you remember, sir, a rank outsider won the Jubilee. 
I was fortunate enough to back it—£20.” Poirot handed him back the book. “I will wish you good-morning. I believe that you have told me the truth. If you have not—so much the worse for you, my friend.” When Parker had departed, Poirot picked up his overcoat once more. “Going out again?” I asked. “Yes, we will pay a little visit to the good M. Hammond.” “You believe Parker’s story?” “It is credible enough on the face of it. It seems clear that—unless he is a very good actor indeed—he genuinely believes it was Ackroyd himself who was the victim of blackmail. If so, he knows nothing at all about the Mrs. Ferrars business.” “Then in that case—who——” “_Précisément!_ Who? But our visit to M. Hammond will accomplish one purpose. It will either clear Parker completely or else——” “Well?” “I fall into the bad habit of leaving my sentences unfinished this morning,” said Poirot apologetically. “You must bear with me.” “By the way,” I said, rather sheepishly, “I’ve got a confession to make. I’m afraid I have inadvertently let out something about that ring.” “What ring?” “The ring you found in the goldfish pond.” “Ah! yes,” said Poirot, smiling broadly. “I hope you’re not annoyed? It was very careless of me.” “But not at all, my good friend, not at all. I laid no commands upon you. You were at liberty to speak of it if you so wished. She was interested, your sister?” “She was indeed. It created a sensation. All sorts of theories are flying about.” “Ah! And yet it is so simple. The true explanation leapt to the eye, did it not?” “Did it?” I said dryly. Poirot laughed. “The wise man does not commit himself,” he observed. “Is not that so? But here we are at Mr. Hammond’s.” The lawyer was in his office, and we were ushered in without any delay. He rose and greeted us in his dry, precise manner. Poirot came at once to the point. “Monsieur, I desire from you certain information, that is, if you will be so good as to give it to me. You acted, I understand, for the late Mrs. 
Ferrars of King’s Paddock?” I noticed the swift gleam of surprise which showed in the lawyer’s eyes, before his professional reserve came down once more like a mask over his face. “Certainly. All her affairs passed through our hands.” “Very good. Now, before I ask you to tell me anything, I should like you to listen to the story Dr. Sheppard will relate to you. You have no objection, have you, my friend, to repeating the conversation you had with Mr. Ackroyd last Friday night?” “Not in the least,” I said, and straightway began the recital of that strange evening. Hammond listened with close attention. “That is all,” I said, when I had finished. “Blackmail,” said the lawyer thoughtfully. “You are surprised?” asked Poirot. The lawyer took off his pince-nez and polished them with his handkerchief. “No,” he replied, “I can hardly say that I am surprised. I have suspected something of the kind for some time.” “That brings us,” said Poirot, “to the information for which I am asking. If any one can give us an idea of the actual sums paid, you are the man, monsieur.” “I see no object in withholding the information,” said Hammond, after a moment or two. “During the past year, Mrs. Ferrars has sold out certain securities, and the money for them was paid into her account and not reinvested. As her income was a large one, and she lived very quietly after her husband’s death, it seems certain that these sums of money were paid away for some special purpose. I once sounded her on the subject, and she said that she was obliged to support several of her husband’s poor relations. I let the matter drop, of course. Until now, I have always imagined that the money was paid to some woman who had had a claim on Ashley Ferrars. I never dreamed that Mrs. Ferrars herself was involved.” “And the amount?” asked Poirot. “In all, I should say the various sums totaled at least twenty thousand pounds.” “Twenty thousand pounds!” I exclaimed. “In one year!” “Mrs. 
Ferrars was a very wealthy woman,” said Poirot dryly. “And the penalty for murder is not a pleasant one.” “Is there anything else that I can tell you?” inquired Mr. Hammond. “I thank you, no,” said Poirot, rising. “All my excuses for having deranged you.” “Not at all, not at all.” “The word derange,” I remarked, when we were outside again, “is applicable to mental disorder only.” “Ah!” cried Poirot, “never will my English be quite perfect. A curious language. I should then have said disarranged, _n’est-ce pas_?” “Disturbed is the word you had in mind.” “I thank you, my friend. The word exact, you are zealous for it. _Eh bien_, what about our friend Parker now? With twenty thousand pounds in hand, would he have continued being a butler? _Je ne pense pas._ It is, of course, possible that he banked the money under another name, but I am disposed to believe he spoke the truth to us. If he is a scoundrel, he is a scoundrel on a mean scale. He has not the big ideas. That leaves us as a possibility, Raymond, or—well—Major Blunt.” “Surely not Raymond,” I objected. “Since we know that he was desperately hard up for a matter of five hundred pounds.” “That is what he says, yes.” “And as to Hector Blunt——” “I will tell you something as to the good Major Blunt,” interrupted Poirot. “It is my business to make inquiries. I make them. _Eh bien_—that legacy of which he speaks, I have discovered that the amount of it was close upon twenty thousand pounds. What do you think of that?” I was so taken aback that I could hardly speak. “It’s impossible,” I said at last. “A well-known man like Hector Blunt.” Poirot shrugged his shoulders. “Who knows? At least he is a man with big ideas. I confess that I hardly see him as a blackmailer, but there is another possibility that you have not even considered.” “What is that?” “The fire, my friend. Ackroyd himself may have destroyed that letter, blue envelope and all, after you left him.” “I hardly think that likely,” I said slowly. 
“And yet—of course, it may be so. He might have changed his mind.” We had just arrived at my house, and on the spur of the moment I invited Poirot to come in and take pot luck. I thought Caroline would be pleased with me, but it is hard to satisfy one’s women folk. It appears that we were eating chops for lunch—the kitchen staff being regaled on tripe and onions. And two chops set before three people are productive of embarrassment. But Caroline is seldom daunted for long. With magnificent mendacity, she explained to Poirot that although James laughed at her for doing so, she adhered strictly to a vegetarian diet. She descanted ecstatically on the delights of nut cutlets (which I am quite sure she has never tasted) and ate a Welsh rarebit with gusto and frequent cutting remarks as to the dangers of “flesh” foods. Afterwards, when we were sitting in front of the fire and smoking, Caroline attacked Poirot directly. “Not found Ralph Paton yet?” she asked. “Where should I find him, mademoiselle?” “I thought, perhaps, you’d found him in Cranchester,” said Caroline, with intense meaning in her tone. Poirot looked merely bewildered. “In Cranchester? But why in Cranchester?” I enlightened him with a touch of malice. “One of our ample staff of private detectives happened to see you in a car on the Cranchester road yesterday,” I explained. Poirot’s bewilderment vanished. He laughed heartily. “Ah, that! A simple visit to the dentist, _c’est tout_. My tooth, it aches. I go there. My tooth, it is at once better. I think to return quickly. The dentist, he says No. Better to have it out. I argue. He insists. He has his way! That particular tooth, it will never ache again.” Caroline collapsed rather like a pricked balloon. We fell to discussing Ralph Paton. “A weak nature,” I insisted. “But not a vicious one.” “Ah!” said Poirot. “But weakness, where does it end?” “Exactly,” said Caroline. 
“Take James here—weak as water, if I weren’t about to look after him.” “My dear Caroline,” I said irritably, “can’t you talk without dragging in personalities?” “You _are_ weak, James,” said Caroline, quite unmoved. “I’m eight years older than you are—oh! I don’t mind M. Poirot knowing that——” “I should never have guessed it, mademoiselle,” said Poirot, with a gallant little bow. “Eight years older. But I’ve always considered it my duty to look after you. With a bad bringing up, Heaven knows what mischief you might have got into by now.” “I might have married a beautiful adventuress,” I murmured, gazing at the ceiling, and blowing smoke rings. “Adventuress!” said Caroline, with a snort. “If we’re talking of adventuresses——” She left the sentence unfinished. “Well?” I said, with some curiosity. “Nothing. But I can think of some one not a hundred miles away.” Then she turned to Poirot suddenly. “James sticks to it that you believe some one in the house committed the murder. All I can say is, you’re wrong.” “I should not like to be wrong,” said Poirot. “It is not—how do you say—my _métier_?” “I’ve got the facts pretty clearly,” continued Caroline, taking no notice of Poirot’s remark, “from James and others. As far as I can see, of the people in the house, only two _could_ have had the chance of doing it. Ralph Paton and Flora Ackroyd.” “My dear Caroline——” “Now, James, don’t interrupt me. I know what I’m talking about. Parker met her _outside_ the door, didn’t he? He didn’t hear her uncle saying good-night to her. She could have killed him then and there.” “Caroline.” “I’m not saying she _did_, James. I’m saying she _could_ have done. As a matter of fact, though Flora is like all these young girls nowadays, with no veneration for their betters and thinking they know best on every subject under the sun, I don’t for a minute believe she’d kill even a chicken. But there it is. Mr. Raymond and Major Blunt have alibis. Mrs. Ackroyd’s got an alibi. 
Even that Russell woman seems to have one—and a good job for her it is she has. Who is left? Only Ralph and Flora! And say what you will, I don’t believe Ralph Paton is a murderer. A boy we’ve known all our lives.” Poirot was silent for a minute, watching the curling smoke rise from his cigarette. When at last he spoke, it was in a gentle far-away voice that produced a curious impression. It was totally unlike his usual manner. “Let us take a man—a very ordinary man. A man with no idea of murder in his heart. There is in him somewhere a strain of weakness—deep down. It has so far never been called into play. Perhaps it never will be—and if so he will go to his grave honored and respected by every one. But let us suppose that something occurs. He is in difficulties—or perhaps not that even. He may stumble by accident on a secret—a secret involving life or death to some one. And his first impulse will be to speak out—to do his duty as an honest citizen. And then the strain of weakness tells. Here is a chance of money—a great amount of money. He wants money—he desires it—and it is so easy. He has to do nothing for it—just keep silence. That is the beginning. The desire for money grows. He must have more—and more! He is intoxicated by the gold mine which has opened at his feet. He becomes greedy. And in his greed he overreaches himself. One can press a man as far as one likes—but with a woman one must not press too far. For a woman has at heart a great desire to speak the truth. How many husbands who have deceived their wives go comfortably to their graves, carrying their secret with them! How many wives who have deceived their husbands wreck their lives by throwing the fact in those same husbands’ teeth! They have been pressed too far. In a reckless moment (which they will afterwards regret, _bien entendu_) they fling safety to the winds and turn at bay, proclaiming the truth with great momentary satisfaction to themselves. So it was, I think, in this case. 
The strain was too great. And so there came your proverb, the death of the goose that laid the golden eggs. But that is not the end. Exposure faced the man of whom we are speaking. And he is not the same man he was—say, a year ago. His moral fiber is blunted. He is desperate. He is fighting a losing battle, and he is prepared to take any means that come to his hand, for exposure means ruin to him. And so—the dagger strikes!” He was silent for a moment. It was as though he had laid a spell upon the room. I cannot try to describe the impression his words produced. There was something in the merciless analysis, and the ruthless power of vision which struck fear into both of us. “Afterwards,” he went on softly, “the danger removed, he will be himself again, normal, kindly. But if the need again arises, then once more he will strike.” Caroline roused herself at last. “You are speaking of Ralph Paton,” she said. “You may be right, you may not, but you have no business to condemn a man unheard.” The telephone bell rang sharply. I went out into the hall, and took off the receiver. “What?” I said. “Yes. Dr. Sheppard speaking.” I listened for a minute or two, then replied briefly. Replacing the receiver, I went back into the drawing-room. “Poirot,” I said, “they have detained a man at Liverpool. His name is Charles Kent, and he is believed to be the stranger who visited Fernly that night. They want me to go to Liverpool at once and identify him.”

CHARLES KENT

Half an hour later saw Poirot, myself, and Inspector Raglan in the train on the way to Liverpool. The inspector was clearly very excited. “We may get a line on the blackmailing part of the business, if on nothing else,” he declared jubilantly. “He’s a rough customer, this fellow, by what I heard over the phone. Takes dope, too. We ought to find it easy to get what we want out of him. If there was the shadow of a motive, nothing’s more likely than that he killed Mr. Ackroyd.
But in that case, why is young Paton keeping out of the way? The whole thing’s a muddle—that’s what it is. By the way, M. Poirot, you were quite right about those fingerprints. They were Mr. Ackroyd’s own. I had rather the same idea myself, but I dismissed it as hardly feasible.” I smiled to myself. Inspector Raglan was so very plainly saving his face. “As regards this man,” said Poirot, “he is not yet arrested, eh?” “No, detained under suspicion.” “And what account does he give of himself?” “Precious little,” said the inspector, with a grin. “He’s a wary bird, I gather. A lot of abuse, but very little more.” On arrival at Liverpool I was surprised to find that Poirot was welcomed with acclamation. Superintendent Hayes, who met us, had worked with Poirot over some case long ago, and had evidently an exaggerated opinion of his powers. “Now we’ve got M. Poirot here we shan’t be long,” he said cheerfully. “I thought you’d retired, moosior?” “So I had, my good Hayes, so I had. But how tedious is retirement! You cannot imagine to yourself the monotony with which day comes after day.” “Very likely. So you’ve come to have a look at our own particular find? Is this Dr. Sheppard? Think you’ll be able to identify him, sir?” “I’m not very sure,” I said doubtfully. “How did you get hold of him?” inquired Poirot. “Description was circulated, as you know. In the press and privately. Not much to go on, I admit. This fellow has an American accent all right, and he doesn’t deny that he was near King’s Abbot that night. Just asks what the hell it is to do with us, and that he’ll see us in —— before he answers any questions.” “Is it permitted that I, too, see him?” asked Poirot. The superintendent closed one eye knowingly. “Very glad to have you, sir. You’ve got permission to do anything you please. Inspector Japp of Scotland Yard was asking after you the other day. Said he’d heard you were connected unofficially with this case. 
Where’s Captain Paton hiding, sir, can you tell me that?” “I doubt if it would be wise at the present juncture,” said Poirot primly, and I bit my lips to prevent a smile. The little man really did it very well. After some further parley, we were taken to interview the prisoner. He was a young fellow, I should say not more than twenty-two or three. Tall, thin, with slightly shaking hands, and the evidences of considerable physical strength somewhat run to seed. His hair was dark, but his eyes were blue and shifty, seldom meeting a glance squarely. I had all along cherished the illusion that there was something familiar about the figure I had met that night, but if this were indeed he, I was completely mistaken. He did not remind me in the least of any one I knew. “Now then, Kent,” said the superintendent, “stand up. Here are some visitors come to see you. Recognize any of them?” Kent glared at us sullenly, but did not reply. I saw his glance waver over the three of us, and come back to rest on me. “Well, sir,” said the superintendent to me, “what do you say?” “The height’s the same,” I said, “and as far as general appearance goes it might well be the man in question. Beyond that, I couldn’t go.” “What the hell’s the meaning of all this?” asked Kent. “What have you got against me? Come on, out with it! What am I supposed to have done?” I nodded my head. “It’s the man,” I said. “I recognize the voice.” “Recognize my voice, do you? Where do you think you heard it before?” “On Friday evening last, outside the gates of Fernly Park. You asked me the way there.” “I did, did I?” “Do you admit it?” asked the inspector. “I don’t admit anything. Not till I know what you’ve got on me.” “Have you not read the papers in the last few days?” asked Poirot, speaking for the first time. The man’s eyes narrowed. “So that’s it, is it? I saw an old gent had been croaked at Fernly. Trying to make out I did the job, are you?” “You were there that night,” said Poirot quietly.
“How do you know, mister?” “By this.” Poirot took something from his pocket and held it out. It was the goose quill we had found in the summer-house. At the sight of it the man’s face changed. He half held out his hand. “Snow,” said Poirot thoughtfully. “No, my friend, it is empty. It lay where you dropped it in the summer-house that night.” Charles Kent looked at him uncertainly. “You seem to know a hell of a lot about everything, you little foreign cock duck. Perhaps you remember this: the papers say that the old gent was croaked between a quarter to ten and ten o’clock?” “That is so,” agreed Poirot. “Yes, but is it really so? That’s what I’m getting at.” “This gentleman will tell you,” said Poirot. He indicated Inspector Raglan. The latter hesitated, glanced at Superintendent Hayes, then at Poirot, and finally, as though receiving sanction, he said:— “That’s right. Between a quarter to ten and ten o’clock.” “Then you’ve nothing to keep me here for,” said Kent. “I was away from Fernly Park by twenty-five minutes past nine. You can ask at the Dog and Whistle. That’s a saloon about a mile out of Fernly on the road to Cranchester. I kicked up a bit of a row there, I remember. As near as nothing to quarter to ten, it was. How about that?” Inspector Raglan wrote down something in his notebook. “Well?” demanded Kent. “Inquiries will be made,” said the inspector. “If you’ve spoken the truth, you won’t have anything to complain about. What were you doing at Fernly Park anyway?” “Went there to meet some one.” “Who?” “That’s none of your business.” “You’d better keep a civil tongue in your head, my man,” the superintendent warned him. “To hell with a civil tongue. I went there on my own business, and that’s all there is to it. If I was clear away before the murder was done, that’s all that concerns the cops.” “Your name, it is Charles Kent,” said Poirot. “Where were you born?” The man stared at him, then he grinned. “I’m a full-blown Britisher all right,” he said. 
“Yes,” said Poirot meditatively, “I think you are. I fancy you were born in Kent.” The man stared. “Why’s that? Because of my name? What’s that to do with it? Is a man whose name is Kent bound to be born in that particular county?” “Under certain circumstances, I can imagine he might be,” said Poirot very deliberately. “Under certain circumstances, you comprehend.” There was so much meaning in his voice as to surprise the two police officers. As for Charles Kent, he flushed a brick red, and for a moment I thought he was going to spring at Poirot. He thought better of it, however, and turned away with a kind of laugh. Poirot nodded as though satisfied, and made his way out through the door. He was joined presently by the two officers. “We’ll verify that statement,” remarked Raglan. “I don’t think he’s lying, though. But he’s got to come clear with a statement as to what he was doing at Fernly. It looks to me as though we’d got our blackmailer all right. On the other hand, granted his story’s correct, he couldn’t have had anything to do with the actual murder. He’d got ten pounds on him when he was arrested—rather a large sum. I fancy that forty pounds went to him—the numbers of the notes didn’t correspond, but of course he’d have changed them first thing. Mr. Ackroyd must have given him the money, and he made off with it as fast as possible. What was that about Kent being his birthplace? What’s that got to do with it?” “Nothing whatever,” said Poirot mildly. “A little idea of mine, that was all. Me, I am famous for my little ideas.” “Are you really?” said Raglan, studying him with a puzzled expression. The superintendent went into a roar of laughter. “Many’s the time I’ve heard Inspector Japp say that. M. Poirot and his little ideas! Too fanciful for me, he’d say, but always something in them.” “You mock yourself at me,” said Poirot, smiling; “but never mind. 
The old ones they laugh last sometimes, when the young, clever ones do not laugh at all.” And nodding his head at them in a sage manner, he walked out into the street. He and I lunched together at an hotel. I know now that the whole thing lay clearly unravelled before him. He had got the last thread he needed to lead him to the truth. But at the time I had no suspicion of the fact. I overestimated his general self-confidence, and I took it for granted that the things which puzzled me must be equally puzzling to him. My chief puzzle was what the man Charles Kent could have been doing at Fernly. Again and again I put the question to myself and could get no satisfactory reply. At last I ventured a tentative query to Poirot. His reply was immediate. “_Mon ami_, I do not think; I know.” “Really?” I said incredulously. “Yes, indeed. I suppose now that to you it would not make sense if I said that he went to Fernly that night because he was born in Kent?” I stared at him. “It certainly doesn’t seem to make sense to me,” I said dryly. “Ah!” said Poirot pityingly. “Well, no matter. I have still my little idea.”

FLORA ACKROYD

As I was returning from my round the following morning, I was hailed by Inspector Raglan. I pulled up, and the inspector mounted on the step. “Good-morning, Dr. Sheppard,” he said. “Well, that alibi is all right enough.” “Charles Kent’s?” “Charles Kent’s. The barmaid at the Dog and Whistle, Sally Jones, she remembers him perfectly. Picked out his photograph from among five others. It was just a quarter to ten when he came into the bar, and the Dog and Whistle is well over a mile from Fernly Park. The girl mentions that he had a lot of money on him—she saw him take a handful of notes out of his pocket. Rather surprised her, it did, seeing the class of fellow he was, with a pair of boots clean dropping off him. That’s where that forty pounds went right enough.” “The man still refuses to give an account of his visit to Fernly?” “Obstinate as a mule he is.
I had a chat with Hayes at Liverpool over the wire this morning.” “Hercule Poirot says he knows the reason the man went there that night,” I observed. “Does he?” cried the inspector eagerly. “Yes,” I said maliciously. “He says he went there because he was born in Kent.” I felt a distinct pleasure in passing on my own discomfiture. Raglan stared at me for a moment or two uncomprehendingly. Then a grin overspread his weaselly countenance and he tapped his forehead significantly. “Bit gone here,” he said. “I’ve thought so for some time. Poor old chap, so that’s why he had to give up and come down here. In the family, very likely. He’s got a nephew who’s quite off his crumpet.” “Poirot has?” I said, very surprised. “Yes. Hasn’t he ever mentioned him to you? Quite docile, I believe, and all that, but mad as a hatter, poor lad.” “Who told you that?” Again a grin showed itself on Inspector Raglan’s face. “Your sister, Miss Sheppard, she told me all about it.” Really, Caroline is amazing. She never rests until she knows the last details of everybody’s family secrets. Unfortunately, I have never been able to instill into her the decency of keeping them to herself. “Jump in, inspector,” I said, opening the door of the car. “We’ll go up to The Larches together, and acquaint our Belgian friend with the latest news.” “Might as well, I suppose. After all, even if he is a bit balmy, it was a useful tip he gave me about those fingerprints. He’s got a bee in his bonnet about the man Kent, but who knows—there may be something useful behind it.” Poirot received us with his usual smiling courtesy. He listened to the information we had brought him, nodding his head now and then. “Seems quite O.K., doesn’t it?” said the inspector rather gloomily. “A chap can’t be murdering some one in one place when he’s drinking in the bar in another place a mile away.” “Are you going to release him?” “Don’t see what else we can do. We can’t very well hold him for obtaining money on false pretences. 
Can’t prove a ruddy thing.” The inspector tossed a match into the grate in a disgruntled fashion. Poirot retrieved it and put it neatly in a little receptacle designed for the purpose. His action was purely mechanical. I could see that his thoughts were on something very different. “If I were you,” he said at last, “I should not release the man Charles Kent yet.” “What do you mean?” Raglan stared at him. “What I say. I should not release him yet.” “You don’t think he can have had anything to do with the murder, do you?” “I think probably not—but one cannot be certain yet.” “But haven’t I just told you——” Poirot raised a hand protestingly. “_Mais oui, mais oui._ I heard. I am not deaf—nor stupid, thank the good God! But see you, you approach the matter from the wrong—the wrong—premises, is not that the word?” The inspector stared at him heavily. “I don’t see how you make that out. Look here, we know Mr. Ackroyd was alive at a quarter to ten. You admit that, don’t you?” Poirot looked at him for a moment, then shook his head with a quick smile. “I admit nothing that is not—_proved_!” “Well, we’ve got proof enough of that. We’ve got Miss Flora Ackroyd’s evidence.” “That she said good-night to her uncle? But me—I do not always believe what a young lady tells me—no, not even when she is charming and beautiful.” “But hang it all, man, Parker saw her coming out of the door.” “No.” Poirot’s voice rang out with sudden sharpness. “That is just what he did not see. I satisfied myself of that by a little experiment the other day—you remember, doctor? Parker saw her _outside_ the door, with her hand on the handle. He did not see her come out of the room.” “But—where else could she have been?” “Perhaps on the stairs.” “The stairs?” “That is my little idea—yes.” “But those stairs only lead to Mr. Ackroyd’s bedroom.” “Precisely.” And still the inspector stared. “You think she’d been up to her uncle’s bedroom? Well, why not? Why should she lie about it?” “Ah! 
that is just the question. It depends on what she was doing there, does it not?” “You mean—the money? Hang it all, you don’t suggest that it was Miss Ackroyd who took that forty pounds?” “I suggest nothing,” said Poirot. “But I will remind you of this. Life was not very easy for that mother and daughter. There were bills—there was constant trouble over small sums of money. Roger Ackroyd was a peculiar man over money matters. The girl might be at her wit’s end for a comparatively small sum. Figure to yourself then what happens. She has taken the money, she descends the little staircase. When she is half-way down she hears the chink of glass from the hall. She has not a doubt of what it is—Parker coming to the study. At all costs she must not be found on the stairs—Parker will not forget it, he will think it odd. If the money is missed, Parker is sure to remember having seen her come down those stairs. She has just time to rush down to the study door—with her hand on the handle to show that she has just come out, when Parker appears in the doorway. She says the first thing that comes into her head, a repetition of Roger Ackroyd’s orders earlier in the evening, and then goes upstairs to her own room.” “Yes, but later,” persisted the inspector, “she must have realized the vital importance of speaking the truth? Why, the whole case hinges on it!” “Afterwards,” said Poirot dryly, “it was a little difficult for Mademoiselle Flora. She is told simply that the police are here and that there has been a robbery. Naturally she jumps to the conclusion that the theft of the money has been discovered. Her one idea is to stick to her story. When she learns that her uncle is dead she is panic-stricken. Young women do not faint nowadays, monsieur, without considerable provocation. _Eh bien!_ there it is. She is bound to stick to her story, or else confess everything. 
And a young and pretty girl does not like to admit that she is a thief—especially before those whose esteem she is anxious to retain.” Raglan brought his fist down with a thump on the table. “I’ll not believe it,” he said. “It’s—it’s not credible. And you—you’ve known this all along?” “The possibility has been in my mind from the first,” admitted Poirot. “I was always convinced that Mademoiselle Flora was hiding something from us. To satisfy myself, I made the little experiment I told you of. Dr. Sheppard accompanied me.” “A test for Parker, you said it was,” I remarked bitterly. “_Mon ami_,” said Poirot apologetically, “as I told you at the time, one must say something.” The inspector rose. “There’s only one thing for it,” he declared. “We must tackle the young lady right away. You’ll come up to Fernly with me, M. Poirot?” “Certainly. Dr. Sheppard will drive us up in his car.” I acquiesced willingly. On inquiry for Miss Ackroyd, we were shown into the billiard room. Flora and Major Hector Blunt were sitting on the long window seat. “Good-morning, Miss Ackroyd,” said the inspector. “Can we have a word or two alone with you?” Blunt got up at once and moved to the door. “What is it?” asked Flora nervously. “Don’t go, Major Blunt. He can stay, can’t he?” she asked, turning to the inspector. “That’s as you like,” said the inspector dryly. “There’s a question or two it’s my duty to put to you, miss, but I’d prefer to do so privately, and I dare say you’d prefer it also.” Flora looked keenly at him. I saw her face grow whiter. Then she turned and spoke to Blunt. “I want you to stay—please—yes, I mean it. Whatever the inspector has to say to me, I’d rather you heard it.” Raglan shrugged his shoulders. “Well, if you will have it so, that’s all there is to it. Now, Miss Ackroyd, M. Poirot here has made a certain suggestion to me. He suggests that you weren’t in the study at all last Friday night, that you never saw Mr. 
Ackroyd to say good-night to him, that instead of being in the study you were on the stairs leading down from your uncle’s bedroom when you heard Parker coming across the hall.” Flora’s gaze shifted to Poirot. He nodded back at her. “Mademoiselle, the other day, when we sat round the table, I implored you to be frank with me. What one does not tell to Papa Poirot he finds out. It was that, was it not? See, I will make it easy for you. You took the money, did you not?” “The money,” said Blunt sharply. There was a silence which lasted for at least a minute. Then Flora drew herself up and spoke. “M. Poirot is right. I took that money. I stole. I am a thief—yes, a common, vulgar little thief. Now you know! I am glad it has come out. It’s been a nightmare, these last few days!” She sat down suddenly and buried her face in her hands. She spoke huskily through her fingers. “You don’t know what my life has been since I came here. Wanting things, scheming for them, lying, cheating, running up bills, promising to pay—oh! I hate myself when I think of it all! That’s what brought us together, Ralph and I. We were both weak! I understood him, and I was sorry—because I’m the same underneath. We’re not strong enough to stand alone, either of us. We’re weak, miserable, despicable things.” She looked at Blunt and suddenly stamped her foot. “Why do you look at me like that—as though you couldn’t believe? I may be a thief—but at any rate I’m real now. I’m not lying any more. I’m not pretending to be the kind of girl you like, young and innocent and simple. I don’t care if you never want to see me again. I hate myself, despise myself—but you’ve got to believe one thing, if speaking the truth would have made things better for Ralph, I would have spoken out. But I’ve seen all along that it wouldn’t be better for Ralph—it makes the case against him blacker than ever. I was not doing him any harm by sticking to my lie.” “Ralph,” said Blunt. 
“I see—always Ralph.” “You don’t understand,” said Flora hopelessly. “You never will.” She turned to the inspector. “I admit everything; I was at my wit’s end for money. I never saw my uncle that evening after he left the dinner-table. As to the money, you can take what steps you please. Nothing could be worse than it is now!” Suddenly she broke down again, hid her face in her hands, and rushed from the room. “Well,” said the inspector in a flat tone, “so that’s that.” He seemed rather at a loss what to do next. Blunt came forward. “Inspector Raglan,” he said quietly, “that money was given to me by Mr. Ackroyd for a special purpose. Miss Ackroyd never touched it. When she says she did, she is lying with the idea of shielding Captain Paton. The truth is as I said, and I am prepared to go into the witness box and swear to it.” He made a kind of jerky bow, then turning abruptly, he left the room. Poirot was after him in a flash. He caught the other up in the hall. “Monsieur—a moment, I beg of you, if you will be so good.” “Well, sir?” Blunt was obviously impatient. He stood frowning down on Poirot. “It is this,” said Poirot rapidly: “I am not deceived by your little fantasy. No, indeed. It was truly Miss Flora who took the money. All the same it is well imagined what you say—it pleases me. It is very good what you have done there. You are a man quick to think and to act.” “I’m not in the least anxious for your opinion, thank you,” said Blunt coldly. He made once more as though to pass on, but Poirot, not at all offended, laid a detaining hand on his arm. “Ah! but you are to listen to me. I have more to say. The other day I spoke of concealments. Very well, all along have I seen what you are concealing. Mademoiselle Flora, you love her with all your heart. From the first moment you saw her, is it not so? Oh! let us not mind saying these things—why must one in England think it necessary to mention love as though it were some disgraceful secret? 
You love Mademoiselle Flora. You seek to conceal that fact from all the world. That is very good—that is as it should be. But take the advice of Hercule Poirot—do not conceal it from mademoiselle herself.” Blunt had shown several signs of restlessness whilst Poirot was speaking, but the closing words seemed to rivet his attention. “What d’you mean by that?” he said sharply. “You think that she loves the Capitaine Ralph Paton—but I, Hercule Poirot, tell you that that is not so. Mademoiselle Flora accepted Captain Paton to please her uncle, and because she saw in the marriage a way of escape from her life here which was becoming frankly insupportable to her. She liked him, and there was much sympathy and understanding between them. But love—no! It is not Captain Paton Mademoiselle Flora loves.” “What the devil do you mean?” asked Blunt. I saw the dark flush under his tan. “You have been blind, monsieur. Blind! She is loyal, the little one. Ralph Paton is under a cloud, she is bound in honor to stick by him.” I felt it was time I put in a word to help on the good work. “My sister told me the other night,” I said encouragingly, “that Flora had never cared a penny piece for Ralph Paton, and never would. My sister is always right about these things.” Blunt ignored my well-meant efforts. He spoke to Poirot. “D’you really think——” he began, and stopped. He is one of those inarticulate men who find it hard to put things into words. Poirot knows no such disability. “If you doubt me, ask her yourself, monsieur. But perhaps you no longer care to—the affair of the money——” Blunt gave a sound like an angry laugh. “Think I’d hold that against her? Roger was always a queer chap about money. She got in a mess and didn’t dare tell him. Poor kid. Poor lonely kid.” Poirot looked thoughtfully at the side door. “Mademoiselle Flora went into the garden, I think,” he murmured. “I’ve been every kind of a fool,” said Blunt abruptly. “Rum conversation we’ve been having. 
Like one of those Danish plays. But you’re a sound fellow, M. Poirot. Thank you.” He took Poirot’s hand and gave it a grip which caused the other to wince in anguish. Then he strode to the side door and passed out into the garden. “Not every kind of a fool,” murmured Poirot, tenderly nursing the injured member. “Only one kind—the fool in love.”

MISS RUSSELL

Inspector Raglan had received a bad jolt. He was not deceived by Blunt’s valiant lie any more than we had been. Our way back to the village was punctuated by his complaints. “This alters everything, this does. I don’t know whether you’ve realized it, Monsieur Poirot?” “I think so, yes, I think so,” said Poirot. “You see, me, I have been familiar with the idea for some time.” Inspector Raglan, who had only had the idea presented to him a short half-hour ago, looked at Poirot unhappily, and went on with his discoveries. “Those alibis now. Worthless! Absolutely worthless. Got to start again. Find out what every one was doing from nine-thirty onwards. Nine-thirty—that’s the time we’ve got to hang on to. You were quite right about the man Kent—we don’t release _him_ yet awhile. Let me see now—nine-forty-five at the Dog and Whistle. He might have got there in a quarter of an hour if he ran. It’s just possible that it was _his_ voice Mr. Raymond heard talking to Mr. Ackroyd—asking for money which Mr. Ackroyd refused. But one thing’s clear—it wasn’t he who sent the telephone message. The station is half a mile in the other direction—over a mile and a half from the Dog and Whistle, and he was at the Dog and Whistle until about ten minutes past ten. Dang that telephone call! We always come up against it.” “We do indeed,” agreed Poirot. “It is curious.” “It’s just possible that if Captain Paton climbed into his uncle’s room and found him there murdered, _he_ may have sent it. Got the wind up, thought he’d be accused, and cleared out.
That’s possible, isn’t it?” “Why should he have telephoned?” “May have had doubts if the old man was really dead. Thought he’d get the doctor up there as soon as possible, but didn’t want to give himself away. Yes, I say now, how’s that for a theory? Something in that, I should say.” The inspector swelled his chest out importantly. He was so plainly delighted with himself that any words of ours would have been quite superfluous. We arrived back at my house at this minute, and I hurried in to my surgery patients, who had all been waiting a considerable time, leaving Poirot to walk to the police station with the inspector. Having dismissed the last patient, I strolled into the little room at the back of the house which I call my workshop—I am rather proud of the home-made wireless set I turned out. Caroline hates my workroom. I keep my tools there, and Annie is not allowed to wreak havoc with a dustpan and brush. I was just adjusting the interior of an alarm clock which had been denounced as wholly unreliable by the household, when the door opened and Caroline put her head in. “Oh! there you are, James,” she said, with deep disapproval. “M. Poirot wants to see you.” “Well,” I said, rather irritably, for her sudden entrance had startled me and I had let go of a piece of delicate mechanism, “if he wants to see me, he can come in here.” “In here?” said Caroline. “That’s what I said—in here.” Caroline gave a sniff of disapproval and retired. She returned in a moment or two, ushering in Poirot, and then retired again, shutting the door with a bang. “Aha! my friend,” said Poirot, coming forward and rubbing his hands. “You have not got rid of me so easily, you see!” “Finished with the inspector?” I asked. “For the moment, yes. And you, you have seen all the patients?” “Yes.” Poirot sat down and looked at me, tilting his egg-shaped head on one side, with the air of one who savors a very delicious joke. “You are in error,” he said at last. 
“You have still one patient to see.” “Not you?” I exclaimed in surprise. “Ah, not me, _bien entendu_. Me, I have the health magnificent. No, to tell you the truth, it is a little _complot_ of mine. There is some one I wish to see, you understand—and at the same time it is not necessary that the whole village should intrigue itself about the matter—which is what would happen if the lady were seen to come to my house—for it is a lady. But to you she has already come as a patient before.” “Miss Russell!” I exclaimed. “_Précisément._ I wish much to speak with her, so I send her the little note and make the appointment in your surgery. You are not annoyed with me?” “On the contrary,” I said. “That is, presuming I am allowed to be present at the interview?” “But naturally! In your own surgery!” “You know,” I said, throwing down the pincers I was holding, “it’s extraordinarily intriguing, the whole thing. Every new development that arises is like the shake you give to a kaleidoscope—the thing changes entirely in aspect. Now, why are you so anxious to see Miss Russell?” Poirot raised his eyebrows. “Surely it is obvious?” he murmured. “There you go again,” I grumbled. “According to you everything is obvious. But you leave me walking about in a fog.” Poirot shook his head genially at me. “You mock yourself at me. Take the matter of Mademoiselle Flora. The inspector was surprised—but you—you were not.” “I never dreamed of her being the thief,” I expostulated. “That—perhaps no. But I was watching your face and you were not—like Inspector Raglan—startled and incredulous.” I thought for a minute or two. “Perhaps you are right,” I said at last. “All along I’ve felt that Flora was keeping back something—so the truth, when it came, was subconsciously expected. It upset Inspector Raglan very much indeed, poor man.” “Ah! _pour ça, oui_! The poor man must rearrange all his ideas. 
I profited by his state of mental chaos to induce him to grant me a little favor.” “What was that?” Poirot took a sheet of notepaper from his pocket. Some words were written on it, and he read them aloud. “The police have, for some days, been seeking for Captain Ralph Paton, the nephew of Mr. Ackroyd of Fernly Park, whose death occurred under such tragic circumstances last Friday. Captain Paton has been found at Liverpool, where he was on the point of embarking for America.” He folded up the piece of paper again. “That, my friend, will be in the newspapers to-morrow morning.” I stared at him, dumbfounded. “But—but it isn’t true! He’s not at Liverpool!” Poirot beamed on me. “You have the intelligence so quick! No, he has not been found at Liverpool. Inspector Raglan was very loath to let me send this paragraph to the press, especially as I could not take him into my confidence. But I assured him most solemnly that very interesting results would follow its appearance in print, so he gave in, after stipulating that he was, on no account, to bear the responsibility.” I stared at Poirot. He smiled back at me. “It beats me,” I said at last, “what you expect to get out of that.” “You should employ your little gray cells,” said Poirot gravely. He rose and came across to the bench. “It is that you have really the love of the machinery,” he said, after inspecting the débris of my labors. Every man has his hobby. I immediately drew Poirot’s attention to my home-made wireless. Finding him sympathetic, I showed him one or two little inventions of my own—trifling things, but useful in the house. “Decidedly,” said Poirot, “you should be an inventor by trade, not a doctor. But I hear the bell—that is your patient. Let us go into the surgery.” Once before I had been struck by the remnants of beauty in the housekeeper’s face. This morning I was struck anew. 
Very simply dressed in black, tall, upright and independent as ever, with her big dark eyes and an unwonted flush of color in her usually pale cheeks, I realized that as a girl she must have been startlingly handsome. “Good-morning, mademoiselle,” said Poirot. “Will you be seated? Dr. Sheppard is so kind as to permit me the use of his surgery for a little conversation I am anxious to have with you.” Miss Russell sat down with her usual composure. If she felt any inward agitation, it did not display itself in any outward manifestation. “It seems a queer way of doing things, if you’ll allow me to say so,” she remarked. “Miss Russell—I have news to give you.” “Indeed!” “Charles Kent has been arrested at Liverpool.” Not a muscle of her face moved. She merely opened her eyes a trifle wider, and asked, with a tinge of defiance: “Well, what of it?” But at that moment it came to me—the resemblance that had haunted me all along, something familiar in the defiance of Charles Kent’s manner. The two voices, one rough and coarse, the other painfully ladylike—were strangely the same in timbre. It was of Miss Russell that I had been reminded that night outside the gates of Fernly Park. I looked at Poirot, full of my discovery, and he gave me an imperceptible nod. In answer to Miss Russell’s question, he threw out his hands in a thoroughly French gesture. “I thought you might be interested, that is all,” he said mildly. “Well, I’m not particularly,” said Miss Russell. “Who is this Charles Kent anyway?” “He is a man, mademoiselle, who was at Fernly on the night of the murder.” “Really?” “Fortunately for him, he has an alibi. At a quarter to ten he was at a public-house a mile from here.” “Lucky for him,” commented Miss Russell. “But we still do not know what he was doing at Fernly—who it was he went to meet, for instance.” “I’m afraid I can’t help you at all,” said the housekeeper politely. “Nothing came to _my_ ears. 
If that is all——” She made a tentative movement as though to rise. Poirot stopped her. “It is not quite all,” he said smoothly. “This morning fresh developments have arisen. It seems now that Mr. Ackroyd was murdered, not at a quarter to ten, but _before_. Between ten minutes to nine, when Dr. Sheppard left, and a quarter to ten.” I saw the color drain from the housekeeper’s face, leaving it dead white. She leaned forward, her figure swaying. “But Miss Ackroyd said—Miss Ackroyd said——” “Miss Ackroyd has admitted that she was lying. She was never in the study at all that evening.” “Then——?” “Then it would seem that in this Charles Kent we have the man we are looking for. He came to Fernly, can give no account of what he was doing there——” “I can tell you what he was doing there. He never touched a hair of old Ackroyd’s head—he never went near the study. He didn’t do it, I tell you.” She was leaning forward. That iron self-control was broken through at last. Terror and desperation were in her face. “M. Poirot! M. Poirot! Oh, do believe me.” Poirot got up and came to her. He patted her reassuringly on the shoulder. “But yes—but yes, I will believe. I had to make you speak, you know.” For an instant suspicion flared up in her. “Is what you said true?” “That Charles Kent is suspected of the crime? Yes, that is true. You alone can save him, by telling the reason for his being at Fernly.” “He came to see me.” She spoke in a low, hurried voice. “I went out to meet him——” “In the summer-house, yes, I know.” “How do you know?” “Mademoiselle, it is the business of Hercule Poirot to know things. I know that you went out earlier in the evening, that you left a message in the summer-house to say what time you would be there.” “Yes, I did. I had heard from him—saying he was coming. I dared not let him come to the house. I wrote to the address he gave me and said I would meet him in the summer-house, and described it to him so that he would be able to find it. 
Then I was afraid he might not wait there patiently, and I ran out and left a piece of paper to say I would be there about ten minutes past nine. I didn’t want the servants to see me, so I slipped out through the drawing-room window. As I came back, I met Dr. Sheppard, and I fancied that he would think it queer. I was out of breath, for I had been running. I had no idea that he was expected to dinner that night.” She paused. “Go on,” said Poirot. “You went out to meet him at ten minutes past nine. What did you say to each other?” “It’s difficult. You see——” “Mademoiselle,” said Poirot, interrupting her, “in this matter I must have the whole truth. What you tell us need never go beyond these four walls. Dr. Sheppard will be discreet, and so shall I. See, I will help you. This Charles Kent, he is your son, is he not?” She nodded. The color had flamed into her cheeks. “No one has ever known. It was long ago—long ago—down in Kent. I was not married....” “So you took the name of the county as a surname for him. I understand.” “I got work. I managed to pay for his board and lodging. I never told him that I was his mother. But he turned out badly, he drank, then took to drugs. I managed to pay his passage out to Canada. I didn’t hear of him for a year or two. Then, somehow or other, he found out that I was his mother. He wrote asking me for money. Finally, I heard from him back in this country again. He was coming to see me at Fernly, he said. I dared not let him come to the house. I have always been considered so—so very respectable. If any one got an inkling—it would have been all up with my post as housekeeper. So I wrote to him in the way I have just told you.” “And in the morning you came to see Dr. Sheppard?” “Yes. I wondered if something could be done. He was not a bad boy—before he took to drugs.” “I see,” said Poirot. “Now let us go on with the story. He came that night to the summer-house?” “Yes, he was waiting for me when I got there. 
He was very rough and abusive. I had brought with me all the money I had, and I gave it to him. We talked a little, and then he went away.” “What time was that?” “It must have been between twenty and twenty-five minutes past nine. It was not yet half-past when I got back to the house.” “Which way did he go?” “Straight out the same way he came, by the path that joined the drive just inside the lodge gates.” Poirot nodded. “And you, what did you do?” “I went back to the house. Major Blunt was walking up and down the terrace smoking, so I made a detour to get round to the side door. It was then just on half-past nine, as I tell you.” Poirot nodded again. He made a note or two in a microscopic pocket-book. “I think that is all,” he said thoughtfully. “Ought I——” she hesitated. “Ought I to tell all this to Inspector Raglan?” “It may come to that. But let us not be in a hurry. Let us proceed slowly, with due order and method. Charles Kent is not yet formally charged with murder. Circumstances may arise which will render your story unnecessary.” Miss Russell rose. “Thank you very much, M. Poirot,” she said. “You have been very kind—very kind indeed. You—you do believe me, don’t you? That Charles had nothing to do with this wicked murder!” “There seems no doubt that the man who was talking to Mr. Ackroyd in the library at nine-thirty could not possibly have been your son. Be of good courage, mademoiselle. All will yet be well.” Miss Russell departed. Poirot and I were left together. “So that’s that,” I said. “Every time we come back to Ralph Paton. How did you manage to spot Miss Russell as the person Charles Kent came to meet? Did you notice the resemblance?” “I had connected her with the unknown man long before we actually came face to face with him. As soon as we found that quill. The quill suggested dope, and I remembered your account of Miss Russell’s visit to you. Then I found the article on cocaine in that morning’s paper. It all seemed very clear. 
She had heard from some one that morning—some one addicted to drugs, she read the article in the paper, and she came to you to ask a few tentative questions. She mentioned cocaine, since the article in question was on cocaine. Then, when you seemed too interested, she switched hurriedly to the subject of detective stories and untraceable poisons. I suspected a son or a brother, or some other undesirable male relation. Ah! but I must go. It is the time of the lunch."

"Stay and lunch with us," I suggested.

Poirot shook his head. A faint twinkle came into his eye. "Not again to-day. I should not like to force Mademoiselle Caroline to adopt a vegetarian diet two days in succession."

It occurred to me that there was not much which escaped Hercule Poirot.

THE PARAGRAPH IN THE PAPER

Caroline, of course, had not failed to see Miss Russell come to the surgery door. I had anticipated this, and had ready an elaborate account of the lady's bad knee. But Caroline was not in a cross-questioning mood. Her point of view was that she knew what Miss Russell had really come for and that _I_ didn't.

"Pumping you, James," said Caroline. "Pumping you in the most shameless manner, I've not a doubt. It's no good interrupting. I dare say you hadn't the least idea she was doing it even. Men _are_ so simple. She knows that you are in M. Poirot's confidence, and she wants to find out things. Do you know what I think, James?"

"I couldn't begin to imagine. You think so many extraordinary things."

"It's no good being sarcastic. I think Miss Russell knows more about Mr. Ackroyd's death than she is prepared to admit."

Caroline leaned back triumphantly in her chair.

"Do you really think so?" I said absently.

"You are very dull to-day, James. No animation about you. It's that liver of yours."

Our conversation then dealt with purely personal matters. The paragraph inspired by Poirot duly appeared in our daily paper the next morning.
I was in the dark as to its purpose, but its effect on Caroline was immense. She began by stating, most untruly, that she had said as much all along. I raised my eyebrows, but did not argue. Caroline, however, must have felt a prick of conscience, for she went on:— “I mayn’t have actually mentioned Liverpool, but I knew he’d try to get away to America. That’s what Crippen did.” “Without much success,” I reminded her. “Poor boy, and so they’ve caught him. I consider, James, that it’s your duty to see that he isn’t hung.” “What do you expect me to do?” “Why, you’re a medical man, aren’t you? You’ve known him from a boy upwards. Not mentally responsible. That’s the line to take, clearly. I read only the other day that they’re very happy in Broadmoor—it’s quite like a high-class club.” But Caroline’s words had reminded me of something. “I never knew that Poirot had an imbecile nephew?” I said curiously. “Didn’t you? Oh, he told me all about it. Poor lad. It’s a great grief to all the family. They’ve kept him at home so far, but it’s getting to such a pitch that they’re afraid he’ll have to go into some kind of institution.” “I suppose you know pretty well everything there is to know about Poirot’s family by this time,” I said, exasperated. “Pretty well,” said Caroline complacently. “It’s a great relief to people to be able to tell all their troubles to some one.” “It might be,” I said, “if they were ever allowed to do so spontaneously. Whether they enjoy having confidences screwed out of them by force is another matter.” Caroline merely looked at me with the air of a Christian martyr enjoying martyrdom. “You are so self-contained, James,” she said. “You hate speaking out, or parting with any information yourself, and you think everybody else must be just like you. I should hope that I never screw confidences out of anybody. For instance, if M. 
Poirot comes in this afternoon, as he said he might do, I shall not dream of asking him who it was arrived at his house early this morning.” “Early this morning?” I queried. “Very early,” said Caroline. “Before the milk came. I just happened to be looking out of the window—the blind was flapping. It was a man. He came in a closed car, and he was all muffled up. I couldn’t get a glimpse of his face. But I will tell you _my_ idea, and you’ll see that I’m right.” “What’s your idea?” Caroline dropped her voice mysteriously. “A Home Office expert,” she breathed. “A Home Office expert,” I said, amazed. “My dear Caroline!” “Mark my words, James, you’ll see that I’m right. That Russell woman was here that morning after your poisons. Roger Ackroyd might easily have been poisoned in his food that night.” I laughed out loud. “Nonsense,” I cried. “He was stabbed in the neck. You know that as well as I do.” “After death, James,” said Caroline; “to make a false clew.” “My good woman,” I said, “I examined the body, and I know what I’m talking about. That wound wasn’t inflicted after death—it was the cause of death, and you need make no mistake about it.” Caroline merely continued to look omniscient, which so annoyed me that I went on:— “Perhaps you will tell me, Caroline, if I have a medical degree or if I have not?” “You have the medical degree, I dare say, James—at least, I mean I know you have. But you’ve no imagination whatever.” “Having endowed you with a treble portion, there was none left over for me,” I said dryly. I was amused to notice Caroline’s maneuvers that afternoon when Poirot duly arrived. My sister, without asking a direct question, skirted the subject of the mysterious guest in every way imaginable. By the twinkle in Poirot’s eyes, I saw that he realized her object. He remained blandly impervious, and blocked her bowling so successfully that she herself was at a loss how to proceed. 
Having, I suspect, quietly enjoyed the little game, he rose to his feet and suggested a walk. “It is that I need to reduce the figure a little,” he explained. “You will come with me, doctor? And perhaps later Miss Caroline will give us some tea.” “Delighted,” said Caroline. “Won’t your—er—guest come in also?” “You are too kind,” said Poirot. “But no, my friend reposes himself. Soon you must make his acquaintance.” “Quite an old friend of yours, so somebody told me,” said Caroline, making one last valiant effort. “Did they?” murmured Poirot. “Well, we must start.” Our tramp took us in the direction of Fernly. I had guessed beforehand that it might do so. I was beginning to understand Poirot’s methods. Every little irrelevancy had a bearing upon the whole. “I have a commission for you, my friend,” he said at last. “To-night, at my house, I desire to have a little conference. You will attend, will you not?” “Certainly,” I said. “Good. I need also all those in the house—that is to say: Mrs. Ackroyd, Mademoiselle Flora, Major Blunt, M. Raymond. I want you to be my ambassador. This little reunion is fixed for nine o’clock. You will ask them—yes?” “With pleasure; but why not ask them yourself?” “Because they will then put the questions: Why? What for? They will demand what my idea is. And, as you know, my friend, I much dislike to have to explain my little ideas until the time comes.” I smiled a little. “My friend Hastings, he of whom I told you, used to say of me that I was the human oyster. But he was unjust. Of facts, I keep nothing to myself. But to every one his own interpretation of them.” “When do you want me to do this?” “Now, if you will. We are close to the house.” “Aren’t you coming in?” “No, me, I will promenade myself in the grounds. I will rejoin you by the lodge gates in a quarter of an hour’s time.” I nodded, and set off on my task. The only member of the family at home proved to be Mrs. Ackroyd, who was sipping an early cup of tea. 
She received me very graciously. “So grateful to you, doctor,” she murmured, “for clearing up that little matter with M. Poirot. But life is one trouble after another. You have heard about Flora, of course?” “What exactly?” I asked cautiously. “This new engagement. Flora and Hector Blunt. Of course not such a good match as Ralph would have been. But after all, happiness comes first. What dear Flora needs is an older man—some one steady and reliable, and then Hector is really a very distinguished man in his way. You saw the news of Ralph’s arrest in the paper this morning?” “Yes,” I said, “I did.” “Horrible.” Mrs. Ackroyd closed her eyes and shuddered. “Geoffrey Raymond was in a terrible way. Rang up Liverpool. But they wouldn’t tell him anything at the police station there. In fact, they said they hadn’t arrested Ralph at all. Mr. Raymond insists that it’s all a mistake—a—what do they call it?—_canard_ of the newspaper’s. I’ve forbidden it to be mentioned before the servants. Such a terrible disgrace. Fancy if Flora had actually been married to him.” Mrs. Ackroyd shut her eyes in anguish. I began to wonder how soon I should be able to deliver Poirot’s invitation. Before I had time to speak, Mrs. Ackroyd was off again. “You were here yesterday, weren’t you, with that dreadful Inspector Raglan? Brute of a man—he terrified Flora into saying she took that money from poor Roger’s room. And the matter was so simple, really. The dear child wanted to borrow a few pounds, didn’t like to disturb her uncle since he’d given strict orders against it, but knowing where he kept his notes she went there and took what she needed.” “Is that Flora’s account of the matter?” I asked. “My dear doctor, you know what girls are nowadays. So easily acted on by suggestion. You, of course, know all about hypnosis and that sort of thing. 
The inspector shouts at her, says the word ‘steal’ over and over again, until the poor child gets an inhibition—or is it a complex?—I always mix up those two words—and actually thinks herself that she has stolen the money. I saw at once how it was. But I can’t be too thankful for the whole misunderstanding in one way—it seems to have brought those two together—Hector and Flora, I mean. And I assure you that I have been very much worried about Flora in the past: why, at one time I actually thought there was going to be some kind of understanding between her and young Raymond. Just think of it!” Mrs. Ackroyd’s voice rose in shrill horror. “A private secretary—with practically no means of his own.” “It would have been a severe blow to you,” I said. “Now, Mrs. Ackroyd, I’ve got a message for you from M. Hercule Poirot.” “For me?” Mrs. Ackroyd looked quite alarmed. I hastened to reassure her, and I explained what Poirot wanted. “Certainly,” said Mrs. Ackroyd rather doubtfully, “I suppose we must come if M. Poirot says so. But what is it all about? I like to know beforehand.” I assured the lady truthfully that I myself did not know any more than she did. “Very well,” said Mrs. Ackroyd at last, rather grudgingly, “I will tell the others, and we will be there at nine o’clock.” Thereupon I took my leave, and joined Poirot at the agreed meeting-place. “I’ve been longer than a quarter of an hour, I’m afraid,” I remarked. “But once that good lady starts talking it’s a matter of the utmost difficulty to get a word in edgeways.” “It is of no matter,” said Poirot. “Me, I have been well amused. This park is magnificent.” We set off homewards. When we arrived, to our great surprise Caroline, who had evidently been watching for us, herself opened the door. She put her fingers to her lips. Her face was full of importance and excitement. “Ursula Bourne,” she said, “the parlormaid from Fernly. She’s here! I’ve put her in the dining-room. She’s in a terrible way, poor thing. 
Says she must see M. Poirot at once. I've done all I could. Taken her a cup of hot tea. It really goes to one's heart to see any one in such a state."

"In the dining-room?" asked Poirot.

"This way," I said, and flung open the door.

Ursula Bourne was sitting by the table. Her arms were spread out in front of her, and she had evidently just lifted her head from where it had been buried. Her eyes were red with weeping.

"Ursula Bourne," I murmured.

But Poirot went past me with outstretched hands. "No," he said, "that is not quite right, I think. It is not Ursula Bourne, is it, my child—but Ursula Paton? Mrs. Ralph Paton."

URSULA'S STORY

For a moment or two the girl looked mutely at Poirot. Then, her reserve breaking down completely, she nodded her head once, and burst into an outburst of sobs. Caroline pushed past me, and putting her arm round the girl, patted her on the shoulder.

"There, there, my dear," she said soothingly, "it will be all right. You'll see—everything will be all right."

Buried under curiosity and scandal-mongering there is a lot of kindness in Caroline. For the moment, even the interest of Poirot's revelation was lost in the sight of the girl's distress.

Presently Ursula sat up and wiped her eyes.

"This is very weak and silly of me," she said.

"No, no, my child," said Poirot kindly. "We can all realize the strain of this last week."

"It must have been a terrible ordeal," I said.

"And then to find that you knew," continued Ursula. "How did you know? Was it Ralph who told you?"

Poirot shook his head.

"You know what brought me to you to-night," went on the girl. "_This_——" She held out a crumpled piece of newspaper, and I recognized the paragraph that Poirot had had inserted. "It says that Ralph has been arrested. So everything is useless. I need not pretend any longer."

"Newspaper paragraphs are not always true, mademoiselle," murmured Poirot, having the grace to look ashamed of himself.
“All the same, I think you will do well to make a clean breast of things. The truth is what we need now.” The girl hesitated, looking at him doubtfully. “You do not trust me,” said Poirot gently. “Yet all the same you came here to find me, did you not? Why was that?” “Because I don’t believe that Ralph did it,” said the girl in a very low voice. “And I think that you are clever, and will find out the truth. And also——” “Yes?” “I think you are kind.” Poirot nodded his head several times. “It is very good that—yes, it is very good. Listen, I do in verity believe that this husband of yours is innocent—but the affair marches badly. If I am to save him, I must know all there is to know—even if it should seem to make the case against him blacker than before.” “How well you understand,” said Ursula. “So you will tell me the whole story, will you not? From the beginning.” “You’re not going to send _me_ away, I hope,” said Caroline, settling herself comfortably in an arm-chair. “What I want to know,” she continued, “is why this child was masquerading as a parlormaid?” “Masquerading?” I queried. “That’s what I said. Why did you do it, child? For a wager?” “For a living,” said Ursula dryly. And encouraged, she began the story which I reproduce here in my own words. Ursula Bourne, it seemed, was one of a family of seven—impoverished Irish gentlefolk. On the death of her father, most of the girls were cast out into the world to earn their own living. Ursula’s eldest sister was married to Captain Folliott. It was she whom I had seen that Sunday, and the cause of her embarrassment was clear enough now. Determined to earn her living and not attracted to the idea of being a nursery governess—the one profession open to an untrained girl, Ursula preferred the job of parlormaid. She scorned to label herself a “lady parlormaid.” She would be the real thing, her reference being supplied by her sister. 
At Fernly, despite an aloofness which, as has been seen, caused some comment, she was a success at her job—quick, competent, and thorough. “I enjoyed the work,” she explained. “And I had plenty of time to myself.” And then came her meeting with Ralph Paton, and the love affair which culminated in a secret marriage. Ralph had persuaded her into that, somewhat against her will. He had declared that his stepfather would not hear of his marrying a penniless girl. Better to be married secretly, and break the news to him at some later and more favorable minute. And so the deed was done, and Ursula Bourne became Ursula Paton. Ralph had declared that he meant to pay off his debts, find a job, and then, when he was in a position to support her, and independent of his adopted father, they would break the news to him. But to people like Ralph Paton, turning over a new leaf is easier in theory than in practice. He hoped that his stepfather, whilst still in ignorance of the marriage, might be persuaded to pay his debts and put him on his feet again. But the revelation of the amount of Ralph’s liabilities merely enraged Roger Ackroyd, and he refused to do anything at all. Some months passed, and then Ralph was bidden once more to Fernly. Roger Ackroyd did not beat about the bush. It was the desire of his heart that Ralph should marry Flora, and he put the matter plainly before the young man. And here it was that the innate weakness of Ralph Paton showed itself. As always, he grasped at the easy, the immediate solution. As far as I could make out, neither Flora nor Ralph made any pretence of love. It was, on both sides, a business arrangement. Roger Ackroyd dictated his wishes—they agreed to them. Flora accepted a chance of liberty, money, and an enlarged horizon, Ralph, of course, was playing a different game. But he was in a very awkward hole financially. He seized at the chance. His debts would be paid. He could start again with a clean sheet. 
His was not a nature to envisage the future, but I gather that he saw vaguely the engagement with Flora being broken off after a decent interval had elapsed. Both Flora and he stipulated that it should be kept a secret for the present. He was anxious to conceal it from Ursula. He felt instinctively that her nature, strong and resolute, with an inherent distaste for duplicity, was not one to welcome such a course. Then came the crucial moment when Roger Ackroyd, always high-handed, decided to announce the engagement. He said no word of his intention to Ralph—only to Flora, and Flora, apathetic, raised no objection. On Ursula, the news fell like a bombshell. Summoned by her, Ralph came hurriedly down from town. They met in the wood, where part of their conversation was overheard by my sister. Ralph implored her to keep silent for a little while longer, Ursula was equally determined to have done with concealments. She would tell Mr. Ackroyd the truth without any further delay. Husband and wife parted acrimoniously. Ursula, steadfast in her purpose, sought an interview with Roger Ackroyd that very afternoon, and revealed the truth to him. Their interview was a stormy one—it might have been even more stormy had not Roger Ackroyd been already obsessed with his own troubles. It was bad enough, however. Ackroyd was not the kind of man to forgive the deceit that had been practiced upon him. His rancor was mainly directed to Ralph, but Ursula came in for her share, since he regarded her as a girl who had deliberately tried to “entrap” the adopted son of a very wealthy man. Unforgivable things were said on both sides. That same evening Ursula met Ralph by appointment in the small summer-house, stealing out from the house by the side door in order to do so. Their interview was made up of reproaches on both sides. Ralph charged Ursula with having irretrievably ruined his prospects by her ill-timed revelation. Ursula reproached Ralph with his duplicity. They parted at last. 
A little over half an hour later came the discovery of Roger Ackroyd’s body. Since that night Ursula had neither seen nor heard from Ralph. As the story unfolded itself, I realized more and more what a damning series of facts it was. Alive, Ackroyd could hardly have failed to alter his will—I knew him well enough to realize that to do so would be his first thought. His death came in the nick of time for Ralph and Ursula Paton. Small wonder the girl had held her tongue, and played her part so consistently. My meditations were interrupted. It was Poirot’s voice speaking, and I knew from the gravity of his tone that he, too, was fully alive to the implications of the position. “Mademoiselle, I must ask you one question, and you must answer it truthfully, for on it everything may hang: What time was it when you parted from Captain Ralph Paton in the summer-house? Now, take a little minute so that your answer may be very exact.” The girl gave a half laugh, bitter enough in all conscience. “Do you think I haven’t gone over that again and again in my own mind? It was just half-past nine when I went out to meet him. Major Blunt was walking up and down the terrace, so I had to go round through the bushes to avoid him. It must have been about twenty-seven minutes to ten when I reached the summer-house. Ralph was waiting for me. I was with him ten minutes—not longer, for it was just a quarter to ten when I got back to the house.” I saw now the insistence of her question the other day. If only Ackroyd could have been proved to have been killed before a quarter to ten, and not after. I saw the reflection of that thought in Poirot’s next question. “Who left the summer-house first?” “I did.” “Leaving Ralph Paton in the summer-house?” “Yes—but you don’t think——” “Mademoiselle, it is of no importance what I think. 
What did you do when you got back to the house?” “I went up to my room.” “And stayed there until when?” “Until about ten o’clock.” “Is there any one who can prove that?” “Prove? That I was in my room, you mean? Oh! no. But surely—oh! I see, they might think—they might think——” I saw the dawning horror in her eyes. Poirot finished the sentence for her. “That it was _you_ who entered by the window and stabbed Mr. Ackroyd as he sat in his chair? Yes, they might think just that.” “Nobody but a fool would think any such thing,” said Caroline indignantly. She patted Ursula on the shoulder. The girl had her face hidden in her hands. “Horrible,” she was murmuring. “Horrible.” Caroline gave her a friendly shake. “Don’t worry, my dear,” she said. “M. Poirot doesn’t think that really. As for that husband of yours, I don’t think much of him, and I tell you so candidly. Running away and leaving you to face the music.” But Ursula shook her head energetically. “Oh, no,” she cried. “It wasn’t like that at all. Ralph would not run away on his own account. I see now. If he heard of his stepfather’s murder, he might think himself that I had done it.” “He wouldn’t think any such thing,” said Caroline. “I was so cruel to him that night—so hard and bitter. I wouldn’t listen to what he was trying to say—wouldn’t believe that he really cared. I just stood there telling him what I thought of him, and saying the coldest, cruelest things that came into my mind—trying my best to hurt him.” “Do him no harm,” said Caroline. “Never worry about what you say to a man. They’re so conceited that they never believe you mean it if it’s unflattering.” Ursula went on, nervously twisting and untwisting her hands. “When the murder was discovered and he didn’t come forward, I was terribly upset. Just for a moment I wondered—but then I knew he couldn’t—he couldn’t.... But I wished he would come forward and say openly that he’d had nothing to do with it. I knew that he was very fond of Dr. 
Sheppard, and I fancied that perhaps Dr. Sheppard might know where he was hiding.” She turned to me. “That’s why I said what I did to you that day. I thought, if you knew where he was, you might pass on the message to him.” “I?” I exclaimed. “Why should James know where he was?” demanded Caroline sharply. “It was very unlikely, I know,” admitted Ursula, “but Ralph had often spoken of Dr. Sheppard, and I knew that he would be likely to consider him as his best friend in King’s Abbot.” “My dear child,” I said, “I have not the least idea where Ralph Paton is at the present moment.” “That is true enough,” said Poirot. “But——” Ursula held out the newspaper cutting in a puzzled fashion. “Ah! that,” said Poirot, slightly embarrassed; “a _bagatelle_, mademoiselle. A _rien du tout_. Not for a moment do I believe that Ralph Paton has been arrested.” “But then——” began the girl slowly. Poirot went on quickly:— “There is one thing I should like to know—did Captain Paton wear shoes or boots that night?” Ursula shook her head. “I can’t remember.” “A pity! But how should you? Now, madame,” he smiled at her, his head on one side, his forefinger wagging eloquently, “no questions. And do not torment yourself. Be of good courage, and place your faith in Hercule Poirot.”

POIROT’S LITTLE REUNION

“And now,” said Caroline, rising, “that child is coming upstairs to lie down. Don’t you worry, my dear. M. Poirot will do everything he can for you—be sure of that.” “I ought to go back to Fernly,” said Ursula uncertainly. But Caroline silenced her protests with a firm hand. “Nonsense. You’re in my hands for the time being. You’ll stay here for the present, anyway—eh, M. Poirot?” “It will be the best plan,” agreed the little Belgian. “This evening I shall want mademoiselle—I beg her pardon, madame—to attend my little reunion. Nine o’clock at my house. It is most necessary that she should be there.” Caroline nodded, and went with Ursula out of the room. The door shut behind them.
Poirot dropped down into a chair again. “So far, so good,” he said. “Things are straightening themselves out.” “They’re getting to look blacker and blacker against Ralph Paton,” I observed gloomily. Poirot nodded. “Yes, that is so. But it was to be expected, was it not?” I looked at him, slightly puzzled by the remark. He was leaning back in the chair, his eyes half closed, the tips of his fingers just touching each other. Suddenly he sighed and shook his head. “What is it?” I asked. “It is that there are moments when a great longing for my friend Hastings comes over me. That is the friend of whom I spoke to you—the one who resides now in the Argentine. Always, when I have had a big case, he has been by my side. And he has helped me—yes, often he has helped me. For he had a knack, that one, of stumbling over the truth unawares—without noticing it himself, _bien entendu_. At times he has said something particularly foolish, and behold that foolish remark has revealed the truth to me! And then, too, it was his practice to keep a written record of the cases that proved interesting.” I gave a slight embarrassed cough. “As far as that goes,” I began, and then stopped. Poirot sat upright in his chair. His eyes sparkled. “But yes? What is it that you would say?” “Well, as a matter of fact, I’ve read some of Captain Hastings’s narratives, and I thought, why not try my hand at something of the same kind? Seemed a pity not to—unique opportunity—probably the only time I’ll be mixed up with anything of this kind.” I felt myself getting hotter and hotter, and more and more incoherent, as I floundered through the above speech. Poirot sprang from his chair. I had a moment’s terror that he was going to embrace me French fashion, but mercifully he refrained. “But this is magnificent—you have then written down your impressions of the case as you went along?” I nodded. “_Epatant!_” cried Poirot. “Let me see them—this instant.” I was not quite prepared for such a sudden demand. 
I racked my brains to remember certain details. “I hope you won’t mind,” I stammered. “I may have been a little—er—_personal_ now and then.” “Oh! I comprehend perfectly; you have referred to me as comic—as, perhaps, ridiculous now and then? It matters not at all. Hastings, he also was not always polite. Me, I have the mind above such trivialities.” Still somewhat doubtful, I rummaged in the drawers of my desk and produced an untidy pile of manuscript which I handed over to him. With an eye on possible publication in the future, I had divided the work into chapters, and the night before I had brought it up to date with an account of Miss Russell’s visit. Poirot had therefore twenty chapters. I left him with them. I was obliged to go out to a case at some distance away, and it was past eight o’clock when I got back, to be greeted with a plate of hot dinner on a tray, and the announcement that Poirot and my sister had supped together at half-past seven, and that the former had then gone to my workshop to finish his reading of the manuscript. “I hope, James,” said my sister, “that you’ve been careful in what you say about me in it?” My jaw dropped. I had not been careful at all. “Not that it matters very much,” said Caroline, reading my expression correctly. “M. Poirot will know what to think. He understands me much better than you do.” I went into the workshop. Poirot was sitting by the window. The manuscript lay neatly piled on a chair beside him. He laid his hand on it and spoke. “_Eh bien_,” he said, “I congratulate you—on your modesty!” “Oh!” I said, rather taken aback. “And on your reticence,” he added. I said “Oh!” again. “Not so did Hastings write,” continued my friend. “On every page, many, many times was the word ‘I.’ What _he_ thought—what _he_ did. But you—you have kept your personality in the background; only once or twice does it obtrude—in scenes of home life, shall we say?” I blushed a little before the twinkle in his eye. 
“What do you really think of the stuff?” I asked nervously. “You want my candid opinion?” “Yes.” Poirot laid his jesting manner aside. “A very meticulous and accurate account,” he said kindly. “You have recorded all the facts faithfully and exactly—though you have shown yourself becomingly reticent as to your own share in them.” “And it has helped you?” “Yes. I may say that it has helped me considerably. Come, we must go over to my house and set the stage for my little performance.” Caroline was in the hall. I think she hoped that she might be invited to accompany us. Poirot dealt with the situation tactfully. “I should much like to have had you present, mademoiselle,” he said regretfully, “but at this juncture it would not be wise. See you, all these people to-night are suspects. Amongst them, I shall find the person who killed Mr. Ackroyd.” “You really believe that?” I said incredulously. “I see that you do not,” said Poirot dryly. “Not yet do you appreciate Hercule Poirot at his true worth.” At that minute Ursula came down the staircase. “You are ready, my child?” said Poirot. “That is good. We will go to my house together. Mademoiselle Caroline, believe me, I do everything possible to render you service. Good-evening.” We went out, leaving Caroline, rather like a dog who has been refused a walk, standing on the front door step gazing after us. The sitting-room at The Larches had been got ready. On the table were various _sirops_ and glasses. Also a plate of biscuits. Several chairs had been brought in from the other room. Poirot ran to and fro rearranging things. Pulling out a chair here, altering the position of a lamp there, occasionally stooping to straighten one of the mats that covered the floor. He was specially fussy over the lighting. 
The lamps were arranged in such a way as to throw a clear light on the side of the room where the chairs were grouped, at the same time leaving the other end of the room, where I presumed Poirot himself would sit, in a dim twilight. Ursula and I watched him. Presently a bell was heard. “They arrive,” said Poirot. “Good, all is in readiness.” The door opened and the party from Fernly filed in. Poirot went forward and greeted Mrs. Ackroyd and Flora. “It is most good of you to come,” he said. “And Major Blunt and Mr. Raymond.” The secretary was debonair as ever. “What’s the great idea?” he said, laughing. “Some scientific machine? Do we have bands round our wrists which register guilty heart-beats? There is such an invention, isn’t there?” “I have read of it, yes,” admitted Poirot. “But me, I am old-fashioned. I use the old methods. I work only with the little gray cells. Now let us begin—but first I have an announcement to make to you all.” He took Ursula’s hand and drew her forward. “This lady is Mrs. Ralph Paton. She was married to Captain Paton last March.” A little shriek burst from Mrs. Ackroyd. “Ralph! Married! Last March! Oh! but it’s absurd. How could he be?” She stared at Ursula as though she had never seen her before. “Married to Bourne?” she said. “Really, M. Poirot, I don’t believe you.” Ursula flushed and began to speak, but Flora forestalled her. Going quickly to the other girl’s side, she passed her hand through her arm. “You must not mind our being surprised,” she said. “You see, we had no idea of such a thing. You and Ralph have kept your secret very well. I am—very glad about it.” “You are very kind, Miss Ackroyd,” said Ursula in a low voice, “and you have every right to be exceedingly angry. Ralph behaved very badly—especially to you.” “You needn’t worry about that,” said Flora, giving her arm a consoling little pat. “Ralph was in a corner and took the only way out. I should probably have done the same in his place. 
I do think he might have trusted me with the secret, though. I wouldn’t have let him down.” Poirot rapped gently on a table and cleared his throat significantly. “The board meeting’s going to begin,” said Flora. “M. Poirot hints that we mustn’t talk. But just tell me one thing. Where is Ralph? You must know if any one does.” “But I don’t,” cried Ursula, almost in a wail. “That’s just it, I don’t.” “Isn’t he detained at Liverpool?” asked Raymond. “It said so in the paper.” “He is not at Liverpool,” said Poirot shortly. “In fact,” I remarked, “no one knows where he is.” “Excepting Hercule Poirot, eh?” said Raymond. Poirot replied seriously to the other’s banter. “Me, I know everything. Remember that.” Geoffrey Raymond lifted his eyebrows. “Everything?” He whistled. “Whew! that’s a tall order.” “Do you mean to say you can really guess where Ralph Paton is hiding?” I asked incredulously. “You call it guessing. I call it knowing, my friend.” “In Cranchester?” I hazarded. “No,” replied Poirot gravely, “not in Cranchester.” He said no more, but at a gesture from him the assembled party took their seats. As they did so, the door opened once more and two other people came in and sat down near the door. They were Parker and the housekeeper. “The number is complete,” said Poirot. “Every one is here.” There was a ring of satisfaction in his tone. And with the sound of it I saw a ripple of something like uneasiness pass over all those faces grouped at the other end of the room. There was a suggestion in all this as of a trap—a trap that had closed. Poirot read from a list in an important manner. “Mrs. Ackroyd, Miss Flora Ackroyd, Major Blunt, Mr. Geoffrey Raymond, Mrs. Ralph Paton, John Parker, Elizabeth Russell.” He laid the paper down on the table. “What’s the meaning of all this?” began Raymond. “The list I have just read,” said Poirot, “is a list of suspected persons. Every one of you present had the opportunity to kill Mr. Ackroyd——” With a cry Mrs. 
Ackroyd sprang up, her throat working. “I don’t like it,” she wailed. “I don’t like it. I would much prefer to go home.” “You cannot go home, madame,” said Poirot sternly, “until you have heard what I have to say.” He paused a moment, then cleared his throat. “I will start at the beginning. When Miss Ackroyd asked me to investigate the case, I went up to Fernly Park with the good Dr. Sheppard. I walked with him along the terrace, where I was shown the footprints on the window-sill. From there Inspector Raglan took me along the path which leads to the drive. My eye was caught by a little summer-house, and I searched it thoroughly. I found two things—a scrap of starched cambric and an empty goose quill. The scrap of cambric immediately suggested to me a maid’s apron. When Inspector Raglan showed me his list of the people in the house, I noticed at once that one of the maids—Ursula Bourne, the parlormaid—had no real alibi. According to her own story, she was in her bedroom from nine-thirty until ten. But supposing that instead she was in the summer-house? If so, she must have gone there to meet some one. Now we know from Dr. Sheppard that some one from outside _did_ come to the house that night—the stranger whom he met just by the gate. At a first glance it would seem that our problem was solved, and that the stranger went to the summer-house to meet Ursula Bourne. It was fairly certain that he _did_ go to the summer-house because of the goose quill. That suggested at once to my mind a taker of drugs—and one who had acquired the habit on the other side of the Atlantic where sniffing ‘snow’ is more common than in this country. The man whom Dr. Sheppard met had an American accent, which fitted in with that supposition. “But I was held up by one point. _The times did not fit._ Ursula Bourne could certainly not have gone to the summer-house before nine-thirty, whereas the man must have got there by a few minutes past nine. 
I could, of course, assume that he waited there for half an hour. The only alternative supposition was that there had been two separate meetings in the summer-house that night. _Eh bien_, as soon as I went into that alternative I found several significant facts. I discovered that Miss Russell, the housekeeper, had visited Dr. Sheppard that morning, and had displayed a good deal of interest in cures for victims of the drug habit. Taking that in conjunction with the goose quill, I assumed that the man in question came to Fernly to meet the housekeeper, and not Ursula Bourne. Who, then, did Ursula Bourne come to the rendezvous to meet? I was not long in doubt. First I found a ring—a wedding ring—with ‘From R.’ and a date inside it. Then I learnt that Ralph Paton had been seen coming up the path which led to the summer-house at twenty-five minutes past nine, and I also heard of a certain conversation which had taken place in the wood near the village that very afternoon—a conversation between Ralph Paton and some unknown girl. So I had my facts succeeding each other in a neat and orderly manner. A secret marriage, an engagement announced on the day of the tragedy, the stormy interview in the wood, and the meeting arranged for the summer-house that night. “Incidentally this proved to me one thing, that both Ralph Paton and Ursula Bourne (or Paton) had the strongest motives for wishing Mr. Ackroyd out of the way. And it also made one other point unexpectedly clear. It could not have been Ralph Paton who was with Mr. Ackroyd in the study at nine-thirty. “So we come to another and most interesting aspect of the crime. Who was it in the room with Mr. Ackroyd at nine-thirty? Not Ralph Paton, who was in the summer-house with his wife. Not Charles Kent, who had already left. Who, then? 
I posed my cleverest—my most audacious question: _Was any one with him?_” Poirot leaned forward and shot the last words triumphantly at us, drawing back afterwards with the air of one who has made a decided hit. Raymond, however, did not seem impressed, and lodged a mild protest. “I don’t know if you’re trying to make me out a liar, M. Poirot, but the matter does not rest on my evidence alone—except perhaps as to the exact words used. Remember, Major Blunt also heard Mr. Ackroyd talking to some one. He was on the terrace outside, and couldn’t catch the words clearly, but he distinctly heard the voices.” Poirot nodded. “I have not forgotten,” he said quietly. “But Major Blunt was under the impression that it was _you_ to whom Mr. Ackroyd was speaking.” For a moment Raymond seemed taken aback. Then he recovered himself. “Blunt knows now that he was mistaken,” he said. “Exactly,” agreed the other man. “Yet there must have been some reason for his thinking so,” mused Poirot. “Oh! no,” he held up his hand in protest, “I know the reason you will give—but it is not enough. We must seek elsewhere. I will put it this way. From the beginning of the case I have been struck by one thing—the nature of those words which Mr. Raymond overheard. It has been amazing to me that no one has commented on them—has seen anything odd about them.” He paused a minute, and then quoted softly:— “... _The calls on my purse have been so frequent of late that I fear it is impossible for me to accede to your request._ Does nothing strike you as odd about that?” “I don’t think so,” said Raymond. “He has frequently dictated letters to me, using almost exactly those same words.” “Exactly,” cried Poirot. “That is what I seek to arrive at. Would any man use such a phrase in _talking_ to another? Impossible that that should be part of a real conversation. Now, if he had been dictating a letter——” “You mean he was reading a letter aloud,” said Raymond slowly. 
“Even so, he must have been reading to some one.” “But why? We have no evidence that there was any one else in the room. No other voice but Mr. Ackroyd’s was heard, remember.” “Surely a man wouldn’t read letters of that type aloud to himself—not unless he was—well—going balmy.” “You have all forgotten one thing,” said Poirot softly: “the stranger who called at the house the preceding Wednesday.” They all stared at him. “But yes,” said Poirot, nodding encouragingly, “on Wednesday. The young man was not of himself important. But the firm he represented interested me very much.” “The Dictaphone Company,” gasped Raymond. “I see it now. A dictaphone. That’s what you think?” Poirot nodded. “Mr. Ackroyd had promised to invest in a dictaphone, you remember. Me, I had the curiosity to inquire of the company in question. Their reply is that Mr. Ackroyd _did_ purchase a dictaphone from their representative. Why he concealed the matter from you, I do not know.” “He must have meant to surprise me with it,” murmured Raymond. “He had quite a childish love of surprising people. Meant to keep it up his sleeve for a day or so. Probably was playing with it like a new toy. Yes, it fits in. You’re quite right—no one would use quite those words in casual conversation.” “It explains, too,” said Poirot, “why Major Blunt thought it was you who were in the study. Such scraps as came to him were fragments of dictation, and so his subconscious mind deduced that you were with him. His conscious mind was occupied with something quite different—the white figure he had caught a glimpse of. He fancied it was Miss Ackroyd. Really, of course, it was Ursula Bourne’s white apron he saw as she was stealing down to the summer-house.” Raymond had recovered from his first surprise. “All the same,” he remarked, “this discovery of yours, brilliant though it is (I’m quite sure I should never have thought of it), leaves the essential position unchanged. Mr. 
Ackroyd was alive at nine-thirty, since he was speaking into the dictaphone. It seems clear that the man Charles Kent was really off the premises by then. As to Ralph Paton——?” He hesitated, glancing at Ursula. Her color flared up, but she answered steadily enough. “Ralph and I parted just before a quarter to ten. He never went near the house, I am sure of that. He had no intention of doing so. The last thing on earth he wanted was to face his stepfather. He would have funked it badly.” “It isn’t that I doubt your story for a moment,” explained Raymond. “I’ve always been quite sure Captain Paton was innocent. But one has to think of a court of law—and the questions that would be asked. He is in a most unfortunate position, but if he were to come forward——” Poirot interrupted. “That is your advice, yes? That he should come forward?” “Certainly. If you know where he is——” “I perceive that you do not believe that I do know. And yet I have told you just now that I know everything. The truth of the telephone call, of the footprints on the window-sill, of the hiding-place of Ralph Paton——” “Where is he?” said Blunt sharply. “Not very far away,” said Poirot, smiling. “In Cranchester?” I asked. Poirot turned towards me. “Always you ask me that. The idea of Cranchester it is with you an _idée fixe_. No, he is not in Cranchester. He is—_there_!” He pointed a dramatic forefinger. Every one’s head turned. Ralph Paton was standing in the doorway.

RALPH PATON’S STORY

It was a very uncomfortable minute for _me_. I hardly took in what happened next, but there were exclamations and cries of surprise! When I was sufficiently master of myself to be able to realize what was going on, Ralph Paton was standing by his wife, her hand in his, and he was smiling across the room at me. Poirot, too, was smiling, and at the same time shaking an eloquent finger at me. “Have I not told you at least thirty-six times that it is useless to conceal things from Hercule Poirot?” he demanded.
“That in such a case he finds out?” He turned to the others. “One day, you remember, we held a little séance about a table—just the six of us. I accused the other five persons present of concealing something from me. Four of them gave up their secret. Dr. Sheppard did not give up his. But all along I have had my suspicions. Dr. Sheppard went to the Three Boars that night hoping to find Ralph. He did not find him there; but supposing, I said to myself, that he met him in the street on his way home? Dr. Sheppard was a friend of Captain Paton’s, and he had come straight from the scene of the crime. He must know that things looked very black against him. Perhaps he knew more than the general public did——” “I did,” I said ruefully. “I suppose I might as well make a clean breast of things now. I went to see Ralph that afternoon. At first he refused to take me into his confidence, but later he told me about his marriage, and the hole he was in. As soon as the murder was discovered, I realized that once the facts were known, suspicion could not fail to attach to Ralph—or, if not to him, to the girl he loved. That night I put the facts plainly before him. The thought of having possibly to give evidence which might incriminate his wife made him resolve at all costs to—to——” I hesitated, and Ralph filled up the gap. “To do a bunk,” he said graphically. “You see, Ursula left me to go back to the house. I thought it possible that she might have attempted to have another interview with my stepfather. He had already been very rude to her that afternoon. It occurred to me that he might have so insulted her—in such an unforgivable manner—that without knowing what she was doing——” He stopped. Ursula released her hand from his, and stepped back. “You thought that, Ralph! You actually thought that I might have done it?” “Let us get back to the culpable conduct of Dr. Sheppard,” said Poirot dryly. “Dr. Sheppard consented to do what he could to help him. 
He was successful in hiding Captain Paton from the police.” “Where?” asked Raymond. “In his own house?” “Ah, no, indeed,” said Poirot. “You should ask yourself the question that I did. If the good doctor is concealing the young man, what place would he choose? It must necessarily be somewhere near at hand. I think of Cranchester. A hotel? No. Lodgings? Even more emphatically, no. Where, then? Ah! I have it. A nursing home. A home for the mentally unfit. I test my theory. I invent a nephew with mental trouble. I consult Mademoiselle Sheppard as to suitable homes. She gives me the names of two near Cranchester to which her brother has sent patients. I make inquiries. Yes, at one of them a patient was brought there by the doctor himself early on Saturday morning. That patient, though known by another name, I had no difficulty in identifying as Captain Paton. After certain necessary formalities, I was allowed to bring him away. He arrived at my house in the early hours of yesterday morning.” I looked at him ruefully. “Caroline’s Home Office expert,” I murmured. “And to think I never guessed!” “You see now why I drew attention to the reticence of your manuscript,” murmured Poirot. “It was strictly truthful as far as it went—but it did not go very far, eh, my friend?” I was too abashed to argue. “Dr. Sheppard has been very loyal,” said Ralph. “He has stood by me through thick and thin. He did what he thought was the best. I see now, from what M. Poirot has told me, that it was not really the best. I should have come forward and faced the music. You see, in the home, we never saw a newspaper. I knew nothing of what was going on.” “Dr. Sheppard has been a model of discretion,” said Poirot dryly. “But me, I discover all the little secrets. It is my business.” “Now we can have your story of what happened that night,” said Raymond impatiently. “You know it already,” said Ralph. “There’s very little for me to add. 
I left the summer-house about nine-forty-five, and tramped about the lanes, trying to make up my mind as to what to do next—what line to take. I’m bound to admit that I’ve not the shadow of an alibi, but I give you my solemn word that I never went to the study, that I never saw my stepfather alive—or dead. Whatever the world thinks, I’d like all of you to believe me.” “No alibi,” murmured Raymond. “That’s bad. I believe you, of course, but—it’s a bad business.” “It makes things very simple, though,” said Poirot, in a cheerful voice. “Very simple indeed.” We all stared at him. “You see what I mean? No? Just this—to save Captain Paton the real criminal must confess.” He beamed round at us all. “But yes—I mean what I say. See now, I did not invite Inspector Raglan to be present. That was for a reason. I did not want to tell him all that I knew—at least I did not want to tell him to-night.” He leaned forward, and suddenly his voice and his whole personality changed. He suddenly became dangerous. “I who speak to you—I know the murderer of Mr. Ackroyd is in this room now. It is to the murderer I speak. _To-morrow the truth goes to Inspector Raglan._ You understand?” There was a tense silence. Into the midst of it came the old Breton woman with a telegram on a salver. Poirot tore it open. Blunt’s voice rose abrupt and resonant. “The murderer is amongst us, you say? You know—which?” Poirot had read the message. He crumpled it up in his hand. “I know—now.” He tapped the crumpled ball of paper. “What is that?” said Raymond sharply. “A wireless message—from a steamer now on her way to the United States.” There was a dead silence. Poirot rose to his feet, bowing. “Messieurs et Mesdames, this reunion of mine is at an end. Remember—_the truth goes to Inspector Raglan in the morning_.”

THE WHOLE TRUTH

A slight gesture from Poirot enjoined me to stay behind the rest. I obeyed, going over to the fire and thoughtfully stirring the big logs on it with the toe of my boot.
I was puzzled. For the first time I was absolutely at sea as to Poirot’s meaning. For a moment I was inclined to think that the scene I had just witnessed was a gigantic piece of bombast—that he had been what he called “playing the comedy” with a view to making himself interesting and important. But, in spite of myself, I was forced to believe in an underlying reality. There had been real menace in his words—a certain indisputable sincerity. But I still believed him to be on entirely the wrong tack. When the door shut behind the last of the party he came over to the fire. “Well, my friend,” he said quietly, “and what do you think of it all?” “I don’t know what to think,” I said frankly. “What was the point? Why not go straight to Inspector Raglan with the truth instead of giving the guilty person this elaborate warning?” Poirot sat down and drew out his case of tiny Russian cigarettes. He smoked for a minute or two in silence. Then:— “Use your little gray cells,” he said. “There is always a reason behind my actions.” I hesitated for a moment, and then I said slowly: “The first one that occurs to me is that you yourself do not know who the guilty person is, but that you are sure that he is to be found amongst the people here to-night. Therefore your words were intended to force a confession from the unknown murderer?” Poirot nodded approvingly. “A clever idea, but not the truth.” “I thought, perhaps, that by making him believe you knew, you might force him out into the open—not necessarily by confession. He might try to silence you as he formerly silenced Mr. Ackroyd—before you could act to-morrow morning.” “A trap with myself as the bait! _Merci, mon ami_, but I am not sufficiently heroic for that.” “Then I fail to understand you. Surely you are running the risk of letting the murderer escape by thus putting him on his guard?” Poirot shook his head. “He cannot escape,” he said gravely. 
“There is only one way out—and that way does not lead to freedom.” “You really believe that one of those people here to-night committed the murder?” I asked incredulously. “Yes, my friend.” “Which one?” There was a silence for some minutes. Then Poirot tossed the stump of his cigarette into the grate and began to speak in a quiet, reflective tone. “I will take you the way that I have traveled myself. Step by step you shall accompany me, and see for yourself that all the facts point indisputably to one person. Now, to begin with, there were two facts and one little discrepancy in time which especially attracted my attention. The first fact was the telephone call. If Ralph Paton were indeed the murderer, the telephone call became meaningless and absurd. Therefore, I said to myself, Ralph Paton is not the murderer. “I satisfied myself that the call could not have been sent by any one in the house, yet I was convinced that it was amongst those present on the fatal evening that I had to look for my criminal. Therefore I concluded that the telephone call must have been sent by an accomplice. I was not quite pleased with that deduction, but I let it stand for the minute. “I next examined the _motive_ for the call. That was difficult. I could only get at it by judging its _result_. Which was—that the murder was discovered that night instead of—in all probability—the following morning. You agree with that?” “Ye-es,” I admitted. “Yes. As you say, Mr. Ackroyd, having given orders that he was not to be disturbed, nobody would have been likely to go to the study that night.” “_Très bien._ The affair marches, does it not? But matters were still obscure. What was the advantage of having the crime discovered that night in preference to the following morning? The only idea I could get hold of was that the murderer, knowing the crime was to be discovered at a certain time, could make sure of being present when the door was broken in—or at any rate immediately afterwards. 
And now we come to the second fact—the chair pulled out from the wall. Inspector Raglan dismissed that as of no importance. I, on the contrary, have always regarded it as of supreme importance. “In your manuscript you have drawn a neat little plan of the study. If you had it with you this minute you would see that—the chair being drawn out in the position indicated by Parker—it would stand in a direct line between the door and the window.” “The window!” I said quickly. “You, too, have my first idea. I imagined that the chair was drawn out so that something connected with the window should not be seen by any one entering through the door. But I soon abandoned that supposition, for though the chair was a grandfather with a high back, it obscured very little of the window—only the part between the sash and the ground. No, _mon ami_—but remember that just in front of the window there stood a table with books and magazines upon it. Now that table _was_ completely hidden by the drawn-out chair—and immediately I had my first shadowy suspicion of the truth. “Supposing that there had been something on that table not intended to be seen? Something placed there by the murderer? As yet I had no inkling of what that something might be. But I knew certain very interesting facts about it. For instance, it was something that the murderer had not been able to take away with him at the time that he committed the crime. At the same time it was vital that it should be removed as soon as possible after the crime had been discovered. And so—the telephone message, and the opportunity for the murderer to be on the spot when the body was discovered. “Now four people were on the scene before the police arrived. Yourself, Parker, Major Blunt, and Mr. Raymond. Parker I eliminated at once, since at whatever time the crime was discovered, he was the one person certain to be on the spot. Also it was he who told me of the pulled-out chair. Parker, then, was cleared (of the murder, that is. 
I still thought it possible that he had been blackmailing Mrs. Ferrars). Raymond and Blunt, however, remained under suspicion since, if the crime had been discovered in the early hours of the morning, it was quite possible that they might have arrived on the scene too late to prevent the object on the round table being discovered. “Now what was that object? You heard my arguments to-night in reference to the scrap of conversation overheard? As soon as I learned that a representative of a dictaphone company had called, the idea of a dictaphone took root in my mind. You heard what I said in this room not half an hour ago? They all agreed with my theory—but one vital fact seems to have escaped them. Granted that a dictaphone was being used by Mr. Ackroyd that night—why was no dictaphone found?” “I never thought of that,” I said. “We know that a dictaphone was supplied to Mr. Ackroyd. But no dictaphone has been found amongst his effects. So, if something was taken from that table—why should not that something be the dictaphone? But there were certain difficulties in the way. The attention of every one was, of course, focused on the murdered man. I think any one could have gone to the table unnoticed by the other people in the room. But a dictaphone has a certain bulk—it cannot be slipped casually into a pocket. There must have been a receptacle of some kind capable of holding it. “You see where I am arriving? The figure of the murderer is taking shape. A person who was on the scene straightway, but who might not have been if the crime had been discovered the following morning. A person carrying a receptacle into which the dictaphone might be fitted——” I interrupted. “But why remove the dictaphone? What was the point?” “You are like Mr. Raymond. You take it for granted that what was heard at nine-thirty was Mr. Ackroyd’s voice speaking into a dictaphone. But consider this useful invention for a little minute. You dictate into it, do you not? 
And at some later time a secretary or a typist turns it on, and the voice speaks again.” “You mean——” I gasped. Poirot nodded. “Yes, I mean that. _At nine-thirty Mr. Ackroyd was already dead._ It was the dictaphone speaking—not the man.” “And the murderer switched it on. Then he must have been in the room at that minute?” “Possibly. But we must not exclude the likelihood of some mechanical device having been applied—something after the nature of a time lock, or even of a simple alarm clock. But in that case we must add two qualifications to our imaginary portrait of the murderer. It must be some one who knew of Mr. Ackroyd’s purchase of the dictaphone and also some one with the necessary mechanical knowledge. “I had got thus far in my own mind when we came to the footprints on the window ledge. Here there were three conclusions open to me. (1) They might really have been made by Ralph Paton. He had been at Fernly that night, and might have climbed into the study and found his uncle dead there. That was one hypothesis. (2) There was the possibility that the footmarks might have been made by somebody else who happened to have the same kind of studs in his shoes. But the inmates of the house had shoes soled with crepe rubber, and I declined to believe in the coincidence of some one from outside having the same kind of shoes as Ralph Paton wore. Charles Kent, as we know from the barmaid of the Dog and Whistle, had on a pair of boots ‘clean dropping off him.’ (3) Those prints were made by some one deliberately trying to throw suspicion on Ralph Paton. To test this last conclusion, it was necessary to ascertain certain facts. One pair of Ralph’s shoes had been obtained from the Three Boars by the police. Neither Ralph nor any one else could have worn them that evening, since they were downstairs being cleaned. According to the police theory, Ralph was wearing another pair of the same kind, and I found out that it was true that he had two pairs. 
Now for my theory to be proved correct it was necessary for the murderer to have worn Ralph’s shoes that evening—in which case Ralph must have been wearing yet a _third_ pair of footwear of some kind. I could hardly suppose that he would bring three pairs of shoes all alike—the third pair of footwear were more likely to be boots. I got your sister to make inquiries on this point—laying some stress on the color, in order—I admit it frankly—to obscure the real reason for my asking. “You know the result of her investigations. Ralph Paton _had_ had a pair of boots with him. The first question I asked him when he came to my house yesterday morning was what he was wearing on his feet on the fatal night. He replied at once that he had worn _boots_—he was still wearing them, in fact—having nothing else to put on. “So we get a step further in our description of the murderer—a person who had the opportunity to take these shoes of Ralph Paton’s from the Three Boars that day.” He paused, and then said, with a slightly raised voice:— “There is one further point. The murderer must have been a person who had the opportunity to purloin that dagger from the silver table. You might argue that any one in the house might have done so, but I will recall to you that Miss Ackroyd was very positive that the dagger was not there when she examined the silver table.” He paused again. “Let us recapitulate—now that all is clear. A person who was at the Three Boars earlier that day, a person who knew Ackroyd well enough to know that he had purchased a dictaphone, a person who was of a mechanical turn of mind, who had the opportunity to take the dagger from the silver table before Miss Flora arrived, who had with him a receptacle suitable for hiding the dictaphone—such as a black bag, and who had the study to himself for a few minutes after the crime was discovered while Parker was telephoning for the police. In fact—_Dr. 
Sheppard!_”

AND NOTHING BUT THE TRUTH

There was a dead silence for a minute and a half. Then I laughed. “You’re mad,” I said. “No,” said Poirot placidly. “I am not mad. It was the little discrepancy in time that first drew my attention to you—right at the beginning.” “Discrepancy in time?” I queried, puzzled. “But yes. You will remember that every one agreed—you yourself included—that it took five minutes to walk from the lodge to the house—less if you took the short cut to the terrace. But you left the house at ten minutes to nine—both by your own statement and that of Parker, and yet it was nine o’clock as you passed through the lodge gates. It was a chilly night—not an evening a man would be inclined to dawdle; why had you taken ten minutes to do a five-minutes’ walk? All along I realized that we had only your statement for it that the study window was ever fastened. Ackroyd asked you if you had done so—he never looked to see. Supposing, then, that the study window was unfastened? Would there be time in that ten minutes for you to run round the outside of the house, change your shoes, climb in through the window, kill Ackroyd, and get to the gate by nine o’clock? I decided against that theory since in all probability a man as nervous as Ackroyd was that night would hear you climbing in, and then there would have been a struggle. But supposing that you killed Ackroyd _before_ you left—as you were standing beside his chair? Then you go out of the front door, run round to the summer-house, take Ralph Paton’s shoes out of the bag you brought up with you that night, slip them on, walk through the mud in them, and leave prints on the window ledge, you climb in, lock the study door on the inside, run back to the summer-house, change back into your own shoes, and race down to the gate. (I went through similar actions the other day, when you were with Mrs. Ackroyd—it took ten minutes exactly.)
Then home—and an alibi—since you had timed the dictaphone for half-past nine.” “My dear Poirot,” I said in a voice that sounded strange and forced to my own ears, “you’ve been brooding over this case too long. What on earth had I to gain by murdering Ackroyd?” “Safety. It was you who blackmailed Mrs. Ferrars. Who could have had a better knowledge of what killed Mr. Ferrars than the doctor who was attending him? When you spoke to me that first day in the garden, you mentioned a legacy received about a year ago. I have been unable to discover any trace of a legacy. You had to invent some way of accounting for Mrs. Ferrars’s twenty thousand pounds. It has not done you much good. You lost most of it in speculation—then you put the screw on too hard, and Mrs. Ferrars took a way out that you had not expected. If Ackroyd had learnt the truth he would have had no mercy on you—you were ruined for ever.” “And the telephone call?” I asked, trying to rally. “You have a plausible explanation of that also, I suppose?” “I will confess to you that it was my greatest stumbling block when I found that a call had actually been put through to you from King’s Abbot station. I at first believed that you had simply invented the story. It was a very clever touch, that. You must have some excuse for arriving at Fernly, finding the body, and so getting the chance to remove the dictaphone on which your alibi depended. I had a very vague notion of how it was worked when I came to see your sister that first day and inquired as to what patients you had seen on Friday morning. I had no thought of Miss Russell in my mind at that time. Her visit was a lucky coincidence, since it distracted your mind from the real object of my questions. I found what I was looking for. Among your patients that morning was the steward of an American liner. Who more suitable than he to be leaving for Liverpool by the train that evening? And afterwards he would be on the high seas, well out of the way. 
I noted that the _Orion_ sailed on Saturday, and having obtained the name of the steward I sent him a wireless message asking a certain question. This is his reply you saw me receive just now.” He held out the message to me. It ran as follows— “Quite correct. Dr. Sheppard asked me to leave a note at a patient’s house. I was to ring him up from the station with the reply. Reply was ‘No answer.’”

* * * * *

“It was a clever idea,” said Poirot. “The call was genuine. Your sister saw you take it. But there was only one man’s word as to what was actually said—your own!” I yawned. “All this,” I said, “is very interesting—but hardly in the sphere of practical politics.” “You think not? Remember what I said—the truth goes to Inspector Raglan in the morning. But, for the sake of your good sister, I am willing to give you the chance of another way out. There might be, for instance, an overdose of a sleeping draught. You comprehend me? But Captain Ralph Paton must be cleared—_ça va sans dire_. I should suggest that you finish that very interesting manuscript of yours—but abandoning your former reticence.” “You seem to be very prolific of suggestions,” I remarked. “Are you sure you’ve quite finished?” “Now that you remind me of the fact, it is true that there is one thing more. It would be most unwise on your part to attempt to silence me as you silenced M. Ackroyd. That kind of business does not succeed against Hercule Poirot, you understand.” “My dear Poirot,” I said, smiling a little, “whatever else I may be, I am not a fool.” I rose to my feet. “Well, well,” I said, with a slight yawn, “I must be off home. Thank you for a most interesting and instructive evening.” Poirot also rose and bowed with his accustomed politeness as I passed out of the room.

APOLOGIA

Five a.m. I am very tired—but I have finished my task. My arm aches from writing. A strange end to my manuscript. I meant it to be published some day as the history of one of Poirot’s failures! Odd, how things pan out.
All along I’ve had a premonition of disaster, from the moment I saw Ralph Paton and Mrs. Ferrars with their heads together. I thought then that she was confiding in him; as it happened I was quite wrong there, but the idea persisted even after I went into the study with Ackroyd that night, until he told me the truth. Poor old Ackroyd. I’m always glad that I gave him a chance. I urged him to read that letter before it was too late. Or let me be honest—didn’t I subconsciously realize that with a pig-headed chap like him, it was my best chance of getting him _not_ to read it? His nervousness that night was interesting psychologically. He knew danger was close at hand. And yet he never suspected _me_. The dagger was an afterthought. I’d brought up a very handy little weapon of my own, but when I saw the dagger lying in the silver table, it occurred to me at once how much better it would be to use a weapon that couldn’t be traced to me. I suppose I must have meant to murder him all along. As soon as I heard of Mrs. Ferrars’s death, I felt convinced that she would have told him everything before she died. When I met him and he seemed so agitated, I thought that perhaps he knew the truth, but that he couldn’t bring himself to believe it, and was going to give me the chance of refuting it. So I went home and took my precautions. If the trouble were after all only something to do with Ralph—well, no harm would have been done. The dictaphone he had given me two days before to adjust. Something had gone a little wrong with it, and I persuaded him to let me have a go at it, instead of sending it back. I did what I wanted to it, and took it up with me in my bag that evening. I am rather pleased with myself as a writer. What could be neater, for instance, than the following:— “_The letters were brought in at twenty minutes to nine. It was just on ten minutes to nine when I left him, the letter still unread. 
I hesitated with my hand on the door handle, looking back and wondering if there was anything I had left undone._” All true, you see. But suppose I had put a row of stars after the first sentence! Would somebody then have wondered what exactly happened in that blank ten minutes? When I looked round the room from the door, I was quite satisfied. Nothing had been left undone. The dictaphone was on the table by the window, timed to go off at nine-thirty (the mechanism of that little device was rather clever—based on the principle of an alarm clock), and the arm-chair was pulled out so as to hide it from the door. I must admit that it gave me rather a shock to run into Parker just outside the door. I have faithfully recorded that fact. Then later, when the body was discovered, and I had sent Parker to telephone for the police, what a judicious use of words: “_I did what little had to be done!_” It was quite little—just to shove the dictaphone into my bag and push back the chair against the wall in its proper place. I never dreamed that Parker would have noticed that chair. Logically, he ought to have been so agog over the body as to be blind to everything else. But I hadn’t reckoned with the trained-servant complex. I wish I could have known beforehand that Flora was going to say she’d seen her uncle alive at a quarter to ten. That puzzled me more than I can say. In fact, all through the case there have been things that puzzled me hopelessly. Every one seems to have taken a hand. My greatest fear all through has been Caroline. I have fancied she might guess. Curious the way she spoke that day of my “strain of weakness.” Well, she will never know the truth. There is, as Poirot said, one way out.... I can trust him. He and Inspector Raglan will manage it between them. I should not like Caroline to know. She is fond of me, and then, too, she is proud.... My death will be a grief to her, but grief passes.... 
When I have finished writing, I shall enclose this whole manuscript in an envelope and address it to Poirot. And then—what shall it be? Veronal? There would be a kind of poetic justice. Not that I take any responsibility for Mrs. Ferrars’s death. It was the direct consequence of her own actions. I feel no pity for her. I have no pity for myself either. So let it be veronal. But I wish Hercule Poirot had never retired from work and come here to grow vegetable marrows.

THE END

Transcriber’s Notes:

- Text enclosed by underscores is in italics (_italics_).
- Blank pages have been removed.
- Obvious typographical errors have been silently corrected.

Dr. Sheppard's revelation about a gold wedding ring with an inscription sparks new theories among the Mah Jong party guests, and later, as they go to bed, Caroline shares an additional theory about Geoffrey Raymond and Flora Ackroyd with Dr. Sheppard. Based on the context provided, is the above statement TRUE or FALSE? First provide an explanation of your decision-making process, and then provide your final answer. Use the following format:

YOUR EXPLANATION
YOUR ANSWER

[Assistant] Here are the relevant details from the text:

- Chapter 16: Dr. Sheppard reveals information about a gold wedding ring with an inscription, sparking new theories among the guests.
- Chapter 16: As they go to bed, Caroline shares her additional theory about Geoffrey Raymond and Flora Ackroyd with Dr. Sheppard.
- Relationship: Both involve sharing of information and theories about the case

The text indicates that Dr. Sheppard reveals information about a gold wedding ring with an inscription, which leads to new theories among the guests. Later, as they go to bed, Caroline shares another theory specifically about Geoffrey Raymond and Flora Ackroyd with Dr. Sheppard. This claim demonstrates how the characters continue to develop and share theories about the case throughout the evening, based on new information and their own speculations.

True

[System] You are an expert at verifying claims from fictional narratives.

[User] You are provided with a context and a statement. Your task is to carefully read the context and then determine whether the statement is TRUE or FALSE. Answer TRUE if the statement is true in its entirety based on the context provided. Answer FALSE if any part of the statement is false based on the context provided.
"WHERE THE BEST IS LIKE THE WORST" The long hot tropic day was drawing to its close. The shadows were gradually rising and filling the narrow street, and every now and then from the side of the open drain which ran through the middle of the street a large black carrion bird flew up. There was no sidewalk, the cobblestones running right up to the low white house walls. The windows which opened on the street were for the most part few in number, small and heavily barred. It was not by any means the best quarter in Colon. One building, more pretentious than the rest, was distinguished from its neighbors by large French windows, also protected by the iron screen or _reja_. It was impossible to tell the nationality of the one man lounging along the street. He seemed profoundly buried in his own thoughts. Dark as his skin was, and black as was his beard, there was something about him which negatived the idea that he was a Spaniard. His rolling walk suggested the sailor's life. As he passed the building with the long French windows, the tinkle of a guitar roused his attention, and he stepped inside the front door and glanced furtively at the few men who lounged about the tables which dotted the long room. Passing by several empty tables and chairs, the stranger seated himself in the corner of the room on the side further from the street, near a window which opened on a neglected garden. A tropical vine thrust its branches against what had once been a wood and glass partition which formed the end of the room, the branches and leaves twining in and out among the broken panes of the window. Some of the occupants of the room had glanced indifferently at the stranger on his entrance, but his haggard, unshaven face and worn clothing did not arouse their curiosity, and they again turned their attention to their wine. 
The stranger, after contemplating the view from the window for some moments, leaned back in his chair, thrust his hands in his pockets, and stretched his long legs under the table; then indolently studied his surroundings. The room reeked with tobacco smoke and the odor of spirits. The scene reminded him of Port Said. Not quite as many nationalities were represented in Colon as haunt the entrance to the Suez Canal, but the low chatter of tongues which greeted his ears was polyglot. The men in the room were types of the born ne'er-do-well. Lazy, shiftless, they had drifted to Colon, thinking to pick up whatever spoils came their way during the construction of the Panama Canal. Drinking and gambling, gambling and drinking--the sum total of their lives. The stranger's lips curved in a sardonic smile, and he crooned softly: Ship me somewheres east of Suez, where the best is like the worst, Where there aren't no Ten Commandments an' a man can raise a thirst. His smile deepened as he caught the scowl of a Spaniard sitting near him. His glance traveled on, and, as he studied the flushed, sodden faces, a sudden horror of himself and his surroundings shook him. He passed a nervous hand over his damp forehead. Why had his memory played him so scurvy a trick? The past few years were not pleasant to contemplate, and the future even less so. He half started from his chair, then sank back and summoned the _mozo_. Quickly he gave his order in fluent Spanish, and waited impatiently for the man's return. He had been fortunate at the gaming table the night before, and could purchase a moment's respite from the torments of an elusive memory. Memory, in whose wondrous train follow the joys of childhood, parents and home! The stranger's strong hand trembled as he stroked his beard. Why was he an outcast? For him alone there were no childhood and no home; his thinking life began as a full-grown man. Was there to be no awakening? 
In a few moments the _mozo_ returned, and placed a glass and bottle of liquor before him. The stranger hastily filled and drank. As the stimulant crept through his veins, a feeling of physical contentment replaced all other sensations, and, lighting a cigar, he was slowly sinking once more into reverie when from behind the partition he heard a voice: "No names, please." The words, spoken clearly in English, startled him from his abstraction, and he glanced through the vine and, himself unseen, saw two men sitting at a table. They had apparently entered the patio from another part of the house. "Quite right, I approve your caution." The words were also in English, but with a strong foreign accent, and the speaker, a man of middle age and fine physique, laid some papers on the table before them. "Where is the Senator this evening?" "He accompanied several members of the Congressional party to inspect the plant of the Quartermaster and Subsistence Departments, and on his return will dine with Major Reynolds and several other officers at the hotel." "I see." The foreigner drummed impatiently on the table. "You were late in keeping your appointment." "I had the devil's own time in finding this dive," returned the younger man, and, as he moved his chair half around, the inquisitive stranger, peeping through the leaves of the vine, obtained a view of the speaker's boyish face. The weak mouth was partly hidden by a short black mustache; the features were well cut, and by some would have been called handsome. The older man gave vent to a half-smothered chuckle. "Goethals and Gorgas have reformed the Canal Zone, and the local government is trying to do the same with Panama, but, _por Dios_, drinking and gambling continue _unnoticed_ in Colon," he said, dryly. "I find a room in this house most convenient during my short visits here. No '_gringo_'," he sneered, "dare show his face in this room." 
The stranger settled down in his chair, which was wedged into the corner formed by the wall of the room and the wood and glass partition, until his head was screened from the two speakers by the thick foliage of the vine. The Spaniard and the Jamaican, who had occupied the table nearest him, had gone, and the few men who still lingered over their wine at the farther end of the room paid no attention to him. He could listen without being observed. "So you believe the people of Panama are already dissatisfied with their president?" inquired the younger man, whom the listener judged to be an American. "I do," came the firm reply. "And but for the presence of _los tiranos del norte_ here there would have been already a _pronunciamiento_." "Then you think the time is ripe for carrying out your scheme?" His companion nodded without speaking, and tugged at his gray imperial. "If it is done at all it must be soon," he said, finally. "American rule is not too popular here, and now is the time to act. And I pray God I shall be spared to see the fruits of the _labor de los cochinos sucios_ reaped by another nation," he spoke with intense bitterness. "And that nation?" questioned the other. "Is better left unmentioned." "You do not love my countrymen," exclaimed the American, as he drew out his cigarette case and passed it to his companion, who waved it away impatiently. "Say rather--hate," was the terse reply. "But I do not look on you as one of that nationality. Your mother was my dearly loved cousin, and Colombia boasts no prouder name than the one she bore before she married your father. By the love you bear her memory I entreat you to assist me in this undertaking." "I have promised," said the American gruffly. "I hear that Colombia intends accepting the ten million dollars offered by the United States for certain islands near Panama." "Never!" The Colombian spoke with emphasis. 
"Our hatred lies too deep for that; it cannot be placated by an offer of 'conscience money,' no matter how great the sum." "The more fools you," muttered the American, _sotto voce_. "The revolt of Panama was followed by an insurrection in Colombia," continued the other, "and the Government was overthrown. The American newspapers gave us a few paragraphs at the time--they did not mention that nearly one hundred thousand people were killed; that the horrors of civil war were augmented by pillage and murder. I was at the front with the troops, and, in my absence from home, my wife and child were murdered by some _insurrectos_. I tell you," he struck the table a resounding blow with his clenched fist, "there is no Colombian living who would not gladly see the United States humiliated." "It is easy to see that the people in Panama are jealous of the success of the Americans," commented the young man. "Naturally; the United States has always advanced at the price of Latin-America." "How so?" "Study your history. When the Thirteen Original States branched out, first came the 'Louisiana Purchase,' land originally settled by the French; then Florida, first settled by the Spanish, was bought by the United States. Later still, Texas seceded from Mexico, settled also by the Spanish; then came the Mexican War, and Latin-America lost the territory now known as New Mexico, Arizona, and California." "Seems to me it would have been better if Colombia had accepted the original offer of the United States for the Panama Canal Zone." "Why so? The United States only offered a beggarly ten million. By waiting a year the French concession would have expired, and the Colombian Government would have received the sixty million which the United States eventually paid the French Company." "Instead of which you got nothing," remarked the American dryly, "and lost Panama into the bargain." "Through underhand methods," began the other hotly, then checked himself. "Enough of the past. 
Have you a message for me?" For reply the young man drew out an envelope from an inside pocket and handed it to his companion, who opened it and read the communication in silence. "Good," he said finally, tearing the note into infinitesimal pieces and carefully putting them in his leather wallet, from which he first took several letters. "Give this to the Ambassador immediately on your return, and this--" he hesitated for a second--"give at once to our mutual friend." The American took the papers and placed them securely in an inside pocket. "Is that all?" he inquired. "No." The Colombian drew out a small chamois bag whose contents emitted a slight jingling noise as he handed it to his companion. "You may find this useful. No thanks are necessary, dear boy," laying his hand on the American's shoulder as the latter commenced speaking. "The death of my wife and child has deprived me of near relatives except you, and I propose to make you my heir." Then, to change the subject, he added quickly, "Is there no way to induce the Senator to use his influence with Congress and the Administration for disarmament, and the curtailing of building more battleships?" The American laughed disagreeably. "I think it may be done--in time." The Colombian's face brightened. "Splendid! If we can stop his fervid speeches in behalf of a larger standing army and navy, we will have accomplished much. But how do you expect to alter his attitude?" "Through a woman," the American's lips parted in an amused smile. "There's no fool like an old fool, and the Senator is no exception to the rule." "Indeed?" The Colombian raised his eyebrows. "And what has the woman to say in the matter?" "Nothing. She emulates a clam." The eavesdropper on the other side of the partition, who had caught most of the conversation, moved ever so slightly to stretch his cramped limbs, and then pulled out his handkerchief and mopped his heated face. 
As he did so a small slip of paper dropped, unseen by him, from his pocket to the floor. A large black cat came softly over to him and he lifted the animal up and placed her on the table before him. He stroked the purring feline and listened intently to catch the conversation which drifted to him through the vine-covered broken window panes. Apparently the two men were preparing to leave.

"Does the Senator really think to marry?" asked the Colombian, as he picked up his hat.

"I judge so. He is obviously very much infatuated with the girl's unusual type of beauty. And, believe me, she thoroughly understands the art of managing men."

"Indeed? Who is the girl?"

"The young daughter of the famous Irish actress, Nora Fitzgerald. Senator Carew...."

Crash--the bottle and glass smashed in pieces. The eavesdropper never stopped to see the damage he had done, but with incredible swiftness and stealth was out of the room and down the street before the irate proprietor had reached the deserted table.

"_Que hay?_" inquired the Colombian of the proprietor. He and the American had rushed into the room and over to the window by which the eavesdropper had been sitting.

"A drunken Spaniard knocked the bottle and glass from the table, and cleared out without paying the damage," explained the proprietor in Spanish, as he signed to the _mozo_ to sweep up the mess. "What's that in your hand?"

"A card, Señor, which I have just picked up from the floor."

"Let me have it."

"_Si, Señor, con mucho gusto._" He quickly handed the paper to the Colombian.

The American looked over his companion's shoulder as the latter adjusted his eyeglasses and held up the visiting card so that both could see its contents. With staring eyes and faces gone white they read the engraved inscription:

MR. JAMES CAREW
MARYLAND

AFTER THE BALL

"Fifty-four!" bellowed the footman through his megaphone for the sixth time, and he slanted his umbrella to protect his face from the driving rain which half-blinded him.
A waiting automobile, whose chauffeur had mistaken the number called, moved slowly off and gave place to a carriage and pair. "Fifty-four," mumbled the coachman, checking his restive horses with difficulty. The footman turned, touched his hat, and beckoned to Cynthia Carew, who stood waiting in the vestibule. With a rueful glance at the wet sidewalk, she gathered her skirts up above her ankles and, propelled by the sturdy arm of her escort, Captain Lane, was landed breathless at the carriage door. "In with you," laughed Lane, as his umbrella was almost dragged from his hand by the high wind. "Your wrap is too pretty to be ruined...." Cynthia was half lifted, half pushed inside the landau.... "Good night, my dearest." The door slammed shut; the horses, weary of long standing, started forward at the sound and raced around the corner into Massachusetts Avenue before the sleepy coachman could collect his wits. Cynthia, on the point of seating herself, was flung toward the farther corner of the carriage by the sudden jerk. Instinctively she threw out her hand to steady herself, and her open palm encountered what was unmistakably a broad shoulder. "Good gracious!" recoiling and collapsing sideways on the seat. "Philip! How you frightened me." Then she settled herself more comfortably and, with an effort, chatted on. "The dance really was great fun, just our set you know, some of the Diplomatic Corps, and a number of the officers from the Barracks. I hated to leave so early," regretfully, "but I promised Uncle James. Mrs. Owen asked particularly for you, and was greatly put out because you did not appear. Honestly, Philip, I am very tired of trying to explain your sudden aversion to society. Why do you shun your friends?" Not getting an immediate answer she repeated her question more emphatically. Still no reply. The silence caught her attention. Turning her head she scanned the quiet figure by her side. 
The pelting rain, which beat drearily upon the carriage roof and windows, almost drowned the sound of rapid hoof-beats. The high wind had apparently extinguished the carriage lamps and the dim street lights failed to illuminate the interior of the rapidly moving carriage. In the semi-darkness Cynthia could not distinguish her companion's face. "It is _you_, Philip?" she questioned sharply, and waited an appreciable moment; then a thought occurred to her. "Uncle James, are you trying to play a practical joke?" Her voice rose to a higher key. Her question was ignored. Cynthia caught her breath sharply. Suppose the man was a stranger? She shrank farther back into her corner. If so, how came he there? Intently she studied the vague outlines of his figure. The landau was an old-fashioned vehicle built after a commodious pattern by a past generation, and frequently used by Senator Carew on stormy nights, as the two broad seats would accommodate five or six persons by tight squeezing. Cynthia clutched her wrap with nervous fingers. If the man had inadvertently entered the wrong carriage, the least he could do was to explain the situation and apologize. But suppose he was drunk? The thought was (Pdb) <|begin_of_text|><|start_header_id|>system<|end_header_id|> You are an expert at verifying claims from fictional narratives.<|eot_id|><|start_header_id|>user<|end_header_id|> You are provided with a context and a statement. Your task is to carefully read the context and then determine whether the statement is TRUE or FALSE. Answer TRUE if the statement is true in its entirety based on the context provided. Answer FALSE if any part of the statement is false based on the context provided. STORY OF THE DOOR Mr. Utterson the lawyer was a man of a rugged countenance that was never lighted by a smile; cold, scanty and embarrassed in discourse; backward in sentiment; lean, long, dusty, dreary and yet somehow lovable. 
At friendly meetings, and when the wine was to his taste, something eminently human beaconed from his eye; something indeed which never found its way into his talk, but which spoke not only in these silent symbols of the after-dinner face, but more often and loud
(Pdb)
[rank0]: Traceback (most recent call last):
[rank0]:   File "/work/pi_miyyer_umass_edu/ctpham/BookClaim-dev/prolong-final/finetune.py", line 301, in <module>
[rank0]:     main()
[rank0]:   File "/work/pi_miyyer_umass_edu/ctpham/BookClaim-dev/prolong-final/finetune.py", line 229, in main
[rank0]:     print(tokenizer.decode(train_dataset[0]['input_ids'])[:1000])
[rank0]:   File "/work/pi_miyyer_umass_edu/ctpham/BookClaim-dev/prolong-final/finetune.py", line 229, in main
[rank0]:     print(tokenizer.decode(train_dataset[0]['input_ids'])[:1000])
[rank0]:   File "/scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/bdb.py", line 90, in trace_dispatch
[rank0]:     return self.dispatch_line(frame)
[rank0]:   File "/scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/bdb.py", line 115, in dispatch_line
[rank0]:     if self.quitting: raise BdbQuit
[rank0]: bdb.BdbQuit
[rank0]:[W1231 15:08:44.040398647 ProcessGroupNCCL.cpp:1168] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group.
This constraint has always been present, but this warning has only been added since PyTorch 2.4 (function operator())
E1231 15:08:46.075579 123144722110272 torch/distributed/elastic/multiprocessing/api.py:833] failed (exitcode: 1) local_rank: 0 (pid: 2636043) of binary: /scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/bin/python3.10
Traceback (most recent call last):
  File "/scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 348, in wrapper
    return f(*args, **kwargs)
  File "/scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/torch/distributed/run.py", line 901, in main
    run(args)
  File "/scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/torch/distributed/run.py", line 892, in run
    elastic_launch(
  File "/scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 133, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
finetune.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time       : 2024-12-31_15:08:46
  host       : uri-gpu007.unity.rc.umass.edu
  rank       : 0 (local_rank: 0)
  exitcode   : 1 (pid: 2636043)
  error_file : <N/A>
  traceback  : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
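The long passage of fiction printed at the `(Pdb)` prompt above is simply the first packed training sequence, decoded and sliced to a preview. The same inspection pattern can be sketched with a toy vocabulary; the dictionary, ids, and helper below are invented stand-ins, not the real Llama-3 tokenizer or the MDS-backed `train_dataset`.

```python
# Stand-in sketch of the inspection done at the breakpoint:
#   print(tokenizer.decode(train_dataset[0]['input_ids'])[:1000])
# The vocabulary, ids, and decode helper are illustrative assumptions only.
toy_vocab = {0: "<|begin_of_text|>", 1: "STORY", 2: "OF", 3: "THE", 4: "DOOR"}

def toy_decode(token_ids):
    """Join the toy vocabulary entries for a list of token ids."""
    return " ".join(toy_vocab[i] for i in token_ids)

train_sample = {"input_ids": [0, 1, 2, 3, 4]}
preview = toy_decode(train_sample["input_ids"])[:1000]  # slice caps the printout
```

The `[:1000]` slice is why the printed samples above cut off mid-word: it truncates the decoded string at a character count, not at a token or sentence boundary.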
/scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/transformers/utils/hub.py:127: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead. warnings.warn( /scratch3/workspace/ctpham_umass_edu-ft/envs/prolong-final/lib/python3.10/site-packages/transformers/utils/import_utils.py:575: FutureWarning: `is_torch_tpu_available` is deprecated and will be removed in 4.41.0. Please use the `is_torch_xla_available` instead. warnings.warn( 12/31/2024 15:13:59 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1distributed training: True, fp16 training: False, bf16 training: True 12/31/2024 15:13:59 - INFO - __main__ - Additional arguments ScriptArguments(model_family='llama', model_name_or_path='/scratch3/workspace/ctpham_umass_edu-ft/models/Llama-3-8B-ProLong-512k-Base/', config_overrides=None, config_overrides_json='', config_name='/scratch3/workspace/ctpham_umass_edu-ft/models/Llama-3-8B-ProLong-512k-Base/', tokenizer_name='/scratch3/workspace/ctpham_umass_edu-ft/models/Llama-3-8B-ProLong-512k-Base/', cache_dir=None, use_fast_tokenizer=False, model_revision='main', use_auth_token=False, tokenized_mds_train=['/work/pi_miyyer_umass_edu/ctpham/BookClaim-dev/data/ft/bookclaim_balanced_pack_complete'], tokenized_mds_validation=[], tokenized_mds_test=[], token_scaled_loss=True) 12/31/2024 15:13:59 - INFO - __main__ - Checkpoint detected, resuming training at /scratch3/workspace/ctpham_umass_edu-ft/balanced_prolong-512K-base_bsz-16_lr-1e-6_epochs-1_/checkpoint-3687. To avoid this behavior, change the `--output_dir` or add `--overwrite_output_dir` to train from scratch. 
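The `FutureWarning` at the top of this run says `TRANSFORMERS_CACHE` is deprecated and will be removed in Transformers v5 in favor of `HF_HOME`. A minimal migration sketch follows; the helper name and paths are illustrative, and note the two variables are not interchangeable: `TRANSFORMERS_CACHE` traditionally pointed at the hub cache directory itself, while `HF_HOME` is the parent directory that contains `hub/`.

```python
import os

def migrate_cache_env(env):
    """Move a deprecated TRANSFORMERS_CACHE entry onto HF_HOME, in place.

    A trailing `hub` component is stripped, since HF_HOME is the parent of
    the hub cache. This helper is an illustrative assumption, not
    transformers' own migration code.
    """
    legacy = env.pop("TRANSFORMERS_CACHE", None)
    if legacy and "HF_HOME" not in env:
        parent, tail = os.path.split(legacy.rstrip("/"))
        env["HF_HOME"] = parent if tail == "hub" else legacy
    return env
```

Calling something like `migrate_cache_env(os.environ)` before `transformers` is imported would silence the warning while preserving the existing cache location.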
[INFO|tokenization_utils_base.py:2267] 2024-12-31 15:13:59,806 >> loading file tokenizer.json [INFO|tokenization_utils_base.py:2267] 2024-12-31 15:13:59,806 >> loading file added_tokens.json [INFO|tokenization_utils_base.py:2267] 2024-12-31 15:13:59,806 >> loading file special_tokens_map.json [INFO|tokenization_utils_base.py:2267] 2024-12-31 15:13:59,806 >> loading file tokenizer_config.json [INFO|tokenization_utils_base.py:2513] 2024-12-31 15:14:00,098 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. [INFO|configuration_utils.py:731] 2024-12-31 15:14:00,100 >> loading configuration file /scratch3/workspace/ctpham_umass_edu-ft/models/Llama-3-8B-ProLong-512k-Base/config.json [INFO|configuration_utils.py:800] 2024-12-31 15:14:00,101 >> Model config LlamaConfig { "_name_or_path": "/scratch3/workspace/ctpham_umass_edu-ft/models/Llama-3-8B-ProLong-512k-Base/", "architectures": [ "LlamaForCausalLM" ], "attention_bias": false, "attention_dropout": 0.0, "bos_token_id": 128000, "eos_token_id": 128009, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 14336, "max_position_embeddings": 524288, "mlp_bias": false, "model_type": "llama", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 8, "pad_token_id": 0, "pretraining_tp": 1, "rms_norm_eps": 1e-05, "rope_scaling": null, "rope_theta": 128000000.0, "tie_word_embeddings": false, "torch_dtype": "float32", "transformers_version": "4.44.2", "use_cache": true, "vocab_size": 128256 } 12/31/2024 15:14:00 - INFO - __main__ - Loaded tokenizer - CPU Memory usage: 640.70 MB RSS 12/31/2024 15:14:00 - INFO - __main__ - GPU 0: 5/81920 MB [INFO|modeling_utils.py:3675] 2024-12-31 15:14:00,127 >> loading weights file /scratch3/workspace/ctpham_umass_edu-ft/models/Llama-3-8B-ProLong-512k-Base/model.safetensors.index.json [INFO|configuration_utils.py:1038] 2024-12-31 15:14:00,129 >> Generate config 
GenerationConfig { "bos_token_id": 128000, "eos_token_id": 128009, "pad_token_id": 0 } Loading checkpoint shards: 0%| | 0/7 [00:00> All model checkpoint weights were used when initializing LlamaForCausalLM. [INFO|modeling_utils.py:4515] 2024-12-31 15:14:00,788 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at /scratch3/workspace/ctpham_umass_edu-ft/models/Llama-3-8B-ProLong-512k-Base/. If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training. [INFO|modeling_utils.py:4003] 2024-12-31 15:14:00,792 >> Generation config file not found, using a generation config created from the model config. 12/31/2024 15:14:00 - INFO - __main__ - Model: LlamaForCausalLM( (model): LlamaModel( (embed_tokens): Embedding(128256, 4096, padding_idx=0) (layers): ModuleList( (0-31): 32 x LlamaDecoderLayer( (self_attn): LlamaAttention( (q_proj): Linear(in_features=4096, out_features=4096, bias=False) (k_proj): Linear(in_features=4096, out_features=1024, bias=False) (v_proj): Linear(in_features=4096, out_features=1024, bias=False) (o_proj): Linear(in_features=4096, out_features=4096, bias=False) (rotary_emb): LlamaRotaryEmbedding() (distributed_attn_func): DistributedAttention() (distributed_varlen_attn_func): DistributedAttention() ) (mlp): LlamaMLP( (gate_proj): Linear(in_features=4096, out_features=14336, bias=False) (up_proj): Linear(in_features=4096, out_features=14336, bias=False) (down_proj): Linear(in_features=14336, out_features=4096, bias=False) (act_fn): SiLU() ) (input_layernorm): LlamaRMSNorm() (post_attention_layernorm): LlamaRMSNorm() ) ) (norm): LlamaRMSNorm() ) (lm_head): Linear(in_features=4096, out_features=128256, bias=False) ) 12/31/2024 15:14:00 - INFO - __main__ - Loaded model - CPU Memory usage: 665.20 MB RSS 12/31/2024 15:14:00 - INFO - __main__ - GPU 0: 5/81920 MB 12/31/2024 15:14:01 - INFO - training.dataset - Loading datasets for 
training 12/31/2024 15:14:01 - WARNING - streaming.base.dataset - Because `predownload` was not specified, it will default to 8*batch_size if batch_size is not None, otherwise 64. Prior to Streaming v0.7.0, `predownload` defaulted to max(batch_size, 256 * batch_size // num_canonical_nodes). 12/31/2024 15:14:01 - INFO - __main__ - Loaded training dataset > /work/pi_miyyer_umass_edu/ctpham/BookClaim-dev/prolong-final/finetune.py(229)main() -> print(tokenizer.decode(train_dataset[0]['input_ids'])[:1000]) (Pdb)
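One detail worth pulling out of the logs above: in the module dump, `q_proj` maps 4096→4096 while `k_proj` and `v_proj` map 4096→1024. That follows directly from the grouped-query-attention fields of the logged `LlamaConfig`; the values below are copied from that config, and the arithmetic is just a sanity check.

```python
# Values copied from the LlamaConfig printed in the log above.
hidden_size = 4096
num_attention_heads = 32     # query heads
num_key_value_heads = 8      # shared key/value heads (grouped-query attention)

head_dim = hidden_size // num_attention_heads   # dims per head
q_out = num_attention_heads * head_dim          # q_proj out_features
kv_out = num_key_value_heads * head_dim         # k_proj / v_proj out_features
group_size = num_attention_heads // num_key_value_heads  # queries per KV head
```

So each of the 8 key/value heads serves a group of 4 query heads, which is why the KV projections are a quarter the width of the query projection.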
12/31/2024 15:28:02 - INFO - __main__ - Loaded training dataset > /work/pi_miyyer_umass_edu/ctpham/BookClaim-dev/prolong-final/finetune.py(229)main() -> print(tokenizer.decode(train_dataset[0]['input_ids'])[:1000]) (Pdb) lic feeling,” returned Guest. “The man, of course, was mad.” “I should like to hear your views on that,” replied Utterson. “I have a document here in his handwriting; it is between ourselves, for I scarce know what to do about it; it is an ugly business at the best. But there it is; quite in your way: a murderer’s autograph.” Guest’s eyes brightened, and he sat down at once and studied it with passion. “No sir,” he said: “not mad; but it is an odd hand.” “And by all accounts a very odd writer,” added the lawyer. Just then the servant entered with a note. “Is that from Dr. Jekyll, sir?” inquired the clerk. “I thought I knew the writing. Anything private, Mr. Utterson?” “Only an invitation to dinner. Why? Do you want to see it?” “One moment. I thank you, sir;” and the clerk laid the two sheets of paper alongside and sedulously compared their contents. “Thank you, sir,” he said at last, returning both; “it’s a very interesting autograph.” There was a pause, during which Mr. Utterson struggled with himself. “Why did you compare them, Guest?” he inquired suddenly. “Well, sir,” returned the clerk, “there’s a rather singular resemblance; the two hands are in many points identical: only differently sloped.” “Rather quaint,” said Utterson. “It is, as you say, rather quaint,” returned Guest. “I wouldn’t speak of this note, you know,” said the master. “No, sir,” said the clerk. “I understand.” But no sooner was Mr.
Utterson alone that night, than he locked the note into his safe, where it reposed from that time forward. “What!” he thought. “Henry Jekyll forge for a murderer!” And his blood ran cold in his veins. INCIDENT OF DR. LANYON Time ran on; thousands of pounds were offered in reward, for the death of Sir Danvers was resented as a public injury; but Mr. Hyde had disappeared out of the ken of the police as though he had never existed. Much of his past was unearthed, indeed, and all disreputable: tales came out of the man’s cruelty, at once so callous and violent; of his vile life, of his strange associates, of the hatred that seemed to have surrounded his career; but of his present whereabouts, not a whisper. From the time he had left the house in Soho on the morning of the murder, he was simply blotted out; and gradually, as time drew on, Mr. Utterson began to recover from the hotness of his alarm, and to grow more at quiet with himself. The death of Sir Danvers was, to his way of thinking, more than paid for by the disappearance of Mr. Hyde. Now that that evil influence had been withdrawn, a new life began for Dr. Jekyll. He came out of his seclusion, renewed relations with his friends, became once more their familiar guest and entertainer; and whilst he had always been known for charities, he was now no less distinguished for religion. He was busy, he was much in the open air, he did good; his face seemed to open and brighten, as if with an inward consciousness of service; and for more than two months, the doctor was at peace. On the 8th of January Utterson had dined at the doctor’s with a small party; Lanyon had been there; and the face of the host had looked from one to the other as in the old days when the trio were inseparable friends. On the 12th, and again on the 14th, the door was shut against the lawyer. 
“The doctor was confined to the house,” Poole said, “and saw no one.” On the 15th, he tried again, and was again refused; and having now been used for the last two months to see his friend almost daily, he found this return of solitude to weigh upon his spirits. The fifth night he had in Guest to dine with him; and the sixth he betook himself to Dr. Lanyon’s. There at least he was not denied admittance; but when he came in, he was shocked at the change which had taken place in the doctor’s appearance. He had his death-warrant written legibly upon his face. The rosy man had grown pale; his flesh had fallen away; he was visibly balder and older; and yet it was not so much these tokens of a swift physical decay that arrested the lawyer’s notice, as a look in the eye and quality of manner that seemed to testify to some deep-seated terror of the mind. It was unlikely that the doctor should fear death; and yet that was what Utterson was tempted to suspect. “Yes,” he thought; “he is a doctor, he must know his own state and that his days are counted; and the knowledge is more than he can bear.” And yet when Utterson remarked on his ill looks, it was with an air of great firmness that Lanyon declared himself a doomed man. “I have had a shock,” he said, “and I shall never recover. It is a question of weeks. Well, life has been pleasant; I liked it; yes, sir, I used to like it. I sometimes think if we knew all, we should be more glad to get away.” “Jekyll is ill, too,” observed Utterson. “Have you seen him?” But Lanyon’s face changed, and he held up a trembling hand. “I wish to see or hear no more of Dr. Jekyll,” he said in a loud, unsteady voice. “I am quite done with that person; and I beg that you will spare me any allusion to one whom I regard as dead.” “Tut, tut!” said Mr. Utterson; and then after a considerable pause, “Can’t I do anything?” he inquired. 
“We are three very old friends, Lanyon; we shall not live to make others.” “Nothing can be done,” returned Lanyon; “ask himself.” “He will not see me,” said the lawyer. “I am not surprised at that,” was the reply. “Some day, Utterson, after I am dead, you may perhaps come to learn the right and wrong of this. I cannot tell you. And in the meantime, if you can sit and talk with me of other things, for God’s sake, stay and do so; but if you cannot keep clear of this accursed topic, then in God’s name, go, for I cannot bear it.” As soon as he got home, Utterson sat down and wrote to Jekyll, complaining of his exclusion from the house, and asking the cause of this unhappy break with Lanyon; and the next day brought him a long answer, often very pathetically worded, and sometimes darkly mysterious in drift. The quarrel with Lanyon was incurable. “I do not blame our old friend,” Jekyll wrote, “but I share his view that we must never meet. I mean from henceforth to lead a life of extreme seclusion; you must not be surprised, nor must you doubt my friendship, if my door is often shut even to you. You must suffer me to go my own dark way. I have brought on myself a punishment and a danger that I cannot name. If I am the chief of sinners, I am the chief of sufferers also. I could not think that this earth contained a place for sufferings and terrors so unmanning; and you can do but one thing, Utterson, to lighten this destiny, and that is to respect my silence.” Utterson was amazed; the dark influence of Hyde had been withdrawn, the doctor had returned to his old tasks and amities; a week ago, the prospect had smiled with every promise of a cheerful and an honoured age; and now in a moment, friendship, and peace of mind, and the whole tenor of his life were wrecked. So great and unprepared a change pointed to madness; but in view of Lanyon’s manner and words, there must lie for it some deeper ground. A week afterwards Dr. 
Lanyon took to his bed, and in something less than a fortnight he was dead. The night after the funeral, at which he had been sadly affected, Utterson locked the door of his business room, and sitting there by the light of a melancholy candle, drew out and set before him an envelope addressed by the hand and sealed with the seal of his dead friend. “PRIVATE: for the hands of G. J. Utterson ALONE, and in case of his predecease _to be destroyed unread_,” so it was emphatically superscribed; and the lawyer dreaded to behold the contents. “I have buried one friend to-day,” he thought: “what if this should cost me another?” And then he condemned the fear as a disloyalty, and broke the seal. Within there was another enclosure, likewise sealed, and marked upon the cover as “not to be opened till the death or disappearance of Dr. Henry Jekyll.” Utterson could not trust his eyes. Yes, it was disappearance; here again, as in the mad will which he had long ago restored to its author, here again were the idea of a disappearance and the name of Henry Jekyll bracketted. But in the will, that idea had sprung from the sinister suggestion of the man Hyde; it was set there with a purpose all too plain and horrible. Written by the hand of Lanyon, what should it mean? A great curiosity came on the trustee, to disregard the prohibition and dive at once to the bottom of these mysteries; but professional honour and faith to his dead friend were stringent obligations; and the packet slept in the inmost corner of his private safe. It is one thing to mortify curiosity, another to conquer it; and it may be doubted if, from that day forth, Utterson desired the society of his surviving friend with the same eagerness. He thought of him kindly; but his thoughts were disquieted and fearful. 
He went to call indeed; but he was perhaps relieved to be denied admittance; perhaps, in his heart, he preferred to speak with Poole upon the doorstep and surrounded by the air and sounds of the open city, rather than to be admitted into that house of voluntary bondage, and to sit and speak with its inscrutable recluse. Poole had, indeed, no very pleasant news to communicate. The doctor, it appeared, now more than ever confined himself to the cabinet over the laboratory, where he would sometimes even sleep; he was out of spirits, he had grown very silent, he did not read; it seemed as if he had something on his mind. Utterson became so used to the unvarying character of these reports, that he fell off little by little in the frequency of his visits. INCIDENT AT THE WINDOW It chanced on Sunday, when Mr. Utterson was on his usual walk with Mr. Enfield, that their way lay once again through the by-street; and that when they came in front of the door, both stopped to gaze on it. “Well,” said Enfield, “that story’s at an end at least. We shall never see more of Mr. Hyde.” “I hope not,” said Utterson. “Did I ever tell you that I once saw him, and shared your feeling of repulsion?” “It was impossible to do the one without the other,” returned Enfield. “And by the way, what an ass you must have thought me, not to know that this was a back way to Dr. Jekyll’s! It was partly your own fault that I found it out, even when I did.” “So you found it out, did you?” said Utterson. “But if that be so, we may step into the court and take a look at the windows. To tell you the truth, I am uneasy about poor Jekyll; and even outside, I feel as if the presence of a friend might do him good.” The court was very cool and a little damp, and full of premature twilight, although the sky, high up overhead, was still bright with sunset. 
The middle one of the three windows was half-way open; and sitting close beside it, taking the air with an infinite sadness of mien, like some disconsolate prisoner, Utterson saw Dr. Jekyll. “What! Jekyll!” he cried. “I trust you are better.” “I am very low, Utterson,” replied the doctor drearily, “very low. It will not last long, thank God.” “You stay too much indoors,” said the lawyer. “You should be out, whipping up the circulation like Mr. Enfield and me. (This is my cousin—Mr. Enfield—Dr. Jekyll.) Come now; get your hat and take a quick turn with us.” “You are very good,” sighed the other. “I should like to very much; but no, no, no, it is quite impossible; I dare not. But indeed, Utterson, I am very glad to see you; this is really a great pleasure; I would ask you and Mr. Enfield up, but the place is really not fit.” “Why, then,” said the lawyer, good-naturedly, “the best thing we can do is to stay down here and speak with you from where we are.” “That is just what I was about to venture to propose,” returned the doctor with a smile. But the words were hardly uttered, before the smile was struck out of his face and succeeded by an expression of such abject terror and despair, as froze the very blood of the two gentlemen below. They saw it but for a glimpse for the window was instantly thrust down; but that glimpse had been sufficient, and they turned and left the court without a word. In silence, too, they traversed the by-street; and it was not until they had come into a neighbouring thoroughfare, where even upon a Sunday there were still some stirrings of life, that Mr. Utterson at last turned and looked at his companion. They were both pale; and there was an answering horror in their eyes. “God forgive us, God forgive us,” said Mr. Utterson. But Mr. Enfield only nodded his head very seriously, and walked on once more in silence. THE LAST NIGHT Mr. 
Utterson was sitting by his fireside one evening after dinner, when he was surprised to receive a visit from Poole. “Bless me, Poole, what brings you here?” he cried; and then taking a second look at him, “What ails you?” he added; “is the doctor ill?” “Mr. Utterson,” said the man, “there is something wrong.” “Take a seat, and here is a glass of wine for you,” said the lawyer. “Now, take your time, and tell me plainly what you want.” “You know the doctor’s ways, sir,” replied Poole, “and how he shuts himself up. Well, he’s shut up again in the cabinet; and I don’t like it, sir—I wish I may die if I like it. Mr. Utterson, sir, I’m afraid.” “Now, my good man,” said the lawyer, “be explicit. What are you afraid of?” “I’ve been afraid for about a week,” returned Poole, doggedly disregarding the question, “and I can bear it no more.” The man’s appearance amply bore out his words; his manner was altered for the worse; and except for the moment when he had first announced his terror, he had not once looked the lawyer in the face. Even now, he sat with the glass of wine untasted on his knee, and his eyes directed to a corner of the floor. “I can bear it no more,” he repeated. “Come,” said the lawyer, “I see you have some good reason, Poole; I see there is something seriously amiss. Try to tell me what it is.” “I think there’s been foul play,” said Poole, hoarsely. “Foul play!” cried the lawyer, a good deal frightened and rather inclined to be irritated in consequence. “What foul play! What does the man mean?” “I daren’t say, sir,” was the answer; “but will you come along with me and see for yourself?” Mr. Utterson’s only answer was to rise and get his hat and greatcoat; but he observed with wonder the greatness of the relief that appeared upon the butler’s face, and perhaps with no less, that the wine was still untasted when he set it down to follow. 
It was a wild, cold, seasonable night of March, with a pale moon, lying on her back as though the wind had tilted her, and flying wrack of the most diaphanous and lawny texture. The wind made talking difficult, and flecked the blood into the face. It seemed to have swept the streets unusually bare of passengers, besides; for Mr. Utterson thought he had never seen that part of London so deserted. He could have wished it otherwise; never in his life had he been conscious of so sharp a wish to see and touch his fellow-creatures; for struggle as he might, there was borne in upon his mind a crushing anticipation of calamity. The square, when they got there, was full of wind and dust, and the thin trees in the garden were lashing themselves along the railing. Poole, who had kept all the way a pace or two ahead, now pulled up in the middle of the pavement, and in spite of the biting weather, took off his hat and mopped his brow with a red pocket-handkerchief. But for all the hurry of his coming, these were not the dews of exertion that he wiped away, but the moisture of some strangling anguish; for his face was white and his voice, when he spoke, harsh and broken. “Well, sir,” he said, “here we are, and God grant there be nothing wrong.” “Amen, Poole,” said the lawyer. Thereupon the servant knocked in a very guarded manner; the door was opened on the chain; and a voice asked from within, “Is that you, Poole?” “It’s all right,” said Poole. “Open the door.” The hall, when they entered it, was brightly lighted up; the fire was built high; and about the hearth the whole of the servants, men and women, stood huddled together like a flock of sheep. At the sight of Mr. Utterson, the housemaid broke into hysterical whimpering; and the cook, crying out “Bless God! it’s Mr. Utterson,” ran forward as if to take him in her arms. “What, what? Are you all here?” said the lawyer peevishly. 
“Very irregular, very unseemly; your master would be far from pleased.” “They’re all afraid,” said Poole. Blank silence followed, no one protesting; only the maid lifted her voice and now wept loudly. “Hold your tongue!” Poole said to her, with a ferocity of accent that testified to his own jangled nerves; and indeed, when the girl had so suddenly raised the note of her lamentation, they had all started and turned towards the inner door with faces of dreadful expectation. “And now,” continued the butler, addressing the knife-boy, “reach me a candle, and we’ll get this through hands at once.” And then he begged Mr. Utterson to follow him, and led the way to the back garden. “Now, sir,” said he, “you come as gently as you can. I want you to hear, and I don’t want you to be heard. And see here, sir, if by any chance he was to ask you in, don’t go.” Mr. Utterson’s nerves, at this unlooked-for termination, gave a jerk that nearly threw him from his balance; but he recollected his courage and followed the butler into the laboratory building through the surgical theatre, with its lumber of crates and bottles, to the foot of the stair. Here Poole motioned him to stand on one side and listen; while he himself, setting down the candle and making a great and obvious call on his resolution, mounted the steps and knocked with a somewhat uncertain hand on the red baize of the cabinet door. “Mr. Utterson, sir, asking to see you,” he called; and even as he did so, once more violently signed to the lawyer to give ear. A voice answered from within: “Tell him I cannot see anyone,” it said complainingly. “Thank you, sir,” said Poole, with a note of something like triumph in his voice; and taking up his candle, he led Mr. Utterson back across the yard and into the great kitchen, where the fire was out and the beetles were leaping on the floor. “Sir,” he said, looking Mr. 
Utterson in the eyes, “Was that my master’s voice?” “It seems much changed,” replied the lawyer, very pale, but giving look for look. “Changed? Well, yes, I think so,” said the butler. “Have I been twenty years in this man’s house, to be deceived about his voice? No, sir; master’s made away with; he was made away with eight days ago, when we heard him cry out upon the name of God; and _who’s_ in there instead of him, and _why_ it stays there, is a thing that cries to Heaven, Mr. Utterson!” “This is a very strange tale, Poole; this is rather a wild tale my man,” said Mr. Utterson, biting his finger. “Suppose it were as you suppose, supposing Dr. Jekyll to have been—well, murdered, what could induce the murderer to stay? That won’t hold water; it doesn’t commend itself to reason.” “Well, Mr. Utterson, you are a hard man to satisfy, but I’ll do it yet,” said Poole. “All this last week (you must know) him, or it, whatever it is that lives in that cabinet, has been crying night and day for some sort of medicine and cannot get it to his mind. It was sometimes his way—the master’s, that is—to write his orders on a sheet of paper and throw it on the stair. We’ve had nothing else this week back; nothing but papers, and a closed door, and the very meals left there to be smuggled in when nobody was looking. Well, sir, every day, ay, and twice and thrice in the same day, there have been orders and complaints, and I have been sent flying to all the wholesale chemists in town. Every time I brought the stuff back, there would be another paper telling me to return it, because it was not pure, and another order to a different firm. This drug is wanted bitter bad, sir, whatever for.” “Have you any of these papers?” asked Mr. Utterson. Poole felt in his pocket and handed out a crumpled note, which the lawyer, bending nearer to the candle, carefully examined. Its contents ran thus: “Dr. Jekyll presents his compliments to Messrs. Maw. 
He assures them that their last sample is impure and quite useless for his present purpose. In the year 18—, Dr. J. purchased a somewhat large quantity from Messrs. M. He now begs them to search with most sedulous care, and should any of the same quality be left, forward it to him at once. Expense is no consideration. The importance of this to Dr. J. can hardly be exaggerated.” So far the letter had run composedly enough, but here with a sudden splutter of the pen, the writer’s emotion had broken loose. “For God’s sake,” he added, “find me some of the old.” “This is a strange note,” said Mr. Utterson; and then sharply, “How do you come to have it open?” “The man at Maw’s was main angry, sir, and he threw it back to me like so much dirt,” returned Poole. “This is unquestionably the doctor’s hand, do you know?” resumed the lawyer. “I thought it looked like it,” said the servant rather sulkily; and then, with another voice, “But what matters hand of write?” he said. “I’ve seen him!” “Seen him?” repeated Mr. Utterson. “Well?” “That’s it!” said Poole. “It was this way. I came suddenly into the theatre from the garden. It seems he had slipped out to look for this drug or whatever it is; for the cabinet door was open, and there he was at the far end of the room digging among the crates. He looked up when I came in, gave a kind of cry, and whipped upstairs into the cabinet. It was but for one minute that I saw him, but the hair stood upon my head like quills. Sir, if that was my master, why had he a mask upon his face? If it was my master, why did he cry out like a rat, and run from me? I have served him long enough. And then...” The man paused and passed his hand over his face. “These are all very strange circumstances,” said Mr. Utterson, “but I think I begin to see daylight. 
Your master, Poole, is plainly seized with one of those maladies that both torture and deform the sufferer; hence, for aught I know, the alteration of his voice; hence the mask and the avoidance of his friends; hence his eagerness to find this drug, by means of which the poor soul retains some hope of ultimate recovery—God grant that he be not deceived! There is my explanation; it is sad enough, Poole, ay, and appalling to consider; but it is plain and natural, hangs well together, and delivers us from all exorbitant alarms.” “Sir,” said the butler, turning to a sort of mottled pallor, “that thing was not my master, and there’s the truth. My master”—here he looked round him and began to whisper—“is a tall, fine build of a man, and this was more of a dwarf.” Utterson attempted to protest. “O, sir,” cried Poole, “do you think I do not know my master after twenty years? Do you think I do not know where his head comes to in the cabinet door, where I saw him every morning of my life? No, sir, that thing in the mask was never Dr. Jekyll—God knows what it was, but it was never Dr. Jekyll; and it is the belief of my heart that there was murder done.” “Poole,” replied the lawyer, “if you say that, it will become my duty to make certain. Much as I desire to spare your master’s feelings, much as I am puzzled by this note which seems to prove him to be still alive, I shall consider it my duty to break in that door.” “Ah, Mr. Utterson, that’s talking!” cried the butler. “And now comes the second question,” resumed Utterson: “Who is going to do it?” “Why, you and me, sir,” was the undaunted reply. “That’s very well said,” returned the lawyer; “and whatever comes of it, I shall make it my business to see you are no loser.” “There is an axe in the theatre,” continued Poole; “and you might take the kitchen poker for yourself.” The lawyer took that rude but weighty instrument into his hand, and balanced it. 
“Do you know, Poole,” he said, looking up, “that you and I are about to place ourselves in a position of some peril?” “You may say so, sir, indeed,” returned the butler. “It is well, then that we should be frank,” said the other. “We both think more than we have said; let us make a clean breast. This masked figure that you saw, did you recognise it?” “Well, sir, it went so quick, and the creature was so doubled up, that I could hardly swear to that,” was the answer. “But if you mean, was it Mr. Hyde?—why, yes, I think it was! You see, it was much of the same bigness; and it had the same quick, light way with it; and then who else could have got in by the laboratory door? You have not forgot, sir, that at the time of the murder he had still the key with him? But that’s not all. I don’t know, Mr. Utterson, if you ever met this Mr. Hyde?” “Yes,” said the lawyer, “I once spoke with him.” “Then you must know as well as the rest of us that there was something queer about that gentleman—something that gave a man a turn—I don’t know rightly how to say it, sir, beyond this: that you felt in your marrow kind of cold and thin.” “I own I felt something of what you describe,” said Mr. Utterson. “Quite so, sir,” returned Poole. “Well, when that masked thing like a monkey jumped from among the chemicals and whipped into the cabinet, it went down my spine like ice. O, I know it’s not evidence, Mr. Utterson; I’m book-learned enough for that; but a man has his feelings, and I give you my bible-word it was Mr. Hyde!” “Ay, ay,” said the lawyer. “My fears incline to the same point. Evil, I fear, founded—evil was sure to come—of that connection. Ay truly, I believe you; I believe poor Harry is killed; and I believe his murderer (for what purpose, God alone can tell) is still lurking in his victim’s room. Well, let our name be vengeance. Call Bradshaw.” The footman came at the summons, very white and nervous. “Pull yourself together, Bradshaw,” said the lawyer. 
“This suspense, I know, is telling upon all of you; but it is now our intention to make an end of it. Poole, here, and I are going to force our way into the cabinet. If all is well, my shoulders are broad enough to bear the blame. Meanwhile, lest anything should really be amiss, or any malefactor seek to escape by the back, you and the boy must go round the corner with a pair of good sticks and take your post at the laboratory door. We give you ten minutes to get to your stations.” As Bradshaw left, the lawyer looked at his watch. “And now, Poole, let us get to ours,” he said; and taking the poker under his arm, led the way into the yard. The scud had banked over the moon, and it was now quite dark. The wind, which only broke in puffs and draughts into that deep well of building, tossed the light of the candle to and fro about their steps, until they came into the shelter of the theatre, where they sat down silently to wait. London hummed solemnly all around; but nearer at hand, the stillness was only broken by the sounds of a footfall moving to and fro along the cabinet floor. “So it will walk all day, sir,” whispered Poole; “ay, and the better part of the night. Only when a new sample comes from the chemist, there’s a bit of a break. Ah, it’s an ill conscience that’s such an enemy to rest! Ah, sir, there’s blood foully shed in every step of it! But hark again, a little closer—put your heart in your ears, Mr. Utterson, and tell me, is that the doctor’s foot?” The steps fell lightly and oddly, with a certain swing, for all they went so slowly; it was different indeed from the heavy creaking tread of Henry Jekyll. Utterson sighed. “Is there never anything else?” he asked. Poole nodded. “Once,” he said. “Once I heard it weeping!” “Weeping? how that?” said the lawyer, conscious of a sudden chill of horror. “Weeping like a woman or a lost soul,” said the butler. “I came away with that upon my heart, that I could have wept too.” But now the ten minutes drew to an end. 
Poole disinterred the axe from under a stack of packing straw; the candle was set upon the nearest table to light them to the attack; and they drew near with bated breath to where that patient foot was still going up and down, up and down, in the quiet of the night. “Jekyll,” cried Utterson, with a loud voice, “I demand to see you.” He paused a moment, but there came no reply. “I give you fair warning, our suspicions are aroused, and I must and shall see you,” he resumed; “if not by fair means, then by foul—if not of your consent, then by brute force!” “Utterson,” said the voice, “for God’s sake, have mercy!” “Ah, that’s not Jekyll’s voice—it’s Hyde’s!” cried Utterson. “Down with the door, Poole!” Poole swung the axe over his shoulder; the blow shook the building, and the red baize door leaped against the lock and hinges. A dismal screech, as of mere animal terror, rang from the cabinet. Up went the axe again, and again the panels crashed and the frame bounded; four times the blow fell; but the wood was tough and the fittings were of excellent workmanship; and it was not until the fifth, that the lock burst and the wreck of the door fell inwards on the carpet. The besiegers, appalled by their own riot and the stillness that had succeeded, stood back a little and peered in. There lay the cabinet before their eyes in the quiet lamplight, a good fire glowing and chattering on the hearth, the kettle singing its thin strain, a drawer or two open, papers neatly set forth on the business table, and nearer the fire, the things laid out for tea; the quietest room, you would have said, and, but for the glazed presses full of chemicals, the most commonplace that night in London. Right in the middle there lay the body of a man sorely contorted and still twitching. They drew near on tiptoe, turned it on its back and beheld the face of Edward Hyde. 
He was dressed in clothes far too large for him, clothes of the doctor’s bigness; the cords of his face still moved with a semblance of life, but life was quite gone; and by the crushed phial in the hand and the strong smell of kernels that hung upon the air, Utterson knew that he was looking on the body of a self-destroyer. “We have come too late,” he said sternly, “whether to save or punish. Hyde is gone to his account; and it only remains for us to find the body of your master.” The far greater proportion of the building was occupied by the theatre, which filled almost the whole ground storey and was lighted from above, and by the cabinet, which formed an upper storey at one end and looked upon the court. A corridor joined the theatre to the door on the by-street; and with this the cabinet communicated separately by a second flight of stairs. There were besides a few dark closets and a spacious cellar. All these they now thoroughly examined. Each closet needed but a glance, for all were empty, and all, by the dust that fell from their doors, had stood long unopened. The cellar, indeed, was filled with crazy lumber, mostly dating from the times of the surgeon who was Jekyll’s predecessor; but even as they opened the door they were advertised of the uselessness of further search, by the fall of a perfect mat of cobweb which had for years sealed up the entrance. Nowhere was there any trace of Henry Jekyll, dead or alive. Poole stamped on the flags of the corridor. “He must be buried here,” he said, hearkening to the sound. “Or he may have fled,” said Utterson, and he turned to examine the door in the by-street. It was locked; and lying near by on the flags, they found the key, already stained with rust. “This does not look like use,” observed the lawyer. “Use!” echoed Poole. “Do you not see, sir, it is broken? much as if a man had stamped on it.” “Ay,” continued Utterson, “and the fractures, too, are rusty.” The two men looked at each other with a scare. 
“This is beyond me, Poole,” said the lawyer. “Let us go back to the cabinet.” They mounted the stair in silence, and still with an occasional awestruck glance at the dead body, proceeded more thoroughly to examine the contents of the cabinet. At one table, there were traces of chemical work, various measured heaps of some white salt being laid on glass saucers, as though for an experiment in which the unhappy man had been prevented. “That is the same drug that I was always bringing him,” said Poole; and even as he spoke, the kettle with a startling noise boiled over. This brought them to the fireside, where the easy-chair was drawn cosily up, and the tea things stood ready to the sitter’s elbow, the very sugar in the cup. There were several books on a shelf; one lay beside the tea things open, and Utterson was amazed to find it a copy of a pious work, for which Jekyll had several times expressed a great esteem, annotated, in his own hand with startling blasphemies. Next, in the course of their review of the chamber, the searchers came to the cheval-glass, into whose depths they looked with an involuntary horror. But it was so turned as to show them nothing but the rosy glow playing on the roof, the fire sparkling in a hundred repetitions along the glazed front of the presses, and their own pale and fearful countenances stooping to look in. “This glass has seen some strange things, sir,” whispered Poole. “And surely none stranger than itself,” echoed the lawyer in the same tones. “For what did Jekyll”—he caught himself up at the word with a start, and then conquering the weakness—“what could Jekyll want with it?” he said. “You may say that!” said Poole. Next they turned to the business table. On the desk, among the neat array of papers, a large envelope was uppermost, and bore, in the doctor’s hand, the name of Mr. Utterson. The lawyer unsealed it, and several enclosures fell to the floor. 
The first was a will, drawn in the same eccentric terms as the one which he had returned six months before, to serve as a testament in case of death and as a deed of gift in case of disappearance; but in place of the name of Edward Hyde, the lawyer, with indescribable amazement, read the name of Gabriel John Utterson. He looked at Poole, and then back at the paper, and last of all at the dead malefactor stretched upon the carpet.

“My head goes round,” he said. “He has been all these days in possession; he had no cause to like me; he must have raged to see himself displaced; and he has not destroyed this document.”

He caught up the next paper; it was a brief note in the doctor’s hand and dated at the top. “O Poole!” the lawyer cried, “he was alive and here this day. He cannot have been disposed of in so short a space; he must be still alive, he must have fled! And then, why fled? and how? and in that case, can we venture to declare this suicide? O, we must be careful. I foresee that we may yet involve your master in some dire catastrophe.”

“Why don’t you read it, sir?” asked Poole.

“Because I fear,” replied the lawyer solemnly. “God grant I have no cause for it!” And with that he brought the paper to his eyes and read as follows:

“My dear Utterson,—When this shall fall into your hands, I shall have disappeared, under what circumstances I have not the penetration to foresee, but my instinct and all the circumstances of my nameless situation tell me that the end is sure and must be early. Go then, and first read the narrative which Lanyon warned me he was to place in your hands; and if you care to hear more, turn to the confession of

“Your unworthy and unhappy friend,

“HENRY JEKYLL.”

“There was a third enclosure?” asked Utterson.

“Here, sir,” said Poole, and gave into his hands a considerable packet sealed in several places.

The lawyer put it in his pocket. “I would say nothing of this paper. If your master has fled or is dead, we may at least save his credit.
It is now ten; I must go home and read these documents in quiet; but I shall be back before midnight, when we shall send for the police.”

They went out, locking the door of the theatre behind them; and Utterson, once more leaving the servants gathered about the fire in the hall, trudged back to his office to read the two narratives in which this mystery was now to be explained.

DR. LANYON’S NARRATIVE

On the ninth of January, now four days ago, I received by the evening delivery a registered envelope, addressed in the hand of my colleague and old school companion, Henry Jekyll. I was a good deal surprised by this; for we were by no means in the habit of correspondence; I had seen the man, dined with him, indeed, the night before; and I could imagine nothing in our intercourse that should justify formality of registration. The contents increased my wonder; for this is how the letter ran:

“10th December, 18—.

“Dear Lanyon,—You are one of my oldest friends; and although we may have differed at times on scientific questions, I cannot remember, at least on my side, any break in our affection. There was never a day when, if you had said to me, ‘Jekyll, my life, my honour, my reason, depend upon you,’ I would not have sacrificed my left hand to help you. Lanyon, my life, my honour, my reason, are all at your mercy; if you fail me to-night, I am lost. You might suppose, after this preface, that I am going to ask you for something dishonourable to grant. Judge for yourself.

“I want you to postpone all other engagements for to-night—ay, even if you were summoned to the bedside of an emperor; to take a cab, unless your carriage should be actually at the door; and with this letter in your hand for consultation, to drive straight to my house. Poole, my butler, has his orders; you will find him waiting your arrival with a locksmith.
The door of my cabinet is then to be forced; and you are to go in alone; to open the glazed press (letter E) on the left hand, breaking the lock if it be shut; and to draw out, with all its contents as they stand, the fourth drawer from the top or (which is the same thing) the third from the bottom. In my extreme distress of mind, I have a morbid fear of misdirecting you; but even if I am in error, you may know the right drawer by its contents: some powders, a phial and a paper book. This drawer I beg of you to carry back with you to Cavendish Square exactly as it stands.

“That is the first part of the service: now for the second. You should be back, if you set out at once on the receipt of this, long before midnight; but I will leave you that amount of margin, not only in the fear of one of those obstacles that can neither be prevented nor foreseen, but because an hour when your servants are in bed is to be preferred for what will then remain to do. At midnight, then, I have to ask you to be alone in your consulting room, to admit with your own hand into the house a man who will present himself in my name, and to place in his hands the drawer that you will have brought with you from my cabinet. Then you will have played your part and earned my gratitude completely. Five minutes afterwards, if you insist upon an explanation, you will have understood that these arrangements are of capital importance; and that by the neglect of one of them, fantastic as they must appear, you might have charged your conscience with my death or the shipwreck of my reason.

“Confident as I am that you will not trifle with this appeal, my heart sinks and my hand trembles at the bare thought of such a possibility. Think of me at this hour, in a strange place, labouring under a blackness of distress that no fancy can exaggerate, and yet well aware that, if you will but punctually serve me, my troubles will roll away like a story that is told.
Serve me, my dear Lanyon and save

“Your friend,

“H.J.

“P.S.—I had already sealed this up when a fresh terror struck upon my soul. It is possible that the post-office may fail me, and this letter not come into your hands until to-morrow morning. In that case, dear Lanyon, do my errand when it shall be most convenient for you in the course of the day; and once more expect my messenger at midnight. It may then already be too late; and if that night passes without event, you will know that you have seen the last of Henry Jekyll.”

Upon the reading of this letter, I made sure my colleague was insane; but till that was proved beyond the possibility of doubt, I felt bound to do as he requested. The less I understood of this farrago, the less I was in a position to judge of its importance; and an appeal so worded could not be set aside without a grave responsibility. I rose accordingly from table, got into a hansom, and drove straight to Jekyll’s house. The butler was awaiting my arrival; he had received by the same post as mine a registered letter of instruction, and had sent at once for a locksmith and a carpenter. The tradesmen came while we were yet speaking; and we moved in a body to old Dr. Denman’s surgical theatre, from which (as you are doubtless aware) Jekyll’s private cabinet is most conveniently entered. The door was very strong, the lock excellent; the carpenter avowed he would have great trouble and have to do much damage, if force were to be used; and the locksmith was near despair. But this last was a handy fellow, and after two hours’ work, the door stood open. The press marked E was unlocked; and I took out the drawer, had it filled up with straw and tied in a sheet, and returned with it to Cavendish Square.

Here I proceeded to examine its contents.
The powders were neatly enough made up, but not with the nicety of the dispensing chemist; so that it was plain they were of Jekyll’s private manufacture; and when I opened one of the wrappers I found what seemed to me a simple crystalline salt of a white colour. The phial, to which I next turned my attention, might have been about half full of a blood-red liquor, which was highly pungent to the sense of smell and seemed to me to contain phosphorus and some volatile ether. At the other ingredients I could make no guess. The book was an ordinary version book and contained little but a series of dates. These covered a period of many years, but I observed that the entries ceased nearly a year ago and quite abruptly. Here and there a brief remark was appended to a date, usually no more than a single word: “double” occurring perhaps six times in a total of several hundred entries; and once very early in the list and followed by several marks of exclamation, “total failure!!!” All this, though it whetted my curiosity, told me little that was definite. Here were a phial of some salt, and the record of a series of experiments that had led (like too many of Jekyll’s investigations) to no end of practical usefulness. How could the presence of these articles in my house affect either the honour, the sanity, or the life of my flighty colleague? If his messenger could go to one place, why could he not go to another? And even granting some impediment, why was this gentleman to be received by me in secret? The more I reflected the more convinced I grew that I was dealing with a case of cerebral disease; and though I dismissed my servants to bed, I loaded an old revolver, that I might be found in some posture of self-defence.

Twelve o’clock had scarce rung out over London, ere the knocker sounded very gently on the door. I went myself at the summons, and found a small man crouching against the pillars of the portico.

“Are you come from Dr. Jekyll?” I asked.
He told me “yes” by a constrained gesture; and when I had bidden him enter, he did not obey me without a searching backward glance into the darkness of the square. There was a policeman not far off, advancing with his bull’s eye open; and at the sight, I thought my visitor started and made greater haste. These particulars struck me, I confess, disagreeably; and as I followed him into the bright light of the consulting room, I kept my hand ready on my weapon. Here, at last, I had a chance of clearly seeing him. I had never set eyes on him before, so much was certain. He was small, as I have said; I was struck besides with the shocking expression of his face, with his remarkable combination of great muscular activity and great apparent debility of constitution, and—last but not least—with the odd, subjective disturbance caused by his neighbourhood. This bore some resemblance to incipient rigour, and was accompanied by a marked sinking of the pulse. At the time, I set it down to some idiosyncratic, personal distaste, and merely wondered at the acuteness of the symptoms; but I have since had reason to believe the cause to lie much deeper in the nature of man, and to turn on some nobler hinge than the principle of hatred. This person (who had thus, from the first moment of his entrance, struck in me what I can only describe as a disgustful curiosity) was dressed in a fashion that would have made an ordinary person laughable; his clothes, that is to say, although they were of rich and sober fabric, were enormously too large for him in every measurement—the trousers hanging on his legs and rolled up to keep them from the ground, the waist of the coat below his haunches, and the collar sprawling wide upon his shoulders. Strange to relate, this ludicrous accoutrement was far from moving me to laughter. 
Rather, as there was something abnormal and misbegotten in the very essence of the creature that now faced me—something seizing, surprising and revolting—this fresh disparity seemed but to fit in with and to reinforce it; so that to my interest in the man’s nature and character, there was added a curiosity as to his origin, his life, his fortune and status in the world.

These observations, though they have taken so great a space to be set down in, were yet the work of a few seconds. My visitor was, indeed, on fire with sombre excitement.

“Have you got it?” he cried. “Have you got it?” And so lively was his impatience that he even laid his hand upon my arm and sought to shake me.

I put him back, conscious at his touch of a certain icy pang along my blood. “Come, sir,” said I. “You forget that I have not yet the pleasure of your acquaintance. Be seated, if you please.” And I showed him an example, and sat down myself in my customary seat and with as fair an imitation of my ordinary manner to a patient, as the lateness of the hour, the nature of my preoccupations, and the horror I had of my visitor, would suffer me to muster.

“I beg your pardon, Dr. Lanyon,” he replied civilly enough. “What you say is very well founded; and my impatience has shown its heels to my politeness. I come here at the instance of your colleague, Dr. Henry Jekyll, on a piece of business of some moment; and I understood...” He paused and put his hand to his throat, and I could see, in spite of his collected manner, that he was wrestling against the approaches of the hysteria—“I understood, a drawer...”

But here I took pity on my visitor’s suspense, and some perhaps on my own growing curiosity.

“There it is, sir,” said I, pointing to the drawer, where it lay on the floor behind a table and still covered with the sheet.
He sprang to it, and then paused, and laid his hand upon his heart; I could hear his teeth grate with the convulsive action of his jaws; and his face was so ghastly to see that I grew alarmed both for his life and reason.

“Compose yourself,” said I.

He turned a dreadful smile to me, and as if with the decision of despair, plucked away the sheet. At sight of the contents, he uttered one loud sob of such immense relief that I sat petrified. And the next moment, in a voice that was already fairly well under control, “Have you a graduated glass?” he asked.

I rose from my place with something of an effort and gave him what he asked.

He thanked me with a smiling nod, measured out a few minims of the red tincture and added one of the powders. The mixture, which was at first of a reddish hue, began, in proportion as the crystals melted, to brighten in colour, to effervesce audibly, and to throw off small fumes of vapour. Suddenly and at the same moment, the ebullition ceased and the compound changed to a dark purple, which faded again more slowly to a watery green. My visitor, who had watched these metamorphoses with a keen eye, smiled, set down the glass upon the table, and then turned and looked upon me with an air of scrutiny.

“And now,” said he, “to settle what remains. Will you be wise? will you be guided? will you suffer me to take this glass in my hand and to go forth from your house without further parley? or has the greed of curiosity too much command of you? Think before you answer, for it shall be done as you decide. As you decide, you shall be left as you were before, and neither richer nor wiser, unless the sense of service rendered to a man in mortal distress may be counted as a kind of riches of the soul.
Or, if you shall so prefer to choose, a new province of knowledge and new avenues to fame and power shall be laid open to you, here, in this room, upon the instant; and your sight shall be blasted by a prodigy to stagger the unbelief of Satan.”

“Sir,” said I, affecting a coolness that I was far from truly possessing, “you speak enigmas, and you will perhaps not wonder that I hear you with no very strong impression of belief. But I have gone too far in the way of inexplicable services to pause before I see the end.”

“It is well,” replied my visitor. “Lanyon, you remember your vows: what follows is under the seal of our profession. And now, you who have so long been bound to the most narrow and material views, you who have denied the virtue of transcendental medicine, you who have derided your superiors—behold!”

He put the glass to his lips and drank at one gulp. A cry followed; he reeled, staggered, clutched at the table and held on, staring with injected eyes, gasping with open mouth; and as I looked there came, I thought, a change—he seemed to swell—his face became suddenly black and the features seemed to melt and alter—and the next moment, I had sprung to my feet and leaped back against the wall, my arms raised to shield me from that prodigy, my mind submerged in terror.

“O God!” I screamed, and “O God!” again and again; for there before my eyes—pale and shaken, and half fainting, and groping before him with his hands, like a man restored from death—there stood Henry Jekyll!

What he told me in the next hour, I cannot bring my mind to set on paper. I saw what I saw, I heard what I heard, and my soul sickened at it; and yet now when that sight has faded from my eyes, I ask myself if I believe it, and I cannot answer. My life is shaken to its roots; sleep has left me; the deadliest terror sits by me at all hours of the day and night; and I feel that my days are numbered, and that I must die; and yet I shall die incredulous.
As for the moral turpitude that man unveiled to me, even with tears of penitence, I cannot, even in memory, dwell on it without a start of horror. I will say but one thing, Utterson, and that (if you can bring your mind to credit it) will be more than enough. The creature who crept into my house that night was, on Jekyll’s own confession, known by the name of Hyde and hunted for in every corner of the land as the murderer of Carew.

HASTIE LANYON.

HENRY JEKYLL’S FULL STATEMENT OF THE CASE

I was born in the year 18— to a large fortune, endowed besides with excellent parts, inclined by nature to industry, fond of the respect of the wise and good among my fellowmen, and thus, as might have been supposed, with every guarantee of an honourable and distinguished future. And indeed the worst of my faults was a certain impatient gaiety of disposition, such as has made the happiness of many, but such as I found it hard to reconcile with my imperious desire to carry my head high, and wear a more than commonly grave countenance before the public. Hence it came about that I concealed my pleasures; and that when I reached years of reflection, and began to look round me and take stock of my progress and position in the world, I stood already committed to a profound duplicity of life. Many a man would have even blazoned such irregularities as I was guilty of; but from the high views that I had set before me, I regarded and hid them with an almost morbid sense of shame. It was thus rather the exacting nature of my aspirations than any particular degradation in my faults, that made me what I was, and, with even a deeper trench than in the majority of men, severed in me those provinces of good and ill which divide and compound man’s dual nature. In this case, I was driven to reflect deeply and inveterately on that hard law of life, which lies at the root of religion and is one of the most plentiful springs of distress.
Though so profound a double-dealer, I was in no sense a hypocrite; both sides of me were in dead earnest; I was no more myself when I laid aside restraint and plunged in shame, than when I laboured, in the eye of day, at the furtherance of knowledge or the relief of sorrow and suffering. And it chanced that the direction of my scientific studies, which led wholly towards the mystic and the transcendental, reacted and shed a strong light on this consciousness of the perennial war among my members. With every day, and from both sides of my intelligence, the moral and the intellectual, I thus drew steadily nearer to that truth, by whose partial discovery I have been doomed to such a dreadful shipwreck: that man is not truly one, but truly two. I say two, because the state of my own knowledge does not pass beyond that point. Others will follow, others will outstrip me on the same lines; and I hazard the guess that man will be ultimately known for a mere polity of multifarious, incongruous and independent denizens. I, for my part, from the nature of my life, advanced infallibly in one direction and in one direction only. It was on the moral side, and in my own person, that I learned to recognise the thorough and primitive duality of man; I saw that, of the two natures that contended in the field of my consciousness, even if I could rightly be said to be either, it was only because I was radically both; and from an early date, even before the course of my scientific discoveries had begun to suggest the most naked possibility of such a miracle, I had learned to dwell with pleasure, as a beloved daydream, on the thought of the separation of these elements. 
If each, I told myself, could be housed in separate identities, life would be relieved of all that was unbearable; the unjust might go his way, delivered from the aspirations and remorse of his more upright twin; and the just could walk steadfastly and securely on his upward path, doing the good things in which he found his pleasure, and no longer exposed to disgrace and penitence by the hands of this extraneous evil. It was the curse of mankind that these incongruous faggots were thus bound together—that in the agonised womb of consciousness, these polar twins should be continuously struggling. How, then, were they dissociated?

I was so far in my reflections when, as I have said, a side light began to shine upon the subject from the laboratory table. I began to perceive more deeply than it has ever yet been stated, the trembling immateriality, the mistlike transience, of this seemingly so solid body in which we walk attired. Certain agents I found to have the power to shake and pluck back that fleshly vestment, even as a wind might toss the curtains of a pavilion. For two good reasons, I will not enter deeply into this scientific branch of my confession. First, because I have been made to learn that the doom and burthen of our life is bound for ever on man’s shoulders, and when the attempt is made to cast it off, it but returns upon us with more unfamiliar and more awful pressure. Second, because, as my narrative will make, alas! too evident, my discoveries were incomplete. Enough then, that I not only recognised my natural body from the mere aura and effulgence of certain of the powers that made up my spirit, but managed to compound a drug by which these powers should be dethroned from their supremacy, and a second form and countenance substituted, none the less natural to me because they were the expression, and bore the stamp of lower elements in my soul.

I hesitated long before I put this theory to the test of practice.
I knew well that I risked death; for any drug that so potently controlled and shook the very fortress of identity, might, by the least scruple of an overdose or at the least inopportunity in the moment of exhibition, utterly blot out that immaterial tabernacle which I looked to it to change. But the temptation of a discovery so singular and profound at last overcame the suggestions of alarm. I had long since prepared my tincture; I purchased at once, from a firm of wholesale chemists, a large quantity of a particular salt which I knew, from my experiments, to be the last ingredient required; and late one accursed night, I compounded the elements, watched them boil and smoke together in the glass, and when the ebullition had subsided, with a strong glow of courage, drank off the potion. The most racking pangs succeeded: a grinding in the bones, deadly nausea, and a horror of the spirit that cannot be exceeded at the hour of birth or death. Then these agonies began swiftly to subside, and I came to myself as if out of a great sickness. There was something strange in my sensations, something indescribably new and, from its very novelty, incredibly sweet. I felt younger, lighter, happier in body; within I was conscious of a heady recklessness, a current of disordered sensual images running like a millrace in my fancy, a solution of the bonds of obligation, an unknown but not an innocent freedom of the soul. I knew myself, at the first breath of this new life, to be more wicked, tenfold more wicked, sold a slave to my original evil; and the thought, in that moment, braced and delighted me like wine. I stretched out my hands, exulting in the freshness of these sensations; and in the act, I was suddenly aware that I had lost in stature. There was no mirror, at that date, in my room; that which stands beside me as I write, was brought there later on and for the very purpose of these transformations. 
The night however, was far gone into the morning—the morning, black as it was, was nearly ripe for the conception of the day—the inmates of my house were locked in the most rigorous hours of slumber; and I determined, flushed as I was with hope and triumph, to venture in my new shape as far as to my bedroom. I crossed the yard, wherein the constellations looked down upon me, I could have thought, with wonder, the first creature of that sort that their unsleeping vigilance had yet disclosed to them; I stole through the corridors, a stranger in my own house; and coming to my room, I saw for the first time the appearance of Edward Hyde. I must here speak by theory alone, saying not that which I know, but that which I suppose to be most probable. The evil side of my nature, to which I had now transferred the stamping efficacy, was less robust and less developed than the good which I had just deposed. Again, in the course of my life, which had been, after all, nine tenths a life of effort, virtue and control, it had been much less exercised and much less exhausted. And hence, as I think, it came about that Edward Hyde was so much smaller, slighter and younger than Henry Jekyll. Even as good shone upon the countenance of the one, evil was written broadly and plainly on the face of the other. Evil besides (which I must still believe to be the lethal side of man) had left on that body an imprint of deformity and decay. And yet when I looked upon that ugly idol in the glass, I was conscious of no repugnance, rather of a leap of welcome. This, too, was myself. It seemed natural and human. In my eyes it bore a livelier image of the spirit, it seemed more express and single, than the imperfect and divided countenance I had been hitherto accustomed to call mine. And in so far I was doubtless right. I have observed that when I wore the semblance of Edward Hyde, none could come near to me at first without a visible misgiving of the flesh. 
This, as I take it, was because all human beings, as we meet them, are commingled out of good and evil: and Edward Hyde, alone in the ranks of mankind, was pure evil. I lingered but a moment at the mirror: the second and conclusive experiment had yet to be attempted; it yet remained to be seen if I had lost my identity beyond redemption and must flee before daylight from a house that was no longer mine; and hurrying back to my cabinet, I once more prepared and drank the cup, once more suffered the pangs of dissolution, and came to myself once more with the character, the stature and the face of Henry Jekyll. That night I had come to the fatal cross-roads. Had I approached my discovery in a more noble spirit, had I risked the experiment while under the empire of generous or pious aspirations, all must have been otherwise, and from these agonies of death and birth, I had come forth an angel instead of a fiend. The drug had no discriminating action; it was neither diabolical nor divine; it but shook the doors of the prisonhouse of my disposition; and like the captives of Philippi, that which stood within ran forth. At that time my virtue slumbered; my evil, kept awake by ambition, was alert and swift to seize the occasion; and the thing that was projected was Edward Hyde. Hence, although I had now two characters as well as two appearances, one was wholly evil, and the other was still the old Henry Jekyll, that incongruous compound of whose reformation and improvement I had already learned to despair. The movement was thus wholly toward the worse. Even at that time, I had not conquered my aversions to the dryness of a life of study. I would still be merrily disposed at times; and as my pleasures were (to say the least) undignified, and I was not only well known and highly considered, but growing towards the elderly man, this incoherency of my life was daily growing more unwelcome. It was on this side that my new power tempted me until I fell in slavery. 
I had but to drink the cup, to doff at once the body of the noted professor, and to assume, like a thick cloak, that of Edward Hyde. I smiled at the notion; it seemed to me at the time to be humorous; and I made my preparations with the most studious care. I took and furnished that house in Soho, to which Hyde was tracked by the police; and engaged as a housekeeper a creature whom I knew well to be silent and unscrupulous. On the other side, I announced to my servants that a Mr. Hyde (whom I described) was to have full liberty and power about my house in the square; and to parry mishaps, I even called and made myself a familiar object, in my second character. I next drew up that will to which you so much objected; so that if anything befell me in the person of Dr. Jekyll, I could enter on that of Edward Hyde without pecuniary loss. And thus fortified, as I supposed, on every side, I began to profit by the strange immunities of my position.

Men have before hired bravos to transact their crimes, while their own person and reputation sat under shelter. I was the first that ever did so for his pleasures. I was the first that could plod in the public eye with a load of genial respectability, and in a moment, like a schoolboy, strip off these lendings and spring headlong into the sea of liberty. But for me, in my impenetrable mantle, the safety was complete. Think of it—I did not even exist! Let me but escape into my laboratory door, give me but a second or two to mix and swallow the draught that I had always standing ready; and whatever he had done, Edward Hyde would pass away like the stain of breath upon a mirror; and there in his stead, quietly at home, trimming the midnight lamp in his study, a man who could afford to laugh at suspicion, would be Henry Jekyll.

The pleasures which I made haste to seek in my disguise were, as I have said, undignified; I would scarce use a harder term. But in the hands of Edward Hyde, they soon began to turn toward the monstrous.
When I would come back from these excursions, I was often plunged into a kind of wonder at my vicarious depravity. This familiar that I called out of my own soul, and sent forth alone to do his good pleasure, was a being inherently malign and villainous; his every act and thought centered on self; drinking pleasure with bestial avidity from any degree of torture to another; relentless like a man of stone. Henry Jekyll stood at times aghast before the acts of Edward Hyde; but the situation was apart from ordinary laws, and insidiously relaxed the grasp of conscience. It was Hyde, after all, and Hyde alone, that was guilty. Jekyll was no worse; he woke again to his good qualities seemingly unimpaired; he would even make haste, where it was possible, to undo the evil done by Hyde. And thus his conscience slumbered. Into the details of the infamy at which I thus connived (for even now I can scarce grant that I committed it) I have no design of entering; I mean but to point out the warnings and the successive steps with which my chastisement approached. I met with one accident which, as it brought on no consequence, I shall no more than mention. An act of cruelty to a child aroused against me the anger of a passer-by, whom I recognised the other day in the person of your kinsman; the doctor and the child’s family joined him; there were moments when I feared for my life; and at last, in order to pacify their too just resentment, Edward Hyde had to bring them to the door, and pay them in a cheque drawn in the name of Henry Jekyll. But this danger was easily eliminated from the future, by opening an account at another bank in the name of Edward Hyde himself; and when, by sloping my own hand backward, I had supplied my double with a signature, I thought I sat beyond the reach of fate. Some two months before the murder of Sir Danvers, I had been out for one of my adventures, had returned at a late hour, and woke the next day in bed with somewhat odd sensations. 
It was in vain I looked about me; in vain I saw the decent furniture and tall proportions of my room in the square; in vain that I recognised the pattern of the bed curtains and the design of the mahogany frame; something still kept insisting that I was not where I was, that I had not wakened where I seemed to be, but in the little room in Soho where I was accustomed to sleep in the body of Edward Hyde. I smiled to myself, and in my psychological way, began lazily to inquire into the elements of this illusion, occasionally, even as I did so, dropping back into a comfortable morning doze. I was still so engaged when, in one of my more wakeful moments, my eyes fell upon my hand. Now the hand of Henry Jekyll (as you have often remarked) was professional in shape and size; it was large, firm, white and comely. But the hand which I now saw, clearly enough, in the yellow light of a mid-London morning, lying half shut on the bedclothes, was lean, corded, knuckly, of a dusky pallor and thickly shaded with a swart growth of hair. It was the hand of Edward Hyde. I must have stared upon it for near half a minute, sunk as I was in the mere stupidity of wonder, before terror woke up in my breast as sudden and startling as the crash of cymbals; and bounding from my bed I rushed to the mirror. At the sight that met my eyes, my blood was changed into something exquisitely thin and icy. Yes, I had gone to bed Henry Jekyll, I had awakened Edward Hyde. How was this to be explained? I asked myself; and then, with another bound of terror—how was it to be remedied? It was well on in the morning; the servants were up; all my drugs were in the cabinet—a long journey down two pairs of stairs, through the back passage, across the open court and through the anatomical theatre, from where I was then standing horror-struck. It might indeed be possible to cover my face; but of what use was that, when I was unable to conceal the alteration in my stature? 
And then with an overpowering sweetness of relief, it came back upon my mind that the servants were already used to the coming and going of my second self. I had soon dressed, as well as I was able, in clothes of my own size: had soon passed through the house, where Bradshaw stared and drew back at seeing Mr. Hyde at such an hour and in such a strange array; and ten minutes later, Dr. Jekyll had returned to his own shape and was sitting down, with a darkened brow, to make a feint of breakfasting. Small indeed was my appetite. This inexplicable incident, this reversal of my previous experience, seemed, like the Babylonian finger on the wall, to be spelling out the letters of my judgment; and I began to reflect more seriously than ever before on the issues and possibilities of my double existence. That part of me which I had the power of projecting, had lately been much exercised and nourished; it had seemed to me of late as though the body of Edward Hyde had grown in stature, as though (when I wore that form) I were conscious of a more generous tide of blood; and I began to spy a danger that, if this were much prolonged, the balance of my nature might be permanently overthrown, the power of voluntary change be forfeited, and the character of Edward Hyde become irrevocably mine. The power of the drug had not been always equally displayed. Once, very early in my career, it had totally failed me; since then I had been obliged on more than one occasion to double, and once, with infinite risk of death, to treble the amount; and these rare uncertainties had cast hitherto the sole shadow on my contentment. Now, however, and in the light of that morning’s accident, I was led to remark that whereas, in the beginning, the difficulty had been to throw off the body of Jekyll, it had of late gradually but decidedly transferred itself to the other side. 
All things therefore seemed to point to this; that I was slowly losing hold of my original and better self, and becoming slowly incorporated with my second and worse. Between these two, I now felt I had to choose. My two natures had memory in common, but all other faculties were most unequally shared between them. Jekyll (who was composite) now with the most sensitive apprehensions, now with a greedy gusto, projected and shared in the pleasures and adventures of Hyde; but Hyde was indifferent to Jekyll, or but remembered him as the mountain bandit remembers the cavern in which he conceals himself from pursuit. Jekyll had more than a father’s interest; Hyde had more than a son’s indifference. To cast in my lot with Jekyll, was to die to those appetites which I had long secretly indulged and had of late begun to pamper. To cast it in with Hyde, was to die to a thousand interests and aspirations, and to become, at a blow and forever, despised and friendless. The bargain might appear unequal; but there was still another consideration in the scales; for while Jekyll would suffer smartingly in the fires of abstinence, Hyde would be not even conscious of all that he had lost. Strange as my circumstances were, the terms of this debate are as old and commonplace as man; much the same inducements and alarms cast the die for any tempted and trembling sinner; and it fell out with me, as it falls with so vast a majority of my fellows, that I chose the better part and was found wanting in the strength to keep to it. Yes, I preferred the elderly and discontented doctor, surrounded by friends and cherishing honest hopes; and bade a resolute farewell to the liberty, the comparative youth, the light step, leaping impulses and secret pleasures, that I had enjoyed in the disguise of Hyde. I made this choice perhaps with some unconscious reservation, for I neither gave up the house in Soho, nor destroyed the clothes of Edward Hyde, which still lay ready in my cabinet. 
For two months, however, I was true to my determination; for two months, I led a life of such severity as I had never before attained to, and enjoyed the compensations of an approving conscience. But time began at last to obliterate the freshness of my alarm; the praises of conscience began to grow into a thing of course; I began to be tortured with throes and longings, as of Hyde struggling after freedom; and at last, in an hour of moral weakness, I once again compounded and swallowed the transforming draught. I do not suppose that, when a drunkard reasons with himself upon his vice, he is once out of five hundred times affected by the dangers that he runs through his brutish, physical insensibility; neither had I, long as I had considered my position, made enough allowance for the complete moral insensibility and insensate readiness to evil, which were the leading characters of Edward Hyde. Yet it was by these that I was punished. My devil had been long caged, he came out roaring. I was conscious, even when I took the draught, of a more unbridled, a more furious propensity to ill. It must have been this, I suppose, that stirred in my soul that tempest of impatience with which I listened to the civilities of my unhappy victim; I declare, at least, before God, no man morally sane could have been guilty of that crime upon so pitiful a provocation; and that I struck in no more reasonable spirit than that in which a sick child may break a plaything. But I had voluntarily stripped myself of all those balancing instincts by which even the worst of us continues to walk with some degree of steadiness among temptations; and in my case, to be tempted, however slightly, was to fall. Instantly the spirit of hell awoke in me and raged. 
With a transport of glee, I mauled the unresisting body, tasting delight from every blow; and it was not till weariness had begun to succeed, that I was suddenly, in the top fit of my delirium, struck through the heart by a cold thrill of terror. A mist dispersed; I saw my life to be forfeit; and fled from the scene of these excesses, at once glorying and trembling, my lust of evil gratified and stimulated, my love of life screwed to the topmost peg. I ran to the house in Soho, and (to make assurance doubly sure) destroyed my papers; thence I set out through the lamplit streets, in the same divided ecstasy of mind, gloating on my crime, light-headedly devising others in the future, and yet still hastening and still hearkening in my wake for the steps of the avenger. Hyde had a song upon his lips as he compounded the draught, and as he drank it, pledged the dead man. The pangs of transformation had not done tearing him, before Henry Jekyll, with streaming tears of gratitude and remorse, had fallen upon his knees and lifted his clasped hands to God. The veil of self-indulgence was rent from head to foot. I saw my life as a whole: I followed it up from the days of childhood, when I had walked with my father’s hand, and through the self-denying toils of my professional life, to arrive again and again, with the same sense of unreality, at the damned horrors of the evening. I could have screamed aloud; I sought with tears and prayers to smother down the crowd of hideous images and sounds with which my memory swarmed against me; and still, between the petitions, the ugly face of my iniquity stared into my soul. As the acuteness of this remorse began to die away, it was succeeded by a sense of joy. The problem of my conduct was solved. Hyde was thenceforth impossible; whether I would or not, I was now confined to the better part of my existence; and O, how I rejoiced to think of it! with what willing humility I embraced anew the restrictions of natural life! 
with what sincere renunciation I locked the door by which I had so often gone and come, and ground the key under my heel! The next day, came the news that the murder had not been overlooked, that the guilt of Hyde was patent to the world, and that the victim was a man high in public estimation. It was not only a crime, it had been a tragic folly. I think I was glad to know it; I think I was glad to have my better impulses thus buttressed and guarded by the terrors of the scaffold. Jekyll was now my city of refuge; let but Hyde peep out an instant, and the hands of all men would be raised to take and slay him. I resolved in my future conduct to redeem the past; and I can say with honesty that my resolve was fruitful of some good. You know yourself how earnestly, in the last months of the last year, I laboured to relieve suffering; you know that much was done for others, and that the days passed quietly, almost happily for myself. Nor can I truly say that I wearied of this beneficent and innocent life; I think instead that I daily enjoyed it more completely; but I was still cursed with my duality of purpose; and as the first edge of my penitence wore off, the lower side of me, so long indulged, so recently chained down, began to growl for licence. Not that I dreamed of resuscitating Hyde; the bare idea of that would startle me to frenzy: no, it was in my own person that I was once more tempted to trifle with my conscience; and it was as an ordinary secret sinner that I at last fell before the assaults of temptation. There comes an end to all things; the most capacious measure is filled at last; and this brief condescension to my evil finally destroyed the balance of my soul. And yet I was not alarmed; the fall seemed natural, like a return to the old days before I had made my discovery. 
It was a fine, clear, January day, wet under foot where the frost had melted, but cloudless overhead; and the Regent’s Park was full of winter chirrupings and sweet with spring odours. I sat in the sun on a bench; the animal within me licking the chops of memory; the spiritual side a little drowsed, promising subsequent penitence, but not yet moved to begin. After all, I reflected, I was like my neighbours; and then I smiled, comparing myself with other men, comparing my active good-will with the lazy cruelty of their neglect. And at the very moment of that vainglorious thought, a qualm came over me, a horrid nausea and the most deadly shuddering. These passed away, and left me faint; and then as in its turn faintness subsided, I began to be aware of a change in the temper of my thoughts, a greater boldness, a contempt of danger, a solution of the bonds of obligation. I looked down; my clothes hung formlessly on my shrunken limbs; the hand that lay on my knee was corded and hairy. I was once more Edward Hyde. A moment before I had been safe of all men’s respect, wealthy, beloved—the cloth laying for me in the dining-room at home; and now I was the common quarry of mankind, hunted, houseless, a known murderer, thrall to the gallows. My reason wavered, but it did not fail me utterly. I have more than once observed that in my second character, my faculties seemed sharpened to a point and my spirits more tensely elastic; thus it came about that, where Jekyll perhaps might have succumbed, Hyde rose to the importance of the moment. My drugs were in one of the presses of my cabinet; how was I to reach them? That was the problem that (crushing my temples in my hands) I set myself to solve. The laboratory door I had closed. If I sought to enter by the house, my own servants would consign me to the gallows. I saw I must employ another hand, and thought of Lanyon. How was he to be reached? how persuaded? 
Supposing that I escaped capture in the streets, how was I to make my way into his presence? and how should I, an unknown and displeasing visitor, prevail on the famous physician to rifle the study of his colleague, Dr. Jekyll? Then I remembered that of my original character, one part remained to me: I could write my own hand; and once I had conceived that kindling spark, the way that I must follow became lighted up from end to end. Thereupon, I arranged my clothes

lic feeling,” returned Guest. “The man, of course, was mad.” “I should like to hear your views on that,” replied Utterson. “I have a document here in his handwriting; it is between ourselves, for I scarce know what to do about it; it is an ugly business at the best. But there it is; quite in your way: a murderer’s autograph.” Guest’s eyes brightened, and he sat down at once and studied it with passion. “No sir,” he said: “not mad; but it is an odd hand.” “And by all accounts a very odd writer,” added the lawyer. Just then the servant entered with a note. “Is that from Dr. Jekyll, sir?” inquired the clerk. “I thought I knew the writing. Anything private, Mr. Utterson?” “Only an invitation to dinner. Why? Do you want to see it?” “One moment. I thank you, sir;” and the clerk laid the two sheets of paper alongside and sedulously compared their contents. “Thank you, sir,” he said at last, returning both; “it’s a very interesting autograph.” There was a pause, during which Mr. Utterson struggled with himself. “Why did you compare them, Guest?” he inquired suddenly. “Well, sir,” returned the clerk, “there’s a rather singular resemblance; the two hands are in many points identical: only differently sloped.” “Rather quaint,” said Utterson. “It is, as you say, rather quaint,” returned Guest. “I wouldn’t speak of this note, you know,” said the master. “No, sir,” said the clerk. “I understand.” But no sooner was Mr.
Utterson alone that night, than he locked the note into his safe, where it reposed from that time forward. “What!” he thought. “Henry Jekyll forge for a murderer!” And his blood ran cold in his veins.

INCIDENT OF DR. LANYON

Time ran on; thousands of pounds were offered in reward, for the death of Sir Danvers was resented as a public injury; but Mr. Hyde had disappeared out of the ken of the police as though he had never existed. Much of his past was unearthed, indeed, and all disreputable: tales came out of the man’s cruelty, at once so callous and violent; of his vile life, of his strange associates, of the hatred that seemed to have surrounded his career; but of his present whereabouts, not a whisper. From the time he had left the house in Soho on the morning of the murder, he was simply blotted out; and gradually, as time drew on, Mr. Utterson began to recover from the hotness of his alarm, and to grow more at quiet with himself. The death of Sir Danvers was, to his way of thinking, more than paid for by the disappearance of Mr. Hyde. Now that that evil influence had been withdrawn, a new life began for Dr. Jekyll. He came out of his seclusion, renewed relations with his friends, became once more their familiar guest and entertainer; and whilst he had always been known for charities, he was now no less distinguished for religion. He was busy, he was much in the open air, he did good; his face seemed to open and brighten, as if with an inward consciousness of service; and for more than two months, the doctor was at peace. On the 8th of January Utterson had dined at the doctor’s with a small party; Lanyon had been there; and the face of the host had looked from one to the other as in the old days when the trio were inseparable friends. On the 12th, and again on the 14th, the door was shut against the lawyer.
“The doctor was confined to the house,” Poole said, “and saw no one.” On the 15th, he tried again, and was again refused; and having now been used for the last two months to see his friend almost daily, he found this return of solitude to weigh upon his spirits. The fifth night he had in Guest to dine with him; and the sixth he betook himself to Dr. Lanyon’s. There at least he was not denied admittance; but when he came in, he was shocked at the change which had taken place in the doctor’s appearance. He had his death-warrant written legibly upon his face. The rosy man had grown pale; his flesh had fallen away; he was visibly balder and older; and yet it was not so much these tokens of a swift physical decay that arrested the lawyer’s notice, as a look in the eye and quality of manner that seemed to testify to some deep-seated terror of the mind. It was unlikely that the doctor should fear death; and yet that was what Utterson was tempted to suspect. “Yes,” he thought; “he is a doctor, he must know his own state and that his days are counted; and the knowledge is more than he can bear.” And yet when Utterson remarked on his ill looks, it was with an air of great firmness that Lanyon declared himself a doomed man. “I have had a shock,” he said, “and I shall never recover. It is a question of weeks. Well, life has been pleasant; I liked it; yes, sir, I used to like it. I sometimes think if we knew all, we should be more glad to get away.” “Jekyll is ill, too,” observed Utterson. “Have you seen him?” But Lanyon’s face changed, and he held up a trembling hand. “I wish to see or hear no more of Dr. Jekyll,” he said in a loud, unsteady voice. “I am quite done with that person; and I beg that you will spare me any allusion to one whom I regard as dead.” “Tut, tut!” said Mr. Utterson; and then after a considerable pause, “Can’t I do anything?” he inquired. 
“We are three very old friends, Lanyon; we shall not live to make others.” “Nothing can be done,” returned Lanyon; “ask himself.” “He will not see me,” said the lawyer. “I am not surprised at that,” was the reply. “Some day, Utterson, after I am dead, you may perhaps come to learn the right and wrong of this. I cannot tell you. And in the meantime, if you can sit and talk with me of other things, for God’s sake, stay and do so; but if you cannot keep clear of this accursed topic, then in God’s name, go, for I cannot bear it.” As soon as he got home, Utterson sat down and wrote to Jekyll, complaining of his exclusion from the house, and asking the cause of this unhappy break with Lanyon; and the next day brought him a long answer, often very pathetically worded, and sometimes darkly mysterious in drift. The quarrel with Lanyon was incurable. “I do not blame our old friend,” Jekyll wrote, “but I share his view that we must never meet. I mean from henceforth to lead a life of extreme seclusion; you must not be surprised, nor must you doubt my friendship, if my door is often shut even to you. You must suffer me to go my own dark way. I have brought on myself a punishment and a danger that I cannot name. If I am the chief of sinners, I am the chief of sufferers also. I could not think that this earth contained a place for sufferings and terrors so unmanning; and you can do but one thing, Utterson, to lighten this destiny, and that is to respect my silence.” Utterson was amazed; the dark influence of Hyde had been withdrawn, the doctor had returned to his old tasks and amities; a week ago, the prospect had smiled with every promise of a cheerful and an honoured age; and now in a moment, friendship, and peace of mind, and the whole tenor of his life were wrecked. So great and unprepared a change pointed to madness; but in view of Lanyon’s manner and words, there must lie for it some deeper ground. A week afterwards Dr. 
Lanyon took to his bed, and in something less than a fortnight he was dead. The night after the funeral, at which he had been sadly affected, Utterson locked the door of his business room, and sitting there by the light of a melancholy candle, drew out and set before him an envelope addressed by the hand and sealed with the seal of his dead friend. “PRIVATE: for the hands of G. J. Utterson ALONE, and in case of his predecease _to be destroyed unread_,” so it was emphatically superscribed; and the lawyer dreaded to behold the contents. “I have buried one friend to-day,” he thought: “what if this should cost me another?” And then he condemned the fear as a disloyalty, and broke the seal. Within there was another enclosure, likewise sealed, and marked upon the cover as “not to be opened till the death or disappearance of Dr. Henry Jekyll.” Utterson could not trust his eyes. Yes, it was disappearance; here again, as in the mad will which he had long ago restored to its author, here again were the idea of a disappearance and the name of Henry Jekyll bracketted. But in the will, that idea had sprung from the sinister suggestion of the man Hyde; it was set there with a purpose all too plain and horrible. Written by the hand of Lanyon, what should it mean? A great curiosity came on the trustee, to disregard the prohibition and dive at once to the bottom of these mysteries; but professional honour and faith to his dead friend were stringent obligations; and the packet slept in the inmost corner of his private safe. It is one thing to mortify curiosity, another to conquer it; and it may be doubted if, from that day forth, Utterson desired the society of his surviving friend with the same eagerness. He thought of him kindly; but his thoughts were disquieted and fearful. 
He went to call indeed; but he was perhaps relieved to be denied admittance; perhaps, in his heart, he preferred to speak with Poole upon the doorstep and surrounded by the air and sounds of the open city, rather than to be admitted into that house of voluntary bondage, and to sit and speak with its inscrutable recluse. Poole had, indeed, no very pleasant news to communicate. The doctor, it appeared, now more than ever confined himself to the cabinet over the laboratory, where he would sometimes even sleep; he was out of spirits, he had grown very silent, he did not read; it seemed as if he had something on his mind. Utterson became so used to the unvarying character of these reports, that he fell off little by little in the frequency of his visits.

INCIDENT AT THE WINDOW

It chanced on Sunday, when Mr. Utterson was on his usual walk with Mr. Enfield, that their way lay once again through the by-street; and that when they came in front of the door, both stopped to gaze on it. “Well,” said Enfield, “that story’s at an end at least. We shall never see more of Mr. Hyde.” “I hope not,” said Utterson. “Did I ever tell you that I once saw him, and shared your feeling of repulsion?” “It was impossible to do the one without the other,” returned Enfield. “And by the way, what an ass you must have thought me, not to know that this was a back way to Dr. Jekyll’s! It was partly your own fault that I found it out, even when I did.” “So you found it out, did you?” said Utterson. “But if that be so, we may step into the court and take a look at the windows. To tell you the truth, I am uneasy about poor Jekyll; and even outside, I feel as if the presence of a friend might do him good.” The court was very cool and a little damp, and full of premature twilight, although the sky, high up overhead, was still bright with sunset.
The middle one of the three windows was half-way open; and sitting close beside it, taking the air with an infinite sadness of mien, like some disconsolate prisoner, Utterson saw Dr. Jekyll. “What! Jekyll!” he cried. “I trust you are better.” “I am very low, Utterson,” replied the doctor drearily, “very low. It will not last long, thank God.” “You stay too much indoors,” said the lawyer. “You should be out, whipping up the circulation like Mr. Enfield and me. (This is my cousin—Mr. Enfield—Dr. Jekyll.) Come now; get your hat and take a quick turn with us.” “You are very good,” sighed the other. “I should like to very much; but no, no, no, it is quite impossible; I dare not. But indeed, Utterson, I am very glad to see you; this is really a great pleasure; I would ask you and Mr. Enfield up, but the place is really not fit.” “Why, then,” said the lawyer, good-naturedly, “the best thing we can do is to stay down here and speak with you from where we are.” “That is just what I was about to venture to propose,” returned the doctor with a smile. But the words were hardly uttered, before the smile was struck out of his face and succeeded by an expression of such abject terror and despair, as froze the very blood of the two gentlemen below. They saw it but for a glimpse, for the window was instantly thrust down; but that glimpse had been sufficient, and they turned and left the court without a word. In silence, too, they traversed the by-street; and it was not until they had come into a neighbouring thoroughfare, where even upon a Sunday there were still some stirrings of life, that Mr. Utterson at last turned and looked at his companion. They were both pale; and there was an answering horror in their eyes. “God forgive us, God forgive us,” said Mr. Utterson. But Mr. Enfield only nodded his head very seriously, and walked on once more in silence.

THE LAST NIGHT

Mr.
Utterson was sitting by his fireside one evening after dinner, when he was surprised to receive a visit from Poole. “Bless me, Poole, what brings you here?” he cried; and then taking a second look at him, “What ails you?” he added; “is the doctor ill?” “Mr. Utterson,” said the man, “there is something wrong.” “Take a seat, and here is a glass of wine for you,” said the lawyer. “Now, take your time, and tell me plainly what you want.” “You know the doctor’s ways, sir,” replied Poole, “and how he shuts himself up. Well, he’s shut up again in the cabinet; and I don’t like it, sir—I wish I may die if I like it. Mr. Utterson, sir, I’m afraid.” “Now, my good man,” said the lawyer, “be explicit. What are you afraid of?” “I’ve been afraid for about a week,” returned Poole, doggedly disregarding the question, “and I can bear it no more.” The man’s appearance amply bore out his words; his manner was altered for the worse; and except for the moment when he had first announced his terror, he had not once looked the lawyer in the face. Even now, he sat with the glass of wine untasted on his knee, and his eyes directed to a corner of the floor. “I can bear it no more,” he repeated. “Come,” said the lawyer, “I see you have some good reason, Poole; I see there is something seriously amiss. Try to tell me what it is.” “I think there’s been foul play,” said Poole, hoarsely. “Foul play!” cried the lawyer, a good deal frightened and rather inclined to be irritated in consequence. “What foul play! What does the man mean?” “I daren’t say, sir,” was the answer; “but will you come along with me and see for yourself?” Mr. Utterson’s only answer was to rise and get his hat and greatcoat; but he observed with wonder the greatness of the relief that appeared upon the butler’s face, and perhaps with no less, that the wine was still untasted when he set it down to follow. 
It was a wild, cold, seasonable night of March, with a pale moon, lying on her back as though the wind had tilted her, and flying wrack of the most diaphanous and lawny texture. The wind made talking difficult, and flecked the blood into the face. It seemed to have swept the streets unusually bare of passengers, besides; for Mr. Utterson thought he had never seen that part of London so deserted. He could have wished it otherwise; never in his life had he been conscious of so sharp a wish to see and touch his fellow-creatures; for struggle as he might, there was borne in upon his mind a crushing anticipation of calamity. The square, when they got there, was full of wind and dust, and the thin trees in the garden were lashing themselves along the railing. Poole, who had kept all the way a pace or two ahead, now pulled up in the middle of the pavement, and in spite of the biting weather, took off his hat and mopped his brow with a red pocket-handkerchief. But for all the hurry of his coming, these were not the dews of exertion that he wiped away, but the moisture of some strangling anguish; for his face was white and his voice, when he spoke, harsh and broken. “Well, sir,” he said, “here we are, and God grant there be nothing wrong.” “Amen, Poole,” said the lawyer. Thereupon the servant knocked in a very guarded manner; the door was opened on the chain; and a voice asked from within, “Is that you, Poole?” “It’s all right,” said Poole. “Open the door.” The hall, when they entered it, was brightly lighted up; the fire was built high; and about the hearth the whole of the servants, men and women, stood huddled together like a flock of sheep. At the sight of Mr. Utterson, the housemaid broke into hysterical whimpering; and the cook, crying out “Bless God! it’s Mr. Utterson,” ran forward as if to take him in her arms. “What, what? Are you all here?” said the lawyer peevishly. 
“Very irregular, very unseemly; your master would be far from pleased.” “They’re all afraid,” said Poole. Blank silence followed, no one protesting; only the maid lifted her voice and now wept loudly. “Hold your tongue!” Poole said to her, with a ferocity of accent that testified to his own jangled nerves; and indeed, when the girl had so suddenly raised the note of her lamentation, they had all started and turned towards the inner door with faces of dreadful expectation. “And now,” continued the butler, addressing the knife-boy, “reach me a candle, and we’ll get this through hands at once.” And then he begged Mr. Utterson to follow him, and led the way to the back garden. “Now, sir,” said he, “you come as gently as you can. I want you to hear, and I don’t want you to be heard. And see here, sir, if by any chance he was to ask you in, don’t go.” Mr. Utterson’s nerves, at this unlooked-for termination, gave a jerk that nearly threw him from his balance; but he recollected his courage and followed the butler into the laboratory building through the surgical theatre, with its lumber of crates and bottles, to the foot of the stair. Here Poole motioned him to stand on one side and listen; while he himself, setting down the candle and making a great and obvious call on his resolution, mounted the steps and knocked with a somewhat uncertain hand on the red baize of the cabinet door. “Mr. Utterson, sir, asking to see you,” he called; and even as he did so, once more violently signed to the lawyer to give ear. A voice answered from within: “Tell him I cannot see anyone,” it said complainingly. “Thank you, sir,” said Poole, with a note of something like triumph in his voice; and taking up his candle, he led Mr. Utterson back across the yard and into the great kitchen, where the fire was out and the beetles were leaping on the floor. “Sir,” he said, looking Mr. 
Utterson in the eyes, “was that my master’s voice?”

“It seems much changed,” replied the lawyer, very pale, but giving look for look.

“Changed? Well, yes, I think so,” said the butler. “Have I been twenty years in this man’s house, to be deceived about his voice? No, sir; master’s made away with; he was made away with eight days ago, when we heard him cry out upon the name of God; and _who’s_ in there instead of him, and _why_ it stays there, is a thing that cries to Heaven, Mr. Utterson!”

“This is a very strange tale, Poole; this is rather a wild tale, my man,” said Mr. Utterson, biting his finger. “Suppose it were as you suppose, supposing Dr. Jekyll to have been—well, murdered, what could induce the murderer to stay? That won’t hold water; it doesn’t commend itself to reason.”

“Well, Mr. Utterson, you are a hard man to satisfy, but I’ll do it yet,” said Poole. “All this last week (you must know) him, or it, whatever it is that lives in that cabinet, has been crying night and day for some sort of medicine and cannot get it to his mind. It was sometimes his way—the master’s, that is—to write his orders on a sheet of paper and throw it on the stair. We’ve had nothing else this week back; nothing but papers, and a closed door, and the very meals left there to be smuggled in when nobody was looking. Well, sir, every day, ay, and twice and thrice in the same day, there have been orders and complaints, and I have been sent flying to all the wholesale chemists in town. Every time I brought the stuff back, there would be another paper telling me to return it, because it was not pure, and another order to a different firm. This drug is wanted bitter bad, sir, whatever for.”

“Have you any of these papers?” asked Mr. Utterson.

Poole felt in his pocket and handed out a crumpled note, which the lawyer, bending nearer to the candle, carefully examined. Its contents ran thus: “Dr. Jekyll presents his compliments to Messrs. Maw.
He assures them that their last sample is impure and quite useless for his present purpose. In the year 18—, Dr. J. purchased a somewhat large quantity from Messrs. M. He now begs them to search with most sedulous care, and should any of the same quality be left, forward it to him at once. Expense is no consideration. The importance of this to Dr. J. can hardly be exaggerated.” So far the letter had run composedly enough, but here with a sudden splutter of the pen, the writer’s emotion had broken loose. “For God’s sake,” he added, “find me some of the old.”

“This is a strange note,” said Mr. Utterson; and then sharply, “How do you come to have it open?”

“The man at Maw’s was main angry, sir, and he threw it back to me like so much dirt,” returned Poole.

“This is unquestionably the doctor’s hand, do you know?” resumed the lawyer.

“I thought it looked like it,” said the servant rather sulkily; and then, with another voice, “But what matters hand of write?” he said. “I’ve seen him!”

“Seen him?” repeated Mr. Utterson. “Well?”

“That’s it!” said Poole. “It was this way. I came suddenly into the theatre from the garden. It seems he had slipped out to look for this drug or whatever it is; for the cabinet door was open, and there he was at the far end of the room digging among the crates. He looked up when I came in, gave a kind of cry, and whipped upstairs into the cabinet. It was but for one minute that I saw him, but the hair stood upon my head like quills. Sir, if that was my master, why had he a mask upon his face? If it was my master, why did he cry out like a rat, and run from me? I have served him long enough. And then...” The man paused and passed his hand over his face.

“These are all very strange circumstances,” said Mr. Utterson, “but I think I begin to see daylight.
Your master, Poole, is plainly seized with one of those maladies that both torture and deform the sufferer; hence, for aught I know, the alteration of his voice; hence the mask and the avoidance of his friends; hence his eagerness to find this drug, by means of which the poor soul retains some hope of ultimate recovery—God grant that he be not deceived! There is my explanation; it is sad enough, Poole, ay, and appalling to consider; but it is plain and natural, hangs well together, and delivers us from all exorbitant alarms.”

“Sir,” said the butler, turning to a sort of mottled pallor, “that thing was not my master, and there’s the truth. My master”—here he looked round him and began to whisper—“is a tall, fine build of a man, and this was more of a dwarf.”

Utterson attempted to protest. “O, sir,” cried Poole, “do you think I do not know my master after twenty years? Do you think I do not know where his head comes to in the cabinet door, where I saw him every morning of my life? No, sir, that thing in the mask was never Dr. Jekyll—God knows what it was, but it was never Dr. Jekyll; and it is the belief of my heart that there was murder done.”

“Poole,” replied the lawyer, “if you say that, it will become my duty to make certain. Much as I desire to spare your master’s feelings, much as I am puzzled by this note which seems to prove him to be still alive, I shall consider it my duty to break in that door.”

“Ah, Mr. Utterson, that’s talking!” cried the butler.

“And now comes the second question,” resumed Utterson: “Who is going to do it?”

“Why, you and me, sir,” was the undaunted reply.

“That’s very well said,” returned the lawyer; “and whatever comes of it, I shall make it my business to see you are no loser.”

“There is an axe in the theatre,” continued Poole; “and you might take the kitchen poker for yourself.”

The lawyer took that rude but weighty instrument into his hand, and balanced it.
“Do you know, Poole,” he said, looking up, “that you and I are about to place ourselves in a position of some peril?”

“You may say so, sir, indeed,” returned the butler.

“It is well, then, that we should be frank,” said the other. “We both think more than we have said; let us make a clean breast. This masked figure that you saw, did you recognise it?”

“Well, sir, it went so quick, and the creature was so doubled up, that I could hardly swear to that,” was the answer. “But if you mean, was it Mr. Hyde?—why, yes, I think it was! You see, it was much of the same bigness; and it had the same quick, light way with it; and then who else could have got in by the laboratory door? You have not forgot, sir, that at the time of the murder he had still the key with him? But that’s not all. I don’t know, Mr. Utterson, if you ever met this Mr. Hyde?”

“Yes,” said the lawyer, “I once spoke with him.”

“Then you must know as well as the rest of us that there was something queer about that gentleman—something that gave a man a turn—I don’t know rightly how to say it, sir, beyond this: that you felt in your marrow kind of cold and thin.”

“I own I felt something of what you describe,” said Mr. Utterson.

“Quite so, sir,” returned Poole. “Well, when that masked thing like a monkey jumped from among the chemicals and whipped into the cabinet, it went down my spine like ice. O, I know it’s not evidence, Mr. Utterson; I’m book-learned enough for that; but a man has his feelings, and I give you my bible-word it was Mr. Hyde!”

“Ay, ay,” said the lawyer. “My fears incline to the same point. Evil, I fear, founded—evil was sure to come—of that connection. Ay truly, I believe you; I believe poor Harry is killed; and I believe his murderer (for what purpose, God alone can tell) is still lurking in his victim’s room. Well, let our name be vengeance. Call Bradshaw.”

The footman came at the summons, very white and nervous.

“Pull yourself together, Bradshaw,” said the lawyer.
“This suspense, I know, is telling upon all of you; but it is now our intention to make an end of it. Poole, here, and I are going to force our way into the cabinet. If all is well, my shoulders are broad enough to bear the blame. Meanwhile, lest anything should really be amiss, or any malefactor seek to escape by the back, you and the boy must go round the corner with a pair of good sticks and take your post at the laboratory door. We give you ten minutes to get to your stations.”

As Bradshaw left, the lawyer looked at his watch. “And now, Poole, let us get to ours,” he said; and taking the poker under his arm, led the way into the yard. The scud had banked over the moon, and it was now quite dark. The wind, which only broke in puffs and draughts into that deep well of building, tossed the light of the candle to and fro about their steps, until they came into the shelter of the theatre, where they sat down silently to wait. London hummed solemnly all around; but nearer at hand, the stillness was only broken by the sounds of a footfall moving to and fro along the cabinet floor.

“So it will walk all day, sir,” whispered Poole; “ay, and the better part of the night. Only when a new sample comes from the chemist, there’s a bit of a break. Ah, it’s an ill conscience that’s such an enemy to rest! Ah, sir, there’s blood foully shed in every step of it! But hark again, a little closer—put your heart in your ears, Mr. Utterson, and tell me, is that the doctor’s foot?”

The steps fell lightly and oddly, with a certain swing, for all they went so slowly; it was different indeed from the heavy creaking tread of Henry Jekyll. Utterson sighed. “Is there never anything else?” he asked.

Poole nodded. “Once,” he said. “Once I heard it weeping!”

“Weeping? how that?” said the lawyer, conscious of a sudden chill of horror.

“Weeping like a woman or a lost soul,” said the butler. “I came away with that upon my heart, that I could have wept too.”

But now the ten minutes drew to an end.
Poole disinterred the axe from under a stack of packing straw; the candle was set upon the nearest table to light them to the attack; and they drew near with bated breath to where that patient foot was still going up and down, up and down, in the quiet of the night.

“Jekyll,” cried Utterson, with a loud voice, “I demand to see you.” He paused a moment, but there came no reply. “I give you fair warning, our suspicions are aroused, and I must and shall see you,” he resumed; “if not by fair means, then by foul—if not of your consent, then by brute force!”

“Utterson,” said the voice, “for God’s sake, have mercy!”

“Ah, that’s not Jekyll’s voice—it’s Hyde’s!” cried Utterson. “Down with the door, Poole!”

Poole swung the axe over his shoulder; the blow shook the building, and the red baize door leaped against the lock and hinges. A dismal screech, as of mere animal terror, rang from the cabinet. Up went the axe again, and again the panels crashed and the frame bounded; four times the blow fell; but the wood was tough and the fittings were of excellent workmanship; and it was not until the fifth, that the lock burst and the wreck of the door fell inwards on the carpet.

The besiegers, appalled by their own riot and the stillness that had succeeded, stood back a little and peered in. There lay the cabinet before their eyes in the quiet lamplight, a good fire glowing and chattering on the hearth, the kettle singing its thin strain, a drawer or two open, papers neatly set forth on the business table, and nearer the fire, the things laid out for tea; the quietest room, you would have said, and, but for the glazed presses full of chemicals, the most commonplace that night in London.

Right in the middle there lay the body of a man sorely contorted and still twitching. They drew near on tiptoe, turned it on its back and beheld the face of Edward Hyde.
He was dressed in clothes far too large for him, clothes of the doctor’s bigness; the cords of his face still moved with a semblance of life, but life was quite gone; and by the crushed phial in the hand and the strong smell of kernels that hung upon the air, Utterson knew that he was looking on the body of a self-destroyer.

“We have come too late,” he said sternly, “whether to save or punish. Hyde is gone to his account; and it only remains for us to find the body of your master.”

The far greater proportion of the building was occupied by the theatre, which filled almost the whole ground storey and was lighted from above, and by the cabinet, which formed an upper storey at one end and looked upon the court. A corridor joined the theatre to the door on the by-street; and with this the cabinet communicated separately by a second flight of stairs. There were besides a few dark closets and a spacious cellar. All these they now thoroughly examined. Each closet needed but a glance, for all were empty, and all, by the dust that fell from their doors, had stood long unopened. The cellar, indeed, was filled with crazy lumber, mostly dating from the times of the surgeon who was Jekyll’s predecessor; but even as they opened the door they were advertised of the uselessness of further search, by the fall of a perfect mat of cobweb which had for years sealed up the entrance. Nowhere was there any trace of Henry Jekyll, dead or alive.

Poole stamped on the flags of the corridor. “He must be buried here,” he said, hearkening to the sound.

“Or he may have fled,” said Utterson, and he turned to examine the door in the by-street. It was locked; and lying near by on the flags, they found the key, already stained with rust.

“This does not look like use,” observed the lawyer.

“Use!” echoed Poole. “Do you not see, sir, it is broken? much as if a man had stamped on it.”

“Ay,” continued Utterson, “and the fractures, too, are rusty.” The two men looked at each other with a scare.

“This is beyond me, Poole,” said the lawyer. “Let us go back to the cabinet.”

They mounted the stair in silence, and still with an occasional awestruck glance at the dead body, proceeded more thoroughly to examine the contents of the cabinet. At one table, there were traces of chemical work, various measured heaps of some white salt being laid on glass saucers, as though for an experiment in which the unhappy man had been prevented.

“That is the same drug that I was always bringing him,” said Poole; and even as he spoke, the kettle with a startling noise boiled over.

This brought them to the fireside, where the easy-chair was drawn cosily up, and the tea things stood ready to the sitter’s elbow, the very sugar in the cup. There were several books on a shelf; one lay beside the tea things open, and Utterson was amazed to find it a copy of a pious work, for which Jekyll had several times expressed a great esteem, annotated, in his own hand, with startling blasphemies.

Next, in the course of their review of the chamber, the searchers came to the cheval-glass, into whose depths they looked with an involuntary horror. But it was so turned as to show them nothing but the rosy glow playing on the roof, the fire sparkling in a hundred repetitions along the glazed front of the presses, and their own pale and fearful countenances stooping to look in.

“This glass has seen some strange things, sir,” whispered Poole.

“And surely none stranger than itself,” echoed the lawyer in the same tones. “For what did Jekyll”—he caught himself up at the word with a start, and then conquering the weakness—“what could Jekyll want with it?” he said.

“You may say that!” said Poole.

Next they turned to the business table. On the desk, among the neat array of papers, a large envelope was uppermost, and bore, in the doctor’s hand, the name of Mr. Utterson. The lawyer unsealed it, and several enclosures fell to the floor.
The first was a will, drawn in the same eccentric terms as the one which he had returned six months before, to serve as a testament in case of death and as a deed of gift in case of disappearance; but in place of the name of Edward Hyde, the lawyer, with indescribable amazement, read the name of Gabriel John Utterson. He looked at Poole, and then back at the paper, and last of all at the dead malefactor stretched upon the carpet.

“My head goes round,” he said. “He has been all these days in possession; he had no cause to like me; he must have raged to see himself displaced; and he has not destroyed this document.”

He caught up the next paper; it was a brief note in the doctor’s hand and dated at the top. “O Poole!” the lawyer cried, “he was alive and here this day. He cannot have been disposed of in so short a space; he must be still alive, he must have fled! And then, why fled? and how? and in that case, can we venture to declare this suicide? O, we must be careful. I foresee that we may yet involve your master in some dire catastrophe.”

“Why don’t you read it, sir?” asked Poole.

“Because I fear,” replied the lawyer solemnly. “God grant I have no cause for it!” And with that he brought the paper to his eyes and read as follows:

“My dear Utterson,—When this shall fall into your hands, I shall have disappeared, under what circumstances I have not the penetration to foresee, but my instinct and all the circumstances of my nameless situation tell me that the end is sure and must be early. Go then, and first read the narrative which Lanyon warned me he was to place in your hands; and if you care to hear more, turn to the confession of

“Your unworthy and unhappy friend,

“HENRY JEKYLL.”

“There was a third enclosure?” asked Utterson.

“Here, sir,” said Poole, and gave into his hands a considerable packet sealed in several places.

The lawyer put it in his pocket. “I would say nothing of this paper. If your master has fled or is dead, we may at least save his credit.
It is now ten; I must go home and read these documents in quiet; but I shall be back before midnight, when we shall send for the police.”

They went out, locking the door of the theatre behind them; and Utterson, once more leaving the servants gathered about the fire in the hall, trudged back to his office to read the two narratives in which this mystery was now to be explained.

DR. LANYON’S NARRATIVE

On the ninth of January, now four days ago, I received by the evening delivery a registered envelope, addressed in the hand of my colleague and old school companion, Henry Jekyll. I was a good deal surprised by this; for we were by no means in the habit of correspondence; I had seen the man, dined with him, indeed, the night before; and I could imagine nothing in our intercourse that should justify formality of registration. The contents increased my wonder; for this is how the letter ran:

“10_th December_, 18—.

“Dear Lanyon,—You are one of my oldest friends; and although we may have differed at times on scientific questions, I cannot remember, at least on my side, any break in our affection. There was never a day when, if you had said to me, ‘Jekyll, my life, my honour, my reason, depend upon you,’ I would not have sacrificed my left hand to help you. Lanyon, my life, my honour, my reason, are all at your mercy; if you fail me to-night, I am lost. You might suppose, after this preface, that I am going to ask you for something dishonourable to grant. Judge for yourself.

“I want you to postpone all other engagements for to-night—ay, even if you were summoned to the bedside of an emperor; to take a cab, unless your carriage should be actually at the door; and with this letter in your hand for consultation, to drive straight to my house. Poole, my butler, has his orders; you will find him waiting your arrival with a locksmith.
The door of my cabinet is then to be forced; and you are to go in alone; to open the glazed press (letter E) on the left hand, breaking the lock if it be shut; and to draw out, _with all its contents as they stand_, the fourth drawer from the top or (which is the same thing) the third from the bottom. In my extreme distress of mind, I have a morbid fear of misdirecting you; but even if I am in error, you may know the right drawer by its contents: some powders, a phial and a paper book. This drawer I beg of you to carry back with you to Cavendish Square exactly as it stands.

“That is the first part of the service: now for the second. You should be back, if you set out at once on the receipt of this, long before midnight; but I will leave you that amount of margin, not only in the fear of one of those obstacles that can neither be prevented nor foreseen, but because an hour when your servants are in bed is to be preferred for what will then remain to do. At midnight, then, I have to ask you to be alone in your consulting room, to admit with your own hand into the house a man who will present himself in my name, and to place in his hands the drawer that you will have brought with you from my cabinet. Then you will have played your part and earned my gratitude completely. Five minutes afterwards, if you insist upon an explanation, you will have understood that these arrangements are of capital importance; and that by the neglect of one of them, fantastic as they must appear, you might have charged your conscience with my death or the shipwreck of my reason.

“Confident as I am that you will not trifle with this appeal, my heart sinks and my hand trembles at the bare thought of such a possibility. Think of me at this hour, in a strange place, labouring under a blackness of distress that no fancy can exaggerate, and yet well aware that, if you will but punctually serve me, my troubles will roll away like a story that is told.
Serve me, my dear Lanyon and save

“Your friend,

“H.J.

“P.S.—I had already sealed this up when a fresh terror struck upon my soul. It is possible that the post-office may fail me, and this letter not come into your hands until to-morrow morning. In that case, dear Lanyon, do my errand when it shall be most convenient for you in the course of the day; and once more expect my messenger at midnight. It may then already be too late; and if that night passes without event, you will know that you have seen the last of Henry Jekyll.”

Upon the reading of this letter, I made sure my colleague was insane; but till that was proved beyond the possibility of doubt, I felt bound to do as he requested. The less I understood of this farrago, the less I was in a position to judge of its importance; and an appeal so worded could not be set aside without a grave responsibility. I rose accordingly from table, got into a hansom, and drove straight to Jekyll’s house. The butler was awaiting my arrival; he had received by the same post as mine a registered letter of instruction, and had sent at once for a locksmith and a carpenter. The tradesmen came while we were yet speaking; and we moved in a body to old Dr. Denman’s surgical theatre, from which (as you are doubtless aware) Jekyll’s private cabinet is most conveniently entered. The door was very strong, the lock excellent; the carpenter avowed he would have great trouble and have to do much damage, if force were to be used; and the locksmith was near despair. But this last was a handy fellow, and after two hours’ work, the door stood open. The press marked E was unlocked; and I took out the drawer, had it filled up with straw and tied in a sheet, and returned with it to Cavendish Square.

Here I proceeded to examine its contents.
The powders were neatly enough made up, but not with the nicety of the dispensing chemist; so that it was plain they were of Jekyll’s private manufacture; and when I opened one of the wrappers I found what seemed to me a simple crystalline salt of a white colour.

The phial, to which I next turned my attention, might have been about half full of a blood-red liquor, which was highly pungent to the sense of smell and seemed to me to contain phosphorus and some volatile ether. At the other ingredients I could make no guess. The book was an ordinary version book and contained little but a series of dates. These covered a period of many years, but I observed that the entries ceased nearly a year ago and quite abruptly. Here and there a brief remark was appended to a date, usually no more than a single word: “double” occurring perhaps six times in a total of several hundred entries; and once very early in the list and followed by several marks of exclamation, “total failure!!!” All this, though it whetted my curiosity, told me little that was definite. Here were a phial of some tincture, a paper of some salt, and the record of a series of experiments that had led (like too many of Jekyll’s investigations) to no end of practical usefulness. How could the presence of these articles in my house affect either the honour, the sanity, or the life of my flighty colleague? If his messenger could go to one place, why could he not go to another? And even granting some impediment, why was this gentleman to be received by me in secret? The more I reflected the more convinced I grew that I was dealing with a case of cerebral disease; and though I dismissed my servants to bed, I loaded an old revolver, that I might be found in some posture of self-defence.

Twelve o’clock had scarce rung out over London, ere the knocker sounded very gently on the door. I went myself at the summons, and found a small man crouching against the pillars of the portico.

“Are you come from Dr. Jekyll?” I asked.
He told me “yes” by a constrained gesture; and when I had bidden him enter, he did not obey me without a searching backward glance into the darkness of the square. There was a policeman not far off, advancing with his bull’s eye open; and at the sight, I thought my visitor started and made greater haste.

These particulars struck me, I confess, disagreeably; and as I followed him into the bright light of the consulting room, I kept my hand ready on my weapon. Here, at last, I had a chance of clearly seeing him. I had never set eyes on him before, so much was certain. He was small, as I have said; I was struck besides with the shocking expression of his face, with his remarkable combination of great muscular activity and great apparent debility of constitution, and—last but not least—with the odd, subjective disturbance caused by his neighbourhood. This bore some resemblance to incipient rigour, and was accompanied by a marked sinking of the pulse. At the time, I set it down to some idiosyncratic, personal distaste, and merely wondered at the acuteness of the symptoms; but I have since had reason to believe the cause to lie much deeper in the nature of man, and to turn on some nobler hinge than the principle of hatred.

This person (who had thus, from the first moment of his entrance, struck in me what I can only describe as a disgustful curiosity) was dressed in a fashion that would have made an ordinary person laughable; his clothes, that is to say, although they were of rich and sober fabric, were enormously too large for him in every measurement—the trousers hanging on his legs and rolled up to keep them from the ground, the waist of the coat below his haunches, and the collar sprawling wide upon his shoulders. Strange to relate, this ludicrous accoutrement was far from moving me to laughter.
Rather, as there was something abnormal and misbegotten in the very essence of the creature that now faced me—something seizing, surprising and revolting—this fresh disparity seemed but to fit in with and to reinforce it; so that to my interest in the man’s nature and character, there was added a curiosity as to his origin, his life, his fortune and status in the world.

These observations, though they have taken so great a space to be set down in, were yet the work of a few seconds. My visitor was, indeed, on fire with sombre excitement.

“Have you got it?” he cried. “Have you got it?” And so lively was his impatience that he even laid his hand upon my arm and sought to shake me.

I put him back, conscious at his touch of a certain icy pang along my blood. “Come, sir,” said I. “You forget that I have not yet the pleasure of your acquaintance. Be seated, if you please.” And I showed him an example, and sat down myself in my customary seat and with as fair an imitation of my ordinary manner to a patient, as the lateness of the hour, the nature of my preoccupations, and the horror I had of my visitor, would suffer me to muster.

“I beg your pardon, Dr. Lanyon,” he replied civilly enough. “What you say is very well founded; and my impatience has shown its heels to my politeness. I come here at the instance of your colleague, Dr. Henry Jekyll, on a piece of business of some moment; and I understood...” He paused and put his hand to his throat, and I could see, in spite of his collected manner, that he was wrestling against the approaches of the hysteria—“I understood, a drawer...”

But here I took pity on my visitor’s suspense, and some perhaps on my own growing curiosity.

“There it is, sir,” said I, pointing to the drawer, where it lay on the floor behind a table and still covered with the sheet.

He sprang to it, and then paused, and laid his hand upon his heart; I could hear his teeth grate with the convulsive action of his jaws; and his face was so ghastly to see that I grew alarmed both for his life and reason.

“Compose yourself,” said I.

He turned a dreadful smile to me, and as if with the decision of despair, plucked away the sheet. At sight of the contents, he uttered one loud sob of such immense relief that I sat petrified. And the next moment, in a voice that was already fairly well under control, “Have you a graduated glass?” he asked.

I rose from my place with something of an effort and gave him what he asked.

He thanked me with a smiling nod, measured out a few minims of the red tincture and added one of the powders. The mixture, which was at first of a reddish hue, began, in proportion as the crystals melted, to brighten in colour, to effervesce audibly, and to throw off small fumes of vapour. Suddenly and at the same moment, the ebullition ceased and the compound changed to a dark purple, which faded again more slowly to a watery green. My visitor, who had watched these metamorphoses with a keen eye, smiled, set down the glass upon the table, and then turned and looked upon me with an air of scrutiny.

“And now,” said he, “to settle what remains. Will you be wise? will you be guided? will you suffer me to take this glass in my hand and to go forth from your house without further parley? or has the greed of curiosity too much command of you? Think before you answer, for it shall be done as you decide. As you decide, you shall be left as you were before, and neither richer nor wiser, unless the sense of service rendered to a man in mortal distress may be counted as a kind of riches of the soul.
Or, if you shall so prefer to choose, a new province of knowledge and new avenues to fame and power shall be laid open to you, here, in this room, upon the instant; and your sight shall be blasted by a prodigy to stagger the unbelief of Satan.”

“Sir,” said I, affecting a coolness that I was far from truly possessing, “you speak enigmas, and you will perhaps not wonder that I hear you with no very strong impression of belief. But I have gone too far in the way of inexplicable services to pause before I see the end.”

“It is well,” replied my visitor. “Lanyon, you remember your vows: what follows is under the seal of our profession. And now, you who have so long been bound to the most narrow and material views, you who have denied the virtue of transcendental medicine, you who have derided your superiors—behold!”

He put the glass to his lips and drank at one gulp. A cry followed; he reeled, staggered, clutched at the table and held on, staring with injected eyes, gasping with open mouth; and as I looked there came, I thought, a change—he seemed to swell—his face became suddenly black and the features seemed to melt and alter—and the next moment, I had sprung to my feet and leaped back against the wall, my arms raised to shield me from that prodigy, my mind submerged in terror.

“O God!” I screamed, and “O God!” again and again; for there before my eyes—pale and shaken, and half fainting, and groping before him with his hands, like a man restored from death—there stood Henry Jekyll!

What he told me in the next hour, I cannot bring my mind to set on paper. I saw what I saw, I heard what I heard, and my soul sickened at it; and yet now when that sight has faded from my eyes, I ask myself if I believe it, and I cannot answer. My life is shaken to its roots; sleep has left me; the deadliest terror sits by me at all hours of the day and night; and I feel that my days are numbered, and that I must die; and yet I shall die incredulous.
As for the moral turpitude that man unveiled to me, even with tears of penitence, I cannot, even in memory, dwell on it without a start of horror. I will say but one thing, Utterson, and that (if you can bring your mind to credit it) will be more than enough. The creature who crept into my house that night was, on Jekyll’s own confession, known by the name of Hyde and hunted for in every corner of the land as the murderer of Carew.

HASTIE LANYON.

HENRY JEKYLL’S FULL STATEMENT OF THE CASE

I was born in the year 18— to a large fortune, endowed besides with excellent parts, inclined by nature to industry, fond of the respect of the wise and good among my fellowmen, and thus, as might have been supposed, with every guarantee of an honourable and distinguished future. And indeed the worst of my faults was a certain impatient gaiety of disposition, such as has made the happiness of many, but such as I found it hard to reconcile with my imperious desire to carry my head high, and wear a more than commonly grave countenance before the public. Hence it came about that I concealed my pleasures; and that when I reached years of reflection, and began to look round me and take stock of my progress and position in the world, I stood already committed to a profound duplicity of life. Many a man would have even blazoned such irregularities as I was guilty of; but from the high views that I had set before me, I regarded and hid them with an almost morbid sense of shame. It was thus rather the exacting nature of my aspirations than any particular degradation in my faults, that made me what I was, and, with even a deeper trench than in the majority of men, severed in me those provinces of good and ill which divide and compound man’s dual nature. In this case, I was driven to reflect deeply and inveterately on that hard law of life, which lies at the root of religion and is one of the most plentiful springs of distress.
Though so profound a double-dealer, I was in no sense a hypocrite; both sides of me were in dead earnest; I was no more myself when I laid aside restraint and plunged in shame, than when I laboured, in the eye of day, at the furtherance of knowledge or the relief of sorrow and suffering. And it chanced that the direction of my scientific studies, which led wholly towards the mystic and the transcendental, reacted and shed a strong light on this consciousness of the perennial war among my members. With every day, and from both sides of my intelligence, the moral and the intellectual, I thus drew steadily nearer to that truth, by whose partial discovery I have been doomed to such a dreadful shipwreck: that man is not truly one, but truly two. I say two, because the state of my own knowledge does not pass beyond that point. Others will follow, others will outstrip me on the same lines; and I hazard the guess that man will be ultimately known for a mere polity of multifarious, incongruous and independent denizens. I, for my part, from the nature of my life, advanced infallibly in one direction and in one direction only. It was on the moral side, and in my own person, that I learned to recognise the thorough and primitive duality of man; I saw that, of the two natures that contended in the field of my consciousness, even if I could rightly be said to be either, it was only because I was radically both; and from an early date, even before the course of my scientific discoveries had begun to suggest the most naked possibility of such a miracle, I had learned to dwell with pleasure, as a beloved daydream, on the thought of the separation of these elements. 
If each, I told myself, could be housed in separate identities, life would be relieved of all that was unbearable; the unjust might go his way, delivered from the aspirations and remorse of his more upright twin; and the just could walk steadfastly and securely on his upward path, doing the good things in which he found his pleasure, and no longer exposed to disgrace and penitence by the hands of this extraneous evil. It was the curse of mankind that these incongruous faggots were thus bound together—that in the agonised womb of consciousness, these polar twins should be continuously struggling. How, then, were they dissociated? I was so far in my reflections when, as I have said, a side light began to shine upon the subject from the laboratory table. I began to perceive more deeply than it has ever yet been stated, the trembling immateriality, the mistlike transience, of this seemingly so solid body in which we walk attired. Certain agents I found to have the power to shake and pluck back that fleshly vestment, even as a wind might toss the curtains of a pavilion. For two good reasons, I will not enter deeply into this scientific branch of my confession. First, because I have been made to learn that the doom and burthen of our life is bound for ever on man’s shoulders, and when the attempt is made to cast it off, it but returns upon us with more unfamiliar and more awful pressure. Second, because, as my narrative will make, alas! too evident, my discoveries were incomplete. Enough, then, that I not only recognised my natural body for the mere aura and effulgence of certain of the powers that made up my spirit, but managed to compound a drug by which these powers should be dethroned from their supremacy, and a second form and countenance substituted, none the less natural to me because they were the expression, and bore the stamp of lower elements in my soul. I hesitated long before I put this theory to the test of practice.
I knew well that I risked death; for any drug that so potently controlled and shook the very fortress of identity, might, by the least scruple of an overdose or at the least inopportunity in the moment of exhibition, utterly blot out that immaterial tabernacle which I looked to it to change. But the temptation of a discovery so singular and profound at last overcame the suggestions of alarm. I had long since prepared my tincture; I purchased at once, from a firm of wholesale chemists, a large quantity of a particular salt which I knew, from my experiments, to be the last ingredient required; and late one accursed night, I compounded the elements, watched them boil and smoke together in the glass, and when the ebullition had subsided, with a strong glow of courage, drank off the potion. The most racking pangs succeeded: a grinding in the bones, deadly nausea, and a horror of the spirit that cannot be exceeded at the hour of birth or death. Then these agonies began swiftly to subside, and I came to myself as if out of a great sickness. There was something strange in my sensations, something indescribably new and, from its very novelty, incredibly sweet. I felt younger, lighter, happier in body; within I was conscious of a heady recklessness, a current of disordered sensual images running like a millrace in my fancy, a solution of the bonds of obligation, an unknown but not an innocent freedom of the soul. I knew myself, at the first breath of this new life, to be more wicked, tenfold more wicked, sold a slave to my original evil; and the thought, in that moment, braced and delighted me like wine. I stretched out my hands, exulting in the freshness of these sensations; and in the act, I was suddenly aware that I had lost in stature. There was no mirror, at that date, in my room; that which stands beside me as I write, was brought there later on and for the very purpose of these transformations. 
The night however, was far gone into the morning—the morning, black as it was, was nearly ripe for the conception of the day—the inmates of my house were locked in the most rigorous hours of slumber; and I determined, flushed as I was with hope and triumph, to venture in my new shape as far as to my bedroom. I crossed the yard, wherein the constellations looked down upon me, I could have thought, with wonder, the first creature of that sort that their unsleeping vigilance had yet disclosed to them; I stole through the corridors, a stranger in my own house; and coming to my room, I saw for the first time the appearance of Edward Hyde. I must here speak by theory alone, saying not that which I know, but that which I suppose to be most probable. The evil side of my nature, to which I had now transferred the stamping efficacy, was less robust and less developed than the good which I had just deposed. Again, in the course of my life, which had been, after all, nine tenths a life of effort, virtue and control, it had been much less exercised and much less exhausted. And hence, as I think, it came about that Edward Hyde was so much smaller, slighter and younger than Henry Jekyll. Even as good shone upon the countenance of the one, evil was written broadly and plainly on the face of the other. Evil besides (which I must still believe to be the lethal side of man) had left on that body an imprint of deformity and decay. And yet when I looked upon that ugly idol in the glass, I was conscious of no repugnance, rather of a leap of welcome. This, too, was myself. It seemed natural and human. In my eyes it bore a livelier image of the spirit, it seemed more express and single, than the imperfect and divided countenance I had been hitherto accustomed to call mine. And in so far I was doubtless right. I have observed that when I wore the semblance of Edward Hyde, none could come near to me at first without a visible misgiving of the flesh. 
This, as I take it, was because all human beings, as we meet them, are commingled out of good and evil: and Edward Hyde, alone in the ranks of mankind, was pure evil. I lingered but a moment at the mirror: the second and conclusive experiment had yet to be attempted; it yet remained to be seen if I had lost my identity beyond redemption and must flee before daylight from a house that was no longer mine; and hurrying back to my cabinet, I once more prepared and drank the cup, once more suffered the pangs of dissolution, and came to myself once more with the character, the stature and the face of Henry Jekyll. That night I had come to the fatal cross-roads. Had I approached my discovery in a more noble spirit, had I risked the experiment while under the empire of generous or pious aspirations, all must have been otherwise, and from these agonies of death and birth, I had come forth an angel instead of a fiend. The drug had no discriminating action; it was neither diabolical nor divine; it but shook the doors of the prisonhouse of my disposition; and like the captives of Philippi, that which stood within ran forth. At that time my virtue slumbered; my evil, kept awake by ambition, was alert and swift to seize the occasion; and the thing that was projected was Edward Hyde. Hence, although I had now two characters as well as two appearances, one was wholly evil, and the other was still the old Henry Jekyll, that incongruous compound of whose reformation and improvement I had already learned to despair. The movement was thus wholly toward the worse. Even at that time, I had not conquered my aversions to the dryness of a life of study. I would still be merrily disposed at times; and as my pleasures were (to say the least) undignified, and I was not only well known and highly considered, but growing towards the elderly man, this incoherency of my life was daily growing more unwelcome. It was on this side that my new power tempted me until I fell in slavery. 
I had but to drink the cup, to doff at once the body of the noted professor, and to assume, like a thick cloak, that of Edward Hyde. I smiled at the notion; it seemed to me at the time to be humorous; and I made my preparations with the most studious care. I took and furnished that house in Soho, to which Hyde was tracked by the police; and engaged as a housekeeper a creature whom I knew well to be silent and unscrupulous. On the other side, I announced to my servants that a Mr. Hyde (whom I described) was to have full liberty and power about my house in the square; and to parry mishaps, I even called and made myself a familiar object, in my second character. I next drew up that will to which you so much objected; so that if anything befell me in the person of Dr. Jekyll, I could enter on that of Edward Hyde without pecuniary loss. And thus fortified, as I supposed, on every side, I began to profit by the strange immunities of my position. Men have before hired bravos to transact their crimes, while their own person and reputation sat under shelter. I was the first that ever did so for his pleasures. I was the first that could plod in the public eye with a load of genial respectability, and in a moment, like a schoolboy, strip off these lendings and spring headlong into the sea of liberty. But for me, in my impenetrable mantle, the safety was complete. Think of it—I did not even exist! Let me but escape into my laboratory door, give me but a second or two to mix and swallow the draught that I had always standing ready; and whatever he had done, Edward Hyde would pass away like the stain of breath upon a mirror; and there in his stead, quietly at home, trimming the midnight lamp in his study, a man who could afford to laugh at suspicion, would be Henry Jekyll. The pleasures which I made haste to seek in my disguise were, as I have said, undignified; I would scarce use a harder term. But in the hands of Edward Hyde, they soon began to turn toward the monstrous.
When I would come back from these excursions, I was often plunged into a kind of wonder at my vicarious depravity. This familiar that I called out of my own soul, and sent forth alone to do his good pleasure, was a being inherently malign and villainous; his every act and thought centered on self; drinking pleasure with bestial avidity from any degree of torture to another; relentless like a man of stone. Henry Jekyll stood at times aghast before the acts of Edward Hyde; but the situation was apart from ordinary laws, and insidiously relaxed the grasp of conscience. It was Hyde, after all, and Hyde alone, that was guilty. Jekyll was no worse; he woke again to his good qualities seemingly unimpaired; he would even make haste, where it was possible, to undo the evil done by Hyde. And thus his conscience slumbered. Into the details of the infamy at which I thus connived (for even now I can scarce grant that I committed it) I have no design of entering; I mean but to point out the warnings and the successive steps with which my chastisement approached. I met with one accident which, as it brought on no consequence, I shall no more than mention. An act of cruelty to a child aroused against me the anger of a passer-by, whom I recognised the other day in the person of your kinsman; the doctor and the child’s family joined him; there were moments when I feared for my life; and at last, in order to pacify their too just resentment, Edward Hyde had to bring them to the door, and pay them in a cheque drawn in the name of Henry Jekyll. But this danger was easily eliminated from the future, by opening an account at another bank in the name of Edward Hyde himself; and when, by sloping my own hand backward, I had supplied my double with a signature, I thought I sat beyond the reach of fate. Some two months before the murder of Sir Danvers, I had been out for one of my adventures, had returned at a late hour, and woke the next day in bed with somewhat odd sensations. 
It was in vain I looked about me; in vain I saw the decent furniture and tall proportions of my room in the square; in vain that I recognised the pattern of the bed curtains and the design of the mahogany frame; something still kept insisting that I was not where I was, that I had not wakened where I seemed to be, but in the little room in Soho where I was accustomed to sleep in the body of Edward Hyde. I smiled to myself, and in my psychological way, began lazily to inquire into the elements of this illusion, occasionally, even as I did so, dropping back into a comfortable morning doze. I was still so engaged when, in one of my more wakeful moments, my eyes fell upon my hand. Now the hand of Henry Jekyll (as you have often remarked) was professional in shape and size; it was large, firm, white and comely. But the hand which I now saw, clearly enough, in the yellow light of a mid-London morning, lying half shut on the bedclothes, was lean, corded, knuckly, of a dusky pallor and thickly shaded with a swart growth of hair. It was the hand of Edward Hyde. I must have stared upon it for near half a minute, sunk as I was in the mere stupidity of wonder, before terror woke up in my breast as sudden and startling as the crash of cymbals; and bounding from my bed I rushed to the mirror. At the sight that met my eyes, my blood was changed into something exquisitely thin and icy. Yes, I had gone to bed Henry Jekyll, I had awakened Edward Hyde. How was this to be explained? I asked myself; and then, with another bound of terror—how was it to be remedied? It was well on in the morning; the servants were up; all my drugs were in the cabinet—a long journey down two pairs of stairs, through the back passage, across the open court and through the anatomical theatre, from where I was then standing horror-struck. It might indeed be possible to cover my face; but of what use was that, when I was unable to conceal the alteration in my stature? 
And then with an overpowering sweetness of relief, it came back upon my mind that the servants were already used to the coming and going of my second self. I had soon dressed, as well as I was able, in clothes of my own size: had soon passed through the house, where Bradshaw stared and drew back at seeing Mr. Hyde at such an hour and in such a strange array; and ten minutes later, Dr. Jekyll had returned to his own shape and was sitting down, with a darkened brow, to make a feint of breakfasting. Small indeed was my appetite. This inexplicable incident, this reversal of my previous experience, seemed, like the Babylonian finger on the wall, to be spelling out the letters of my judgment; and I began to reflect more seriously than ever before on the issues and possibilities of my double existence. That part of me which I had the power of projecting, had lately been much exercised and nourished; it had seemed to me of late as though the body of Edward Hyde had grown in stature, as though (when I wore that form) I were conscious of a more generous tide of blood; and I began to spy a danger that, if this were much prolonged, the balance of my nature might be permanently overthrown, the power of voluntary change be forfeited, and the character of Edward Hyde become irrevocably mine. The power of the drug had not been always equally displayed. Once, very early in my career, it had totally failed me; since then I had been obliged on more than one occasion to double, and once, with infinite risk of death, to treble the amount; and these rare uncertainties had cast hitherto the sole shadow on my contentment. Now, however, and in the light of that morning’s accident, I was led to remark that whereas, in the beginning, the difficulty had been to throw off the body of Jekyll, it had of late gradually but decidedly transferred itself to the other side. 
All things therefore seemed to point to this; that I was slowly losing hold of my original and better self, and becoming slowly incorporated with my second and worse. Between these two, I now felt I had to choose. My two natures had memory in common, but all other faculties were most unequally shared between them. Jekyll (who was composite) now with the most sensitive apprehensions, now with a greedy gusto, projected and shared in the pleasures and adventures of Hyde; but Hyde was indifferent to Jekyll, or but remembered him as the mountain bandit remembers the cavern in which he conceals himself from pursuit. Jekyll had more than a father’s interest; Hyde had more than a son’s indifference. To cast in my lot with Jekyll, was to die to those appetites which I had long secretly indulged and had of late begun to pamper. To cast it in with Hyde, was to die to a thousand interests and aspirations, and to become, at a blow and forever, despised and friendless. The bargain might appear unequal; but there was still another consideration in the scales; for while Jekyll would suffer smartingly in the fires of abstinence, Hyde would be not even conscious of all that he had lost. Strange as my circumstances were, the terms of this debate are as old and commonplace as man; much the same inducements and alarms cast the die for any tempted and trembling sinner; and it fell out with me, as it falls with so vast a majority of my fellows, that I chose the better part and was found wanting in the strength to keep to it. Yes, I preferred the elderly and discontented doctor, surrounded by friends and cherishing honest hopes; and bade a resolute farewell to the liberty, the comparative youth, the light step, leaping impulses and secret pleasures, that I had enjoyed in the disguise of Hyde. I made this choice perhaps with some unconscious reservation, for I neither gave up the house in Soho, nor destroyed the clothes of Edward Hyde, which still lay ready in my cabinet. 
For two months, however, I was true to my determination; for two months, I led a life of such severity as I had never before attained to, and enjoyed the compensations of an approving conscience. But time began at last to obliterate the freshness of my alarm; the praises of conscience began to grow into a thing of course; I began to be tortured with throes and longings, as of Hyde struggling after freedom; and at last, in an hour of moral weakness, I once again compounded and swallowed the transforming draught. I do not suppose that, when a drunkard reasons with himself upon his vice, he is once out of five hundred times affected by the dangers that he runs through his brutish, physical insensibility; neither had I, long as I had considered my position, made enough allowance for the complete moral insensibility and insensate readiness to evil, which were the leading characters of Edward Hyde. Yet it was by these that I was punished. My devil had been long caged, he came out roaring. I was conscious, even when I took the draught, of a more unbridled, a more furious propensity to ill. It must have been this, I suppose, that stirred in my soul that tempest of impatience with which I listened to the civilities of my unhappy victim; I declare, at least, before God, no man morally sane could have been guilty of that crime upon so pitiful a provocation; and that I struck in no more reasonable spirit than that in which a sick child may break a plaything. But I had voluntarily stripped myself of all those balancing instincts by which even the worst of us continues to walk with some degree of steadiness among temptations; and in my case, to be tempted, however slightly, was to fall. Instantly the spirit of hell awoke in me and raged. 
With a transport of glee, I mauled the unresisting body, tasting delight from every blow; and it was not till weariness had begun to succeed, that I was suddenly, in the top fit of my delirium, struck through the heart by a cold thrill of terror. A mist dispersed; I saw my life to be forfeit; and fled from the scene of these excesses, at once glorying and trembling, my lust of evil gratified and stimulated, my love of life screwed to the topmost peg. I ran to the house in Soho, and (to make assurance doubly sure) destroyed my papers; thence I set out through the lamplit streets, in the same divided ecstasy of mind, gloating on my crime, light-headedly devising others in the future, and yet still hastening and still hearkening in my wake for the steps of the avenger. Hyde had a song upon his lips as he compounded the draught, and as he drank it, pledged the dead man. The pangs of transformation had not done tearing him, before Henry Jekyll, with streaming tears of gratitude and remorse, had fallen upon his knees and lifted his clasped hands to God. The veil of self-indulgence was rent from head to foot. I saw my life as a whole: I followed it up from the days of childhood, when I had walked with my father’s hand, and through the self-denying toils of my professional life, to arrive again and again, with the same sense of unreality, at the damned horrors of the evening. I could have screamed aloud; I sought with tears and prayers to smother down the crowd of hideous images and sounds with which my memory swarmed against me; and still, between the petitions, the ugly face of my iniquity stared into my soul. As the acuteness of this remorse began to die away, it was succeeded by a sense of joy. The problem of my conduct was solved. Hyde was thenceforth impossible; whether I would or not, I was now confined to the better part of my existence; and O, how I rejoiced to think of it! with what willing humility I embraced anew the restrictions of natural life! 
with what sincere renunciation I locked the door by which I had so often gone and come, and ground the key under my heel! The next day, came the news that the murder had not been overlooked, that the guilt of Hyde was patent to the world, and that the victim was a man high in public estimation. It was not only a crime, it had been a tragic folly. I think I was glad to know it; I think I was glad to have my better impulses thus buttressed and guarded by the terrors of the scaffold. Jekyll was now my city of refuge; let but Hyde peep out an instant, and the hands of all men would be raised to take and slay him. I resolved in my future conduct to redeem the past; and I can say with honesty that my resolve was fruitful of some good. You know yourself how earnestly, in the last months of the last year, I laboured to relieve suffering; you know that much was done for others, and that the days passed quietly, almost happily for myself. Nor can I truly say that I wearied of this beneficent and innocent life; I think instead that I daily enjoyed it more completely; but I was still cursed with my duality of purpose; and as the first edge of my penitence wore off, the lower side of me, so long indulged, so recently chained down, began to growl for licence. Not that I dreamed of resuscitating Hyde; the bare idea of that would startle me to frenzy: no, it was in my own person that I was once more tempted to trifle with my conscience; and it was as an ordinary secret sinner that I at last fell before the assaults of temptation. There comes an end to all things; the most capacious measure is filled at last; and this brief condescension to my evil finally destroyed the balance of my soul. And yet I was not alarmed; the fall seemed natural, like a return to the old days before I had made my discovery. 
It was a fine, clear, January day, wet under foot where the frost had melted, but cloudless overhead; and the Regent’s Park was full of winter chirrupings and sweet with spring odours. I sat in the sun on a bench; the animal within me licking the chops of memory; the spiritual side a little drowsed, promising subsequent penitence, but not yet moved to begin. After all, I reflected, I was like my neighbours; and then I smiled, comparing myself with other men, comparing my active good-will with the lazy cruelty of their neglect. And at the very moment of that vainglorious thought, a qualm came over me, a horrid nausea and the most deadly shuddering. These passed away, and left me faint; and then as in its turn faintness subsided, I began to be aware of a change in the temper of my thoughts, a greater boldness, a contempt of danger, a solution of the bonds of obligation. I looked down; my clothes hung formlessly on my shrunken limbs; the hand that lay on my knee was corded and hairy. I was once more Edward Hyde. A moment before I had been safe of all men’s respect, wealthy, beloved—the cloth laying for me in the dining-room at home; and now I was the common quarry of mankind, hunted, houseless, a known murderer, thrall to the gallows. My reason wavered, but it did not fail me utterly. I have more than once observed that in my second character, my faculties seemed sharpened to a point and my spirits more tensely elastic; thus it came about that, where Jekyll perhaps might have succumbed, Hyde rose to the importance of the moment. My drugs were in one of the presses of my cabinet; how was I to reach them? That was the problem that (crushing my temples in my hands) I set myself to solve. The laboratory door I had closed. If I sought to enter by the house, my own servants would consign me to the gallows. I saw I must employ another hand, and thought of Lanyon. How was he to be reached? how persuaded? 
Supposing that I escaped capture in the streets, how was I to make my way into his presence? and how should I, an unknown and displeasing visitor, prevail on the famous physician to rifle the study of his colleague, Dr. Jekyll? Then I remembered that of my original character, one part remained to me: I could write my own hand; and once I had conceived that kindling spark, the way that I must follow became lighted up from end to end. Thereupon, I arranged my clothes

lic feeling,” returned Guest. “The man, of course, was mad.” “I should like to hear your views on that,” replied Utterson. “I have a document here in his handwriting; it is between ourselves, for I scarce know what to do about it; it is an ugly business at the best. But there it is; quite in your way: a murderer’s autograph.” Guest’s eyes brightened, and he sat down at once and studied it with passion. “No sir,” he said: “not mad; but it is an odd hand.” “And by all accounts a very odd writer,” added the lawyer. Just then the servant entered with a note. “Is that from Dr. Jekyll, sir?” inquired the clerk. “I thought I knew the writing. Anything private, Mr. Utterson?” “Only an invitation to dinner. Why? Do you want to see it?” “One moment. I thank you, sir;” and the clerk laid the two sheets of paper alongside and sedulously compared their contents. “Thank you, sir,” he said at last, returning both; “it’s a very interesting autograph.” There was a pause, during which Mr. Utterson struggled with himself. “Why did you compare them, Guest?” he inquired suddenly. “Well, sir,” returned the clerk, “there’s a rather singular resemblance; the two hands are in many points identical: only differently sloped.” “Rather quaint,” said Utterson. “It is, as you say, rather quaint,” returned Guest. “I wouldn’t speak of this note, you know,” said the master. “No, sir,” said the clerk. “I understand.” But no sooner was Mr.
Utterson alone that night, than he locked the note into his safe, where it reposed from that time forward. “What!” he thought. “Henry Jekyll forge for a murderer!” And his blood ran cold in his veins.

INCIDENT OF DR. LANYON

Time ran on; thousands of pounds were offered in reward, for the death of Sir Danvers was resented as a public injury; but Mr. Hyde had disappeared out of the ken of the police as though he had never existed. Much of his past was unearthed, indeed, and all disreputable: tales came out of the man’s cruelty, at once so callous and violent; of his vile life, of his strange associates, of the hatred that seemed to have surrounded his career; but of his present whereabouts, not a whisper. From the time he had left the house in Soho on the morning of the murder, he was simply blotted out; and gradually, as time drew on, Mr. Utterson began to recover from the hotness of his alarm, and to grow more at quiet with himself. The death of Sir Danvers was, to his way of thinking, more than paid for by the disappearance of Mr. Hyde. Now that that evil influence had been withdrawn, a new life began for Dr. Jekyll. He came out of his seclusion, renewed relations with his friends, became once more their familiar guest and entertainer; and whilst he had always been known for charities, he was now no less distinguished for religion. He was busy, he was much in the open air, he did good; his face seemed to open and brighten, as if with an inward consciousness of service; and for more than two months, the doctor was at peace. On the 8th of January Utterson had dined at the doctor’s with a small party; Lanyon had been there; and the face of the host had looked from one to the other as in the old days when the trio were inseparable friends. On the 12th, and again on the 14th, the door was shut against the lawyer.
“The doctor was confined to the house,” Poole said, “and saw no one.” On the 15th, he tried again, and was again refused; and having now been used for the last two months to see his friend almost daily, he found this return of solitude to weigh upon his spirits. The fifth night he had in Guest to dine with him; and the sixth he betook himself to Dr. Lanyon’s. There at least he was not denied admittance; but when he came in, he was shocked at the change which had taken place in the doctor’s appearance. He had his death-warrant written legibly upon his face. The rosy man had grown pale; his flesh had fallen away; he was visibly balder and older; and yet it was not so much these tokens of a swift physical decay that arrested the lawyer’s notice, as a look in the eye and quality of manner that seemed to testify to some deep-seated terror of the mind. It was unlikely that the doctor should fear death; and yet that was what Utterson was tempted to suspect. “Yes,” he thought; “he is a doctor, he must know his own state and that his days are counted; and the knowledge is more than he can bear.” And yet when Utterson remarked on his ill looks, it was with an air of great firmness that Lanyon declared himself a doomed man. “I have had a shock,” he said, “and I shall never recover. It is a question of weeks. Well, life has been pleasant; I liked it; yes, sir, I used to like it. I sometimes think if we knew all, we should be more glad to get away.” “Jekyll is ill, too,” observed Utterson. “Have you seen him?” But Lanyon’s face changed, and he held up a trembling hand. “I wish to see or hear no more of Dr. Jekyll,” he said in a loud, unsteady voice. “I am quite done with that person; and I beg that you will spare me any allusion to one whom I regard as dead.” “Tut, tut!” said Mr. Utterson; and then after a considerable pause, “Can’t I do anything?” he inquired. 
“We are three very old friends, Lanyon; we shall not live to make others.” “Nothing can be done,” returned Lanyon; “ask himself.” “He will not see me,” said the lawyer. “I am not surprised at that,” was the reply. “Some day, Utterson, after I am dead, you may perhaps come to learn the right and wrong of this. I cannot tell you. And in the meantime, if you can sit and talk with me of other things, for God’s sake, stay and do so; but if you cannot keep clear of this accursed topic, then in God’s name, go, for I cannot bear it.” As soon as he got home, Utterson sat down and wrote to Jekyll, complaining of his exclusion from the house, and asking the cause of this unhappy break with Lanyon; and the next day brought him a long answer, often very pathetically worded, and sometimes darkly mysterious in drift. The quarrel with Lanyon was incurable. “I do not blame our old friend,” Jekyll wrote, “but I share his view that we must never meet. I mean from henceforth to lead a life of extreme seclusion; you must not be surprised, nor must you doubt my friendship, if my door is often shut even to you. You must suffer me to go my own dark way. I have brought on myself a punishment and a danger that I cannot name. If I am the chief of sinners, I am the chief of sufferers also. I could not think that this earth contained a place for sufferings and terrors so unmanning; and you can do but one thing, Utterson, to lighten this destiny, and that is to respect my silence.” Utterson was amazed; the dark influence of Hyde had been withdrawn, the doctor had returned to his old tasks and amities; a week ago, the prospect had smiled with every promise of a cheerful and an honoured age; and now in a moment, friendship, and peace of mind, and the whole tenor of his life were wrecked. So great and unprepared a change pointed to madness; but in view of Lanyon’s manner and words, there must lie for it some deeper ground. A week afterwards Dr. 
Lanyon took to his bed, and in something less than a fortnight he was dead. The night after the funeral, at which he had been sadly affected, Utterson locked the door of his business room, and sitting there by the light of a melancholy candle, drew out and set before him an envelope addressed by the hand and sealed with the seal of his dead friend. “PRIVATE: for the hands of G. J. Utterson ALONE, and in case of his predecease _to be destroyed unread_,” so it was emphatically superscribed; and the lawyer dreaded to behold the contents. “I have buried one friend to-day,” he thought: “what if this should cost me another?” And then he condemned the fear as a disloyalty, and broke the seal. Within there was another enclosure, likewise sealed, and marked upon the cover as “not to be opened till the death or disappearance of Dr. Henry Jekyll.” Utterson could not trust his eyes. Yes, it was disappearance; here again, as in the mad will which he had long ago restored to its author, here again were the idea of a disappearance and the name of Henry Jekyll bracketted. But in the will, that idea had sprung from the sinister suggestion of the man Hyde; it was set there with a purpose all too plain and horrible. Written by the hand of Lanyon, what should it mean? A great curiosity came on the trustee, to disregard the prohibition and dive at once to the bottom of these mysteries; but professional honour and faith to his dead friend were stringent obligations; and the packet slept in the inmost corner of his private safe. It is one thing to mortify curiosity, another to conquer it; and it may be doubted if, from that day forth, Utterson desired the society of his surviving friend with the same eagerness. He thought of him kindly; but his thoughts were disquieted and fearful. 
He went to call indeed; but he was perhaps relieved to be denied admittance; perhaps, in his heart, he preferred to speak with Poole upon the doorstep and surrounded by the air and sounds of the open city, rather than to be admitted into that house of voluntary bondage, and to sit and speak with its inscrutable recluse. Poole had, indeed, no very pleasant news to communicate. The doctor, it appeared, now more than ever confined himself to the cabinet over the laboratory, where he would sometimes even sleep; he was out of spirits, he had grown very silent, he did not read; it seemed as if he had something on his mind. Utterson became so used to the unvarying character of these reports, that he fell off little by little in the frequency of his visits.

INCIDENT AT THE WINDOW

It chanced on Sunday, when Mr. Utterson was on his usual walk with Mr. Enfield, that their way lay once again through the by-street; and that when they came in front of the door, both stopped to gaze on it. “Well,” said Enfield, “that story’s at an end at least. We shall never see more of Mr. Hyde.” “I hope not,” said Utterson. “Did I ever tell you that I once saw him, and shared your feeling of repulsion?” “It was impossible to do the one without the other,” returned Enfield. “And by the way, what an ass you must have thought me, not to know that this was a back way to Dr. Jekyll’s! It was partly your own fault that I found it out, even when I did.” “So you found it out, did you?” said Utterson. “But if that be so, we may step into the court and take a look at the windows. To tell you the truth, I am uneasy about poor Jekyll; and even outside, I feel as if the presence of a friend might do him good.” The court was very cool and a little damp, and full of premature twilight, although the sky, high up overhead, was still bright with sunset. 
The middle one of the three windows was half-way open; and sitting close beside it, taking the air with an infinite sadness of mien, like some disconsolate prisoner, Utterson saw Dr. Jekyll. “What! Jekyll!” he cried. “I trust you are better.” “I am very low, Utterson,” replied the doctor drearily, “very low. It will not last long, thank God.” “You stay too much indoors,” said the lawyer. “You should be out, whipping up the circulation like Mr. Enfield and me. (This is my cousin—Mr. Enfield—Dr. Jekyll.) Come now; get your hat and take a quick turn with us.” “You are very good,” sighed the other. “I should like to very much; but no, no, no, it is quite impossible; I dare not. But indeed, Utterson, I am very glad to see you; this is really a great pleasure; I would ask you and Mr. Enfield up, but the place is really not fit.” “Why, then,” said the lawyer, good-naturedly, “the best thing we can do is to stay down here and speak with you from where we are.” “That is just what I was about to venture to propose,” returned the doctor with a smile. But the words were hardly uttered, before the smile was struck out of his face and succeeded by an expression of such abject terror and despair, as froze the very blood of the two gentlemen below. They saw it but for a glimpse for the window was instantly thrust down; but that glimpse had been sufficient, and they turned and left the court without a word. In silence, too, they traversed the by-street; and it was not until they had come into a neighbouring thoroughfare, where even upon a Sunday there were still some stirrings of life, that Mr. Utterson at last turned and looked at his companion. They were both pale; and there was an answering horror in their eyes. “God forgive us, God forgive us,” said Mr. Utterson. But Mr. Enfield only nodded his head very seriously, and walked on once more in silence.

THE LAST NIGHT

Mr. 
Utterson was sitting by his fireside one evening after dinner, when he was surprised to receive a visit from Poole. “Bless me, Poole, what brings you here?” he cried; and then taking a second look at him, “What ails you?” he added; “is the doctor ill?” “Mr. Utterson,” said the man, “there is something wrong.” “Take a seat, and here is a glass of wine for you,” said the lawyer. “Now, take your time, and tell me plainly what you want.” “You know the doctor’s ways, sir,” replied Poole, “and how he shuts himself up. Well, he’s shut up again in the cabinet; and I don’t like it, sir—I wish I may die if I like it. Mr. Utterson, sir, I’m afraid.” “Now, my good man,” said the lawyer, “be explicit. What are you afraid of?” “I’ve been afraid for about a week,” returned Poole, doggedly disregarding the question, “and I can bear it no more.” The man’s appearance amply bore out his words; his manner was altered for the worse; and except for the moment when he had first announced his terror, he had not once looked the lawyer in the face. Even now, he sat with the glass of wine untasted on his knee, and his eyes directed to a corner of the floor. “I can bear it no more,” he repeated. “Come,” said the lawyer, “I see you have some good reason, Poole; I see there is something seriously amiss. Try to tell me what it is.” “I think there’s been foul play,” said Poole, hoarsely. “Foul play!” cried the lawyer, a good deal frightened and rather inclined to be irritated in consequence. “What foul play! What does the man mean?” “I daren’t say, sir,” was the answer; “but will you come along with me and see for yourself?” Mr. Utterson’s only answer was to rise and get his hat and greatcoat; but he observed with wonder the greatness of the relief that appeared upon the butler’s face, and perhaps with no less, that the wine was still untasted when he set it down to follow. 
It was a wild, cold, seasonable night of March, with a pale moon, lying on her back as though the wind had tilted her, and flying wrack of the most diaphanous and lawny texture. The wind made talking difficult, and flecked the blood into the face. It seemed to have swept the streets unusually bare of passengers, besides; for Mr. Utterson thought he had never seen that part of London so deserted. He could have wished it otherwise; never in his life had he been conscious of so sharp a wish to see and touch his fellow-creatures; for struggle as he might, there was borne in upon his mind a crushing anticipation of calamity. The square, when they got there, was full of wind and dust, and the thin trees in the garden were lashing themselves along the railing. Poole, who had kept all the way a pace or two ahead, now pulled up in the middle of the pavement, and in spite of the biting weather, took off his hat and mopped his brow with a red pocket-handkerchief. But for all the hurry of his coming, these were not the dews of exertion that he wiped away, but the moisture of some strangling anguish; for his face was white and his voice, when he spoke, harsh and broken. “Well, sir,” he said, “here we are, and God grant there be nothing wrong.” “Amen, Poole,” said the lawyer. Thereupon the servant knocked in a very guarded manner; the door was opened on the chain; and a voice asked from within, “Is that you, Poole?” “It’s all right,” said Poole. “Open the door.” The hall, when they entered it, was brightly lighted up; the fire was built high; and about the hearth the whole of the servants, men and women, stood huddled together like a flock of sheep. At the sight of Mr. Utterson, the housemaid broke into hysterical whimpering; and the cook, crying out “Bless God! it’s Mr. Utterson,” ran forward as if to take him in her arms. “What, what? Are you all here?” said the lawyer peevishly. 
“Very irregular, very unseemly; your master would be far from pleased.” “They’re all afraid,” said Poole. Blank silence followed, no one protesting; only the maid lifted her voice and now wept loudly. “Hold your tongue!” Poole said to her, with a ferocity of accent that testified to his own jangled nerves; and indeed, when the girl had so suddenly raised the note of her lamentation, they had all started and turned towards the inner door with faces of dreadful expectation. “And now,” continued the butler, addressing the knife-boy, “reach me a candle, and we’ll get this through hands at once.” And then he begged Mr. Utterson to follow him, and led the way to the back garden. “Now, sir,” said he, “you come as gently as you can. I want you to hear, and I don’t want you to be heard. And see here, sir, if by any chance he was to ask you in, don’t go.” Mr. Utterson’s nerves, at this unlooked-for termination, gave a jerk that nearly threw him from his balance; but he recollected his courage and followed the butler into the laboratory building through the surgical theatre, with its lumber of crates and bottles, to the foot of the stair. Here Poole motioned him to stand on one side and listen; while he himself, setting down the candle and making a great and obvious call on his resolution, mounted the steps and knocked with a somewhat uncertain hand on the red baize of the cabinet door. “Mr. Utterson, sir, asking to see you,” he called; and even as he did so, once more violently signed to the lawyer to give ear. A voice answered from within: “Tell him I cannot see anyone,” it said complainingly. “Thank you, sir,” said Poole, with a note of something like triumph in his voice; and taking up his candle, he led Mr. Utterson back across the yard and into the great kitchen, where the fire was out and the beetles were leaping on the floor. “Sir,” he said, looking Mr. 
Utterson in the eyes, “Was that my master’s voice?” “It seems much changed,” replied the lawyer, very pale, but giving look for look. “Changed? Well, yes, I think so,” said the butler. “Have I been twenty years in this man’s house, to be deceived about his voice? No, sir; master’s made away with; he was made away with eight days ago, when we heard him cry out upon the name of God; and _who’s_ in there instead of him, and _why_ it stays there, is a thing that cries to Heaven, Mr. Utterson!” “This is a very strange tale, Poole; this is rather a wild tale my man,” said Mr. Utterson, biting his finger. “Suppose it were as you suppose, supposing Dr. Jekyll to have been—well, murdered, what could induce the murderer to stay? That won’t hold water; it doesn’t commend itself to reason.” “Well, Mr. Utterson, you are a hard man to satisfy, but I’ll do it yet,” said Poole. “All this last week (you must know) him, or it, whatever it is that lives in that cabinet, has been crying night and day for some sort of medicine and cannot get it to his mind. It was sometimes his way—the master’s, that is—to write his orders on a sheet of paper and throw it on the stair. We’ve had nothing else this week back; nothing but papers, and a closed door, and the very meals left there to be smuggled in when nobody was looking. Well, sir, every day, ay, and twice and thrice in the same day, there have been orders and complaints, and I have been sent flying to all the wholesale chemists in town. Every time I brought the stuff back, there would be another paper telling me to return it, because it was not pure, and another order to a different firm. This drug is wanted bitter bad, sir, whatever for.” “Have you any of these papers?” asked Mr. Utterson. Poole felt in his pocket and handed out a crumpled note, which the lawyer, bending nearer to the candle, carefully examined. Its contents ran thus: “Dr. Jekyll presents his compliments to Messrs. Maw. 
He assures them that their last sample is impure and quite useless for his present purpose. In the year 18—, Dr. J. purchased a somewhat large quantity from Messrs. M. He now begs them to search with most sedulous care, and should any of the same quality be left, forward it to him at once. Expense is no consideration. The importance of this to Dr. J. can hardly be exaggerated.” So far the letter had run composedly enough, but here with a sudden splutter of the pen, the writer’s emotion had broken loose. “For God’s sake,” he added, “find me some of the old.” “This is a strange note,” said Mr. Utterson; and then sharply, “How do you come to have it open?” “The man at Maw’s was main angry, sir, and he threw it back to me like so much dirt,” returned Poole. “This is unquestionably the doctor’s hand, do you know?” resumed the lawyer. “I thought it looked like it,” said the servant rather sulkily; and then, with another voice, “But what matters hand of write?” he said. “I’ve seen him!” “Seen him?” repeated Mr. Utterson. “Well?” “That’s it!” said Poole. “It was this way. I came suddenly into the theatre from the garden. It seems he had slipped out to look for this drug or whatever it is; for the cabinet door was open, and there he was at the far end of the room digging among the crates. He looked up when I came in, gave a kind of cry, and whipped upstairs into the cabinet. It was but for one minute that I saw him, but the hair stood upon my head like quills. Sir, if that was my master, why had he a mask upon his face? If it was my master, why did he cry out like a rat, and run from me? I have served him long enough. And then...” The man paused and passed his hand over his face. “These are all very strange circumstances,” said Mr. Utterson, “but I think I begin to see daylight. 
Your master, Poole, is plainly seized with one of those maladies that both torture and deform the sufferer; hence, for aught I know, the alteration of his voice; hence the mask and the avoidance of his friends; hence his eagerness to find this drug, by means of which the poor soul retains some hope of ultimate recovery—God grant that he be not deceived! There is my explanation; it is sad enough, Poole, ay, and appalling to consider; but it is plain and natural, hangs well together, and delivers us from all exorbitant alarms.” “Sir,” said the butler, turning to a sort of mottled pallor, “that thing was not my master, and there’s the truth. My master”—here he looked round him and began to whisper—“is a tall, fine build of a man, and this was more of a dwarf.” Utterson attempted to protest. “O, sir,” cried Poole, “do you think I do not know my master after twenty years? Do you think I do not know where his head comes to in the cabinet door, where I saw him every morning of my life? No, sir, that thing in the mask was never Dr. Jekyll—God knows what it was, but it was never Dr. Jekyll; and it is the belief of my heart that there was murder done.” “Poole,” replied the lawyer, “if you say that, it will become my duty to make certain. Much as I desire to spare your master’s feelings, much as I am puzzled by this note which seems to prove him to be still alive, I shall consider it my duty to break in that door.” “Ah, Mr. Utterson, that’s talking!” cried the butler. “And now comes the second question,” resumed Utterson: “Who is going to do it?” “Why, you and me, sir,” was the undaunted reply. “That’s very well said,” returned the lawyer; “and whatever comes of it, I shall make it my business to see you are no loser.” “There is an axe in the theatre,” continued Poole; “and you might take the kitchen poker for yourself.” The lawyer took that rude but weighty instrument into his hand, and balanced it. 
“Do you know, Poole,” he said, looking up, “that you and I are about to place ourselves in a position of some peril?” “You may say so, sir, indeed,” returned the butler. “It is well, then that we should be frank,” said the other. “We both think more than we have said; let us make a clean breast. This masked figure that you saw, did you recognise it?” “Well, sir, it went so quick, and the creature was so doubled up, that I could hardly swear to that,” was the answer. “But if you mean, was it Mr. Hyde?—why, yes, I think it was! You see, it was much of the same bigness; and it had the same quick, light way with it; and then who else could have got in by the laboratory door? You have not forgot, sir, that at the time of the murder he had still the key with him? But that’s not all. I don’t know, Mr. Utterson, if you ever met this Mr. Hyde?” “Yes,” said the lawyer, “I once spoke with him.” “Then you must know as well as the rest of us that there was something queer about that gentleman—something that gave a man a turn—I don’t know rightly how to say it, sir, beyond this: that you felt in your marrow kind of cold and thin.” “I own I felt something of what you describe,” said Mr. Utterson. “Quite so, sir,” returned Poole. “Well, when that masked thing like a monkey jumped from among the chemicals and whipped into the cabinet, it went down my spine like ice. O, I know it’s not evidence, Mr. Utterson; I’m book-learned enough for that; but a man has his feelings, and I give you my bible-word it was Mr. Hyde!” “Ay, ay,” said the lawyer. “My fears incline to the same point. Evil, I fear, founded—evil was sure to come—of that connection. Ay truly, I believe you; I believe poor Harry is killed; and I believe his murderer (for what purpose, God alone can tell) is still lurking in his victim’s room. Well, let our name be vengeance. Call Bradshaw.” The footman came at the summons, very white and nervous. “Pull yourself together, Bradshaw,” said the lawyer. 
“This suspense, I know, is telling upon all of you; but it is now our intention to make an end of it. Poole, here, and I are going to force our way into the cabinet. If all is well, my shoulders are broad enough to bear the blame. Meanwhile, lest anything should really be amiss, or any malefactor seek to escape by the back, you and the boy must go round the corner with a pair of good sticks and take your post at the laboratory door. We give you ten minutes to get to your stations.” As Bradshaw left, the lawyer looked at his watch. “And now, Poole, let us get to ours,” he said; and taking the poker under his arm, led the way into the yard. The scud had banked over the moon, and it was now quite dark. The wind, which only broke in puffs and draughts into that deep well of building, tossed the light of the candle to and fro about their steps, until they came into the shelter of the theatre, where they sat down silently to wait. London hummed solemnly all around; but nearer at hand, the stillness was only broken by the sounds of a footfall moving to and fro along the cabinet floor. “So it will walk all day, sir,” whispered Poole; “ay, and the better part of the night. Only when a new sample comes from the chemist, there’s a bit of a break. Ah, it’s an ill conscience that’s such an enemy to rest! Ah, sir, there’s blood foully shed in every step of it! But hark again, a little closer—put your heart in your ears, Mr. Utterson, and tell me, is that the doctor’s foot?” The steps fell lightly and oddly, with a certain swing, for all they went so slowly; it was different indeed from the heavy creaking tread of Henry Jekyll. Utterson sighed. “Is there never anything else?” he asked. Poole nodded. “Once,” he said. “Once I heard it weeping!” “Weeping? how that?” said the lawyer, conscious of a sudden chill of horror. “Weeping like a woman or a lost soul,” said the butler. “I came away with that upon my heart, that I could have wept too.” But now the ten minutes drew to an end. 
Poole disinterred the axe from under a stack of packing straw; the candle was set upon the nearest table to light them to the attack; and they drew near with bated breath to where that patient foot was still going up and down, up and down, in the quiet of the night. “Jekyll,” cried Utterson, with a loud voice, “I demand to see you.” He paused a moment, but there came no reply. “I give you fair warning, our suspicions are aroused, and I must and shall see you,” he resumed; “if not by fair means, then by foul—if not of your consent, then by brute force!” “Utterson,” said the voice, “for God’s sake, have mercy!” “Ah, that’s not Jekyll’s voice—it’s Hyde’s!” cried Utterson. “Down with the door, Poole!” Poole swung the axe over his shoulder; the blow shook the building, and the red baize door leaped against the lock and hinges. A dismal screech, as of mere animal terror, rang from the cabinet. Up went the axe again, and again the panels crashed and the frame bounded; four times the blow fell; but the wood was tough and the fittings were of excellent workmanship; and it was not until the fifth, that the lock burst and the wreck of the door fell inwards on the carpet. The besiegers, appalled by their own riot and the stillness that had succeeded, stood back a little and peered in. There lay the cabinet before their eyes in the quiet lamplight, a good fire glowing and chattering on the hearth, the kettle singing its thin strain, a drawer or two open, papers neatly set forth on the business table, and nearer the fire, the things laid out for tea; the quietest room, you would have said, and, but for the glazed presses full of chemicals, the most commonplace that night in London. Right in the middle there lay the body of a man sorely contorted and still twitching. They drew near on tiptoe, turned it on its back and beheld the face of Edward Hyde. 
He was dressed in clothes far too large for him, clothes of the doctor’s bigness; the cords of his face still moved with a semblance of life, but life was quite gone; and by the crushed phial in the hand and the strong smell of kernels that hung upon the air, Utterson knew that he was looking on the body of a self-destroyer. “We have come too late,” he said sternly, “whether to save or punish. Hyde is gone to his account; and it only remains for us to find the body of your master.” The far greater proportion of the building was occupied by the theatre, which filled almost the whole ground storey and was lighted from above, and by the cabinet, which formed an upper storey at one end and looked upon the court. A corridor joined the theatre to the door on the by-street; and with this the cabinet communicated separately by a second flight of stairs. There were besides a few dark closets and a spacious cellar. All these they now thoroughly examined. Each closet needed but a glance, for all were empty, and all, by the dust that fell from their doors, had stood long unopened. The cellar, indeed, was filled with crazy lumber, mostly dating from the times of the surgeon who was Jekyll’s predecessor; but even as they opened the door they were advertised of the uselessness of further search, by the fall of a perfect mat of cobweb which had for years sealed up the entrance. Nowhere was there any trace of Henry Jekyll, dead or alive. Poole stamped on the flags of the corridor. “He must be buried here,” he said, hearkening to the sound. “Or he may have fled,” said Utterson, and he turned to examine the door in the by-street. It was locked; and lying near by on the flags, they found the key, already stained with rust. “This does not look like use,” observed the lawyer. “Use!” echoed Poole. “Do you not see, sir, it is broken? much as if a man had stamped on it.” “Ay,” continued Utterson, “and the fractures, too, are rusty.” The two men looked at each other with a scare. 
“This is beyond me, Poole,” said the lawyer. “Let us go back to the cabinet.” They mounted the stair in silence, and still with an occasional awestruck glance at the dead body, proceeded more thoroughly to examine the contents of the cabinet. At one table, there were traces of chemical work, various measured heaps of some white salt being laid on glass saucers, as though for an experiment in which the unhappy man had been prevented. “That is the same drug that I was always bringing him,” said Poole; and even as he spoke, the kettle with a startling noise boiled over. This brought them to the fireside, where the easy-chair was drawn cosily up, and the tea things stood ready to the sitter’s elbow, the very sugar in the cup. There were several books on a shelf; one lay beside the tea things open, and Utterson was amazed to find it a copy of a pious work, for which Jekyll had several times expressed a great esteem, annotated, in his own hand with startling blasphemies. Next, in the course of their review of the chamber, the searchers came to the cheval-glass, into whose depths they looked with an involuntary horror. But it was so turned as to show them nothing but the rosy glow playing on the roof, the fire sparkling in a hundred repetitions along the glazed front of the presses, and their own pale and fearful countenances stooping to look in. “This glass has seen some strange things, sir,” whispered Poole. “And surely none stranger than itself,” echoed the lawyer in the same tones. “For what did Jekyll”—he caught himself up at the word with a start, and then conquering the weakness—“what could Jekyll want with it?” he said. “You may say that!” said Poole. Next they turned to the business table. On the desk, among the neat array of papers, a large envelope was uppermost, and bore, in the doctor’s hand, the name of Mr. Utterson. The lawyer unsealed it, and several enclosures fell to the floor. 
The first was a will, drawn in the same eccentric terms as the one which he had returned six months before, to serve as a testament in case of death and as a deed of gift in case of disappearance; but in place of the name of Edward Hyde, the lawyer, with indescribable amazement read the name of Gabriel John Utterson. He looked at Poole, and then back at the paper, and last of all at the dead malefactor stretched upon the carpet. “My head goes round,” he said. “He has been all these days in possession; he had no cause to like me; he must have raged to see himself displaced; and he has not destroyed this document.” He caught up the next paper; it was a brief note in the doctor’s hand and dated at the top. “O Poole!” the lawyer cried, “he was alive and here this day. He cannot have been disposed of in so short a space; he must be still alive, he must have fled! And then, why fled? and how? and in that case, can we venture to declare this suicide? O, we must be careful. I foresee that we may yet involve your master in some dire catastrophe.” “Why don’t you read it, sir?” asked Poole. “Because I fear,” replied the lawyer solemnly. “God grant I have no cause for it!” And with that he brought the paper to his eyes and read as follows: “My dear Utterson,—When this shall fall into your hands, I shall have disappeared, under what circumstances I have not the penetration to foresee, but my instinct and all the circumstances of my nameless situation tell me that the end is sure and must be early. Go then, and first read the narrative which Lanyon warned me he was to place in your hands; and if you care to hear more, turn to the confession of “Your unworthy and unhappy friend, “HENRY JEKYLL.” “There was a third enclosure?” asked Utterson. “Here, sir,” said Poole, and gave into his hands a considerable packet sealed in several places. The lawyer put it in his pocket. “I would say nothing of this paper. If your master has fled or is dead, we may at least save his credit. 
It is now ten; I must go home and read these documents in quiet; but I shall be back before midnight, when we shall send for the police.” They went out, locking the door of the theatre behind them; and Utterson, once more leaving the servants gathered about the fire in the hall, trudged back to his office to read the two narratives in which this mystery was now to be explained.

DR. LANYON’S NARRATIVE

On the ninth of January, now four days ago, I received by the evening delivery a registered envelope, addressed in the hand of my colleague and old school companion, Henry Jekyll. I was a good deal surprised by this; for we were by no means in the habit of correspondence; I had seen the man, dined with him, indeed, the night before; and I could imagine nothing in our intercourse that should justify formality of registration. The contents increased my wonder; for this is how the letter ran: “10th December, 18—. “Dear Lanyon,—You are one of my oldest friends; and although we may have differed at times on scientific questions, I cannot remember, at least on my side, any break in our affection. There was never a day when, if you had said to me, ‘Jekyll, my life, my honour, my reason, depend upon you,’ I would not have sacrificed my left hand to help you. Lanyon, my life, my honour, my reason, are all at your mercy; if you fail me to-night, I am lost. You might suppose, after this preface, that I am going to ask you for something dishonourable to grant. Judge for yourself. “I want you to postpone all other engagements for to-night—ay, even if you were summoned to the bedside of an emperor; to take a cab, unless your carriage should be actually at the door; and with this letter in your hand for consultation, to drive straight to my house. Poole, my butler, has his orders; you will find him waiting your arrival with a locksmith.
The door of my cabinet is then to be forced; and you are to go in alone; to open the glazed press (letter E) on the left hand, breaking the lock if it be shut; and to draw out, _with all its contents as they stand_, the fourth drawer from the top or (which is the same thing) the third from the bottom. In my extreme distress of mind, I have a morbid fear of misdirecting you; but even if I am in error, you may know the right drawer by its contents: some powders, a phial and a paper book. This drawer I beg of you to carry back with you to Cavendish Square exactly as it stands. “That is the first part of the service: now for the second. You should be back, if you set out at once on the receipt of this, long before midnight; but I will leave you that amount of margin, not only in the fear of one of those obstacles that can neither be prevented nor foreseen, but because an hour when your servants are in bed is to be preferred for what will then remain to do. At midnight, then, I have to ask you to be alone in your consulting room, to admit with your own hand into the house a man who will present himself in my name, and to place in his hands the drawer that you will have brought with you from my cabinet. Then you will have played your part and earned my gratitude completely. Five minutes afterwards, if you insist upon an explanation, you will have understood that these arrangements are of capital importance; and that by the neglect of one of them, fantastic as they must appear, you might have charged your conscience with my death or the shipwreck of my reason. “Confident as I am that you will not trifle with this appeal, my heart sinks and my hand trembles at the bare thought of such a possibility. Think of me at this hour, in a strange place, labouring under a blackness of distress that no fancy can exaggerate, and yet well aware that, if you will but punctually serve me, my troubles will roll away like a story that is told. 
Serve me, my dear Lanyon and save “Your friend, “H.J. “P.S.—I had already sealed this up when a fresh terror struck upon my soul. It is possible that the post-office may fail me, and this letter not come into your hands until to-morrow morning. In that case, dear Lanyon, do my errand when it shall be most convenient for you in the course of the day; and once more expect my messenger at midnight. It may then already be too late; and if that night passes without event, you will know that you have seen the last of Henry Jekyll.” Upon the reading of this letter, I made sure my colleague was insane; but till that was proved beyond the possibility of doubt, I felt bound to do as he requested. The less I understood of this farrago, the less I was in a position to judge of its importance; and an appeal so worded could not be set aside without a grave responsibility. I rose accordingly from table, got into a hansom, and drove straight to Jekyll’s house. The butler was awaiting my arrival; he had received by the same post as mine a registered letter of instruction, and had sent at once for a locksmith and a carpenter. The tradesmen came while we were yet speaking; and we moved in a body to old Dr. Denman’s surgical theatre, from which (as you are doubtless aware) Jekyll’s private cabinet is most conveniently entered. The door was very strong, the lock excellent; the carpenter avowed he would have great trouble and have to do much damage, if force were to be used; and the locksmith was near despair. But this last was a handy fellow, and after two hours’ work, the door stood open. The press marked E was unlocked; and I took out the drawer, had it filled up with straw and tied in a sheet, and returned with it to Cavendish Square. Here I proceeded to examine its contents.
The powders were neatly enough made up, but not with the nicety of the dispensing chemist; so that it was plain they were of Jekyll’s private manufacture; and when I opened one of the wrappers I found what seemed to me a simple crystalline salt of a white colour. The phial, to which I next turned my attention, might have been about half full of a blood-red liquor, which was highly pungent to the sense of smell and seemed to me to contain phosphorus and some volatile ether. At the other ingredients I could make no guess. The book was an ordinary version book and contained little but a series of dates. These covered a period of many years, but I observed that the entries ceased nearly a year ago and quite abruptly. Here and there a brief remark was appended to a date, usually no more than a single word: “double” occurring perhaps six times in a total of several hundred entries; and once very early in the list and followed by several marks of exclamation, “total failure!!!” All this, though it whetted my curiosity, told me little that was definite. Here were a phial of some tincture, a paper of some salt, and the record of a series of experiments that had led (like too many of Jekyll’s investigations) to no end of practical usefulness. How could the presence of these articles in my house affect either the honour, the sanity, or the life of my flighty colleague? If his messenger could go to one place, why could he not go to another? And even granting some impediment, why was this gentleman to be received by me in secret? The more I reflected the more convinced I grew that I was dealing with a case of cerebral disease; and though I dismissed my servants to bed, I loaded an old revolver, that I might be found in some posture of self-defence. Twelve o’clock had scarce rung out over London, ere the knocker sounded very gently on the door. I went myself at the summons, and found a small man crouching against the pillars of the portico. “Are you come from Dr. Jekyll?” I asked.
He told me “yes” by a constrained gesture; and when I had bidden him enter, he did not obey me without a searching backward glance into the darkness of the square. There was a policeman not far off, advancing with his bull’s eye open; and at the sight, I thought my visitor started and made greater haste. These particulars struck me, I confess, disagreeably; and as I followed him into the bright light of the consulting room, I kept my hand ready on my weapon. Here, at last, I had a chance of clearly seeing him. I had never set eyes on him before, so much was certain. He was small, as I have said; I was struck besides with the shocking expression of his face, with his remarkable combination of great muscular activity and great apparent debility of constitution, and—last but not least—with the odd, subjective disturbance caused by his neighbourhood. This bore some resemblance to incipient rigour, and was accompanied by a marked sinking of the pulse. At the time, I set it down to some idiosyncratic, personal distaste, and merely wondered at the acuteness of the symptoms; but I have since had reason to believe the cause to lie much deeper in the nature of man, and to turn on some nobler hinge than the principle of hatred. This person (who had thus, from the first moment of his entrance, struck in me what I can only describe as a disgustful curiosity) was dressed in a fashion that would have made an ordinary person laughable; his clothes, that is to say, although they were of rich and sober fabric, were enormously too large for him in every measurement—the trousers hanging on his legs and rolled up to keep them from the ground, the waist of the coat below his haunches, and the collar sprawling wide upon his shoulders. Strange to relate, this ludicrous accoutrement was far from moving me to laughter. 
Rather, as there was something abnormal and misbegotten in the very essence of the creature that now faced me—something seizing, surprising and revolting—this fresh disparity seemed but to fit in with and to reinforce it; so that to my interest in the man’s nature and character, there was added a curiosity as to his origin, his life, his fortune and status in the world. These observations, though they have taken so great a space to be set down in, were yet the work of a few seconds. My visitor was, indeed, on fire with sombre excitement. “Have you got it?” he cried. “Have you got it?” And so lively was his impatience that he even laid his hand upon my arm and sought to shake me. I put him back, conscious at his touch of a certain icy pang along my blood. “Come, sir,” said I. “You forget that I have not yet the pleasure of your acquaintance. Be seated, if you please.” And I showed him an example, and sat down myself in my customary seat and with as fair an imitation of my ordinary manner to a patient, as the lateness of the hour, the nature of my preoccupations, and the horror I had of my visitor, would suffer me to muster. “I beg your pardon, Dr. Lanyon,” he replied civilly enough. “What you say is very well founded; and my impatience has shown its heels to my politeness. I come here at the instance of your colleague, Dr. Henry Jekyll, on a piece of business of some moment; and I understood...” He paused and put his hand to his throat, and I could see, in spite of his collected manner, that he was wrestling against the approaches of the hysteria—“I understood, a drawer...” But here I took pity on my visitor’s suspense, and some perhaps on my own growing curiosity. “There it is, sir,” said I, pointing to the drawer, where it lay on the floor behind a table and still covered with the sheet. 
He sprang to it, and then paused, and laid his hand upon his heart; I could hear his teeth grate with the convulsive action of his jaws; and his face was so ghastly to see that I grew alarmed both for his life and reason. “Compose yourself,” said I. He turned a dreadful smile to me, and as if with the decision of despair, plucked away the sheet. At sight of the contents, he uttered one loud sob of such immense relief that I sat petrified. And the next moment, in a voice that was already fairly well under control, “Have you a graduated glass?” he asked. I rose from my place with something of an effort and gave him what he asked. He thanked me with a smiling nod, measured out a few minims of the red tincture and added one of the powders. The mixture, which was at first of a reddish hue, began, in proportion as the crystals melted, to brighten in colour, to effervesce audibly, and to throw off small fumes of vapour. Suddenly and at the same moment, the ebullition ceased and the compound changed to a dark purple, which faded again more slowly to a watery green. My visitor, who had watched these metamorphoses with a keen eye, smiled, set down the glass upon the table, and then turned and looked upon me with an air of scrutiny. “And now,” said he, “to settle what remains. Will you be wise? will you be guided? will you suffer me to take this glass in my hand and to go forth from your house without further parley? or has the greed of curiosity too much command of you? Think before you answer, for it shall be done as you decide. As you decide, you shall be left as you were before, and neither richer nor wiser, unless the sense of service rendered to a man in mortal distress may be counted as a kind of riches of the soul. 
Or, if you shall so prefer to choose, a new province of knowledge and new avenues to fame and power shall be laid open to you, here, in this room, upon the instant; and your sight shall be blasted by a prodigy to stagger the unbelief of Satan.” “Sir,” said I, affecting a coolness that I was far from truly possessing, “you speak enigmas, and you will perhaps not wonder that I hear you with no very strong impression of belief. But I have gone too far in the way of inexplicable services to pause before I see the end.” “It is well,” replied my visitor. “Lanyon, you remember your vows: what follows is under the seal of our profession. And now, you who have so long been bound to the most narrow and material views, you who have denied the virtue of transcendental medicine, you who have derided your superiors—behold!” He put the glass to his lips and drank at one gulp. A cry followed; he reeled, staggered, clutched at the table and held on, staring with injected eyes, gasping with open mouth; and as I looked there came, I thought, a change—he seemed to swell—his face became suddenly black and the features seemed to melt and alter—and the next moment, I had sprung to my feet and leaped back against the wall, my arms raised to shield me from that prodigy, my mind submerged in terror. “O God!” I screamed, and “O God!” again and again; for there before my eyes—pale and shaken, and half fainting, and groping before him with his hands, like a man restored from death—there stood Henry Jekyll! What he told me in the next hour, I cannot bring my mind to set on paper. I saw what I saw, I heard what I heard, and my soul sickened at it; and yet now when that sight has faded from my eyes, I ask myself if I believe it, and I cannot answer. My life is shaken to its roots; sleep has left me; the deadliest terror sits by me at all hours of the day and night; and I feel that my days are numbered, and that I must die; and yet I shall die incredulous. 
As for the moral turpitude that man unveiled to me, even with tears of penitence, I cannot, even in memory, dwell on it without a start of horror. I will say but one thing, Utterson, and that (if you can bring your mind to credit it) will be more than enough. The creature who crept into my house that night was, on Jekyll’s own confession, known by the name of Hyde and hunted for in every corner of the land as the murderer of Carew.

HASTIE LANYON.

HENRY JEKYLL’S FULL STATEMENT OF THE CASE

I was born in the year 18— to a large fortune, endowed besides with excellent parts, inclined by nature to industry, fond of the respect of the wise and good among my fellowmen, and thus, as might have been supposed, with every guarantee of an honourable and distinguished future. And indeed the worst of my faults was a certain impatient gaiety of disposition, such as has made the happiness of many, but such as I found it hard to reconcile with my imperious desire to carry my head high, and wear a more than commonly grave countenance before the public. Hence it came about that I concealed my pleasures; and that when I reached years of reflection, and began to look round me and take stock of my progress and position in the world, I stood already committed to a profound duplicity of life. Many a man would have even blazoned such irregularities as I was guilty of; but from the high views that I had set before me, I regarded and hid them with an almost morbid sense of shame. It was thus rather the exacting nature of my aspirations than any particular degradation in my faults, that made me what I was, and, with even a deeper trench than in the majority of men, severed in me those provinces of good and ill which divide and compound man’s dual nature. In this case, I was driven to reflect deeply and inveterately on that hard law of life, which lies at the root of religion and is one of the most plentiful springs of distress.
Though so profound a double-dealer, I was in no sense a hypocrite; both sides of me were in dead earnest; I was no more myself when I laid aside restraint and plunged in shame, than when I laboured, in the eye of day, at the furtherance of knowledge or the relief of sorrow and suffering. And it chanced that the direction of my scientific studies, which led wholly towards the mystic and the transcendental, reacted and shed a strong light on this consciousness of the perennial war among my members. With every day, and from both sides of my intelligence, the moral and the intellectual, I thus drew steadily nearer to that truth, by whose partial discovery I have been doomed to such a dreadful shipwreck: that man is not truly one, but truly two. I say two, because the state of my own knowledge does not pass beyond that point. Others will follow, others will outstrip me on the same lines; and I hazard the guess that man will be ultimately known for a mere polity of multifarious, incongruous and independent denizens. I, for my part, from the nature of my life, advanced infallibly in one direction and in one direction only. It was on the moral side, and in my own person, that I learned to recognise the thorough and primitive duality of man; I saw that, of the two natures that contended in the field of my consciousness, even if I could rightly be said to be either, it was only because I was radically both; and from an early date, even before the course of my scientific discoveries had begun to suggest the most naked possibility of such a miracle, I had learned to dwell with pleasure, as a beloved daydream, on the thought of the separation of these elements. 
If each, I told myself, could be housed in separate identities, life would be relieved of all that was unbearable; the unjust might go his way, delivered from the aspirations and remorse of his more upright twin; and the just could walk steadfastly and securely on his upward path, doing the good things in which he found his pleasure, and no longer exposed to disgrace and penitence by the hands of this extraneous evil. It was the curse of mankind that these incongruous faggots were thus bound together—that in the agonised womb of consciousness, these polar twins should be continuously struggling. How, then, were they dissociated? I was so far in my reflections when, as I have said, a side light began to shine upon the subject from the laboratory table. I began to perceive more deeply than it has ever yet been stated, the trembling immateriality, the mistlike transience, of this seemingly so solid body in which we walk attired. Certain agents I found to have the power to shake and pluck back that fleshly vestment, even as a wind might toss the curtains of a pavilion. For two good reasons, I will not enter deeply into this scientific branch of my confession. First, because I have been made to learn that the doom and burthen of our life is bound for ever on man’s shoulders, and when the attempt is made to cast it off, it but returns upon us with more unfamiliar and more awful pressure. Second, because, as my narrative will make, alas! too evident, my discoveries were incomplete. Enough then, that I not only recognised my natural body for the mere aura and effulgence of certain of the powers that made up my spirit, but managed to compound a drug by which these powers should be dethroned from their supremacy, and a second form and countenance substituted, none the less natural to me because they were the expression, and bore the stamp of lower elements in my soul. I hesitated long before I put this theory to the test of practice.
I knew well that I risked death; for any drug that so potently controlled and shook the very fortress of identity, might, by the least scruple of an overdose or at the least inopportunity in the moment of exhibition, utterly blot out that immaterial tabernacle which I looked to it to change. But the temptation of a discovery so singular and profound at last overcame the suggestions of alarm. I had long since prepared my tincture; I purchased at once, from a firm of wholesale chemists, a large quantity of a particular salt which I knew, from my experiments, to be the last ingredient required; and late one accursed night, I compounded the elements, watched them boil and smoke together in the glass, and when the ebullition had subsided, with a strong glow of courage, drank off the potion. The most racking pangs succeeded: a grinding in the bones, deadly nausea, and a horror of the spirit that cannot be exceeded at the hour of birth or death. Then these agonies began swiftly to subside, and I came to myself as if out of a great sickness. There was something strange in my sensations, something indescribably new and, from its very novelty, incredibly sweet. I felt younger, lighter, happier in body; within I was conscious of a heady recklessness, a current of disordered sensual images running like a millrace in my fancy, a solution of the bonds of obligation, an unknown but not an innocent freedom of the soul. I knew myself, at the first breath of this new life, to be more wicked, tenfold more wicked, sold a slave to my original evil; and the thought, in that moment, braced and delighted me like wine. I stretched out my hands, exulting in the freshness of these sensations; and in the act, I was suddenly aware that I had lost in stature. There was no mirror, at that date, in my room; that which stands beside me as I write, was brought there later on and for the very purpose of these transformations. 
The night however, was far gone into the morning—the morning, black as it was, was nearly ripe for the conception of the day—the inmates of my house were locked in the most rigorous hours of slumber; and I determined, flushed as I was with hope and triumph, to venture in my new shape as far as to my bedroom. I crossed the yard, wherein the constellations looked down upon me, I could have thought, with wonder, the first creature of that sort that their unsleeping vigilance had yet disclosed to them; I stole through the corridors, a stranger in my own house; and coming to my room, I saw for the first time the appearance of Edward Hyde. I must here speak by theory alone, saying not that which I know, but that which I suppose to be most probable. The evil side of my nature, to which I had now transferred the stamping efficacy, was less robust and less developed than the good which I had just deposed. Again, in the course of my life, which had been, after all, nine tenths a life of effort, virtue and control, it had been much less exercised and much less exhausted. And hence, as I think, it came about that Edward Hyde was so much smaller, slighter and younger than Henry Jekyll. Even as good shone upon the countenance of the one, evil was written broadly and plainly on the face of the other. Evil besides (which I must still believe to be the lethal side of man) had left on that body an imprint of deformity and decay. And yet when I looked upon that ugly idol in the glass, I was conscious of no repugnance, rather of a leap of welcome. This, too, was myself. It seemed natural and human. In my eyes it bore a livelier image of the spirit, it seemed more express and single, than the imperfect and divided countenance I had been hitherto accustomed to call mine. And in so far I was doubtless right. I have observed that when I wore the semblance of Edward Hyde, none could come near to me at first without a visible misgiving of the flesh. 
This, as I take it, was because all human beings, as we meet them, are commingled out of good and evil: and Edward Hyde, alone in the ranks of mankind, was pure evil. I lingered but a moment at the mirror: the second and conclusive experiment had yet to be attempted; it yet remained to be seen if I had lost my identity beyond redemption and must flee before daylight from a house that was no longer mine; and hurrying back to my cabinet, I once more prepared and drank the cup, once more suffered the pangs of dissolution, and came to myself once more with the character, the stature and the face of Henry Jekyll. That night I had come to the fatal cross-roads. Had I approached my discovery in a more noble spirit, had I risked the experiment while under the empire of generous or pious aspirations, all must have been otherwise, and from these agonies of death and birth, I had come forth an angel instead of a fiend. The drug had no discriminating action; it was neither diabolical nor divine; it but shook the doors of the prisonhouse of my disposition; and like the captives of Philippi, that which stood within ran forth. At that time my virtue slumbered; my evil, kept awake by ambition, was alert and swift to seize the occasion; and the thing that was projected was Edward Hyde. Hence, although I had now two characters as well as two appearances, one was wholly evil, and the other was still the old Henry Jekyll, that incongruous compound of whose reformation and improvement I had already learned to despair. The movement was thus wholly toward the worse. Even at that time, I had not conquered my aversions to the dryness of a life of study. I would still be merrily disposed at times; and as my pleasures were (to say the least) undignified, and I was not only well known and highly considered, but growing towards the elderly man, this incoherency of my life was daily growing more unwelcome. It was on this side that my new power tempted me until I fell in slavery. 
I had but to drink the cup, to doff at once the body of the noted professor, and to assume, like a thick cloak, that of Edward Hyde. I smiled at the notion; it seemed to me at the time to be humourous; and I made my preparations with the most studious care. I took and furnished that house in Soho, to which Hyde was tracked by the police; and engaged as a housekeeper a creature whom I knew well to be silent and unscrupulous. On the other side, I announced to my servants that a Mr. Hyde (whom I described) was to have full liberty and power about my house in the square; and to parry mishaps, I even called and made myself a familiar object, in my second character. I next drew up that will to which you so much objected; so that if anything befell me in the person of Dr. Jekyll, I could enter on that of Edward Hyde without pecuniary loss. And thus fortified, as I supposed, on every side, I began to profit by the strange immunities of my position. Men have before hired bravos to transact their crimes, while their own person and reputation sat under shelter. I was the first that ever did so for his pleasures. I was the first that could plod in the public eye with a load of genial respectability, and in a moment, like a schoolboy, strip off these lendings and spring headlong into the sea of liberty. But for me, in my impenetrable mantle, the safety was complete. Think of it—I did not even exist! Let me but escape into my laboratory door, give me but a second or two to mix and swallow the draught that I had always standing ready; and whatever he had done, Edward Hyde would pass away like the stain of breath upon a mirror; and there in his stead, quietly at home, trimming the midnight lamp in his study, a man who could afford to laugh at suspicion, would be Henry Jekyll. The pleasures which I made haste to seek in my disguise were, as I have said, undignified; I would scarce use a harder term. But in the hands of Edward Hyde, they soon began to turn toward the monstrous. 
When I would come back from these excursions, I was often plunged into a kind of wonder at my vicarious depravity. This familiar that I called out of my own soul, and sent forth alone to do his good pleasure, was a being inherently malign and villainous; his every act and thought centered on self; drinking pleasure with bestial avidity from any degree of torture to another; relentless like a man of stone. Henry Jekyll stood at times aghast before the acts of Edward Hyde; but the situation was apart from ordinary laws, and insidiously relaxed the grasp of conscience. It was Hyde, after all, and Hyde alone, that was guilty. Jekyll was no worse; he woke again to his good qualities seemingly unimpaired; he would even make haste, where it was possible, to undo the evil done by Hyde. And thus his conscience slumbered. Into the details of the infamy at which I thus connived (for even now I can scarce grant that I committed it) I have no design of entering; I mean but to point out the warnings and the successive steps with which my chastisement approached. I met with one accident which, as it brought on no consequence, I shall no more than mention. An act of cruelty to a child aroused against me the anger of a passer-by, whom I recognised the other day in the person of your kinsman; the doctor and the child’s family joined him; there were moments when I feared for my life; and at last, in order to pacify their too just resentment, Edward Hyde had to bring them to the door, and pay them in a cheque drawn in the name of Henry Jekyll. But this danger was easily eliminated from the future, by opening an account at another bank in the name of Edward Hyde himself; and when, by sloping my own hand backward, I had supplied my double with a signature, I thought I sat beyond the reach of fate. Some two months before the murder of Sir Danvers, I had been out for one of my adventures, had returned at a late hour, and woke the next day in bed with somewhat odd sensations. 
It was in vain I looked about me; in vain I saw the decent furniture and tall proportions of my room in the square; in vain that I recognised the pattern of the bed curtains and the design of the mahogany frame; something still kept insisting that I was not where I was, that I had not wakened where I seemed to be, but in the little room in Soho where I was accustomed to sleep in the body of Edward Hyde. I smiled to myself, and in my psychological way, began lazily to inquire into the elements of this illusion, occasionally, even as I did so, dropping back into a comfortable morning doze. I was still so engaged when, in one of my more wakeful moments, my eyes fell upon my hand. Now the hand of Henry Jekyll (as you have often remarked) was professional in shape and size; it was large, firm, white and comely. But the hand which I now saw, clearly enough, in the yellow light of a mid-London morning, lying half shut on the bedclothes, was lean, corded, knuckly, of a dusky pallor and thickly shaded with a swart growth of hair. It was the hand of Edward Hyde. I must have stared upon it for near half a minute, sunk as I was in the mere stupidity of wonder, before terror woke up in my breast as sudden and startling as the crash of cymbals; and bounding from my bed I rushed to the mirror. At the sight that met my eyes, my blood was changed into something exquisitely thin and icy. Yes, I had gone to bed Henry Jekyll, I had awakened Edward Hyde. How was this to be explained? I asked myself; and then, with another bound of terror—how was it to be remedied? It was well on in the morning; the servants were up; all my drugs were in the cabinet—a long journey down two pairs of stairs, through the back passage, across the open court and through the anatomical theatre, from where I was then standing horror-struck. It might indeed be possible to cover my face; but of what use was that, when I was unable to conceal the alteration in my stature? 
And then with an overpowering sweetness of relief, it came back upon my mind that the servants were already used to the coming and going of my second self. I had soon dressed, as well as I was able, in clothes of my own size: had soon passed through the house, where Bradshaw stared and drew back at seeing Mr. Hyde at such an hour and in such a strange array; and ten minutes later, Dr. Jekyll had returned to his own shape and was sitting down, with a darkened brow, to make a feint of breakfasting. Small indeed was my appetite. This inexplicable incident, this reversal of my previous experience, seemed, like the Babylonian finger on the wall, to be spelling out the letters of my judgment; and I began to reflect more seriously than ever before on the issues and possibilities of my double existence. That part of me which I had the power of projecting, had lately been much exercised and nourished; it had seemed to me of late as though the body of Edward Hyde had grown in stature, as though (when I wore that form) I were conscious of a more generous tide of blood; and I began to spy a danger that, if this were much prolonged, the balance of my nature might be permanently overthrown, the power of voluntary change be forfeited, and the character of Edward Hyde become irrevocably mine. The power of the drug had not been always equally displayed. Once, very early in my career, it had totally failed me; since then I had been obliged on more than one occasion to double, and once, with infinite risk of death, to treble the amount; and these rare uncertainties had cast hitherto the sole shadow on my contentment. Now, however, and in the light of that morning’s accident, I was led to remark that whereas, in the beginning, the difficulty had been to throw off the body of Jekyll, it had of late gradually but decidedly transferred itself to the other side. 
All things therefore seemed to point to this; that I was slowly losing hold of my original and better self, and becoming slowly incorporated with my second and worse. Between these two, I now felt I had to choose. My two natures had memory in common, but all other faculties were most unequally shared between them. Jekyll (who was composite) now with the most sensitive apprehensions, now with a greedy gusto, projected and shared in the pleasures and adventures of Hyde; but Hyde was indifferent to Jekyll, or but remembered him as the mountain bandit remembers the cavern in which he conceals himself from pursuit. Jekyll had more than a father’s interest; Hyde had more than a son’s indifference. To cast in my lot with Jekyll, was to die to those appetites which I had long secretly indulged and had of late begun to pamper. To cast it in with Hyde, was to die to a thousand interests and aspirations, and to become, at a blow and forever, despised and friendless. The bargain might appear unequal; but there was still another consideration in the scales; for while Jekyll would suffer smartingly in the fires of abstinence, Hyde would be not even conscious of all that he had lost. Strange as my circumstances were, the terms of this debate are as old and commonplace as man; much the same inducements and alarms cast the die for any tempted and trembling sinner; and it fell out with me, as it falls with so vast a majority of my fellows, that I chose the better part and was found wanting in the strength to keep to it. Yes, I preferred the elderly and discontented doctor, surrounded by friends and cherishing honest hopes; and bade a resolute farewell to the liberty, the comparative youth, the light step, leaping impulses and secret pleasures, that I had enjoyed in the disguise of Hyde. I made this choice perhaps with some unconscious reservation, for I neither gave up the house in Soho, nor destroyed the clothes of Edward Hyde, which still lay ready in my cabinet. 
For two months, however, I was true to my determination; for two months, I led a life of such severity as I had never before attained to, and enjoyed the compensations of an approving conscience. But time began at last to obliterate the freshness of my alarm; the praises of conscience began to grow into a thing of course; I began to be tortured with throes and longings, as of Hyde struggling after freedom; and at last, in an hour of moral weakness, I once again compounded and swallowed the transforming draught. I do not suppose that, when a drunkard reasons with himself upon his vice, he is once out of five hundred times affected by the dangers that he runs through his brutish, physical insensibility; neither had I, long as I had considered my position, made enough allowance for the complete moral insensibility and insensate readiness to evil, which were the leading characters of Edward Hyde. Yet it was by these that I was punished. My devil had been long caged, he came out roaring. I was conscious, even when I took the draught, of a more unbridled, a more furious propensity to ill. It must have been this, I suppose, that stirred in my soul that tempest of impatience with which I listened to the civilities of my unhappy victim; I declare, at least, before God, no man morally sane could have been guilty of that crime upon so pitiful a provocation; and that I struck in no more reasonable spirit than that in which a sick child may break a plaything. But I had voluntarily stripped myself of all those balancing instincts by which even the worst of us continues to walk with some degree of steadiness among temptations; and in my case, to be tempted, however slightly, was to fall. Instantly the spirit of hell awoke in me and raged. 
With a transport of glee, I mauled the unresisting body, tasting delight from every blow; and it was not till weariness had begun to succeed, that I was suddenly, in the top fit of my delirium, struck through the heart by a cold thrill of terror. A mist dispersed; I saw my life to be forfeit; and fled from the scene of these excesses, at once glorying and trembling, my lust of evil gratified and stimulated, my love of life screwed to the topmost peg. I ran to the house in Soho, and (to make assurance doubly sure) destroyed my papers; thence I set out through the lamplit streets, in the same divided ecstasy of mind, gloating on my crime, light-headedly devising others in the future, and yet still hastening and still hearkening in my wake for the steps of the avenger. Hyde had a song upon his lips as he compounded the draught, and as he drank it, pledged the dead man. The pangs of transformation had not done tearing him, before Henry Jekyll, with streaming tears of gratitude and remorse, had fallen upon his knees and lifted his clasped hands to God. The veil of self-indulgence was rent from head to foot. I saw my life as a whole: I followed it up from the days of childhood, when I had walked with my father’s hand, and through the self-denying toils of my professional life, to arrive again and again, with the same sense of unreality, at the damned horrors of the evening. I could have screamed aloud; I sought with tears and prayers to smother down the crowd of hideous images and sounds with which my memory swarmed against me; and still, between the petitions, the ugly face of my iniquity stared into my soul. As the acuteness of this remorse began to die away, it was succeeded by a sense of joy. The problem of my conduct was solved. Hyde was thenceforth impossible; whether I would or not, I was now confined to the better part of my existence; and O, how I rejoiced to think of it! with what willing humility I embraced anew the restrictions of natural life! 
with what sincere renunciation I locked the door by which I had so often gone and come, and ground the key under my heel! The next day, came the news that the murder had not been overlooked, that the guilt of Hyde was patent to the world, and that the victim was a man high in public estimation. It was not only a crime, it had been a tragic folly. I think I was glad to know it; I think I was glad to have my better impulses thus buttressed and guarded by the terrors of the scaffold. Jekyll was now my city of refuge; let but Hyde peep out an instant, and the hands of all men would be raised to take and slay him. I resolved in my future conduct to redeem the past; and I can say with honesty that my resolve was fruitful of some good. You know yourself how earnestly, in the last months of the last year, I laboured to relieve suffering; you know that much was done for others, and that the days passed quietly, almost happily for myself. Nor can I truly say that I wearied of this beneficent and innocent life; I think instead that I daily enjoyed it more completely; but I was still cursed with my duality of purpose; and as the first edge of my penitence wore off, the lower side of me, so long indulged, so recently chained down, began to growl for licence. Not that I dreamed of resuscitating Hyde; the bare idea of that would startle me to frenzy: no, it was in my own person that I was once more tempted to trifle with my conscience; and it was as an ordinary secret sinner that I at last fell before the assaults of temptation. There comes an end to all things; the most capacious measure is filled at last; and this brief condescension to my evil finally destroyed the balance of my soul. And yet I was not alarmed; the fall seemed natural, like a return to the old days before I had made my discovery. 
It was a fine, clear, January day, wet under foot where the frost had melted, but cloudless overhead; and the Regent’s Park was full of winter chirrupings and sweet with spring odours. I sat in the sun on a bench; the animal within me licking the chops of memory; the spiritual side a little drowsed, promising subsequent penitence, but not yet moved to begin. After all, I reflected, I was like my neighbours; and then I smiled, comparing myself with other men, comparing my active good-will with the lazy cruelty of their neglect. And at the very moment of that vainglorious thought, a qualm came over me, a horrid nausea and the most deadly shuddering. These passed away, and left me faint; and then as in its turn faintness subsided, I began to be aware of a change in the temper of my thoughts, a greater boldness, a contempt of danger, a solution of the bonds of obligation. I looked down; my clothes hung formlessly on my shrunken limbs; the hand that lay on my knee was corded and hairy. I was once more Edward Hyde. A moment before I had been safe of all men’s respect, wealthy, beloved—the cloth laying for me in the dining-room at home; and now I was the common quarry of mankind, hunted, houseless, a known murderer, thrall to the gallows. My reason wavered, but it did not fail me utterly. I have more than once observed that in my second character, my faculties seemed sharpened to a point and my spirits more tensely elastic; thus it came about that, where Jekyll perhaps might have succumbed, Hyde rose to the importance of the moment. My drugs were in one of the presses of my cabinet; how was I to reach them? That was the problem that (crushing my temples in my hands) I set myself to solve. The laboratory door I had closed. If I sought to enter by the house, my own servants would consign me to the gallows. I saw I must employ another hand, and thought of Lanyon. How was he to be reached? how persuaded? 
Supposing that I escaped capture in the streets, how was I to make my way into his presence? and how should I, an unknown and displeasing visitor, prevail on the famous physician to rifle the study of his colleague, Dr. Jekyll? Then I remembered that of my original character, one part remained to me: I could write my own hand; and once I had conceived that kindling spark, the way that I must follow became lighted up from end to end. Thereupon, I arranged my clothes as best I could, and summoning a passing hansom, drove to an hotel in Portland Street, the name of which I chanced to remember. At my appearance (which was indeed comical enough, however tragic a fate these garments covered) the driver could not conceal his mirth. I gnashed my teeth upon him with a gust of devilish fury; and the smile withered from his face—happily for him—yet more happily for myself, for in another instant I had certainly dragged him from his perch. At the inn, as I entered, I looked about me with so black a countenance as made the attendants tremble; not a look did they exchange in my presence; but obsequiously took my orders, led me to a private room, and brought me wherewithal to write. Hyde in danger of his life was a creature new to me; shaken with inordinate anger, strung to the pitch of murder, lusting to inflict pain. Yet the creature was astute; mastered his fury with a great effort of the will; composed his two important letters, one to Lanyon and one to Poole; and that he might receive actual evidence of their being posted, sent them out with directions that they should be registered. Thenceforward, he sat all day over the fire in the private room, gnawing his nails; there he dined, sitting alone with his fears, the waiter visibly quailing before his eye; and thence, when the night was fully come, he set forth in the corner of a closed cab, and was driven to and fro about the streets of the city. He, I say—I cannot say, I. 
That child of Hell had nothing human; nothing lived in him but fear and hatred. And when at last, thinking the driver had begun to grow suspicious, he discharged the cab and ventured on foot, attired in his misfitting clothes, an object marked out for observation, into the midst of the nocturnal passengers, these two base passions raged within him like a tempest. He walked fast, hunted by his fears, chattering to himself, skulking through the less frequented thoroughfares, counting the minutes that still divided him from midnight. Once a woman spoke to him, offering, I think, a box of lights. He smote her in the face, and she fled. When I came to myself at Lanyon’s, the horror of my old friend perhaps affected me somewhat: I do not know; it was at least but a drop in the sea to the abhorrence with which I looked back upon these hours. A change had come over me. It was no longer the fear of the gallows, it was the horror of being Hyde that racked me. I received Lanyon’s condemnation partly in a dream; it was partly in a dream that I came home to my own house and got into bed. I slept after the prostration of the day, with a stringent and profound slumber which not even the nightmares that wrung me could avail to break. I awoke in the morning shaken, weakened, but refreshed. I still hated and feared the thought of the brute that slept within me, and I had not of course forgotten the appalling dangers of the day before; but I was once more at home, in my own house and close to my drugs; and gratitude for my escape shone so strong in my soul that it almost rivalled the brightness of hope. I was stepping leisurely across the court after breakfast, drinking the chill of the air with pleasure, when I was seized again with those indescribable sensations that heralded the change; and I had but the time to gain the shelter of my cabinet, before I was once again raging and freezing with the passions of Hyde. 
It took on this occasion a double dose to recall me to myself; and alas! six hours after, as I sat looking sadly in the fire, the pangs returned, and the drug had to be re-administered. In short, from that day forth it seemed only by a great effort as of gymnastics, and only under the immediate stimulation of the drug, that I was able to wear the countenance of Jekyll. At all hours of the day and night, I would be taken with the premonitory shudder; above all, if I slept, or even dozed for a moment in my chair, it was always as Hyde that I awakened. Under the strain of this continually impending doom and by the sleeplessness to which I now condemned myself, ay, even beyond what I had thought possible to man, I became, in my own person, a creature eaten up and emptied by fever, languidly weak both in body and mind, and solely occupied by one thought: the horror of my other self. But when I slept, or when the virtue of the medicine wore off, I would leap almost without transition (for the pangs of transformation grew daily less marked) into the possession of a fancy brimming with images of terror, a soul boiling with causeless hatreds, and a body that seemed not strong enough to contain the raging energies of life. The powers of Hyde seemed to have grown with the sickliness of Jekyll. And certainly the hate that now divided them was equal on each side. With Jekyll, it was a thing of vital instinct. He had now seen the full deformity of that creature that shared with him some of the phenomena of consciousness, and was co-heir with him to death: and beyond these links of community, which in themselves made the most poignant part of his distress, he thought of Hyde, for all his energy of life, as of something not only hellish but inorganic. This was the shocking thing; that the slime of the pit seemed to utter cries and voices; that the amorphous dust gesticulated and sinned; that what was dead, and had no shape, should usurp the offices of life. 
And this again, that that insurgent horror was knit to him closer than a wife, closer than an eye; lay caged in his flesh, where he heard it mutter and felt it struggle to be born; and at every hour of weakness, and in the confidence of slumber, prevailed against him, and deposed him out of life. The hatred of Hyde for Jekyll was of a different order. His terror of the gallows drove him continually to commit temporary suicide, and return to his subordinate station of a part instead of a person; but he loathed the necessity, he loathed the despondency into which Jekyll was now fallen, and he resented the dislike with which he was himself regarded. Hence the ape-like tricks that he would play me, scrawling in my own hand blasphemies on the pages of my books, burning the letters and destroying the portrait of my father; and indeed, had it not been for his fear of death, he would long ago have ruined himself in order to involve me in the ruin. But his love of life is wonderful; I go further: I, who sicken and freeze at the mere thought of him, when I recall the abjection and passion of this attachment, and when I know how he fears my power to cut him off by suicide, I find it in my heart to pity him. It is useless, and the time awfully fails me, to prolong this description; no one has ever suffered such torments, let that suffice; and yet even to these, habit brought—no, not alleviation—but a certain callousness of soul, a certain acquiescence of despair; and my punishment might have gone on for years, but for the last calamity which has now fallen, and which has finally severed me from my own face and nature. My provision of the salt, which had never been renewed since the date of the first experiment, began to run low. I sent out for a fresh supply and mixed the draught; the ebullition followed, and the first change of colour, not the second; I drank it and it was without efficiency. 
You will learn from Poole how I have had London ransacked; it was in vain; and I am now persuaded that my first supply was impure, and that it was that unknown impurity which lent efficacy to the draught. About a week has passed, and I am now finishing this statement under the influence of the last of the old powders. This, then, is the last time, short of a miracle, that Henry Jekyll can think his own thoughts or see his own face (now how sadly altered!) in the glass. Nor must I delay too long to bring my writing to an end; for if my narrative has hitherto escaped destruction, it has been by a combination of great prudence and great good luck. Should the throes of change take me in the act of writing it, Hyde will tear it in pieces; but if some time shall have elapsed after I have laid it by, his wonderful selfishness and circumscription to the moment will probably save it once again from the action of his ape-like spite. And indeed the doom that is closing on us both has already changed and crushed him. Half an hour from now, when I shall again and forever reindue that hated personality, I know how I shall sit shuddering and weeping in my chair, or continue, with the most strained and fearstruck ecstasy of listening, to pace up and down this room (my last earthly refuge) and give ear to every sound of menace. Will Hyde die upon the scaffold? or will he find courage to release himself at the last moment? God knows; I am careless; this is my true hour of death, and what is to follow concerns another than myself. Here then, as I lay down the pen and proceed to seal up my confession, I bring the life of that unhappy Henry Jekyll to an end.
I had sprung to my feet and leaped back against the wall, my arms raised to shield me from that prodigy, my mind submerged in terror. “O God!” I screamed, and “O God!” again and again; for there before my eyes—pale and shaken, and half fainting, and groping before him with his hands, like a man restored from death—there stood Henry Jekyll! What he told me in the next hour, I cannot bring my mind to set on paper. I saw what I saw, I heard what I heard, and my soul sickened at it; and yet now when that sight has faded from my eyes, I ask myself if I believe it, and I cannot answer. My life is shaken to its roots; sleep has left me; the deadliest terror sits by me at all hours of the day and night; and I feel that my days are numbered, and that I must die; and yet I shall die incredulous. As for the moral turpitude that man unveiled to me, even with tears of penitence, I cannot, even in memory, dwell on it without a start of horror. I will say but one thing, Utterson, and that (if you can bring your mind to credit it) will be more than enough. The creature who crept into my house that night was, on Jekyll’s own confession, known by the name of Hyde and hunted for in every corner of the land as the murderer of Carew.

HASTIE LANYON.
HENRY JEKYLL’S FULL STATEMENT OF THE CASE

I was born in the year 18— to a large fortune, endowed besides with excellent parts, inclined by nature to industry, fond of the respect of the wise and good among my fellowmen, and thus, as might have been supposed, with every guarantee of an honourable and distinguished future. And indeed the worst of my faults was a certain impatient gaiety of disposition, such as has made the happiness of many, but such as I found it hard to reconcile with my imperious desire to carry my head high, and wear a more than commonly grave countenance before the public. Hence it came about that I concealed my pleasures; and that when I reached years of reflection, and began to look round me and take stock of my progress and position in the world, I stood already committed to a profound duplicity of life. Many a man would have even blazoned such irregularities as I was guilty of; but from the high views that I had set before me, I regarded and hid them with an almost morbid sense of shame. It was thus rather the exacting nature of my aspirations than any particular degradation in my faults, that made me what I was, and, with even a deeper trench than in the majority of men, severed in me those provinces of good and ill which divide and compound man’s dual nature. In this case, I was driven to reflect deeply and inveterately on that hard law of life, which lies at the root of religion and is one of the most plentiful springs of distress. Though so profound a double-dealer, I was in no sense a hypocrite; both sides of me were in dead earnest; I was no more myself when I laid aside restraint and plunged in shame, than when I laboured, in the eye of day, at the furtherance of knowledge or the relief of sorrow and suffering. And it chanced that the direction of my scientific studies, which led wholly towards the mystic and the transcendental, reacted and shed a strong light on this consciousness of the perennial war among my members.
With every day, and from both sides of my intelligence, the moral and the intellectual, I thus drew steadily nearer to that truth, by whose partial discovery I have been doomed to such a dreadful shipwreck: that man is not truly one, but truly two. I say two, because the state of my own knowledge does not pass beyond that point. Others will follow, others will outstrip me on the same lines; and I hazard the guess that man will be ultimately known for a mere polity of multifarious, incongruous and independent denizens. I, for my part, from the nature of my life, advanced infallibly in one direction and in one direction only. It was on the moral side, and in my own person, that I learned to recognise the thorough and primitive duality of man; I saw that, of the two natures that contended in the field of my consciousness, even if I could rightly be said to be either, it was only because I was radically both; and from an early date, even before the course of my scientific discoveries had begun to suggest the most naked possibility of such a miracle, I had learned to dwell with pleasure, as a beloved daydream, on the thought of the separation of these elements. If each, I told myself, could be housed in separate identities, life would be relieved of all that was unbearable; the unjust might go his way, delivered from the aspirations and remorse of his more upright twin; and the just could walk steadfastly and securely on his upward path, doing the good things in which he found his pleasure, and no longer exposed to disgrace and penitence by the hands of this extraneous evil. It was the curse of mankind that these incongruous faggots were thus bound together—that in the agonised womb of consciousness, these polar twins should be continuously struggling. How, then were they dissociated? I was so far in my reflections when, as I have said, a side light began to shine upon the subject from the laboratory table. 
I began to perceive more deeply than it has ever yet been stated, the trembling immateriality, the mistlike transience, of this seemingly so solid body in which we walk attired. Certain agents I found to have the power to shake and pluck back that fleshly vestment, even as a wind might toss the curtains of a pavilion. For two good reasons, I will not enter deeply into this scientific branch of my confession. First, because I have been made to learn that the doom and burthen of our life is bound for ever on man’s shoulders, and when the attempt is made to cast it off, it but returns upon us with more unfamiliar and more awful pressure. Second, because, as my narrative will make, alas! too evident, my discoveries were incomplete. Enough then, that I not only recognised my natural body from the mere aura and effulgence of certain of the powers that made up my spirit, but managed to compound a drug by which these powers should be dethroned from their supremacy, and a second form and countenance substituted, none the less natural to me because they were the expression, and bore the stamp of lower elements in my soul. I hesitated long before I put this theory to the test of practice. I knew well that I risked death; for any drug that so potently controlled and shook the very fortress of identity, might, by the least scruple of an overdose or at the least inopportunity in the moment of exhibition, utterly blot out that immaterial tabernacle which I looked to it to change. But the temptation of a discovery so singular and profound at last overcame the suggestions of alarm. I had long since prepared my tincture; I purchased at once, from a firm of wholesale chemists, a large quantity of a particular salt which I knew, from my experiments, to be the last ingredient required; and late one accursed night, I compounded the elements, watched them boil and smoke together in the glass, and when the ebullition had subsided, with a strong glow of courage, drank off the potion. 
The most racking pangs succeeded: a grinding in the bones, deadly nausea, and a horror of the spirit that cannot be exceeded at the hour of birth or death. Then these agonies began swiftly to subside, and I came to myself as if out of a great sickness. There was something strange in my sensations, something indescribably new and, from its very novelty, incredibly sweet. I felt younger, lighter, happier in body; within I was conscious of a heady recklessness, a current of disordered sensual images running like a millrace in my fancy, a solution of the bonds of obligation, an unknown but not an innocent freedom of the soul. I knew myself, at the first breath of this new life, to be more wicked, tenfold more wicked, sold a slave to my original evil; and the thought, in that moment, braced and delighted me like wine. I stretched out my hands, exulting in the freshness of these sensations; and in the act, I was suddenly aware that I had lost in stature. There was no mirror, at that date, in my room; that which stands beside me as I write, was brought there later on and for the very purpose of these transformations. The night however, was far gone into the morning—the morning, black as it was, was nearly ripe for the conception of the day—the inmates of my house were locked in the most rigorous hours of slumber; and I determined, flushed as I was with hope and triumph, to venture in my new shape as far as to my bedroom. I crossed the yard, wherein the constellations looked down upon me, I could have thought, with wonder, the first creature of that sort that their unsleeping vigilance had yet disclosed to them; I stole through the corridors, a stranger in my own house; and coming to my room, I saw for the first time the appearance of Edward Hyde. I must here speak by theory alone, saying not that which I know, but that which I suppose to be most probable. 
The evil side of my nature, to which I had now transferred the stamping efficacy, was less robust and less developed than the good which I had just deposed. Again, in the course of my life, which had been, after all, nine tenths a life of effort, virtue and control, it had been much less exercised and much less exhausted. And hence, as I think, it came about that Edward Hyde was so much smaller, slighter and younger than Henry Jekyll. Even as good shone upon the countenance of the one, evil was written broadly and plainly on the face of the other. Evil besides (which I must still believe to be the lethal side of man) had left on that body an imprint of deformity and decay. And yet when I looked upon that ugly idol in the glass, I was conscious of no repugnance, rather of a leap of welcome. This, too, was myself. It seemed natural and human. In my eyes it bore a livelier image of the spirit, it seemed more express and single, than the imperfect and divided countenance I had been hitherto accustomed to call mine. And in so far I was doubtless right. I have observed that when I wore the semblance of Edward Hyde, none could come near to me at first without a visible misgiving of the flesh. This, as I take it, was because all human beings, as we meet them, are commingled out of good and evil: and Edward Hyde, alone in the ranks of mankind, was pure evil. I lingered but a moment at the mirror: the second and conclusive experiment had yet to be attempted; it yet remained to be seen if I had lost my identity beyond redemption and must flee before daylight from a house that was no longer mine; and hurrying back to my cabinet, I once more prepared and drank the cup, once more suffered the pangs of dissolution, and came to myself once more with the character, the stature and the face of Henry Jekyll. That night I had come to the fatal cross-roads. 
Had I approached my discovery in a more noble spirit, had I risked the experiment while under the empire of generous or pious aspirations, all must have been otherwise, and from these agonies of death and birth, I had come forth an angel instead of a fiend. The drug had no discriminating action; it was neither diabolical nor divine; it but shook the doors of the prisonhouse of my disposition; and like the captives of Philippi, that which stood within ran forth. At that time my virtue slumbered; my evil, kept awake by ambition, was alert and swift to seize the occasion; and the thing that was projected was Edward Hyde. Hence, although I had now two characters as well as two appearances, one was wholly evil, and the other was still the old Henry Jekyll, that incongruous compound of whose reformation and improvement I had already learned to despair. The movement was thus wholly toward the worse. Even at that time, I had not conquered my aversions to the dryness of a life of study. I would still be merrily disposed at times; and as my pleasures were (to say the least) undignified, and I was not only well known and highly considered, but growing towards the elderly man, this incoherency of my life was daily growing more unwelcome. It was on this side that my new power tempted me until I fell in slavery. I had but to drink the cup, to doff at once the body of the noted professor, and to assume, like a thick cloak, that of Edward Hyde. I smiled at the notion; it seemed to me at the time to be humourous; and I made my preparations with the most studious care. I took and furnished that house in Soho, to which Hyde was tracked by the police; and engaged as a housekeeper a creature whom I knew well to be silent and unscrupulous. On the other side, I announced to my servants that a Mr. Hyde (whom I described) was to have full liberty and power about my house in the square; and to parry mishaps, I even called and made myself a familiar object, in my second character. 
I next drew up that will to which you so much objected; so that if anything befell me in the person of Dr. Jekyll, I could enter on that of Edward Hyde without pecuniary loss. And thus fortified, as I supposed, on every side, I began to profit by the strange immunities of my position. Men have before hired bravos to transact their crimes, while their own person and reputation sat under shelter. I was the first that ever did so for his pleasures. I was the first that could plod in the public eye with a load of genial respectability, and in a moment, like a schoolboy, strip off these lendings and spring headlong into the sea of liberty. But for me, in my impenetrable mantle, the safety was complete. Think of it—I did not even exist! Let me but escape into my laboratory door, give me but a second or two to mix and swallow the draught that I had always standing ready; and whatever he had done, Edward Hyde would pass away like the stain of breath upon a mirror; and there in his stead, quietly at home, trimming the midnight lamp in his study, a man who could afford to laugh at suspicion, would be Henry Jekyll. The pleasures which I made haste to seek in my disguise were, as I have said, undignified; I would scarce use a harder term. But in the hands of Edward Hyde, they soon began to turn toward the monstrous. When I would come back from these excursions, I was often plunged into a kind of wonder at my vicarious depravity. This familiar that I called out of my own soul, and sent forth alone to do his good pleasure, was a being inherently malign and villainous; his every act and thought centered on self; drinking pleasure with bestial avidity from any degree of torture to another; relentless like a man of stone. Henry Jekyll stood at times aghast before the acts of Edward Hyde; but the situation was apart from ordinary laws, and insidiously relaxed the grasp of conscience. It was Hyde, after all, and Hyde alone, that was guilty. 
Jekyll was no worse; he woke again to his good qualities seemingly unimpaired; he would even make haste, where it was possible, to undo the evil done by Hyde. And thus his conscience slumbered. Into the details of the infamy at which I thus connived (for even now I can scarce grant that I committed it) I have no design of entering; I mean but to point out the warnings and the successive steps with which my chastisement approached. I met with one accident which, as it brought on no consequence, I shall no more than mention. An act of cruelty to a child aroused against me the anger of a passer-by, whom I recognised the other day in the person of your kinsman; the doctor and the child’s family joined him; there were moments when I feared for my life; and at last, in order to pacify their too just resentment, Edward Hyde had to bring them to the door, and pay them in a cheque drawn in the name of Henry Jekyll. But this danger was easily eliminated from the future, by opening an account at another bank in the name of Edward Hyde himself; and when, by sloping my own hand backward, I had supplied my double with a signature, I thought I sat beyond the reach of fate. Some two months before the murder of Sir Danvers, I had been out for one of my adventures, had returned at a late hour, and woke the next day in bed with somewhat odd sensations. It was in vain I looked about me; in vain I saw the decent furniture and tall proportions of my room in the square; in vain that I recognised the pattern of the bed curtains and the design of the mahogany frame; something still kept insisting that I was not where I was, that I had not wakened where I seemed to be, but in the little room in Soho where I was accustomed to sleep in the body of Edward Hyde. I smiled to myself, and in my psychological way, began lazily to inquire into the elements of this illusion, occasionally, even as I did so, dropping back into a comfortable morning doze. 
I was still so engaged when, in one of my more wakeful moments, my eyes fell upon my hand. Now the hand of Henry Jekyll (as you have often remarked) was professional in shape and size; it was large, firm, white and comely. But the hand which I now saw, clearly enough, in the yellow light of a mid-London morning, lying half shut on the bedclothes, was lean, corded, knuckly, of a dusky pallor and thickly shaded with a swart growth of hair. It was the hand of Edward Hyde. I must have stared upon it for near half a minute, sunk as I was in the mere stupidity of wonder, before terror woke up in my breast as sudden and startling as the crash of cymbals; and bounding from my bed I rushed to the mirror. At the sight that met my eyes, my blood was changed into something exquisitely thin and icy. Yes, I had gone to bed Henry Jekyll, I had awakened Edward Hyde. How was this to be explained? I asked myself; and then, with another bound of terror—how was it to be remedied? It was well on in the morning; the servants were up; all my drugs were in the cabinet—a long journey down two pairs of stairs, through the back passage, across the open court and through the anatomical theatre, from where I was then standing horror-struck. It might indeed be possible to cover my face; but of what use was that, when I was unable to conceal the alteration in my stature? And then with an overpowering sweetness of relief, it came back upon my mind that the servants were already used to the coming and going of my second self. I had soon dressed, as well as I was able, in clothes of my own size: had soon passed through the house, where Bradshaw stared and drew back at seeing Mr. Hyde at such an hour and in such a strange array; and ten minutes later, Dr. Jekyll had returned to his own shape and was sitting down, with a darkened brow, to make a feint of breakfasting. Small indeed was my appetite. 
This inexplicable incident, this reversal of my previous experience, seemed, like the Babylonian finger on the wall, to be spelling out the letters of my judgment; and I began to reflect more seriously than ever before on the issues and possibilities of my double existence. That part of me which I had the power of projecting, had lately been much exercised and nourished; it had seemed to me of late as though the body of Edward Hyde had grown in stature, as though (when I wore that form) I were conscious of a more generous tide of blood; and I began to spy a danger that, if this were much prolonged, the balance of my nature might be permanently overthrown, the power of voluntary change be forfeited, and the character of Edward Hyde become irrevocably mine. The power of the drug had not been always equally displayed. Once, very early in my career, it had totally failed me; since then I had been obliged on more than one occasion to double, and once, with infinite risk of death, to treble the amount; and these rare uncertainties had cast hitherto the sole shadow on my contentment. Now, however, and in the light of that morning’s accident, I was led to remark that whereas, in the beginning, the difficulty had been to throw off the body of Jekyll, it had of late gradually but decidedly transferred itself to the other side. All things therefore seemed to point to this; that I was slowly losing hold of my original and better self, and becoming slowly incorporated with my second and worse. Between these two, I now felt I had to choose. My two natures had memory in common, but all other faculties were most unequally shared between them. Jekyll (who was composite) now with the most sensitive apprehensions, now with a greedy gusto, projected and shared in the pleasures and adventures of Hyde; but Hyde was indifferent to Jekyll, or but remembered him as the mountain bandit remembers the cavern in which he conceals himself from pursuit. 
Jekyll had more than a father’s interest; Hyde had more than a son’s indifference. To cast in my lot with Jekyll, was to die to those appetites which I had long secretly indulged and had of late begun to pamper. To cast it in with Hyde, was to die to a thousand interests and aspirations, and to become, at a blow and forever, despised and friendless. The bargain might appear unequal; but there was still another consideration in the scales; for while Jekyll would suffer smartingly in the fires of abstinence, Hyde would be not even conscious of all that he had lost. Strange as my circumstances were, the terms of this debate are as old and commonplace as man; much the same inducements and alarms cast the die for any tempted and trembling sinner; and it fell out with me, as it falls with so vast a majority of my fellows, that I chose the better part and was found wanting in the strength to keep to it. Yes, I preferred the elderly and discontented doctor, surrounded by friends and cherishing honest hopes; and bade a resolute farewell to the liberty, the comparative youth, the light step, leaping impulses and secret pleasures, that I had enjoyed in the disguise of Hyde. I made this choice perhaps with some unconscious reservation, for I neither gave up the house in Soho, nor destroyed the clothes of Edward Hyde, which still lay ready in my cabinet. For two months, however, I was true to my determination; for two months, I led a life of such severity as I had never before attained to, and enjoyed the compensations of an approving conscience. But time began at last to obliterate the freshness of my alarm; the praises of conscience began to grow into a thing of course; I began to be tortured with throes and longings, as of Hyde struggling after freedom; and at last, in an hour of moral weakness, I once again compounded and swallowed the transforming draught. 
I do not suppose that, when a drunkard reasons with himself upon his vice, he is once out of five hundred times affected by the dangers that he runs through his brutish, physical insensibility; neither had I, long as I had considered my position, made enough allowance for the complete moral insensibility and insensate readiness to evil, which were the leading characters of Edward Hyde. Yet it was by these that I was punished. My devil had been long caged, he came out roaring. I was conscious, even when I took the draught, of a more unbridled, a more furious propensity to ill. It must have been this, I suppose, that stirred in my soul that tempest of impatience with which I listened to the civilities of my unhappy victim; I declare, at least, before God, no man morally sane could have been guilty of that crime upon so pitiful a provocation; and that I struck in no more reasonable spirit than that in which a sick child may break a plaything. But I had voluntarily stripped myself of all those balancing instincts by which even the worst of us continues to walk with some degree of steadiness among temptations; and in my case, to be tempted, however slightly, was to fall. Instantly the spirit of hell awoke in me and raged. With a transport of glee, I mauled the unresisting body, tasting delight from every blow; and it was not till weariness had begun to succeed, that I was suddenly, in the top fit of my delirium, struck through the heart by a cold thrill of terror. A mist dispersed; I saw my life to be forfeit; and fled from the scene of these excesses, at once glorying and trembling, my lust of evil gratified and stimulated, my love of life screwed to the topmost peg. 
I ran to the house in Soho, and (to make assurance doubly sure) destroyed my papers; thence I set out through the lamplit streets, in the same divided ecstasy of mind, gloating on my crime, light-headedly devising others in the future, and yet still hastening and still hearkening in my wake for the steps of the avenger. Hyde had a song upon his lips as he compounded the draught, and as he drank it, pledged the dead man. The pangs of transformation had not done tearing him, before Henry Jekyll, with streaming tears of gratitude and remorse, had fallen upon his knees and lifted his clasped hands to God. The veil of self-indulgence was rent from head to foot. I saw my life as a whole: I followed it up from the days of childhood, when I had walked with my father’s hand, and through the self-denying toils of my professional life, to arrive again and again, with the same sense of unreality, at the damned horrors of the evening. I could have screamed aloud; I sought with tears and prayers to smother down the crowd of hideous images and sounds with which my memory swarmed against me; and still, between the petitions, the ugly face of my iniquity stared into my soul. As the acuteness of this remorse began to die away, it was succeeded by a sense of joy. The problem of my conduct was solved. Hyde was thenceforth impossible; whether I would or not, I was now confined to the better part of my existence; and O, how I rejoiced to think of it! with what willing humility I embraced anew the restrictions of natural life! with what sincere renunciation I locked the door by which I had so often gone and come, and ground the key under my heel! The next day, came the news that the murder had not been overlooked, that the guilt of Hyde was patent to the world, and that the victim was a man high in public estimation. It was not only a crime, it had been a tragic folly. 
I think I was glad to know it; I think I was glad to have my better impulses thus buttressed and guarded by the terrors of the scaffold. Jekyll was now my city of refuge; let but Hyde peep out an instant, and the hands of all men would be raised to take and slay him. I resolved in my future conduct to redeem the past; and I can say with honesty that my resolve was fruitful of some good. You know yourself how earnestly, in the last months of the last year, I laboured to relieve suffering; you know that much was done for others, and that the days passed quietly, almost happily for myself. Nor can I truly say that I wearied of this beneficent and innocent life; I think instead that I daily enjoyed it more completely; but I was still cursed with my duality of purpose; and as the first edge of my penitence wore off, the lower side of me, so long indulged, so recently chained down, began to growl for licence. Not that I dreamed of resuscitating Hyde; the bare idea of that would startle me to frenzy: no, it was in my own person that I was once more tempted to trifle with my conscience; and it was as an ordinary secret sinner that I at last fell before the assaults of temptation. There comes an end to all things; the most capacious measure is filled at last; and this brief condescension to my evil finally destroyed the balance of my soul. And yet I was not alarmed; the fall seemed natural, like a return to the old days before I had made my discovery. It was a fine, clear, January day, wet under foot where the frost had melted, but cloudless overhead; and the Regent’s Park was full of winter chirrupings and sweet with spring odours. I sat in the sun on a bench; the animal within me licking the chops of memory; the spiritual side a little drowsed, promising subsequent penitence, but not yet moved to begin. After all, I reflected, I was like my neighbours; and then I smiled, comparing myself with other men, comparing my active good-will with the lazy cruelty of their neglect. 
And at the very moment of that vainglorious thought, a qualm came over me, a horrid nausea and the most deadly shuddering. These passed away, and left me faint; and then as in its turn faintness subsided, I began to be aware of a change in the temper of my thoughts, a greater boldness, a contempt of danger, a solution of the bonds of obligation. I looked down; my clothes hung formlessly on my shrunken limbs; the hand that lay on my knee was corded and hairy. I was once more Edward Hyde. A moment before I had been safe of all men’s respect, wealthy, beloved—the cloth laying for me in the dining-room at home; and now I was the common quarry of mankind, hunted, houseless, a known murderer, thrall to the gallows. My reason wavered, but it did not fail me utterly. I have more than once observed that in my second character, my faculties seemed sharpened to a point and my spirits more tensely elastic; thus it came about that, where Jekyll perhaps might have succumbed, Hyde rose to the importance of the moment. My drugs were in one of the presses of my cabinet; how was I to reach them? That was the problem that (crushing my temples in my hands) I set myself to solve. The laboratory door I had closed. If I sought to enter by the house, my own servants would consign me to the gallows. I saw I must employ another hand, and thought of Lanyon. How was he to be reached? how persuaded? Supposing that I escaped capture in the streets, how was I to make my way into his presence? and how should I, an unknown and displeasing visitor, prevail on the famous physician to rifle the study of his colleague, Dr. Jekyll? Then I remembered that of my original character, one part remained to me: I could write my own hand; and once I had conceived that kindling spark, the way that I must follow became lighted up from end to end. Thereupon, I arranged my clothes as best I could, and summoning a passing hansom, drove to an hotel in Portland Street, the name of which I chanced to remember. 
At my appearance (which was indeed comical enough, however tragic a fate these garments covered) the driver could not conceal his mirth. I gnashed my teeth upon him with a gust of devilish fury; and the smile withered from his face—happily for him—yet more happily for myself, for in another instant I had certainly dragged him from his perch. At the inn, as I entered, I looked about me with so black a countenance as made the attendants tremble; not a look did they exchange in my presence; but obsequiously took my orders, led me to a private room, and brought me wherewithal to write. Hyde in danger of his life was a creature new to me; shaken with inordinate anger, strung to the pitch of murder, lusting to inflict pain. Yet the creature was astute; mastered his fury with a great effort of the will; composed his two important letters, one to Lanyon and one to Poole; and that he might receive actual evidence of their being posted, sent them out with directions that they should be registered. Thenceforward, he sat all day over the fire in the private room, gnawing his nails; there he dined, sitting alone with his fears, the waiter visibly quailing before his eye; and thence, when the night was fully come, he set forth in the corner of a closed cab, and was driven to and fro about the streets of the city. He, I say—I cannot say, I. That child of Hell had nothing human; nothing lived in him but fear and hatred. And when at last, thinking the driver had begun to grow suspicious, he discharged the cab and ventured on foot, attired in his misfitting clothes, an object marked out for observation, into the midst of the nocturnal passengers, these two base passions raged within him like a tempest. He walked fast, hunted by his fears, chattering to himself, skulking through the less frequented thoroughfares, counting the minutes that still divided him from midnight. Once a woman spoke to him, offering, I think, a box of lights. He smote her in the face, and she fled. 
When I came to myself at Lanyon’s, the horror of my old friend perhaps affected me somewhat: I do not know; it was at least but a drop in the sea to the abhorrence with which I looked back upon these hours. A change had come over me. It was no longer the fear of the gallows, it was the horror of being Hyde that racked me. I received Lanyon’s condemnation partly in a dream; it was partly in a dream that I came home to my own house and got into bed. I slept after the prostration of the day, with a stringent and profound slumber which not even the nightmares that wrung me could avail to break. I awoke in the morning shaken, weakened, but refreshed. I still hated and feared the thought of the brute that slept within me, and I had not of course forgotten the appalling dangers of the day before; but I was once more at home, in my own house and close to my drugs; and gratitude for my escape shone so strong in my soul that it almost rivalled the brightness of hope. I was stepping leisurely across the court after breakfast, drinking the chill of the air with pleasure, when I was seized again with those indescribable sensations that heralded the change; and I had but the time to gain the shelter of my cabinet, before I was once again raging and freezing with the passions of Hyde. It took on this occasion a double dose to recall me to myself; and alas! six hours after, as I sat looking sadly in the fire, the pangs returned, and the drug had to be re-administered. In short, from that day forth it seemed only by a great effort as of gymnastics, and only under the immediate stimulation of the drug, that I was able to wear the countenance of Jekyll. At all hours of the day and night, I would be taken with the premonitory shudder; above all, if I slept, or even dozed for a moment in my chair, it was always as Hyde that I awakened. 
Under the strain of this continually impending doom and by the sleeplessness to which I now condemned myself, ay, even beyond what I had thought possible to man, I became, in my own person, a creature eaten up and emptied by fever, languidly weak both in body and mind, and solely occupied by one thought: the horror of my other self. But when I slept, or when the virtue of the medicine wore off, I would leap almost without transition (for the pangs of transformation grew daily less marked) into the possession of a fancy brimming with images of terror, a soul boiling with causeless hatreds, and a body that seemed not strong enough to contain the raging energies of life. The powers of Hyde seemed to have grown with the sickliness of Jekyll. And certainly the hate that now divided them was equal on each side. With Jekyll, it was a thing of vital instinct. He had now seen the full deformity of that creature that shared with him some of the phenomena of consciousness, and was co-heir with him to death: and beyond these links of community, which in themselves made the most poignant part of his distress, he thought of Hyde, for all his energy of life, as of something not only hellish but inorganic. This was the shocking thing; that the slime of the pit seemed to utter cries and voices; that the amorphous dust gesticulated and sinned; that what was dead, and had no shape, should usurp the offices of life. And this again, that that insurgent horror was knit to him closer than a wife, closer than an eye; lay caged in his flesh, where he heard it mutter and felt it struggle to be born; and at every hour of weakness, and in the confidence of slumber, prevailed against him, and deposed him out of life. The hatred of Hyde for Jekyll was of a different order. 
His terror of the gallows drove him continually to commit temporary suicide, and return to his subordinate station of a part instead of a person; but he loathed the necessity, he loathed the despondency into which Jekyll was now fallen, and he resented the dislike with which he was himself regarded. Hence the ape-like tricks that he would play me, scrawling in my own hand blasphemies on the pages of my books, burning the letters and destroying the portrait of my father; and indeed, had it not been for his fear of death, he would long ago have ruined himself in order to involve me in the ruin. But his love of life is wonderful; I go further: I, who sicken and freeze at the mere thought of him, when I recall the abjection and passion of this attachment, and when I know how he fears my power to cut him off by suicide, I find it in my heart to pity him. It is useless, and the time awfully fails me, to prolong this description; no one has ever suffered such torments, let that suffice; and yet even to these, habit brought—no, not alleviation—but a certain callousness of soul, a certain acquiescence of despair; and my punishment might have gone on for years, but for the last calamity which has now fallen, and which has finally severed me from my own face and nature. My provision of the salt, which had never been renewed since the date of the first experiment, began to run low. I sent out for a fresh supply and mixed the draught; the ebullition followed, and the first change of colour, not the second; I drank it and it was without efficiency. You will learn from Poole how I have had London ransacked; it was in vain; and I am now persuaded that my first supply was impure, and that it was that unknown impurity which lent efficacy to the draught. About a week has passed, and I am now finishing this statement under the influence of the last of the old powders. 
This, then, is the last time, short of a miracle, that Henry Jekyll can think his own thoughts or see his own face (now how sadly altered!) in the glass. Nor must I delay too long to bring my writing to an end; for if my narrative has hitherto escaped destruction, it has been by a combination of great prudence and great good luck. Should the throes of change take me in the act of writing it, Hyde will tear it in pieces; but if some time shall have elapsed after I have laid it by, his wonderful selfishness and circumscription to the moment will probably save it once again from the action of his ape-like spite. And indeed the doom that is closing on us both has already changed and crushed him. Half an hour from now, when I shall again and forever reindue that hated personality, I know how I shall sit shuddering and weeping in my chair, or continue, with the most strained and fearstruck ecstasy of listening, to pace up and down this room (my last earthly refuge) and give ear to every sound of menace. Will Hyde die upon the scaffold? or will he find courage to release himself at the last moment? God knows; I am careless; this is my true hour of death, and what is to follow concerns another than myself. Here then, as I lay down the pen and proceed to seal up my confession, I bring the life of that unhappy Henry Jekyll to an end.

DR. SHEPPARD AT THE BREAKFAST TABLE

Mrs. Ferrars died on the night of the 16th–17th September—a Thursday. I was sent for at eight o’clock on the morning of Friday the 17th. There was nothing to be done. She had been dead some hours. It was just a few minutes after nine when I reached home once more. I opened the front door with my latch-key, and purposely delayed a few moments in the hall, hanging up my hat and the light overcoat that I had deemed a wise precaution against the chill of an early autumn morning. To tell the truth, I was considerably upset and worried. I am not going to pretend that at that moment I foresaw the events of the next few weeks. I emphatically did not do so.
But my instinct told me that there were stirring times ahead. From the dining-room on my left there came the rattle of tea-cups and the short, dry cough of my sister Caroline. “Is that you, James?” she called. An unnecessary question, since who else could it be? To tell the truth, it was precisely my sister Caroline who was the cause of my few minutes’ delay. The motto of the mongoose family, so Mr. Kipling tells us, is: “Go and find out.” If Caroline ever adopts a crest, I should certainly suggest a mongoose rampant. One might omit the first part of the motto. Caroline can do any amount of finding out by sitting placidly at home. I don’t know how she manages it, but there it is. I suspect that the servants and the tradesmen constitute her Intelligence Corps. When she goes out, it is not to gather in information, but to spread it. At that, too, she is amazingly expert. It was really this last named trait of hers which was causing me these pangs of indecision. Whatever I told Caroline now concerning the demise of Mrs. Ferrars would be common knowledge all over the village within the space of an hour and a half. As a professional man, I naturally aim at discretion. Therefore I have got into the habit of continually withholding all information possible from my sister. She usually finds out just the same, but I have the moral satisfaction of knowing that I am in no way to blame. Mrs. Ferrars’ husband died just over a year ago, and Caroline has constantly asserted, without the least foundation for the assertion, that his wife poisoned him. She scorns my invariable rejoinder that Mr. Ferrars died of acute gastritis, helped on by habitual over-indulgence in alcoholic beverages. The symptoms of gastritis and arsenical poisoning are not, I agree, unlike, but Caroline bases her accusation on quite different lines. “You’ve only got to look at her,” I have heard her say. Mrs. 
Ferrars, though not in her first youth, was a very attractive woman, and her clothes, though simple, always seemed to fit her very well, but all the same, lots of women buy their clothes in Paris and have not, on that account, necessarily poisoned their husbands.

As I stood hesitating in the hall, with all this passing through my mind, Caroline’s voice came again, with a sharper note in it.

“What on earth are you doing out there, James? Why don’t you come and get your breakfast?”

“Just coming, my dear,” I said hastily. “I’ve been hanging up my overcoat.”

“You could have hung up half a dozen overcoats in this time.”

She was quite right. I could have. I walked into the dining-room, gave Caroline the accustomed peck on the cheek, and sat down to eggs and bacon. The bacon was rather cold.

“You’ve had an early call,” remarked Caroline.

“Yes,” I said. “King’s Paddock. Mrs. Ferrars.”

“I know,” said my sister.

“How did you know?”

“Annie told me.”

Annie is the house parlormaid. A nice girl, but an inveterate talker.

There was a pause. I continued to eat eggs and bacon. My sister’s nose, which is long and thin, quivered a little at the tip, as it always does when she is interested or excited over anything.

“Well?” she demanded.

“A bad business. Nothing to be done. Must have died in her sleep.”

“I know,” said my sister again.

This time I was annoyed.

“You can’t know,” I snapped. “I didn’t know myself until I got there, and I haven’t mentioned it to a soul yet. If that girl Annie knows, she must be a clairvoyant.”

“It wasn’t Annie who told me. It was the milkman. He had it from the Ferrars’ cook.”

As I say, there is no need for Caroline to go out to get information. She sits at home, and it comes to her.

My sister continued:

“What did she die of? Heart failure?”

“Didn’t the milkman tell you that?” I inquired sarcastically.

Sarcasm is wasted on Caroline. She takes it seriously and answers accordingly.

“He didn’t know,” she explained.
After all, Caroline was bound to hear sooner or later. She might as well hear from me.

“She died of an overdose of veronal. She’s been taking it lately for sleeplessness. Must have taken too much.”

“Nonsense,” said Caroline immediately. “She took it on purpose. Don’t tell me!”

It is odd how, when you have a secret belief of your own which you do not wish to acknowledge, the voicing of it by some one else will rouse you to a fury of denial. I burst immediately into indignant speech.

“There you go again,” I said. “Rushing along without rhyme or reason. Why on earth should Mrs. Ferrars wish to commit suicide? A widow, fairly young still, very well off, good health, and nothing to do but enjoy life. It’s absurd.”

“Not at all. Even you must have noticed how different she has been looking lately. It’s been coming on for the last six months. She’s looked positively hag-ridden. And you have just admitted that she hasn’t been able to sleep.”

“What is your diagnosis?” I demanded coldly. “An unfortunate love affair, I suppose?”

My sister shook her head.

“_Remorse_,” she said, with great gusto.

“Remorse?”

“Yes. You never would believe me when I told you she poisoned her husband. I’m more than ever convinced of it now.”

“I don’t think you’re very logical,” I objected. “Surely if a woman committed a crime like murder, she’d be sufficiently cold-blooded to enjoy the fruits of it without any weak-minded sentimentality such as repentance.”

Caroline shook her head.

“There probably are women like that—but Mrs. Ferrars wasn’t one of them. She was a mass of nerves. An overmastering impulse drove her on to get rid of her husband because she was the sort of person who simply can’t endure suffering of any kind, and there’s no doubt that the wife of a man like Ashley Ferrars must have had to suffer a good deal——”

I nodded.

“And ever since she’s been haunted by what she did. I can’t help feeling sorry for her.”

I don’t think Caroline ever felt sorry for Mrs. Ferrars whilst she was alive.
Now that she has gone where (presumably) Paris frocks can no longer be worn, Caroline is prepared to indulge in the softer emotions of pity and comprehension.

I told her firmly that her whole idea was nonsense. I was all the more firm because I secretly agreed with some part, at least, of what she had said. But it is all wrong that Caroline should arrive at the truth simply by a kind of inspired guesswork. I wasn’t going to encourage that sort of thing. She will go round the village airing her views, and every one will think that she is doing so on medical data supplied by me. Life is very trying.

“Nonsense,” said Caroline, in reply to my strictures. “You’ll see. Ten to one she’s left a letter confessing everything.”

“She didn’t leave a letter of any kind,” I said sharply, and not seeing where the admission was going to land me.

“Oh!” said Caroline. “So you _did_ inquire about that, did you? I believe, James, that in your heart of hearts, you think very much as I do. You’re a precious old humbug.”

“One always has to take the possibility of suicide into consideration,” I said repressively.

“Will there be an inquest?”

“There may be. It all depends. If I am able to declare myself absolutely satisfied that the overdose was taken accidentally, an inquest might be dispensed with.”

“And are you absolutely satisfied?” asked my sister shrewdly.

I did not answer, but got up from table.

WHO’S WHO IN KING’S ABBOT

Before I proceed further with what I said to Caroline and what Caroline said to me, it might be as well to give some idea of what I should describe as our local geography. Our village, King’s Abbot, is, I imagine, very much like any other village. Our big town is Cranchester, nine miles away. We have a large railway station, a small post office, and two rival “General Stores.” Able-bodied men are apt to leave the place early in life, but we are rich in unmarried ladies and retired military officers.
Our hobbies and recreations can be summed up in the one word, “gossip.”

There are only two houses of any importance in King’s Abbot. One is King’s Paddock, left to Mrs. Ferrars by her late husband. The other, Fernly Park, is owned by Roger Ackroyd. Ackroyd has always interested me by being a man more impossibly like a country squire than any country squire could really be. He reminds one of the red-faced sportsmen who always appeared early in the first act of an old-fashioned musical comedy, the setting being the village green. They usually sang a song about going up to London. Nowadays we have revues, and the country squire has died out of musical fashion.

Of course, Ackroyd is not really a country squire. He is an immensely successful manufacturer of (I think) wagon wheels. He is a man of nearly fifty years of age, rubicund of face and genial of manner. He is hand and glove with the vicar, subscribes liberally to parish funds (though rumor has it that he is extremely mean in personal expenditure), encourages cricket matches, Lads’ Clubs, and Disabled Soldiers’ Institutes. He is, in fact, the life and soul of our peaceful village of King’s Abbot.

Now when Roger Ackroyd was a lad of twenty-one, he fell in love with, and married, a beautiful woman some five or six years his senior. Her name was Paton, and she was a widow with one child. The history of the marriage was short and painful. To put it bluntly, Mrs. Ackroyd was a dipsomaniac. She succeeded in drinking herself into her grave four years after her marriage.

In the years that followed, Ackroyd showed no disposition to make a second matrimonial adventure. His wife’s child by her first marriage was only seven years old when his mother died. He is now twenty-five. Ackroyd has always regarded him as his own son, and has brought him up accordingly, but he has been a wild lad and a continual source of worry and trouble to his stepfather. Nevertheless we are all very fond of Ralph Paton in King’s Abbot.
He is such a good-looking youngster for one thing.

As I said before, we are ready enough to gossip in our village. Everybody noticed from the first that Ackroyd and Mrs. Ferrars got on very well together. After her husband’s death, the intimacy became more marked. They were always seen about together, and it was freely conjectured that at the end of her period of mourning, Mrs. Ferrars would become Mrs. Roger Ackroyd. It was felt, indeed, that there was a certain fitness in the thing. Roger Ackroyd’s wife had admittedly died of drink. Ashley Ferrars had been a drunkard for many years before his death. It was only fitting that these two victims of alcoholic excess should make up to each other for all that they had previously endured at the hands of their former spouses.

The Ferrars only came to live here just over a year ago, but a halo of gossip has surrounded Ackroyd for many years past. All the time that Ralph Paton was growing up to manhood, a series of lady housekeepers presided over Ackroyd’s establishment, and each in turn was regarded with lively suspicion by Caroline and her cronies. It is not too much to say that for at least fifteen years the whole village has confidently expected Ackroyd to marry one of his housekeepers. The last of them, a redoubtable lady called Miss Russell, has reigned undisputed for five years, twice as long as any of her predecessors. It is felt that but for the advent of Mrs. Ferrars, Ackroyd could hardly have escaped. That—and one other factor—the unexpected arrival of a widowed sister-in-law with her daughter from Canada. Mrs. Cecil Ackroyd, widow of Ackroyd’s ne’er-do-well younger brother, has taken up her residence at Fernly Park, and has succeeded, according to Caroline, in putting Miss Russell in her proper place.
I don’t know exactly what a “proper place” constitutes—it sounds chilly and unpleasant—but I know that Miss Russell goes about with pinched lips, and what I can only describe as an acid smile, and that she professes the utmost sympathy for “poor Mrs. Ackroyd—dependent on the charity of her husband’s brother. The bread of charity is so bitter, is it not? _I_ should be quite miserable if I did not work for my living.”

I don’t know what Mrs. Cecil Ackroyd thought of the Ferrars affair when it came on the tapis. It was clearly to her advantage that Ackroyd should remain unmarried. She was always very charming—not to say gushing—to Mrs. Ferrars when they met. Caroline says that proves less than nothing.

Such have been our preoccupations in King’s Abbot for the last few years. We have discussed Ackroyd and his affairs from every standpoint. Mrs. Ferrars has fitted into her place in the scheme. Now there has been a rearrangement of the kaleidoscope. From a mild discussion of probable wedding presents, we have been jerked into the midst of tragedy.

Revolving these and sundry other matters in my mind, I went mechanically on my round. I had no cases of special interest to attend, which was, perhaps, as well, for my thoughts returned again and again to the mystery of Mrs. Ferrars’s death. Had she taken her own life? Surely, if she had done so, she would have left some word behind to say what she contemplated doing? Women, in my experience, if they once reach the determination to commit suicide, usually wish to reveal the state of mind that led to the fatal action. They covet the limelight.

When had I last seen her? Not for over a week. Her manner then had been normal enough considering—well—considering everything.

Then I suddenly remembered that I had seen her, though not to speak to, only yesterday. She had been walking with Ralph Paton, and I had been surprised because I had had no idea that he was likely to be in King’s Abbot.
I thought, indeed, that he had quarreled finally with his stepfather. Nothing had been seen of him down here for nearly six months. They had been walking along, side by side, their heads close together, and she had been talking very earnestly.

I think I can safely say that it was at this moment that a foreboding of the future first swept over me. Nothing tangible as yet—but a vague premonition of the way things were setting. That earnest _tête-à-tête_ between Ralph Paton and Mrs. Ferrars the day before struck me disagreeably.

I was still thinking of it when I came face to face with Roger Ackroyd.

“Sheppard!” he exclaimed. “Just the man I wanted to get hold of. This is a terrible business.”

“You’ve heard then?”

He nodded. He had felt the blow keenly, I could see. His big red cheeks seemed to have fallen in, and he looked a positive wreck of his usual jolly, healthy self.

“It’s worse than you know,” he said quietly. “Look here, Sheppard, I’ve got to talk to you. Can you come back with me now?”

“Hardly. I’ve got three patients to see still, and I must be back by twelve to see my surgery patients.”

“Then this afternoon—no, better still, dine to-night. At 7.30? Will that suit you?”

“Yes—I can manage that all right. What’s wrong? Is it Ralph?”

I hardly knew why I said that—except, perhaps, that it had so often been Ralph.

Ackroyd stared blankly at me as though he hardly understood. I began to realize that there must be something very wrong indeed somewhere. I had never seen Ackroyd so upset before.

“Ralph?” he said vaguely. “Oh! no, it’s not Ralph. Ralph’s in London——Damn! Here’s old Miss Ganett coming. I don’t want to have to talk to her about this ghastly business. See you to-night, Sheppard. Seven-thirty.”

I nodded, and he hurried away, leaving me wondering. Ralph in London? But he had certainly been in King’s Abbot the preceding afternoon.
He must have gone back to town last night or early this morning, and yet Ackroyd’s manner had conveyed quite a different impression. He had spoken as though Ralph had not been near the place for months.

I had no time to puzzle the matter out further. Miss Ganett was upon me, thirsting for information. Miss Ganett has all the characteristics of my sister Caroline, but she lacks that unerring aim in jumping to conclusions which lends a touch of greatness to Caroline’s maneuvers. Miss Ganett was breathless and interrogatory.

Wasn’t it sad about poor dear Mrs. Ferrars? A lot of people were saying she had been a confirmed drug-taker for years. So wicked the way people went about saying things. And yet, the worst of it was, there was usually a grain of truth somewhere in these wild statements. No smoke without fire! They were saying too that Mr. Ackroyd had found out about it, and had broken off the engagement—because there _was_ an engagement. She, Miss Ganett, had proof positive of that. Of course _I_ must know all about it—doctors always did—but they never tell?

And all this with a sharp beady eye on me to see how I reacted to these suggestions. Fortunately long association with Caroline has led me to preserve an impassive countenance, and to be ready with small non-committal remarks.

On this occasion I congratulated Miss Ganett on not joining in ill-natured gossip. Rather a neat counterattack, I thought. It left her in difficulties, and before she could pull herself together, I had passed on.

I went home thoughtful, to find several patients waiting for me in the surgery.

I had dismissed the last of them, as I thought, and was just contemplating a few minutes in the garden before lunch when I perceived one more patient waiting for me. She rose and came towards me as I stood somewhat surprised.

I don’t know why I should have been, except that there is a suggestion of cast iron about Miss Russell, a something that is above the ills of the flesh.
Ackroyd’s housekeeper is a tall woman, handsome but forbidding in appearance. She has a stern eye, and lips that shut tightly, and I feel that if I were an under housemaid or a kitchenmaid I should run for my life whenever I heard her coming.

“Good morning, Dr. Sheppard,” said Miss Russell. “I should be much obliged if you would take a look at my knee.”

I took a look, but, truth to tell, I was very little wiser when I had done so. Miss Russell’s account of vague pains was so unconvincing that with a woman of less integrity of character I should have suspected a trumped-up tale. It did cross my mind for one moment that Miss Russell might have deliberately invented this affection of the knee in order to pump me on the subject of Mrs. Ferrars’s death, but I soon saw that there, at least, I had misjudged her. She made a brief reference to the tragedy, nothing more
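The example above is serialized with Llama-3-style chat-template special tokens (`<|begin_of_text|>`, `<|start_header_id|>`, `<|eot_id|>`), visible where the previous example ends and this one begins. As a rough sketch of how such a claim-verification prompt might be assembled, the helper below is illustrative only (the function name and wording are assumptions, not part of any library):

```python
# Sketch: serializing a system + user turn with Llama-3-style chat-template
# tokens, matching the token sequence visible in the dump above.
# render_prompt is a hypothetical helper, not a real library function.

SYSTEM = "You are an expert at verifying claims from fictional narratives."

TASK = (
    "You are provided with a context and a statement. Your task is to "
    "carefully read the context and then determine whether the statement "
    "is TRUE or FALSE."
)

def render_prompt(context: str, statement: str) -> str:
    """Build a single prompt string: system turn, user turn, then the
    assistant header so the model generates the TRUE/FALSE verdict."""
    user = f"{TASK}\n\nContext: {context}\n\nStatement: {statement}"
    return "".join([
        "<|begin_of_text|>",
        f"<|start_header_id|>system<|end_header_id|>\n\n{SYSTEM}<|eot_id|>",
        f"<|start_header_id|>user<|end_header_id|>\n\n{user}<|eot_id|>",
        "<|start_header_id|>assistant<|end_header_id|>\n\n",
    ])

prompt = render_prompt(
    "Mrs. Ferrars died on the night of the 16th-17th September.",
    "Mrs. Ferrars died in September.",
)
```

In practice one would let the tokenizer's own chat template (e.g. `tokenizer.apply_chat_template` in `transformers`) emit these tokens rather than hand-concatenating them, since a mismatched template silently degrades model behavior.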