The following values were not passed to `accelerate launch` and had defaults used instead:
	`--num_processes` was set to a value of `2`
		More than one GPU was found, enabling multi-GPU training.
		If this was unintended please pass in `--num_processes=1`.
	`--num_machines` was set to a value of `1`
	`--mixed_precision` was set to a value of `'no'`
	`--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
Using RTX 3090 or 4000 series which doesn't support faster communication speedups. Ensuring P2P and IB communications are disabled.
01/21/2024 12:00:12 - WARNING - llmtuner.hparams.parser - We recommend enable `upcast_layernorm` in quantized training.
01/21/2024 12:00:12 - WARNING - llmtuner.hparams.parser - We recommend enable mixed precision training.
01/21/2024 12:00:12 - WARNING - llmtuner.hparams.parser - `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training.
[INFO|training_args.py:1838] 2024-01-21 12:00:12,822 >> PyTorch: setting up devices
/home/hangyu5/anaconda3/envs/llama_factory/lib/python3.11/site-packages/transformers/training_args.py:1751: FutureWarning: `--push_to_hub_token` is deprecated and will be removed in version 5 of 🤗 Transformers. Use `--hub_token` instead.
  warnings.warn(
01/21/2024 12:00:12 - INFO - llmtuner.hparams.parser - Process rank: 0, device: cuda:0, n_gpu: 1 distributed training: True, compute dtype: None
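The warnings above are all actionable: passing the values explicitly silences the accelerate defaults message, and the two llmtuner recommendations map to the `--bf16` and `--upcast_layernorm` training flags. A sketch of such an invocation, assuming LLaMA-Factory's `src/train_bash.py` entry point and flag names (the actual launch command is not shown in this log):

    accelerate launch \
        --num_processes=2 --num_machines=1 \
        --mixed_precision=bf16 --dynamo_backend=no \
        src/train_bash.py \
        --stage sft --finetuning_type lora --quantization_bit 4 \
        --bf16 true --upcast_layernorm true \
        ...  # model, dataset and output arguments as in this run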
01/21/2024 12:00:12 - INFO - llmtuner.hparams.parser - Training/evaluation parameters Seq2SeqTrainingArguments(
  _n_gpu=1, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False,
  bf16=False, bf16_full_eval=False, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=0, dataloader_persistent_workers=False, dataloader_pin_memory=True,
  ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=False, ddp_timeout=1800,
  debug=[], deepspeed=None, disable_tqdm=False, dispatch_batches=None, do_eval=True, do_predict=False, do_train=True,
  eval_accumulation_steps=None, eval_delay=0, eval_steps=100, evaluation_strategy=IntervalStrategy.STEPS,
  fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1,
  fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None,
  full_determinism=False, generation_config=None, generation_max_length=None, generation_num_beams=None,
  gradient_accumulation_steps=4, gradient_checkpointing=False, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, half_precision_backend=auto,
  hub_always_push=False, hub_model_id=None, hub_private_repo=False, hub_strategy=HubStrategy.EVERY_SAVE, hub_token=<HUB_TOKEN>,
  ignore_data_skip=False, include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False,
  label_names=None, label_smoothing_factor=0.0, learning_rate=5e-05, length_column_name=length, load_best_model_at_end=False, local_rank=0,
  log_level=passive, log_level_replica=warning, log_on_each_node=True,
  logging_dir=./models/sft/dolphin-2.6-mistral-7b-dpo-laser-sft-glaive-function-calling-v2-ep1-lora/runs/Jan21_12-00-12_yhyu13fuwuqi,
  logging_first_step=False, logging_nan_inf_filter=True, logging_steps=100, logging_strategy=IntervalStrategy.STEPS,
  lr_scheduler_kwargs={}, lr_scheduler_type=SchedulerType.COSINE, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None,
  mp_parameters=, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=1.0, optim=OptimizerNames.ADAMW_TORCH, optim_args=None,
  output_dir=./models/sft/dolphin-2.6-mistral-7b-dpo-laser-sft-glaive-function-calling-v2-ep1-lora,
  overwrite_output_dir=True, past_index=-1, per_device_eval_batch_size=1, per_device_train_batch_size=1, predict_with_generate=False, prediction_loss_only=True,
  push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=<PUSH_TO_HUB_TOKEN>, ray_scope=last,
  remove_unused_columns=True, report_to=['tensorboard'], resume_from_checkpoint=None,
  run_name=./models/sft/dolphin-2.6-mistral-7b-dpo-laser-sft-glaive-function-calling-v2-ep1-lora,
  save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=100, save_strategy=IntervalStrategy.STEPS, save_total_limit=None,
  seed=42, skip_memory_metrics=True, sortish_sampler=False, split_batches=False, tf32=None,
  torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torchdynamo=None,
  tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_mps_device=False,
  warmup_ratio=0.0, warmup_steps=0, weight_decay=0.0,
)
[INFO|tokenization_utils_base.py:2024] 2024-01-21 12:00:12,904 >> loading file tokenizer.model
[INFO|tokenization_utils_base.py:2024] 2024-01-21 12:00:12,904 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2024] 2024-01-21 12:00:12,904 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2024] 2024-01-21 12:00:12,904 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2024] 2024-01-21 12:00:12,904 >> loading file tokenizer.json
01/21/2024 12:00:12 - INFO - llmtuner.hparams.parser - Process rank: 1, device: cuda:1, n_gpu: 1 distributed training: True, compute dtype: None
01/21/2024 12:00:12 - INFO - llmtuner.hparams.parser - Training/evaluation parameters Seq2SeqTrainingArguments(..., local_rank=1, ...)
[INFO|configuration_utils.py:737] 2024-01-21 12:00:12,992 >> loading configuration file cognitivecomputations/dolphin-2.6-mistral-7b-dpo-laser/config.json
[INFO|configuration_utils.py:802] 2024-01-21 12:00:12,993 >> Model config MistralConfig {
  "_name_or_path": "cognitivecomputations/dolphin-2.6-mistral-7b-dpo-laser",
  "architectures": ["MistralForCausalLM"],
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 32768,
  "model_type": "mistral",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-05,
  "rope_theta": 10000.0,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.36.2",
  "use_cache": false,
  "vocab_size": 32001
}
01/21/2024 12:00:12 - INFO - llmtuner.model.patcher - Quantizing model to 4 bit.
[INFO|modeling_utils.py:3341] 2024-01-21 12:00:13,019 >> loading weights file cognitivecomputations/dolphin-2.6-mistral-7b-dpo-laser/model.safetensors.index.json
[INFO|modeling_utils.py:1341] 2024-01-21 12:00:13,020 >> Instantiating MistralForCausalLM model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:826] 2024-01-21 12:00:13,021 >> Generate config GenerationConfig {
  "bos_token_id": 1,
  "eos_token_id": 2,
  "use_cache": false
}
01/21/2024 12:00:13 - INFO - llmtuner.model.patcher - Quantizing model to 4 bit.
[INFO|modeling_utils.py:3483] 2024-01-21 12:00:15,423 >> Detected 4-bit loading: activating 4-bit loading for this model
Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s]
All model checkpoint weights were used when initializing MistralForCausalLM.
[INFO|modeling_utils.py:4193] 2024-01-21 12:00:40,729 >> All the weights of MistralForCausalLM were initialized from the model checkpoint at ./models/dolphin-2.6-mistral-7b-dpo-laser. If your task is similar to the task the model of the checkpoint was trained on, you can already use MistralForCausalLM for predictions without further training.
[INFO|configuration_utils.py:779] 2024-01-21 12:00:40,738 >> loading configuration file ./models/dolphin-2.6-mistral-7b-dpo-laser/generation_config.json
[INFO|configuration_utils.py:826] 2024-01-21 12:00:40,738 >> Generate config GenerationConfig {
  "bos_token_id": 1,
  "eos_token_id": 2
}
Loading checkpoint shards: 100%|██████████| 3/3 [00:25<00:00, 8.22s/it]
Loading checkpoint shards: 100%|██████████| 3/3 [00:25<00:00, 8.37s/it]
01/21/2024 12:00:40 - INFO - llmtuner.model.patcher - Gradient checkpointing enabled.
01/21/2024 12:00:40 - INFO - llmtuner.model.adapter - Fine-tuning method: LoRA
01/21/2024 12:00:41 - INFO - llmtuner.model.loader - trainable params: 3407872 || all params: 7245148160 || trainable%: 0.0470
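The loader's summary line is easy to verify: the trainable fraction is just the LoRA adapter parameters divided by all parameters of the 4-bit-loaded model. A quick check in Python, with both counts copied from the log:

    trainable, total = 3_407_872, 7_245_148_160  # printed by llmtuner.model.loader
    print(f"trainable%: {100 * trainable / total:.4f}")  # -> trainable%: 0.0470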
01/21/2024 12:00:41 - WARNING - llmtuner.data.utils - Checksum failed: missing SHA-1 hash value in dataset_info.json.
Using custom data configuration default-f901fb7e685ba757
Loading Dataset Infos from /home/hangyu5/anaconda3/envs/llama_factory/lib/python3.11/site-packages/datasets/packaged_modules/json
Generating dataset json (/home/hangyu5/.cache/huggingface/datasets/json/default-f901fb7e685ba757/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
Downloading and preparing dataset json/default to /home/hangyu5/.cache/huggingface/datasets/json/default-f901fb7e685ba757/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96...
Downloading took 0.0 min
Checksum Computation took 0.0 min
Generating train split: 0 examples [00:00, ? examples/s]
Generating train split: 10051 examples [00:00, 90841.01 examples/s]
Generating train split: 10051 examples [00:00, 87163.81 examples/s]
Unable to verify splits sizes.
Dataset json downloaded and prepared to /home/hangyu5/.cache/huggingface/datasets/json/default-f901fb7e685ba757/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96. Subsequent calls will reuse this data.
Converting format of dataset: 0%| | 0/10051 [00:00<?, ?it/s]
inputs: <s> [INST] SYSTEM: You are a helpful assistant with access to the following functions. Use them if required - { "name": "get_exchange_rate", "description": "Get the exchange rate between two currencies", "parameters": { "type": "object", "properties": { "base_currency": { "type": "string", "description": "The currency to convert from" }, "target_currency": { "type": "string", "description": "The currency to convert to" } }, "required": [ "base_currency", "target_currency" ] } } Can you book a flight for me from New York to London? [/INST] I'm sorry, but I don't have the capability to book flights. My current function allows me to get the exchange rate between two currencies. If you need help with that, feel free to ask!<|im_end|>
label_ids: [-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 315, 28742, 28719, 7371, 28725, 562, 315, 949, 28742, 28707, 506, 272, 21368, 298, 1820, 22447, 28723, 1984, 1868, 908, 5976, 528, 298, 625, 272, 8877, 4338, 1444, 989, 1191, 951, 20023, 28723, 1047, 368, 927, 1316, 395, 369, 28725, 1601, 1933, 298, 1460, 28808, 2]
labels: I'm sorry, but I don't have the capability to book flights. My current function allows me to get the exchange rate between two currencies. If you need help with that, feel free to ask!<|im_end|>
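The long run of -100 in `label_ids` is what confines the loss to the assistant reply: -100 is the default `ignore_index` of PyTorch's cross-entropy, so every prompt position is skipped and only the response tokens (315, 28742, ...) are supervised. A minimal sketch of the masking convention (illustrative; llmtuner's actual preprocessing differs in detail):

    import torch.nn.functional as F

    IGNORE_INDEX = -100  # default ignore_index of F.cross_entropy

    def build_labels(prompt_ids, response_ids):
        # mask every prompt position; supervise only the response
        return [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)

    # later, per position:
    # F.cross_entropy(logits.view(-1, vocab_size), labels.view(-1),
    #                 ignore_index=IGNORE_INDEX)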
[INFO|training_args.py:1838] 2024-01-21 12:00:59,346 >> PyTorch: setting up devices
/home/hangyu5/anaconda3/envs/llama_factory/lib/python3.11/site-packages/transformers/training_args.py:1751: FutureWarning: `--push_to_hub_token` is deprecated and will be removed in version 5 of 🤗 Transformers. Use `--hub_token` instead.
  warnings.warn(
Caching indices mapping at /home/hangyu5/.cache/huggingface/datasets/json/default-f901fb7e685ba757/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96/cache-3b3933eaf185fbda.arrow
Caching indices mapping at /home/hangyu5/.cache/huggingface/datasets/json/default-f901fb7e685ba757/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96/cache-b78bdd1717eda93b.arrow
Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
Running tokenizer on dataset: 0%| | 0/10051 [00:00<?, ?it/s]
***** Running training *****
[INFO|trainer.py:1707] 2024-01-21 12:01:15,693 >> Num examples = 9,045
[INFO|trainer.py:1708] 2024-01-21 12:01:15,693 >> Num Epochs = 1
[INFO|trainer.py:1709] 2024-01-21 12:01:15,693 >> Instantaneous batch size per device = 1
[INFO|trainer.py:1712] 2024-01-21 12:01:15,693 >> Total train batch size (w. parallel, distributed & accumulation) = 8
[INFO|trainer.py:1713] 2024-01-21 12:01:15,693 >> Gradient Accumulation steps = 4
[INFO|trainer.py:1714] 2024-01-21 12:01:15,693 >> Total optimization steps = 1,130
[INFO|trainer.py:1715] 2024-01-21 12:01:15,695 >> Number of trainable parameters = 3,407,872
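These header numbers are mutually consistent: the 10,051 converted examples were split into 9,045 train and 1,006 eval rows, and 1 sample per device x 2 GPUs x 4 accumulation steps yields the effective batch of 8. The step count follows from the per-device dataloader length (the padding and floor division below mirror typical HF Trainer/DistributedSampler behavior, an assumption about internals):

    examples, n_gpu, per_device_bs, grad_accum = 9_045, 2, 1, 4
    print(per_device_bs * n_gpu * grad_accum)    # 8    -> total train batch size
    per_device_batches = -(-examples // n_gpu)   # 4523 (sampler pads the odd example)
    print(per_device_batches // grad_accum)      # 1130 -> total optimization steps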
  0%| | 0/1130 [00:00<?, ?it/s]
[INFO|trainer.py:3166] 2024-01-21 12:05:28,193 >> ***** Running Evaluation *****
[INFO|trainer.py:3168] 2024-01-21 12:05:28,193 >> Num examples = 1006
[INFO|trainer.py:3171] 2024-01-21 12:05:28,193 >> Batch size = 1
  0%| | 0/503 [00:00<?, ?it/s]
Saving model checkpoint to ./models/sft/dolphin-2.6-mistral-7b-dpo-laser-sft-glaive-function-calling-v2-ep1-lora/tmp-checkpoint-100
[INFO|tokenization_utils_base.py:2432] 2024-01-21 12:07:18,088 >> tokenizer config file saved in ./models/sft/dolphin-2.6-mistral-7b-dpo-laser-sft-glaive-function-calling-v2-ep1-lora/tmp-checkpoint-100/tokenizer_config.json
[INFO|tokenization_utils_base.py:2441] 2024-01-21 12:07:18,088 >> Special tokens file saved in ./models/sft/dolphin-2.6-mistral-7b-dpo-laser-sft-glaive-function-calling-v2-ep1-lora/tmp-checkpoint-100/special_tokens_map.json
[INFO|tokenization_utils_base.py:2492] 2024-01-21 12:07:18,088 >> added tokens file saved in ./models/sft/dolphin-2.6-mistral-7b-dpo-laser-sft-glaive-function-calling-v2-ep1-lora/tmp-checkpoint-100/added_tokens.json
  9%|▉ | 101/1130 [06:03<10:09:28, 35.54s/it] ... 18%|█▊ | 200/1130 [10:15<38:30, 2.48s/it]
{'loss': 0.1149, 'learning_rate': 4.6233876873505694e-05, 'epoch': 0.18}
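The logged learning rates track the cosine schedule declared in the arguments (`lr_scheduler_type=SchedulerType.COSINE`, `learning_rate=5e-05`, `warmup_steps=0`, 1,130 total steps), i.e. lr(t) = 0.5 * 5e-5 * (1 + cos(pi * t / 1130)). Checking the entry above:

    import math
    base_lr, total_steps = 5e-5, 1130
    print(0.5 * base_lr * (1 + math.cos(math.pi * 200 / total_steps)))
    # ~4.62339e-05, matching the learning_rate logged at step 200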
[INFO|trainer.py:3166] 2024-01-21 12:11:32,943 >> ***** Running Evaluation *****
[INFO|trainer.py:3168] 2024-01-21 12:11:32,943 >> Num examples = 1006
[INFO|trainer.py:3171] 2024-01-21 12:11:32,943 >> Batch size = 1
  0%| | 0/503 [00:00<?, ?it/s]
Saving model checkpoint to ./models/sft/dolphin-2.6-mistral-7b-dpo-laser-sft-glaive-function-calling-v2-ep1-lora/tmp-checkpoint-200
[INFO|tokenization_utils_base.py:2432] 2024-01-21 12:13:22,880 >> tokenizer config file saved in ./models/sft/dolphin-2.6-mistral-7b-dpo-laser-sft-glaive-function-calling-v2-ep1-lora/tmp-checkpoint-200/tokenizer_config.json
[INFO|tokenization_utils_base.py:2441] 2024-01-21 12:13:22,880 >> Special tokens file saved in ./models/sft/dolphin-2.6-mistral-7b-dpo-laser-sft-glaive-function-calling-v2-ep1-lora/tmp-checkpoint-200/special_tokens_map.json
[INFO|tokenization_utils_base.py:2492] 2024-01-21 12:13:22,880 >> added tokens file saved in ./models/sft/dolphin-2.6-mistral-7b-dpo-laser-sft-glaive-function-calling-v2-ep1-lora/tmp-checkpoint-200/added_tokens.json
 18%|█▊ | 201/1130 [12:08<9:10:50, 35.58s/it] ... 27%|██▋ | 300/1130 [16:20<35:13, 2.55s/it]
{'loss': 0.0871, 'learning_rate': 4.1797019046425264e-05, 'epoch': 0.27}
[INFO|trainer.py:3166] 2024-01-21 12:17:37,748 >> ***** Running Evaluation *****
[INFO|trainer.py:3168] 2024-01-21 12:17:37,748 >> Num examples = 1006
[INFO|trainer.py:3171] 2024-01-21 12:17:37,748 >> Batch size = 1
  0%| | 0/503 [00:00<?, ?it/s]
Saving model checkpoint to ./models/sft/dolphin-2.6-mistral-7b-dpo-laser-sft-glaive-function-calling-v2-ep1-lora/tmp-checkpoint-300
[INFO|tokenization_utils_base.py:2432] 2024-01-21 12:19:27,933 >> tokenizer config file saved in ./models/sft/dolphin-2.6-mistral-7b-dpo-laser-sft-glaive-function-calling-v2-ep1-lora/tmp-checkpoint-300/tokenizer_config.json
[INFO|tokenization_utils_base.py:2441] 2024-01-21 12:19:27,934 >> Special tokens file saved in ./models/sft/dolphin-2.6-mistral-7b-dpo-laser-sft-glaive-function-calling-v2-ep1-lora/tmp-checkpoint-300/special_tokens_map.json
[INFO|tokenization_utils_base.py:2492] 2024-01-21 12:19:27,934 >> added tokens file saved in ./models/sft/dolphin-2.6-mistral-7b-dpo-laser-sft-glaive-function-calling-v2-ep1-lora/tmp-checkpoint-300/added_tokens.json
 27%|██▋ | 301/1130 [18:13<8:12:02, 35.61s/it] ... 35%|███▌ | 400/1130 [22:25<30:01, 2.47s/it]
{'loss': 0.0865, 'learning_rate': 3.607020216633599e-05, 'epoch': 0.35}
[INFO|trainer.py:3166] 2024-01-21 12:23:42,252 >> ***** Running Evaluation *****
[INFO|trainer.py:3168] 2024-01-21 12:23:42,252 >> Num examples = 1006
[INFO|trainer.py:3171] 2024-01-21 12:23:42,252 >> Batch size = 1
  0%| | 0/503 [00:00<?, ?it/s]
Saving model checkpoint to ./models/sft/dolphin-2.6-mistral-7b-dpo-laser-sft-glaive-function-calling-v2-ep1-lora/tmp-checkpoint-400
[INFO|tokenization_utils_base.py:2432] 2024-01-21 12:25:32,312 >> tokenizer config file saved in ./models/sft/dolphin-2.6-mistral-7b-dpo-laser-sft-glaive-function-calling-v2-ep1-lora/tmp-checkpoint-400/tokenizer_config.json
[INFO|tokenization_utils_base.py:2441] 2024-01-21 12:25:32,313 >> Special tokens file saved in ./models/sft/dolphin-2.6-mistral-7b-dpo-laser-sft-glaive-function-calling-v2-ep1-lora/tmp-checkpoint-400/special_tokens_map.json
[INFO|tokenization_utils_base.py:2492] 2024-01-21 12:25:32,313 >> added tokens file saved in ./models/sft/dolphin-2.6-mistral-7b-dpo-laser-sft-glaive-function-calling-v2-ep1-lora/tmp-checkpoint-400/added_tokens.json
 35%|███▌ | 401/1130 [24:17<7:11:34, 35.52s/it] ... 44%|████▍ | 500/1130 [28:31<26:58, 2.57s/it]
{'loss': 0.0802, 'learning_rate': 2.9493228037294702e-05, 'epoch': 0.44}
[INFO|trainer.py:3166] 2024-01-21 12:29:48,165 >> ***** Running Evaluation *****
[INFO|trainer.py:3168] 2024-01-21 12:29:48,165 >> Num examples = 1006
[INFO|trainer.py:3171] 2024-01-21 12:29:48,165 >> Batch size = 1
  0%| | 0/503 [00:00<?, ?it/s]
Saving model checkpoint to ./models/sft/dolphin-2.6-mistral-7b-dpo-laser-sft-glaive-function-calling-v2-ep1-lora/tmp-checkpoint-500
[INFO|tokenization_utils_base.py:2432] 2024-01-21 12:31:38,277 >> tokenizer config file saved in ./models/sft/dolphin-2.6-mistral-7b-dpo-laser-sft-glaive-function-calling-v2-ep1-lora/tmp-checkpoint-500/tokenizer_config.json
[INFO|tokenization_utils_base.py:2441] 2024-01-21 12:31:38,277 >> Special tokens file saved in ./models/sft/dolphin-2.6-mistral-7b-dpo-laser-sft-glaive-function-calling-v2-ep1-lora/tmp-checkpoint-500/special_tokens_map.json
[INFO|tokenization_utils_base.py:2492] 2024-01-21 12:31:38,278 >> added tokens file saved in ./models/sft/dolphin-2.6-mistral-7b-dpo-laser-sft-glaive-function-calling-v2-ep1-lora/tmp-checkpoint-500/added_tokens.json
 44%|████▍ | 501/1130 [30:24<6:13:54, 35.67s/it] ... 53%|█████▎ | 600/1130 [34:34<22:03, 2.50s/it]
{'loss': 0.0689, 'learning_rate': 2.2571187907677853e-05, 'epoch': 0.53}
[INFO|trainer.py:3166] 2024-01-21 12:35:51,611 >> ***** Running Evaluation *****
[INFO|trainer.py:3168] 2024-01-21 12:35:51,611 >> Num examples = 1006
[INFO|trainer.py:3171] 2024-01-21 12:35:51,611 >> Batch size = 1
  0%| | 0/503 [00:00<?, ?it/s]
Saving model checkpoint to ./models/sft/dolphin-2.6-mistral-7b-dpo-laser-sft-glaive-function-calling-v2-ep1-lora/tmp-checkpoint-600
[INFO|tokenization_utils_base.py:2432] 2024-01-21 12:37:41,321 >> tokenizer config file saved in ./models/sft/dolphin-2.6-mistral-7b-dpo-laser-sft-glaive-function-calling-v2-ep1-lora/tmp-checkpoint-600/tokenizer_config.json
[INFO|tokenization_utils_base.py:2441] 2024-01-21 12:37:41,321 >> Special tokens file saved in ./models/sft/dolphin-2.6-mistral-7b-dpo-laser-sft-glaive-function-calling-v2-ep1-lora/tmp-checkpoint-600/special_tokens_map.json
[INFO|tokenization_utils_base.py:2492] 2024-01-21 12:37:41,321 >> added tokens file saved in ./models/sft/dolphin-2.6-mistral-7b-dpo-laser-sft-glaive-function-calling-v2-ep1-lora/tmp-checkpoint-600/added_tokens.json
 53%|█████▎ | 601/1130 [36:26<5:11:57, 35.38s/it] ... 62%|██████▏ | 700/1130 [40:34<19:17, 2.69s/it]
{'loss': 0.0649, 'learning_rate': 1.583567302625469e-05, 'epoch': 0.62}
[INFO|trainer.py:3166] 2024-01-21 12:41:51,473 >> ***** Running Evaluation *****
[INFO|trainer.py:3168] 2024-01-21 12:41:51,473 >> Num examples = 1006
[INFO|trainer.py:3171] 2024-01-21 12:41:51,473 >> Batch size = 1
  0%| | 0/503 [00:00<?, ?it/s]
Saving model checkpoint to ./models/sft/dolphin-2.6-mistral-7b-dpo-laser-sft-glaive-function-calling-v2-ep1-lora/tmp-checkpoint-700
[INFO|tokenization_utils_base.py:2432] 2024-01-21 12:43:41,469 >> tokenizer config file saved in ./models/sft/dolphin-2.6-mistral-7b-dpo-laser-sft-glaive-function-calling-v2-ep1-lora/tmp-checkpoint-700/tokenizer_config.json
[INFO|tokenization_utils_base.py:2441] 2024-01-21 12:43:41,469 >> Special tokens file saved in ./models/sft/dolphin-2.6-mistral-7b-dpo-laser-sft-glaive-function-calling-v2-ep1-lora/tmp-checkpoint-700/special_tokens_map.json
[INFO|tokenization_utils_base.py:2492] 2024-01-21 12:43:41,469 >> added tokens file saved in ./models/sft/dolphin-2.6-mistral-7b-dpo-laser-sft-glaive-function-calling-v2-ep1-lora/tmp-checkpoint-700/added_tokens.json
 62%|██████▏ | 701/1130 [42:27<4:15:27, 35.73s/it] ... 71%|███████ | 800/1130 [46:39<13:42, 2.49s/it]
{'loss': 0.0637, 'learning_rate': 9.803950080284005e-06, 'epoch': 0.71}
[INFO|trainer.py:3166] 2024-01-21 12:47:56,490 >> ***** Running Evaluation *****
[INFO|trainer.py:3168] 2024-01-21 12:47:56,490 >> Num examples = 1006
[INFO|trainer.py:3171] 2024-01-21 12:47:56,490 >> Batch size = 1
  0%| | 0/503 [00:00<?, ?it/s]
Saving model checkpoint to ./models/sft/dolphin-2.6-mistral-7b-dpo-laser-sft-glaive-function-calling-v2-ep1-lora/tmp-checkpoint-800
[INFO|tokenization_utils_base.py:2432] 2024-01-21 12:49:46,434 >> tokenizer config file saved in ./models/sft/dolphin-2.6-mistral-7b-dpo-laser-sft-glaive-function-calling-v2-ep1-lora/tmp-checkpoint-800/tokenizer_config.json
[INFO|tokenization_utils_base.py:2441] 2024-01-21 12:49:46,434 >> Special tokens file saved in ./models/sft/dolphin-2.6-mistral-7b-dpo-laser-sft-glaive-function-calling-v2-ep1-lora/tmp-checkpoint-800/special_tokens_map.json
[INFO|tokenization_utils_base.py:2492] 2024-01-21 12:49:46,434 >> added tokens file saved in ./models/sft/dolphin-2.6-mistral-7b-dpo-laser-sft-glaive-function-calling-v2-ep1-lora/tmp-checkpoint-800/added_tokens.json
[training progress: steps 842-900/1130, ~2.4-2.8 s/it]
{'loss': 0.0698, 'learning_rate': 4.939236715580884e-06, 'epoch': 0.8}
[INFO|trainer.py:3166] 2024-01-21 12:53:59,633 >> ***** Running Evaluation *****
[INFO|trainer.py:3168] 2024-01-21 12:53:59,633 >> Num examples = 1006
[INFO|trainer.py:3171] 2024-01-21 12:53:59,633 >> Batch size = 1
[evaluation progress (0/503 ...) truncated in the capture; the eval-loss line was lost with it]
>> Saving model checkpoint to ./models/sft/dolphin-2.6-mistral-7b-dpo-laser-sft-glaive-function-calling-v2-ep1-lora/tmp-checkpoint-900
[INFO|tokenization_utils_base.py:2432] 2024-01-21 12:55:49,578 >> tokenizer config file saved in ./models/sft/dolphin-2.6-mistral-7b-dpo-laser-sft-glaive-function-calling-v2-ep1-lora/tmp-checkpoint-900/tokenizer_config.json
[INFO|tokenization_utils_base.py:2441] 2024-01-21 12:55:49,578 >> Special tokens file saved in ./models/sft/dolphin-2.6-mistral-7b-dpo-laser-sft-glaive-function-calling-v2-ep1-lora/tmp-checkpoint-900/special_tokens_map.json
[INFO|tokenization_utils_base.py:2492] 2024-01-21 12:55:49,578 >> added tokens file saved in ./models/sft/dolphin-2.6-mistral-7b-dpo-laser-sft-glaive-function-calling-v2-ep1-lora/tmp-checkpoint-900/added_tokens.json
[training progress: steps 901-964/1130; the displayed rate again spikes to 35.76 s/it after the pause, then decays to ~2.5 s/it]
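The jumps to ~35 s/it at steps 801 and 901 are a display artifact, not a slowdown: the step that lands right after each save includes the full evaluation pass and checkpoint write (about 110 s by the timestamps above), and the progress bar's smoothed rate blends that one outlier into its recent average before decaying back. A rough sketch, assuming a simple exponential moving average over seconds-per-iteration with a 0.3 smoothing factor (similar in spirit to tqdm's default, not its exact internals):

    def ema_s_per_it(prev: float, new: float, smoothing: float = 0.3) -> float:
        """Blend the latest step time into the running rate estimate."""
        return smoothing * new + (1 - smoothing) * prev

    # Step 901 spans the eval + checkpoint pause (~113 s of wall time,
    # per the elapsed-time jump 52:42 -> 54:35); blending that into the
    # ~2.5 s/it steady state reproduces the spike:
    print(ema_s_per_it(prev=2.5, new=113.0))  # ~35.7, close to the logged 35.76 s/it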
[training progress: steps 965-1000/1130, ~2.4-2.8 s/it]
{'loss': 0.0648, 'learning_rate': 1.615127855610496e-06, 'epoch': 0.88}
[INFO|trainer.py:3166] 2024-01-21 13:00:01,987 >> ***** Running Evaluation *****
[INFO|trainer.py:3168] 2024-01-21 13:00:01,987 >> Num examples = 1006
[INFO|trainer.py:3171] 2024-01-21 13:00:01,987 >> Batch size = 1
[evaluation progress (0/503 ...) truncated in the capture; the eval-loss line was lost with it]
>> Saving model checkpoint to ./models/sft/dolphin-2.6-mistral-7b-dpo-laser-sft-glaive-function-calling-v2-ep1-lora/tmp-checkpoint-1000
[INFO|tokenization_utils_base.py:2432] 2024-01-21 13:01:51,493 >> tokenizer config file saved in ./models/sft/dolphin-2.6-mistral-7b-dpo-laser-sft-glaive-function-calling-v2-ep1-lora/tmp-checkpoint-1000/tokenizer_config.json
[INFO|tokenization_utils_base.py:2441] 2024-01-21 13:01:51,493 >> Special tokens file saved in ./models/sft/dolphin-2.6-mistral-7b-dpo-laser-sft-glaive-function-calling-v2-ep1-lora/tmp-checkpoint-1000/special_tokens_map.json
[INFO|tokenization_utils_base.py:2492] 2024-01-21 13:01:51,493 >> added tokens file saved in ./models/sft/dolphin-2.6-mistral-7b-dpo-laser-sft-glaive-function-calling-v2-ep1-lora/tmp-checkpoint-1000/added_tokens.json
[training progress: steps 1001-1012/1130, rate recovering from the checkpoint pause]
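The recurring "0/503" eval bar also follows from the run configuration: 1006 eval examples at per_device_eval_batch_size=1 across the two accelerate processes gives 503 batches per process. A small sketch of that arithmetic, assuming the eval set is split evenly across processes as is usual under DDP:

    import math

    num_eval_examples = 1006  # "Num examples = 1006" in the eval banner
    per_device_eval_bs = 1    # per_device_eval_batch_size in the run config
    num_processes = 2         # accelerate defaulted to --num_processes=2

    # Each process evaluates its own shard, so the bar counts per-process batches:
    print(math.ceil(num_eval_examples / (per_device_eval_bs * num_processes)))  # 503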
[training progress: steps 1013-1080/1130, ~2.4-2.7 s/it]
[training progress: steps 1081-1100/1130, ~2.4-2.7 s/it]
{'loss': 0.0654, 'learning_rate': 8.690476815339244e-08, 'epoch': 0.97}
[INFO|trainer.py:3166] 2024-01-21 13:06:03,102 >> ***** Running Evaluation *****
[INFO|trainer.py:3168] 2024-01-21 13:06:03,102 >> Num examples = 1006
[INFO|trainer.py:3171] 2024-01-21 13:06:03,102 >> Batch size = 1
[evaluation progress (0/503 ...) truncated in the capture; the eval-loss line was lost with it]
>> Saving model checkpoint to ./models/sft/dolphin-2.6-mistral-7b-dpo-laser-sft-glaive-function-calling-v2-ep1-lora/tmp-checkpoint-1100
[INFO|tokenization_utils_base.py:2432] 2024-01-21 13:07:52,634 >> tokenizer config file saved in ./models/sft/dolphin-2.6-mistral-7b-dpo-laser-sft-glaive-function-calling-v2-ep1-lora/tmp-checkpoint-1100/tokenizer_config.json
[INFO|tokenization_utils_base.py:2441] 2024-01-21 13:07:52,635 >> Special tokens file saved in ./models/sft/dolphin-2.6-mistral-7b-dpo-laser-sft-glaive-function-calling-v2-ep1-lora/tmp-checkpoint-1100/special_tokens_map.json
[INFO|tokenization_utils_base.py:2492] 2024-01-21 13:07:52,635 >> added tokens file saved in ./models/sft/dolphin-2.6-mistral-7b-dpo-laser-sft-glaive-function-calling-v2-ep1-lora/tmp-checkpoint-1100/added_tokens.json
[training progress: steps 1101-1126/1130, rate recovering from the checkpoint pause]
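Taken together, the steady-state step time and the periodic pauses roughly account for the total runtime reported below: each eval+checkpoint pause visible here costs about 110 s (eval start through the last tokenizer-file save), and with eval_steps=save_steps=100 the run pays that at every hundredth step. A back-of-the-envelope check, assuming eleven such pauses (steps 100 through 1100; only the last four appear in this excerpt) and ~2.5 s/it otherwise:

    total_steps = 1130
    s_per_step = 2.5   # typical rate between pauses, per the progress summaries
    pause_s = 110      # e.g. 13:06:03 eval start -> 13:07:52 added-tokens save
    num_pauses = 11    # assumed: one pause per 100 steps over 1130 steps

    estimate = total_steps * s_per_step + num_pauses * pause_s
    print(estimate)  # ~4035 s, close to the reported train_runtime of 4073.2 s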
[training progress: steps 1127-1130/1130, ~2.5 s/it; total elapsed 1:07:51]
[INFO|trainer.py:1947] 2024-01-21 13:09:08,898 >> Training completed. Do not forget to share your model on huggingface.co/models =)
{'train_runtime': 4073.2029, 'train_samples_per_second': 2.221, 'train_steps_per_second': 0.277, 'train_loss': 0.09174010732532603, 'epoch': 1.0}
[INFO|trainer.py:2889] 2024-01-21 13:09:08,901 >> Saving model checkpoint to ./models/sft/dolphin-2.6-mistral-7b-dpo-laser-sft-glaive-function-calling-v2-ep1-lora
[INFO|tokenization_utils_base.py:2432] 2024-01-21 13:09:09,013 >> tokenizer config file saved in ./models/sft/dolphin-2.6-mistral-7b-dpo-laser-sft-glaive-function-calling-v2-ep1-lora/tokenizer_config.json
[INFO|tokenization_utils_base.py:2441] 2024-01-21 13:09:09,014 >> Special tokens file saved in ./models/sft/dolphin-2.6-mistral-7b-dpo-laser-sft-glaive-function-calling-v2-ep1-lora/special_tokens_map.json
[INFO|tokenization_utils_base.py:2492] 2024-01-21 13:09:09,014 >> added tokens file saved in ./models/sft/dolphin-2.6-mistral-7b-dpo-laser-sft-glaive-function-calling-v2-ep1-lora/added_tokens.json
***** train metrics *****
  epoch                    =        1.0
  train_loss               =     0.0917
  train_runtime            = 1:07:53.20
  train_samples_per_second =      2.221
  train_steps_per_second   =      0.277
Figure saved: ./models/sft/dolphin-2.6-mistral-7b-dpo-laser-sft-glaive-function-calling-v2-ep1-lora/training_loss.png
Figure saved: ./models/sft/dolphin-2.6-mistral-7b-dpo-laser-sft-glaive-function-calling-v2-ep1-lora/training_eval_loss.png
[INFO|trainer.py:3166] 2024-01-21 13:09:12,308 >> ***** Running Evaluation *****
[INFO|trainer.py:3168] 2024-01-21 13:09:12,308 >> Num examples = 1006
[INFO|trainer.py:3171] 2024-01-21 13:09:12,308 >> Batch size = 1
[final evaluation progress (0/503 ...) truncated in the capture; its metrics were lost with it]
>> Dropping the following result as it does not have all the necessary fields: {'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}
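The throughput figures in the train metrics are consistent with the parallelism settings: 1130 optimizer steps over 4073.2 s is 0.277 steps/s, and each optimizer step consumes per_device_train_batch_size × gradient_accumulation_steps × num_processes = 1 × 4 × 2 = 8 samples. A quick sketch (the per-epoch sample count is inferred from these figures, so treat it as approximate):

    train_runtime = 4073.2029    # seconds, from the train metrics above
    total_steps = 1130
    effective_batch = 1 * 4 * 2  # per-device batch x grad accum x processes

    steps_per_second = total_steps / train_runtime
    samples_per_second = steps_per_second * effective_batch
    print(f"{steps_per_second:.3f} steps/s")      # 0.277, as reported
    print(f"{samples_per_second:.3f} samples/s")  # ~2.219 vs reported 2.221
    print(total_steps * effective_batch)          # ~9040 training samples per epoch

The closing "Dropping the following result" message is model-card bookkeeping (an incomplete metrics entry skipped while writing the README), not an error; the saved adapter is unaffected.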