# Uploaded model
- Developed by: mervinpraison
- License: apache-2.0
- Finetuned from model: unsloth/Phi-4-unsloth-bnb-4bit
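This LoRA fine-tune of Phi-4 was trained on the mervinpraison/ur-fall-raw dataset with the PraisonAI CLI: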
praisonai train \
--model unsloth/Phi-4-unsloth-bnb-4bit \
--dataset mervinpraison/ur-fall-raw \
--hf mervinpraison/Phi-4-ur-fall-bnb-4bit \
--ollama mervinpraison/Phi-4-ur-fall-bnb-4bit
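The full terminal output from the run (environment, configuration, data preparation, training, and export) is reproduced below.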
(test) ➜ test praisonai train \
--model unsloth/Phi-4-unsloth-bnb-4bit \
--dataset mervinpraison/ur-fall-raw \
--hf mervinpraison/Phi-4-ur-fall-bnb-4bit \
--ollama mervinpraison/Phi-4-ur-fall-bnb-4bit
Conda environment 'praison_env' found.
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
DEBUG: Loaded config: {'dataset': [{'name': 'mervinpraison/ur-fall-raw'}], 'dataset_num_proc': 2,
'dataset_text_field': 'text', 'gradient_accumulation_steps': 2, 'hf_model_name':
'mervinpraison/Phi-4-ur-fall-bnb-4bit', 'huggingface_save': 'true', 'learning_rate': 0.0002, 'load_in_4bit':
True, 'loftq_config': None, 'logging_steps': 1, 'lora_alpha': 16, 'lora_bias': 'none', 'lora_dropout': 0,
'lora_r': 16, 'lora_target_modules': ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'gate_proj', 'up_proj',
'down_proj'], 'lr_scheduler_type': 'linear', 'max_seq_length': 2048, 'max_steps': 10, 'model_name':
'unsloth/Phi-4-unsloth-bnb-4bit', 'model_parameters': '8b', 'num_train_epochs': 1, 'ollama_model':
'mervinpraison/Phi-4-ur-fall-bnb-4bit', 'ollama_save': 'true', 'optim': 'adamw_8bit', 'output_dir':
'outputs', 'packing': False, 'per_device_train_batch_size': 2, 'quantization_method': ['q4_k_m'],
'random_state': 3407, 'seed': 3407, 'train': 'true', 'use_gradient_checkpointing': 'unsloth', 'use_rslora':
False, 'warmup_steps': 5, 'weight_decay': 0.01}
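For reference, the config dump above corresponds roughly to the following Unsloth/TRL setup. This is a minimal sketch of what the PraisonAI CLI appears to be doing, not its actual implementation; note that the trainer banner later in the log reports gradient accumulation 4 and 60 total steps, so the CLI may adjust some of these values at run time.

```python
# Sketch only: public Unsloth/TRL calls that roughly mirror the config dump above.
from unsloth import FastLanguageModel
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Phi-4-unsloth-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)

# The raw dataset still needs to be folded into a single "text" column
# (see the formatting sketch further below).
dataset = load_dataset("mervinpraison/ur-fall-raw", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    dataset_num_proc=2,
    packing=False,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=2,
        warmup_steps=5,
        max_steps=10,
        learning_rate=2e-4,
        logging_steps=1,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)
trainer.train()
```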
DEBUG: PyTorch version: 2.3.0
DEBUG: CUDA version: 12.1
DEBUG: CUDA Device Capability: (8, 6)
DEBUG: Python Version: 3.10.16 (main, Dec 11 2024, 16:24:50) [GCC 11.2.0]
DEBUG: Python Path: /home/Ubuntu/miniconda3/envs/praison_env/bin/python
DEBUG: GPU = NVIDIA RTX A6000. Max memory = 47.431 GB.
DEBUG: Your runtime has 50.6 gigabytes of available RAM
DEBUG: You are using a high-RAM runtime!
DEBUG: Preparing model and tokenizer...
==((====))== Unsloth 2025.1.8: Fast Llama patching. Transformers: 4.48.2.
\\ /| GPU: NVIDIA RTX A6000. Max memory: 47.431 GB. Platform: Linux.
O^O/ \_/ \ Torch: 2.3.0. CUDA: 8.6. CUDA Toolkit: 12.1. Triton: 2.3.0
\ / Bfloat16 = TRUE. FA [Xformers = 0.0.26.post1. FA2 = False]
"-____-" Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Downloading shards: 0%| | 0/3 [00:00<?, ?it/s]
Downloading shards: 33%|███▎ | 1/3 [01:53<03:46, 113.06s/it]
Downloading shards: 67%|██████▋ | 2/3 [03:37<01:47, 107.98s/it]
Downloading shards: 100%|██████████| 3/3 [04:01<00:00, 69.80s/it]
Downloading shards: 100%|██████████| 3/3 [04:01<00:00, 80.61s/it]
Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s]
Loading checkpoint shards: 33%|███▎ | 1/3 [00:01<00:03, 1.80s/it]
Loading checkpoint shards: 67%|██████▋ | 2/3 [00:03<00:01, 1.75s/it]
Loading checkpoint shards: 100%|██████████| 3/3 [00:04<00:00, 1.23s/it]
Loading checkpoint shards: 100%|██████████| 3/3 [00:04<00:00, 1.38s/it]
DEBUG: Model and original tokenizer loaded.
DEBUG: Chat tokenizer created; HF tokenizer saved.
Unsloth 2025.1.8 patched 40 layers with 40 QKV layers, 40 O layers and 40 MLP layers.
DEBUG: LoRA adapters added.
DEBUG: Starting training...
DEBUG: Processing dataset info: {'name': 'mervinpraison/ur-fall-raw'}
DEBUG: Loading dataset 'mervinpraison/ur-fall-raw' split 'train'...
Generating train split: 0%| | 0/56 [00:00<?, ? examples/s]
Generating train split: 100%|██████████| 56/56 [00:00<00:00, 1580.22 examples/s]
Generating test split: 0%| | 0/14 [00:00<?, ? examples/s]
Generating test split: 100%|██████████| 14/14 [00:00<00:00, 1853.31 examples/s]
DEBUG: Dataset columns: ['instruction', 'input', 'output']
DEBUG: Dataset does not have 'conversations'; assuming Alpaca format.
DEBUG: Applying formatting function to dataset...
Map: 0%| | 0/56 [00:00<?, ? examples/s]DEBUG: formatting_prompts_func() received batch with keys:
['instruction', 'input', 'output']
DEBUG: Raw texts sample (first 200 chars): <|endoftext|><|start_header_id|>system<|end_header_id|>
Cutting Knowledge Date: December 2023
Today Date: 26 July 2024
<|eot_id|><|start_header_id|>user<|end_header_id|>
Analyse accelerometer senso
Map: 100%|██████████| 56/56 [00:00<00:00, 1481.79 examples/s]
DEBUG: Sample processed example keys: ['text']
DEBUG: Sample processed 'text' type: <class 'str'>
DEBUG: Sample processed 'text' content (first 200 chars):
<|endoftext|><|start_header_id|>system<|end_header_id|>
Cutting Knowledge Date: December 2023
Today Date: 26 July 2024
<|eot_id|><|start_header_id|>user<|end_header_id|>
Analyse accelerometer senso
DEBUG: Combined dataset has 56 examples.
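The dataset exposes Alpaca-style `instruction`/`input`/`output` columns, which the trainer folds into a single `text` field via the tokenizer's chat template (the `formatting_prompts_func` named in the log). A minimal sketch of that step is below; the function body and prompt layout are assumptions, not the CLI's actual code, and `tokenizer`/`dataset` are the objects from the setup sketch above.

```python
# Sketch: convert Alpaca-style rows into a single "text" column via the chat template.
def formatting_prompts_func(batch):
    texts = []
    for instruction, inp, output in zip(batch["instruction"], batch["input"], batch["output"]):
        user_content = instruction if not inp else f"{instruction}\n\n{inp}"
        messages = [
            {"role": "user", "content": user_content},
            {"role": "assistant", "content": output},
        ]
        # tokenize=False returns the fully templated string, including special tokens.
        texts.append(tokenizer.apply_chat_template(messages, tokenize=False))
    return {"text": texts}

dataset = dataset.map(formatting_prompts_func, batched=True)
```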
DEBUG: Tokenizing the entire dataset...
Map: 0%| | 0/56 [00:00<?, ? examples/s]DEBUG: Tokenizing a batch of size: 56
DEBUG: Tokenized sample (first 10 tokens of input_ids ): [100257, 27, 91, 2527, 8932, 851, 91, 29, 9125, 27]
Map: 100%|██████████| 56/56 [00:00<00:00, 123.90 examples/s]
Map: 100%|██████████| 56/56 [00:00<00:00, 120.58 examples/s]
DEBUG: Tokenized dataset sample keys: dict_keys(['input_ids', 'attention_mask'])
DEBUG: Dataset tokenization complete.
Map: 0%| | 0/56 [00:00<?, ? examples/s]
Map: 100%|██████████| 56/56 [00:00<00:00, 546.20 examples/s]
Map: 100%|██████████| 56/56 [00:00<00:00, 529.34 examples/s]
DEBUG: Beginning trainer.train() ...
Unsloth: Most labels in your dataset are -100. Training losses will be all 0.
For example, are you sure you used `train_on_responses_only` correctly?
Or did you mask our tokens incorrectly? Maybe this is intended?
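If the intent is to compute loss only on the assistant responses, Unsloth provides the `train_on_responses_only` helper referenced in the warning; the warning suggests the instruction/response markers may not have matched this dataset's template. A sketch is below; the marker strings follow the Llama-3-style headers visible in the formatted sample above and would need to match the template actually in use.

```python
# Sketch: mask everything except assistant responses so only they contribute to the loss.
from unsloth.chat_templates import train_on_responses_only

trainer = train_on_responses_only(
    trainer,
    instruction_part="<|start_header_id|>user<|end_header_id|>\n\n",
    response_part="<|start_header_id|>assistant<|end_header_id|>\n\n",
)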
==((====))== Unsloth - 2x faster free finetuning | Num GPUs = 1
\\ /| Num examples = 56 | Num Epochs = 9
O^O/ \_/ \ Batch size per device = 2 | Gradient Accumulation steps = 4
\ / Total batch size = 8 | Total steps = 60
"-____-" Number of trainable parameters = 65,536,000
0%| | 0/60 [00:00<?, ?it/s]
2%|▏ | 1/60 [00:23<22:55, 23.31s/it]
{'loss': 2.8325, 'grad_norm': 0.08582139760255814, 'learning_rate': 4e-05, 'epoch': 0.14}
2%|▏ | 1/60 [00:23<22:55, 23.31s/it]
3%|▎ | 2/60 [00:41<19:44, 20.43s/it]
{'loss': 2.8472, 'grad_norm': 0.1093076840043068, 'learning_rate': 8e-05, 'epoch': 0.29}
3%|▎ | 2/60 [00:41<19:44, 20.43s/it]
5%|▌ | 3/60 [00:58<17:51, 18.81s/it]
{'loss': 2.6442, 'grad_norm': 0.10155556350946426, 'learning_rate': 0.00012, 'epoch': 0.43}
5%|▌ | 3/60 [00:58<17:51, 18.81s/it]
7%|▋ | 4/60 [01:15<16:50, 18.05s/it]
{'loss': 2.675, 'grad_norm': 0.0939909964799881, 'learning_rate': 0.00016, 'epoch': 0.57}
7%|▋ | 4/60 [01:15<16:50, 18.05s/it]
8%|▊ | 5/60 [01:32<16:10, 17.65s/it]
{'loss': 2.496, 'grad_norm': 0.10907075554132462, 'learning_rate': 0.0002, 'epoch': 0.71}
8%|▊ | 5/60 [01:32<16:10, 17.65s/it]
10%|█ | 6/60 [01:49<15:41, 17.43s/it]
{'loss': 2.7976, 'grad_norm': 0.12686112523078918, 'learning_rate': 0.00019636363636363636, 'epoch': 0.86}
10%|█ | 6/60 [01:49<15:41, 17.43s/it]
12%|█▏ | 7/60 [02:06<15:16, 17.30s/it]
{'loss': 2.6129, 'grad_norm': 0.11858198791742325, 'learning_rate': 0.00019272727272727274, 'epoch': 1.0}
12%|█▏ | 7/60 [02:06<15:16, 17.30s/it]
13%|█▎ | 8/60 [02:23<14:55, 17.23s/it]
{'loss': 2.5396, 'grad_norm': 0.11999646574258804, 'learning_rate': 0.0001890909090909091, 'epoch': 1.14}
13%|█▎ | 8/60 [02:23<14:55, 17.23s/it]
15%|█▌ | 9/60 [02:40<14:36, 17.18s/it]
{'loss': 2.6044, 'grad_norm': 0.1258288472890854, 'learning_rate': 0.00018545454545454545, 'epoch': 1.29}
15%|█▌ | 9/60 [02:40<14:36, 17.18s/it]
17%|█▋ | 10/60 [02:57<14:17, 17.14s/it]
{'loss': 2.4266, 'grad_norm': 0.12394820153713226, 'learning_rate': 0.00018181818181818183, 'epoch': 1.43}
17%|█▋ | 10/60 [02:57<14:17, 17.14s/it]
18%|█▊ | 11/60 [03:14<13:59, 17.13s/it]
{'loss': 2.5893, 'grad_norm': 0.13283300399780273, 'learning_rate': 0.0001781818181818182, 'epoch': 1.57}
18%|█▊ | 11/60 [03:14<13:59, 17.13s/it]
20%|██ | 12/60 [03:31<13:42, 17.13s/it]
{'loss': 2.7021, 'grad_norm': 0.1271115243434906, 'learning_rate': 0.00017454545454545454, 'epoch': 1.71}
20%|██ | 12/60 [03:31<13:42, 17.13s/it]
22%|██▏ | 13/60 [03:49<13:25, 17.13s/it]
{'loss': 2.4765, 'grad_norm': 0.10840773582458496, 'learning_rate': 0.0001709090909090909, 'epoch': 1.86}
22%|██▏ | 13/60 [03:49<13:25, 17.13s/it]
23%|██▎ | 14/60 [04:06<13:08, 17.14s/it]
{'loss': 2.5271, 'grad_norm': 0.10484796017408371, 'learning_rate': 0.00016727272727272728, 'epoch': 2.0}
23%|██▎ | 14/60 [04:06<13:08, 17.14s/it]
25%|██▌ | 15/60 [04:23<12:50, 17.13s/it]
{'loss': 2.4216, 'grad_norm': 0.09565936028957367, 'learning_rate': 0.00016363636363636366, 'epoch': 2.14}
25%|██▌ | 15/60 [04:23<12:50, 17.13s/it]
27%|██▋ | 16/60 [04:40<12:33, 17.13s/it]
{'loss': 2.4083, 'grad_norm': 0.08841613680124283, 'learning_rate': 0.00016, 'epoch': 2.29}
27%|██▋ | 16/60 [04:40<12:33, 17.13s/it]
28%|██▊ | 17/60 [04:57<12:16, 17.14s/it]
{'loss': 2.5839, 'grad_norm': 0.09088816493749619, 'learning_rate': 0.00015636363636363637, 'epoch': 2.43}
28%|██▊ | 17/60 [04:57<12:16, 17.14s/it]
30%|███ | 18/60 [05:14<11:59, 17.13s/it]
{'loss': 2.3157, 'grad_norm': 0.10538917034864426, 'learning_rate': 0.00015272727272727275, 'epoch': 2.57}
30%|███ | 18/60 [05:14<11:59, 17.13s/it]
32%|███▏ | 19/60 [05:31<11:42, 17.14s/it]
{'loss': 2.5337, 'grad_norm': 0.10168007761240005, 'learning_rate': 0.0001490909090909091, 'epoch': 2.71}
32%|███▏ | 19/60 [05:31<11:42, 17.14s/it]
33%|███▎ | 20/60 [05:49<11:25, 17.14s/it]
{'loss': 2.5389, 'grad_norm': 0.09013429284095764, 'learning_rate': 0.00014545454545454546, 'epoch': 2.86}
33%|███▎ | 20/60 [05:49<11:25, 17.14s/it]
35%|███▌ | 21/60 [06:06<11:08, 17.14s/it]
{'loss': 2.5527, 'grad_norm': 0.10220775753259659, 'learning_rate': 0.00014181818181818184, 'epoch': 3.0}
35%|███▌ | 21/60 [06:06<11:08, 17.14s/it]
37%|███▋ | 22/60 [06:23<10:51, 17.14s/it]
{'loss': 2.375, 'grad_norm': 0.09961222857236862, 'learning_rate': 0.0001381818181818182, 'epoch': 3.14}
37%|███▋ | 22/60 [06:23<10:51, 17.14s/it]
38%|███▊ | 23/60 [06:40<10:33, 17.13s/it]
{'loss': 2.5698, 'grad_norm': 0.10702770948410034, 'learning_rate': 0.00013454545454545455, 'epoch': 3.29}
38%|███▊ | 23/60 [06:40<10:33, 17.13s/it]
40%|████ | 24/60 [06:57<10:16, 17.13s/it]
{'loss': 2.3717, 'grad_norm': 0.10184632986783981, 'learning_rate': 0.00013090909090909093, 'epoch': 3.43}
40%|████ | 24/60 [06:57<10:16, 17.13s/it]
42%|████▏ | 25/60 [07:14<09:59, 17.13s/it]
{'loss': 2.4711, 'grad_norm': 0.11484692990779877, 'learning_rate': 0.00012727272727272728, 'epoch': 3.57}
42%|████▏ | 25/60 [07:14<09:59, 17.13s/it]
43%|████▎ | 26/60 [07:31<09:42, 17.12s/it]
{'loss': 2.4688, 'grad_norm': 0.12465589493513107, 'learning_rate': 0.00012363636363636364, 'epoch': 3.71}
43%|████▎ | 26/60 [07:31<09:42, 17.12s/it]
45%|████▌ | 27/60 [07:48<09:25, 17.12s/it]
{'loss': 2.3375, 'grad_norm': 0.10505373030900955, 'learning_rate': 0.00012, 'epoch': 3.86}
45%|████▌ | 27/60 [07:48<09:25, 17.12s/it]
47%|████▋ | 28/60 [08:05<09:07, 17.12s/it]
{'loss': 2.4719, 'grad_norm': 0.1254764050245285, 'learning_rate': 0.00011636363636363636, 'epoch': 4.0}
47%|████▋ | 28/60 [08:05<09:07, 17.12s/it]
48%|████▊ | 29/60 [08:23<08:50, 17.13s/it]
{'loss': 2.4156, 'grad_norm': 0.1411900520324707, 'learning_rate': 0.00011272727272727272, 'epoch': 4.14}
48%|████▊ | 29/60 [08:23<08:50, 17.13s/it]
50%|█████ | 30/60 [08:40<08:33, 17.13s/it]
{'loss': 2.3294, 'grad_norm': 0.12304367870092392, 'learning_rate': 0.00010909090909090909, 'epoch': 4.29}
50%|█████ | 30/60 [08:40<08:33, 17.13s/it]
52%|█████▏ | 31/60 [08:57<08:16, 17.13s/it]
{'loss': 2.3492, 'grad_norm': 0.14749528467655182, 'learning_rate': 0.00010545454545454545, 'epoch': 4.43}
52%|█████▏ | 31/60 [08:57<08:16, 17.13s/it]
53%|█████▎ | 32/60 [09:14<07:59, 17.13s/it]
{'loss': 2.2798, 'grad_norm': 0.14352062344551086, 'learning_rate': 0.00010181818181818181, 'epoch': 4.57}
53%|█████▎ | 32/60 [09:14<07:59, 17.13s/it]
55%|█████▌ | 33/60 [09:31<07:42, 17.13s/it]
{'loss': 2.4132, 'grad_norm': 0.16667835414409637, 'learning_rate': 9.818181818181818e-05, 'epoch': 4.71}
55%|█████▌ | 33/60 [09:31<07:42, 17.13s/it]
57%|█████▋ | 34/60 [09:48<07:25, 17.13s/it]
{'loss': 2.4788, 'grad_norm': 0.1789444237947464, 'learning_rate': 9.454545454545455e-05, 'epoch': 4.86}
57%|█████▋ | 34/60 [09:48<07:25, 17.13s/it]
58%|█████▊ | 35/60 [10:05<07:08, 17.12s/it]
{'loss': 2.4947, 'grad_norm': 0.19302651286125183, 'learning_rate': 9.090909090909092e-05, 'epoch': 5.0}
58%|█████▊ | 35/60 [10:05<07:08, 17.12s/it]
60%|██████ | 36/60 [10:23<06:50, 17.12s/it]
{'loss': 2.2137, 'grad_norm': 0.17413997650146484, 'learning_rate': 8.727272727272727e-05, 'epoch': 5.14}
60%|██████ | 36/60 [10:23<06:50, 17.12s/it]
62%|██████▏ | 37/60 [10:40<06:33, 17.12s/it]
{'loss': 2.3259, 'grad_norm': 0.17235468327999115, 'learning_rate': 8.363636363636364e-05, 'epoch': 5.29}
62%|██████▏ | 37/60 [10:40<06:33, 17.12s/it]
63%|██████▎ | 38/60 [10:57<06:16, 17.12s/it]
{'loss': 2.3787, 'grad_norm': 0.20865033566951752, 'learning_rate': 8e-05, 'epoch': 5.43}
63%|██████▎ | 38/60 [10:57<06:16, 17.12s/it]wandb: WARNING Fatal error while uploading data. Some run
data will not be synced, but it will still be written to disk. Use `wandb sync` at the end of the run to try
uploading.
65%|██████▌ | 39/60 [11:14<05:59, 17.12s/it]
{'loss': 2.4876, 'grad_norm': 0.15544921159744263, 'learning_rate': 7.636363636363637e-05, 'epoch': 5.57}
65%|██████▌ | 39/60 [11:14<05:59, 17.12s/it]
67%|██████▋ | 40/60 [11:31<05:42, 17.12s/it]
{'loss': 2.2552, 'grad_norm': 0.17868204414844513, 'learning_rate': 7.272727272727273e-05, 'epoch': 5.71}
67%|██████▋ | 40/60 [11:31<05:42, 17.12s/it]
68%|██████▊ | 41/60 [11:48<05:25, 17.13s/it]
{'loss': 2.419, 'grad_norm': 0.20979967713356018, 'learning_rate': 6.90909090909091e-05, 'epoch': 5.86}
68%|██████▊ | 41/60 [11:48<05:25, 17.13s/it]
70%|███████ | 42/60 [12:05<05:08, 17.12s/it]
{'loss': 2.3014, 'grad_norm': 0.20691952109336853, 'learning_rate': 6.545454545454546e-05, 'epoch': 6.0}
70%|███████ | 42/60 [12:05<05:08, 17.12s/it]
72%|███████▏ | 43/60 [12:22<04:51, 17.13s/it]
{'loss': 2.4537, 'grad_norm': 0.1874876469373703, 'learning_rate': 6.181818181818182e-05, 'epoch': 6.14}
72%|███████▏ | 43/60 [12:22<04:51, 17.13s/it]
73%|███████▎ | 44/60 [12:40<04:33, 17.12s/it]
{'loss': 2.4091, 'grad_norm': 0.17996680736541748, 'learning_rate': 5.818181818181818e-05, 'epoch': 6.29}
73%|███████▎ | 44/60 [12:40<04:33, 17.12s/it]
75%|███████▌ | 45/60 [12:57<04:16, 17.12s/it]
{'loss': 2.3332, 'grad_norm': 0.19999149441719055, 'learning_rate': 5.4545454545454546e-05, 'epoch': 6.43}
75%|███████▌ | 45/60 [12:57<04:16, 17.12s/it]
77%|███████▋ | 46/60 [13:14<03:59, 17.12s/it]
{'loss': 2.1857, 'grad_norm': 0.20532098412513733, 'learning_rate': 5.090909090909091e-05, 'epoch': 6.57}
77%|███████▋ | 46/60 [13:14<03:59, 17.12s/it]
78%|███████▊ | 47/60 [13:31<03:42, 17.12s/it]
{'loss': 1.9826, 'grad_norm': 0.20537638664245605, 'learning_rate': 4.7272727272727275e-05, 'epoch': 6.71}
78%|███████▊ | 47/60 [13:31<03:42, 17.12s/it]
80%|████████ | 48/60 [13:48<03:25, 17.12s/it]
{'loss': 2.3911, 'grad_norm': 0.2094985544681549, 'learning_rate': 4.3636363636363636e-05, 'epoch': 6.86}
80%|████████ | 48/60 [13:48<03:25, 17.12s/it]
82%|████████▏ | 49/60 [14:05<03:08, 17.11s/it]
{'loss': 2.2917, 'grad_norm': 0.24426180124282837, 'learning_rate': 4e-05, 'epoch': 7.0}
82%|████████▏ | 49/60 [14:05<03:08, 17.11s/it]
83%|████████▎ | 50/60 [14:22<02:51, 17.11s/it]
{'loss': 2.2391, 'grad_norm': 0.21321971714496613, 'learning_rate': 3.6363636363636364e-05, 'epoch': 7.14}
83%|████████▎ | 50/60 [14:22<02:51, 17.11s/it]
85%|████████▌ | 51/60 [14:39<02:34, 17.11s/it]
{'loss': 2.1975, 'grad_norm': 0.20057713985443115, 'learning_rate': 3.272727272727273e-05, 'epoch': 7.29}
85%|████████▌ | 51/60 [14:39<02:34, 17.11s/it]
87%|████████▋ | 52/60 [14:56<02:17, 17.13s/it]
{'loss': 2.1757, 'grad_norm': 0.1959208995103836, 'learning_rate': 2.909090909090909e-05, 'epoch': 7.43}
87%|████████▋ | 52/60 [14:56<02:17, 17.13s/it]
88%|████████▊ | 53/60 [15:14<01:59, 17.14s/it]
{'loss': 2.3574, 'grad_norm': 0.24973013997077942, 'learning_rate': 2.5454545454545454e-05, 'epoch': 7.57}
88%|████████▊ | 53/60 [15:14<01:59, 17.14s/it]
90%|█████████ | 54/60 [15:31<01:42, 17.14s/it]
{'loss': 2.3086, 'grad_norm': 0.22749045491218567, 'learning_rate': 2.1818181818181818e-05, 'epoch': 7.71}
90%|█████████ | 54/60 [15:31<01:42, 17.14s/it]
92%|█████████▏| 55/60 [15:48<01:25, 17.13s/it]
{'loss': 2.2214, 'grad_norm': 0.20809921622276306, 'learning_rate': 1.8181818181818182e-05, 'epoch': 7.86}
92%|█████████▏| 55/60 [15:48<01:25, 17.13s/it]
93%|█████████▎| 56/60 [16:05<01:08, 17.13s/it]
{'loss': 2.2949, 'grad_norm': 0.2743261158466339, 'learning_rate': 1.4545454545454545e-05, 'epoch': 8.0}
93%|█████████▎| 56/60 [16:05<01:08, 17.13s/it]
95%|█████████▌| 57/60 [16:22<00:51, 17.13s/it]
{'loss': 2.3547, 'grad_norm': 0.22062085568904877, 'learning_rate': 1.0909090909090909e-05, 'epoch': 8.14}
95%|█████████▌| 57/60 [16:22<00:51, 17.13s/it]
97%|█████████▋| 58/60 [16:39<00:34, 17.12s/it]
{'loss': 2.1947, 'grad_norm': 0.1845628023147583, 'learning_rate': 7.272727272727272e-06, 'epoch': 8.29}
97%|█████████▋| 58/60 [16:39<00:34, 17.12s/it]
98%|█████████▊| 59/60 [16:56<00:17, 17.12s/it]
{'loss': 2.1337, 'grad_norm': 0.22526168823242188, 'learning_rate': 3.636363636363636e-06, 'epoch': 8.43}
98%|█████████▊| 59/60 [16:56<00:17, 17.12s/it]
100%|██████████| 60/60 [17:13<00:00, 17.12s/it]
{'loss': 2.1907, 'grad_norm': 0.22521968185901642, 'learning_rate': 0.0, 'epoch': 8.57}
100%|██████████| 60/60 [17:13<00:00, 17.12s/it]wandb: Adding directory to artifact
(./outputs/checkpoint-60)... Done. 0.9s
{'train_runtime': 1039.7781, 'train_samples_per_second': 0.462, 'train_steps_per_second': 0.058,
'train_loss': 2.417484853665034, 'epoch': 8.57}
100%|██████████| 60/60 [17:18<00:00, 17.12s/it]
100%|██████████| 60/60 [17:21<00:00, 17.35s/it]
DEBUG: Training complete. Saving model and tokenizer locally...
DEBUG: Saved model and tokenizer to 'lora_model'.
Unsloth: You are pushing to hub, but you passed your HF username = mervinpraison.
We shall truncate mervinpraison/Phi-4-ur-fall-bnb-4bit to Phi-4-ur-fall-bnb-4bit
Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 26.41 out of 47.13 RAM for saving.
Unsloth: Saving model... This might take 5 minutes ...
0%| | 0/40 [00:00<?, ?it/s]
5%|▌ | 2/40 [00:00<00:02, 17.08it/s]
12%|█▎ | 5/40 [00:00<00:01, 23.72it/s]
20%|██ | 8/40 [00:00<00:01, 26.24it/s]
28%|██▊ | 11/40 [00:00<00:01, 27.43it/s]
35%|███▌ | 14/40 [00:00<00:00, 28.19it/s]
42%|████▎ | 17/40 [00:00<00:00, 28.44it/s]
50%|█████ | 20/40 [00:00<00:00, 28.74it/s]
57%|█████▊ | 23/40 [00:00<00:00, 28.98it/s]
65%|██████▌ | 26/40 [00:00<00:00, 29.14it/s]
72%|███████▎ | 29/40 [00:01<00:00, 29.08it/s]
80%|████████ | 32/40 [00:01<00:00, 29.19it/s]
88%|████████▊ | 35/40 [00:01<00:00, 29.28it/s]
95%|█████████▌| 38/40 [00:01<00:00, 29.29it/s]
100%|██████████| 40/40 [00:01<00:00, 28.24it/s]
Unsloth: Saving tokenizer... Done.
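The merge and upload above correspond roughly to Unsloth's save helpers. A sketch follows (repository names as configured above; the `token` placeholder and the exact `save_method` are assumptions, not taken from the CLI source):

```python
# Sketch: merge the LoRA adapters into 16-bit weights and push to the Hub,
# then build a q4_k_m GGUF for the Ollama export.
model.push_to_hub_merged(
    "mervinpraison/Phi-4-ur-fall-bnb-4bit",
    tokenizer,
    save_method="merged_16bit",
    token="hf_...",  # placeholder: a write-scoped Hugging Face token
)

model.push_to_hub_gguf(
    "mervinpraison/Phi-4-ur-fall-bnb-4bit",
    tokenizer,
    quantization_method="q4_k_m",
    token="hf_...",  # placeholder
)
```

For the `--ollama` target, the GGUF build would typically be registered with Ollama and then run with `ollama run mervinpraison/Phi-4-ur-fall-bnb-4bit`, assuming the model was pushed to the Ollama registry under that name.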
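To try the uploaded model with plain Transformers, something like the sketch below should work; the prompt mirrors the dataset's instruction style and the generation settings are arbitrary.

```python
# Sketch: load the pushed model and run one prompt through its chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mervinpraison/Phi-4-ur-fall-bnb-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Analyse accelerometer sensor readings: ..."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```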