See axolotl config

axolotl version: 0.4.1

adapter: lora
auto_find_batch_size: true
base_model: unsloth/tinyllama
bf16: auto
chat_template: llama3
dataloader_num_workers: 12
dataset_prepared_path: null
datasets:
- data_files:
  - 8bd0da3b508b00b8_train_data.json
  ds_type: json
  format: custom
  path: /workspace/input_data/8bd0da3b508b00b8_train_data.json
  type:
    field_instruction: prompt
    field_output: chosen
    format: '{instruction}'
    no_input_format: '{instruction}'
    system_format: '{system}'
    system_prompt: ''
debug: null
deepspeed: null
early_stopping_patience: 3
early_stopping_threshold: 0.001
eval_max_new_tokens: 128
eval_steps: 20
flash_attention: false
fp16: null
fsdp: null
fsdp_config: null
gradient_accumulation_steps: 2
gradient_checkpointing: false
group_by_length: false
hub_model_id: mrferr3t/40a6e232-1d76-4065-867f-8d8c058ab9b5
hub_repo: null
hub_strategy: checkpoint
hub_token: null
learning_rate: 0.0003
load_in_4bit: false
load_in_8bit: false
local_rank: null
logging_steps: 100
lora_alpha: 16
lora_dropout: 0.05
lora_fan_in_fan_out: null
lora_model_dir: null
lora_r: 8
lora_target_linear: true
lr_scheduler: cosine
micro_batch_size: 32
mlflow_experiment_name: /tmp/8bd0da3b508b00b8_train_data.json
model_type: AutoModelForCausalLM
num_epochs: 5
optimizer: adamw_bnb_8bit
output_dir: miner_id_24
pad_to_sequence_len: true
s2_attention: null
sample_packing: false
save_steps: 20
saves_per_epoch: 0
sequence_len: 512
strict: false
tf32: false
tokenizer_type: AutoTokenizer
train_on_inputs: false
trust_remote_code: true
val_set_size: 0.05
wandb_entity: null
wandb_mode: online
wandb_name: 0fd25184-a976-4cc1-9dd6-b91fe10c3a8f
wandb_project: Gradients-On-Demand
wandb_run: your_name
wandb_runid: 0fd25184-a976-4cc1-9dd6-b91fe10c3a8f
warmup_ratio: 0.05
weight_decay: 0.0
xformers_attention: null

40a6e232-1d76-4065-867f-8d8c058ab9b5

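This model is a LoRA fine-tune of unsloth/tinyllama, trained on the local JSON file 8bd0da3b508b00b8_train_data.json named in the config above (no Hub dataset was recorded). It achieves the following result on the evaluation set:

Loss: 1.1493 (validation loss at the final evaluation, step 1040)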

Model description

More information needed

Intended uses & limitations

More information needed
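
Until this section is filled in, the following is a minimal sketch of how a LoRA adapter like this one is typically loaded with PEFT and Transformers. It assumes the adapter weights are available at the Hub repository named in `hub_model_id` and that the base model's tokenizer is the right one to use; the helper names `load_finetuned` and `generate` are hypothetical and not part of this repository.

```python
BASE_MODEL = "unsloth/tinyllama"  # base_model from the axolotl config
ADAPTER_ID = "mrferr3t/40a6e232-1d76-4065-867f-8d8c058ab9b5"  # hub_model_id from the config


def load_finetuned(base_model: str = BASE_MODEL, adapter_id: str = ADAPTER_ID):
    """Load the base model and attach the LoRA adapter (downloads from the Hub)."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model)
    model = AutoModelForCausalLM.from_pretrained(base_model)
    # Assumption: the adapter files sit at the root of the Hub repository.
    model = PeftModel.from_pretrained(model, adapter_id)
    model.eval()
    return tokenizer, model


def generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Generate a completion for a plain-text prompt (hypothetical helper)."""
    tokenizer, model = load_finetuned()
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

The config trains on the raw `prompt` field with the format `'{instruction}'`, so passing an unwrapped prompt string is probably the closest match to the training setup; that is an inference from the config, not something the card states.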

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0003
train_batch_size: 32
eval_batch_size: 32
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 64
optimizer: adamw_bnb_8bit with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 122
num_epochs: 5
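
The relationship between the batch-size values above can be checked with a little arithmetic. The sketch below assumes a single training device (`num_devices = 1` is an assumption, the card does not state the device count) and uses the last row of the results table (step 1040 at epoch 1.0634) to estimate the number of optimizer steps per epoch and the approximate size of the training split.

```python
# Values reported in the hyperparameter list.
micro_batch_size = 32
gradient_accumulation_steps = 2
num_devices = 1  # assumption: not stated in the card

# Examples consumed per optimizer step.
total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices

# Last row of the results table: step 1040 corresponds to epoch 1.0634.
last_step, last_epoch = 1040, 1.0634
steps_per_epoch = round(last_step / last_epoch)

# Rough estimate only; the final batch of an epoch may be partial.
approx_train_examples = steps_per_epoch * total_train_batch_size

print(total_train_batch_size, steps_per_epoch, approx_train_examples)
```

Under these assumptions the effective batch size matches the reported total_train_batch_size of 64, and one epoch is roughly 978 optimizer steps.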

Training results

Training Loss	Epoch	Step	Validation Loss
No log	0.0010	1	1.7236
No log	0.0204	20	1.5940
No log	0.0409	40	1.3535
No log	0.0613	60	1.2795
No log	0.0818	80	1.2480
1.3847	0.1022	100	1.2336
1.3847	0.1227	120	1.2239
1.3847	0.1431	140	1.2142
1.3847	0.1636	160	1.2047
1.3847	0.1840	180	1.1998
1.2025	0.2045	200	1.1945
1.2025	0.2249	220	1.1917
1.2025	0.2454	240	1.1872
1.2025	0.2658	260	1.1860
1.2025	0.2863	280	1.1830
1.1872	0.3067	300	1.1791
1.1872	0.3272	320	1.1772
1.1872	0.3476	340	1.1755
1.1872	0.3681	360	1.1734
1.1872	0.3885	380	1.1710
1.1716	0.4090	400	1.1689
1.1716	0.4294	420	1.1661
1.1716	0.4499	440	1.1688
1.1716	0.4703	460	1.1643
1.1716	0.4908	480	1.1628
1.1504	0.5112	500	1.1618
1.1504	0.5317	520	1.1615
1.1504	0.5521	540	1.1595
1.1504	0.5726	560	1.1584
1.1504	0.5930	580	1.1580
1.1607	0.6135	600	1.1584
1.1607	0.6339	620	1.1562
1.1607	0.6544	640	1.1560
1.1607	0.6748	660	1.1542
1.1607	0.6953	680	1.1540
1.1576	0.7157	700	1.1540
1.1576	0.7362	720	1.1538
1.1576	0.7566	740	1.1515
1.1576	0.7771	760	1.1518
1.1576	0.7975	780	1.1492
1.1404	0.8180	800	1.1495
1.1404	0.8384	820	1.1488
1.1404	0.8589	840	1.1473
1.1404	0.8793	860	1.1463
1.1404	0.8998	880	1.1456
1.1427	0.9202	900	1.1454
1.1427	0.9407	920	1.1452
1.1427	0.9611	940	1.1436
1.1427	0.9816	960	1.1440
1.1427	1.0020	980	1.1433
1.117	1.0225	1000	1.1475
1.117	1.0429	1020	1.1482
1.117	1.0634	1040	1.1493
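
The run ends at step 1040 (about 1.06 epochs) even though num_epochs is 5. This is consistent with the configured early_stopping_patience of 3: validation loss last improved at step 980 and then rose for three consecutive evaluations. The sketch below replays a simple patience rule over the final rows of the table. It is an illustration, not the exact callback used by the training framework, and the minimum-improvement parameter `min_delta` is set to 0.0 here as an assumption; the function name `early_stop_step` is hypothetical.

```python
from typing import Optional, Sequence, Tuple


def early_stop_step(
    evals: Sequence[Tuple[int, float]],
    patience: int,
    min_delta: float = 0.0,
) -> Optional[int]:
    """Return the step at which a patience-based rule would stop, or None."""
    best: Optional[float] = None
    bad_evals = 0
    for step, loss in evals:
        if best is None or best - loss > min_delta:
            bad_evals = 0  # improved by more than min_delta: reset the counter
        else:
            bad_evals += 1  # no sufficient improvement at this evaluation
        if best is None or loss < best:
            best = loss  # track the lowest validation loss seen so far
        if bad_evals >= patience:
            return step
    return None


# (step, validation loss) pairs copied from the last rows of the table.
tail = [
    (900, 1.1454), (920, 1.1452), (940, 1.1436), (960, 1.1440),
    (980, 1.1433), (1000, 1.1475), (1020, 1.1482), (1040, 1.1493),
]
stop = early_stop_step(tail, patience=3)
print(stop)
```

Because the lowest validation loss (1.1433) occurs at step 980 rather than at the final step, the checkpoint from step 980 may be the better one to evaluate if it was retained.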

Framework versions

PEFT 0.13.2
Transformers 4.46.0
Pytorch 2.3.1+cu121
Datasets 3.0.1
Tokenizers 0.20.1
