See axolotl config
axolotl version: `0.4.1`

```yaml
base_model: fxmarty/small-llama-testing
batch_size: 32
bf16: true
chat_template: tokenizer_default_fallback_alpaca
datasets:
- format: custom
  path: argilla/databricks-dolly-15k-curated-en
  type:
    field_input: original-instruction
    field_instruction: original-instruction
    field_output: original-response
    format: '{instruction} {input}'
    no_input_format: '{instruction}'
    system_format: '{system}'
    system_prompt: ''
eval_steps: 20
flash_attention: true
gpu_memory_limit: 80GiB
gradient_checkpointing: true
group_by_length: true
hub_model_id: willtensora/test-repo124
hub_strategy: checkpoint
learning_rate: 0.0002
logging_steps: 10
lr_scheduler: cosine
micro_batch_size: 19
model_type: AutoModelForCausalLM
num_epochs: 10
optimizer: adamw_bnb_8bit
output_dir: /workspace/axolotl/configs
pad_to_sequence_len: true
resize_token_embeddings_to_32x: false
sample_packing: false
save_steps: 40
save_total_limit: 1
sequence_len: 2048
special_tokens:
  pad_token: </s>
tokenizer_type: LlamaTokenizerFast
train_on_inputs: false
trust_remote_code: true
val_set_size: 0.1
wandb_entity: ''
wandb_mode: online
wandb_name: fxmarty/small-llama-testing-argilla/databricks-dolly-15k-curated-en
wandb_project: Gradients-On-Demand
wandb_run: your_name
wandb_runid: default
warmup_ratio: 0.05
xformers_attention: true
```
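The `format`, `no_input_format`, and `field_*` entries in the config define how each dataset row is rendered into a training prompt. Below is a minimal sketch of that mapping, assuming a made-up record; the field values are illustrative and not taken from the dataset:

```python
# Illustrative sketch (not from the original card) of how the prompt templates
# above map a dolly-style record onto a training prompt.
record = {
    "original-instruction": "Summarize the paragraph.",
    "original-response": "A short summary.",  # target text (field_output); not part of the prompt
}

system_format = "{system}"
prompt_format = "{instruction} {input}"   # used when the input field is non-empty
no_input_format = "{instruction}"         # used when the input field is empty

instruction = record["original-instruction"]
# field_input and field_instruction point at the same column in this config,
# so the rendered prompt repeats the instruction text.
input_text = record["original-instruction"]

system = system_format.format(system="")  # system_prompt is '' in the config
prompt = (prompt_format.format(instruction=instruction, input=input_text)
          if input_text else no_input_format.format(instruction=instruction))
print(system + prompt)
```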
# test-repo124

This model is a fine-tuned version of [fxmarty/small-llama-testing](https://huggingface.co/fxmarty/small-llama-testing) on the [argilla/databricks-dolly-15k-curated-en](https://huggingface.co/datasets/argilla/databricks-dolly-15k-curated-en) dataset. It achieves the following results on the evaluation set:
- Loss: 6.0841
## Model description
More information needed
## Intended uses & limitations
More information needed
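The card does not document usage. As a minimal, hedged loading sketch using the standard transformers API (the prompt and generation settings are illustrative, not recommendations from the training run):

```python
# Minimal loading sketch; the model id and trust_remote_code mirror the config above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "willtensora/test-repo124"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, trust_remote_code=True
)

# Illustrative prompt; the base model is a tiny testing checkpoint, so outputs
# are not expected to be meaningful.
inputs = tokenizer("Summarize the paragraph.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```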
## Training and evaluation data

The model was fine-tuned on argilla/databricks-dolly-15k-curated-en (see the axolotl config above); `val_set_size: 0.1` holds out 10% of the data as the evaluation set.
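For reference, a hedged sketch of loading the dataset and reproducing a 10% hold-out with the datasets library, assuming the dataset exposes a single train split; the split seed is an assumption, and axolotl's internal split may differ:

```python
# Load the curated Dolly dataset and carve out a 10% evaluation split.
# NOTE: seed=42 is an assumption for illustration; it is not taken from the card.
from datasets import load_dataset

ds = load_dataset("argilla/databricks-dolly-15k-curated-en", split="train")
splits = ds.train_test_split(test_size=0.1, seed=42)
print(len(splits["train"]), len(splits["test"]))
```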
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 19
- eval_batch_size: 19
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 152
- total_eval_batch_size: 152
- optimizer: adamw_bnb_8bit (OptimizerNames.ADAMW_BNB) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 26
- num_epochs: 10
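The total batch sizes in the list above are simply the per-device batch size multiplied by the device count (no gradient accumulation is implied); a quick check:

```python
# Effective batch size arithmetic from the hyperparameters above.
micro_batch_size = 19   # per-device train/eval batch size
num_devices = 8         # multi-GPU data parallelism

total_batch_size = micro_batch_size * num_devices
assert total_batch_size == 152  # matches total_train_batch_size and total_eval_batch_size
print(total_batch_size)
```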
### Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
No log | 0.0112 | 1 | 10.4228 |
10.127 | 0.2247 | 20 | 9.8630 |
9.039 | 0.4494 | 40 | 8.7400 |
8.1126 | 0.6742 | 60 | 7.9190 |
7.5515 | 0.8989 | 80 | 7.4582 |
7.2771 | 1.1236 | 100 | 7.2773 |
7.1388 | 1.3483 | 120 | 7.1772 |
7.0583 | 1.5730 | 140 | 7.0581 |
6.957 | 1.7978 | 160 | 6.9383 |
6.8789 | 2.0225 | 180 | 6.8210 |
6.7029 | 2.2472 | 200 | 6.7213 |
6.5913 | 2.4719 | 220 | 6.6364 |
6.4981 | 2.6966 | 240 | 6.5572 |
6.4453 | 2.9213 | 260 | 6.4724 |
6.2642 | 3.1461 | 280 | 6.4135 |
6.2365 | 3.3708 | 300 | 6.3668 |
6.2746 | 3.5955 | 320 | 6.3166 |
6.2488 | 3.8202 | 340 | 6.2882 |
6.1749 | 4.0449 | 360 | 6.2413 |
6.0514 | 4.2697 | 380 | 6.2183 |
6.0162 | 4.4944 | 400 | 6.1961 |
6.0043 | 4.7191 | 420 | 6.1763 |
6.0239 | 4.9438 | 440 | 6.1544 |
5.9605 | 5.1685 | 460 | 6.1444 |
6.0214 | 5.3933 | 480 | 6.1288 |
5.9376 | 5.6180 | 500 | 6.1234 |
5.8258 | 5.8427 | 520 | 6.1087 |
5.8755 | 6.0674 | 540 | 6.1070 |
5.838 | 6.2921 | 560 | 6.1017 |
5.852 | 6.5169 | 580 | 6.0974 |
5.9298 | 6.7416 | 600 | 6.1005 |
5.8933 | 6.9663 | 620 | 6.0981 |
5.8921 | 7.1910 | 640 | 6.0962 |
5.8261 | 7.4157 | 660 | 6.0942 |
5.8207 | 7.6404 | 680 | 6.0891 |
5.8623 | 7.8652 | 700 | 6.0875 |
5.8938 | 8.0899 | 720 | 6.0950 |
5.8684 | 8.3146 | 740 | 6.0897 |
5.9065 | 8.5393 | 760 | 6.0881 |
5.8135 | 8.7640 | 780 | 6.0886 |
5.7873 | 8.9888 | 800 | 6.0908 |
5.8081 | 9.2135 | 820 | 6.0929 |
5.8307 | 9.4382 | 840 | 6.0849 |
5.8492 | 9.6629 | 860 | 6.0893 |
5.8504 | 9.8876 | 880 | 6.0841 |
### Framework versions
- Transformers 4.46.0
- Pytorch 2.5.0+cu124
- Datasets 3.0.1
- Tokenizers 0.20.1
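To confirm that a local environment matches the versions listed above, a quick, illustrative check:

```python
# Print installed versions of the packages listed above (expected values from the card).
from importlib.metadata import version

expected = {
    "transformers": "4.46.0",
    "torch": "2.5.0+cu124",
    "datasets": "3.0.1",
    "tokenizers": "0.20.1",
}
for pkg, want in expected.items():
    print(f"{pkg}: installed {version(pkg)}, card lists {want}")
```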