See axolotl config
axolotl version: `0.4.1`

```yaml
base_model: fxmarty/small-llama-testing
batch_size: 32
bf16: true
chat_template: tokenizer_default_fallback_alpaca
datasets:
- format: custom
  path: argilla/databricks-dolly-15k-curated-en
  type:
    field_input: original-instruction
    field_instruction: original-instruction
    field_output: original-response
    format: '{instruction} {input}'
    no_input_format: '{instruction}'
    system_format: '{system}'
    system_prompt: ''
eval_steps: 20
flash_attention: true
gpu_memory_limit: 80GiB
gradient_checkpointing: true
group_by_length: true
hub_model_id: willtensora/test-repo124
hub_strategy: checkpoint
learning_rate: 0.0002
logging_steps: 10
lr_scheduler: cosine
micro_batch_size: 19
model_type: AutoModelForCausalLM
num_epochs: 10
optimizer: adamw_bnb_8bit
output_dir: /workspace/axolotl/configs
pad_to_sequence_len: true
resize_token_embeddings_to_32x: false
sample_packing: false
save_steps: 40
save_total_limit: 1
sequence_len: 2048
special_tokens:
  pad_token: </s>
tokenizer_type: LlamaTokenizerFast
train_on_inputs: false
trust_remote_code: true
val_set_size: 0.1
wandb_entity: ''
wandb_mode: online
wandb_name: fxmarty/small-llama-testing-argilla/databricks-dolly-15k-curated-en
wandb_project: Gradients-On-Demand
wandb_run: your_name
wandb_runid: default
warmup_ratio: 0.05
xformers_attention: true
```
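The `format`, `no_input_format`, and `field_*` entries in the config define how each dataset row is rendered into a training prompt. Below is a minimal sketch of that mapping, assuming a made-up record; the field values are illustrative and not taken from the dataset:

```python
# Illustrative sketch (not from the original card) of how the prompt templates
# above map a dolly-style record onto a training prompt.
record = {
    "original-instruction": "Summarize the paragraph.",
    "original-response": "A short summary.",  # target text (field_output); not part of the prompt
}

system_format = "{system}"
prompt_format = "{instruction} {input}"   # used when the input field is non-empty
no_input_format = "{instruction}"         # used when the input field is empty

instruction = record["original-instruction"]
# field_input and field_instruction point at the same column in this config,
# so the rendered prompt repeats the instruction text.
input_text = record["original-instruction"]

system = system_format.format(system="")  # system_prompt is '' in the config
prompt = (prompt_format.format(instruction=instruction, input=input_text)
          if input_text else no_input_format.format(instruction=instruction))
print(system + prompt)
```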
# test-repo124

This model is a fine-tuned version of [fxmarty/small-llama-testing](https://huggingface.co/fxmarty/small-llama-testing) on the [argilla/databricks-dolly-15k-curated-en](https://huggingface.co/datasets/argilla/databricks-dolly-15k-curated-en) dataset. It achieves the following results on the evaluation set:
- Loss: 6.0841
## Model description
More information needed
## Intended uses & limitations
More information needed
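The card does not document usage. As a minimal, hedged loading sketch using the standard transformers API (the prompt and generation settings are illustrative, not recommendations from the training run):

```python
# Minimal loading sketch; the model id and trust_remote_code mirror the config above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "willtensora/test-repo124"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, trust_remote_code=True
)

# Illustrative prompt; the base model is a tiny testing checkpoint, so outputs
# are not expected to be meaningful.
inputs = tokenizer("Summarize the paragraph.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```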
## Training and evaluation data

The model was fine-tuned on argilla/databricks-dolly-15k-curated-en (see the axolotl config above); `val_set_size: 0.1` holds out 10% of the data as the evaluation set.
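For reference, a hedged sketch of loading the dataset and reproducing a 10% hold-out with the datasets library, assuming the dataset exposes a single train split; the split seed is an assumption, and axolotl's internal split may differ:

```python
# Load the curated Dolly dataset and carve out a 10% evaluation split.
# NOTE: seed=42 is an assumption for illustration; it is not taken from the card.
from datasets import load_dataset

ds = load_dataset("argilla/databricks-dolly-15k-curated-en", split="train")
splits = ds.train_test_split(test_size=0.1, seed=42)
print(len(splits["train"]), len(splits["test"]))
```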
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 19
- eval_batch_size: 19
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 152
- total_eval_batch_size: 152
- optimizer: adamw_bnb_8bit (OptimizerNames.ADAMW_BNB) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 26
- num_epochs: 10
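The total batch sizes in the list above are simply the per-device batch size multiplied by the device count (no gradient accumulation is implied); a quick check:

```python
# Effective batch size arithmetic from the hyperparameters above.
micro_batch_size = 19   # per-device train/eval batch size
num_devices = 8         # multi-GPU data parallelism

total_batch_size = micro_batch_size * num_devices
assert total_batch_size == 152  # matches total_train_batch_size and total_eval_batch_size
print(total_batch_size)
```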
### Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
No log | 0.0112 | 1 | 10.4228 |
10.127 | 0.2247 | 20 | 9.8630 |
9.039 | 0.4494 | 40 | 8.7400 |
8.1126 | 0.6742 | 60 | 7.9190 |
7.5515 | 0.8989 | 80 | 7.4582 |
7.2771 | 1.1236 | 100 | 7.2773 |
7.1388 | 1.3483 | 120 | 7.1772 |
7.0583 | 1.5730 | 140 | 7.0581 |
6.957 | 1.7978 | 160 | 6.9383 |
6.8789 | 2.0225 | 180 | 6.8210 |
6.7029 | 2.2472 | 200 | 6.7213 |
6.5913 | 2.4719 | 220 | 6.6364 |
6.4981 | 2.6966 | 240 | 6.5572 |
6.4453 | 2.9213 | 260 | 6.4724 |
6.2642 | 3.1461 | 280 | 6.4135 |
6.2365 | 3.3708 | 300 | 6.3668 |
6.2746 | 3.5955 | 320 | 6.3166 |
6.2488 | 3.8202 | 340 | 6.2882 |
6.1749 | 4.0449 | 360 | 6.2413 |
6.0514 | 4.2697 | 380 | 6.2183 |
6.0162 | 4.4944 | 400 | 6.1961 |
6.0043 | 4.7191 | 420 | 6.1763 |
6.0239 | 4.9438 | 440 | 6.1544 |
5.9605 | 5.1685 | 460 | 6.1444 |
6.0214 | 5.3933 | 480 | 6.1288 |
5.9376 | 5.6180 | 500 | 6.1234 |
5.8258 | 5.8427 | 520 | 6.1087 |
5.8755 | 6.0674 | 540 | 6.1070 |
5.838 | 6.2921 | 560 | 6.1017 |
5.852 | 6.5169 | 580 | 6.0974 |
5.9298 | 6.7416 | 600 | 6.1005 |
5.8933 | 6.9663 | 620 | 6.0981 |
5.8921 | 7.1910 | 640 | 6.0962 |
5.8261 | 7.4157 | 660 | 6.0942 |
5.8207 | 7.6404 | 680 | 6.0891 |
5.8623 | 7.8652 | 700 | 6.0875 |
5.8938 | 8.0899 | 720 | 6.0950 |
5.8684 | 8.3146 | 740 | 6.0897 |
5.9065 | 8.5393 | 760 | 6.0881 |
5.8135 | 8.7640 | 780 | 6.0886 |
5.7873 | 8.9888 | 800 | 6.0908 |
5.8081 | 9.2135 | 820 | 6.0929 |
5.8307 | 9.4382 | 840 | 6.0849 |
5.8492 | 9.6629 | 860 | 6.0893 |
5.8504 | 9.8876 | 880 | 6.0841 |
### Framework versions
- Transformers 4.46.0
- Pytorch 2.5.0+cu124
- Datasets 3.0.1
- Tokenizers 0.20.1
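To confirm that a local environment matches the versions listed above, a quick, illustrative check:

```python
# Print installed versions of the packages listed above (expected values from the card).
from importlib.metadata import version

expected = {
    "transformers": "4.46.0",
    "torch": "2.5.0+cu124",
    "datasets": "3.0.1",
    "tokenizers": "0.20.1",
}
for pkg, want in expected.items():
    print(f"{pkg}: installed {version(pkg)}, card lists {want}")
```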