Built with Axolotl

See axolotl config

axolotl version: 0.4.0

base_model: Undi95/Meta-Llama-3-8B-hf
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: Pbug/bftest
    type: sharegpt
dataset_prepared_path:
val_set_size: 0.05
output_dir: ./lora-out

sequence_len: 8192
sample_packing: true
pad_to_sequence_len: true

wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 8
micro_batch_size: 1
num_epochs: 10
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 2e-5

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 100
evals_per_epoch: 2
eval_table_size:
eval_sample_packing: False
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  pad_token: <|end_of_text|>

lora-out

This model is a fine-tuned version of Undi95/Meta-Llama-3-8B-hf on the None dataset. It achieves the following results on the evaluation set:

  • Loss: nan

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 10

Training results

Training Loss Epoch Step Validation Loss
2.5419 0.03 1 nan
2.5492 0.51 18 nan
2.434 1.01 36 nan
2.3504 1.5 54 nan
2.3643 2.0 72 nan
2.2834 2.48 90 nan
2.2383 2.98 108 nan
1.8786 3.47 126 nan
1.7963 3.98 144 nan
2.1853 4.47 162 nan
1.4333 4.98 180 nan
1.2058 5.46 198 nan
1.125 5.96 216 nan
0.809 6.44 234 nan
0.7118 6.94 252 nan
0.7175 7.44 270 nan
0.7341 7.94 288 nan
0.774 8.44 306 nan
0.6379 8.94 324 nan
0.562 9.41 342 nan

Framework versions

  • Transformers 4.40.0.dev0
  • Pytorch 2.1.2+cu121
  • Datasets 2.15.0
  • Tokenizers 0.15.0
Downloads last month
1
Inference Providers NEW
This model is not currently available via any of the supported third-party Inference Providers, and the model is not deployed on the HF Inference API.

Model tree for Pbug/bf_rp_base

Finetuned
(8)
this model