Built with Axolotl

See axolotl config

axolotl version: 0.4.1

```yaml
adapter: lora
base_model: EleutherAI/pythia-410m-deduped
bf16: auto
dataset_prepared_path: null
datasets:
- data_files:
  - 268406b2c8127f67_train_data.json
  ds_type: json
  format: custom
  path: 268406b2c8127f67_train_data.json
  type:
    field: null
    field_input: context
    field_instruction: instruction
    field_output: response
    field_system: null
    format: null
    no_input_format: null
    system_format: '{system}'
    system_prompt: ''
early_stopping_patience: null
evals_per_epoch: 4
gradient_accumulation_steps: 1
group_by_length: false
hub_model_id: FatCat87/taopanda-2_bcc7097d-6c73-48e3-aaee-f9f854afb9b4
learning_rate: 1.0e-05
load_in_8bit: true
local_rank: null
logging_steps: 1
lora_alpha: 32
lora_dropout: 0.05
lora_fan_in_fan_out: true
lora_model_dir: null
lora_r: 16
lora_target_linear: null
lora_target_modules:
- query_key_value
micro_batch_size: 4
num_epochs: 4
output_dir: ./outputs/lora-alpaca-pythia/taopanda-2_bcc7097d-6c73-48e3-aaee-f9f854afb9b4
resume_from_checkpoint: null
seed: 84664
sequence_len: 512
special_tokens:
  pad_token: <|endoftext|>
tf32: true
train_on_inputs: false
val_set_size: 0.05
wandb_entity: fatcat87-taopanda
wandb_log_model: null
wandb_mode: online
wandb_name: taopanda-2_bcc7097d-6c73-48e3-aaee-f9f854afb9b4
wandb_project: subnet56
wandb_runid: taopanda-2_bcc7097d-6c73-48e3-aaee-f9f854afb9b4
wandb_watch: null
weight_decay: 0.1
```
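
For reference, the `lora_*` settings above correspond roughly to the following PEFT configuration. This is a minimal sketch: axolotl builds the actual adapter config internally and may apply additional defaults.

```python
from peft import LoraConfig

# Approximate PEFT equivalent of the lora_* values in the YAML above.
# Values mirror the config; the exact object axolotl produced may differ.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["query_key_value"],
    fan_in_fan_out=True,
    task_type="CAUSAL_LM",
)
```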


taopanda-2_bcc7097d-6c73-48e3-aaee-f9f854afb9b4

This model is a LoRA adapter fine-tuned from EleutherAI/pythia-410m-deduped on the 268406b2c8127f67_train_data.json dataset described in the config above. It achieves the following results on the evaluation set:

  • Loss: 2.2826
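
A minimal loading sketch with `transformers` and `peft` is shown below; it assumes the adapter is applied on top of the base model and that the prompt follows the instruction/context/response format used during training.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_id = "EleutherAI/pythia-410m-deduped"
adapter_id = "FatCat87/taopanda-2_bcc7097d-6c73-48e3-aaee-f9f854afb9b4"

# Load the base model, then attach the LoRA adapter weights.
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(base_model_id)
model = PeftModel.from_pretrained(model, adapter_id)

prompt = "..."  # an instruction-style prompt matching the training data format
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```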

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 84664
  • distributed_type: multi-GPU
  • num_devices: 2
  • total_train_batch_size: 8
  • total_eval_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 4
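
Outside of axolotl, a roughly equivalent optimizer/scheduler setup would look like the sketch below. The decoupled AdamW variant and the `total_training_steps` value are assumptions; the actual run was driven by the Trainer inside axolotl.

```python
import torch
from transformers import AutoModelForCausalLM, get_cosine_schedule_with_warmup

# Stand-in for the LoRA-wrapped model actually trained by axolotl.
model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-410m-deduped")

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-5,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0.1,  # from the axolotl config
)

total_training_steps = 6612  # assumption: ~1653 steps/epoch * 4 epochs (see the results table)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=100,
    num_training_steps=total_training_steps,
)
```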

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 2.7173        | 0.0006 | 1    | 2.8035          |
| 2.3948        | 0.2505 | 414  | 2.5510          |
| 2.6987        | 0.5009 | 828  | 2.4507          |
| 2.2231        | 0.7514 | 1242 | 2.3969          |
| 2.4612        | 1.0018 | 1656 | 2.3698          |
| 2.9173        | 1.2523 | 2070 | 2.3450          |
| 2.3121        | 1.5027 | 2484 | 2.3282          |
| 2.8931        | 1.7532 | 2898 | 2.3154          |
| 2.0185        | 2.0036 | 3312 | 2.3080          |
| 2.2114        | 2.2541 | 3726 | 2.2980          |
| 2.4148        | 2.5045 | 4140 | 2.2941          |
| 2.2134        | 2.7550 | 4554 | 2.2887          |
| 1.5517        | 3.0054 | 4968 | 2.2839          |
| 2.2136        | 3.2559 | 5382 | 2.2811          |
| 1.2004        | 3.5064 | 5796 | 2.2838          |
| 2.374         | 3.7568 | 6210 | 2.2826          |

Framework versions

  • PEFT 0.11.1
  • Transformers 4.42.3
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1