license: llama2
library_name: peft
tags:
  - trl
  - dpo
  - generated_from_trainer
base_model: meta-llama/Llama-2-7b-chat-hf
model-index:
  - name: model_hh_usp2_200
    results: []

model_hh_usp2_200

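This model is a fine-tuned version of meta-llama/Llama-2-7b-chat-hf, trained with DPO (Direct Preference Optimization) via TRL and the PEFT library. The training dataset was not recorded in this card. It achieves the following results on the evaluation set (the values match the final checkpoint, step 1000, in the training results):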

  • Loss: 1.7539
  • Rewards/chosen: -6.1342
  • Rewards/rejected: -7.0734
  • Rewards/accuracies: 0.5500
  • Rewards/margins: 0.9392
  • Logps/rejected: -123.0264
  • Logps/chosen: -118.7056
  • Logits/rejected: -0.0859
  • Logits/chosen: -0.0281
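
The reward metrics above are related: as far as TRL's DPO logging goes, Rewards/margins is the mean chosen reward minus the mean rejected reward, and Rewards/accuracies is the fraction of evaluation pairs where the chosen response scored higher. A minimal sketch that checks the reported numbers against that relationship (the function name `reward_margin` is illustrative, not part of any library):

```python
def reward_margin(rewards_chosen: float, rewards_rejected: float) -> float:
    """Margin between the chosen and rejected implicit rewards."""
    return rewards_chosen - rewards_rejected


# Values copied from the evaluation results listed above.
REWARDS_CHOSEN = -6.1342
REWARDS_REJECTED = -7.0734
REPORTED_MARGIN = 0.9392

margin = reward_margin(REWARDS_CHOSEN, REWARDS_REJECTED)

# Allow a small tolerance because the card rounds to four decimals.
assert abs(margin - REPORTED_MARGIN) < 1e-4
print(f"margin = {margin:.4f}")
```

Both rewards are negative, so the positive margin reflects the rejected responses being penalized more than the chosen ones, not the chosen responses gaining reward.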

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0005
  • train_batch_size: 4
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
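
The hyperparameters above imply two derived quantities: the effective batch size and the learning rate at any given step. The sketch below reproduces both in plain Python. It assumes a single training device (the card does not state the device count) and the usual half-cosine decay after linear warmup, which is how the `cosine` scheduler in Transformers behaves with default settings. The function names are illustrative.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 0.0005
TRAIN_BATCH_SIZE = 4
GRADIENT_ACCUMULATION_STEPS = 4
WARMUP_STEPS = 100
TRAINING_STEPS = 1000
NUM_DEVICES = 1  # assumption: not stated in the card


def effective_batch_size() -> int:
    """Examples consumed per optimizer step."""
    return TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * NUM_DEVICES


def lr_at_step(step: int) -> float:
    """Linear warmup followed by a half-cosine decay to zero."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TRAINING_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(effective_batch_size())  # matches total_train_batch_size
for step in (0, 50, 100, 550, 1000):
    print(step, f"{lr_at_step(step):.6f}")
```

Under the single-device assumption, the product 4 × 4 agrees with the reported total_train_batch_size of 16.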

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | 8.0 | 100 | 1.7627 | -5.8776 | -6.7947 | 0.5400 | 0.9171 | -122.7167 | -118.4203 | -0.0739 | -0.0163 |
| 0.0 | 16.0 | 200 | 1.7526 | -5.9719 | -6.9070 | 0.5200 | 0.9351 | -122.8416 | -118.5252 | -0.0772 | -0.0191 |
| 0.0 | 24.0 | 300 | 1.7452 | -5.9893 | -6.9334 | 0.5400 | 0.9440 | -122.8708 | -118.5445 | -0.0823 | -0.0239 |
| 0.0 | 32.0 | 400 | 1.7405 | -6.0454 | -7.0112 | 0.5400 | 0.9658 | -122.9573 | -118.6068 | -0.0827 | -0.0247 |
| 0.0 | 40.0 | 500 | 1.7542 | -6.0927 | -7.0508 | 0.5500 | 0.9581 | -123.0013 | -118.6594 | -0.0849 | -0.0269 |
| 0.0 | 48.0 | 600 | 1.7457 | -6.1288 | -7.0751 | 0.5300 | 0.9463 | -123.0282 | -118.6995 | -0.0843 | -0.0262 |
| 0.0 | 56.0 | 700 | 1.7426 | -6.1364 | -7.0982 | 0.5400 | 0.9619 | -123.0540 | -118.7079 | -0.0868 | -0.0288 |
| 0.0 | 64.0 | 800 | 1.7365 | -6.1361 | -7.0983 | 0.5600 | 0.9621 | -123.0540 | -118.7077 | -0.0867 | -0.0287 |
| 0.0 | 72.0 | 900 | 1.7559 | -6.1205 | -7.0808 | 0.5500 | 0.9604 | -123.0346 | -118.6903 | -0.0874 | -0.0288 |
| 0.0 | 80.0 | 1000 | 1.7539 | -6.1342 | -7.0734 | 0.5500 | 0.9392 | -123.0264 | -118.7056 | -0.0859 | -0.0281 |
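
The headline metrics come from the final step, which is not necessarily the best checkpoint. As a small illustration, the snippet below scans the validation loss and reward accuracy columns from the results above and reports the step that is best on each. The tuple layout and variable names are chosen for this sketch only.

```python
# (step, validation_loss, rewards_accuracy) copied from the training results.
RESULTS = [
    (100, 1.7627, 0.54),
    (200, 1.7526, 0.52),
    (300, 1.7452, 0.54),
    (400, 1.7405, 0.54),
    (500, 1.7542, 0.55),
    (600, 1.7457, 0.53),
    (700, 1.7426, 0.54),
    (800, 1.7365, 0.56),
    (900, 1.7559, 0.55),
    (1000, 1.7539, 0.55),
]

# Lowest validation loss and highest reward accuracy across checkpoints.
best_by_loss = min(RESULTS, key=lambda row: row[1])
best_by_accuracy = max(RESULTS, key=lambda row: row[2])

print("lowest validation loss:", best_by_loss)
print("highest reward accuracy:", best_by_accuracy)
```

On these numbers both criteria pick step 800, although the spread across checkpoints is small (validation loss ranges from 1.7365 to 1.7627), so the difference may not be meaningful.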

Framework versions

  • PEFT 0.10.0
  • Transformers 4.39.1
  • Pytorch 2.2.1+cu121
  • Datasets 2.18.0
  • Tokenizers 0.15.2
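
Since the card lists PEFT as the library and meta-llama/Llama-2-7b-chat-hf as the base model, loading would typically mean applying the adapter on top of the base weights. The following is a sketch under assumptions, not a tested recipe: the adapter repository id `guoyu-zhang/model_hh_usp2_200` is assumed rather than stated in the card, the half-precision dtype is an arbitrary choice, and access to the gated Llama 2 base weights is required. Imports are deferred into the function so the heavy libraries load only when it is called.

```python
BASE_MODEL_ID = "meta-llama/Llama-2-7b-chat-hf"  # from the card metadata
ADAPTER_ID = "guoyu-zhang/model_hh_usp2_200"     # assumption: repo id not stated in the card


def load_model(base_model_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model and apply the PEFT adapter on top of it."""
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    base_model = AutoModelForCausalLM.from_pretrained(
        base_model_id,
        torch_dtype=torch.float16,  # assumption: half precision to reduce memory
    )
    # Wrap the base model with the fine-tuned adapter weights.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return model, tokenizer


if __name__ == "__main__":
    model, tokenizer = load_model()
    print(type(model).__name__)
```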