model_hh_usp4_200

This model is a fine-tuned version of meta-llama/Llama-2-7b-chat-hf on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.2047
  • Rewards/chosen: -10.4203
  • Rewards/rejected: -11.3883
  • Rewards/accuracies: 0.6100
  • Rewards/margins: 0.9680
  • Logps/rejected: -126.6996
  • Logps/chosen: -122.1098
  • Logits/rejected: -0.0920
  • Logits/chosen: -0.0765
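This checkpoint is a PEFT adapter (see the framework versions below), so it is loaded on top of the base model rather than as a standalone model. A minimal loading sketch, assuming the repository layout matches the card; both repo ids are taken from the card itself:

```python
# Minimal sketch: attach the adapter in this repo to the (gated) Llama-2 base model.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-2-7b-chat-hf"
base = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Load the fine-tuned adapter weights onto the frozen base model.
model = PeftModel.from_pretrained(base, "guoyu-zhang/model_hh_usp4_200")
```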

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0005
  • train_batch_size: 4
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
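As a sketch only, the hyperparameters above can be expressed as `transformers.TrainingArguments`; the card does not say which trainer was used, and `output_dir` is hypothetical:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="model_hh_usp4_200",   # hypothetical; not stated in the card
    learning_rate=5e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    seed=42,
    gradient_accumulation_steps=4,    # 4 x 4 = total_train_batch_size of 16,
                                      # consistent with a single training device
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```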

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | 8.0 | 100 | 2.1928 | -10.7974 | -11.7956 | 0.6200 | 0.9982 | -127.1522 | -122.5288 | -0.0977 | -0.0806 |
| 0.0 | 16.0 | 200 | 2.2073 | -10.6145 | -11.6045 | 0.6200 | 0.9900 | -126.9398 | -122.3256 | -0.0957 | -0.0785 |
| 0.0 | 24.0 | 300 | 2.1834 | -10.5576 | -11.5387 | 0.6100 | 0.9810 | -126.8667 | -122.2624 | -0.0926 | -0.0766 |
| 0.0 | 32.0 | 400 | 2.2206 | -10.5218 | -11.4563 | 0.6000 | 0.9345 | -126.7752 | -122.2226 | -0.0924 | -0.0768 |
| 0.0 | 40.0 | 500 | 2.1989 | -10.4576 | -11.4408 | 0.6100 | 0.9832 | -126.7580 | -122.1513 | -0.0920 | -0.0763 |
| 0.0 | 48.0 | 600 | 2.1897 | -10.4344 | -11.3970 | 0.6000 | 0.9626 | -126.7093 | -122.1255 | -0.0915 | -0.0758 |
| 0.0 | 56.0 | 700 | 2.1723 | -10.3994 | -11.3863 | 0.6000 | 0.9869 | -126.6974 | -122.0866 | -0.0916 | -0.0760 |
| 0.0 | 64.0 | 800 | 2.1910 | -10.4312 | -11.3832 | 0.6100 | 0.9520 | -126.6939 | -122.1220 | -0.0918 | -0.0760 |
| 0.0 | 72.0 | 900 | 2.1762 | -10.4083 | -11.3782 | 0.6100 | 0.9699 | -126.6885 | -122.0965 | -0.0916 | -0.0762 |
| 0.0 | 80.0 | 1000 | 2.2047 | -10.4203 | -11.3883 | 0.6100 | 0.9680 | -126.6996 | -122.1098 | -0.0920 | -0.0765 |
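In every row, Rewards/margins is the difference between the chosen and rejected rewards; in the final row, -10.4203 - (-11.3883) = 0.9680. The metric names match those logged by TRL's DPOTrainer; under the assumption that training used the DPO objective (the card does not state it), the implicit reward and loss would be:

```latex
r_\theta(x, y) = \beta \left[ \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right],
\qquad
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\!\left( r_\theta(x, y_{\mathrm{chosen}}) - r_\theta(x, y_{\mathrm{rejected}}) \right)
```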

Framework versions

  • PEFT 0.10.0
  • Transformers 4.39.3
  • PyTorch 2.2.2+cu121
  • Datasets 2.18.0
  • Tokenizers 0.15.2