---
library_name: transformers
license: apache-2.0
base_model: alignment-handbook/zephyr-7b-sft-full
tags:
  - alignment-handbook
  - trl
  - dpo
  - generated_from_trainer
datasets:
  - data/zephyr_uf_rlced_conifer_ref
model-index:
  - name: zephyr-7b-uf-rlced-conifer-group-dpo-2e-alr-0.01
    results: []
---

zephyr-7b-uf-rlced-conifer-group-dpo-2e-alr-0.01

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on the data/zephyr_uf_rlced_conifer_ref dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2395
  • Rewards/chosen: -2.8511
  • Rewards/rejected: -8.5888
  • Rewards/accuracies: 0.8778
  • Rewards/margins: 5.7377
  • Logps/rejected: -1262.6172
  • Logps/chosen: -677.5837
  • Logits/rejected: 3.8778
  • Logits/chosen: 1.9376
  • Excess Loss: 0.0374
  • Alpha 0 Uf: 0.5116
  • Alpha 1 Rlced Conifer: 0.4884
  • Rewards/chosen 1 Rlced Conifer: -3.0535
  • Rewards/rejected 1 Rlced Conifer: -10.0348
  • Rewards/accuracies 1 Rlced Conifer: 0.9097
  • Rewards/margins 1 Rlced Conifer: 6.9812
  • Logps/rejected 1 Rlced Conifer: -1451.0132
  • Logps/chosen 1 Rlced Conifer: -728.9337
  • Logits/rejected 1 Rlced Conifer: 3.5676
  • Logits/chosen 1 Rlced Conifer: 1.5730
  • Task Loss 1 Rlced Conifer: 0.1787
  • Task Excess Loss 1 Rlced Conifer: 0.0427
  • Rewards/chosen 0 Uf: -2.0820
  • Rewards/rejected 0 Uf: -3.4336
  • Rewards/accuracies 0 Uf: 0.7633
  • Rewards/margins 0 Uf: 1.3516
  • Logps/rejected 0 Uf: -584.9677
  • Logps/chosen 0 Uf: -497.4562
  • Logits/rejected 0 Uf: 5.1753
  • Logits/chosen 0 Uf: 3.1000
  • Task Loss 0 Uf: 0.5185
  • Task Excess Loss 0 Uf: 0.0724
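
For reference, the DPO reward metrics logged by TRL are β-scaled log-probability ratios of the policy against the reference model, so the margin is the chosen reward minus the rejected reward; the final numbers above are consistent with this:

$$
r(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},
\qquad
r_{\text{chosen}} - r_{\text{rejected}} = -2.8511 - (-8.5888) = 5.7377 .
$$

Likewise, the two Alpha values sum to 1 (0.5116 + 0.4884), consistent with their being mixture weights over the two preference-data groups.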

Model description

This model was trained with group DPO, starting from the alignment-handbook/zephyr-7b-sft-full SFT checkpoint, on two preference-data groups (labeled `0 Uf` and `1 Rlced Conifer` in the metrics above) with adaptive per-group weights (the Alpha metrics). Further details have not been provided.
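
The snippet below is a minimal inference sketch. The repository id is assumed from the model name in this card, and the prompt construction assumes the tokenizer ships a Zephyr-style chat template; neither is stated explicitly here.

```python
# Minimal inference sketch; the repo id below is assumed from the
# model name in this card, not confirmed by it.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NicholasCorrado/zephyr-7b-uf-rlced-conifer-group-dpo-2e-alr-0.01"  # assumed

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed precision; pick what fits your hardware
    device_map="auto",
)

# Build the prompt with the tokenizer's chat template (assumed present).
messages = [{"role": "user", "content": "Summarize DPO training in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```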

Intended uses & limitations

More information needed

Training and evaluation data

The model was trained and evaluated on the data/zephyr_uf_rlced_conifer_ref preference dataset which, per the metrics above, is split into two groups (`0 Uf` and `1 Rlced Conifer`). No further details have been provided.

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-07
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 256
  • total_eval_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 2
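
As a rough reconstruction, these hyperparameters map onto trl's `DPOConfig` as sketched below. This is not the authors' training script: the DPO `beta`, the mixed-precision setting, and any group-DPO-specific options (such as the alpha learning rate that the model name's `alr-0.01` suffix appears to reference) are not stated in this card.

```python
# Hyperparameter sketch using trl's DPOConfig (a TrainingArguments
# subclass); values come from the list above, comments mark assumptions.
from trl import DPOConfig

training_args = DPOConfig(
    output_dir="zephyr-7b-uf-rlced-conifer-group-dpo-2e-alr-0.01",
    learning_rate=5e-7,
    per_device_train_batch_size=8,  # x 8 GPUs x 4 accumulation steps = 256 total
    per_device_eval_batch_size=8,   # x 8 GPUs = 64 total
    gradient_accumulation_steps=4,
    num_train_epochs=2,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,  # assumed; the card does not state the precision
)
```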

Training results

Metrics recorded at each evaluation step (the final column matches the evaluation-set results above):

| Metric | Step 360 (epoch 0.4997) | Step 720 (epoch 0.9993) | Step 1080 (epoch 1.4990) | Step 1440 (epoch 1.9986) |
|---|---|---|---|---|
| Training Loss | 0.1689 | 0.1413 | 0.0671 | 0.0649 |
| Validation Loss | 0.2674 | 0.2485 | 0.2408 | 0.2395 |
| Rewards/chosen | -2.2066 | -2.0138 | -2.5432 | -2.8511 |
| Rewards/rejected | -5.7976 | -6.1196 | -7.7524 | -8.5888 |
| Rewards/accuracies | 0.8656 | 0.8741 | 0.8741 | 0.8778 |
| Rewards/margins | 3.5910 | 4.1059 | 5.2092 | 5.7377 |
| Logps/rejected | -983.4942 | -1015.6987 | -1178.9786 | -1262.6172 |
| Logps/chosen | -613.1316 | -593.8471 | -646.7894 | -677.5837 |
| Logits/rejected | 1.9639 | 2.5252 | 3.9871 | 3.8778 |
| Logits/chosen | 0.4895 | 1.3345 | 2.3348 | 1.9376 |
| Excess Loss | 0.0642 | 0.0465 | 0.0389 | 0.0374 |
| Alpha 0 Uf | 0.5765 | 0.6417 | 0.5284 | 0.5116 |
| Alpha 1 Rlced Conifer | 0.4235 | 0.3583 | 0.4716 | 0.4884 |
| Rewards/chosen 1 Rlced Conifer | -2.3017 | -2.0972 | -2.6717 | -3.0535 |
| Rewards/rejected 1 Rlced Conifer | -6.6520 | -7.0507 | -8.9931 | -10.0348 |
| Rewards/accuracies 1 Rlced Conifer | 0.8965 | 0.9047 | 0.9071 | 0.9097 |
| Rewards/margins 1 Rlced Conifer | 4.3503 | 4.9535 | 6.3215 | 6.9812 |
| Logps/rejected 1 Rlced Conifer | -1112.7397 | -1152.6036 | -1346.8500 | -1451.0132 |
| Logps/chosen 1 Rlced Conifer | -653.7553 | -633.2974 | -690.7497 | -728.9337 |
| Logits/rejected 1 Rlced Conifer | 1.7066 | 2.1536 | 3.5948 | 3.5676 |
| Logits/chosen 1 Rlced Conifer | 0.1879 | 1.0120 | 1.9516 | 1.5730 |
| Task Loss 1 Rlced Conifer | 0.2091 | 0.1925 | 0.1822 | 0.1787 |
| Task Excess Loss 1 Rlced Conifer | 0.0748 | 0.0584 | 0.0462 | 0.0427 |
| Rewards/chosen 0 Uf | -1.8461 | -1.6822 | -2.0401 | -2.0820 |
| Rewards/rejected 0 Uf | -2.7792 | -2.7943 | -3.3250 | -3.4336 |
| Rewards/accuracies 0 Uf | 0.7426 | 0.7670 | 0.7500 | 0.7633 |
| Rewards/margins 0 Uf | 0.9330 | 1.1121 | 1.2849 | 1.3516 |
| Logps/rejected 0 Uf | -519.5245 | -521.0374 | -574.1076 | -584.9677 |
| Logps/chosen 0 Uf | -473.8738 | -457.4840 | -493.2740 | -497.4562 |
| Logits/rejected 0 Uf | 3.0556 | 4.0168 | 5.5773 | 5.1753 |
| Logits/chosen 0 Uf | 1.4702 | 2.3771 | 3.5557 | 3.1000 |
| Task Loss 0 Uf | 0.5392 | 0.4989 | 0.5197 | 0.5185 |
| Task Excess Loss 0 Uf | 0.0891 | 0.0595 | 0.0655 | 0.0724 |

Framework versions

  • Transformers 4.44.2
  • Pytorch 2.2.0a0+81ea7a4
  • Datasets 2.21.0
  • Tokenizers 0.19.1