library_name: transformers
license: apache-2.0
base_model: alignment-handbook/zephyr-7b-sft-full
tags:
  - alignment-handbook
  - trl
  - dpo
  - generated_from_trainer
datasets:
  - data/zephyr_uf_rlced_conifer_ref_1e2e
model-index:
  - name: zephyr-7b-uf-rlced-conifer-1e2e-group-dpo-2e
    results: []

zephyr-7b-uf-rlced-conifer-1e2e-group-dpo-2e

This model is a DPO fine-tune of alignment-handbook/zephyr-7b-sft-full, trained with TRL on the data/zephyr_uf_rlced_conifer_ref_1e2e dataset. At the final checkpoint (step 1440), it achieves the following results on the evaluation set:

  • Loss: 0.2626
  • Rewards/chosen: -2.1843
  • Rewards/rejected: -5.4288
  • Rewards/accuracies: 0.8684
  • Rewards/margins: 3.2445
  • Logps/rejected: -946.6157
  • Logps/chosen: -610.9032
  • Logits/rejected: 1.2318
  • Logits/chosen: -0.7806
  • Excess Loss: 0.0374
  • Alpha 0 Uf: 0.8470
  • Alpha 1 Rlced Conifer: 0.1530
  • Rewards/chosen 1 Rlced Conifer: -2.2281
  • Rewards/rejected 1 Rlced Conifer: -6.0246
  • Rewards/accuracies 1 Rlced Conifer: 0.8987
  • Rewards/margins 1 Rlced Conifer: 3.7965
  • Logps/rejected 1 Rlced Conifer: -1049.9939
  • Logps/chosen 1 Rlced Conifer: -646.3860
  • Logits/rejected 1 Rlced Conifer: 1.1158
  • Logits/chosen 1 Rlced Conifer: -0.9982
  • Task Loss 1 Rlced Conifer: 0.2102
  • Task Excess Loss 1 Rlced Conifer: 0.0475
  • Rewards/chosen 0 Uf: -1.9978
  • Rewards/rejected 0 Uf: -3.3091
  • Rewards/accuracies 0 Uf: 0.7603
  • Rewards/margins 0 Uf: 1.3113
  • Logps/rejected 0 Uf: -572.5212
  • Logps/chosen 0 Uf: -489.0419
  • Logits/rejected 0 Uf: 1.8243
  • Logits/chosen 0 Uf: -0.1004
  • Task Loss 0 Uf: 0.4944
  • Task Excess Loss 0 Uf: 0.0469
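
As a quick consistency check on the metrics above: in DPO-style trainers the reward margin is the chosen reward minus the rejected reward, so each reported margin should be recoverable from the two rewards next to it. The sketch below re-derives the margins from the reported numbers. It also checks that the two alpha values sum to 1, which assumes they are mixture weights over the two data sources; the card does not define them, so treat that reading as a guess.

```python
# Reported evaluation metrics, copied from the list above.
reported = {
    "overall": {"chosen": -2.1843, "rejected": -5.4288, "margin": 3.2445},
    "rlced_conifer": {"chosen": -2.2281, "rejected": -6.0246, "margin": 3.7965},
    "uf": {"chosen": -1.9978, "rejected": -3.3091, "margin": 1.3113},
}

# Assumption: the alpha values are per-source weights (not defined in the card).
alphas = {"uf": 0.8470, "rlced_conifer": 0.1530}


def margin(chosen: float, rejected: float) -> float:
    """Reward margin: how much higher the chosen reward is than the rejected one."""
    return chosen - rejected


def check_margins(metrics: dict, tol: float = 1e-3) -> dict:
    """Return, per group, whether chosen - rejected matches the reported margin."""
    return {
        name: abs(margin(m["chosen"], m["rejected"]) - m["margin"]) < tol
        for name, m in metrics.items()
    }


margins_ok = check_margins(reported)
alpha_sum = sum(alphas.values())

print(margins_ok)
print(round(alpha_sum, 4))
```

All rewards are negative, so the positive margins come from the rejected responses being pushed down further than the chosen ones, not from the chosen rewards increasing.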

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 256
  • total_eval_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 2
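
The two total batch sizes follow from the per-device values: the training total multiplies in the device count and the gradient accumulation steps, while evaluation uses no accumulation. The sketch below reproduces that arithmetic and adds a plain-Python version of a cosine schedule with linear warmup. The schedule function is an illustrative reimplementation of the usual formula, not code from the training run, and `total_steps=1440` is an assumption taken from the last logged step.

```python
import math

# Values from the hyperparameter list above.
train_batch_size = 8
eval_batch_size = 8
num_devices = 8
gradient_accumulation_steps = 4
learning_rate = 5e-07
warmup_ratio = 0.1

# Effective batch sizes: accumulation only applies to training.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices


def cosine_lr(step: int, total_steps: int,
              peak_lr: float = learning_rate,
              ratio: float = warmup_ratio) -> float:
    """Linear warmup to peak_lr, then half-cosine decay to zero (illustrative)."""
    warmup_steps = math.ceil(total_steps * ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total_steps = 1440  # assumption: last logged step in the results table
print(total_train_batch_size, total_eval_batch_size)
for s in (0, 72, 144, 720, 1440):
    print(s, cosine_lr(s, total_steps))
```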

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Excess Loss | Alpha 0 Uf | Alpha 1 Rlced Conifer | Rewards/chosen 1 Rlced Conifer | Rewards/rejected 1 Rlced Conifer | Rewards/accuracies 1 Rlced Conifer | Rewards/margins 1 Rlced Conifer | Logps/rejected 1 Rlced Conifer | Logps/chosen 1 Rlced Conifer | Logits/rejected 1 Rlced Conifer | Logits/chosen 1 Rlced Conifer | Task Loss 1 Rlced Conifer | Task Excess Loss 1 Rlced Conifer | Rewards/chosen 0 Uf | Rewards/rejected 0 Uf | Rewards/accuracies 0 Uf | Rewards/margins 0 Uf | Logps/rejected 0 Uf | Logps/chosen 0 Uf | Logits/rejected 0 Uf | Logits/chosen 0 Uf | Task Loss 0 Uf | Task Excess Loss 0 Uf |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.1953 | 0.4997 | 360 | 0.3535 | -1.5938 | -3.1996 | 0.8402 | 1.6058 | -723.6984 | -551.8521 | 0.1112 | -0.7863 | 0.1136 | 0.9694 | 0.0306 | -1.5989 | -3.4179 | 0.8677 | 1.8190 | -789.3262 | -583.4747 | -0.1145 | -0.9516 | 0.3087 | 0.1414 | -1.5520 | -2.3972 | 0.7448 | 0.8452 | -481.3242 | -444.4588 | 1.0137 | -0.2527 | 0.5289 | 0.0768 |
| 0.1537 | 0.9993 | 720 | 0.3329 | -1.4289 | -3.2979 | 0.8609 | 1.8690 | -733.5210 | -535.3586 | 0.6830 | -0.5276 | 0.0943 | 0.9852 | 0.0148 | -1.4038 | -3.4887 | 0.8869 | 2.0849 | -796.4048 | -563.9600 | 0.3914 | -0.7372 | 0.2955 | 0.1278 | -1.4972 | -2.5982 | 0.7618 | 1.1009 | -501.4233 | -438.9818 | 1.8477 | 0.1514 | 0.4804 | 0.0530 |
| 0.0667 | 1.4990 | 1080 | 0.2667 | -2.1402 | -5.1839 | 0.8656 | 3.0437 | -922.1221 | -606.4852 | 1.0002 | -0.7884 | 0.0408 | 0.8954 | 0.1046 | -2.1729 | -5.7323 | 0.8964 | 3.5594 | -1020.7665 | -640.8754 | 0.8903 | -0.9784 | 0.2150 | 0.0521 | -1.9916 | -3.2293 | 0.7574 | 1.2377 | -564.5363 | -488.4239 | 1.5582 | -0.1961 | 0.4940 | 0.0466 |
| 0.06 | 1.9986 | 1440 | 0.2626 | -2.1843 | -5.4288 | 0.8684 | 3.2445 | -946.6157 | -610.9032 | 1.2318 | -0.7806 | 0.0374 | 0.8470 | 0.1530 | -2.2281 | -6.0246 | 0.8987 | 3.7965 | -1049.9939 | -646.3860 | 1.1158 | -0.9982 | 0.2102 | 0.0475 | -1.9978 | -3.3091 | 0.7603 | 1.3113 | -572.5212 | -489.0419 | 1.8243 | -0.1004 | 0.4944 | 0.0469 |
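
The Epoch and Step columns above can be used to back out the approximate number of optimizer steps per epoch and, combined with the total train batch size of 256, a rough training-set size. The sketch below does that and also picks the checkpoint with the lowest validation loss. The dataset size is only an estimate inferred from the log; the card does not state it.

```python
# (epoch, step) pairs and validation losses from the results table above.
checkpoints = [(0.4997, 360), (0.9993, 720), (1.4990, 1080), (1.9986, 1440)]
val_loss = {360: 0.3535, 720: 0.3329, 1080: 0.2667, 1440: 0.2626}
total_train_batch_size = 256  # from the hyperparameter list

# Optimizer steps per epoch implied by each logged row.
steps_per_epoch = [step / epoch for epoch, step in checkpoints]
mean_steps_per_epoch = sum(steps_per_epoch) / len(steps_per_epoch)

# Rough estimate only: steps per epoch times examples per optimizer step.
approx_train_examples = mean_steps_per_epoch * total_train_batch_size

# Checkpoint with the lowest validation loss.
best_step = min(val_loss, key=val_loss.get)

print(round(mean_steps_per_epoch, 1))
print(round(approx_train_examples))
print(best_step)
```

Validation loss decreases at every logged step, so the final checkpoint is also the best one by this criterion.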

Framework versions

  • Transformers 4.44.1
  • Pytorch 2.1.2+cu121
  • Datasets 2.21.0
  • Tokenizers 0.19.1