phi-2-gpo-renew2-b0.001-i0

This model is a fine-tuned version of lole25/phi-2-sft-lora-ultrachat on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0367
  • Rewards/chosen: -0.0859
  • Rewards/rejected: -0.1297
  • Rewards/accuracies: 0.6335
  • Rewards/margins: 0.0439
  • Logps/rejected: -373.5459
  • Logps/chosen: -363.4243
  • Logits/rejected: 0.0915
  • Logits/chosen: 0.0487
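
The reward figures above can be cross-checked against each other. The short sketch below assumes the usual preference-optimization logging convention, in which the margin is the chosen reward minus the rejected reward; the card itself does not state this, so treat it as an assumption. The variable names are illustrative only.

```python
# Final evaluation metrics as reported in the list above.
rewards_chosen = -0.0859
rewards_rejected = -0.1297
reported_margin = 0.0439

# Assumption: margin = chosen reward - rejected reward.
computed_margin = rewards_chosen - rewards_rejected

# The reported values are rounded to four decimals, so allow a small tolerance.
assert abs(computed_margin - reported_margin) < 2e-4
print(f"computed margin: {computed_margin:.4f}")
```

The computed and reported margins agree to within rounding, which suggests the three reward metrics are mutually consistent.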

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
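
To make the learning-rate settings concrete, here is a minimal sketch of a cosine schedule with linear warmup, using the learning_rate and lr_scheduler_warmup_ratio listed above. It is a plain-Python approximation, not the code used for training. `lr_at_step` is a hypothetical helper, and `TOTAL_STEPS = 3800` is an assumption taken from the last step logged in the training results (the true total may be slightly higher).

```python
import math

def lr_at_step(step, total_steps, base_lr=5e-06, warmup_ratio=0.1):
    """Linear warmup to base_lr, then cosine decay to zero."""
    warmup_steps = round(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: scale linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: half-cosine from base_lr down to 0.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

TOTAL_STEPS = 3800  # assumption: last logged step, not a documented value

for step in (0, 190, 380, 2090, 3800):
    print(step, lr_at_step(step, TOTAL_STEPS))
```

Under these assumptions the rate peaks at 5e-06 once warmup ends (around step 380) and falls to half that value midway through the decay phase.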

Training results

| Training Loss | Epoch | Step | Logits/chosen | Logits/rejected | Logps/chosen | Logps/rejected | Validation Loss | Rewards/accuracies | Rewards/chosen | Rewards/margins | Rewards/rejected |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.066 | 0.03 | 100 | 0.9712 | 1.0637 | -277.5782 | -243.8812 | 0.0537 | 0.4725 | -0.0000 | 0.0000 | -0.0001 |
| 0.0611 | 0.05 | 200 | 0.9716 | 1.0643 | -277.2496 | -243.9921 | 0.0535 | 0.5780 | 0.0003 | 0.0005 | -0.0002 |
| 0.0609 | 0.08 | 300 | 0.9689 | 1.0636 | -276.0178 | -244.3336 | 0.0529 | 0.6165 | 0.0015 | 0.0020 | -0.0005 |
| 0.0513 | 0.1 | 400 | 0.8601 | 0.9583 | -280.6138 | -253.2858 | 0.0511 | 0.6150 | -0.0031 | 0.0064 | -0.0095 |
| 0.0501 | 0.13 | 500 | 0.4970 | 0.5770 | -306.8101 | -289.3190 | 0.0475 | 0.6050 | -0.0293 | 0.0162 | -0.0455 |
| 0.0508 | 0.16 | 600 | 0.2749 | 0.3282 | -321.4783 | -312.9566 | 0.0449 | 0.6055 | -0.0439 | 0.0252 | -0.0691 |
| 0.0421 | 0.18 | 700 | 0.2708 | 0.3240 | -327.6276 | -322.8759 | 0.0437 | 0.6055 | -0.0501 | 0.0290 | -0.0791 |
| 0.0437 | 0.21 | 800 | 0.3236 | 0.3805 | -324.3805 | -318.0196 | 0.0428 | 0.6005 | -0.0468 | 0.0274 | -0.0742 |
| 0.0387 | 0.24 | 900 | 0.1997 | 0.2503 | -337.8515 | -341.3827 | 0.0423 | 0.6055 | -0.0603 | 0.0373 | -0.0976 |
| 0.0469 | 0.26 | 1000 | 0.2683 | 0.3303 | -319.0327 | -318.2856 | 0.0410 | 0.6120 | -0.0415 | 0.0330 | -0.0745 |
| 0.0405 | 0.29 | 1100 | 0.3022 | 0.3569 | -337.9239 | -339.1555 | 0.0413 | 0.6065 | -0.0604 | 0.0350 | -0.0953 |
| 0.0532 | 0.31 | 1200 | 0.1261 | 0.1742 | -339.1231 | -347.9869 | 0.0414 | 0.6150 | -0.0616 | 0.0426 | -0.1042 |
| 0.0421 | 0.34 | 1300 | 0.2688 | 0.3279 | -313.6982 | -311.5635 | 0.0401 | 0.6240 | -0.0362 | 0.0316 | -0.0677 |
| 0.0454 | 0.37 | 1400 | 0.2034 | 0.2565 | -344.0237 | -346.2302 | 0.0401 | 0.6130 | -0.0665 | 0.0359 | -0.1024 |
| 0.03 | 0.39 | 1500 | 0.1958 | 0.2512 | -358.4021 | -367.0958 | 0.0394 | 0.6185 | -0.0809 | 0.0424 | -0.1233 |
| 0.0455 | 0.42 | 1600 | 0.2802 | 0.3432 | -330.3630 | -330.2539 | 0.0390 | 0.6220 | -0.0528 | 0.0336 | -0.0864 |
| 0.0444 | 0.44 | 1700 | 0.1433 | 0.1956 | -335.1629 | -339.5015 | 0.0383 | 0.6215 | -0.0576 | 0.0381 | -0.0957 |
| 0.0411 | 0.47 | 1800 | 0.0721 | 0.1143 | -363.9651 | -373.5191 | 0.0391 | 0.6165 | -0.0864 | 0.0433 | -0.1297 |
| 0.0486 | 0.5 | 1900 | 0.1298 | 0.1764 | -356.7109 | -364.1853 | 0.0382 | 0.6260 | -0.0792 | 0.0412 | -0.1204 |
| 0.0378 | 0.52 | 2000 | 0.0808 | 0.1294 | -341.7246 | -345.1359 | 0.0378 | 0.6290 | -0.0642 | 0.0371 | -0.1013 |
| 0.0316 | 0.55 | 2100 | 0.0245 | 0.0687 | -354.5952 | -362.2671 | 0.0375 | 0.6275 | -0.0770 | 0.0414 | -0.1185 |
| 0.0375 | 0.58 | 2200 | 0.0007 | 0.0391 | -360.0626 | -368.8188 | 0.0376 | 0.6280 | -0.0825 | 0.0425 | -0.1250 |
| 0.0344 | 0.6 | 2300 | 0.0554 | 0.1002 | -348.0063 | -351.9891 | 0.0376 | 0.6315 | -0.0705 | 0.0377 | -0.1082 |
| 0.0393 | 0.63 | 2400 | -0.0271 | 0.0124 | -361.4958 | -368.2057 | 0.0374 | 0.6330 | -0.0839 | 0.0404 | -0.1244 |
| 0.0501 | 0.65 | 2500 | -0.0307 | 0.0053 | -374.5688 | -385.8456 | 0.0373 | 0.6265 | -0.0970 | 0.0450 | -0.1420 |
| 0.03 | 0.68 | 2600 | -0.0064 | 0.0325 | -372.3464 | -384.5748 | 0.0372 | 0.6280 | -0.0948 | 0.0460 | -0.1408 |
| 0.0445 | 0.71 | 2700 | -0.0008 | 0.0394 | -370.2887 | -381.6031 | 0.0372 | 0.6255 | -0.0927 | 0.0450 | -0.1378 |
| 0.0359 | 0.73 | 2800 | 0.0476 | 0.0926 | -359.7133 | -368.1677 | 0.0369 | 0.6375 | -0.0822 | 0.0422 | -0.1244 |
| 0.0454 | 0.76 | 2900 | 0.0362 | 0.0788 | -363.6591 | -374.6195 | 0.0368 | 0.6340 | -0.0861 | 0.0447 | -0.1308 |
| 0.0422 | 0.79 | 3000 | 0.0354 | 0.0778 | -364.7430 | -375.5086 | 0.0368 | 0.6350 | -0.0872 | 0.0445 | -0.1317 |
| 0.0401 | 0.81 | 3100 | 0.0345 | 0.0778 | -361.9238 | -372.1985 | 0.0368 | 0.6350 | -0.0844 | 0.0440 | -0.1284 |
| 0.0455 | 0.84 | 3200 | 0.0436 | 0.0871 | -361.7043 | -371.3240 | 0.0368 | 0.6335 | -0.0842 | 0.0434 | -0.1275 |
| 0.0537 | 0.86 | 3300 | 0.0492 | 0.0936 | -359.5146 | -368.5755 | 0.0368 | 0.6350 | -0.0820 | 0.0428 | -0.1248 |
| 0.0415 | 0.89 | 3400 | 0.0492 | 0.0925 | -362.0815 | -371.9387 | 0.0367 | 0.6365 | -0.0845 | 0.0436 | -0.1281 |
| 0.0399 | 0.92 | 3500 | 0.0507 | 0.0937 | -362.8265 | -372.8227 | 0.0367 | 0.6325 | -0.0853 | 0.0437 | -0.1290 |
| 0.0386 | 0.94 | 3600 | 0.0479 | 0.0909 | -363.0746 | -373.1803 | 0.0367 | 0.6330 | -0.0855 | 0.0438 | -0.1294 |
| 0.0372 | 0.97 | 3700 | 0.0480 | 0.0910 | -363.4134 | -373.5262 | 0.0367 | 0.6375 | -0.0859 | 0.0438 | -0.1297 |
| 0.033 | 0.99 | 3800 | 0.0481 | 0.0911 | -363.3738 | -373.5426 | 0.0367 | 0.6325 | -0.0858 | 0.0439 | -0.1297 |
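
The logged rewards and log-probabilities in the training results can be related to each other under two assumptions that this card does not confirm: that rewards follow the DPO-style definition, beta times the difference between policy and reference log-probabilities, and that "b0.001" in the model name denotes beta = 0.001. The sketch below uses the step-100 log-probabilities as a stand-in for the reference model, since the rewards at that step are approximately zero. `implied_reward` is a hypothetical helper introduced only for this check.

```python
BETA = 0.001  # assumption: inferred from "b0.001" in the model name

# Values copied from the training results for steps 100 and 3800.
# Step 100 approximates the reference model because its rewards are ~0.
ref_logps_chosen, ref_logps_rejected = -277.5782, -243.8812
logps_chosen, logps_rejected = -363.3738, -373.5426
reported_chosen, reported_rejected = -0.0858, -0.1297

def implied_reward(logp, ref_logp, beta=BETA):
    """DPO-style implicit reward: beta * (policy logp - reference logp)."""
    return beta * (logp - ref_logp)

chosen = implied_reward(logps_chosen, ref_logps_chosen)
rejected = implied_reward(logps_rejected, ref_logps_rejected)

print(f"implied chosen reward:   {chosen:.4f} (reported {reported_chosen})")
print(f"implied rejected reward: {rejected:.4f} (reported {reported_rejected})")
```

If both assumptions hold, the implied rewards should land close to the reported ones. A large gap would indicate that either the reward definition or the value of beta is different from what is assumed here.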

Framework versions

  • PEFT 0.7.1
  • Transformers 4.36.2
  • Pytorch 2.1.2
  • Datasets 2.14.6
  • Tokenizers 0.15.2

Base model

microsoft/phi-2 (DUAL-GPO/phi-2-gpo-renew2-b0.001-i0 is published as an adapter for this base model)