
phi-2-apo

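This model is a fine-tuned version of rasyosef/phi-2-sft-openhermes-128k-v2-merged; the training dataset is not recorded in this card. It achieves the following results on the evaluation set (the values match the final logged checkpoint, step 2250, in the training results table):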

  • Loss: 0.3695
  • Rewards/chosen: 1.5931
  • Rewards/rejected: -3.0842
  • Rewards/accuracies: 0.9350
  • Rewards/margins: 4.6772
  • Logps/rejected: -173.1941
  • Logps/chosen: -253.6105
  • Logits/rejected: -0.4322
  • Logits/chosen: 0.1424
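
As a quick consistency check on the list above, the reported reward margin can be recomputed from the two reward values. This is a minimal sketch that assumes the margin is defined as the mean chosen reward minus the mean rejected reward, which is how preference-optimization trainers commonly log it; the card itself does not state the definition.

```python
# Evaluation metrics copied from the list above.
rewards_chosen = 1.5931
rewards_rejected = -3.0842
reported_margin = 4.6772

# Assumption: margin = chosen reward - rejected reward.
margin = rewards_chosen - rewards_rejected

print(f"recomputed margin: {margin:.4f}")
print(f"reported margin:   {reported_margin:.4f}")

# The inputs are rounded to four decimals, so allow a small tolerance.
assert abs(margin - reported_margin) < 1e-3
```

Under that assumption the recomputed and reported margins agree to within rounding of the four-decimal inputs.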

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-06
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • num_epochs: 2
  • mixed_precision_training: Native AMP
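
The relationship between the batch-size settings and the shape of the cosine schedule can be sketched in plain Python. Several things here are assumptions rather than facts from the card: a single training device, no warmup steps, decay to zero, and a total of 2450 optimizer steps (about 1225 steps per epoch, inferred from the Step and Epoch columns of the training results table, times 2 epochs). The helper name `cosine_lr` is hypothetical.

```python
import math

# Values copied from the hyperparameter list above.
learning_rate = 1e-06
train_batch_size = 1
gradient_accumulation_steps = 8

# Assumption: one device, so the effective batch is
# per-device batch size * accumulation steps * device count.
num_devices = 1
effective_batch = train_batch_size * gradient_accumulation_steps * num_devices

# Assumption: roughly 1225 optimizer steps per epoch for 2 epochs.
total_steps = 2450


def cosine_lr(step, total, peak_lr=learning_rate):
    """Cosine decay from peak_lr to 0 with no warmup (assumed)."""
    progress = min(max(step / total, 0.0), 1.0)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print("effective batch size:", effective_batch)
for step in (0, 1225, 2250, 2450):
    print(step, f"{cosine_lr(step, total_steps):.3e}")
```

With these assumptions the effective batch size matches the listed total_train_batch_size of 8, and the learning rate falls from 1e-06 at step 0 to about half that value at the midpoint of training.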

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| 0.3669 | 0.2041 | 250 | 0.3828 | 1.5010 | -2.9712 | 0.9450 | 4.4722 | -172.0644 | -254.5310 | -0.4930 | 0.0860 |
| 0.3514 | 0.4082 | 500 | 0.3786 | 1.5375 | -2.9788 | 0.9400 | 4.5163 | -172.1404 | -254.1665 | -0.4834 | 0.0968 |
| 0.3539 | 0.6122 | 750 | 0.3756 | 1.5549 | -3.0097 | 0.9400 | 4.5647 | -172.4500 | -253.9920 | -0.4690 | 0.1096 |
| 0.3562 | 0.8163 | 1000 | 0.3736 | 1.5759 | -3.0081 | 0.9450 | 4.5840 | -172.4332 | -253.7824 | -0.4558 | 0.1220 |
| 0.3437 | 1.0204 | 1250 | 0.3720 | 1.5665 | -3.0805 | 0.9350 | 4.6470 | -173.1577 | -253.8766 | -0.4445 | 0.1325 |
| 0.3503 | 1.2245 | 1500 | 0.3710 | 1.5889 | -3.0515 | 0.9400 | 4.6404 | -172.8680 | -253.6525 | -0.4406 | 0.1347 |
| 0.3427 | 1.4286 | 1750 | 0.3697 | 1.5903 | -3.0719 | 0.9450 | 4.6622 | -173.0719 | -253.6384 | -0.4355 | 0.1387 |
| 0.3353 | 1.6327 | 2000 | 0.3699 | 1.5881 | -3.0875 | 0.9400 | 4.6756 | -173.2272 | -253.6602 | -0.4333 | 0.1412 |
| 0.3441 | 1.8367 | 2250 | 0.3695 | 1.5931 | -3.0842 | 0.9350 | 4.6772 | -173.1941 | -253.6105 | -0.4322 | 0.1424 |
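
The Step and Epoch columns make it possible to estimate how many optimizer steps one epoch took and, from that, the approximate size of the training set, which the card does not report. The sketch below assumes a single device and that every optimizer step consumed a full batch of 8 examples, so the result is an estimate and not a documented figure.

```python
# (step, epoch) pairs copied from the training results table.
log_points = [
    (250, 0.2041), (500, 0.4082), (750, 0.6122),
    (1000, 0.8163), (1250, 1.0204), (1500, 1.2245),
    (1750, 1.4286), (2000, 1.6327), (2250, 1.8367),
]
total_train_batch_size = 8  # from the hyperparameter list

# Each row gives an independent estimate of steps per epoch.
estimates = [step / epoch for step, epoch in log_points]
steps_per_epoch = round(sum(estimates) / len(estimates))

# Assumption: every step used a full batch on a single device.
approx_train_examples = steps_per_epoch * total_train_batch_size

print("estimated steps per epoch:", steps_per_epoch)
print("approximate training examples:", approx_train_examples)
```

Every row yields nearly the same ratio, about 1225 steps per epoch, which suggests a training set of roughly 9,800 preference pairs under the stated assumptions.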

Framework versions

  • PEFT 0.12.0
  • Transformers 4.44.2
  • Pytorch 2.4.1+cu121
  • Datasets 3.0.0
  • Tokenizers 0.19.1
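
Because PEFT appears among the framework versions, the weights are presumably published as an adapter to be applied on top of the base model. The following is a minimal loading sketch under that assumption; the repository ID `rasyosef/phi-2-apo` is taken from the model page, the function name `load_phi2_apo` is hypothetical, and the prompt format the model expects is not documented in this card.

```python
BASE_MODEL_ID = "rasyosef/phi-2-sft-openhermes-128k-v2-merged"  # base model named in the card
ADAPTER_ID = "rasyosef/phi-2-apo"  # assumed to hold a PEFT adapter


def load_phi2_apo():
    """Load the base model and attach the adapter (downloads weights)."""
    # Imported lazily so this module can be read without the libraries installed.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
    base_model = AutoModelForCausalLM.from_pretrained(BASE_MODEL_ID)
    # Wrap the base model with the fine-tuned adapter weights.
    model = PeftModel.from_pretrained(base_model, ADAPTER_ID)
    return tokenizer, model


# Example usage (not run here because it downloads several GB of weights):
# tokenizer, model = load_phi2_apo()
# inputs = tokenizer("Hello", return_tensors="pt")
# output_ids = model.generate(**inputs, max_new_tokens=32)
# print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Installing versions close to those listed above is the safest way to reproduce the author's environment, since adapter loading can be sensitive to PEFT and Transformers releases.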

Model tree for rasyosef/phi-2-apo: adapter (this model)