pythia-410m-deduped

This model is a fine-tuned version of EleutherAI/pythia-410m-deduped on the princeton-nlp/llama3-ultrafeedback dataset. It achieves the following results on the evaluation set (a loading sketch follows the metrics):

  • Loss: 1.6928
  • Original Losses: 1.7344
  • Weight: 1.0
  • Abs Diff: 0.3008
  • Rewards/chosen: -5.4375
  • Rewards/rejected: -5.4688
  • Rewards/accuracies: 0.4758
  • Rewards/margins: 0.0228
  • Logps/rejected: -2.1875
  • Logps/chosen: -2.1719
  • Logits/rejected: 5.7188
  • Logits/chosen: 5.7188
  • All Logps 1: -811.2697
  • All Logps 1 Values: -811.2697
  • All Logps 2: 447.4254
  • All Logps 2 Values: 447.4254
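
A minimal loading sketch for reference: the repository id RAY2L/pythia-410m-deduped-SimPOW-1 and the BF16 weights come from the model page, while the prompt and generation settings below are purely illustrative.

```python
# Minimal loading sketch; the prompt and generation settings are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RAY2L/pythia-410m-deduped-SimPOW-1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```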

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 1e-06
  • train_batch_size: 2
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 128
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
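
As a rough sketch only, the listed values map onto a transformers TrainingArguments object as shown below; the actual trainer and training script for this run are not documented in this card, so the class choice and output path are assumptions.

```python
# Hypothetical mapping of the listed hyperparameters onto transformers.TrainingArguments;
# the real training script for this run is not shown in this card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="pythia-410m-deduped-simpow",  # illustrative output path
    learning_rate=1e-6,
    per_device_train_batch_size=2,            # train_batch_size
    per_device_eval_batch_size=4,             # eval_batch_size
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,                                # published weights are BF16
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
# Effective batch size: 2 per device x 8 GPUs x 8 accumulation steps = 128.
```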

Training results

| Training Loss | Epoch | Step | Validation Loss | Original Losses | Weight | Abs Diff | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | All Logps 1 | All Logps 1 Values | All Logps 2 | All Logps 2 Values |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.9439 | 0.0427 | 20 | 1.7861 | 1.8125 | 1.0 | 0.3574 | -4.9688 | -5.0 | 0.4556 | 0.0187 | -1.9922 | -1.9844 | 5.1875 | 5.2188 | -694.3344 | -694.3344 | 447.4254 | 447.4254 |
| 1.8637 | 0.0855 | 40 | 1.7850 | 1.8125 | 1.0 | 0.3574 | -4.9688 | -4.9688 | 0.4637 | 0.0112 | -1.9922 | -1.9844 | 5.1875 | 5.25 | -694.3014 | -694.3014 | 447.4254 | 447.4254 |
| 1.8856 | 0.1282 | 60 | 1.7741 | 1.8125 | 1.0 | 0.3496 | -4.9375 | -4.9375 | 0.4435 | -0.0004 | -1.9766 | -1.9766 | 5.2188 | 5.25 | -695.6515 | -695.6515 | 447.4254 | 447.4254 |
| 1.8193 | 0.1710 | 80 | 1.7628 | 1.8047 | 1.0 | 0.3477 | -4.9375 | -4.9375 | 0.4637 | 0.0016 | -1.9844 | -1.9766 | 5.3125 | 5.3438 | -699.6716 | -699.6716 | 447.4254 | 447.4254 |
| 1.8542 | 0.2137 | 100 | 1.7501 | 1.7891 | 1.0 | 0.3340 | -4.9375 | -4.9688 | 0.4758 | 0.0138 | -1.9844 | -1.9766 | 5.4062 | 5.4375 | -707.3261 | -707.3261 | 447.4254 | 447.4254 |
| 1.7907 | 0.2565 | 120 | 1.7458 | 1.7891 | 1.0 | 0.3301 | -5.0 | -4.9688 | 0.4315 | -0.0052 | -1.9922 | -1.9922 | 5.4688 | 5.5 | -714.8251 | -714.8251 | 447.4254 | 447.4254 |
| 1.8332 | 0.2992 | 140 | 1.7375 | 1.7969 | 1.0 | 0.3281 | -5.0312 | -5.0 | 0.4637 | -0.0200 | -2.0 | -2.0156 | 5.5312 | 5.5625 | -723.8403 | -723.8403 | 447.4254 | 447.4254 |
| 1.7599 | 0.3420 | 160 | 1.7328 | 1.7969 | 1.0 | 0.3301 | -5.0938 | -5.0625 | 0.4355 | -0.0156 | -2.0312 | -2.0312 | 5.5625 | 5.5938 | -734.5149 | -734.5149 | 447.4254 | 447.4254 |
| 1.8462 | 0.3847 | 180 | 1.7246 | 1.7734 | 1.0 | 0.3184 | -5.125 | -5.125 | 0.4516 | -0.0015 | -2.0469 | -2.0469 | 5.5625 | 5.5938 | -745.0103 | -745.0103 | 447.4254 | 447.4254 |
| 1.8253 | 0.4275 | 200 | 1.7154 | 1.7656 | 1.0 | 0.3145 | -5.1562 | -5.1875 | 0.4476 | 0.0043 | -2.0625 | -2.0625 | 5.5625 | 5.5938 | -755.3181 | -755.3181 | 447.4254 | 447.4254 |
| 1.8056 | 0.4702 | 220 | 1.7119 | 1.7734 | 1.0 | 0.3203 | -5.2188 | -5.2188 | 0.4476 | 0.0032 | -2.0938 | -2.0938 | 5.5938 | 5.625 | -762.7902 | -762.7902 | 447.4254 | 447.4254 |
| 1.7958 | 0.5130 | 240 | 1.7096 | 1.7734 | 1.0 | 0.3164 | -5.25 | -5.25 | 0.4556 | -0.0002 | -2.1094 | -2.1094 | 5.5938 | 5.625 | -770.9695 | -770.9695 | 447.4254 | 447.4254 |
| 1.7141 | 0.5557 | 260 | 1.7073 | 1.7578 | 1.0 | 0.3086 | -5.2812 | -5.2812 | 0.4355 | 0.0052 | -2.1094 | -2.1094 | 5.625 | 5.625 | -775.2407 | -775.2407 | 447.4254 | 447.4254 |
| 1.7021 | 0.5985 | 280 | 1.7085 | 1.7656 | 1.0 | 0.3125 | -5.2812 | -5.2812 | 0.4597 | -0.0014 | -2.1094 | -2.1094 | 5.625 | 5.6562 | -778.4560 | -778.4560 | 447.4254 | 447.4254 |
| 1.7788 | 0.6412 | 300 | 1.7020 | 1.7578 | 1.0 | 0.3066 | -5.3125 | -5.3125 | 0.4677 | 0.0104 | -2.125 | -2.125 | 5.6562 | 5.6875 | -784.0049 | -784.0049 | 447.4254 | 447.4254 |
| 1.679 | 0.6839 | 320 | 1.7053 | 1.7578 | 1.0 | 0.3105 | -5.3438 | -5.3438 | 0.4476 | 0.0002 | -2.1406 | -2.1406 | 5.6562 | 5.6875 | -791.0703 | -791.0703 | 447.4254 | 447.4254 |
| 1.751 | 0.7267 | 340 | 1.7006 | 1.7578 | 1.0 | 0.3105 | -5.375 | -5.4062 | 0.4919 | 0.0085 | -2.1562 | -2.1562 | 5.6562 | 5.6875 | -797.0882 | -797.0882 | 447.4254 | 447.4254 |
| 1.7191 | 0.7694 | 360 | 1.6990 | 1.7656 | 1.0 | 0.3086 | -5.4375 | -5.4062 | 0.4476 | -0.0044 | -2.1719 | -2.1719 | 5.6875 | 5.6875 | -803.0909 | -803.0909 | 447.4254 | 447.4254 |
| 1.7226 | 0.8122 | 380 | 1.6993 | 1.7578 | 1.0 | 0.3086 | -5.4375 | -5.4375 | 0.4758 | 0.0093 | -2.1719 | -2.1719 | 5.6875 | 5.7188 | -806.9357 | -806.9357 | 447.4254 | 447.4254 |
| 1.7198 | 0.8549 | 400 | 1.6968 | 1.7578 | 1.0 | 0.3066 | -5.4688 | -5.4688 | 0.4556 | 0.0020 | -2.1875 | -2.1875 | 5.6875 | 5.7188 | -810.5368 | -810.5368 | 447.4254 | 447.4254 |
| 1.7057 | 0.8977 | 420 | 1.6963 | 1.75 | 1.0 | 0.3047 | -5.4688 | -5.4688 | 0.4718 | 0.0151 | -2.1875 | -2.1875 | 5.6875 | 5.7188 | -811.7772 | -811.7772 | 447.4254 | 447.4254 |
| 1.75 | 0.9404 | 440 | 1.6973 | 1.7578 | 1.0 | 0.3086 | -5.4688 | -5.4688 | 0.4677 | 0.0077 | -2.1875 | -2.1875 | 5.6875 | 5.7188 | -811.8970 | -811.8970 | 447.4254 | 447.4254 |
| 1.6912 | 0.9832 | 460 | 1.6928 | 1.7344 | 1.0 | 0.3008 | -5.4375 | -5.4688 | 0.4758 | 0.0228 | -2.1875 | -2.1719 | 5.7188 | 5.7188 | -811.2697 | -811.2697 | 447.4254 | 447.4254 |

Framework versions

  • Transformers 4.42.3
  • Pytorch 2.2.2+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
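
To confirm that a local environment matches these versions, a quick check might look like the following (package names correspond to the list above):

```python
# Sanity-check that installed packages match the versions listed above.
import datasets
import tokenizers
import torch
import transformers

print(transformers.__version__)  # expected 4.42.3
print(torch.__version__)         # expected 2.2.2+cu121
print(datasets.__version__)      # expected 2.20.0
print(tokenizers.__version__)    # expected 0.19.1
```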