metadata
base_model: unsloth/gemma-2-2b-it-bnb-4bit
language:
  - en
license: apache-2.0
tags:
  - text-generation-inference
  - transformers
  - unsloth
  - gemma2
  - trl
  - dpo

Uploaded model

  • Developed by: SameedHussain
  • License: apache-2.0
  • Finetuned from model: unsloth/gemma-2-2b-it-bnb-4bit

This Gemma 2 model was trained 2x faster with Unsloth and Hugging Face's TRL library.

| Step | Training Loss | Rewards / Chosen | Rewards / Rejected | Rewards / Accuracies | Rewards / Margins | Logps / Rejected | Logps / Chosen | Logits / Rejected | Logits / Chosen |
|---|---|---|---|---|---|---|---|---|---|
| 100 | 0.454700 | 6.241566 | 3.175092 | 0.750000 | 3.066474 | -102.758446 | -53.181263 | -14.580903 | -14.938275 |
| 200 | 0.264100 | 6.640531 | 2.823826 | 0.888750 | 3.816705 | -110.525520 | -50.815018 | -14.796252 | -15.198202 |
| 300 | 0.110200 | 6.310797 | 1.718347 | 0.985000 | 4.592450 | -118.720840 | -48.524315 | -15.263680 | -15.698647 |
| 400 | 0.046900 | 6.744057 | 0.677384 | 0.997500 | 6.066672 | -128.757660 | -48.107479 | -15.710546 | -16.174524 |
| 500 | 0.019700 | 6.714230 | -0.529035 | 1.000000 | 7.243264 | -143.408020 | -49.327625 | -16.120342 | -16.611662 |
| 600 | 0.013700 | 6.605389 | -1.275738 | 1.000000 | 7.881127 | -146.968491 | -48.847641 | -16.320650 | -16.836390 |
| 700 | 0.007900 | 6.333577 | -2.010140 | 1.000000 | 8.343716 | -154.255066 | -50.590134 | -16.486574 | -16.987421 |
| 800 | 0.006300 | 6.489099 | -2.076626 | 1.000000 | 8.565723 | -150.381393 | -49.992256 | -16.614525 | -17.117744 |
| 900 | 0.005100 | 6.429256 | -2.340122 | 1.000000 | 8.769380 | -160.874405 | -51.164425 | -16.687891 | -17.165791 |
| 1000 | 0.004700 | 6.494193 | -2.520164 | 1.000000 | 9.014358 | -163.852982 | -54.317467 | -16.757954 | -17.206339 |
| 1100 | 0.005900 | 6.287598 | -2.524287 | 1.000000 | 8.811884 | -161.473770 | -52.012741 | -16.825716 | -17.266563 |
| 1200 | 0.005200 | 6.246828 | -3.126722 | 0.998750 | 9.373549 | -167.766861 | -52.052780 | -16.795412 | -17.277397 |
| 1300 | 0.004300 | 6.347938 | -2.930621 | 1.000000 | 9.278559 | -165.971939 | -50.738480 | -16.836918 | -17.304783 |
| 1400 | 0.003900 | 6.232501 | -3.073614 | 1.000000 | 9.306114 | -165.787643 | -50.953049 | -16.813383 | -17.290031 |
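
As a reading aid for the logged metrics, the sketch below checks that Rewards / Margins equals Rewards / Chosen minus Rewards / Rejected for a few rows copied from the log. It also evaluates the sigmoid DPO loss for a single preference pair at a given margin. The sigmoid loss type is an assumption (this card does not state the loss type or beta used), and the helper names `margin` and `pair_loss` are illustrative, not part of TRL. The logged Training Loss is an average over many pairs, so the per-pair value computed here is not expected to reproduce it.

```python
import math

# (step, rewards/chosen, rewards/rejected, rewards/margins) copied from the training log
ROWS = [
    (100, 6.241566, 3.175092, 3.066474),
    (500, 6.714230, -0.529035, 7.243264),
    (1400, 6.232501, -3.073614, 9.306114),
]


def margin(chosen: float, rejected: float) -> float:
    """Reward margin: how much more the policy favors the chosen response."""
    return chosen - rejected


def pair_loss(chosen: float, rejected: float) -> float:
    """Sigmoid DPO loss for ONE pair, -log(sigmoid(margin)), computed stably.

    Assumption: sigmoid loss type; the rewards are taken as already scaled by beta.
    """
    m = margin(chosen, rejected)
    if m >= 0:
        return math.log1p(math.exp(-m))
    return -m + math.log1p(math.exp(m))


for step, chosen, rejected, reported in ROWS:
    m = margin(chosen, rejected)
    matches = abs(m - reported) < 1e-5  # tolerance for 6-decimal rounding in the log
    print(step, round(m, 6), matches, round(pair_loss(chosen, rejected), 4))
```

The growing margin comes mostly from Rewards / Rejected falling (from about 3.18 to about -3.07) while Rewards / Chosen stays near 6.2 to 6.7 throughout the run.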