End of training

Files changed (4)

README.md +14 -14
model.safetensors +1 -1
runs/Oct22_06-57-56_225084e56351/events.out.tfevents.1729580289.225084e56351.24.0 +3 -0
training_args.bin +1 -1

README.md CHANGED

@@ -1,6 +1,6 @@
 ---
-base_model: HuggingFaceTB/SmolLM-135M-Instruct
 license: apache-2.0
 tags:
 - trl
 - orpo
@@ -17,18 +17,18 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [HuggingFaceTB/SmolLM-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM-135M-Instruct) on the HuggingFaceH4/ultrafeedback_binarized dataset.
 It achieves the following results on the evaluation set:
-- Loss: 1.1419
-- Rewards/chosen: -0.1302
-- Rewards/rejected: -0.1303
-- Rewards/accuracies: 0.4700
-- Rewards/margins: 0.0001
-- Logps/rejected: -1.3026
-- Logps/chosen: -1.3019
-- Logits/rejected: 27.7330
-- Logits/chosen: 27.3925
-- Nll Loss: 1.0665
-- Log Odds Ratio: -0.7538
-- Log Odds Chosen: 0.0140
 ## Model description
@@ -62,7 +62,7 @@ The following hyperparameters were used during training:
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Nll Loss | Log Odds Ratio | Log Odds Chosen |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:--------:|:--------------:|:---------------:|
-| 1.3553        | 0.8   | 100  | 1.1419          | -0.1302        | -0.1303          | 0.4700             | 0.0001          | -1.3026        | -1.3019      | 27.7330         | 27.3925       | 1.0665   | -0.7538        | 0.0140          |
 ### Framework versions

 ---
 license: apache-2.0
+base_model: HuggingFaceTB/SmolLM-135M-Instruct
 tags:
 - trl
 - orpo
 This model is a fine-tuned version of [HuggingFaceTB/SmolLM-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM-135M-Instruct) on the HuggingFaceH4/ultrafeedback_binarized dataset.
 It achieves the following results on the evaluation set:
+- Loss: 1.1429
+- Rewards/chosen: -0.1303
+- Rewards/rejected: -0.1304
+- Rewards/accuracies: 0.4670
+- Rewards/margins: 0.0000
+- Logps/rejected: -1.3036
+- Logps/chosen: -1.3032
+- Logits/rejected: 27.7664
+- Logits/chosen: 27.4331
+- Nll Loss: 1.0675
+- Log Odds Ratio: -0.7542
+- Log Odds Chosen: 0.0132
 ## Model description
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Nll Loss | Log Odds Ratio | Log Odds Chosen |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:--------:|:--------------:|:---------------:|
+| 1.3569        | 0.8   | 100  | 1.1429          | -0.1303        | -0.1304          | 0.4670             | 0.0000          | -1.3036        | -1.3032      | 27.7664         | 27.4331       | 1.0675   | -0.7542        | 0.0132          |
 ### Framework versions
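
Note on the metric changes above: in both the old and the new row, Rewards/chosen and Rewards/rejected look like the corresponding Logps values scaled by a constant, and Rewards/margins looks like their difference. The sketch below checks that reading. The scale factor `BETA = 0.1` is an assumption (the hyperparameter list is not part of this diff), and `implied_rewards` is a hypothetical helper, not a function from the training library.

```python
# Assumption: beta = 0.1. The diff does not show the hyperparameters, so this
# value is a guess that is checked against the reported numbers below.
BETA = 0.1


def implied_rewards(logps_chosen, logps_rejected, beta=BETA):
    """Return (reward_chosen, reward_rejected, margin), rounded to 4 decimals.

    Hypothetical helper: scales the mean log-probabilities by beta and takes
    the difference as the margin.
    """
    chosen = beta * logps_chosen
    rejected = beta * logps_rejected
    return round(chosen, 4), round(rejected, 4), round(chosen - rejected, 4)


# Logps values copied from the removed (-) and added (+) README rows.
before = implied_rewards(-1.3019, -1.3026)
after = implied_rewards(-1.3032, -1.3036)

print("before:", before)
print("after: ", after)
```

Under this assumption the computed triples agree with the reported Rewards/chosen, Rewards/rejected and Rewards/margins to four decimals for both rows, which suggests the margin moving from 0.0001 to 0.0000 is a rounding-level change rather than a real shift.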

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:fe61099913e4c4b899753f4423418f9313d933042995693bb102066bf3c08903
 size 269060280

 version https://git-lfs.github.com/spec/v1
+oid sha256:d28f27bd149ff10f54d84f23b8082ea3af7dea6883a4960604b05434de3d1ad5
 size 269060280

runs/Oct22_06-57-56_225084e56351/events.out.tfevents.1729580289.225084e56351.24.0 ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:e39d42b5a06058c2bdad05a483c081f5a60ff62e7bd779241a8342683e81839c
+size 7191

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:63219e39fb21b47efab9f4f6e7a3674fd413e8dbc80e33ff1cbe55c21a748044
 size 5304

 version https://git-lfs.github.com/spec/v1
+oid sha256:b1f1dfcf119634759017c77ba2e3ccd542e10cb200f8bb98deef083bae98401c
 size 5304
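
Note on the three binary files above: each diff shows only a Git LFS pointer (a `version` line, an `oid sha256:` line and a `size` line), so an unchanged `size` with a changed `oid` means the content differs while the byte length stays the same. A minimal sketch of reading such a pointer and checking a downloaded file against it follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and not part of any Git LFS tooling.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Return True if the bytes have the size and SHA-256 the pointer records."""
    if pointer["algo"] != "sha256":
        raise ValueError("unsupported hash algorithm: " + pointer["algo"])
    same_size = len(data) == pointer["size"]
    same_hash = hashlib.sha256(data).hexdigest() == pointer["digest"]
    return same_size and same_hash


# Pointer text copied from the added (+) side of training_args.bin.
NEW_TRAINING_ARGS_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:b1f1dfcf119634759017c77ba2e3ccd542e10cb200f8bb98deef083bae98401c
size 5304
"""

pointer = parse_lfs_pointer(NEW_TRAINING_ARGS_POINTER)
print(pointer["algo"], pointer["size"])
```

To verify a local copy, read the file in binary mode and pass its bytes to `matches_pointer` together with the parsed pointer.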