End of training
README.md
CHANGED
@@ -1,6 +1,6 @@
 ---
-base_model: HuggingFaceTB/SmolLM-135M-Instruct
 license: apache-2.0
+base_model: HuggingFaceTB/SmolLM-135M-Instruct
 tags:
 - trl
 - orpo
@@ -17,18 +17,18 @@ should probably proofread and complete it, then remove this comment. -->
 
 This model is a fine-tuned version of [HuggingFaceTB/SmolLM-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM-135M-Instruct) on the HuggingFaceH4/ultrafeedback_binarized dataset.
 It achieves the following results on the evaluation set:
-- Loss: 1.
-- Rewards/chosen: -0.
-- Rewards/rejected: -0.
-- Rewards/accuracies: 0.
-- Rewards/margins: 0.
-- Logps/rejected: -1.
-- Logps/chosen: -1.
-- Logits/rejected: 27.
-- Logits/chosen: 27.
-- Nll Loss: 1.
-- Log Odds Ratio: -0.
-- Log Odds Chosen: 0.
+- Loss: 1.1429
+- Rewards/chosen: -0.1303
+- Rewards/rejected: -0.1304
+- Rewards/accuracies: 0.4670
+- Rewards/margins: 0.0000
+- Logps/rejected: -1.3036
+- Logps/chosen: -1.3032
+- Logits/rejected: 27.7664
+- Logits/chosen: 27.4331
+- Nll Loss: 1.0675
+- Log Odds Ratio: -0.7542
+- Log Odds Chosen: 0.0132
 
 ## Model description
 
@@ -62,7 +62,7 @@ The following hyperparameters were used during training:
 
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Nll Loss | Log Odds Ratio | Log Odds Chosen |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:--------:|:--------------:|:---------------:|
-| 1.
+| 1.3569 | 0.8 | 100 | 1.1429 | -0.1303 | -0.1304 | 0.4670 | 0.0000 | -1.3036 | -1.3032 | 27.7664 | 27.4331 | 1.0675 | -0.7542 | 0.0132 |
 
 
 ### Framework versions
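The evaluation figures added to README.md can be cross-checked against one another. The sketch below assumes the usual ORPO formulation, in which the total loss is the NLL loss minus beta times the mean log-sigmoid of the log odds (the "Log Odds Ratio" metric), and each reward is beta times the corresponding log-probability. The value `beta = 0.1` is an assumption: the hyperparameter section is not part of this diff.

```python
# Consistency check on the reported evaluation metrics.
# ASSUMPTION: beta = 0.1 (not shown in this diff).
# ASSUMPTION: loss = nll_loss - beta * log_odds_ratio, reward = beta * logps.
beta = 0.1

nll_loss = 1.0675
log_odds_ratio = -0.7542
logps_chosen = -1.3032
logps_rejected = -1.3036

# Reconstruct the derived metrics from the raw ones.
loss = nll_loss - beta * log_odds_ratio
reward_chosen = beta * logps_chosen
reward_rejected = beta * logps_rejected
margin = reward_chosen - reward_rejected

print(f"loss            = {loss:.4f}")
print(f"reward_chosen   = {reward_chosen:.4f}")
print(f"reward_rejected = {reward_rejected:.4f}")
print(f"margin          = {margin:.4f}")
```

Under these assumptions the reconstructed values agree with the reported Loss, Rewards/chosen, Rewards/rejected, and Rewards/margins to four decimal places. A margin this close to zero, together with a Rewards/accuracies of 0.4670, suggests the model barely distinguishes chosen from rejected responses at step 100.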
model.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:d28f27bd149ff10f54d84f23b8082ea3af7dea6883a4960604b05434de3d1ad5
 size 269060280
runs/Oct22_06-57-56_225084e56351/events.out.tfevents.1729580289.225084e56351.24.0
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:e39d42b5a06058c2bdad05a483c081f5a60ff62e7bd779241a8342683e81839c
+size 7191
training_args.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:b1f1dfcf119634759017c77ba2e3ccd542e10cb200f8bb98deef083bae98401c
 size 5304
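The binary files in this commit are stored as Git LFS pointers: three `key value` lines giving the spec version, a `sha256` object ID, and the size in bytes. A downloaded file can be checked against its pointer roughly as follows. This is a minimal sketch; the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and not part of any Git LFS tooling.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "oid": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer: dict) -> bool:
    """Return True if the file's size and digest match the pointer."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as fh:
        # Hash in 1 MiB chunks so large weight files need not fit in memory.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["oid"]


# Pointer for training_args.bin, copied from the diff above.
pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:b1f1dfcf119634759017c77ba2e3ccd542e10cb200f8bb98deef083bae98401c\n"
    "size 5304\n"
)
```

Calling `matches_pointer("training_args.bin", pointer)` on a local copy would then report whether the download matches the committed object.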