micost committed
Commit 87a68cd · verified · 1 Parent(s): b1fb271

End of training
README.md CHANGED
@@ -1,6 +1,6 @@
  ---
- base_model: HuggingFaceTB/SmolLM-135M-Instruct
  license: apache-2.0
+ base_model: HuggingFaceTB/SmolLM-135M-Instruct
  tags:
  - trl
  - orpo
@@ -17,18 +17,18 @@ should probably proofread and complete it, then remove this comment. -->

  This model is a fine-tuned version of [HuggingFaceTB/SmolLM-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM-135M-Instruct) on the HuggingFaceH4/ultrafeedback_binarized dataset.
  It achieves the following results on the evaluation set:
- - Loss: 1.1419
- - Rewards/chosen: -0.1302
- - Rewards/rejected: -0.1303
- - Rewards/accuracies: 0.4700
- - Rewards/margins: 0.0001
- - Logps/rejected: -1.3026
- - Logps/chosen: -1.3019
- - Logits/rejected: 27.7330
- - Logits/chosen: 27.3925
- - Nll Loss: 1.0665
- - Log Odds Ratio: -0.7538
- - Log Odds Chosen: 0.0140
+ - Loss: 1.1429
+ - Rewards/chosen: -0.1303
+ - Rewards/rejected: -0.1304
+ - Rewards/accuracies: 0.4670
+ - Rewards/margins: 0.0000
+ - Logps/rejected: -1.3036
+ - Logps/chosen: -1.3032
+ - Logits/rejected: 27.7664
+ - Logits/chosen: 27.4331
+ - Nll Loss: 1.0675
+ - Log Odds Ratio: -0.7542
+ - Log Odds Chosen: 0.0132

  ## Model description

@@ -62,7 +62,7 @@ The following hyperparameters were used during training:

  | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Nll Loss | Log Odds Ratio | Log Odds Chosen |
  |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:--------:|:--------------:|:---------------:|
- | 1.3553 | 0.8 | 100 | 1.1419 | -0.1302 | -0.1303 | 0.4700 | 0.0001 | -1.3026 | -1.3019 | 27.7330 | 27.3925 | 1.0665 | -0.7538 | 0.0140 |
+ | 1.3569 | 0.8 | 100 | 1.1429 | -0.1303 | -0.1304 | 0.4670 | 0.0000 | -1.3036 | -1.3032 | 27.7664 | 27.4331 | 1.0675 | -0.7542 | 0.0132 |


  ### Framework versions
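
The evaluation metrics in the README.md diff appear to be tied together by a few simple relations, and a short check makes that easier to see. The sketch below is not taken from the training code. It assumes an ORPO-style objective in which each reward is `beta` times the matching log-probability, the margin is the chosen reward minus the rejected reward, and the loss is the NLL loss minus `beta` times the logged log odds ratio. The value `BETA = 0.1` is an assumption, since the hyperparameter is not shown in this diff, and the dictionary keys and the `residuals` helper are illustrative names.

```python
# Metrics copied from the README.md diff: "old" is the removed side, "new" the added side.
old = {
    "loss": 1.1419, "nll_loss": 1.0665, "log_odds_ratio": -0.7538,
    "rewards_chosen": -0.1302, "rewards_rejected": -0.1303, "rewards_margins": 0.0001,
    "logps_chosen": -1.3019, "logps_rejected": -1.3026,
}
new = {
    "loss": 1.1429, "nll_loss": 1.0675, "log_odds_ratio": -0.7542,
    "rewards_chosen": -0.1303, "rewards_rejected": -0.1304, "rewards_margins": 0.0000,
    "logps_chosen": -1.3032, "logps_rejected": -1.3036,
}

BETA = 0.1  # assumption: the ORPO beta is not stated in this diff


def residuals(m, beta=BETA):
    """Return reported value minus the value implied by the assumed relations."""
    return {
        "rewards_chosen": m["rewards_chosen"] - beta * m["logps_chosen"],
        "rewards_rejected": m["rewards_rejected"] - beta * m["logps_rejected"],
        "rewards_margins": m["rewards_margins"]
        - (m["rewards_chosen"] - m["rewards_rejected"]),
        "loss": m["loss"] - (m["nll_loss"] - beta * m["log_odds_ratio"]),
    }


for name, metrics in (("old", old), ("new", new)):
    for key, value in residuals(metrics).items():
        print(f"{name:>3} {key:<17} residual = {value:+.5f}")
```

Under these assumptions every residual stays within the four-decimal rounding of the reported values, for both sides of the diff. The reward margin near zero and the accuracy just under 0.5 indicate that, at step 100, the model barely separates chosen from rejected responses on the evaluation set.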
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:fe61099913e4c4b899753f4423418f9313d933042995693bb102066bf3c08903
+ oid sha256:d28f27bd149ff10f54d84f23b8082ea3af7dea6883a4960604b05434de3d1ad5
  size 269060280
runs/Oct22_06-57-56_225084e56351/events.out.tfevents.1729580289.225084e56351.24.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e39d42b5a06058c2bdad05a483c081f5a60ff62e7bd779241a8342683e81839c
+ size 7191
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:63219e39fb21b47efab9f4f6e7a3674fd413e8dbc80e33ff1cbe55c21a748044
+ oid sha256:b1f1dfcf119634759017c77ba2e3ccd542e10cb200f8bb98deef083bae98401c
  size 5304
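
The binary files in this commit are stored as Git LFS pointers, so each diff only shows a `version` line, an `oid` line, and a `size` line. As a minimal sketch, the snippet below parses the two `model.safetensors` pointers shown above and compares them. The helper name `parse_lfs_pointer` is hypothetical and the parser handles only the three keys that appear in this diff.

```python
def parse_lfs_pointer(text):
    """Parse the key/value lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


# Pointer contents copied from the model.safetensors diff in this commit.
old_pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:fe61099913e4c4b899753f4423418f9313d933042995693bb102066bf3c08903
size 269060280
"""
new_pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:d28f27bd149ff10f54d84f23b8082ea3af7dea6883a4960604b05434de3d1ad5
size 269060280
"""

old_ptr = parse_lfs_pointer(old_pointer)
new_ptr = parse_lfs_pointer(new_pointer)

content_changed = old_ptr["digest"] != new_ptr["digest"]
size_unchanged = old_ptr["size"] == new_ptr["size"]
print(f"content changed: {content_changed}, size unchanged: {size_unchanged}")
```

A changed digest with an identical byte size is what one would expect when the weights are retrained without changing the architecture or dtype, although the pointer alone cannot confirm that.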