Fine-tuning
- This model was trained to classify whether an input text is a "chosen sentence" or a "rejected sentence".
- The probability from the final layer (the logits passed through a softmax function) can be used to quantify how preferable a user's input is.
- Fine-tuned from studio-ousia/mluke-large-lite with full-parameter tuning on open-preference-v0.3.
- Trained in bf16 precision.
- Label 0 stands for the rejected sentence.
- Label 1 stands for the chosen sentence.
- Note that this model can handle at most 512 tokens.
- This limit comes from the LUKE-based pre-trained model.
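
The scoring described above can be sketched as follows. This is a minimal illustration, not the author's reference code: `preference_probability` and `score_text` are hypothetical helper names, and `model_id` is a placeholder parameter because this card does not state the repository id. It assumes the checkpoint loads through the standard `transformers` auto classes and that the logits are ordered `[rejected, chosen]`, matching labels 0 and 1.

```python
import math
from typing import Sequence


def preference_probability(logits: Sequence[float]) -> float:
    """Apply a softmax to [rejected, chosen] logits and return P(label 1)."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    return exps[1] / sum(exps)


def score_text(text: str, model_id: str) -> float:
    """Score one text with a fine-tuned checkpoint (model_id is a placeholder)."""
    # Imported lazily so the helper above works without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    # Truncate to the 512-token limit of the LUKE-based encoder.
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return preference_probability(logits)
```

A score near 1 would indicate that the text looks like a chosen sentence, and a score near 0 that it looks like a rejected one. Text beyond 512 tokens is dropped by the truncation, so longer inputs are scored on their beginning only.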
Metric
- train and validation split

| train loss | eval loss | accuracy | recall | precision | f1-score |
|---|---|---|---|---|---|
| 0.114 | 0.1615 | 0.9399 | 0.9459 | 0.9346 | 0.9402 |

| accuracy | recall | precision | f1-score |
|---|---|---|---|
| 0.9416 | 0.9319 | 0.9504 | 0.9411 |
- confusion matrix on the test split
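
As a sanity check on the reported metrics, the f1-score can be recomputed from precision and recall as their harmonic mean. The sketch below uses the values listed in this section; `f1_from` is a hypothetical helper name, and the tolerance only allows for the four-decimal rounding of the reported figures.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported f1-score) as listed in the metric tables
reported = [
    (0.9346, 0.9459, 0.9402),
    (0.9504, 0.9319, 0.9411),
]

for precision, recall, f1_reported in reported:
    f1 = f1_from(precision, recall)
    # Inputs are rounded to four decimals, so allow a small tolerance.
    consistent = abs(f1 - f1_reported) < 1e-4
    print(f"P={precision} R={recall} F1={f1:.4f} consistent={consistent}")
```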
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3
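
To make the `linear` scheduler concrete, the sketch below computes the learning rate at the end of each epoch. It assumes zero warmup steps (the card does not list a warmup setting) and takes the total of 4437 optimizer steps from the training results; `linear_lr` is a hypothetical helper that mirrors the usual linear-decay rule rather than the exact training code.

```python
def linear_lr(step: int, total_steps: int, base_lr: float, warmup_steps: int = 0) -> float:
    """Linear warmup to base_lr, then linear decay to zero at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


BASE_LR = 5e-05      # learning_rate from the list above
TOTAL_STEPS = 4437   # final step reported in the training results

for step in (0, 1479, 2958, 4437):
    print(step, linear_lr(step, TOTAL_STEPS, BASE_LR))
```

Under these assumptions the rate starts at 5e-05, falls to about two thirds of that after the first epoch, and reaches zero at the last step.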
Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|---|---|
| 0.4109 | 1.0 | 1479 | 0.2462 | 0.9003 | 0.8710 | 0.9399 | 0.9041 |
| 0.1579 | 2.0 | 2958 | 0.1573 | 0.9399 | 0.9495 | 0.9293 | 0.9393 |
| 0.114 | 3.0 | 4437 | 0.1615 | 0.9399 | 0.9346 | 0.9460 | 0.9403 |
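
The per-epoch results can be compared programmatically, for example to see which epoch minimizes validation loss and which maximizes F1. The sketch below simply transcribes the numbers reported above; the dictionary keys are illustrative names, not fields of any library.

```python
# Per-epoch values transcribed from the training results above.
results = [
    {"epoch": 1, "step": 1479, "train_loss": 0.4109, "val_loss": 0.2462, "f1": 0.9041},
    {"epoch": 2, "step": 2958, "train_loss": 0.1579, "val_loss": 0.1573, "f1": 0.9393},
    {"epoch": 3, "step": 4437, "train_loss": 0.114, "val_loss": 0.1615, "f1": 0.9403},
]

best_by_loss = min(results, key=lambda r: r["val_loss"])
best_by_f1 = max(results, key=lambda r: r["f1"])
steps_per_epoch = results[0]["step"] // results[0]["epoch"]

print("lowest validation loss at epoch", best_by_loss["epoch"])
print("highest F1 at epoch", best_by_f1["epoch"])
print("optimizer steps per epoch:", steps_per_epoch)
```

Validation loss is lowest at epoch 2 while F1 peaks at epoch 3, and the two differ only slightly. The rise in validation loss alongside a falling training loss may hint at mild overfitting in the last epoch, though three data points are too few to say so with confidence.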
Framework versions
- Transformers 4.42.3
- PyTorch 2.1.0+cu118
- Datasets 2.20.0
- Tokenizers 0.19.1