ryota39/mluke-large-lite-reward

Fine-tuning

This model was trained to classify whether an input text is a "chosen sentence" or a "rejected sentence".
The class probabilities (the softmax of the final-layer logits) can be used to quantify how strongly an input text is preferred.
Fine-tuned from studio-ousia/mluke-large-lite via full-parameter tuning on open-preference-v0.3.
Trained in bf16 precision.
Label 0 stands for rejected sentence
Label 1 stands for chosen sentence
Note that this model can handle at most 512 tokens.
- This limit is inherited from the LUKE-based pre-trained model.
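
The scoring described above can be sketched in Python as follows. This is a minimal sketch, not the author's own inference code: it assumes the checkpoint loads through the generic transformers Auto classes (with torch and sentencepiece installed), and the helper names softmax, chosen_probability, and score_text are hypothetical. Inputs are truncated to the 512-token limit, and the softmax value at index 1 is read as the probability of the "chosen" label.

```python
import math

MODEL_ID = "ryota39/mluke-large-lite-reward"  # repository name from this card
MAX_TOKENS = 512  # context limit stated in this card


def softmax(logits):
    """Numerically stable softmax over a flat list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def chosen_probability(logits):
    """Return P(label 1 = chosen) from two-class logits [rejected, chosen]."""
    return softmax(logits)[1]


def score_text(text, model_id=MODEL_ID):
    """Score one text with the model (hypothetical helper).

    Assumption: the checkpoint is loadable via the Auto classes. The heavy
    imports are kept local so the pure-Python helpers work without them.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    # Truncate so the input never exceeds the 512-token limit.
    inputs = tokenizer(
        text, truncation=True, max_length=MAX_TOKENS, return_tensors="pt"
    )
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return chosen_probability(logits)
```

Calling score_text on a candidate response downloads the model on first use and returns a value between 0 and 1. Higher values mean the text looks more like a "chosen sentence", so two candidate responses can be ranked by comparing their scores.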

train loss	eval loss	accuracy	recall	precision	f1-score
0.114	0.1615	0.9399	0.9459	0.9346	0.9402

accuracy	recall	precision	f1-score
0.9416	0.9319	0.9504	0.9411
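
As a quick consistency check on the two tables above, the F1 score can be recomputed from precision and recall as their harmonic mean. The sketch below assumes the reported values are the standard binary F1; the helper name f1_score is hypothetical, and the numbers are copied from the tables.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the two tables above
reported = [
    (0.9346, 0.9459, 0.9402),
    (0.9504, 0.9319, 0.9411),
]

for precision, recall, reported_f1 in reported:
    recomputed = round(f1_score(precision, recall), 4)
    print(f"recomputed={recomputed} reported={reported_f1}")
```

Both recomputed values agree with the reported F1 scores to four decimal places.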

The following hyperparameters were used during training:

Training Loss	Epoch	Step	Validation Loss	Accuracy	Precision	Recall	F1
0.4109	1.0	1479	0.2462	0.9003	0.8710	0.9399	0.9041
0.1579	2.0	2958	0.1573	0.9399	0.9495	0.9293	0.9393
0.114	3.0	4437	0.1615	0.9399	0.9346	0.9460	0.9403