Fine-tuning

  • This model was trained to classify whether an input text is a "chosen sentence" or a "rejected sentence".
  • The probability from the last layer (the logits after a softmax) can be used as a preference score for the input text.
  • Fine-tuned from studio-ousia/mluke-large-lite with full-parameter tuning on open-preference-v0.3.
  • Trained in bf16 precision.
  • Label 0 stands for the rejected sentence.
  • Label 1 stands for the chosen sentence.
  • Note that this model can handle at most 512 tokens.
    • The limit is inherited from the LUKE-based pre-trained model.
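
The scoring described above can be sketched as follows. This is a minimal sketch, not the author's reference code: it assumes the repository loads through the generic `AutoTokenizer` and `AutoModelForSequenceClassification` classes, and the helper names (`softmax`, `chosen_probability`, `score_text`) are hypothetical. The softmax itself is written in plain Python so the arithmetic is visible.

```python
import math
from typing import Sequence

MODEL_ID = "ryota39/mluke-large-lite-reward"  # repository name of this model
MAX_TOKENS = 512   # input limit stated in the card
CHOSEN_LABEL = 1   # label 1 = chosen sentence, label 0 = rejected sentence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over a flat sequence of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def chosen_probability(logits: Sequence[float]) -> float:
    """Probability of label 1 (chosen), used here as the preference score."""
    return softmax(logits)[CHOSEN_LABEL]


def score_text(text: str) -> float:
    """Score one text with the model (hypothetical helper).

    Assumption: the checkpoint loads via the Auto* classes. Requires torch and
    transformers, and downloads the weights on first use.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
    model.eval()

    # Truncate to the 512-token limit instead of failing on long inputs.
    inputs = tokenizer(
        text, truncation=True, max_length=MAX_TOKENS, return_tensors="pt"
    )
    with torch.no_grad():
        logits = model(**inputs).logits[0]
    return chosen_probability(logits.float().tolist())
```

A score near 1 means the model considers the text close to a chosen sentence, and a score near 0 means it looks like a rejected one. For repeated scoring, loading the tokenizer and model once outside the function would avoid reloading the weights on every call.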

Metric

  • train and validation split

| train loss | eval loss | accuracy | recall | precision | f1-score |
|---|---|---|---|---|---|
| 0.114 | 0.1615 | 0.9399 | 0.9459 | 0.9346 | 0.9402 |

  • test split

| accuracy | recall | precision | f1-score |
|---|---|---|---|
| 0.9416 | 0.9319 | 0.9504 | 0.9411 |

  • confusion matrix on the test split: [image: confusion matrix]
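
As a quick consistency check on the reported numbers, the F1 score can be recomputed as the harmonic mean of precision and recall. The sketch below only uses values from the card; the function name `f1_score` is a hypothetical helper, not part of any library used for training.

```python
def f1_score(precision: float, recall: float) -> float:
    """F1 as the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the card (validation split, then test split).
validation_f1 = f1_score(precision=0.9346, recall=0.9459)
test_f1 = f1_score(precision=0.9504, recall=0.9319)

print(f"validation F1: {validation_f1:.4f}")
print(f"test F1:       {test_f1:.4f}")
```

Rounded to four decimals, both values agree with the reported f1-scores (0.9402 and 0.9411).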

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 3
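
To make the linear schedule concrete, the sketch below computes the learning rate at a given optimizer step. It assumes zero warmup steps (the card does not list a warmup setting) and takes the total step count from the training results (1479 steps per epoch over 3 epochs). The function name `linear_lr` is hypothetical; it mirrors the usual linear-decay rule rather than quoting the training script.

```python
BASE_LR = 5e-5           # learning_rate from the card
STEPS_PER_EPOCH = 1479   # step count at the end of epoch 1 in the training results
NUM_EPOCHS = 3
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS  # 4437


def linear_lr(
    step: int,
    base_lr: float = BASE_LR,
    total_steps: int = TOTAL_STEPS,
    warmup_steps: int = 0,  # assumption: no warmup is listed in the card
) -> float:
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for epoch in range(NUM_EPOCHS + 1):
    step = epoch * STEPS_PER_EPOCH
    print(f"epoch {epoch}: step {step}, lr {linear_lr(step):.2e}")
```

Under these assumptions the rate starts at 5e-05, falls to two thirds of that after the first epoch, and reaches zero at step 4437.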

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|---|---|
| 0.4109 | 1.0 | 1479 | 0.2462 | 0.9003 | 0.8710 | 0.9399 | 0.9041 |
| 0.1579 | 2.0 | 2958 | 0.1573 | 0.9399 | 0.9495 | 0.9293 | 0.9393 |
| 0.114 | 3.0 | 4437 | 0.1615 | 0.9399 | 0.9346 | 0.9460 | 0.9403 |

Framework versions

  • Transformers 4.42.3
  • PyTorch 2.1.0+cu118
  • Datasets 2.20.0
  • Tokenizers 0.19.1

Model size

  • 561M params (Safetensors)
  • Tensor type: BF16