---
license: mit
datasets:
- Anthropic/hh-rlhf
language:
- en
library_name: keras
tags:
- evaluations
pipeline_tag: text-classification
---
# PreferED: Preference Evaluation DeBERTa Model
## Model Description
PreferED is a 400M-parameter preference evaluation model based on the DeBERTa architecture, designed for evaluating LLM applications.
The model takes context and text data as input and outputs a logit score, which can be used to compare
different text generations on evaluative aspects such as hallucination, quality, etc. The `context` variable
can be used to provide evaluation criteria in addition to any relevant retrieved context. The `gen_text` variable
provides the actual text being evaluated.
- **Model name**: PreferED
- **Model type**: DeBERTa
- **Training data**: This model was trained on [Anthropic HH/RLHF](https://huggingface.co/datasets/Anthropic/hh-rlhf) using a [DeBERTa-v3-large](https://huggingface.co/microsoft/deberta-v3-large) base model.
- **Evaluation data**: Achieves 69.7% accuracy on the Anthropic hh-rlhf test split.
## Usage
### Loading the Model
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
tokenizer = AutoTokenizer.from_pretrained("samagra14wefi/PreferED")
model = AutoModelForSequenceClassification.from_pretrained("samagra14wefi/PreferED")
device = "cuda:0" if torch.cuda.is_available() else "cpu"
model = model.to(device)
```
### Measuring hallucinations
Use the `context` variable to pass the retrieved context.
```python
def calc_score(context, gen_text):
    # Tokenize the (context, generation) pair and move it to the model's device
    inputs = tokenizer(context, gen_text, return_tensors='pt', truncation=True).to(device)
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits[0].cpu()

context_string = '''India won the world cup in 1983 and 2011. Australia won the world cup five times.
West Indies have won the world cup twice. Sri Lanka, Pakistan and England have won the world cup once.
Evaluate whether the statement below is consistent with these facts.'''

response_string_wrong = '''India has won the world cup the most times.'''
response_string_correct = '''Australia has won the world cup the most times.'''

score_wrong = calc_score(context_string, response_string_wrong)
score_correct = calc_score(context_string, response_string_correct)
print(score_correct > score_wrong)
```
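The logits are most meaningful when compared against each other for the same context. If you need an absolute flag instead, one option (continuing from the snippet above) is to squash the score through a sigmoid and apply a cutoff; the 0.5 threshold below is an illustrative assumption, not a calibrated value.
```python
# Illustrative only: convert the raw logits to probabilities and flag low scores.
prob_wrong = torch.sigmoid(score_wrong).item()
prob_correct = torch.sigmoid(score_correct).item()

THRESHOLD = 0.5  # assumed cutoff; calibrate on held-out data for your use case
for name, p in [("wrong", prob_wrong), ("correct", prob_correct)]:
    flag = "possible hallucination" if p < THRESHOLD else "consistent with context"
    print(f"{name}: {p:.3f} -> {flag}")
```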
### Evaluating Response relevance
```python
inquiry = "What is your return policy?"
response_good = "Our return policy lasts 30 days. If 30 days have gone by since your purchase,
unfortunately, we can’t offer you a refund or exchange."
response_bad = "We offer a variety of fresh produce including apples, oranges, and bananas."
score_good = calc_score(inquiry, response_good)
score_bad = calc_score(inquiry, response_bad)
print(score_good > score_bad)
```
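Because the score is a single scalar per (context, text) pair, the same `calc_score` helper can rank several candidate replies to one inquiry. A minimal sketch continuing from the snippet above; the third candidate string is made up for illustration:
```python
# Rank candidate replies to the same inquiry and keep the highest-scoring one.
# Assumes the single-logit head described above, so .item() yields one score.
candidates = [
    response_good,
    response_bad,
    "Items can be returned within 30 days with the original receipt.",  # illustrative
]
best = max(candidates, key=lambda r: calc_score(inquiry, r).item())
print(best)
```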
### Evaluating Content Appropriateness
```python
context = "Discussing the political scenario in Country X."
response_clean = "The political scenario in Country X is quite dynamic with multiple parties vying for power."
response_offensive = "The politicians in Country X are all corrupt and stupid."
score_clean = calc_score(context, response_clean)
score_offensive = calc_score(context, response_offensive)
print(score_clean > score_offensive)
```
### Comparing Different Language Models
```python
context = "Explain the process of photosynthesis."
response_gpt3 = "Photosynthesis is the process by which green plants and some other organisms use sunlight to synthesize foods with the help of chlorophyll pigments."
response_bert = "Photosynthesis is a method that converts carbon dioxide into organic compounds, especially sugars, in the presence of sunlight."
score_gpt3 = calc_score(context, response_gpt3)
score_bert = calc_score(context, response_bert)
print(score_gpt3 > score_bert)
```
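A single prompt is rarely conclusive, so in practice you would aggregate scores over a set of prompts. A minimal sketch; `eval_examples` is illustrative data, not a benchmark:
```python
# Compare two models across several prompts rather than a single example.
eval_examples = [
    ("Explain the process of photosynthesis.", response_gpt3, response_bert),
    # ... add more (context, response_a, response_b) triples from your own logs
]
a_wins = sum(
    calc_score(ctx, resp_a).item() > calc_score(ctx, resp_b).item()
    for ctx, resp_a, resp_b in eval_examples
)
print(f"Model A preferred on {a_wins}/{len(eval_examples)} prompts")
```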
## Finetuning on your production data
At 400M parameters, PreferED is relatively lightweight compared to most large language models, which makes it a practical candidate for fine-tuning on specific tasks or datasets. Fine-tuning on your own production data can improve performance, since it helps the model capture the nuances and context specific to your application.
### Preparing the Training Dataset
For fine-tuning the PreferED model on production evaluation tasks, it's crucial to structure your data correctly. The dataset should be formatted such that each example contains a shared context that provides the evaluation criteria, a text input, and a binary label indicating the preference or correctness of the text input in relation to the evaluation criteria.
Here's an example of how your data might look:
```plaintext
context,text,label
"Evaluate the accuracy of the statement based on historical facts.","The sun revolves around the Earth.",0
"Evaluate the accuracy of the statement based on historical facts.","The Earth revolves around the sun.",1
```
You can then load this data into a `Dataset` object using a library such as Hugging Face's `datasets`.
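For example, assuming the rows above are saved as `train.csv` and `eval.csv` (the file names and the 512-token limit are illustrative), the following sketch builds the tokenized `train_dataset` and `eval_dataset` used by the Trainer below:
```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Illustrative file names; point these at your own CSV exports.
raw = load_dataset("csv", data_files={"train": "train.csv", "eval": "eval.csv"})

tokenizer = AutoTokenizer.from_pretrained("samagra14wefi/PreferED")

def preprocess(batch):
    # Encode each (context, text) pair; the label column becomes `labels`.
    enc = tokenizer(batch["context"], batch["text"], truncation=True, max_length=512)
    enc["labels"] = batch["label"]
    return enc

train_dataset = raw["train"].map(preprocess, batched=True,
                                 remove_columns=["context", "text", "label"])
eval_dataset = raw["eval"].map(preprocess, batched=True,
                               remove_columns=["context", "text", "label"])
```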
### Finetuning Example
```python
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("samagra14wefi/PreferED")
model = AutoModelForSequenceClassification.from_pretrained("samagra14wefi/PreferED")

# Define the training arguments
training_args = TrainingArguments(
    output_dir='./results',
    per_device_train_batch_size=8,
    num_train_epochs=3,
    logging_dir='./logs',
)

# Create the Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,   # provide your tokenized training dataset
    eval_dataset=eval_dataset,     # provide your tokenized evaluation dataset
    tokenizer=tokenizer,           # enables dynamic padding via the default collator
)

# Train the model
trainer.train()
```
### Loss Function Consideration
Anthropic recommends the pairwise loss L<sub>PM</sub> = log(1 + e<sup>r<sub>bad</sub> - r<sub>good</sub></sup>) for preference models. PreferED, however, was trained with binary cross-entropy loss, so switching loss functions when fine-tuning may increase the time needed to converge. For more details on preference models and their loss functions, see Askell et al., 2021: [A General Language Assistant as a Laboratory for Alignment](https://arxiv.org/abs/2112.00861).
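If you do want to experiment with the pairwise objective, note that log(1 + e<sup>x</sup>) is the softplus function, so the loss can be written in a few lines of PyTorch. The sketch below is illustrative only; the helper name and the dummy scores are assumptions, and this is not the objective PreferED itself was trained with.
```python
import torch
import torch.nn.functional as F

def pairwise_preference_loss(r_good: torch.Tensor, r_bad: torch.Tensor) -> torch.Tensor:
    # L_PM = log(1 + exp(r_bad - r_good)); softplus keeps the computation numerically stable.
    return F.softplus(r_bad - r_good).mean()

# Dummy scores for illustration: the "good" responses mostly outscore the "bad" ones.
r_good = torch.tensor([1.3, 0.2, 0.9])
r_bad = torch.tensor([-0.5, 0.9, 0.1])
print(pairwise_preference_loss(r_good, r_bad))
```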