Experimenting with different training objectives for an AI evaluator
Lots of research has been published on LLM-as-a-judge, as it's becoming a popular approach to evaluate models cheaply and quickly. A pretty cool paper recently came out from the Salesforce AI Research team; tl;dr: they found that preference optimisation techniques like DPO and RPO can yield better results than supervised fine-tuning (SFT) alone as a training objective for LLM-as-a-judge models. Our team wanted to test this hypothesis, as it's not yet clear which training objective performs best for aligning eval models.
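For anyone newer to the setup, here's a minimal sketch of what pairwise LLM-as-a-judge looks like in practice. The prompt template and the `query_judge` callable are placeholders for illustration, not the exact setup from our experiments.

```python
# Minimal pairwise LLM-as-a-judge sketch: the judge model reads a question plus two
# candidate responses and picks the better one. `query_judge` is a stand-in for
# whatever client you use to call the judge model.

JUDGE_TEMPLATE = """You are an impartial evaluator.

Question:
{question}

Response A:
{response_a}

Response B:
{response_b}

Which response is better? Answer with exactly "A" or "B"."""


def judge_pairwise(question: str, response_a: str, response_b: str, query_judge) -> str:
    """Return "A" or "B" according to the judge model behind `query_judge`."""
    prompt = JUDGE_TEMPLATE.format(
        question=question, response_a=response_a, response_b=response_b
    )
    return "A" if query_judge(prompt).strip().upper().startswith("A") else "B"
```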
Our experiments
We trained Llama-3.1-70B-Instruct with SFT and compared it to the base Llama-3.1-70B-Instruct on core benchmarks to see how SFT fares on its own.
We also trained a Llama-3.1-8B-Instruct model on two training datasets with:
- SFT alone
- DPO
- RPO (a compound loss objective that combines SFT and DPO; see the sketch after this list)
and compared their performance against the base model across four core benchmarks covering both Pairwise Preference and Direct Scoring.
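For context on what those three objectives actually optimise, here's a rough PyTorch sketch of the loss functions, assuming the standard DPO formulation and an RPO-style compound loss that adds an SFT (negative log-likelihood) term on the chosen response to the DPO term. `beta` and `alpha` are illustrative hyperparameters, not the values we trained with.

```python
import torch
import torch.nn.functional as F

def sft_loss(policy_chosen_logps: torch.Tensor) -> torch.Tensor:
    # Plain SFT: maximise the log-likelihood of the chosen (good) judgement.
    return -policy_chosen_logps.mean()

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta: float = 0.1) -> torch.Tensor:
    # DPO: push the policy's chosen-vs-rejected margin above the reference model's margin.
    policy_margin = policy_chosen_logps - policy_rejected_logps
    ref_margin = ref_chosen_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()

def rpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps,
             beta: float = 0.1, alpha: float = 1.0) -> torch.Tensor:
    # RPO-style compound objective: DPO preference term plus an SFT term on the chosen response.
    return dpo_loss(policy_chosen_logps, policy_rejected_logps,
                    ref_chosen_logps, ref_rejected_logps, beta) \
        + alpha * sft_loss(policy_chosen_logps)
```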
Here's a summary of our key findings:
- SFT (Atla Caprioska 70B) improved on in-distribution tasks but dropped in quality on out-of-distribution tasks, underperforming the base Llama-70B on aggregate metrics
- DPO performed best on PreferenceCollection with 98.89% accuracy
- RPO performed best on RewardBench with 81.96% accuracy
- RPO outperformed both SFT and DPO on UltraFeedback (No CoT), with a score of 0.57
- RPO achieved the highest average Pearson correlation on evaluation scores (0.49), compared to SFT (0.43) and DPO (0.43)
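To make that last metric concrete, here's an illustrative sketch of how an average Pearson correlation across direct-scoring benchmarks can be computed; the benchmark names and scores below are placeholders, not our data.

```python
import numpy as np

def pearson(model_scores, human_scores) -> float:
    # Pearson correlation between the judge's scores and the reference (e.g. human) scores.
    return float(np.corrcoef(model_scores, human_scores)[0, 1])

# Placeholder benchmarks and scores, purely for illustration.
per_benchmark = {
    "benchmark_a": pearson([4, 2, 5, 1], [5, 2, 4, 1]),
    "benchmark_b": pearson([3, 3, 1, 5], [2, 4, 1, 5]),
}
average_pearson = sum(per_benchmark.values()) / len(per_benchmark)
```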
If you want the experiment details, here's our blog post, with extra information on why we think this works. We're working on scaling this up to see how far we can push it :)
Open questions for you all
- Will this trend hold for larger models?
- What kind of data might be particularly useful for training an LLM-as-a-judge?