Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer Paper • 1910.10683 • Published Oct 23, 2019
Solving math word problems with process- and outcome-based feedback Paper • 2211.14275 • Published Nov 25, 2022
Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation Paper • 2401.08417 • Published Jan 16, 2024
PaliGemma Release Collection Pretrained and mix checkpoints for PaliGemma • 16 items
Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF Paper • 2405.21046 • Published May 31, 2024
ORPO: Monolithic Preference Optimization without Reference Model Paper • 2403.07691 • Published Mar 12, 2024
Binary Classifier Optimization for Large Language Model Alignment Paper • 2404.04656 • Published Apr 6, 2024
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22, 2024
Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs Paper • 2402.14740 • Published Feb 22, 2024
Helping or Herding? Reward Model Ensembles Mitigate but do not Eliminate Reward Hacking Paper • 2312.09244 • Published Dec 14, 2023
Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment Paper • 2408.06266 • Published Aug 12, 2024
A Rank Stabilization Scaling Factor for Fine-Tuning with LoRA Paper • 2312.03732 • Published Nov 28, 2023
Understanding Reference Policies in Direct Preference Optimization Paper • 2407.13709 • Published Jul 18, 2024