data:image/s3,"s3://crabby-images/a4f48/a4f480799f17f90306119f1fe4dbbfcc1e7b1516" alt=""
TRL - Transformer Reinforcement Learning
TRL is a full-stack library that provides a set of tools to train transformer language models with Reinforcement Learning, from the Supervised Fine-tuning (SFT) step and the Reward Modeling (RM) step to the Proximal Policy Optimization (PPO) step. The library is integrated with 🤗 transformers.
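To make the Reward Modeling step more concrete, here is a minimal, framework-free sketch of the pairwise loss commonly used to train reward models on preference data. For each (chosen, rejected) pair, the reward model should score the chosen response higher, and the loss is the mean of -log σ(r_chosen - r_rejected). It assumes scalar rewards have already been computed. `log_sigmoid` and `pairwise_reward_loss` are hypothetical helpers written for illustration. They are not part of TRL's API.

```python
import math
from typing import Sequence


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def pairwise_reward_loss(
    chosen_rewards: Sequence[float], rejected_rewards: Sequence[float]
) -> float:
    """Hypothetical helper: mean of -log(sigmoid(r_chosen - r_rejected)) over a batch."""
    if len(chosen_rewards) != len(rejected_rewards) or not chosen_rewards:
        raise ValueError("expected two non-empty batches of equal length")
    losses = [-log_sigmoid(c - r) for c, r in zip(chosen_rewards, rejected_rewards)]
    return sum(losses) / len(losses)


# Toy rewards for two (chosen, rejected) pairs; the second pair is ranked the wrong way.
print(pairwise_reward_loss([1.5, 0.2], [0.3, 0.9]))
```

When both rewards in a pair are equal, the per-pair loss is log 2. It shrinks toward zero as the chosen response's margin over the rejected one grows.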
Learn
Learn post-training with TRL and other libraries in the 🤗 smol course.
API documentation
- Model Classes: A brief overview of what each public model class does.
- SFTTrainer: Easily fine-tune your model with supervised fine-tuning using SFTTrainer.
- RewardTrainer: Easily train your reward model using RewardTrainer.
- PPOTrainer: Further fine-tune the supervised fine-tuned model using the PPO algorithm.
- Best-of-N Sampling: Use best-of-n sampling as an alternative way to sample predictions from your active model.
- DPOTrainer: Direct Preference Optimization training using DPOTrainer.
- TextEnvironment: A text environment to train your model to use tools with RL.
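Best-of-n sampling can be illustrated with a short, framework-free sketch. Draw n candidate completions for a prompt, score each one with a reward function, and keep the highest-scoring candidate. The `generate` and `score` callables are hypothetical stand-ins for a policy model and a reward model, and `best_of_n` is an illustrative helper. This is not TRL's implementation.

```python
import random
from typing import Callable, List, Tuple


def best_of_n(
    prompt: str,
    generate: Callable[[str], str],
    score: Callable[[str, str], float],
    n: int = 4,
) -> Tuple[str, List[str]]:
    """Hypothetical helper: sample n completions and return (best, all_candidates)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [generate(prompt) for _ in range(n)]
    # max() returns the first candidate among ties, so the result is deterministic
    # for a fixed list of candidates.
    best = max(candidates, key=lambda completion: score(prompt, completion))
    return best, candidates


# Toy stand-ins: a random "policy" and a lookup-table "reward model".
rng = random.Random(0)
TOY_REWARDS = {"great": 1.0, "fine": 0.5, "awful": -1.0}


def toy_generate(prompt: str) -> str:
    return f"{prompt} {rng.choice(list(TOY_REWARDS))}"


def toy_score(prompt: str, completion: str) -> float:
    return TOY_REWARDS[completion.split()[-1]]


best, candidates = best_of_n("The movie was", toy_generate, toy_score, n=8)
print(best)
```

The trade-off is inference cost. Each prompt needs n generations plus n reward evaluations, but the policy's weights are never updated.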
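Direct Preference Optimization can be summarized by its per-pair sigmoid loss, -log σ(β[(log π(y_w|x) - log π_ref(y_w|x)) - (log π(y_l|x) - log π_ref(y_l|x))]). The policy is pushed to raise its log-probability ratio, relative to a frozen reference model, for the chosen response y_w more than for the rejected response y_l. The sketch below assumes summed per-sequence log-probabilities are already available. `dpo_loss` is a hypothetical helper and β = 0.1 is only an illustrative value. This is not DPOTrainer's actual code.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """Hypothetical helper: sigmoid DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a full response under the
    policy or the frozen reference model.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    return -log_sigmoid(beta * (chosen_logratio - rejected_logratio))


# The policy has moved toward the chosen response and away from the rejected one.
print(dpo_loss(-12.0, -20.0, -14.0, -18.0))
```

If the policy still matches the reference model, the loss is log 2. It falls below log 2 only when the chosen response gains relative probability over the rejected one.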
Examples
- Sentiment Tuning: Fine-tune your model to generate positive movie content.
- Training with PEFT: Memory-efficient RLHF training using adapters with PEFT.
- Detoxifying LLMs: Detoxify your language model through RLHF.
- StackLlama: End-to-end RLHF training of a Llama model on the Stack Exchange dataset.
- Learning with Tools: A walkthrough of using TextEnvironments.
- Multi-Adapter Training: Use a single base model and multiple adapters for memory-efficient end-to-end training.
Blog posts
data:image/s3,"s3://crabby-images/de224/de224df8759479b3090a89de5754d8c5ed8a1730" alt="thumbnail"
Published on July 10, 2024
Preference Optimization for Vision Language Models with TRL
data:image/s3,"s3://crabby-images/ffa3f/ffa3f90c4fa72662ca2eaba75e870a4c76d33773" alt="thumbnail"
Published on June 12, 2024
Putting RL back in RLHF
data:image/s3,"s3://crabby-images/499d6/499d617c23eb5defb36b9f1422a1011176ab21e7" alt="thumbnail"
Published on September 29, 2023
Finetune Stable Diffusion Models with DDPO via TRL
data:image/s3,"s3://crabby-images/b63af/b63af798e5719847606ab2482bad483886d6e904" alt="thumbnail"
Published on August 8, 2023
Fine-tune Llama 2 with DPO
data:image/s3,"s3://crabby-images/678d5/678d5cdfe7934fd839b49c2d854a33b042d9b1a0" alt="thumbnail"
Published on April 5, 2023
StackLLaMA: A hands-on guide to train LLaMA with RLHF
data:image/s3,"s3://crabby-images/d8e4e/d8e4e34bd2da8bf087401e82022512a56e8621a8" alt="thumbnail"
Published on March 9, 2023
Fine-tuning 20B LLMs with RLHF on a 24GB consumer GPU
data:image/s3,"s3://crabby-images/cd040/cd040ffe9ac868b8c314e60b8cf49f6844b0ad83" alt="thumbnail"
Published on December 9, 2022
Illustrating Reinforcement Learning from Human Feedback