Examples

The best place to learn about examples in TRL is our docs page!

Installation

pip install trl
# optional: wandb
pip install wandb

Note: if you don't want to log with wandb, remove log_with="wandb" from the scripts/notebooks. You can also replace it with your favourite experiment tracker that's supported by accelerate.
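
For illustration, a minimal sketch of what this looks like in code. Parameter names other than log_with are illustrative and may differ across TRL versions:

from trl import PPOConfig

# Log to wandb:
config = PPOConfig(model_name="gpt2", log_with="wandb")

# Disable logging, or pass another tracker name supported by accelerate (e.g. "tensorboard"):
config = PPOConfig(model_name="gpt2", log_with=None)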

Accelerate Config

For all the examples, you'll need to generate an Accelerate config with:

accelerate config # will prompt you to define the training configuration

Then, we encourage you to launch jobs with accelerate launch!
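
For example, to run one of the example scripts on the hardware you just configured (the path below is a placeholder; substitute the script you actually want to run):

accelerate launch path/to/example_script.py # placeholder path, replace with the example script of your choice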

Categories

The examples are currently split over the following categories:

1. Sentiment: Fine-tune a model with a sentiment classification model as the reward signal.
2. StackOverflow: Perform the full RLHF process (fine-tuning, reward model training, and RLHF) on StackOverflow data.
3. summarization: Recreate OpenAI's Learning to Summarize paper.
4. toxicity: Fine-tune a model to reduce the toxicity of its generations.
5. best-of-n sampling: Comparative demonstration of best-of-n sampling as a simpler (but relatively expensive) alternative to RLHF; see the sketch after this list.
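
To make the best-of-n idea concrete, here is a minimal, self-contained sketch: sample n completions per prompt, score each with a reward model, and keep the highest-scoring one. The model and reward-model names below are illustrative choices, not necessarily the ones used in the TRL example.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_name = "gpt2"  # assumption: any causal LM works for this sketch
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Assumption: a sentiment classifier used as the reward model.
reward_pipe = pipeline("sentiment-analysis", model="lvwerra/distilbert-imdb")

def positive_score(text):
    # Assumption: the reward model outputs POSITIVE/NEGATIVE labels;
    # use the probability of the positive class as the reward.
    out = reward_pipe(text)[0]
    return out["score"] if out["label"] == "POSITIVE" else 1.0 - out["score"]

prompt = "This movie was"
inputs = tokenizer(prompt, return_tensors="pt")

# Sample n candidate continuations for the prompt.
n = 4
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        do_sample=True,
        max_new_tokens=20,
        num_return_sequences=n,
        pad_token_id=tokenizer.eos_token_id,
    )
candidates = tokenizer.batch_decode(outputs, skip_special_tokens=True)

# Score each candidate and keep the best one.
scores = [positive_score(c) for c in candidates]
best = candidates[max(range(n), key=lambda i: scores[i])]
print(best)

The trade-off versus RLHF: no training loop or reward optimization is needed, but every query pays the cost of n generations plus n reward-model passes at inference time.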