The best place to learn about examples in TRL is our docs page!
```bash
pip install trl

# optional: wandb
pip install wandb
```
Note: if you don't want to log with `wandb`, remove `log_with="wandb"` in the scripts/notebooks. You can also replace it with your favourite experiment tracker that's supported by `accelerate`.
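As a rough illustration, the tracker is usually selected through the `log_with` argument of the trainer configuration. The snippet below is a minimal sketch assuming a `PPOConfig`-style configuration object with a `log_with` field; check the script or notebook you are running for the exact argument it uses.

```python
from trl import PPOConfig

# Minimal sketch: log_with="wandb" enables wandb logging.
# Remove the argument to disable logging, or set it to another
# tracker name supported by accelerate (e.g. "tensorboard").
config = PPOConfig(log_with="wandb")
```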
For all the examples, you'll need to generate an Accelerate config with:

```bash
accelerate config # will prompt you to define the training configuration
```
Then, it is encouraged to launch jobs with `accelerate launch`!
The examples are currently split over the following categories:
1. **Sentiment**: Fine-tune a model with a sentiment classification model.
2. **StackOverflow**: Perform the full RLHF process (fine-tuning, reward model training, and RLHF) on StackOverflow data.
3. **Summarization**: Recreate OpenAI's Learning to Summarize paper.
4. **Toxicity**: Fine-tune a model to reduce the toxicity of its generations.
5. **Best-of-n sampling**: Comparative demonstration of best-of-n sampling as a simpler (but relatively expensive) alternative to RLHF (a minimal sketch follows below).
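To give a flavour of the best-of-n idea, the sketch below samples several completions per prompt and keeps the one a reward model scores highest, instead of updating the policy with RL. The model names (`gpt2` as the generator, `lvwerra/distilbert-imdb` as the reward model) and the `best_of_n` helper are illustrative assumptions borrowed from the sentiment setup, not the exact code used in the example.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Assumptions: any causal LM as the generator, any classifier as the reward model.
generator_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(generator_name)
model = AutoModelForCausalLM.from_pretrained(generator_name)
reward_pipe = pipeline("sentiment-analysis", model="lvwerra/distilbert-imdb")


def best_of_n(prompt: str, n: int = 4, max_new_tokens: int = 32) -> str:
    """Sample n completions and return the one with the highest reward."""
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        do_sample=True,
        max_new_tokens=max_new_tokens,
        num_return_sequences=n,
        pad_token_id=tokenizer.eos_token_id,
    )
    candidates = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    # Reward = probability the classifier assigns to the "POSITIVE" class.
    results = reward_pipe(candidates, top_k=None)
    scores = [next(d["score"] for d in res if d["label"] == "POSITIVE") for res in results]
    best_idx = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best_idx]


print(best_of_n("This movie was"))
```

This trades extra inference-time compute for not having to run RL training at all, which is why it is described as simpler but relatively expensive.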