
Tags: lzy37ld/trlx

v0.6.0

Release: v0.6.0 (CarperAI#407)

v0.5.0

Release: v0.5.0 (CarperAI#329)

v0.4

Update generation utilities (CarperAI#172)

* feat(base_trainer): enable sweeping over a single `gen_kwargs` value

* refactor(base_trainer): rename relevant variables

* fix(base_trainer): initialize `gen_sweep_arg` regardless

* feat(base_trainer): change `reward_fn`'s signature to accept kwargs (see the sketch after this list)

* merge(base_trainer): refactor to reflect main

* feat(*_trainer): add `stop_word`

* refactor(base_trainer): remove `seq2seq` if-case

* refactor(base_trainer): clean up logging of samples

* fix(base_trainer): remove inconsistencies

* fix(ppo_orchestrator): consistent padding and gpu device

* feat(base_trainer): add `rich` as dependency

* chore(examples): update signatures

* fix(ppo_orchestrator): logprob gather indexing

* docs(trlx): update `train`'s signature

* fix(base_trainer): disable `save_best` when training with deepspeed

* merge(base): complete merge

* feat(base_trainer): rework `stop_word` -> `stop_sequences`

* docs(base_trainer): update `decode`'s signature

* chore(base_trainer): `print` -> `print_rank_0`

* feat(base_trainer): clean up table's output

* feat(base_trainer): add number of gpus to the run's name

* style(trlx): satisfy black

* style(wandb): satisfy isort
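A minimal sketch of how the reworked pieces above fit together, assuming the post-#172 interface matches later trlx releases, where `reward_fn` receives extra keyword arguments and `stop_sequences` is an argument to `trlx.train`; the base model, prompts, and toy reward are placeholders:

```python
from typing import List

import trlx


def reward_fn(samples: List[str], **kwargs) -> List[float]:
    # After CarperAI#172 the reward function is called with extra keyword
    # arguments alongside the generated samples; this toy reward simply
    # favours longer completions.
    return [float(len(sample)) for sample in samples]


trainer = trlx.train(
    "gpt2",  # placeholder base model
    reward_fn=reward_fn,
    prompts=["The movie was", "In my opinion"],  # placeholder prompts
    # `stop_word` was reworked into `stop_sequences`: generated samples are
    # truncated at the first occurrence of any listed string.
    stop_sequences=["\n"],
)
```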

v0.3

Restructure sweeps for reuse (CarperAI#102)

* chore(readme): update instructions

* refactor(sweep): reuse existing examples and configs

* fix(sweep): enable checkpointing for hyperband

* feat(sweep): add accelerate support

* fix(sweep): report with new params space

* feat(sweep): replace generic names

* chore(ppo_config): update to better values

* chore(sweep): set max_concurrent_trials to default (see the sketch after this list)

* chore(examples): update the rest of the examples to the new main signature

* chore(readme): update sweep instruction

* chore(sweep): add warning/confirmation check before importing

* chore(sweep): update sweep instruction

* update(config): to more stable values
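The sweep items above refer to trlx's Ray Tune based sweep tooling. As a rough, generic illustration of the knobs mentioned (a hyperparameter search space and a `max_concurrent_trials` cap), here is a minimal Ray Tune sketch with a placeholder objective; it is not the actual trlx sweep entry point:

```python
from ray import tune


def objective(config):
    # Placeholder for a short fine-tuning run evaluated at the sampled
    # learning rate; returns a fake final reward for the sweep to rank.
    return {"reward": -abs(config["lr"] - 3e-5)}


tuner = tune.Tuner(
    objective,
    param_space={"lr": tune.loguniform(1e-6, 1e-4)},
    tune_config=tune.TuneConfig(
        metric="reward",
        mode="max",
        num_samples=8,
        max_concurrent_trials=2,  # concurrency cap, as surfaced in CarperAI#102
    ),
)
results = tuner.fit()
print(results.get_best_result().config)
```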

v0.2

Simplify api (CarperAI#24)

* fix(ilql): sampling on variable sized prompts & stage simplified api

* Save strategy (CarperAI#23)

* Had to add py_modules=trlx to setup.

* Added a save strategy.

* Cleaned up a few things.

* Added save_steps to ilql_config.yaml and a save-steps strategy to accelerate_ilql_model.py for consistency. The save_steps parameter must now be set because of how TrainConfig.from_dict operates; if no save_steps parameter is given in the configs, it throws an error.

* Adding minimal changes to enable a step-based save strategy in configs/ppo_config.yml, trlx/data/configs.py, and trlx/model_accelerate_ppo_model.py

* Some problems crept in despite the merge check. This fixes them.

* Realized I was merging into stage-api, not main, so fixed an issue with ilql_config.yml

* fix(ilql): eval on a set of betas & add simple timers

* fix: saving checkpoints

* refactor(ilql): subsume under base_model

* fix(ilql): mask prompts

* merge hydra

* fix(ppo): generalize and stage for api

* feat: add architext examples

* fix(ppo,ilql): ddp + accelerate

* refactor: clean pipelines

* feat: add simulacra example

* fix(ppo): single token prompting

* refactor: fully merge models

* refactor(configs): lower batch_sizes & remove dead entries

* refactor(examples): update for new api

* fix(tests,style): one way to pass tests is to change them

* fix(ppo): warnings from the most recent version of transformers

transformers 4.23.1 complains if `.generate()` starts with a single BOS token when bos=eos=pad token (a minimal workaround is sketched after this list)

* refactor(readme): add api

* chore: add doc strings

* fix: remove dropout

* chore: keep gpt2 small in examples

* chore: revert to previous default configs

* chore(docs): rename classes, remove unused, add examples

* chore(readme): add contributing.md & deepspeed note

* style(readme): US spelling

* chore(examples): add explanations for each task
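The transformers 4.23.1 note above refers to the common GPT-2 setup where the EOS token doubles as the pad (and BOS) token. A minimal, standalone sketch of the usual workaround, passing `pad_token_id` to `.generate()` explicitly; the model name and prompt are placeholders:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# GPT-2 ships without a pad token, so it is common to reuse the EOS token
# for padding; newer transformers releases then warn when generation starts
# from that lone BOS/EOS/pad token unless the pad token id is passed explicitly.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained("gpt2")

# Prompting with the BOS token alone reproduces the scenario from the commit note.
inputs = tokenizer(tokenizer.bos_token, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=8,
    pad_token_id=tokenizer.eos_token_id,  # silences the pad-token warning
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```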