Tags: lzy37ld/trlx
Update generation utilities (CarperAI#172)

* feat(base_trainer): enable sweeping over a single `gen_kwargs` value
* refactor(base_trainer): rename relevant variables
* fix(base_trainer): initialize `gen_sweep_arg` regardless
* feat(base_trainer): change `reward_fn`'s signature to accept kwargs
* merge(base_trainer): refactor to reflect main
* feat(*_trainer): add `stop_word`
* refactor(base_trainer): remove `seq2seq` if-case
* refactor(base_trainer): clean up logging of samples
* fix(base_trainer): remove inconsistencies
* fix(ppo_orchestrator): consistent padding and gpu device
* feat(base_trainer): add `rich` as dependency
* chore(examples): update signatures
* fix(ppo_orchestrator): logprob gather indexing
* docs(trlx): update `train`'s signature
* fix(base_trainer): disable `save_best` when training with deepspeed
* merge(base): complete merge
* feat(base_trainer): rework `stop_word` -> `stop_sequences`
* docs(base_trainer): update `decode`'s signature
* chore(base_trainer): `print` -> `print_rank_0`
* feat(base_trainer): clean up table's output
* feat(base_trainer): add number of gpus to the run's name
* style(trlx): satisfy black
* style(wandb): satisfy isort
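The `stop_word` -> `stop_sequences` rework and the kwargs-accepting `reward_fn` above can be illustrated with a toy sketch. This is not trlx's actual implementation; `truncate_at_stop` and the length-based reward are hypothetical names used only for illustration:

```python
from typing import List


def truncate_at_stop(sample: str, stop_sequences: List[str]) -> str:
    """Cut a generated sample at the earliest occurrence of any stop sequence."""
    cut = len(sample)
    for stop in stop_sequences:
        idx = sample.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return sample[:cut]


def reward_fn(samples: List[str], **kwargs) -> List[float]:
    """Toy reward accepting extra kwargs (e.g. stop_sequences), as the new
    signature allows; here the reward is just the truncated length."""
    stops = kwargs.get("stop_sequences", [])
    return [float(len(truncate_at_stop(s, stops))) for s in samples]


rewards = reward_fn(["hello\nworld", "hi"], stop_sequences=["\n"])
```

Accepting `**kwargs` lets the trainer pass generation metadata to the reward function without breaking older reward functions that ignore it.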
Restructure sweeps for reuse (CarperAI#102)

* chore(readme): update instructions
* refactor(sweep): reuse existing examples and configs
* fix(sweep): enable checkpointing for hyperband
* feat(sweep): add accelerate support
* fix(sweep): report with new params space
* feat(sweep): replace generic names
* chore(ppo_config): update to better values
* chore(sweep): set max_concurrent_trials to default
* chore(examples): update the rest of the examples to the new main signature
* chore(readme): update sweep instructions
* chore(sweep): add warning/confirmation check before importing
* chore(sweep): update sweep instructions
* chore(config): update to more stable values
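A reusable sweep, as restructured above, boils down to drawing trial configurations from a shared parameter space. The toy random-search sketch below is not the repo's actual sweep tooling; `param_space` and `sample_trial` are illustrative names and values:

```python
import random

# Hypothetical discrete search space over trainer hyperparameters.
param_space = {
    "lr": [1e-5, 5e-5, 1e-4],
    "batch_size": [8, 16, 32],
}


def sample_trial(space: dict, rng: random.Random) -> dict:
    """Draw one trial configuration from the discrete search space."""
    return {name: rng.choice(values) for name, values in space.items()}


rng = random.Random(0)
trials = [sample_trial(param_space, rng) for _ in range(4)]
```

A scheduler would then launch these trials (up to some `max_concurrent_trials` cap) and report metrics back per configuration.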
Simplify api (CarperAI#24)

* fix(ilql): sampling on variable-sized prompts & stage simplified api
* Save strategy (CarperAI#23)
  * Had to add py_modules=trlx to setup.
  * Added a save strategy.
  * Cleaned up a few things.
  * Added save_steps to ilql_config.yaml and a save-steps strategy to accelerate_ilql_model.py for consistency. The save_steps parameter must be set now because of how TrainConfig.from_dict operates; if no save_steps parameter is given in the configs, it throws an error.
  * Added minimal changes to enable the step-based save strategy in configs/ppo_config.yml, trlx/data/configs.py, and trlx/model_accelerate_ppo_model.py.
  * Some problems crept in despite the merge check; this fixes them.
  * Realized I am merging into stage-api, not main, so fixed an issue with ilql_config.yml.
* fix(ilql): eval on a set of betas & add simple timers
* fix: saving checkpoints
* refactor(ilql): subsume under base_model
* fix(ilql): mask prompts
* merge hydra
* fix(ppo): generalize and stage for api
* feat: add architext examples
* fix(ppo,ilql): ddp + accelerate
* refactor: clean pipelines
* feat: add simulacra example
* fix(ppo): single-token prompting
* refactor: fully merge models
* refactor(configs): lower batch_sizes & remove dead entries
* refactor(examples): update for the new api
* fix(tests,style): one way to pass tests is to change them
* fix(ppo): silence generation warnings; the most recent version of transformers (4.23.1) complains if .generate() starts with a single bos token when bos=eos=pad token
* refactor(readme): add api
* chore: add doc strings
* fix: remove dropout
* chore: keep gpt2 small in examples
* chore: revert to previous default configs
* chore(docs): rename classes, remove unused ones, add examples
* chore(readme): add contributing.md & a deepspeed note
* style(readme): US spelling
* chore(examples): add explanations for each task
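The step-based save strategy described above (checkpoint whenever the step count reaches a multiple of `save_steps`, and error out when `save_steps` is unset) can be sketched as follows. `should_save` is a hypothetical helper, not the repo's actual API:

```python
def should_save(step: int, save_steps: int) -> bool:
    """Step-based save strategy: checkpoint every `save_steps` steps.

    Mirrors the requirement that save_steps must be set: a missing or
    non-positive value raises instead of silently skipping checkpoints.
    """
    if save_steps <= 0:
        raise ValueError("save_steps must be set to a positive integer")
    return step > 0 and step % save_steps == 0


# Steps at which a checkpoint would be written over a short run.
saved = [s for s in range(1, 11) if should_save(s, save_steps=5)]
```

Raising on a missing `save_steps` matches the behavior noted above for TrainConfig.from_dict, where omitting the parameter throws an error.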