forked from HumanCompatibleAI/imitation
Add scripts and configs for hyperparameter tuning (HumanCompatibleAI#675)

- Merge py file changes from benchmark-algs
- Clean parallel script
- Undo the changes from HumanCompatibleAI#653 to the dagger benchmark config files. That change just made some error messages about the missing imitation.algorithms.dagger.ExponentialBetaSchedule go away, but did not fix the root cause.
- Improve readability and interpretability of benchmarking tests.
- Add exponential beta scheduler for dagger
- Ignore coverage for unknown algorithms.
- Clean up and extend tests for beta schedules in dagger.
- Add optuna to dependencies
- Fix test case
- Clean up the scripts
- Remove reporter(done) since mean_return is reported by the runs
- Add beta_schedule parameter to dagger script
- Update config policy kwargs
- Changes from review
- Fix errors with some configs
- Updates based on review
- Change metric everywhere
- Separate tuning code from parallel.py
- Fix docstring
- Remove resume option as it is getting tricky to correctly implement
- Minor fixes
- Updates from review
- Fix lint errors
- Add documentation for using the tuning script
- Fix file name test errors
- Add tune_run_kwargs in parallel script
- Fix test errors
- Simplify a few lines of code
- Revert "Fix test" (reverts commit 8b55134), then fix test
- Convert Dict to Mapping in input argument
- Ignore coverage in script configurations.
- Pin huggingface_sb3 version.
- Update to the newest seals environment versions.
- Push gymnasium dependency to 0.29 to ensure mujoco envs work.
- Incorporate review comments
- Move benchmarking/ to scripts/ and add named configs for tuned hyperparams
- Bump cache version and remove unnecessary files
- Include tuned hyperparam json files in package data
- Update storage hash
- Update search space of bc
- Update benchmark and hyperparameter tuning readme
- Incorporate reviewer's comments in benchmarking readme
- Update gymnasium version and render mode in eval policy
- Update commands.py hex strings

Co-authored-by: Maximilian Ernestus <[email protected]>
Co-authored-by: ZiyueWang25 <[email protected]>
1 parent f099c33 · commit 20366b0 · 43 changed files with 1,023 additions and 264 deletions.
# Benchmarking imitation

The `src/imitation/scripts/config/tuned_hps` directory provides the tuned hyperparameter configs for benchmarking imitation. For v0.4.0, these correspond to the hyperparameters used in the paper [imitation: Clean Imitation Learning Implementations](https://arxiv.org/abs/2211.11972).

Configuration files can be loaded either from the CLI or from the Python API.

## CLI

```bash
python -m imitation.scripts.<train_script> <algo> with <algo>_<env>
```

`train_script` can be either 1) `train_imitation` with `algo` as `bc` or `dagger`, or 2) `train_adversarial` with `algo` as `gail` or `airl`. The `env` can be any of `seals_ant`, `seals_half_cheetah`, `seals_hopper`, `seals_swimmer`, or `seals_walker`. Hyperparameters for other environments have not been tuned yet. You may be able to get reasonable performance by using hyperparameters tuned for a similar environment; alternatively, you can tune the hyperparameters using the `tuning` script.
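As a concrete illustration, the template above expands mechanically into full commands. The sketch below only builds the command strings from the algo/script mapping and environment list stated in the text; it does not require `imitation` to be installed:

```python
# Expand the CLI template into concrete benchmark commands.
# The algo -> train_script mapping and the env list are taken from the text above.
SCRIPT_FOR_ALGO = {
    "bc": "train_imitation",
    "dagger": "train_imitation",
    "gail": "train_adversarial",
    "airl": "train_adversarial",
}
TUNED_ENVS = [
    "seals_ant",
    "seals_half_cheetah",
    "seals_hopper",
    "seals_swimmer",
    "seals_walker",
]

def cli_command(algo: str, env: str) -> str:
    """Build the benchmarking command for one algo/env pair."""
    if algo not in SCRIPT_FOR_ALGO:
        raise ValueError(f"unknown algo: {algo}")
    script = SCRIPT_FOR_ALGO[algo]
    return f"python -m imitation.scripts.{script} {algo} with {algo}_{env}"

print(cli_command("dagger", "seals_half_cheetah"))
# -> python -m imitation.scripts.train_imitation dagger with dagger_seals_half_cheetah
```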

## Python

```python
from imitation.scripts.<train_script> import <train_ex>
<train_ex>.run(command_name="<algo>", named_configs=["<algo>_<env>"])
```

# Tuning Hyperparameters

The hyperparameters of any algorithm in imitation can be tuned using `src/imitation/scripts/tuning.py`. The benchmarking hyperparameter configs were generated by tuning the hyperparameters using the search space defined in `scripts/config/tuning.py`.

The tuning script proceeds in two phases:
1. Tune the hyperparameters using the search space provided.
2. Re-evaluate the best hyperparameter config found in the first phase (selected by maximum mean return) on a separate set of seeds, and report the mean and standard deviation of these trials.
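The two phases above can be sketched in plain Python. This is a toy stand-in, not the real implementation: random search over a hypothetical search space and a fake `evaluate` function replace the actual Optuna-based search and training runs:

```python
import random
import statistics

# Hypothetical search space; the real ones live in scripts/config/tuning.py.
SEARCH_SPACE = {"lr": [1e-4, 3e-4, 1e-3], "batch_size": [32, 64, 128]}

def evaluate(config, seed):
    """Toy stand-in for a training run that reports mean_return."""
    rng = random.Random((hash(frozenset(config.items())) ^ seed) & 0xFFFF)
    return rng.uniform(0, 100)

def tune(n_trials=20, tuning_seeds=(0, 1, 2), eval_seeds=(100, 101, 102, 103, 104)):
    rng = random.Random(0)
    # Phase 1: search for the config with the best mean return on the tuning seeds.
    best_config, best_return = None, float("-inf")
    for _ in range(n_trials):
        config = {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}
        mean_return = statistics.mean(evaluate(config, s) for s in tuning_seeds)
        if mean_return > best_return:
            best_config, best_return = config, mean_return
    # Phase 2: re-evaluate the best config on a separate set of seeds.
    returns = [evaluate(best_config, s) for s in eval_seeds]
    return best_config, statistics.mean(returns), statistics.stdev(returns)

config, mean, std = tune()
print(config, round(mean, 1), round(std, 1))
```

Re-evaluating on fresh seeds (phase 2) matters because the config that won phase 1 is selected on a maximum, which is an optimistically biased estimate of its true performance.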

To use it with the default search space:

```bash
python -m imitation.scripts.tuning with <algo> 'parallel_run_config.base_named_configs=["<env>"]'
```

In this command:
- `<algo>` provides the default search space and settings for the specific algorithm, as defined in `scripts/config/tuning.py`.
- `<env>` sets the environment to tune the algorithm in. Environments are defined in the algo-specific `scripts/config/train_[adversarial|imitation|preference_comparisons|rl].py` files. For the already-tuned environments, use the `<algo>_<env>` named configs here.
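For example, a concrete instantiation of the command might look like the following (a sketch: `dagger` and `seals_half_cheetah` are assumed named configs per the text above, and the guard only makes the snippet degrade gracefully when `imitation` is not installed):

```shell
# Tune DAgger on seals_half_cheetah, if the imitation package is available.
if command -v python >/dev/null && python -c "import imitation" 2>/dev/null; then
  python -m imitation.scripts.tuning with dagger \
    'parallel_run_config.base_named_configs=["seals_half_cheetah"]'
  status="ran"
else
  status="skipped: imitation not installed"
fi
echo "tuning $status"
```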

See the documentation of `scripts/tuning.py` and `scripts/parallel.py` for many other arguments that can be provided through the command line to change the tuning behavior.