# DualDICE

Code for DualDICE, as described in "DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections" by Ofir Nachum, Yinlam Chow, Bo Dai, and Lihong Li.

Paper available at https://arxiv.org/abs/1906.04733
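At a high level, DualDICE learns correction ratios zeta(s, a) ≈ d^pi(s, a) / d^D(s, a) between the target policy's discounted stationary distribution and the distribution of the off-policy data; the policy's (normalized) value is then estimated by averaging rewards reweighted by these ratios. A minimal sketch of that final estimation step, assuming a learned `zeta` function and flattened transition arrays (placeholder interfaces, not this repo's API):

```python
import numpy as np

def off_policy_value_estimate(zeta, states, actions, rewards):
    """Sketch: estimate the normalized policy value E_{d^pi}[r].

    `zeta(s, a)` is assumed to approximate the DualDICE correction
    d^pi(s, a) / d^D(s, a); `states`, `actions`, `rewards` are
    transitions sampled from the off-policy dataset D.
    """
    weights = np.array([zeta(s, a) for s, a in zip(states, actions)])
    # Reweighting rewards by the corrections turns an average under the
    # data distribution d^D into an estimate of the average under d^pi.
    return np.mean(weights * np.asarray(rewards))
```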

If you use this codebase for your research, please cite the paper:

```
@article{nachum2019dualdice,
  title={DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections},
  author={Nachum, Ofir and Chow, Yinlam and Dai, Bo and Li, Lihong},
  journal={NeurIPS},
  year={2019}
}
```

## Basic Commands

Run DualDICE on a tabular version of GridWalk:

```
python -m dual_dice.run --logtostderr --env_name=grid --tabular_obs --tabular_solver
```

Run DualDICE on a non-tabular version of GridWalk:

```
python -m dual_dice.run --logtostderr --env_name=grid --notabular_obs --notabular_solver \
    --nu_learning_rate=0.0001 --zeta_learning_rate=0.001
```

## Additional Information

When tuning DualDICE, keep the following in mind:

- Optimizing `nu` and `zeta` simultaneously can be unstable. Try the `--deterministic_env` flag, which trains `nu` on its own (without applying Fenchel duality).
- A lower `gamma` can make optimization easier.
- Otherwise, the `nu` and `zeta` learning rates have the largest impact on performance (see the sweep sketch below).
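Because the learning rates dominate performance, a small grid sweep over them is often the quickest way to tune. A sketch, assuming the flags shown above and illustrative values:

```python
import itertools
import subprocess

# Illustrative grid over the two most impactful hyperparameters.
nu_lrs = [1e-4, 3e-4, 1e-3]
zeta_lrs = [1e-4, 3e-4, 1e-3]

for nu_lr, zeta_lr in itertools.product(nu_lrs, zeta_lrs):
    # Same invocation as the non-tabular GridWalk command above.
    subprocess.run([
        "python", "-m", "dual_dice.run", "--logtostderr",
        "--env_name=grid", "--notabular_obs", "--notabular_solver",
        f"--nu_learning_rate={nu_lr}",
        f"--zeta_learning_rate={zeta_lr}",
    ], check=True)
```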

When evaluating DualDICE, keep in mind that the printed Target (oracle) rewards are themselves an estimate! Ideally they would be computed with infinite `num_trajectories` and `max_trajectory_length`; in practice, set both to a large value (e.g., 1000) for a better estimate.
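For intuition, the oracle target is just a Monte Carlo average of discounted returns under the target policy, so larger `num_trajectories` and `max_trajectory_length` shrink its variance and truncation bias. A generic sketch, assuming placeholder `env`/`policy` interfaces (not this repo's classes):

```python
import numpy as np

def monte_carlo_value(env, policy, gamma,
                      num_trajectories=1000, max_trajectory_length=1000):
    """Sketch: Monte Carlo estimate of the discounted policy value.

    Both limits are necessarily finite; larger values approximate the
    infinite-horizon oracle target more closely.
    """
    returns = []
    for _ in range(num_trajectories):
        state = env.reset()
        total, discount = 0.0, 1.0
        for _ in range(max_trajectory_length):
            action = policy(state)
            state, reward, done = env.step(action)  # placeholder interface
            total += discount * reward
            discount *= gamma
            if done:
                break
        returns.append(total)
    return np.mean(returns)
```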

This is a minimal version of the code. For example, continuous-action environments are not supported.