Runs reinforcement learning algorithms with parallel sampling and GPU training, if available. Highly modular (modifiable) and optimized codebase with functionality for launching large sets of parallel experiments locally on multi-GPU or many-core machines.
Based on accel_rl, which in turn was based on rllab (the logger is nearly a direct copy).
Follows the rllab interfaces: agents output `action, agent_info`, environments output `observation, reward, done, env_info`, but introduces a new object class, `namedarraytuple`, for easier organization (see `rlpyt/utils/collections.py`). This permits each output to be either an individual numpy array [torch tensor] or an arbitrary collection of numpy arrays [torch tensors], without changing interfaces. In general, agent inputs/outputs are torch tensors, and environment inputs/outputs are numpy arrays, with conversions handled automatically.
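A minimal sketch of how a `namedarraytuple` might be used, assuming it is importable from the module referenced above and that its constructor mirrors `collections.namedtuple`; the field names here are made up for illustration:

```python
import numpy as np
from rlpyt.utils.collections import namedarraytuple

# Hypothetical fields for illustration; actual agents define their own.
AgentInfo = namedarraytuple("AgentInfo", ["dist_info", "value"])

# Each field can hold an array (or a nested collection of arrays).
agent_info = AgentInfo(dist_info=np.zeros((5, 4)), value=np.zeros(5))

# Indexing / slicing is applied to every field, so code that consumes the
# structure does not need to know how many arrays it contains.
step_info = agent_info[2]    # AgentInfo with one time step from each field
first_two = agent_info[:2]   # AgentInfo with leading dimension 2 in each field
```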
Recurrent agents are supported: training batches are organized with leading dimensions [Time, Batch], and agents receive the previous action and previous reward as input, in addition to the observation.
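For intuition on the [Time, Batch] layout, here is a small sketch (shapes and variable names are made up for illustration, not rlpyt internals) of how a [T, B, ...] batch, together with the previous action and previous reward, could be fed to a recurrent torch module:

```python
import torch

# Illustrative shapes only: T time steps by B parallel environments.
T, B, obs_dim, act_dim = 20, 8, 10, 4

observation = torch.zeros(T, B, obs_dim)
prev_action = torch.zeros(T, B, act_dim)   # action from the previous time step
prev_reward = torch.zeros(T, B)            # reward from the previous time step

# A recurrent module can consume the whole [T, B, ...] batch at once, e.g. an
# LSTM that expects (seq_len, batch, features) input.
lstm = torch.nn.LSTM(obs_dim + act_dim + 1, hidden_size=32)
rnn_input = torch.cat(
    [observation, prev_action, prev_reward.unsqueeze(-1)], dim=-1)
output, (hn, cn) = lstm(rnn_input)
print(output.shape)  # torch.Size([20, 8, 32])
```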
Start from `rlpyt/experiments/scripts/atari/pg/launch/launch_atari_ff_a2c_cpu.py` as a complete launch script example, and follow the code backwards from there. :)
Multi-GPU training within one learning run is not implemented (see accel_rl for a hint of how it might be done, or it may be easier with PyTorch's data parallel functionality). Stacking multiple experiments per machine is more effective for multiple runs / variations.
A2C is the first algorithm in place. See accel_rl for similar implementations of other algorithms, including DQN+variants, which could be ported.
This package does not include its own visualization, as the logged data is compatible with previous editions (see above). For more features, use https://github.com/vitchyr/viskit.
- Install the anaconda environment appropriate for the machine.

      conda env create -f linux_[cpu|cuda9|cuda10].yml
      source activate rlpyt

- Either A) edit the PYTHONPATH to include the rlpyt directory, or B) install as an editable python package (a quick import check is sketched after this list).

      # A
      export PYTHONPATH=path_to_rlpyt:$PYTHONPATH

      # B
      pip install -e .

- Install any packages / files pertaining to desired environments. Atari is included.
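As a quick sanity check that either install option worked, the package should import from a directory outside the repository; a minimal check (the path in the comment is only an example):

```python
# Should print the package location (e.g. .../rlpyt/rlpyt/__init__.py)
# rather than raise ImportError.
import rlpyt
print(rlpyt.__file__)
```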
The class types perform the following roles:
- Runner - Connects the `sampler`, `agent`, and `algorithm`; manages the training loop and logging of diagnostics.
  - Sampler - Manages `agent` / `environment` interaction to collect training data, can initialize parallel workers.
    - Collector - Steps `environments` (and maybe operates `agent`) and records samples, attached to `sampler`.
      - Environment - The task to be learned.
        - Space - Interface specifications from `environment` to `agent`.
      - TrajectoryInfo - Diagnostics logged on a per-trajectory basis.
  - Agent - Chooses control action to the `environment` in `sampler`; trained by the `algorithm`. Interface to `model`.
    - Model - Neural network module, attached to the `agent`.
    - Distribution - Samples actions for stochastic `agents` and defines related formulas for use in loss function, attached to the `agent`.
  - Algorithm - Uses gathered samples to train the `agent` (e.g. defines a loss function and performs gradient descent).
    - Optimizer - Training update rule (e.g. Adam), attached to the `algorithm`.
    - OptimizationInfo - Diagnostics logged on a per-training batch basis.
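
To make the hierarchy concrete, below is a self-contained toy sketch of how these roles compose in a training loop. None of the classes here are rlpyt classes (the real interfaces live in the launch script referenced above); the sketch only mirrors the ownership structure described in the list.

```python
import numpy as np


class ToyEnvironment:                 # Environment: the task to be learned.
    def reset(self):
        return np.zeros(4)

    def step(self, action):
        # rllab-style interface: observation, reward, done, env_info.
        return np.zeros(4), 0.0, False, {}


class ToyAgent:                       # Agent: chooses actions (Model omitted here).
    def step(self, observation):
        action, agent_info = 0, {}    # rllab-style agent output
        return action, agent_info

    def update(self, n_samples):      # stand-in for a gradient update on the Model
        pass


class ToySampler:                     # Sampler: manages agent/environment interaction.
    def __init__(self, env, batch_T):
        self.env, self.batch_T = env, batch_T
        self.obs = env.reset()

    def collect(self, agent):
        samples = []
        for _ in range(self.batch_T):
            action, _ = agent.step(self.obs)
            self.obs, reward, done, _ = self.env.step(action)
            samples.append((self.obs, action, reward, done))
            if done:
                self.obs = self.env.reset()
        return samples


class ToyAlgorithm:                   # Algorithm: turns gathered samples into updates.
    def optimize(self, agent, samples):
        agent.update(n_samples=len(samples))


class ToyRunner:                      # Runner: owns the training loop and logging.
    def __init__(self, sampler, agent, algorithm, iterations):
        self.sampler, self.agent = sampler, agent
        self.algorithm, self.iterations = algorithm, iterations

    def train(self):
        for itr in range(self.iterations):
            samples = self.sampler.collect(self.agent)
            self.algorithm.optimize(self.agent, samples)
            print(f"iteration {itr}: collected {len(samples)} samples")


if __name__ == "__main__":
    runner = ToyRunner(ToySampler(ToyEnvironment(), batch_T=5),
                       ToyAgent(), ToyAlgorithm(), iterations=3)
    runner.train()
```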