RLs: Reinforcement Learning Algorithms Based on PyTorch.
This project provides SOTA and classic RL (reinforcement learning) algorithms for training agents that interact with Unity3D through ml-agents Release 18 or with Gym. The goal of this framework is to provide stable implementations of standard RL algorithms while enabling fast prototyping of new methods.
It aims to fill the need for a small, easily grokked codebase in which users can freely experiment with wild ideas (speculative research).
- Suitable for Windows, Linux, and OSX
- Close reimplementations of the original papers with competitive performance
- Reusable modules
- Clear hierarchical structure and easy code control
- Compatible with OpenAI Gym and Unity3D ML-Agents
- Restoring the training process from where it stopped, retraining on a new task, fine-tuning
- Using another training task's model as parameter initialization by specifying `--load`
This project supports:
- Unity3D ml-agents.
- Gym {MuJoCo, PyBullet, gym_minigrid}; for now only two data types are supported: `[Box, Discrete]`. 99.65% of Gym environment settings are supported (except `Blackjack-v0`, `KellyCoinflip-v0`, and `KellyCoinflipGeneralized-v0`). Parallel training with Gym environments is supported: just set `--copys` to the number of environment copies you want to run in parallel.
  - Discrete -> Discrete (observation type -> action type)
- Discrete -> Box
- Box -> Discrete
- Box -> Box
- Box/Discrete -> Tuple(Discrete, Discrete, Discrete)
- Multi-agent training: one group controls multiple agents.
- Multi-brain training: the brains' models must use the same algorithm or at least share the same learning progress (perStep or perEpisode).
- Multi-image input (ml-agents only). Images are resized to the same shape, e.g. `[84, 84, 3]`, before being stored in the replay buffer.
- Four types of replay buffer (the default is ER; see the sketch after this feature list):
- ER
- n-step ER
- Prioritized ER
- n-step Prioritized ER
- Noisy Net for better exploration.
- Intrinsic Curiosity Module (ICM) for almost all implemented off-policy algorithms.
- Parallel training across multiple scenes for Gym.
- Unified environment data format between ml-agents and Gym.
- Implementing another algorithm only requires writing a single file (algorithms share a similar structure).
- Many controllable factors and adjustable parameters
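To make the replay-buffer variants above more concrete, here is a minimal, hypothetical sketch of n-step accumulation in plain Python. The class and argument names are illustrative and are not the actual buffer classes shipped in `rls`; it only shows how 1-step transitions get folded into n-step returns before being stored for sampling.

```python
from collections import deque
import random


class NStepReplaySketch:
    """Illustrative n-step ER sketch -- NOT the project's actual buffer.

    A prioritized variant would additionally keep per-transition priorities
    (e.g. in a sum-tree) and sample proportionally to them.
    """

    def __init__(self, n=4, gamma=0.99, capacity=100_000):
        self.n, self.gamma = n, gamma
        self.pending = deque(maxlen=n)         # sliding window of 1-step transitions
        self.storage = deque(maxlen=capacity)  # plain experience-replay storage

    def add(self, s, a, r, s_next, done):
        self.pending.append((s, a, r, s_next, done))
        if len(self.pending) == self.n or done:
            # Fold the current window into one n-step transition for its oldest state.
            s0, a0, *_ = self.pending[0]
            ret, last_next, last_done = 0.0, s_next, done
            for i, (_, _, ri, ni, di) in enumerate(self.pending):
                ret += (self.gamma ** i) * ri
                last_next, last_done = ni, di
                if di:
                    break
            self.storage.append((s0, a0, ret, last_next, last_done))
        if done:
            # A full implementation would also flush the remaining, shorter
            # transitions in the window here; this sketch simply drops them.
            self.pending.clear()

    def sample(self, batch_size):
        return random.sample(list(self.storage), batch_size)
```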
method 1:
conda env create -f environment.yaml
method 2:
$ git clone https://github.com/StepNeverStop/RLs.git
$ cd RLs
$ conda create -n rls python=3.6
$ conda activate rls
# Windows
$ pip install -e .[windows]
# Linux or Mac OS
$ pip install -e .
If using ml-agents:
$ pip install -e .[unity]
If using atari:
$ pip install -e .[atari]
You can pull the pre-built Docker image:
$ docker pull keavnn/rls:latest
For now, these algorithms are available:
- Single-agent training algorithms (algorithms that natively support only continuous action spaces, e.g. DDPG, use the Gumbel-Softmax trick for their discrete versions; see the sketch after this list):
- Q-Learning, Sarsa, Expected Sarsa
- Policy Gradient, PG
- Actor Critic, AC
- Advantage Actor Critic, A2C
- Trust Region Policy Optimization, TRPO
- Proximal Policy Optimization, PPO, DPPO
- Deterministic Policy Gradient, DPG
- Deep Deterministic Policy Gradient, DDPG
- Soft Actor Critic, SAC, Discrete SAC
- Tsallis Actor Critic, TAC
- Twin Delayed Deep Deterministic Policy Gradient, TD3
- Deep Q-learning Network, DQN, 2013, 2015
- Double Deep Q-learning Network, DDQN
- Dueling Double Deep Q-learning Network, DDDQN
- Deep Recurrent Q-learning Network, DRQN
- Deep Recurrent Double Q-learning, DRDQN
- Categorical 51-Atom DQN, C51
- Quantile Regression DQN, QR-DQN
- Implicit Quantile Networks, IQN
- Rainbow DQN
- MaxSQN
- Soft Q-Learning, SQL
- Bootstrapped DQN
- Averaged DQN
- Contrastive Unsupervised RL, CURL
- Hierarchical training algorithms:
- Multi-agent training algorithms (Unity3D only; visual input is not supported yet):
- Safe Reinforcement Learning algorithms (not stable yet):
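As noted above, continuous-control algorithms such as DDPG get their discrete-action variants via the Gumbel-Softmax trick. The following is a generic PyTorch illustration of that trick, not the project's own implementation; names such as `logits` and `tau` are placeholders, and PyTorch also ships this functionality as `torch.nn.functional.gumbel_softmax`.

```python
import torch
import torch.nn.functional as F


def gumbel_softmax_sample(logits: torch.Tensor, tau: float = 1.0, hard: bool = True):
    """Draw an (approximately) one-hot, differentiable action sample.

    Generic illustration of the Gumbel-Softmax trick used to give
    continuous-action algorithms (e.g. DDPG) a discrete-action variant;
    the project's own implementation may differ.
    """
    # Sample Gumbel(0, 1) noise and perturb the logits.
    gumbel = -torch.log(-torch.log(torch.rand_like(logits) + 1e-20) + 1e-20)
    y_soft = F.softmax((logits + gumbel) / tau, dim=-1)
    if hard:
        # Straight-through estimator: the forward pass uses the one-hot argmax,
        # the backward pass uses the gradient of the soft sample.
        index = y_soft.argmax(dim=-1, keepdim=True)
        y_hard = torch.zeros_like(y_soft).scatter_(-1, index, 1.0)
        return y_hard + (y_soft - y_soft.detach())
    return y_soft
```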
| Algorithms (29) | Discrete | Continuous | Image | RNN | Command parameter |
|---|:-:|:-:|:-:|:-:|:-:|
| Q-Learning/Sarsa/Expected Sarsa | √ | | | | qs |
| CEM | √ | √ | | | cem |
| PG | √ | √ | √ | | pg |
| AC | √ | √ | √ | √ | ac |
| A2C | √ | √ | √ | | a2c |
| TRPO | √ | √ | √ | | trpo |
| PPO | √ | √ | √ | | ppo |
| DQN | √ | | √ | √ | dqn |
| Double DQN | √ | | √ | √ | ddqn |
| Dueling Double DQN | √ | | √ | √ | dddqn |
| Averaged DQN | √ | | √ | √ | averaged_dqn |
| Bootstrapped DQN | √ | | √ | √ | bootstrappeddqn |
| Soft Q-Learning | √ | | √ | √ | sql |
| C51 | √ | | √ | √ | c51 |
| QR-DQN | √ | | √ | √ | qrdqn |
| IQN | √ | | √ | √ | iqn |
| Rainbow | √ | | √ | √ | rainbow |
| DPG | √ | √ | √ | √ | dpg |
| DDPG | √ | √ | √ | √ | ddpg |
| PD-DDPG | √ | √ | √ | √ | pd_ddpg |
| TD3 | √ | √ | √ | √ | td3 |
| SAC (with V network) | √ | √ | √ | √ | sac_v |
| SAC | √ | √ | √ | √ | sac |
| TAC | sac | √ | √ | √ | tac |
| MaxSQN | √ | | √ | √ | maxsqn |
| OC | √ | √ | √ | √ | oc |
| AOC | √ | √ | √ | √ | aoc |
| PPOC | √ | √ | √ | √ | ppoc |
| IOC | √ | √ | √ | √ | ioc |
| HIRO | √ | √ | | | hiro |
| CURL | √ | √ | √ | | curl |
| IQL | √ | | | √ | iql |
| VDN | √ | | | √ | vdn |
| MADDPG | √ | √ | | √ | maddpg |
"""
Usage:
python [options]
Options:
-h, --help                  show help info
-a, --algorithm=<name>      specify the training algorithm [default: ppo]
-c, --copys=<n>             number of environment copies that collect data in parallel [default: 1]
-d, --device=<str>          specify the device on which torch.Tensor operations run [default: None]
-e, --env=<name>            specify the environment name [default: CartPole-v0]
-f, --file-name=<file>      specify the path of the built UNITY3D training environment [default: None]
-g, --graphic               whether to show the graphical interface when using UNITY3D [default: False]
-i, --inference             run inference with the trained model rather than training policies [default: False]
-p, --platform=<str>        specify the platform of the training environment [default: gym]
-l, --load=<name>           specify the name of the pre-trained model that needs to be loaded [default: None]
-m, --models=<n>            specify the number of trials that use different random seeds [default: 1]
-n, --name=<name>           specify the name of this training task [default: None]
-r, --rnn                   whether to use an RNN [GRU, LSTM, ...] [default: False]
-s, --save-frequency=<n>    specify the interval for saving model checkpoints [default: None]
-t, --train-step=<n>        specify the number of training steps for optimizing the policy model [default: None]
-u, --unity                 whether to train with the UNITY3D editor [default: False]
--port=<n>                  specify the port for communicating with the UNITY3D training environment [default: 5005]
--apex=<str>                i.e. "learner"/"worker"/"buffer"/"evaluator" [default: None]
--config-file=<file>        specify the path of the training configuration file [default: None]
--store-dir=<file>          specify the directory that stores models, logs, and other data [default: None]
--seed=<n>                  specify the random seed for the random, numpy, and pytorch modules [default: 42]
--env-seed=<n>              specify the environment random seed [default: 42]
--max-step=<n>              specify the maximum number of steps per episode [default: None]
--train-episode=<n>         specify the maximum number of training episodes [default: None]
--train-frame=<n>           specify the maximum number of training steps interacting with the environment [default: None]
--prefill-steps=<n>         specify the number of experiences to collect before training starts, used for off-policy algorithms [default: None]
--prefill-choose            whether to choose actions with the model rather than randomly during prefill [default: False]
--render-episode=<n>        specify from which episode to render the gym environment's graphical interface [default: None]
--info=<str>                additional information describing this training task [default: None]
--hostname                  whether to concatenate the hostname with the training name [default: False]
--no-save                   do not save models/logs/summaries while training [default: False]
Example:
python run.py
python run.py -p gym -a dqn -e CartPole-v0 -c 12 -n dqn_cartpole --no-save
python run.py -p unity -a ppo -n run_with_unity
python run.py -p unity --file-name /root/env/3dball.app -a sac -n run_with_execution_file
"""
If you specify gym, unity, and an environment executable file path simultaneously, the priority is: gym > unity > unity_env.
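The Usage/Options block above follows docopt conventions. Assuming the entry point parses it with docopt (an assumption about the internals of `run.py`, not something stated here), the flags end up in a plain dictionary keyed by option name, roughly like this:

```python
# Hypothetical sketch of how the docstring above could be parsed with docopt;
# the project's actual entry point may differ.
from docopt import docopt

if __name__ == "__main__":
    # __doc__ is the module docstring, i.e. the Usage/Options block above.
    options = docopt(__doc__)
    # docopt returns a dict keyed by option name; flag options arrive as
    # booleans, valued options as strings, so numbers need explicit casting.
    algo = options["--algorithm"]        # e.g. "ppo"
    n_copies = int(options["--copys"])   # e.g. 1
    no_save = options["--no-save"]       # True or False
```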
- Logs, models, training parameter configurations, and data are stored in `C:\RLData` on Windows, or `$HOME/RLData` on Linux/OSX.
- You may need to run with `su` or `sudo` on Linux/OSX.
- The record directory format is `RLData/Environment/Algorithm/Behavior name(for ml-agents)/Training name/config&log&model`.
- Make sure the number of brains is greater than 1 when specifying `ma*` algorithms such as MADDPG.
- Multi-agent algorithms don't support visual input or PER for now.
- Implementing a new algorithm takes 3 steps (see the hypothetical skeleton below):
  1. Write a `.py` file in the `rls/algos/{single/multi/hierarchical}` directory and make the policy inherit from `Policy`, `On_Policy`, `Off_Policy`, or another super-class defined in `rls/algos/base`.
  2. Write the default configuration in `rls/configs/algorithms.yaml`.
  3. Register the new algorithm in the `algos` dictionary in `rls/algos/__init__.py`, making sure the registered name matches the name of the algorithm class.
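To make these steps concrete, here is a hypothetical skeleton. File, class, and method names are illustrative only; the real base classes in `rls/algos/base` and the `algos` dictionary in `rls/algos/__init__.py` define their own signatures and entry format, so adapt accordingly.

```python
# rls/algos/single/my_algo.py -- hypothetical new algorithm file
from rls.algos.base.off_policy import Off_Policy  # assumed import path


class MY_ALGO(Off_Policy):
    """Bare-bones off-policy skeleton; adapt to the actual base-class API."""

    def __init__(self, envspec, lr=1e-3, **kwargs):
        super().__init__(envspec=envspec, **kwargs)
        self.lr = lr  # default value belongs in rls/configs/algorithms.yaml

    def choose_action(self, obs, evaluation=False):
        ...  # map observations to actions

    def learn(self, **kwargs):
        ...  # sample from the replay buffer and update the networks


# rls/algos/__init__.py -- register the algorithm so `-a my_algo` can find it
# (the exact entry format should follow the existing entries in that file)
from rls.algos.single.my_algo import MY_ALGO

algos.update({'my_algo': MY_ALGO})
```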
- Set algorithms' hyper-parameters in `rls/configs/algorithms.yaml`.
- Set the default training configuration in `config.yaml`.
- Change the neural network structure in `rls/nn/models.py`.
- MADDPG is only suitable for Unity3D ML-Agents for now.
Algorithms not yet implemented:
- DARQN
- ACER
- Ape-X
- R2D2
- ACKTR
If using this repository for your research, please cite:
@misc{RLs,
author = {Keavnn},
title = {RLs: Reinforcement Learning research framework for Unity3D and Gym},
year = {2019},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/StepNeverStop/RLs}},
}
If you have any questions about or find any errors in this project, please let me know here.