RLs: Reinforcement Learning Algorithm Based On PyTorch.

RLs

This project includes SOTA or classic reinforcement learning (single and multi-agent) algorithms used for training agents by interacting with Unity through ml-agents Release 18 or with gym.

About

The goal of this framework is to provide stable implementations of standard RL algorithms and simultaneously enable fast prototyping of new methods. It aims to fill the need for a small, easily grokked codebase in which users can freely experiment with wild ideas (speculative research).

Characteristics

This project supports:

Windows, Linux, and OSX
Implementing a new algorithm in only 3 steps:
1. policy: write a .py file in the rls/algorithms/{single/multi} directory and make the policy inherit from a super-class defined in rls/algorithms/base
2. config: write the default configuration in rls/configs/algorithms.yaml
3. register: register the new algorithm in rls/algorithms/__init__.py
Adapting to a new training environment in only 3 steps:
1. wrapper: write environment wrappers in the rls/envs/{new platform} directory and make them inherit from a super-class defined in rls/envs/env_base.py
2. config: write the default configuration in rls/configs/{new platform}
3. register: register the new environment platform in rls/envs/__init__.py
Compatible with several different environment platforms
- Unity3D ml-agents.
- PettingZoo
- Gym{MuJoCo(v2.0.2.13), PyBullet, gym_minigrid}. For now only two space types are compatible: [Box, Discrete]. Parallel training with gym envs is supported; just set --copys to the number of agents you want to train in parallel.
  - Discrete -> Discrete (observation type -> action type)
  - Discrete -> Box
  - Box -> Discrete
  - Box -> Box
  - Box/Discrete -> Tuple(Discrete, Discrete, Discrete)
Multi-Agent training.
Multi-Image input. Images are resized to the same shape, like [84, 84, 3], before being stored in the replay buffer.
Four types of Replay Buffer (the default is ER):
- ER
- Prioritized ER
Noisy Net for better exploration.
Intrinsic Curiosity Module for almost all off-policy algorithms implemented.
Parallel training of multiple scenes for Gym
Unified data format
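
As a rough illustration of the three steps for adding an algorithm listed above (policy, config, register), the sketch below mimics the pattern in plain Python. Every name in it (`BasePolicy`, `ALGORITHM_REGISTRY`, `DEFAULT_CONFIGS`, `register`, `MyAlgo`, `make_policy`) is hypothetical and is not taken from the RLs codebase; the real super-classes live in rls/algorithms/base and the real registration happens in rls/algorithms/__init__.py, and their interfaces may well differ.

```python
# Hypothetical sketch of the policy / config / register pattern.
# None of these names come from the RLs source code.
from typing import Callable, Dict, Type


class BasePolicy:
    """Stand-in for a super-class such as those in rls/algorithms/base."""

    def __init__(self, **config):
        self.config = config

    def select_action(self, obs):
        raise NotImplementedError


# Step 3 stand-in: a registry mapping command-line names to policy classes.
ALGORITHM_REGISTRY: Dict[str, Type[BasePolicy]] = {}

# Step 2 stand-in: default hyperparameters, as a YAML config file might hold them.
DEFAULT_CONFIGS: Dict[str, dict] = {"my_algo": {"lr": 1e-3, "gamma": 0.99}}


def register(name: str) -> Callable[[Type[BasePolicy]], Type[BasePolicy]]:
    """Class decorator that records a policy class under the given name."""

    def decorator(cls: Type[BasePolicy]) -> Type[BasePolicy]:
        ALGORITHM_REGISTRY[name] = cls
        return cls

    return decorator


# Step 1 stand-in: the new policy inherits from the base class.
@register("my_algo")
class MyAlgo(BasePolicy):
    def select_action(self, obs):
        # A real policy would run a network here; this one always picks action 0.
        return 0


def make_policy(name: str, **overrides) -> BasePolicy:
    """Merge the default config with user overrides and build the policy."""
    config = {**DEFAULT_CONFIGS.get(name, {}), **overrides}
    return ALGORITHM_REGISTRY[name](**config)
```

With this layout, `make_policy("my_algo", lr=0.01)` would return a `MyAlgo` instance whose config keeps the default `gamma` and overrides `lr`.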

Installation

method 1:

$ git clone https://github.com/StepNeverStop/RLs.git
$ cd RLs
$ conda create -n rls python=3.8
$ conda activate rls
# Windows
$ pip install -e .[windows]
# Linux or Mac OS
$ pip install -e .

method 2:

conda env create -f environment.yaml

If using ml-agents:

$ pip install -e .[unity]

You can pull the pre-built Docker image:

$ docker pull keavnn/rls:latest

If you want to send a PR, please format all code files first:

$ pip install -e .[pr]
$ python auto_format.py -d ./

Implemented Algorithms

For now, these algorithms are available:

Multi-Agent training algorithms:
- Independent-SARL, i.e. IQL, I-DQN, etc.
- Value-Decomposition Networks, VDN
- Monotonic Value Function Factorisation Networks, QMIX
- Multi-Agent Deep Deterministic Policy Gradient, MADDPG
Single-Agent training algorithms (some algorithms that only support continuous action spaces, e.g. DDPG, use the Gumbel-softmax trick to implement discrete versions):
- Policy Gradient, PG
- Actor Critic, AC
- Synchronous Advantage Actor Critic, A2C
- 💥Proximal Policy Optimization, PPO, DPPO
- Trust Region Policy Optimization, TRPO
- Natural Policy Gradient, NPG
- Deterministic Policy Gradient, DPG
- Deep Deterministic Policy Gradient, DDPG
- 🔥Soft Actor Critic, SAC, Discrete SAC
- Tsallis Actor Critic, TAC
- 🔥Twin Delayed Deep Deterministic Policy Gradient, TD3
- Deep Q-learning Network, DQN, 2013, 2015
- Double Deep Q-learning Network, DDQN
- Dueling Double Deep Q-learning Network, DDDQN
- Deep Recurrent Q-learning Network, DRQN
- Deep Recurrent Double Q-learning, DRDQN
- Categorical DQN, C51
- Quantile Regression DQN, QR-DQN
- Implicit Quantile Networks, IQN
- Rainbow DQN
- MaxSQN
- Soft Q-Learning, SQL
- Bootstrapped DQN
- Averaged DQN
- Hierarchical training algorithms:
  - Option-Critic, OC
  - Asynchronous Advantage Option-Critic, A2OC
  - PPO Option-Critic, PPOC
  - Interest-Option-Critic, IOC
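
The Gumbel-softmax trick mentioned above for discrete versions of continuous-action algorithms can be sketched with the standard library alone. This is a generic illustration of the technique, not the project's PyTorch implementation; the function name `gumbel_softmax` and the default temperature are choices made for this example.

```python
import math
import random
from typing import List, Optional


def gumbel_softmax(logits: List[float], temperature: float = 1.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Return a relaxed (soft) one-hot sample over the given logits."""
    rng = rng or random.Random()
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise: g = -log(-log(u)) with u ~ Uniform(0, 1).
        # Clamp u away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        g = -math.log(-math.log(u))
        noisy.append((logit + g) / temperature)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    m = max(noisy)
    exps = [math.exp(x - m) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]


# Example: a soft action over three discrete choices.
soft_action = gumbel_softmax([1.0, 2.0, 3.0], temperature=0.5,
                             rng=random.Random(0))
```

Lower temperatures push the output toward a one-hot vector, while higher temperatures make it more uniform. In an autodiff framework the same computation stays differentiable with respect to the logits, which is what lets a deterministic-policy method treat a discrete action as a continuous vector.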

Algorithms	Discrete	Continuous	Image	RNN	Command parameter
PG	✓	✓	✓	✓	pg
AC	✓	✓	✓	✓	ac
A2C	✓	✓	✓	✓	a2c
NPG	✓	✓	✓	✓	npg
TRPO	✓	✓	✓	✓	trpo
PPO	✓	✓	✓	✓	ppo
DQN	✓		✓	✓	dqn
Double DQN	✓		✓	✓	ddqn
Dueling Double DQN	✓		✓	✓	dddqn
Averaged DQN	✓		✓	✓	averaged_dqn
Bootstrapped DQN	✓		✓	✓	bootstrappeddqn
Soft Q-Learning	✓		✓	✓	sql
C51	✓		✓	✓	c51
QR-DQN	✓		✓	✓	qrdqn
IQN	✓		✓	✓	iqn
Rainbow	✓		✓	✓	rainbow
DPG	✓	✓	✓	✓	dpg
DDPG	✓	✓	✓	✓	ddpg
TD3	✓	✓	✓	✓	td3
SAC(has V network)	✓	✓	✓	✓	sac_v
SAC	✓	✓	✓	✓	sac
TAC	sac	✓	✓	✓	tac
MaxSQN	✓		✓	✓	maxsqn
OC	✓	✓	✓	✓	oc
AOC	✓	✓	✓	✓	aoc
PPOC	✓	✓	✓	✓	ppoc
IOC	✓	✓	✓	✓	ioc
VDN	✓		✓	✓	vdn
QMIX	✓		✓	✓	qmix
MADDPG	✓	✓	✓	✓	maddpg

Getting started

usage: run.py [-h] [-c COPYS] [--seed SEED] [-r] [-p {gym,unity,pettingzoo}]
              [-a {pg,npg,trpo,ppo,a2c,aoc,ppoc,ac,dpg,ddpg,td3,sac_v,sac,tac,dqn,ddqn,dddqn,averaged_dqn,c51,qrdqn,rainbow,iqn,maxsqn,sql,bootstrappeddqn,oc,ioc,maddpg,vdn,qmix}]
              [-i] [-l LOAD_PATH] [-m MODELS] [-n NAME] [-s SAVE_FREQUENCY] [--config-file CONFIG_FILE] [--store-dir STORE_DIR] [--episode-length EPISODE_LENGTH]
              [--prefill-steps PREFILL_STEPS] [--hostname] [--info INFO] [-e ENV_NAME] [-f FILE_NAME] [--no-save] [-d DEVICE] [-t MAX_TRAIN_STEP]

optional arguments:
  -h, --help            show this help message and exit
  -c COPYS, --copys COPYS
                        nums of environment copys that collect data in parallel
  --seed SEED           specify the random seed of module random, numpy and pytorch
  -r, --render          whether render game interface
  -p {gym,unity,pettingzoo}, --platform {gym,unity,pettingzoo}
                        specify the platform of training environment
  -a {pg,npg,trpo,ppo,a2c,aoc,ppoc,ac,dpg,ddpg,td3,sac_v,sac,tac,dqn,ddqn,dddqn,averaged_dqn,c51,qrdqn,rainbow,iqn,maxsqn,sql,bootstrappeddqn,oc,ioc,maddpg,vdn,qmix}, --algorithm {pg,npg,trpo,ppo,a2c,aoc,ppoc,ac,dpg,ddpg,td3,sac_v,sac,tac,dqn,ddqn,dddqn,averaged_dqn,c51,qrdqn,rainbow,iqn,maxsqn,sql,bootstrappeddqn,oc,ioc,maddpg,vdn,qmix}
                        specify the training algorithm
  -i, --inference       inference the trained model, not train policies
  -l LOAD_PATH, --load-path LOAD_PATH
                        specify the name of pre-trained model that need to load
  -m MODELS, --models MODELS
                        specify the number of trails that using different random seeds
  -n NAME, --name NAME  specify the name of this training task
  -s SAVE_FREQUENCY, --save-frequency SAVE_FREQUENCY
                        specify the interval that saving model checkpoint
  --config-file CONFIG_FILE
                        specify the path of training configuration file
  --store-dir STORE_DIR
                        specify the directory that store model, log and others
  --episode-length EPISODE_LENGTH
                        specify the maximum step per episode
  --prefill-steps PREFILL_STEPS
                        specify the number of experiences that should be collected before start training, use for off-policy algorithms
  --hostname            whether concatenate hostname with the training name
  --info INFO           write another information that describe this training task
  -e ENV_NAME, --env-name ENV_NAME
                        specify the environment name
  -f FILE_NAME, --file-name FILE_NAME
                        specify the path of builded training environment of UNITY3D
  --no-save             specify whether save models/logs/summaries while training or not
  -d DEVICE, --device DEVICE
                        specify the device that operate Torch.Tensor
  -t MAX_TRAIN_STEP, --max-train-step MAX_TRAIN_STEP
                        specify the maximum training steps

Example:

python run.py
python run.py -p gym -a dqn -e CartPole-v0 -c 12 -n dqn_cartpole --no-save
python run.py -p unity -a ppo -n run_with_unity -c 1
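
To repeat a run over several seeds from a script, one option is to assemble the command line in Python. The sketch below uses only flags listed in the help text above; `build_command` is a hypothetical helper, and the seed values and task-name pattern are arbitrary choices for this example.

```python
import sys
from typing import List


def build_command(algorithm: str, env_name: str, seed: int,
                  copies: int = 1, save: bool = True) -> List[str]:
    """Build an argument list for run.py on the gym platform."""
    cmd = [
        sys.executable, "run.py",
        "-p", "gym",                 # training platform
        "-a", algorithm,             # training algorithm
        "-e", env_name,              # environment name
        "-c", str(copies),           # environment copies collecting data in parallel
        "--seed", str(seed),         # random seed
        "-n", f"{algorithm}_{env_name}_seed{seed}",  # name of this training task
    ]
    if not save:
        cmd.append("--no-save")      # skip saving models/logs/summaries
    return cmd


# One command per seed, mirroring the DQN CartPole example above.
commands = [build_command("dqn", "CartPole-v0", seed, copies=12, save=False)
            for seed in (0, 1, 2)]
```

Each entry in `commands` can then be passed to `subprocess.run(cmd, check=True)` from the repository root.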

Giving credit

If using this repository for your research, please cite:

@misc{RLs,
  author = {Keavnn},
  title = {RLs: Reinforcement Learning research framework for Unity3D and Gym},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/StepNeverStop/RLs}},
}

Issues

If you have any questions about this project or find errors in it, please let me know here.
