This repository uses the following Python dependencies unless stated otherwise:
Enter the folder of the algorithm you want to use and run its training script to train from scratch:
For more details, please check the file in the corresponding algorithm folder.
- 1. Q-learning
- 2.1 Duel Double DQN
- 2.2 Noisy Duel DDQN on Atari Game
- 2.3 Prioritized Experience Replay (PER) DQN/DDQN
- 2.4 Categorical DQN (C51)
- 2.5 NoisyNet DQN
- 3.1 Proximal Policy Optimization (PPO) for Discrete Action Space
- 3.2 Proximal Policy Optimization (PPO) for Continuous Action Space
- 4.1 Deep Deterministic Policy Gradient (DDPG)
- 4.2 Twin Delayed Deep Deterministic Policy Gradient (TD3)
- 5.1 Soft Actor Critic (SAC) for Discrete Action Space
- 5.2 Soft Actor Critic (SAC) for Continuous Action Space
- 6. Actor-Sharer-Learner (ASL)
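
As a rough illustration of the simplest entry in the list above (Q-learning), here is a minimal tabular sketch. It is not this repository's implementation: the 1-D chain environment, the function name `train_q_learning`, and all hyperparameter values are assumptions chosen only to keep the example self-contained and dependency-free.

```python
import random


def train_q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
                     epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D chain.

    The agent starts at state 0 and receives a reward of 1 when it
    reaches the last state, which ends the episode.
    """
    rng = random.Random(seed)
    n_actions = 2  # 0 = move left, 1 = move right
    q = [[0.0] * n_actions for _ in range(n_states)]

    def step(state, action):
        # Deterministic transition; moving left at state 0 stays at state 0.
        next_state = max(0, state - 1) if action == 0 else state + 1
        done = next_state == n_states - 1
        return next_state, (1.0 if done else 0.0), done

    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy action selection with random tie-breaking.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(n_actions)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Q-learning target: bootstrap from the greedy value of the
            # next state, or use the reward alone at a terminal transition.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train_q_learning()
# Greedy action for each non-terminal state.
greedy_policy = [max(range(2), key=lambda a: q_table[s][a])
                 for s in range(len(q_table) - 1)]
```

The deep variants listed above replace the table `q` with a neural network, but the bootstrapped target has the same basic shape.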
- Isaac Gym (NVIDIA's physics simulation environment; GPU accelerated; very fast)
- Sparrow (lightweight simulator for mobile robots; DRL friendly)
- ROS (popular and comprehensive physics simulator for robots; heavy and slow)
- Webots (popular physics simulator for robots; faster than ROS; less realistic)
- Envpool (Fast Vectorized Env)
- Other Popular Envs
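- Reinforcement Learning: An Introduction --Richard S. Sutton
- Deep Learning from Scratch: Theory and Implementation with Python (深度学习入门:基于Python的理论与实现) --Koki Saitoh (斋藤康毅)
- RL Courses (bilibili) --Hung-yi Lee (李宏毅)
- RL Courses (YouTube) --Hung-yi Lee (李宏毅)
- UCL Course on RL --David Silver
- Hands-on Reinforcement Learning (动手强化学习) --Shanghai Jiao Tong University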
- OpenAI Spinning Up
- Policy Gradient Theorem --Cangxi
- Policy Gradient Algorithms --Lilian
- Theorem of PPO
- The 37 Implementation Details of Proximal Policy Optimization
- Prioritized Experience Replay
- Soft Actor Critic
- A (Long) Peek into Reinforcement Learning --Lilian
- Introduction to TD3
NoisyNet DQN: Fortunato M, Azar M G, Piot B, et al. Noisy networks for exploration[J]. arXiv preprint arXiv:1706.10295, 2017.
Result figures (images not preserved) covered these environments: CartPole, LunarLander, Pong, Enduro, Pendulum, and LunarLanderContinuous.