firechecking / CleanRL


Reinforcement learning algorithms and use cases, including DQN, PG, A3C, and PPO, plus RLHF and AlphaZero implementations. Designed for clarity, ease of use, and educational purposes.

CleanRL
Examples/super_mario
Experiments
.gitignore
LICENSE
README.md
requirements.txt


CleanRL

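Project Features

Minimal dependencies: beyond the Python standard library and basic PyTorch operations, no other third-party libraries are used.
Theory plus code: covers the principles and derivations behind nearly all of reinforcement learning, and implements the common reinforcement learning algorithms from scratch.
Practical cases: includes popular, up-to-date practical cases such as AlphaZero and RLHF.
Concise code: draws on the original papers and several open-source implementations, simplifying the code as far as possible to lower the learning barrier while keeping it correct.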

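Contributions to the code and tutorials are welcome.

Written tutorials:

Zhihu: Implementing Reinforcement Learning, RLHF, and AlphaZero from Scratch

Published written tutorials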

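Value-Based Reinforcement Learning 1
- Monte Carlo methods, temporal-difference methods, and the Bellman equation in reinforcement learning: comparison and code implementation
- Q-learning: principles and code implementation
- SARSA and SARSA(λ): principles and code implementation
- DQN: principles and code implementation
- DQN improvement 1: Replay Buffer, Fixed Q Target
- DQN improvement 2: Double DQN
- DQN improvement 3: Dueling DQN
- DQN improvement 4: Prioritized Replay Buffer
- Sum-Tree: principles and code implementation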
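
For readers who have not met the Sum-Tree before, here is a minimal sketch of the idea in plain Python. It is not taken from this repository: the class name `SumTree` and its methods `update`, `total`, `find`, and `sample` are hypothetical names chosen for illustration. Leaves store priorities, internal nodes store the sum of their children, so updating one priority and sampling an index in proportion to its priority both take O(log n) steps.

```python
import random


class SumTree:
    """Illustrative sum-tree: leaves hold priorities, internal nodes hold sums."""

    def __init__(self, capacity):
        self.capacity = capacity
        # Heap layout: node 1 is the root, nodes capacity..2*capacity-1 are leaves.
        self.tree = [0.0] * (2 * capacity)

    def update(self, index, priority):
        """Set the priority of leaf `index` and refresh the sums above it."""
        pos = index + self.capacity
        self.tree[pos] = priority
        pos //= 2
        while pos >= 1:
            self.tree[pos] = self.tree[2 * pos] + self.tree[2 * pos + 1]
            pos //= 2

    def total(self):
        """Sum of all priorities (stored at the root)."""
        return self.tree[1]

    def find(self, value):
        """Return the leaf index whose priority interval contains `value`."""
        pos = 1
        while pos < self.capacity:
            left = 2 * pos
            if value < self.tree[left]:
                pos = left
            else:
                value -= self.tree[left]
                pos = left + 1
        return pos - self.capacity

    def sample(self):
        """Draw a leaf index with probability proportional to its priority."""
        return self.find(random.random() * self.total())
```

In a prioritized replay buffer, the leaf index returned by `sample` would typically be used to look up the stored transition, and `update` would be called again once that transition's new TD error is known.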
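Value-Based Reinforcement Learning 2
- An intuitive comparison of six DQN improvements
- DQN improvement 5: Multi-step Learning
- DQN improvement 6: Noisy Net
- DQN improvement 7: Distributional RL
- Rainbow in practice: training on super-mario (includes the environment, keyboard demo, algorithm, checkpoints, etc.)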
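
As a small illustration of the Multi-step Learning item above, the following sketch computes an n-step return in plain Python. It is not the repository's code, and the function name `n_step_return` and its parameters are assumptions made for this example. The target is the discounted sum of the next n rewards plus the discounted bootstrap value of the state reached after those n steps.

```python
def n_step_return(rewards, bootstrap_value, gamma):
    """Return r_1 + gamma*r_2 + ... + gamma^(n-1)*r_n + gamma^n * bootstrap_value.

    `rewards` holds the n rewards observed after the current state, and
    `bootstrap_value` is the value estimate of the state reached after n steps
    (use 0.0 if the episode terminated inside the window).
    """
    g = bootstrap_value
    # Fold from the last reward backwards so each step applies one discount.
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

With a single reward in `rewards`, this reduces to the usual one-step TD target.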
Policy-Based Reinforcement Learning 1

Published algorithm code

TODO

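Value-based reinforcement learning: algorithm implementations
Value-based reinforcement learning: practical case (super-mario)
Policy-based reinforcement learning: algorithm implementations
Policy-based reinforcement learning: practical cases (robotic arm / robot dog)
RLHF: PPO and DPO training of the Tongyi Qianwen model
Alpha: training AlphaZero and MuZero on Gomoku, Dou Dizhu, and Mahjong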


Languages

Python 100.0%