This section is supplementary content to the Mushroom Book (Easy RL): it collects, summarizes, and interprets classic papers in reinforcement learning. The papers mainly cover DQN variants, policy gradient methods, imitation learning, distributed reinforcement learning, multi-task reinforcement learning, exploration strategies, hierarchical reinforcement learning, and other techniques. Video walkthroughs (in collaboration with WhalePaper) will follow and will be released gradually on Datawhale's Bilibili and WeChat official accounts.
About five papers are added each week; stay tuned.
If you have trouble reading the Markdown files online (e.g., formulas fail to render or images load slowly), please download them and read locally, or read the file of the same name in the PDF folder.
When sharing, please include a link and credit the Easy RL project.
| Category | Paper Title | Paper Link | Video Explanation |
| --- | --- | --- | --- |
| Value-based | Playing Atari with Deep Reinforcement Learning (DQN) [Markdown] [PDF] | https://arxiv.org/abs/1312.5602 | |
| | Deep Recurrent Q-Learning for Partially Observable MDPs (DRQN) [Markdown] [PDF] | https://arxiv.org/abs/1507.06527 | |
| | Dueling Network Architectures for Deep Reinforcement Learning (Dueling DQN) [Markdown] [PDF] | https://arxiv.org/abs/1511.06581 | |
| | Deep Reinforcement Learning with Double Q-learning (Double DQN) [Markdown] [PDF] | https://arxiv.org/abs/1509.06461 | |
| | NoisyDQN | https://arxiv.org/pdf/1706.10295.pdf | |
| | QRDQN | https://arxiv.org/pdf/1710.10044.pdf | |
| | CQL | https://arxiv.org/pdf/2006.04779.pdf | |
| | Prioritized Experience Replay (PER) [Markdown] [PDF] | https://arxiv.org/abs/1511.05952 | |
| | Rainbow: Combining Improvements in Deep Reinforcement Learning (Rainbow) [Markdown] [PDF] | https://arxiv.org/abs/1710.02298 | |
| | A Distributional Perspective on Reinforcement Learning (C51) [Markdown] [PDF] | https://arxiv.org/abs/1707.06887 | |
| Policy-based | Asynchronous Methods for Deep Reinforcement Learning (A3C) [Markdown] [PDF] | https://arxiv.org/abs/1602.01783 | |
| | Trust Region Policy Optimization (TRPO) [Markdown] [PDF] | https://arxiv.org/abs/1502.05477 | |
| | High-Dimensional Continuous Control Using Generalized Advantage Estimation (GAE) [Markdown] [PDF] | https://arxiv.org/abs/1506.02438 | |
| | Proximal Policy Optimization Algorithms (PPO) [Markdown] [PDF] | https://arxiv.org/abs/1707.06347 | |
| | Emergence of Locomotion Behaviours in Rich Environments (PPO-Penalty) [Markdown] [PDF] | https://arxiv.org/abs/1707.02286 | |
| | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) [Markdown] [PDF] | https://arxiv.org/abs/1708.05144 | |
| | Sample Efficient Actor-Critic with Experience Replay (ACER) | https://arxiv.org/abs/1611.01224 | |
| | Deterministic Policy Gradient Algorithms (DPG) [Markdown] [PDF] | http://proceedings.mlr.press/v32/silver14.pdf | |
| | Continuous Control With Deep Reinforcement Learning (DDPG) | https://arxiv.org/abs/1509.02971 | |
| | Addressing Function Approximation Error in Actor-Critic Methods (TD3) [Markdown] [PDF] | https://arxiv.org/abs/1802.09477 | |
| | Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic (Q-Prop) | https://arxiv.org/abs/1611.02247 | |
| | Action-dependent Control Variates for Policy Optimization via Stein's Identity (Stein Control Variates) [Markdown] [PDF] | https://arxiv.org/abs/1710.11198 | |
| | The Mirage of Action-Dependent Baselines in Reinforcement Learning [Markdown] [PDF] | https://arxiv.org/abs/1802.10031 | |
| | Bridging the Gap Between Value and Policy Based Reinforcement Learning (PCL) [Markdown] [PDF] | https://arxiv.org/abs/1702.08892 | |
| MaxEntropy RL | Soft Q-Learning | https://arxiv.org/abs/1702.08165 | |
| | Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor (SAC) [Markdown] [PDF] | https://arxiv.org/abs/1801.01290 | |
| Multi-Agent | IQL | https://web.media.mit.edu/~cynthiab/Readings/tan-MAS-reinfLearn.pdf | |
| | VDN | https://arxiv.org/abs/1706.05296 | |
| | QTRAN | http://proceedings.mlr.press/v97/son19a/son19a.pdf | |
| | QMIX | https://arxiv.org/abs/1803.11485 | |
| | Weighted QMIX | https://arxiv.org/abs/2006.10800 | |
| | COMA | https://ojs.aaai.org/index.php/AAAI/article/download/11794/11653 | |
| | MAPPO | https://arxiv.org/abs/2103.01955 | |
| | MADDPG | https://arxiv.org/abs/1706.02275 | |
| Sparse reward | Hierarchical DQN | https://arxiv.org/abs/1604.06057 | |
| | ICM | https://arxiv.org/pdf/1705.05363.pdf | |
| | HER | https://arxiv.org/pdf/1707.01495.pdf | |
| Imitation Learning | GAIL | https://arxiv.org/abs/1606.03476 | |
| | TD3+BC | https://arxiv.org/pdf/2106.06860.pdf | |
| Model-based | Dyna-Q | https://arxiv.org/abs/1801.06176 | |