This code is intended mainly as a proof of concept of action-value learning with artificial neural networks, and was inspired by [1, 2, 3].
The implementations are not particularly clear, efficient, well-tested, or numerically stable. We advise against using this software for anything other than didactic purposes.
This software is licensed under the MIT License.
- Q-learning feedforward neural network (cf. [1])
- Q-learning long short-term memory network (cf. [2])
- Long short-term memory network model combined with a Q-learning feedforward neural network controller (cf. [3], Sec. 5.1)
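
To illustrate what action-value learning with a feedforward network means in the first item above, here is a minimal sketch in plain Python. It is not taken from this repository: the function names (`init_net`, `forward`, `q_update`), the network size, and the hyperparameters are all hypothetical, and the sketch omits experience replay and the other techniques described in [1]. It shows a single Q-learning step, which moves Q(s, a) toward the target r + gamma * max_a' Q(s', a').

```python
import math
import random


def init_net(n_in, n_hidden, n_out, rng):
    """Return the weights of a one-hidden-layer network (hypothetical layout)."""
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
    return {"w1": w1, "w2": w2}


def forward(net, state):
    """Return (hidden activations, one Q-value per action)."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, state))) for row in net["w1"]]
    q = [sum(w * h for w, h in zip(row, hidden)) for row in net["w2"]]
    return hidden, q


def q_update(net, state, action, reward, next_state, done, gamma=0.9, lr=0.1):
    """Apply one Q-learning gradient step and return the TD error before the step."""
    hidden, q = forward(net, state)
    _, q_next = forward(net, next_state)
    # The target is treated as a constant: no gradient flows through q_next.
    target = reward if done else reward + gamma * max(q_next)
    error = q[action] - target  # gradient of 0.5 * error**2 w.r.t. q[action]
    # Backpropagate through the output row of the chosen action only,
    # using the output weights as they were before this update.
    grad_hidden = [error * w * (1.0 - h * h) for w, h in zip(net["w2"][action], hidden)]
    net["w2"][action] = [w - lr * error * h for w, h in zip(net["w2"][action], hidden)]
    for j, g in enumerate(grad_hidden):
        net["w1"][j] = [w - lr * g * x for w, x in zip(net["w1"][j], state)]
    return error


# Usage: repeatedly learn from one terminal transition with reward 1.
rng = random.Random(0)
net = init_net(n_in=2, n_hidden=4, n_out=2, rng=rng)
for _ in range(200):
    td_error = q_update(net, [1.0, 0.0], 0, 1.0, [0.0, 1.0], done=True)
print("TD error after training:", td_error)
```

Because the transition is terminal, the target is fixed at the reward, so the TD error should shrink toward zero over the repeated updates. With non-terminal transitions the target itself depends on the network, which is one source of the numerical instability mentioned above.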
See the examples directory.
[1] Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing Atari with Deep Reinforcement Learning. arXiv preprint arXiv:1312.5602 (2013).
[2] Bakker, Pieter Bram. The State of Mind: Reinforcement Learning with Recurrent Neural Networks. PhD Thesis, Leiden University, 2004.
[3] Schmidhuber, Jürgen. On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models. arXiv preprint arXiv:1511.09249 (2015).