Human-Level Control through Deep Reinforcement Learning

This implementation contains:

Deep Q-network and Q-learning
Experience replay memory
- to reduce the correlations between consecutive updates
Network for Q-learnig targets are fixed for intervals
- to reduce the correlations between target and predicted Q-values

Requirements

First, install prerequisites with:

$ pip install tqdm gym[all]

To train a model for Breakout:

$ python main.py --env_name=Breakout-v0 --is_train=True
$ python main.py --env_name=Breakout-v0 --is_train=True --display=True

To test and record the screen with gym:

$ python main.py --is_train=False
$ python main.py --is_train=False --display=True

Result of training for 24 hours using GTX 980 ti.

Details of Breakout with model m2(red) for 30 hours using GTX 980 Ti.

Details of Breakout with model m3(red) for 30 hours using GTX 980 Ti.

MIT License.