TensorFlow implementation of Human-Level Control through Deep Reinforcement Learning.
This implementation contains:
- Deep Q-network and Q-learning
- Experience replay memory
  - to reduce the correlations between consecutive updates
- Network for Q-learning targets is held fixed for intervals
  - to reduce the correlations between target and predicted Q-values
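
The two stabilizing ideas listed above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code used in this repository: the names `ReplayMemory`, `q_learning_targets` and `maybe_sync_target`, the dictionary-based parameters, and the default `gamma` are all assumptions made for the example.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-capacity buffer of (state, action, reward, next_state, done) tuples.

    Hypothetical helper for illustration; not the repository's class.
    """

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest transition once it is full.
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement, so a batch is not a run of
        # consecutive, highly correlated frames.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def q_learning_targets(batch, target_q, gamma=0.99):
    """Return r + gamma * max_a Q_target(s', a) for each transition.

    `target_q` is any callable mapping a state to a list of action values;
    it stands in for the frozen target network (an assumption of this sketch).
    """
    targets = []
    for _state, _action, reward, next_state, done in batch:
        if done:
            # No bootstrapping past the end of an episode.
            targets.append(reward)
        else:
            targets.append(reward + gamma * max(target_q(next_state)))
    return targets


def maybe_sync_target(step, online_params, target_params, update_every):
    """Copy online parameters into the target every `update_every` steps."""
    if step % update_every == 0:
        target_params.clear()
        target_params.update(online_params)
        return True
    return False
```

Between synchronizations the target parameters do not change, so the regression targets stay put while the online network is updated; that is the sense in which the target network is "fixed for intervals".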
Requirements:
- Python 2.7 or Python 3.3+
- gym
- tqdm
- OpenCV2
- TensorFlow
First, install prerequisites with:
$ pip install tqdm gym[all]
To train a model for Breakout:
$ python main.py --env_name=Breakout-v0 --is_train=True
$ python main.py --env_name=Breakout-v0 --is_train=True --display=True
To test and record the screen with gym:
$ python main.py --is_train=False
$ python main.py --is_train=False --display=True
Results of training for 24 hours on a GTX 980 Ti.
Details of Breakout with model m2 (red), trained for 18 hours on a GTX 980 Ti. (The plot label "episode/min reward" is a typo and should read "episode/average reward".)
Details of Breakout with models m1 (green), m2 (purple), m3 (blue) and m4 (red), trained for 15 hours on a GTX 980 Ti.
MIT License.