Human-Level Control through Deep Reinforcement Learning

TensorFlow implementation of Human-Level Control through Deep Reinforcement Learning.

This implementation contains:

Deep Q-network and Q-learning
Experience replay memory
- to reduce the correlations between consecutive updates
The network used for Q-learning targets is held fixed for intervals
- to reduce the correlations between target and predicted Q-values
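
The two stabilizing ideas listed above can be sketched in plain Python. This is a minimal illustration, not the code from this repository: the names `ReplayMemory`, `q_learning_target`, and `maybe_sync_target`, the dictionary standing in for network weights, and the default values for `discount` and `update_every` are all assumptions chosen for the example.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-size buffer of (state, action, reward, next_state, terminal) tuples."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest transition when full.
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, terminal):
        self.buffer.append((state, action, reward, next_state, terminal))

    def sample(self, batch_size):
        # Uniform sampling mixes old and new transitions, so a minibatch
        # is not made of consecutive, highly correlated steps.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def q_learning_target(reward, next_q_values, terminal, discount=0.99):
    """Return r if the episode ended, else r + discount * max_a' Q_target(s', a').

    next_q_values must come from the frozen target network, not the
    network that is currently being trained.
    """
    if terminal:
        return reward
    return reward + discount * max(next_q_values)


def maybe_sync_target(step, online_weights, target_weights, update_every=10000):
    """Copy online weights into the target weights every `update_every` steps."""
    if step % update_every == 0:
        target_weights.clear()
        target_weights.update(online_weights)
        return True
    return False
```

In a training loop built on this sketch, each environment step would call `add`, each learning step would call `sample` and compute targets with `q_learning_target`, and `maybe_sync_target` would refresh the frozen copy only occasionally, so the targets do not shift with every gradient update.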

Requirements

Python 2.7 or Python 3.3+
gym
tqdm
OpenCV2
TensorFlow

Usage

First, install prerequisites with:

$ pip install tqdm gym[all]

To train a model for Breakout:

$ python main.py --env_name=Breakout-v0 --is_train=True
$ python main.py --env_name=Breakout-v0 --is_train=True --display=True

To test and record the screen with gym:

$ python main.py --is_train=False
$ python main.py --is_train=False --display=True
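
The commands above pass booleans as `--flag=True` or `--flag=False`. As a hedged sketch of how such flags can be read in Python, the snippet below uses `argparse` with an explicit string-to-boolean converter. The actual `main.py` may parse its flags differently. The `str2bool` and `parse_args` helpers and the default values are assumptions for illustration only.

```python
import argparse


def str2bool(text):
    """Convert 'True'/'False' style text to a bool.

    A plain bool() call would not work here: bool('False') is True,
    because any non-empty string is truthy.
    """
    value = text.strip().lower()
    if value in ("true", "1", "yes"):
        return True
    if value in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError("expected a boolean, got %r" % text)


def parse_args(argv=None):
    # Flag names follow the usage lines above; defaults are assumed.
    parser = argparse.ArgumentParser(description="Hypothetical DQN launcher flags")
    parser.add_argument("--env_name", default="Breakout-v0")
    parser.add_argument("--is_train", type=str2bool, default=True)
    parser.add_argument("--display", type=str2bool, default=False)
    return parser.parse_args(argv)
```

For example, `parse_args(["--is_train=False", "--display=True"])` yields a namespace with `is_train` set to `False` and `display` set to `True`.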

Results

Result of training for 24 hours on a GTX 980 Ti.

Training details

Details of Breakout with model m2 (red) after 18 hours of training on a GTX 980 Ti.

(Note: the plot label "episode/min reward" is a typo and should read "episode/average reward".)

Statistics of loss, Q-values, rewards, and # of games / episode
Histogram of rewards / episode

Details of Breakout with models m1 (green), m2 (purple), m3 (blue), and m4 (red) after 15 hours of training on a GTX 980 Ti.

Statistics of loss, Q-values, rewards, and # of games / episode
Histogram of rewards / episode

References

License

MIT License.
