
Commit

dqn: change network arch
vmayoral committed Jun 19, 2016
1 parent 0db9ce1 commit 4d73866
Showing 2 changed files with 13 additions and 6 deletions.
16 changes: 11 additions & 5 deletions tutorial5/README.md
@@ -5,15 +5,20 @@ Up until this tutorial, the Q-learning algorithm has been storing state-action pairs

That's where neural networks come in. Or any other type of function approximator, even a simple linear model. We can use a neural network, instead of a lookup table, as our Q(s,a) function. Just like before, it will accept a state and an action and spit out the value of that state-action pair.
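As a rough sketch of what such a network could look like in Keras (assumed code, not the tutorial's exact implementation: in practice, as in `dqn-cartpole.py`, the network takes only the state as input and outputs one Q-value per action):

```python
# Minimal sketch of a Q-network for CartPole: 4 state inputs, two hidden
# layers, 2 outputs (one Q-value per discrete action). The [30, 30] sizes
# mirror the default 4x30x30x2 architecture discussed below; the function
# name and optimizer choice are illustrative assumptions.
from keras.models import Sequential
from keras.layers import Dense

def build_q_network(state_size=4, action_size=2, hidden=(30, 30)):
    model = Sequential()
    model.add(Dense(hidden[0], input_shape=(state_size,), activation='relu'))
    for units in hidden[1:]:
        model.add(Dense(units, activation='relu'))
    # Linear output layer: unconstrained Q(s, a) estimates
    model.add(Dense(action_size, activation='linear'))
    model.compile(loss='mse', optimizer='rmsprop')
    return model
```

Reading Q off such a network is a single forward pass over the state, and the greedy action is simply the argmax over the output vector.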

### A bit of theoretical background
Discuss DeepMind's papers and the original DQN algorithm.

(from the second paper)
> Reinforcement learning is known to be unstable or even to diverge when a nonlinear function approximator such as a neural network is used to represent the action-value (also known as Q) function.

The implementation is available in the `dqn-cartpole.py` script.
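The remedies proposed in those papers are experience replay and a separate, periodically updated target network. As a hedged sketch of how such a minibatch update could look (the replay-buffer API, transition fields and function name are illustrative assumptions; the authoritative logic is `learnOnMiniBatch` in `dqn-cartpole.py`):

```python
# Sketch of a DQN minibatch update using experience replay and a target network.
# `memory.sample` and the transition fields (state, action, reward, next_state,
# done) are assumed here for illustration.
import numpy as np

def learn_on_minibatch(online_net, target_net, memory, batch_size=32, gamma=0.99):
    batch = memory.sample(batch_size)
    states = np.array([t.state for t in batch])
    next_states = np.array([t.next_state for t in batch])

    q_values = online_net.predict(states)      # current estimates, one row per transition
    next_q = target_net.predict(next_states)   # bootstrap values from the frozen target net

    for i, t in enumerate(batch):
        if t.done:
            q_values[i, t.action] = t.reward   # terminal transition: no bootstrap term
        else:
            q_values[i, t.action] = t.reward + gamma * np.max(next_q[i])

    # Only the taken action's entry was overwritten, so the MSE loss pulls
    # Q(s, a) towards r + gamma * max_a' Q_target(s', a') and leaves the rest alone.
    online_net.fit(states, q_values, batch_size=batch_size, verbose=0)
```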

### Playing with the hyperparameters in a DQN:

The subsections below describe some of the tests performed on the hyperparameters of the Deep Q-learning algorithm:

#### Network architecture

| Algorithm | `epochs:` 100 | `epochs:` 500 | `epochs:` 1000 |
|-----------|---------------|---------------|----------------|
| DQN (default params) [4x30x30x2] | 24 (17) | 200 (199) | 200 (168) |
| DQN (default params), default rewards [4x300x300x2] | 106 (43) | | |
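The architectures in the table are selected by passing the hidden-layer sizes to `initNetworks`, which is exactly what this commit changes in `dqn-cartpole.py` (see the hunk further down):

```python
deepQ = DeepQ(4, 2, memorySize, discountFactor, learningRate, learnStart)
# deepQ.initNetworks([30,30])    # 4x30x30x2, the previous default
deepQ.initNetworks([300,300])    # 4x300x300x2, introduced by this commit
```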


#### Minibatch size
#### Memory size
#### Learning rate
@@ -37,6 +42,7 @@ Changing the rewards proved to be relevant when working with Q-learning however
| DQN (default params) [4x30x30x2] | 24 (17) | 200 (199) | 200 (168) |
| DQN (default params) [4x30x30x30x2] | 71 (26) | 150 (184) | 120 (126) |
| DQN (default params), default rewards [4x30x30x2] | 9 (15) | 200 (199) | 200 (199) |
| DQN (default params), default rewards [4x300x300x2] | 106 (43) | | |

*Each cell shows the best score reached within the given number of epochs and, in parentheses, the average score over all the epochs.*

3 changes: 2 additions & 1 deletion tutorial5/dqn-cartpole.py
@@ -312,7 +312,8 @@ def learnOnMiniBatch(self, miniBatchSize, useTargetNetwork=True):

 deepQ = DeepQ(4, 2, memorySize, discountFactor, learningRate, learnStart)
 # deepQ.initNetworks([30,30,30])
-deepQ.initNetworks([30,30])
+# deepQ.initNetworks([30,30])
+deepQ.initNetworks([300,300])

stepCounter = 0

