Commit
Readme changes.
Carlos Riquelme committed Jul 23, 2018
1 parent e0ef14f commit abfa50a
Showing 1 changed file with 2 additions and 1 deletion.
3 changes: 2 additions & 1 deletion research/deep_contextual_bandits/README.md
@@ -63,7 +63,8 @@ while attempting to incur low cost. Informally speaking, we assume the expected
reward is given by some function
**E**[r<sub>t</sub> | X<sub>t</sub>, a<sub>t</sub>] = f(X<sub>t</sub>, a<sub>t</sub>).
Unfortunately, function **f** is unknown, as otherwise we could just choose the
-action with highest expected value: ![equation](https://latex.codecogs.com/gif.download?%5Cinline%20a_t%5E*%20%3D%20%5Carg%20%5Cmax_i%20f%28X_t%2C%20a_i%29).
+action with highest expected value:
+a<sub>t</sub><sup>*</sup> = arg max<sub>i</sub> f(X<sub>t</sub>, a<sub>t</sub>).

The idea behind Thompson Sampling is based on keeping a posterior distribution
![equation](https://latex.codecogs.com/gif.download?%5Cinline%20%5Cpi_t) over functions in some family ![equation](https://latex.codecogs.com/gif.download?%5Cinline%20f%20%5Cin%20F) after observing the first
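
For readers of this hunk, the selection rule a<sub>t</sub><sup>*</sup> = arg max<sub>i</sub> f(X<sub>t</sub>, a<sub>i</sub>) and the posterior-sampling idea can be illustrated with a minimal Python sketch. It is not taken from the repository: `greedy_action`, `thompson_step`, `toy_f`, and `make_posterior_sampler` are hypothetical names, the linear reward model and the independent Gaussian posterior over weights are assumptions made only for illustration, and the "sample one function, then act greedily under it" step follows the standard textbook description of Thompson Sampling.

```python
import random


def greedy_action(f, context, actions):
    """Return the action a_i that maximizes f(context, a_i)."""
    return max(actions, key=lambda a: f(context, a))


def thompson_step(sample_f, context, actions):
    """Draw one reward function from the posterior, then act greedily under it."""
    f_sampled = sample_f()
    return greedy_action(f_sampled, context, actions)


def toy_f(context, action):
    """Hypothetical known reward function: one linear weight vector per action."""
    weights = {0: [1.0, 0.0], 1: [0.0, 1.0]}
    return sum(w * x for w, x in zip(weights[action], context))


def make_posterior_sampler(means, stds, rng):
    """Assumed posterior: independent Gaussians over each action's weights."""
    def sample_f():
        # Sample one weight vector per action from the assumed posterior.
        w = {
            a: [rng.gauss(m, s) for m, s in zip(means[a], stds[a])]
            for a in means
        }
        # The sampled weights define one candidate reward function.
        return lambda context, action: sum(
            wi * xi for wi, xi in zip(w[action], context)
        )
    return sample_f


if __name__ == "__main__":
    rng = random.Random(0)
    means = {0: [1.0, 0.0], 1: [0.0, 1.0]}
    stds = {0: [0.5, 0.5], 1: [0.5, 0.5]}
    sampler = make_posterior_sampler(means, stds, rng)
    context = [0.2, 0.9]
    print("greedy under known f:", greedy_action(toy_f, context, [0, 1]))
    print("Thompson choice:", thompson_step(sampler, context, [0, 1]))
```

When f is known, `greedy_action` is all that is needed. When it is not, `thompson_step` replaces f with a draw from the current posterior, so the chosen action varies from round to round in proportion to the remaining uncertainty.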
