Update 2021-07-27-update.markdown (absolute image links)
riveSunder authored Jun 25, 2024
1 parent f8d40f1 commit 718018c
Showing 1 changed file with 6 additions and 6 deletions.
12 changes: 6 additions & 6 deletions docs/_posts/2021-07-27-update.markdown
@@ -16,7 +16,7 @@ I'll just speak from personal experience when I say that one of the principal wa
In John Conway's Game of Life, one of the many Life-like cellular automata we can experiment with in [CARLE](https://github.com/rivesunder/carle), this might look something like the glider animation in Figure 1.

<div align="center">
-<img src="/carles_game/assets/glider_prediction.gif">
+<img src="https://raw.githubusercontent.com/riveSunder/carles_game/master/assets/glider_prediction.gif">
<br>
Figure 1: Glider in Conway's Life.
</div>
@@ -33,15 +33,15 @@ Figure 2: Glider + Methuselah acorn pattern in Conway's Life. This animation is
In Figure 2 the introduction of this long-lived, chaotic pattern reduces the ability of the model used by PredictionBonus to predict what the CA grid will look like based on past frames. We can probably relate to this, as the pattern sequence mostly looks like a mess, and although it would not be technically difficult to go through the grid cell-by-cell to predict the next frame at each step, it would be tedious. Eventually the pattern settles down into a field of still lifes and oscillators, becoming very predictable and generating a high, stable prediction bonus.
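
The cell-by-cell bookkeeping mentioned above can be sketched in a few lines of plain Python. This is only an illustration and not CARLE's implementation: the names `life_step` and `place`, the 8x8 grid, and the wrap-around (toroidal) boundary are all assumptions made for the example.

```python
def life_step(grid):
    """Apply one step of Conway's Life (B3/S23) to a 2D list of 0/1 cells.

    Edges wrap around (toroidal grid); that boundary choice is an assumption.
    """
    height, width = len(grid), len(grid[0])
    next_grid = [[0] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            # Count the eight neighbors, wrapping at the edges.
            neighbors = sum(
                grid[(row + dr) % height][(col + dc) % width]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            )
            # Birth on exactly 3 neighbors, survival on 2 or 3.
            if neighbors == 3 or (grid[row][col] == 1 and neighbors == 2):
                next_grid[row][col] = 1
    return next_grid


def place(pattern, height, width, top, left):
    """Return an empty height x width grid with `pattern` copied in at (top, left)."""
    grid = [[0] * width for _ in range(height)]
    for r, pattern_row in enumerate(pattern):
        for c, cell in enumerate(pattern_row):
            grid[(top + r) % height][(left + c) % width] = cell
    return grid


GLIDER = [
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 1],
]

# A glider reappears one cell down and one cell right every four steps.
frame = place(GLIDER, 8, 8, 1, 1)
for _ in range(4):
    frame = life_step(frame)
print(frame == place(GLIDER, 8, 8, 2, 2))  # True
```

Applying the rule is mechanical, which is why the difficulty for a learned predictor comes from how much of the grid is changing in irregular ways, not from the rule itself.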

<div align="center">
-<img src="/carles_game/assets/oscillators_prediction.gif">
+<img src="https://raw.githubusercontent.com/riveSunder/carles_game/master/assets/oscillators_prediction.gif">
<br>
Figure 3: Prediction bonus for a field of still lifes and oscillators.
</div>

From casual inspection of just a few frames of Figure 3, we can observe that the grid oscillates between only two states. This makes for an easy prediction problem, and the prediction bonus reward is correspondingly higher than during the chaotic growth period. If we remove all the still lifes and oscillators and put in another glider, however, we'll see that the prediction bonus is even higher.
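
One way to make the "only a couple of states" observation concrete is to check how far back the most recent frame last appeared. The sketch below is a hypothetical helper (`find_period` is not part of CARLE) that works on any list of recorded frames; the blinker frames are hard-coded so the example stands alone.

```python
def find_period(frames):
    """Return the smallest p >= 1 such that the last frame equals the frame
    p steps earlier, or None if no earlier frame matches."""
    last = frames[-1]
    for period in range(1, len(frames)):
        if frames[-1 - period] == last:
            return period
    return None


# A blinker alternates between a horizontal and a vertical bar.
BLINKER_H = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]
BLINKER_V = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]

frames = [BLINKER_H, BLINKER_V, BLINKER_H, BLINKER_V]
print(find_period(frames))  # 2

# A still life repeats immediately, so its period is 1.
BLOCK = [[1, 1], [1, 1]]
print(find_period([BLOCK, BLOCK, BLOCK]))  # 1
```

A grid with period 1 or 2 gives a predictor very little to learn, which is consistent with the higher prediction bonus seen once the field has settled.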

<div align="center">
-<img src="/carles_game/assets/glider_prediction_again.gif" >
+<img src="https://raw.githubusercontent.com/riveSunder/carles_game/master/assets/glider_prediction_again.gif" >
<br>
Figure 4: Prediction bonus for a single glider, again.
</div>
@@ -54,14 +54,14 @@ Of course, it might seem counterintuitive that, in the examples above, an agent


<div align="center">
-<img src="/carles_game/assets/glider_surprise.gif" >
+<img src="https://raw.githubusercontent.com/riveSunder/carles_game/master/assets/glider_surprise.gif" >
<br>
Figure 5: Surprise bonus for a single glider.
<br>
</div>

<div align="center">
-<img src="/carles_game/assets/acorn_surprise.gif" >
+<img src="https://raw.githubusercontent.com/riveSunder/carles_game/master/assets/acorn_surprise.gif" >
<br>
Figure 6: Surprise bonus for a glider and acorn Methuselah pattern.
</div>
@@ -72,7 +72,7 @@ Perhaps some combination of predictability and surprise would be the best approa
There are a few other details that may be useful for experimenters looking to work with prediction and surprise wrappers. Next-state prediction rewards are known to be vulnerable to what's called "[the noisy TV problem](https://openai.com/blog/reinforcement-learning-with-prediction-based-rewards/)." This is the phenomenon by which any source of unpredictability in the environment, usually some sort of stochasticity, can become an irresistible draw for agents under a prediction-based reward. While Life-like cellular automata are fully deterministic and thus predictable in principle, in practice a chaotic scene will produce higher prediction losses and higher rewards when `SurpriseBonus` is used. Another phenomenon is that `SurpriseBonus` generates higher rewards for fields with more activations, even when they are predictable or entirely static. In Figure 7 the surprise bonus curve remains high for quite a while after the pattern has become static. The prediction model used in Figure 7 has a batch size of 16, and I noticed that for a batch size of 2 the reward never drops off. This is useful to keep in mind when setting up a surprise- or prediction-based experiment, as the wrong hyperparameters in the prediction model could lead to strange behavior like learning to stare endlessly at a static pattern, or seeking out noisy, mostly random patterns.
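
To illustrate how the two reward signs relate, and why a busy but static grid can still score as "surprising", here is a toy sketch. It makes several assumptions that are not taken from CARLE: the loss is a plain mean squared error, the surprise reward is that loss, the prediction reward is its negative, and the "model" is an untrained stand-in that always predicts an empty grid. The function names are hypothetical.

```python
def mean_squared_error(predicted, actual):
    """Average squared difference over all cells of two equally sized grids."""
    errors = [
        (p - a) ** 2
        for predicted_row, actual_row in zip(predicted, actual)
        for p, a in zip(predicted_row, actual_row)
    ]
    return sum(errors) / len(errors)


def surprise_reward(predicted, actual):
    # Assumed form: reward grows with prediction error.
    return mean_squared_error(predicted, actual)


def prediction_reward(predicted, actual):
    # Assumed form: reward grows as prediction error shrinks.
    return -mean_squared_error(predicted, actual)


def empty_prediction(grid):
    """Stand-in for a poorly trained model: always predicts an all-dead grid."""
    return [[0.0] * len(row) for row in grid]


# Two static scenes: one nearly empty, one with many live cells.
sparse = [[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
dense = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1]]

print(surprise_reward(empty_prediction(sparse), sparse))  # 0.0625
print(surprise_reward(empty_prediction(dense), dense))    # 0.5
print(prediction_reward(empty_prediction(dense), dense))  # -0.5
```

Neither scene changes over time, yet the denser one earns a larger surprise reward simply because the stand-in model has more live cells to get wrong. A model that trains slowly could plausibly show the same bias for many steps.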

<div align="center">
-<img src="/carles_game/assets/lwd_surprise.gif" >
+<img src="https://raw.githubusercontent.com/riveSunder/carles_game/master/assets/lwd_surprise.gif" >
<br>
Figure 7: Surprise bonus for a coral growth pattern. Reward remains high for an extended period even after the pattern becomes static. This animation is shown at 5X speed.
</div>