diff --git a/chapter_attention-mechanisms-and-transformers/bahdanau-attention.md b/chapter_attention-mechanisms-and-transformers/bahdanau-attention.md
index c61ec266a1..98702fdfab 100644
--- a/chapter_attention-mechanisms-and-transformers/bahdanau-attention.md
+++ b/chapter_attention-mechanisms-and-transformers/bahdanau-attention.md
@@ -74,7 +74,7 @@ $\mathbf{h}_{t}$ as both the key and the value. Note that $\mathbf{c}_{t'}$ is t
 using the additive attention scoring function defined by
 :eqref:`eq_additive-attn`.
 This RNN encoder-decoder architecture
-using attention is depicted in :numref:`fig_s2s_attention_details`. Note that later this model was modified such as to include the already generated tokens in the decoder as further context (i.e., the attention sum does stop at $T$ but rather it proceeds up to $t'-1$). For instance, see :citet:`chan2015listen` for a description of this strategy, as applied to speech recognition.
+using attention is depicted in :numref:`fig_s2s_attention_details`. Note that later this model was modified such as to include the already generated tokens in the decoder as further context (i.e., the attention sum does not stop at $T$ but rather it proceeds up to $t'-1$). For instance, see :citet:`chan2015listen` for a description of this strategy, as applied to speech recognition.
 
 ![Layers in an RNN encoder-decoder model with the Bahdanau attention mechanism.](../img/seq2seq-details-attention.svg)
 :label:`fig_s2s_attention_details`
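
As context for the corrected sentence, the sketch below is a minimal, illustrative implementation of the additive (Bahdanau) attention it refers to: the decoder state $\mathbf{s}_{t'-1}$ acts as the query and the $T$ encoder states $\mathbf{h}_1, \ldots, \mathbf{h}_T$ act as keys and values, so the context $\mathbf{c}_{t'}$ is their attention-weighted sum. This is not the book's own code; the class and variable names here are assumptions made for illustration. In the later variant attributed to :citet:`chan2015listen`, the keys and values would additionally include representations of the tokens generated up to $t'-1$, which is the point the wording fix clarifies.

```python
import torch
from torch import nn

class AdditiveAttention(nn.Module):
    """Additive (Bahdanau-style) attention scoring function (illustrative sketch)."""
    def __init__(self, key_size, query_size, num_hiddens):
        super().__init__()
        self.W_k = nn.Linear(key_size, num_hiddens, bias=False)
        self.W_q = nn.Linear(query_size, num_hiddens, bias=False)
        self.w_v = nn.Linear(num_hiddens, 1, bias=False)

    def forward(self, query, keys, values):
        # query: (batch, query_size); keys, values: (batch, T, key_size)
        # Broadcast the single query against every encoder time step.
        features = torch.tanh(self.W_q(query).unsqueeze(1) + self.W_k(keys))
        scores = self.w_v(features).squeeze(-1)        # (batch, T)
        alpha = torch.softmax(scores, dim=-1)          # attention weights over t = 1..T
        # Context c_{t'}: weighted sum of the values (here, the encoder states).
        return torch.bmm(alpha.unsqueeze(1), values).squeeze(1)

# Toy usage: one decoding step attending over T = 4 encoder states.
batch, T, hidden = 2, 4, 8
enc_states = torch.randn(batch, T, hidden)   # h_1, ..., h_T (keys and values)
dec_state = torch.randn(batch, hidden)       # s_{t'-1} (query)
attention = AdditiveAttention(hidden, hidden, hidden)
context = attention(dec_state, enc_states, enc_states)
print(context.shape)  # torch.Size([2, 8])
```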