### Various sequence to sequence architectures

#### Basic Models
- In this section we will learn about sequence to sequence (_Many to Many_) models, which are useful in various applications including machine translation and speech recognition.
- Let's start with the basic model:
- Consider this machine translation problem, in which X is a French sentence and Y is its English translation.
![](Images/52.png)
- Our architecture will include an **encoder** and a **decoder**.
- The encoder is an RNN (LSTM or GRU cells can be used) that reads the input sequence and outputs a vector that represents the whole input.
- The decoder network, also an RNN, is initialized with the vector built by the encoder and generates the output sequence (a minimal code sketch of this architecture appears after this list).
![](Images/53.png)
- These ideas are from the following papers:
- [Sutskever et al., 2014. Sequence to sequence learning with neural networks](https://arxiv.org/abs/1409.3215)
- [Cho et al., 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation](https://arxiv.org/abs/1406.1078)
- An architecture similar to the one mentioned above also works for the image captioning problem:
- In this problem X is an image, while Y is a sentence (caption).
- The model architecture:
![](Images/54.png)
- The architecture uses a pretrained CNN (like AlexNet) as an encoder for the image, and the decoder is an RNN (see the second sketch after this list).
- The ideas are from the following papers (which share similar approaches):
- [Mao et al., 2014. Deep captioning with multimodal recurrent neural networks](https://arxiv.org/abs/1412.6632)
- [Vinyals et al., 2014. Show and tell: Neural image caption generator](https://arxiv.org/abs/1411.4555)
- [Karpathy and Li, 2015. Deep visual-semantic alignments for generating image descriptions](https://cs.stanford.edu/people/karpathy/cvpr2015.pdf)
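As a rough illustration of the encoder-decoder model just described, here is a minimal Keras sketch (my own illustration, not the course's implementation; the vocabulary sizes and dimensions are made up). The decoder is trained with teacher forcing, i.e. it is fed the previous ground-truth word at each step:

```python
# A minimal Keras sketch of the encoder-decoder model above (an
# illustration, not the course's implementation); vocabulary sizes
# and dimensions below are made-up assumptions.
from tensorflow.keras.layers import Input, LSTM, Dense, Embedding
from tensorflow.keras.models import Model

src_vocab, tgt_vocab, latent_dim = 10000, 8000, 256  # assumed sizes

# Encoder: reads the French word sequence and keeps only its final
# LSTM state, the fixed-length vector summarizing the whole input.
enc_inputs = Input(shape=(None,))
enc_emb = Embedding(src_vocab, latent_dim)(enc_inputs)
_, state_h, state_c = LSTM(latent_dim, return_state=True)(enc_emb)

# Decoder: generates the English sequence, initialized with the
# encoder's final state instead of zeros (teacher forcing: during
# training it is fed the previous ground-truth word at each step).
dec_inputs = Input(shape=(None,))
dec_emb = Embedding(tgt_vocab, latent_dim)(dec_inputs)
dec_outputs = LSTM(latent_dim, return_sequences=True)(
    dec_emb, initial_state=[state_h, state_c])
probs = Dense(tgt_vocab, activation='softmax')(dec_outputs)

model = Model([enc_inputs, dec_inputs], probs)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
```

At translation time the decoder would instead be run one step at a time, feeding its own predictions back in; how to pick the best output sentence is the topic of the next section.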
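And a similar sketch for the image-captioning variant: a pretrained CNN replaces the RNN encoder, and its feature vector initializes the decoder's state. ResNet50 stands in for the AlexNet mentioned above simply because Keras ships it pretrained; again, all names and dimensions here are assumptions:

```python
# A sketch of the image-captioning variant (an assumption-laden
# illustration, not the papers' code).
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.layers import Input, Dense, Embedding, LSTM
from tensorflow.keras.models import Model

vocab_size, hidden_dim = 5000, 256  # assumed sizes

# Encoder: a pretrained CNN with its classification head removed,
# so each image becomes a single feature vector.
cnn = ResNet50(weights='imagenet', include_top=False, pooling='avg')
cnn.trainable = False  # keep the pretrained features frozen
img_input = Input(shape=(224, 224, 3))
features = Dense(hidden_dim, activation='relu')(cnn(img_input))

# Decoder: an LSTM whose initial state is the image feature vector;
# it is trained to predict the caption one word at a time.
cap_input = Input(shape=(None,))
cap_emb = Embedding(vocab_size, hidden_dim)(cap_input)
dec_out = LSTM(hidden_dim, return_sequences=True)(
    cap_emb, initial_state=[features, features])
word_probs = Dense(vocab_size, activation='softmax')(dec_out)

model = Model([img_input, cap_input], word_probs)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
```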

#### Picking the most likely sentence
- There are some similarities between the language model we learned previously and the machine translation model we have just discussed, but there are also some differences.
