### Various sequence to sequence architectures

#### Basic Models
- In this section we will learn about sequence to sequence (_Many to Many_) models, which are useful in various applications including machine translation and speech recognition.
- Let's start with the basic model:
- Consider this machine translation problem, in which X is a French sentence and Y is its English translation.
![](Images/52.png)
- Our architecture will include an **encoder** and a **decoder**.
- The encoder is an RNN (LSTM or GRU cells can be used) that reads the input sequence and outputs a vector that represents the whole input.
- The decoder network, also an RNN, is initialized with the vector built by the encoder and generates the output sequence (a minimal code sketch of this architecture appears after this list).
![](Images/53.png)
- These ideas are from the following papers:
- [Sutskever et al., 2014. Sequence to sequence learning with neural networks](https://arxiv.org/abs/1409.3215)
- [Cho et al., 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation](https://arxiv.org/abs/1406.1078)
- An architecture similar to the one mentioned above also works for the image captioning problem:
- In this problem X is an image, while Y is a sentence (caption).
- The model architecture:
![](Images/54.png)
- The architecture uses a pretrained CNN (like AlexNet) as an encoder for the image, and the decoder is an RNN (see the second sketch after this list).
- The ideas are from the following papers (which share similar approaches):
- [Mao et al., 2014. Deep captioning with multimodal recurrent neural networks](https://arxiv.org/abs/1412.6632)
- [Vinyals et al., 2014. Show and tell: Neural image caption generator](https://arxiv.org/abs/1411.4555)
- [Karpathy and Li, 2015. Deep visual-semantic alignments for generating image descriptions](https://cs.stanford.edu/people/karpathy/cvpr2015.pdf)
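As a rough illustration of the encoder-decoder model just described, here is a minimal Keras sketch (my own illustration, not the course's implementation; the vocabulary sizes and dimensions are made up). The decoder is trained with teacher forcing, i.e. it is fed the previous ground-truth word at each step:

```python
# A minimal Keras sketch of the encoder-decoder model above (an
# illustration, not the course's implementation); vocabulary sizes
# and dimensions below are made-up assumptions.
from tensorflow.keras.layers import Input, LSTM, Dense, Embedding
from tensorflow.keras.models import Model

src_vocab, tgt_vocab, latent_dim = 10000, 8000, 256  # assumed sizes

# Encoder: reads the French word sequence and keeps only its final
# LSTM state, the fixed-length vector summarizing the whole input.
enc_inputs = Input(shape=(None,))
enc_emb = Embedding(src_vocab, latent_dim)(enc_inputs)
_, state_h, state_c = LSTM(latent_dim, return_state=True)(enc_emb)

# Decoder: generates the English sequence, initialized with the
# encoder's final state instead of zeros (teacher forcing: during
# training it is fed the previous ground-truth word at each step).
dec_inputs = Input(shape=(None,))
dec_emb = Embedding(tgt_vocab, latent_dim)(dec_inputs)
dec_outputs = LSTM(latent_dim, return_sequences=True)(
    dec_emb, initial_state=[state_h, state_c])
probs = Dense(tgt_vocab, activation='softmax')(dec_outputs)

model = Model([enc_inputs, dec_inputs], probs)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
```

At translation time the decoder would instead be run one step at a time, feeding its own predictions back in; how to pick the best output sentence is the topic of the next section.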
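And a similar sketch for the image-captioning variant: a pretrained CNN replaces the RNN encoder, and its feature vector initializes the decoder's state. ResNet50 stands in for the AlexNet mentioned above simply because Keras ships it pretrained; again, all names and dimensions here are assumptions:

```python
# A sketch of the image-captioning variant (an assumption-laden
# illustration, not the papers' code).
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.layers import Input, Dense, Embedding, LSTM
from tensorflow.keras.models import Model

vocab_size, hidden_dim = 5000, 256  # assumed sizes

# Encoder: a pretrained CNN with its classification head removed,
# so each image becomes a single feature vector.
cnn = ResNet50(weights='imagenet', include_top=False, pooling='avg')
cnn.trainable = False  # keep the pretrained features frozen
img_input = Input(shape=(224, 224, 3))
features = Dense(hidden_dim, activation='relu')(cnn(img_input))

# Decoder: an LSTM whose initial state is the image feature vector;
# it is trained to predict the caption one word at a time.
cap_input = Input(shape=(None,))
cap_emb = Embedding(vocab_size, hidden_dim)(cap_input)
dec_out = LSTM(hidden_dim, return_sequences=True)(
    cap_emb, initial_state=[features, features])
word_probs = Dense(vocab_size, activation='softmax')(dec_out)

model = Model([img_input, cap_input], word_probs)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
```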

#### Picking the most likely sentence
- There are some similarities between the language model we learned previously and the machine translation model we have just discussed, but there are also some differences.
