Minor edits in Week1
VladKha authored May 23, 2018
1 parent 9eb299f commit 6314030
Showing 1 changed file with 1 addition and 6 deletions.
7 changes: 1 addition & 6 deletions 5- Sequence Models/Readme.md
> Learn about recurrent neural networks. This type of model has been proven to perform extremely well on temporal data. It has several variants including LSTMs, GRUs and Bidirectional RNNs, which you are going to learn about in this section.
### Why sequence models

- Sequence models such as RNNs and LSTMs have greatly transformed learning on sequences in the past few years.
- Examples of sequence data in applications:
- Speech recognition (**sequence to sequence**):
- All of these problems, with different input and output types (sequence or not), can be addressed as supervised learning with labeled data X, Y as the training set.

### Notation

- In this section we will discuss the notation that we will use throughout the course.
- **Motivating example**:
- Named entity recognition example:

### Gated Recurrent Unit (GRU)
- The multiplications in the equations are element-wise.
- What has been described so far is the simplified GRU unit. Let's now describe the full one:
  - The full GRU contains a new gate that is used to calculate the candidate C. This gate tells you how relevant C<sup>\<t-1></sup> is to C<sup>\<t></sup>.
  - Equations:
![](Images/20.png)
- The shapes are the same as in the simplified GRU (a numpy sketch of one full-GRU step follows this list).
- So why do we use these particular architectures? Why not change them, add another gate, or just use the simpler GRU instead of the full GRU? Researchers have experimented over the years with many different versions of these architectures, while also addressing the vanishing gradient problem, and they have found that the full GRU is one of the best RNN architectures for many different problems. You can make your own design, but keep in mind that GRUs and LSTMs are the standards.
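
As a rough illustration (not the course's own code), here is a minimal numpy sketch of one full-GRU step following the equations above; the parameter names (`Wu`, `Wr`, `Wc`, ...) and shapes are assumptions made for this sketch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def full_gru_step(c_prev, x_t, p):
    """One full-GRU step. c_prev: (n_c, 1), x_t: (n_x, 1).
    p holds Wu, Wr, Wc of shape (n_c, n_c + n_x) and bu, br, bc of shape (n_c, 1)."""
    concat = np.concatenate([c_prev, x_t], axis=0)            # [c<t-1>, x<t>]

    gamma_u = sigmoid(p["Wu"] @ concat + p["bu"])             # update gate
    gamma_r = sigmoid(p["Wr"] @ concat + p["br"])             # relevance gate (the extra gate in the full GRU)

    # Candidate memory: the relevance gate is applied element-wise to c<t-1>
    c_tilde = np.tanh(p["Wc"] @ np.concatenate([gamma_r * c_prev, x_t], axis=0) + p["bc"])

    # Element-wise blend of the candidate and the old memory cell
    c_t = gamma_u * c_tilde + (1.0 - gamma_u) * c_prev
    return c_t                                                # in a GRU, a<t> = c<t>
```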

### Long Short Term Memory (LSTM)

- The LSTM is another type of RNN that enables you to account for long-term dependencies. It's more powerful and general than the GRU.
- In an LSTM, C<sup>\<t></sup> != a<sup>\<t></sup>
- Here are the equations of an LSTM unit:
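
The figure with the full equations is not shown in this diff view; as a hedged stand-in, here is a minimal numpy sketch of one LSTM step using the standard forget/update/output gates (parameter names and shapes are assumptions of this sketch).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(a_prev, c_prev, x_t, p):
    """One LSTM step. a_prev, c_prev: (n_a, 1), x_t: (n_x, 1).
    p holds Wf, Wu, Wo, Wc of shape (n_a, n_a + n_x) and matching (n_a, 1) biases."""
    concat = np.concatenate([a_prev, x_t], axis=0)    # [a<t-1>, x<t>]

    gamma_f = sigmoid(p["Wf"] @ concat + p["bf"])     # forget gate
    gamma_u = sigmoid(p["Wu"] @ concat + p["bu"])     # update gate
    gamma_o = sigmoid(p["Wo"] @ concat + p["bo"])     # output gate
    c_tilde = np.tanh(p["Wc"] @ concat + p["bc"])     # candidate memory

    c_t = gamma_u * c_tilde + gamma_f * c_prev        # memory cell
    a_t = gamma_o * np.tanh(c_t)                      # activation; note a<t> != c<t>
    return a_t, c_t
```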

### Bidirectional RNN
- The disadvantage of BiRNNs is that you need the entire sequence before you can process it. For example, in live speech recognition, if you use a BiRNN you will need to wait for the speaker to finish in order to take the entire sequence and then make your predictions.
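
To make that point concrete, here is a hedged Keras sketch (layer sizes and input shape are made up): the `Bidirectional` wrapper runs one recurrent pass forward and one backward, so no output can be produced until the whole input sequence is available.

```python
import tensorflow as tf

# Toy bidirectional model: a forward LSTM and a backward LSTM both read the
# sequence, so predictions require the entire sequence up front.
birnn = tf.keras.Sequential([
    tf.keras.Input(shape=(None, 10)),                # (T_x, n_x) per example, T_x can vary
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32, return_sequences=True)),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # one prediction per time step
])
```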

### Deep RNNs

- In a lot of cases a standard one-layer RNN will solve your problem. But in some problems it's useful to stack RNN layers to make a deeper network (see the Keras sketch below the figure).
- For example, a deep RNN with 3 layers would look like this:
![](Images/25.png)
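
A hedged Keras sketch of such a stack (3 recurrent layers; the layer sizes and input shape here are made up): `return_sequences=True` passes every time step's activation from one layer up to the next.

```python
import tensorflow as tf

# Toy 3-layer deep RNN: each recurrent layer feeds its full sequence of
# activations to the layer above it; the last recurrent layer returns only a<T_x>.
deep_rnn = tf.keras.Sequential([
    tf.keras.Input(shape=(None, 10)),                       # (T_x, n_x)
    tf.keras.layers.SimpleRNN(64, return_sequences=True),   # layer 1
    tf.keras.layers.SimpleRNN(64, return_sequences=True),   # layer 2
    tf.keras.layers.SimpleRNN(64),                          # layer 3
    tf.keras.layers.Dense(1),                               # output head
])
```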


### Back propagation with RNNs

- > In modern deep learning frameworks, you only have to implement the forward pass, and the framework takes care of the backward pass, so most deep learning engineers do not need to bother with the details of the backward pass. If however you are an expert in calculus and want to see the details of backprop in RNNs, you can work through this optional portion of the notebook.
- The quote is taken from this [notebook](https://www.coursera.org/learn/nlp-sequence-models/notebook/X20PE/building-a-recurrent-neural-network-step-by-step). If you want the details of backpropagation with programming notes, look at the linked notebook.
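
As a small illustration of the point in the quote (toy model and random data, not taken from the notebook): with an automatic-differentiation framework you only write the forward pass and then ask for gradients.

```python
import tensorflow as tf

# Toy RNN model and random data, purely illustrative.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(None, 10)),
    tf.keras.layers.SimpleRNN(32),
    tf.keras.layers.Dense(1),
])
loss_fn = tf.keras.losses.MeanSquaredError()

x = tf.random.normal((8, 20, 10))     # (batch, T_x, n_x)
y = tf.random.normal((8, 1))

with tf.GradientTape() as tape:
    y_hat = model(x)                  # forward pass is all we implement
    loss = loss_fn(y, y_hat)

# The framework performs backpropagation through time for us.
grads = tape.gradient(loss, model.trainable_variables)
```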
