exams
alinush committed Mar 18, 2015
1 parent 93701d8 commit f4336e0
Showing 7 changed files with 226 additions and 52 deletions.
84 changes: 81 additions & 3 deletions exams/exams.html
@@ -152,9 +152,6 @@ <h2>Remus</h2>
<li>primary resumes and replies to client </li>
</ul>

<p><strong>TODO:</strong> Harp, understand how primary forwards ops to backups and/or witnesses.
what happens when some of them fail, etc.</p>

<h2>Flat data center storage</h2>

<p>Blob ID + tract # are mapped to a tract entry. In FDS, there are <code>O(n^2)</code> tract entries. 3 servers per entry. All possible combinations. Why?</p>
@@ -240,6 +237,30 @@ <h2>Paxos</h2>

<h2>Raft</h2>

<p>See <a href="raft.png">specification here</a></p>

<ul>
<li>Agree on leader (majority needs to vote for leader)</li>
<li>Leader tells everyone what the log entries are</li>
<li><code>=&gt;</code> no dueling proposers</li>
</ul>

<p>A log is an array that maps an index to a command + term. First index is 1.
Indices increase by one.</p>

<p>commitIndex versus lastApplied? lastApplied keeps track of up to what point
committed log entries have been applied to the FSM. When commitIndex exceeds
lastApplied, it's time to apply some entries.</p>

<p>Is <code>AppendEntries</code> sent only by the leader? Does only the leader send heartbeats? Yes and yes.</p>

<p>How do the servers keep synchronized clocks? (They don't need to: Raft relies only on election timeouts, not on synchronized clocks.)</p>

<p><strong>Most subtle Raft concept:</strong> If a log entry is replicated on a majority of
servers, that <strong>does NOT</strong> mean it's committed. An entry becomes committed only
once the leader replicates an entry from <em>its current term</em> on a majority of
the servers; then that entry and all entries before it are committed.</p>

<h2>Go's memory model</h2>

<p>The actual Go memory model is as follows:
@@ -269,6 +290,61 @@ <h2>Go's memory model</h2>

<h2>Harp</h2>

<ul>
<li><code>2n+1</code> servers, <code>1</code> primary, <code>n</code> backups, <code>n</code> witnesses
<ul>
<li>need a majority of <code>n+1</code> servers <code>=&gt;</code></li>
<li>tolerate up to <code>n</code> failures (see <a href="../l08-harp.html">notes</a> for why <code>2n+1</code>
is required)</li>
</ul></li>
</ul>

<p>Operation:</p>

<ul>
<li>primary gets NFS request</li>
<li>primary forwards each request to all <code>n</code> backups</li>
<li>after all backups reply, primary can execute op and apply to FS</li>
<li>in a later request, primary piggybacks an ACK to tell backups the op committed</li>
<li><strong>Note:</strong> witnesses do not ordinarily hear about ops or store state
<ul>
<li><code>n</code> failures out of the <code>n+1</code> machines that do keep state <code>=&gt;</code> still have <code>1</code>
machine w/ state</li>
</ul></li>
</ul>

<p>Why have witnesses?</p>

<ul>
<li>Remember: need <code>2n+1</code> machines to break partitions
<ul>
<li>What machines do we use to break those partitions? The witnesses!</li>
</ul></li>
<li>If <code>m</code> backups are down, primary talks to <code>m</code> promoted witnesses to get a
majority for each op (as it would with <code>n</code> live backups)</li>
<li>the witnesses record the ops in a log when they are promoted</li>
</ul>

<p>What happens on crash of a backup?</p>

<ul>
<li>If a backup crashes after it writes an op to disk, but before replying to
primary <code>=&gt;</code> no way to (easily) tell if backup executed the op when it
comes back up
<ul>
<li>This implies we need the ops in the log entries to be <em>side-effect free</em>
so we can reapply them</li>
</ul></li>
</ul>

<p>When a crashed primary comes back up, the witnesses replay to it the ops it
missed, bringing it up to speed.</p>

<p>If all backups crash, the primary can promote all witnesses and continue.</p>

<p><strong>TODO:</strong> Harp: understand how the primary forwards ops to backups and/or
witnesses, and what happens when some of them fail.</p>

<h2>TreadMarks</h2>

<p><strong>TODO:</strong> Write amplification vs. false sharing</p>
@@ -288,4 +364,6 @@ <h2>TreadMarks</h2>

<h2>Ficus</h2>

<h2>AnalogicFS</h2>

<p><strong>TODO:</strong> The AnalogicFS paper, read it very carefully and understand it fully; it will definitely show up on the final.</p>
66 changes: 63 additions & 3 deletions exams/exams.md
@@ -102,9 +102,6 @@ Remus
- backup tells primary it's done copying
- primary resumes and replies to client

**TODO:** Harp, understand how primary forwards ops to backups and/or witnesses.
what happens when some of them fail, etc.

Flat data center storage
------------------------

@@ -166,6 +163,27 @@ no other different value can be re-chosen. Why?
Raft
----

See [specification here](raft.png)

- Agree on leader (majority needs to vote for leader)
- Leader tells everyone what the log entries are
- `=>` no dueling proposers

A log is an array that maps an index to a command + term. First index is 1.
Indices increase by one.

commitIndex versus lastApplied? lastApplied keeps track of up to what point
committed log entries have been applied to the FSM. When commitIndex exceeds
lastApplied, it's time to apply some entries.
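That apply loop can be sketched in Go (field names are hypothetical, loosely following the paper's state; this is not the lab code):

```go
package main

import "fmt"

// Entry is one log slot: a command tagged with the term it was received in.
type Entry struct {
	Term    int
	Command string
}

type Server struct {
	log         []Entry // index 0 is a placeholder so real entries start at 1
	commitIndex int     // highest index known to be committed
	lastApplied int     // highest index already applied to the FSM
}

// applyCommitted applies every entry in (lastApplied, commitIndex] to the FSM.
func (s *Server) applyCommitted(apply func(Entry)) {
	for s.lastApplied < s.commitIndex {
		s.lastApplied++
		apply(s.log[s.lastApplied])
	}
}

func main() {
	s := &Server{
		log: []Entry{{}, {Term: 1, Command: "x=1"},
			{Term: 1, Command: "y=2"}, {Term: 2, Command: "x=3"}},
		commitIndex: 2, // entry 3 is replicated but not yet committed
	}
	s.applyCommitted(func(e Entry) { fmt.Println("applying", e.Command) })
	fmt.Println("lastApplied =", s.lastApplied)
}
```

Running this applies entries 1 and 2 and leaves entry 3 alone until commitIndex advances past it.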

Is `AppendEntries` sent only by the leader? Does only the leader send heartbeats? Yes and yes.

How do the servers keep synchronized clocks? (They don't need to: Raft relies only on election timeouts, not on synchronized clocks.)

**Most subtle Raft concept:** If a log entry is replicated on a majority of
servers, that **does NOT** mean it's committed. An entry becomes committed only
once the leader replicates an entry from _its current term_ on a majority of
the servers; then that entry and all entries before it are committed.
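The rule can be sketched in Go (a toy model with made-up names, shaped after the paper's `matchIndex`; not Raft's actual code):

```go
package main

import (
	"fmt"
	"sort"
)

// advanceCommit returns a leader's new commitIndex. matchIndex[i] is the
// highest log index known to be replicated on server i (the leader counts
// itself). logTerms[j] is the term of the entry at index j (index 0 unused).
// The subtlety: an index commits only if the entry there is from the
// leader's *current* term; older-term entries commit only indirectly.
func advanceCommit(matchIndex, logTerms []int, commitIndex, currentTerm int) int {
	m := append([]int(nil), matchIndex...)
	sort.Sort(sort.Reverse(sort.IntSlice(m)))
	onMajority := m[len(m)/2] // highest index replicated on a majority
	for n := onMajority; n > commitIndex; n-- {
		if logTerms[n] == currentTerm {
			return n // commits n and, implicitly, everything before it
		}
	}
	return commitIndex
}

func main() {
	logTerms := []int{0, 1, 1, 2} // entries 1..3 have terms 1, 1, 2
	match := []int{3, 3, 1, 1, 3} // entry 3 is on 3 of 5 servers
	fmt.Println(advanceCommit(match, logTerms, 0, 4)) // old-term entry: stays 0
	fmt.Println(advanceCommit(match, logTerms, 0, 2)) // current term: commits 3
}
```

This is exactly the Figure 8 danger: the term-2 entry sits on a majority, yet a term-4 leader must not count it as committed until a term-4 entry also reaches a majority.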

Go's memory model
-----------------
@@ -194,6 +212,45 @@ that ensure reads observe the desired writes.
Harp
-----

- `2n+1` servers, `1` primary, `n` backups, `n` witnesses
+ need a majority of `n+1` servers `=>`
+ tolerate up to `n` failures (see [notes](../l08-harp.html) for why `2n+1`
is required)
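A quick sanity check of that counting argument (just a sketch, not anything from the paper):

```go
package main

import "fmt"

func main() {
	// Harp runs 2n+1 servers: 1 primary + n backups + n witnesses.
	// A majority is n+1, so after n failures the n+1 survivors can
	// still assemble a majority and make progress.
	for n := 1; n <= 3; n++ {
		servers := 2*n + 1
		majority := servers/2 + 1
		fmt.Printf("n=%d: servers=%d, majority=%d, survivors after n failures=%d\n",
			n, servers, majority, servers-n)
	}
}
```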

Operation:

- primary gets NFS request
- primary forwards each request to all `n` backups
- after all backups reply, primary can execute op and apply to FS
- in a later request, primary piggybacks an ACK to tell backups the op committed
- **Note:** witnesses do not ordinarily hear about ops or store state
+ `n` failures out of the `n+1` machines that do keep state `=>` still have `1`
machine w/ state
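The flow above can be caricatured in Go (all types and names are invented for illustration; real Harp also tracks view numbers and several log pointers omitted here):

```go
package main

import "fmt"

type Op struct {
	Seq int
	Req string // the NFS request, e.g. "WRITE a"
}

type Backup struct{ log []Op }

// Append logs the op and acks; a real backup appends to its on-disk log.
func (b *Backup) Append(op Op) bool { b.log = append(b.log, op); return true }

type Primary struct {
	backups []*Backup
	nextSeq int
	acked   int // highest seq the backups have been told is committed
}

// Handle forwards the request to all n backups; only after every backup
// acks does the primary execute the op and reply to the client. The
// commit notification piggybacks on the next forwarded op.
func (p *Primary) Handle(req string) {
	op := Op{Seq: p.nextSeq, Req: req}
	p.nextSeq++
	for _, b := range p.backups {
		if !b.Append(op) {
			return // a real primary would start a view change here
		}
	}
	fmt.Printf("applied %q; piggybacked ack: committed through seq %d\n",
		op.Req, p.acked)
	p.acked = op.Seq
}

func main() {
	p := &Primary{backups: []*Backup{{}, {}}, nextSeq: 1}
	p.Handle("WRITE a")
	p.Handle("WRITE b")
}
```

The second call shows the piggybacking: only when "WRITE b" is forwarded do the backups learn that seq 1 committed.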

Why have witnesses?

- Remember: need `2n+1` machines to break partitions
+ What machines do we use to break those partitions? The witnesses!
- If `m` backups are down, primary talks to `m` promoted witnesses to get a
majority for each op (as it would with `n` live backups)
- the witnesses record the ops in a log when they are promoted

What happens on crash of a backup?

- If a backup crashes after it writes an op to disk, but before replying to
primary `=>` no way to (easily) tell if backup executed the op when it
comes back up
+ This implies we need the ops in the log entries to be _side-effect free_
so we can reapply them
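One way to make that replay safe is to make apply idempotent; a minimal Go sketch (hypothetical structure, not Harp's actual recovery code):

```go
package main

import "fmt"

type FS struct {
	files   map[string]string
	applied map[int]bool // seq numbers whose effects are already in the FS
}

// Apply is safe to run twice with the same op, so a recovering backup can
// blindly re-run its log tail without double-executing anything.
func (fs *FS) Apply(seq int, name, data string) {
	if fs.applied[seq] {
		return // already executed before the crash; replay is a no-op
	}
	fs.files[name] = data
	fs.applied[seq] = true
}

func main() {
	fs := &FS{files: map[string]string{}, applied: map[int]bool{}}
	fs.Apply(1, "f", "v1")
	fs.Apply(1, "f", "v1") // replay after recovery: no effect
	fmt.Println(fs.files["f"], len(fs.applied))
}
```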

When a crashed primary comes back up, the witnesses replay to it the ops it
missed, bringing it up to speed.

If all backups crash, the primary can promote all witnesses and continue.

**TODO:** Harp: understand how the primary forwards ops to backups and/or
witnesses, and what happens when some of them fail.

TreadMarks
----------

@@ -213,4 +270,7 @@ contribute to current write are also made visible?
Ficus
-----

AnalogicFS
----------

**TODO:** The AnalogicFS paper, read it very carefully and understand it fully; it will definitely show up on the final.
Binary file added exams/raft.png
8 changes: 4 additions & 4 deletions l06-raft.html
@@ -286,7 +286,7 @@ <h3>What if old leader isn't aware a new one is elected?</h3>
</ul></li>
<li>S2 replies false (<code>AppendEntries</code> step 2)</li>
<li>S3 decrements <code>nextIndex[S2]</code></li>
<li>S3 sends <code>AppendEntries</code> for the term=5 op, saying prev has term=3</li>
<li>S3 sends <code>AppendEntries</code> for the op w/ term=5, saying prev has term=3</li>
<li>S2 deletes op from term 4 (<code>AppendEntries</code> step 3) and replaces with op for term 5 from S3
(and S1 rejects b/c it doesn't have anything in that entry)
<ul>
@@ -393,8 +393,8 @@ <h3>What if old leader isn't aware a new one is elected?</h3>
<p>Figure 8:</p>

<pre><code>S1 1, L 2, , L 4,
S2 1, 2, , \A/
S3 1, , , 2 &lt;-| ,
S2 1, 2, , \A/,
S3 1, &lt;-------- 2 &lt;-| ,
S4 1, , , ,
S5 1, , L 3, , L will erase all 2's
</code></pre>
@@ -408,7 +408,7 @@ <h3>What if old leader isn't aware a new one is elected?</h3>
<ul>
<li>S1 was leader in term 2, sends out two copies of 2</li>
<li>S5 leader in term 3</li>
<li>S1 in term 4, sends one more copy of 2 (b/c S3 rejected op 4)</li>
<li>S1 leader in term 4, sends one more copy of 2 (b/c S3 rejected op 4)</li>
<li>what if S5 now becomes leader?
<ul>
<li>S5 can get a majority (w/o S1)</li>
8 changes: 4 additions & 4 deletions l06-raft.md
@@ -204,7 +204,7 @@ Roll-back is a big hammer -- forces leader's log on everyone
+ `AppendEntries` says previous entry must have term 5
+ S2 replies false (`AppendEntries` step 2)
+ S3 decrements `nextIndex[S2]`
+ S3 sends `AppendEntries` for the term=5 op, saying prev has term=3
+ S3 sends `AppendEntries` for the op w/ term=5, saying prev has term=3
+ S2 deletes op from term 4 (`AppendEntries` step 3) and replaces with op for term 5 from S3
(and S1 rejects b/c it doesn't have anything in that entry)
+ S2 sets op for term 6 as well
@@ -278,8 +278,8 @@ The most subtle thing about Raft (figure 8)
Figure 8:

S1 1, L 2, , L 4,
S2 1, 2, , \A/
S3 1, , , 2 <-| ,
S2 1, 2, , \A/,
S3 1, <-------- 2 <-| ,
S4 1, , , ,
S5 1, , L 3, , L will erase all 2's

@@ -288,7 +288,7 @@ Figure 8:
- Figure 8:
+ S1 was leader in term 2, sends out two copies of 2
+ S5 leader in term 3
+ S1 in term 4, sends one more copy of 2 (b/c S3 rejected op 4)
+ S1 leader in term 4, sends one more copy of 2 (b/c S3 rejected op 4)
+ what if S5 now becomes leader?
- S5 can get a majority (w/o S1)
- S5 will roll back 2 and replace it with 3
64 changes: 45 additions & 19 deletions l08-harp.html
@@ -110,11 +110,11 @@ <h4>Why are <code>2b+1</code> servers necessary to tolerate <code>b</code> failu
<ul>
<li>(This is review...)</li>
<li>Suppose we have <code>N</code> servers, and execute a write.</li>
<li>Can't wait for more than N-b, since b might be dead.</li>
<li>So let's require waiting for N-b for each operation.</li>
<li>The b we didn't wait for might be live and in another partition.</li>
<li>We can prevent them from proceeding if <code>b &lt; N-b</code>.</li>
<li>I.e. <code>2b &lt; N; N = 2b + 1</code> is enough.</li>
<li>Can't wait for more than <code>N-b</code>, since <code>b</code> might be dead.</li>
<li>So let's require waiting for <code>N-b</code> for each operation.</li>
<li>The <code>b</code> we didn't wait for might be live and in another partition.</li>
<li>We can prevent them from proceeding if <code>N-b &gt; b</code>.</li>
<li>I.e. <code>N &gt; 2b =&gt; N = 2b + 1</code> is enough.</li>
</ul>

<h4>What are Harp's witnesses?</h4>
Expand All @@ -130,11 +130,12 @@ <h4>What are Harp's witnesses?</h4>
<li>witness acts as a tie breaker: whoever can talk to it wins and gets to
act as a primary</li>
</ul></li>
<li>a second use of the witness is to record operations, once say it's part of
the partition <code>B, W</code> so that a majority of nodes have the latest operations</li>
<li>a second use of the witness is to record operations</li>
<li>once a witness is part of the partition <code>B, W</code>, it records operations so
that a majority of nodes have the latest operations</li>
<li>a final function of the witness is that when the primary comes back to life,
the witness has been logging every single operation issued since primary
disappeared, so witness can reply every op to primary and will be up to
disappeared, so witness can replay every op to primary and primary will be up to
date w.r.t. all the operations executed
<ul>
<li>efficiently bring primary up to speed</li>
Expand Down Expand Up @@ -187,7 +188,7 @@ <h4>Does primary need to send operations to witnesses?</h4>
</ul></li>
<li>Thus each "promoted" witness keeps a log.</li>
</ul></li>
<li>So in a 2b+1 system, a view always has b+1 servers that the primary
<li>So in a <code>2b+1</code> system, a view always has <code>b+1</code> servers that the primary
must contact for each op, and that store each op.</li>
</ul>

@@ -480,18 +481,43 @@ <h4>How does failure recovery work?</h4>

<pre><code> S1+S2+S3; then S1 crashes
S2 is primary in new view (and S4 is promoted)
Will S2 have every committed operation?
Will S2 have every operation S1 received?
Will S2's log tail be the same as S3's log tail?
How far back can S2 and S3 log tail differ?
How to cause S2 and S3's log to be the same?
Must commit ops that appeared in both S2+S3 logs
What about ops that appear in only one log?
In this scenario, can discard since could not have committed
But in general committed op might be visible in just one log
From what point does promoted witness have to start keeping a log?
</code></pre>

<ul>
<li>Will S2 have every committed operation?
<ul>
<li>Yes.</li>
</ul></li>
<li>Will S2 have every operation S1 received?
<ul>
<li>No: maybe the op didn't reach S2 from S1 before S1 crashed.</li>
</ul></li>
<li>Will S2's log tail be the same as S3's log tail?
<ul>
<li>Not necessarily.
<ul>
<li>Maybe op reached S2 but not S3 and then S1 crashed.</li>
<li>Maybe op reached S2, and S3 crashed, so S4 was promoted. Then S3
came back up?</li>
</ul></li>
</ul></li>
<li>How far back can S2 and S3 log tail differ?
<ul>
<li>Not up to the CP, because committed ops could be committed w/ help
of promoted witness <code>=&gt;</code> backup logs differ</li>
</ul></li>
<li>How to cause S2 and S3's log to be the same?
<ul>
<li>Must commit ops that appeared in both S2+S3 logs</li>
<li>What about ops that appear in only one log?
<ul>
<li>In this scenario, can discard since could not have committed</li>
<li>But in general committed op might be visible in just one log</li>
</ul></li>
</ul></li>
<li>From what point does promoted witness have to start keeping a log?</li>
</ul>

<h4>What if S1 crashed just before replying to a client?</h4>

<ul>