exams
alinush committed Mar 18, 2015
1 parent 93701d8 commit f4336e0
Showing 7 changed files with 226 additions and 52 deletions.
84 changes: 81 additions & 3 deletions exams/exams.html
@@ -152,9 +152,6 @@ <h2>Remus</h2>
<li>primary resumes and replies to client </li>
</ul>

<p><strong>TODO:</strong> Harp, understand how primary forwards ops to backups and/or witnesses.
what happens when some of them fail, etc.</p>

<h2>Flat data center storage</h2>

<p>Blob ID + tract # are mapped to a tract entry. In FDS, there are <code>O(n^2)</code> tract entries. 3 servers per entry. All possible combinations. Why?</p>
@@ -240,6 +237,30 @@ <h2>Paxos</h2>

<h2>Raft</h2>

<p>See <a href="raft.png">specification here</a></p>

<ul>
<li>Agree on leader (majority needs to vote for leader)</li>
<li>Leader tells everyone what the log entries are</li>
<li><code>=&gt;</code> no dueling proposers</li>
</ul>

<p>A log is an array that maps an index to a command + term. First index is 1.
Indices increase by one.</p>

<p>commitIndex versus lastApplied? lastApplied keeps track of up to what point
committed log entries have been applied to the FSM. When commitIndex exceeds
lastApplied, it's time to apply some entries.</p>

<p>Is <code>AppendEntries</code> sent only by the leader? Does only the leader send heartbeats? Yes and yes.</p>

<p>How do the servers keep synchronized clocks? (They don't need to: Raft relies only on election timeouts, not on synchronized clocks.)</p>

<p><strong>Most subtle Raft concept:</strong> If a log entry is replicated on a majority of
servers, that <strong>does NOT</strong> mean it's committed. An entry becomes committed only
once the leader replicates an entry from <em>its current term</em> on a majority of
the servers; then that entry and all entries before it are committed.</p>

<h2>Go's memory model</h2>

<p>The actual Go memory model is as follows:
@@ -269,6 +290,61 @@ <h2>Go's memory model</h2>

<h2>Harp</h2>

<ul>
<li><code>2n+1</code> servers, <code>1</code> primary, <code>n</code> backups, <code>n</code> witnesses
<ul>
<li>need a majority of <code>n+1</code> servers <code>=&gt;</code></li>
<li>tolerate up to <code>n</code> failures (see <a href="../l08-harp.html">notes</a> for why <code>2n+1</code>
is required)</li>
</ul></li>
</ul>

<p>Operation:</p>

<ul>
<li>primary gets NFS request</li>
<li>primary forwards each request to all <code>n</code> backups</li>
<li>after all backups reply, primary can execute op and apply to FS</li>
<li>in a later request, primary piggybacks an ACK to tell backups the op committed</li>
<li><strong>Note:</strong> witnesses do not ordinarily hear about ops or store state
<ul>
<li><code>n</code> failures out of the <code>n+1</code> machines that do keep state <code>=&gt;</code> still have <code>1</code>
machine w/ state</li>
</ul></li>
</ul>

<p>Why have witnesses?</p>

<ul>
<li>Remember: need <code>2n+1</code> machines to break partitions
<ul>
<li>What machines do we use to break those partitions? The witnesses!</li>
</ul></li>
<li>If <code>m</code> backups are down, primary talks to <code>m</code> promoted witnesses to get a
majority for each op (as it would with <code>n</code> live backups)</li>
<li>the witnesses record the ops in a log when they are promoted</li>
</ul>

<p>What happens on crash of a backup?</p>

<ul>
<li>If a backup crashes after it writes an op to disk, but before replying to
primary <code>=&gt;</code> no way to (easily) tell if backup executed the op when it
comes back up
<ul>
<li>This implies we need the ops in the log entries to be <em>side-effect free</em>
so we can reapply them</li>
</ul></li>
</ul>

<p>When a crashed primary comes back up, the witnesses replay to it the ops it
missed, bringing it up to speed.</p>

<p>If all backups crash, the primary can promote all witnesses and continue.</p>

<p><strong>TODO:</strong> Harp: understand how the primary forwards ops to backups and/or
witnesses, and what happens when some of them fail.</p>

<h2>TreadMarks</h2>

<p><strong>TODO:</strong> Write amplification vs. false sharing</p>
@@ -288,4 +364,6 @@ <h2>TreadMarks</h2>

<h2>Ficus</h2>

<h2>AnalogicFS</h2>

<p><strong>TODO:</strong> The AnalogicFS paper, read it very carefully and understand it fully; it will definitely show up on the final.</p>
66 changes: 63 additions & 3 deletions exams/exams.md
@@ -102,9 +102,6 @@ Remus
- backup tells primary it's done copying
- primary resumes and replies to client

**TODO:** Harp, understand how primary forwards ops to backups and/or witnesses.
what happens when some of them fail, etc.

Flat data center storage
------------------------

@@ -166,6 +163,27 @@ no other different value can be re-chosen. Why?
Raft
----

See [specification here](raft.png)

- Agree on leader (majority needs to vote for leader)
- Leader tells everyone what the log entries are
- `=>` no dueling proposers

A log is an array that maps an index to a command + term. First index is 1.
Indices increase by one.

commitIndex versus lastApplied? lastApplied keeps track of up to what point
committed log entries have been applied to the FSM. When commitIndex exceeds
lastApplied, it's time to apply some entries.
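That apply loop can be sketched in Go (field names are hypothetical, loosely following the paper's state; this is not the lab code):

```go
package main

import "fmt"

// Entry is one log slot: a command tagged with the term it was received in.
type Entry struct {
	Term    int
	Command string
}

type Server struct {
	log         []Entry // index 0 is a placeholder so real entries start at 1
	commitIndex int     // highest index known to be committed
	lastApplied int     // highest index already applied to the FSM
}

// applyCommitted applies every entry in (lastApplied, commitIndex] to the FSM.
func (s *Server) applyCommitted(apply func(Entry)) {
	for s.lastApplied < s.commitIndex {
		s.lastApplied++
		apply(s.log[s.lastApplied])
	}
}

func main() {
	s := &Server{
		log: []Entry{{}, {Term: 1, Command: "x=1"},
			{Term: 1, Command: "y=2"}, {Term: 2, Command: "x=3"}},
		commitIndex: 2, // entry 3 is replicated but not yet committed
	}
	s.applyCommitted(func(e Entry) { fmt.Println("applying", e.Command) })
	fmt.Println("lastApplied =", s.lastApplied)
}
```

Running this applies entries 1 and 2 and leaves entry 3 alone until commitIndex advances past it.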

Is `AppendEntries` sent only by the leader? Does only the leader send heartbeats? Yes and yes.

How do the servers keep synchronized clocks? (They don't need to: Raft relies only on election timeouts, not on synchronized clocks.)

**Most subtle Raft concept:** If a log entry is replicated on a majority of
servers, that **does NOT** mean it's committed. An entry becomes committed only
once the leader replicates an entry from _its current term_ on a majority of
the servers; then that entry and all entries before it are committed.
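The rule can be sketched in Go (a toy model with made-up names, shaped after the paper's `matchIndex`; not Raft's actual code):

```go
package main

import (
	"fmt"
	"sort"
)

// advanceCommit returns a leader's new commitIndex. matchIndex[i] is the
// highest log index known to be replicated on server i (the leader counts
// itself). logTerms[j] is the term of the entry at index j (index 0 unused).
// The subtlety: an index commits only if the entry there is from the
// leader's *current* term; older-term entries commit only indirectly.
func advanceCommit(matchIndex, logTerms []int, commitIndex, currentTerm int) int {
	m := append([]int(nil), matchIndex...)
	sort.Sort(sort.Reverse(sort.IntSlice(m)))
	onMajority := m[len(m)/2] // highest index replicated on a majority
	for n := onMajority; n > commitIndex; n-- {
		if logTerms[n] == currentTerm {
			return n // commits n and, implicitly, everything before it
		}
	}
	return commitIndex
}

func main() {
	logTerms := []int{0, 1, 1, 2} // entries 1..3 have terms 1, 1, 2
	match := []int{3, 3, 1, 1, 3} // entry 3 is on 3 of 5 servers
	fmt.Println(advanceCommit(match, logTerms, 0, 4)) // old-term entry: stays 0
	fmt.Println(advanceCommit(match, logTerms, 0, 2)) // current term: commits 3
}
```

This is exactly the Figure 8 danger: the term-2 entry sits on a majority, yet a term-4 leader must not count it as committed until a term-4 entry also reaches a majority.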

Go's memory model
-----------------
@@ -194,6 +212,45 @@ that ensure reads observe the desired writes.
Harp
-----

- `2n+1` servers, `1` primary, `n` backups, `n` witnesses
+ need a majority of `n+1` servers `=>`
+ tolerate up to `n` failures (see [notes](../l08-harp.html) for why `2n+1`
is required)
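A quick sanity check of that counting argument (just a sketch, not anything from the paper):

```go
package main

import "fmt"

func main() {
	// Harp runs 2n+1 servers: 1 primary + n backups + n witnesses.
	// A majority is n+1, so after n failures the n+1 survivors can
	// still assemble a majority and make progress.
	for n := 1; n <= 3; n++ {
		servers := 2*n + 1
		majority := servers/2 + 1
		fmt.Printf("n=%d: servers=%d, majority=%d, survivors after n failures=%d\n",
			n, servers, majority, servers-n)
	}
}
```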

Operation:

- primary gets NFS request
- primary forwards each request to all `n` backups
- after all backups reply, primary can execute op and apply to FS
- in a later request, primary piggybacks an ACK to tell backups the op committed
- **Note:** witnesses do not ordinarily hear about ops or store state
+ `n` failures out of the `n+1` machines that do keep state `=>` still have `1`
machine w/ state
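The flow above can be caricatured in Go (all types and names are invented for illustration; real Harp also tracks view numbers and several log pointers omitted here):

```go
package main

import "fmt"

type Op struct {
	Seq int
	Req string // the NFS request, e.g. "WRITE a"
}

type Backup struct{ log []Op }

// Append logs the op and acks; a real backup appends to its on-disk log.
func (b *Backup) Append(op Op) bool { b.log = append(b.log, op); return true }

type Primary struct {
	backups []*Backup
	nextSeq int
	acked   int // highest seq the backups have been told is committed
}

// Handle forwards the request to all n backups; only after every backup
// acks does the primary execute the op and reply to the client. The
// commit notification piggybacks on the next forwarded op.
func (p *Primary) Handle(req string) {
	op := Op{Seq: p.nextSeq, Req: req}
	p.nextSeq++
	for _, b := range p.backups {
		if !b.Append(op) {
			return // a real primary would start a view change here
		}
	}
	fmt.Printf("applied %q; piggybacked ack: committed through seq %d\n",
		op.Req, p.acked)
	p.acked = op.Seq
}

func main() {
	p := &Primary{backups: []*Backup{{}, {}}, nextSeq: 1}
	p.Handle("WRITE a")
	p.Handle("WRITE b")
}
```

The second call shows the piggybacking: only when "WRITE b" is forwarded do the backups learn that seq 1 committed.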

Why have witnesses?

- Remember: need `2n+1` machines to break partitions
+ What machines do we use to break those partitions? The witnesses!
- If `m` backups are down, primary talks to `m` promoted witnesses to get a
majority for each op (as it would with `n` live backups)
- the witnesses record the ops in a log when they are promoted

What happens on crash of a backup?

- If a backup crashes after it writes an op to disk, but before replying to
primary `=>` no way to (easily) tell if backup executed the op when it
comes back up
+ This implies we need the ops in the log entries to be _side-effect free_
so we can reapply them
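One way to make that replay safe is to make apply idempotent; a minimal Go sketch (hypothetical structure, not Harp's actual recovery code):

```go
package main

import "fmt"

type FS struct {
	files   map[string]string
	applied map[int]bool // seq numbers whose effects are already in the FS
}

// Apply is safe to run twice with the same op, so a recovering backup can
// blindly re-run its log tail without double-executing anything.
func (fs *FS) Apply(seq int, name, data string) {
	if fs.applied[seq] {
		return // already executed before the crash; replay is a no-op
	}
	fs.files[name] = data
	fs.applied[seq] = true
}

func main() {
	fs := &FS{files: map[string]string{}, applied: map[int]bool{}}
	fs.Apply(1, "f", "v1")
	fs.Apply(1, "f", "v1") // replay after recovery: no effect
	fmt.Println(fs.files["f"], len(fs.applied))
}
```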

When a crashed primary comes back up, the witnesses replay to it the ops it
missed, bringing it up to speed.

If all backups crash, the primary can promote all witnesses and continue.

**TODO:** Harp: understand how the primary forwards ops to backups and/or
witnesses, and what happens when some of them fail.

TreadMarks
----------

@@ -213,4 +270,7 @@ contribute to current write are also made visible?
Ficus
-----

AnalogicFS
----------

**TODO:** The AnalogicFS paper, read it very carefully and understand it fully; it will definitely show up on the final.
Binary file added exams/raft.png
8 changes: 4 additions & 4 deletions l06-raft.html
@@ -286,7 +286,7 @@ <h3>What if old leader isn't aware a new one is elected?</h3>
</ul></li>
<li>S2 replies false (<code>AppendEntries</code> step 2)</li>
<li>S3 decrements <code>nextIndex[S2]</code></li>
<li>S3 sends <code>AppendEntries</code> for the term=5 op, saying prev has term=3</li>
<li>S3 sends <code>AppendEntries</code> for the op w/ term=5, saying prev has term=3</li>
<li>S2 deletes op from term 4 (<code>AppendEntries</code> step 3) and replaces with op for term 5 from S3
(and S1 rejects b/c it doesn't have anything in that entry)
<ul>
@@ -393,8 +393,8 @@ <h3>What if old leader isn't aware a new one is elected?</h3>
<p>Figure 8:</p>

<pre><code>S1 1, L 2, , L 4,
S2 1, 2, , \A/
S3 1, , , 2 &lt;-| ,
S2 1, 2, , \A/,
S3 1, &lt;-------- 2 &lt;-| ,
S4 1, , , ,
S5 1, , L 3, , L will erase all 2's
</code></pre>
@@ -408,7 +408,7 @@ <h3>What if old leader isn't aware a new one is elected?</h3>
<ul>
<li>S1 was leader in term 2, sends out two copies of 2</li>
<li>S5 leader in term 3</li>
<li>S1 in term 4, sends one more copy of 2 (b/c S3 rejected op 4)</li>
<li>S1 leader in term 4, sends one more copy of 2 (b/c S3 rejected op 4)</li>
<li>what if S5 now becomes leader?
<ul>
<li>S5 can get a majority (w/o S1)</li>
8 changes: 4 additions & 4 deletions l06-raft.md
@@ -204,7 +204,7 @@ Roll-back is a big hammer -- forces leader's log on everyone
+ `AppendEntries` says previous entry must have term 5
+ S2 replies false (`AppendEntries` step 2)
+ S3 decrements `nextIndex[S2]`
+ S3 sends `AppendEntries` for the term=5 op, saying prev has term=3
+ S3 sends `AppendEntries` for the op w/ term=5, saying prev has term=3
+ S2 deletes op from term 4 (`AppendEntries` step 3) and replaces with op for term 5 from S3
(and S1 rejects b/c it doesn't have anything in that entry)
+ S2 sets op for term 6 as well
@@ -278,8 +278,8 @@ The most subtle thing about Raft (figure 8)
Figure 8:

S1 1, L 2, , L 4,
S2 1, 2, , \A/
S3 1, , , 2 <-| ,
S2 1, 2, , \A/,
S3 1, <-------- 2 <-| ,
S4 1, , , ,
S5 1, , L 3, , L will erase all 2's

@@ -288,7 +288,7 @@ Figure 8:
- Figure 8:
+ S1 was leader in term 2, sends out two copies of 2
+ S5 leader in term 3
+ S1 in term 4, sends one more copy of 2 (b/c S3 rejected op 4)
+ S1 leader in term 4, sends one more copy of 2 (b/c S3 rejected op 4)
+ what if S5 now becomes leader?
- S5 can get a majority (w/o S1)
- S5 will roll back 2 and replace it with 3
64 changes: 45 additions & 19 deletions l08-harp.html
@@ -110,11 +110,11 @@ <h4>Why are <code>2b+1</code> servers necessary to tolerate <code>b</code> failu
<ul>
<li>(This is review...)</li>
<li>Suppose we have <code>N</code> servers, and execute a write.</li>
<li>Can't wait for more than N-b, since b might be dead.</li>
<li>So let's require waiting for N-b for each operation.</li>
<li>The b we didn't wait for might be live and in another partition.</li>
<li>We can prevent them from proceeding if <code>b &lt; N-b</code>.</li>
<li>I.e. <code>2b &lt; N; N = 2b + 1</code> is enough.</li>
<li>Can't wait for more than <code>N-b</code>, since <code>b</code> might be dead.</li>
<li>So let's require waiting for <code>N-b</code> for each operation.</li>
<li>The <code>b</code> we didn't wait for might be live and in another partition.</li>
<li>We can prevent them from proceeding if <code>N-b &gt; b</code>.</li>
<li>I.e. <code>N &gt; 2b =&gt; N = 2b + 1</code> is enough.</li>
</ul>

<h4>What are Harp's witnesses?</h4>
Expand All @@ -130,11 +130,12 @@ <h4>What are Harp's witnesses?</h4>
<li>witness acts as a tie breaker: whoever can talk to it wins and gets to
act as a primary</li>
</ul></li>
<li>a second use of the witness is to record operations, once say it's part of
the partition <code>B, W</code> so that a majority of nodes have the latest operations</li>
<li>a second use of the witness is to record operations</li>
<li>once a witness is part of the partition <code>B, W</code>, it records operations so
that a majority of nodes have the latest operations</li>
<li>a final function of the witness is that when the primary comes back to life,
the witness has been logging every single operation issued since primary
disappeared, so witness can reply every op to primary and will be up to
disappeared, so witness can replay every op to primary and primary will be up to
date w.r.t. all the operations executed
<ul>
<li>efficiently bring primary up to speed</li>
Expand Down Expand Up @@ -187,7 +188,7 @@ <h4>Does primary need to send operations to witnesses?</h4>
</ul></li>
<li>Thus each "promoted" witness keeps a log.</li>
</ul></li>
<li>So in a 2b+1 system, a view always has b+1 servers that the primary
<li>So in a <code>2b+1</code> system, a view always has <code>b+1</code> servers that the primary
must contact for each op, and that store each op.</li>
</ul>

@@ -480,18 +481,43 @@ <h4>How does failure recovery work?</h4>

<pre><code> S1+S2+S3; then S1 crashes
S2 is primary in new view (and S4 is promoted)
Will S2 have every committed operation?
Will S2 have every operation S1 received?
Will S2's log tail be the same as S3's log tail?
How far back can S2 and S3 log tail differ?
How to cause S2 and S3's log to be the same?
Must commit ops that appeared in both S2+S3 logs
What about ops that appear in only one log?
In this scenario, can discard since could not have committed
But in general committed op might be visible in just one log
From what point does promoted witness have to start keeping a log?
</code></pre>

<ul>
<li>Will S2 have every committed operation?
<ul>
<li>Yes.</li>
</ul></li>
<li>Will S2 have every operation S1 received?
<ul>
<li>No: maybe the op didn't reach S2 from S1 before S1 crashed.</li>
</ul></li>
<li>Will S2's log tail be the same as S3's log tail?
<ul>
<li>Not necessarily.
<ul>
<li>Maybe op reached S2 but not S3 and then S1 crashed.</li>
<li>Maybe op reached S2, and S3 crashed, so S4 was promoted. Then S3
came back up?</li>
</ul></li>
</ul></li>
<li>How far back can S2 and S3 log tail differ?
<ul>
<li>Not up to the CP, because committed ops could be committed w/ help
of promoted witness <code>=&gt;</code> backup logs differ</li>
</ul></li>
<li>How to cause S2 and S3's log to be the same?
<ul>
<li>Must commit ops that appeared in both S2+S3 logs</li>
<li>What about ops that appear in only one log?
<ul>
<li>In this scenario, can discard since could not have committed</li>
<li>But in general committed op might be visible in just one log</li>
</ul></li>
</ul></li>
<li>From what point does promoted witness have to start keeping a log?</li>
</ul>

<h4>What if S1 crashed just before replying to a client?</h4>

<ul>