-
Why can deadline-ordered multicast (DOM) accelerate consensus protocols?
Message reordering is a performance killer for consensus protocols, as observed by prior work (e.g., SpecPaxos) and by our own measurement study. DOM is therefore designed to reduce message reordering and create favorable conditions for consensus.
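As a rough illustration of DOM's receiver side (a minimal sketch with assumed names and clock source, not Nezha's actual API): the sender attaches a deadline to each message, and each receiver holds messages that arrive before their deadlines and releases them in deadline order, so receivers that meet the deadlines release the same sequence.

```python
import heapq
import itertools
import time

class DomReceiver:
    """Minimal sketch of a DOM receiver; time.time() stands in for a
    synchronized clock, and the buffer names are illustrative."""

    def __init__(self):
        self._seq = itertools.count()   # tie-breaker for equal deadlines
        self.early_buffer = []          # min-heap of (deadline, seq, message)
        self.late_buffer = []           # messages that missed their deadline

    def on_receive(self, deadline, message):
        # A message arriving before its deadline is held and later released
        # in deadline order; one arriving after its deadline is set aside.
        if time.time() < deadline:
            heapq.heappush(self.early_buffer, (deadline, next(self._seq), message))
        else:
            self.late_buffer.append((deadline, message))

    def release_ready(self):
        # Release, in deadline order, every held message whose deadline has passed.
        released = []
        now = time.time()
        while self.early_buffer and self.early_buffer[0][0] <= now:
            deadline, _, message = heapq.heappop(self.early_buffer)
            released.append((deadline, message))
        return released
```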
-
Why does Nezha outperform the baselines by so much? Where do the wins come from?
Nezha's design incorporates multiple pillars to achieve its high performance. Sec 8.4 of our paper conducts an ablation study to quantify the performance gains from each of the incorporated strategies.
-
In your evaluation (Figure 6), why does NOPaxos' performance drop so much compared with the numbers previously published in the NOPaxos paper?
Because our testbed environment is less favorable than the one used in the NOPaxos paper. Specifically, the NOPaxos paper tests the protocols on on-prem clusters with more powerful CPUs, and that testbed incurs very little message reordering (as confirmed with the NOPaxos authors), which is why NOPaxos achieves much higher performance (~240 K reqs/sec) in the original paper. In the public cloud, with virtual CPUs and no control over network transmission, NOPaxos suffers serious degradation even after our threading optimizations. Besides, compared with the closed-loop tests used in the NOPaxos paper, NOPaxos incurs more overhead in open-loop tests, because open-loop clients create more packet reordering, which triggers NOPaxos' gap agreement more frequently. We explain the details in Sec 8.2 of our paper.
-
Why do you conduct both closed-loop and open-loop tests? What is the difference between them?
Closed-loop tests are the traditional method for evaluating consensus protocols and are used by most prior works. Recently, open-loop tests have gained more attention as researchers revisit consensus protocols (e.g., EPaxos Revisited). We explain both methods in Sec 8.1 (Evaluation method paragraph). Open-loop tests are considered more realistic: some consensus protocols that perform well in closed-loop tests show performance degradation in open-loop tests (e.g., NOPaxos and Fast Paxos).
Besides, when consensus protocols are evaluated in a WAN setting, the latency can be very large (hundreds of milliseconds), so saturating a protocol's throughput requires a large number of closed-loop clients. Open-loop tests, which only need to maintain a small number of clients, are therefore more economical.
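For intuition, the two client behaviors can be contrasted with the following minimal sketch (function names are placeholders, not our benchmark harness):

```python
import time

def closed_loop_client(send_request, wait_for_reply, num_requests):
    # Closed loop: the next request is issued only after the previous
    # reply has returned, so each client keeps at most one request in flight.
    for i in range(num_requests):
        send_request(i)
        wait_for_reply(i)

def open_loop_client(send_request, rate_per_sec, num_requests):
    # Open loop: requests are issued at a fixed rate regardless of how
    # many replies are still outstanding (replies are collected elsewhere).
    interval = 1.0 / rate_per_sec
    for i in range(num_requests):
        send_request(i)
        time.sleep(interval)
```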
-
How does performance change/degrade as clock bounds loosen?
When clock accuracy degrades, throughput is not affected, thanks to our proxy design, but latency becomes worse. In our technical report (Appendix D), we analyze and quantify how the latency changes as clock accuracy degrades.
-
When does a replica modify the deadlines of incoming requests?
On the leader replica, when an incoming request's deadline is smaller than that of the last released request, the leader modifies the deadline to a larger value in order to satisfy the consistent-ordering requirement (refer to Step 3 in Figure 5).
The follower replicas also modify the deadlines of some requests if they find their log-lists are inconsistent with the leader's log-list after receiving the leader's log-modification messages.
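The leader-side rule can be sketched roughly as follows (a dict-based illustration; the field names and the increment are assumptions, not the exact rule in the paper):

```python
def leader_admit(request, last_released_deadline, bump=1):
    # Sketch of the leader-side deadline bump (Step 3 in Figure 5).
    # `request` is a dict with a 'deadline' field; `bump` is an
    # illustrative increment, not the paper's exact tie-breaking rule.
    if request["deadline"] < last_released_deadline:
        # A request whose deadline is smaller than that of the last
        # released request gets a larger deadline, keeping the leader's
        # log consistently ordered by deadline.
        request["deadline"] = last_released_deadline + bump
    return request
```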
-
What happens when some replicas receive a request in their early buffers and others receive it in their late buffers? Is such a scenario possible? If so, how is consensus achieved?
It is possible, but consensus can still be achieved. Specifically, there are two cases.
(1) If the leader puts the request in its early buffer, the leader makes the followers that have put the request into their late buffers pick it back out. These followers go through Step 9 in the slow-path figure (Figure 5). Followers that have accepted the request in their early buffers do not go through Step 9.
(2) If the leader puts the request into its late buffer, the leader first modifies the request's deadline to be larger than that of the last released request, and then puts it into the early buffer. Afterwards, log-modification messages are broadcast to the followers, which perform one of two actions. First, followers that put the request into their early buffers modify its deadline to stay consistent with the leader. Second, followers that put the request into their late buffers go through Step 9 to fetch the request first (or, rarely, fetch it from other replicas if the request is missing from their late buffers due to packet drops), and then modify its deadline according to the log-modification message.
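A rough sketch of the follower-side handling of a log-modification message, under assumed buffer shapes (dicts keyed by request id) and a hypothetical fetch helper that are not the actual implementation:

```python
def apply_log_modification(mod, early_buffer, late_buffer, fetch_from_peers):
    # `mod` carries the leader's new deadline for one request;
    # `early_buffer`/`late_buffer` are dicts keyed by request id and
    # `fetch_from_peers` is a hypothetical fallback for dropped requests.
    req_id, new_deadline = mod["request_id"], mod["new_deadline"]
    if req_id in early_buffer:
        # Case 1: the follower already accepted the request in its early
        # buffer -- it only adopts the leader's (larger) deadline.
        early_buffer[req_id]["deadline"] = new_deadline
    else:
        # Case 2: the request sits in the late buffer (Step 9), or must be
        # fetched from another replica if it was lost to a packet drop.
        request = late_buffer.pop(req_id, None) or fetch_from_peers(req_id)
        request["deadline"] = new_deadline
        early_buffer[req_id] = request
```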
-
What if the leader does not receive the request? Will the request still be committed later?
If the leader does not receive the request, the request cannot be committed, because our quorum check requires the leader's reply. In that case, the client/proxy times out and retries the request with a new deadline. The retry is repeated until the leader receives the request, so the request is eventually committed. Nezha maintains at-most-once semantics at the replicas to properly handle duplicate requests (see Sec 5.5).
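A minimal sketch of that retry loop, with placeholder transport and quorum helpers that are not part of Nezha's real API:

```python
import time

def submit_with_retry(request, multicast, wait_for_quorum, latency_bound, timeout):
    # `multicast`, `wait_for_quorum`, and `latency_bound` are illustrative
    # placeholders. The quorum must include the leader's reply, and the
    # replicas deduplicate retries via at-most-once semantics (Sec 5.5).
    while True:
        # Each (re)transmission carries a fresh deadline.
        request["deadline"] = time.time() + latency_bound
        multicast(request)
        if wait_for_quorum(request, timeout):
            return True   # committed: a quorum including the leader replied
        # Timed out (e.g., the leader never received the request): retry.
```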
-
When a replica crashes, the relaunched replica may have a different IP address. How does the proxy learn the new replica's address when it is doing DOM?
This is essentially a reconfiguration problem, which can be handled by traditional approaches like those in Raft/Paxos, or by newly developed protocols (e.g., Matchmaker). Here we explain it with the traditional reconfiguration approach: when a newly launched replica wants to join the system, it contacts a quorum of the healthy replicas to identify the current leader. It then communicates with the current leader to fetch the correct state and to notify the leader of its information (e.g., its IP address and port). The leader, after updating the maintained replicas' IP addresses and ports, broadcasts them to every healthy replica. As a result, when the proxy/client conducts DOM, any healthy replica can include the fresh replica addresses in its reply, so the proxy/client knows which replicas should receive requests in the follow-up multicast operations.
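A rough sketch of that flow, with illustrative replica/leader interfaces that are assumptions rather than the implementation:

```python
from collections import Counter

def rejoin_cluster(new_replica, healthy_replicas):
    # 1. Contact a quorum of healthy replicas to identify the current leader
    #    (here, simply the most common answer among the healthy replicas).
    answers = [r.current_leader() for r in healthy_replicas]
    leader = Counter(answers).most_common(1)[0][0]
    # 2. Fetch the correct state from the leader and register the new
    #    replica's address and port with it.
    new_replica.install_state(leader.fetch_state())
    leader.register_member(new_replica.ip, new_replica.port)
    # 3. The leader broadcasts the updated membership; replies to later
    #    client/proxy requests carry the fresh replica addresses, so the
    #    proxy knows where to multicast in follow-up DOM operations.
    leader.broadcast_membership()
```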
-
Are there other domains where DOM optimizations could be applied?
DOM can also be applied to concurrency control protocols and Byzantine fault-tolerant consensus. Both projects are already in progress.
-
Will some industrial systems benefit from Nezha?
Theoretically, any distributed system that currently runs on a legacy consensus protocol (e.g., Raft or Paxos) can enjoy performance gains by switching to Nezha. We are now trying to integrate Nezha into some industrial systems, including a financial exchange system and Kubernetes, and we expect to see some promising results in the near future.