full rewriting of Multipath TCP part
qdeconinck committed Aug 5, 2020
1 parent ab64f24 commit dfd32c0
Showing 11 changed files with 103 additions and 82 deletions.
127 changes: 80 additions & 47 deletions README.md
@@ -11,20 +11,44 @@ To benefit from the hands-on, you need recent versions of the following software
* [VirtualBox](https://www.virtualbox.org/wiki/Downloads)
* [Wireshark](https://www.wireshark.org/download.html) (to be able to analyze Multipath TCP packet traces)

## VM Setup
> The remainder of this hands-on assumes that your host is running a Linux-based system.
> However, the commands to run on your local machine are limited to interactions with vagrant.

To set up the vagrant box, simply `cd` to this folder and run the following commands on your host:
```bash
# The first `vagrant up` invocation fetches the vagrant box and runs the provision script.
# It is likely that this takes some time, so launch this command ASAP!
# The following `vagrant reload` is required to restart the VM with the Multipath TCP kernel.
$ vagrant up; vagrant reload
# Now that your VM is ready, let's SSH into it!
$ vagrant ssh
```
Once done, you should be connected to the VM.
To check that your VM is correctly set up, run the following commands inside it:
```bash
$ cd ~; ls
# iproute-mptcp mininet minitopo oflops oftest openflow picotls pox pquic
$ uname -a
# Linux ubuntu-bionic 4.14.146.mptcp #17 SMP Tue Sep 24 12:55:02 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
```
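
If you want an additional check that the Multipath TCP kernel is effectively active, the out-of-tree mptcp.org implementation exposes a dedicated sysctl (the name below is an assumption based on that implementation; a value of 1 means that Multipath TCP is enabled):
```bash
$ sysctl net.mptcp.mptcp_enabled
# net.mptcp.mptcp_enabled = 1
```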

> From now on, unless otherwise stated, all commands are run inside the vagrant box.
The `tutorial_files` folder is shared with the vagrant box, so that the VM can access the experiment files through the `/tutorial` folder.
The network experiments that we will perform in the remainder of this tutorial rely on [minitopo](https://github.com/qdeconinck/minitopo/tree/minitopo2), which is itself a wrapper around [Mininet](http://mininet.org/).
For the sake of simplicity, we will rely on a bash alias called `mprun` (defined in `/etc/bash.bashrc`).
Typically, you just need to go to the right folder and run `mprun -t topo_file -x xp_file`, where `topo_file` contains the description of a network scenario and `xp_file` the description of the experiment to perform.
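As a quick illustration (the concrete topology and experiment files are introduced in the next sections), a typical invocation looks like the following:
```bash
# Run the plain TCP experiment of Section 1 from its folder
$ cd /tutorial/01_multipath
$ mprun -t topo -x xp_tcp
```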
If you are interested in reproducing the setup in another environment, or if you want to understand the provided "black-box", feel free to have a look at the `prepare_vm.sh` provision script.


## Organization

The remainder of this document is split into 6 sections.
The first five focus on Multipath TCP and the experimentation of various scenarios with the different provided algorithms (packet scheduler, path manager, congestion control).
The last one is dedicated to Multipath QUIC, with a small coding part.
Although this document was written to perform the experiments in order, feel free to jump directly to the section(s) of interest.
In case of trouble, do not hesitate to contact us on the [dedicated Slack channel](https://app.slack.com/client/T0107RGGMU6/C0186E2K69W) (during the SIGCOMM event) or to open a GitHub issue.

For the remainder of this tutorial, we recommend installing Wireshark to analyze the PCAP packet traces that are generated each time an experiment is performed.

## 1. Observing the Bandwidth Aggregation when Using Multiple Paths

@@ -40,32 +64,35 @@ Client Router --------- Server
This scenario is described in the file `01_multipath/topo`.
With this network, we will compare two `iperf` runs.
The first one consists of a regular TCP transfer between the client and the server.
To perform this experiment, `ssh` into the vagrant VM (if not done yet) and then type the following commands
```bash
$ cd /tutorial/01_multipath
$ mprun -t topo -x xp_tcp
```
The run will take about 25 seconds.
When done, you can check (either on the VM or on your host machine) the content of `server.log` using
```bash
$ cat server.log
```
You should notice that the overall goodput achieved by `iperf` is about 19-20 Mbps, which is expected since only one of the two 20 Mbps network paths is used.
The run should also provide you with two pcap files, one from the client's perspective (`client.pcap`) and the other from the server's (`server.pcap`).

> There is also an `iperf.log` file that shows the bandwidth estimation from the sender's side.
Then, we will consider the same experiment, but now running Multipath TCP instead of plain TCP.
For this, just type the following command in the vagrant VM
```bash
$ mprun -t topo -x xp_mptcp
```
A quick inspection of the `server.log` file should indicate a goodput about twice as large as with plain TCP.
This confirms that Multipath TCP can take advantage of multiple network paths (in this case, two) while plain TCP cannot.
You can also have a look at the pcap files to observe the use of the Multipath TCP options.
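
If you prefer the command line, the Wireshark dissector exposes Multipath TCP option fields that can also be used with `tshark` (assuming it is installed, e.g., on your host); the filters below isolate the MP_CAPABLE and MP_JOIN options:
```bash
# MP_CAPABLE (subtype 0) is exchanged during the handshake of the initial subflow
$ tshark -r client.pcap -Y "tcp.options.mptcp.subtype == 0"
# MP_JOIN (subtype 1) signals the establishment of additional subflows
$ tshark -r client.pcap -Y "tcp.options.mptcp.subtype == 1"
```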

> A careful look at the `xp_mptcp` file shows that in the Multipath TCP experiment, we force the receiving and the sending windows to 8 MB.
> This is to limit the variability of the results introduced by the receive buffer autotuning of the Linux kernel.
> However, even with TCP, it is likely that you will observe some variability between runs.
> Unfortunately, this is a shortcoming of the emulation...

## 2. Impact of the Selection of the Path

@@ -79,6 +106,7 @@ The two most basic packet schedulers are the following.
The packet scheduler is also responsible for the content of the data to be sent.
Yet, due to implementation constraints, most of the packet schedulers proposed in the literature focus on the first data to be sent (i.e., they only select the path on which to send the next data).
With such a strategy, the scheduler's choices only matter when several network paths are available for data transmission.
Notice that cleverer packet schedulers, such as [BLEST](https://ieeexplore.ieee.org/abstract/document/7497206) or [ECF](https://dl.acm.org/doi/abs/10.1145/3143361.3143376), can delay the transmission of data on slow paths to achieve lower transfer times.
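
For reference, with the out-of-tree MPTCP kernel used in this VM, the active packet scheduler can be inspected and changed through a sysctl (the experiment files already set it for you via their `sched:` entry; the sysctl name is an assumption based on the mptcp.org implementation):
```bash
# Show the packet scheduler currently in use
$ sysctl net.mptcp.mptcp_scheduler
# Switch to the round-robin scheduler (requires the corresponding kernel module)
$ sudo sysctl -w net.mptcp.mptcp_scheduler=roundrobin
```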


### Case 1: request/response traffic from client perspective
@@ -89,13 +117,13 @@ Client Router --------- Server
|-------- 100 Mbps, 80 ms RTT --------|
```

Let's consider simple traffic where the client sends a 10 KB request every 250 ms (a size smaller than an initial congestion window) and the server replies to each of them.
The client computes the delay between sending the request and receiving the corresponding response.
To perform the experiment with the Lowest RTT scheduler, run the following command in the folder `/tutorial/02_scheduler/reqres`:
```bash
$ mprun -t topo -x reqres_rtt
```
When inspecting the `msg_client.log` file, which contains the measured delays in seconds, you can notice that all the delays are about 40-50 ms.
Because the Lowest RTT scheduler always prefers the faster path, and because with this application traffic the fast path is never blocked by its congestion window, the data only flows over the fast path.

To perform the same experiment using the Round-Robin packet scheduler, run:
@@ -114,39 +142,41 @@ Since the round-robin scheduler spreads the load over the slowest network path,

### Case 2: HTTP traffic

While the choice of the packet scheduler is important for delay-sensitive traffic, it also has some impact on bulk transfers, especially when hosts have constrained memory.
Consider the following network scenario, where Multipath TCP creates a subflow between each of the Client's interfaces and the Server's one.

```
|-------- 20 Mbps, 30 ms RTT ---------|
Client Router --------- Server
|-------- 20 Mbps, 100 ms RTT --------|
```

On this network, the client will perform an HTTP GET request to the server for a 10 MB file.
The experiment files are located in the folder `/tutorial/02_scheduler/http`.
In the following, we assume that each host uses a fixed sending (resp. receiving) window of 1 MB.

First perform the run using regular TCP.
Single-path TCP will only take advantage of the upper path (the one with 30 ms RTT).
```bash
$ mprun -t topo -x http_tcp
```
Have a look at the time indicated at the end of the `http_client.log` file, and keep it as a reference.

Now run any of the following lines using Multipath TCP
```bash
# Using Lowest RTT scheduler
$ mprun -t topo -x http_rtt
# Using Round-Robin scheduler
$ mprun -t topo -x http_rr
```
and have a look at the results in the `http_client.log` file.

- Does the Multipath speedup correspond to your expectations? If not, why? HINT: Have a look at the server trace using Wireshark, select one packet going from the server to the client (like the first SYN/ACK) of the first subflow, then go to "Statistics -> TCP Stream Graphs -> Time Sequence (tcptrace)". Alternate between both subflows using either "Stream 0" or "Stream 1".
- What happens if you increase the window sizes? (Replace all the 1000000 values by 8000000 in the experiment file, as sketched just below.)
- On the other hand, if you focus on the Lowest RTT scheduler, what happens if the window sizes are very low (e.g., set to 300000)? Could you explain this result?
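
For the window-size question above, the relevant lines are the `rmem`/`wmem` entries at the end of each experiment file; a sketch of the enlarged-window variant (adapt the actual files under `/tutorial/02_scheduler/http`) would look like:
```
rmem:8000000 8000000 8000000
wmem:8000000 8000000 8000000
```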

In the proposed HTTP experiment, a Multipath TCP connection is created for each data exchange.
Let us think about the use of a persistent Multipath TCP connection (with already established subflows) to perform the HTTP requests.
In your opinion, what would this change with regard to the results previously obtained?
> Other schedulers such as BLEST or ECF aim at tackling this head-of-line blocking problem.
> However, these are not included in the provided version of the vagrant box.

## 3. Impact of the Path Manager
@@ -172,9 +202,12 @@ Then, have a look at their corresponding PCAP files to spot how many subflows we
$ mprun -t topo_single_path -x iperf_fullmesh
$ mprun -t topo_single_path -x iperf_ndiffports
```

HINT: Since the iperf traffic only generates one TCP connection, you can quickly spot the number of TCP subflows by going to "Statistics -> Conversations" and selecting the "TCP" tab.

In the generated PCAP traces, you should notice only one subflow for the `fullmesh` path manager, while the `ndiffports` one should generate two.
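
As with the packet scheduler, the path manager in use can be checked and changed via a sysctl of the out-of-tree MPTCP kernel (the name is an assumption based on the mptcp.org implementation; the experiment files select it for you):
```bash
# Show the currently selected path manager (e.g., default, fullmesh or ndiffports)
$ sysctl net.mptcp.mptcp_path_manager
# Switch to the ndiffports path manager (requires the corresponding kernel module)
$ sudo sysctl -w net.mptcp.mptcp_path_manager=ndiffports
```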

Then, let us consider the following network.
```
|-------- 25 Mbps, 20 ms RTT --------|
Client Router --------- Server
@@ -188,7 +221,7 @@ $ mprun -t topo_two_client_paths -x iperf_ndiffports
$ mprun -t topo_two_client_paths -x iperf_default
```

- For each of them, can you explain the results you obtain in terms of goodput (`server.log`) and the number of subflows created (by inspecting the PCAP traces)?

Finally, consider this network.
```
@@ -202,7 +235,7 @@ $ mprun -t topo_two_client_paths_two_server_paths -x iperf_fullmesh
```

- How many subflows are created, and between which IP address pairs?
- How does the client learn the other IP address of the server? HINT: have a look at the first packets of the Multipath TCP connection.


## 4. The Notion of Backup Path
@@ -236,7 +269,7 @@ Now consider the same experiment but with the topology `topo_bk`.
$ mprun -t topo_bk -x reqres_rtt
```

- How do MPTCP hosts advertise the 30 ms RTT path as a backup one? HINT: Have a look at the SYN of the 30 ms path.
- Look at the application delays in `msg_client.log`. Based on the client trace, can you explain the results?
- Focus on the server-side trace. Where does the server send the first response after the loss event? Can you explain why? Would it be possible for the server to decrease this application delay?
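
If you want to play with backup subflows outside `mprun`, the patched `iproute-mptcp` available in the VM can flag an interface as a backup one; this is an aside based on the standard mptcp.org tooling (the interface name `eth1` is only an example):
```bash
# Mark an interface so that subflows over it are advertised as backup
$ sudo ip link set dev eth1 multipath backup
# Restore normal multipath behaviour on that interface
$ sudo ip link set dev eth1 multipath on
```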

7 changes: 5 additions & 2 deletions Vagrantfile
@@ -57,8 +57,11 @@ Vagrant.configure("2") do |config|
config.vm.provider "virtualbox" do |vb|
# Customize the amount of memory on the VM:
vb.memory = "2048"
# Because VirtualBox seems to handle the availability of several cores
# very badly (and hence introduces a lot of variability with mininet),
# just force a single vCPU. However, having more than 1 vCPU is
# important for QUIC...
vb.cpus = "1"
end
#
# View the documentation for the provider you are using for more
4 changes: 2 additions & 2 deletions prepare_vm.sh
@@ -18,8 +18,8 @@ net.ipv6.conf.all.forwarding=1' | sudo tee -a /etc/sysctl.conf
install_clang() {
echo "Install CLANG"
# Install clang 10
sudo echo "deb http://apt.llvm.org/bionic/ llvm-toolchain-bionic-10 main" >> /etc/apt/sources.list
sudo echo "deb-src http://apt.llvm.org/bionic/ llvm-toolchain-bionic-10 main" >> /etc/apt/sources.list
echo "deb http://apt.llvm.org/bionic/ llvm-toolchain-bionic-10 main" | sudo tee -a /etc/apt/sources.list
echo "deb-src http://apt.llvm.org/bionic/ llvm-toolchain-bionic-10 main" | sudo tee -a /etc/apt/sources.list
wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key|sudo apt-key add -
sudo apt-get update
sudo apt-get install -y clang-10 lldb-10 lld-10
7 changes: 0 additions & 7 deletions tutorial_files/02_scheduler/http/http_1mb_rtt

This file was deleted.

7 changes: 0 additions & 7 deletions tutorial_files/02_scheduler/http/http_20mb_rr

This file was deleted.

7 changes: 0 additions & 7 deletions tutorial_files/02_scheduler/http/http_20mb_rtt

This file was deleted.

7 changes: 0 additions & 7 deletions tutorial_files/02_scheduler/http/http_256kb_rr

This file was deleted.

@@ -4,4 +4,6 @@ serverPcap:yes
snaplenPcap:100
sched:roundrobin
file:random
file_size:10240
rmem:1000000 1000000 1000000
wmem:1000000 1000000 1000000
@@ -4,4 +4,6 @@ serverPcap:yes
snaplenPcap:100
sched:default
file:random
file_size:10240
rmem:1000000 1000000 1000000
wmem:1000000 1000000 1000000
9 changes: 9 additions & 0 deletions tutorial_files/02_scheduler/http/http_tcp
@@ -0,0 +1,9 @@
xpType:http
mptcpEnabled:0
clientPcap:yes
serverPcap:yes
snaplenPcap:100
file:random
file_size:10240
rmem:1000000 1000000 1000000
wmem:1000000 1000000 1000000
2 changes: 1 addition & 1 deletion tutorial_files/04_backup/topo_bk
@@ -4,4 +4,4 @@ rightSubnet:10.1.
path_c2r_0:20,400,100
path_c2r_1:15,300,100,0,1
changeNetem:yes
netemAt_c2r_0:4.9,delay 20ms loss 100 limit 50000
