# Benchmarks

This folder provides Rust benchmarks.

## Running

```shell
$ cargo bench                           # Just run benchmarks
$ cargo bench -- --quick                # Run benchmarks with fewer samples
$ cargo bench -- --profile-time 10      # Run benchmarks with a CPU profile; results will be in out/rust/criterion/<group>/<test>/profile/profile.pb
$ # Compare to a baseline
$ cargo bench -- --save-baseline <name> # Save a baseline
$ # ...change something...
$ cargo bench -- --baseline <name>      # Compare against it
```

## Performance

Ztunnel performance largely falls into two categories: throughput and latency. These are sometimes at odds with each other, but because Ztunnel is a generic proxy, we aim to make it perform well on both metrics.

### Request flows

The primary responsibility of the proxy is copying bits between peers. Currently, this is always either TCP<-->TCP or TCP<-->HBONE.

#### TCP to TCP

This is the simplest case, and one common to many proxies. `copy.rs` does the bulk of the work, essentially just copying bytes bi-directionally between the two sockets.
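For illustration, here is a minimal sketch of the same idea using tokio's built-in `copy_bidirectional` helper. Note this is not ztunnel's actual implementation; `copy.rs` has its own copy loop (with the dynamic buffers described below), but the core behavior is equivalent:

```rust
use tokio::io::copy_bidirectional;
use tokio::net::TcpStream;

// Sketch only: ztunnel's copy.rs implements its own copy loop; this shows
// the equivalent core behavior using tokio's helper.
async fn proxy(mut client: TcpStream, mut server: TcpStream) -> std::io::Result<()> {
    // Copy bytes in both directions until both sides reach EOF,
    // returning the number of bytes copied each way.
    let (to_server, to_client) = copy_bidirectional(&mut client, &mut server).await?;
    println!("client->server: {to_server} bytes, server->client: {to_client} bytes");
    Ok(())
}
```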

Typical bi-directional copies use a fixed buffer. To adapt to various workloads, we use dynamically sized buffers that can grow from 1kb -> 16kb -> 256kb when enough traffic is received. This allows high-throughput workloads to perform well without excessive memory cost for low-bandwidth services.
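A hedged sketch of that growth policy (the constants mirror the documented sizes, but the names and exact trigger are illustrative, not ztunnel's actual code):

```rust
// Illustrative sketch of the dynamic buffer policy described above.
const SMALL: usize = 1024; // 1kb starting size
const MEDIUM: usize = 16 * 1024; // 16kb
const LARGE: usize = 256 * 1024; // 256kb maximum

struct DynamicBuffer {
    buf: Vec<u8>,
}

impl DynamicBuffer {
    fn new() -> Self {
        Self { buf: vec![0; SMALL] }
    }

    // If a read completely filled the buffer, the peer likely had more data
    // queued than we could take in one call, so step up to the next size.
    fn maybe_grow(&mut self, bytes_read: usize) {
        if bytes_read == self.buf.len() && self.buf.len() < LARGE {
            let next = if self.buf.len() < MEDIUM { MEDIUM } else { LARGE };
            self.buf.resize(next, 0);
        }
    }
}
```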

#### TCP to HBONE

This case ends up being much more complex, as the data flows through HTTP/2 and TLS. The full flow looks like this (pseudocode):

```
copy_bidi():
    loop {
        data = tcp_in.read(up to 256k) # based on the dynamic buffer size
        h2.write(data)
    }
h2::write(data):
    Buffer data as a DATA frame, up to a max of `max_send_buffer_size`. We configure this to 256k.
    Asynchronously, the connection driver will pick up this data and call `rustls.write_vectored([256 bytes, rest of data])`.
rustls::write(data):
    data = encrypt(data)
    # TLS records are at most 16k
    # In practice I have observed at most 4 chunks; unclear where this is configured.
    tcp_out.write_vectored([chunks of 16k])
```
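As a rough Rust sketch of the write half, assuming an already-established h2 stream (`send` below). Real code must also manage the send window via `reserve_capacity`/`poll_capacity`; this only shows the frame hand-off:

```rust
use bytes::Bytes;
use h2::SendStream;
use tokio::io::AsyncReadExt;
use tokio::net::TcpStream;

// Sketch of the TCP -> HBONE read/write loop; error handling and
// send-window management are elided.
async fn tcp_to_h2(
    mut tcp_in: TcpStream,
    mut send: SendStream<Bytes>,
) -> Result<(), Box<dyn std::error::Error>> {
    let mut buf = vec![0u8; 256 * 1024]; // the largest dynamic buffer size
    loop {
        let n = tcp_in.read(&mut buf).await?;
        if n == 0 {
            // EOF: send an empty frame with end-of-stream set.
            send.send_data(Bytes::new(), true)?;
            return Ok(());
        }
        // h2 buffers this as a DATA frame (bounded by max_send_buffer_size);
        // the connection driver later hands it to rustls for encryption.
        send.send_data(Bytes::copy_from_slice(&buf[..n]), false)?;
    }
}
```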

Under an iperf load, this ends up looking something like this in strace:

```
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 55.21    0.841290           5    140711           writev
 44.78    0.682359          17     38481           recvfrom
```

This will be from `writev([16kb * 4])` calls and `recvfrom(256kb)`.

#### HBONE to TCP

This flow is substantially different from the inverse direction. The receive flow is driven by h2, which under the hood uses a `LengthDelimitedCodec`. h2 will attempt to decode one frame at a time, using an internal buffer. This buffer starts at 8kb but will grow to meet the size of incoming frames. We allow frame sizes up to a max of 1mb (`config.frame_size`).
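In terms of the raw `h2` crate API, these limits correspond to builder settings like the following (a sketch with the values described in this document, not ztunnel's actual construction code):

```rust
use h2::client;

// Sketch: the limits discussed above expressed as h2 builder settings.
fn h2_builder() -> client::Builder {
    let mut builder = client::Builder::new();
    builder
        .max_frame_size(1024 * 1024) // allow DATA frames up to 1mb
        .max_send_buffer_size(256 * 1024); // cap buffered outgoing data at 256k
    builder
}
```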

Ultimately, the frame decode calls `rustls.read(buf)`. This goes through a few indirections before ending up in rustls's `deframer_buffer`, which is what calls `read()` on the underlying IO (in our case, the TCP connection). This buffer is generally configured to do 4kb reads.

Upon reading frames from the wire, h2 buffers them up. We read these in `recv_stream.poll_data`, triggered by the `copy_bidirectional`. Ultimately, this writes out one DATA frame's worth of data to the upstream TCP connection.
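A minimal sketch of this receive half using h2's public API (`data()` is the async wrapper around `poll_data`; this stands alone rather than being driven through `copy_bidirectional`):

```rust
use h2::RecvStream;
use tokio::io::AsyncWriteExt;
use tokio::net::TcpStream;

// Sketch of the HBONE -> TCP receive loop: pull buffered DATA frames off
// the h2 stream and write each one to the upstream TCP connection.
async fn h2_to_tcp(
    mut recv: RecvStream,
    mut tcp_out: TcpStream,
) -> Result<(), Box<dyn std::error::Error>> {
    while let Some(frame) = recv.data().await {
        let data = frame?;
        tcp_out.write_all(&data).await?;
        // Release flow-control credit so the peer can keep sending.
        let _ = recv.flow_control().release_capacity(data.len());
    }
    Ok(())
}
```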

Under an iperf load, this ends up looking something like this in strace:

```
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 61.08    1.253541          50     24703           sendto
 38.19    0.783733           2    360707         8 recvfrom
```

This will be from `sendto(256kb)` calls, with many `recvfrom()` calls ranging from 4k to 16k.

### Comparison to Envoy

Under an iperf load, the Envoy client looks like this:

```
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 68.24    1.363149           3    440440         1 sendto
 31.72    0.633584          11     55114        31 readv
```

This is from many `sendto(16k)` calls, and `readv([16k] * 8)`.

The Envoy server:

```
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 65.24    1.199264           1    757275         8 recvfrom
 34.73    0.638315          26     23670           writev
```

This is from many paired `recvfrom(5); recvfrom(16k)` calls (5 bytes matches the size of a TLS record header, followed by the record body), and `writev([16k] * 16)`.

(All strace output was captured with `-e trace=write,writev,read,recvfrom,sendto,readv`.)