Sections 3 and 6 of the Quux paper have some relevant discussion [1].
Unfortunately, it appears that the Quux paper studied QUIC at the wrong position - between relays only - and the QuTor implementation is unclear. It may therefore still be an open question whether QUIC's congestion control algorithms will behave well in a Tor-like network under heavy load.
As Reardon and Goldberg noted in their concluding remarks, approaches other than hop-by-hop will incur an extra cost for retransmissions, since lost data must be rerouted through a larger part of the network [RG09].
As Tschorsch and Scheuermann discuss [TS12], end-to-end approaches will also take longer to “ramp up” through slow start to a steady state, due to the longer RTT of the end-to-end connection.
Both of these factors (not to mention the increased security risk of information leakage [DM09]) suggest that hop-by-hop designs are likely to yield better results. In fact, the hop-by-hop approach may be viewed as an instance of the Split TCP Performance-Enhancing Proxy design, whereby arbitrary TCP connections are split in two to negate the issues noted above.
Unfortunately, the Quux implementation appears to use QUIC at a suboptimal position: it replaces Tor's TLS connections with QUIC and uses QUIC streams in place of Tor's circuit IDs, but only between relays. This means it does not benefit from QUIC's end-to-end congestion control over the entire path of the circuit. Such a design will not solve the queuing and associated OOM problems at Tor relays, since relays would be unable to drop cells to signal backpressure to endpoints. Drops will instead block every circuit on a connection between relays, and even then, due to end-to-end reliability, relays will still need to queue without bound, subject to Tor's current (and inadequate) flow control.
A fully QUIC relay path (with a slight modification to fix a limit on internal buffer sizes) would allow end-to-end backpressure to be applied from the client application's TCP stream all the way to the exit's TCP stream. Leaving aside Tor's inbound rate limit mechanism but retaining the global outbound limit, this design would allow max-min fairness to be achieved in the network, as outlined by Tschorsch and Scheuermann [TS11].
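To make the max-min fairness goal concrete, here is a small illustrative sketch (not Tor or Quux code, and not the [TS11] algorithm itself) of the classic "water-filling" allocation that the max-min objective describes: flows whose demand is below the fair share keep their demand, and the leftover capacity is re-split among the rest.

```rust
// Max-min fair ("water-filling") allocation of one shared link among flows.
// Illustrative only: real networks compute this implicitly via congestion
// control and backpressure, not via a central allocator.
fn max_min_fair(capacity: f64, demands: &[f64]) -> Vec<f64> {
    let mut alloc = vec![0.0; demands.len()];
    let mut remaining = capacity;
    let mut active: Vec<usize> = (0..demands.len()).collect();
    while !active.is_empty() {
        let share = remaining / active.len() as f64;
        // Flows whose demand fits under the current fair share are satisfied.
        let satisfied: Vec<usize> =
            active.iter().copied().filter(|&i| demands[i] <= share).collect();
        if satisfied.is_empty() {
            // Every remaining flow is bottlenecked here: split evenly.
            for &i in &active {
                alloc[i] = share;
            }
            break;
        }
        for &i in &satisfied {
            alloc[i] = demands[i];
            remaining -= demands[i];
        }
        active.retain(|i| !satisfied.contains(i));
    }
    alloc
}

fn main() {
    // Three circuits sharing a 10-unit link, demanding 2, 8 and 8 units:
    // the light circuit keeps its 2, the heavy pair split the remaining 8.
    println!("{:?}", max_min_fair(10.0, &[2.0, 8.0, 8.0])); // [2.0, 4.0, 4.0]
}
```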
...
Once implemented, however, backpressure would allow Tor to adopt a significantly improved internal design. In such a design, a Tor relay could read a single cell from one QUIC stream's read buffer, onion crypt it, and immediately place it onto the write buffer of the next stream in the circuit. This process can operate at the granularity of a single cell because QUIC reads and writes are cheap user-space function calls, rather than syscalls as with host TCP.
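As a rough sketch of that per-cell relay step, with mock buffer types standing in for a real QUIC library's stream API and a trivial XOR as a placeholder for the actual onion crypto (the cell size and buffer bound are illustrative, not Tor's real constants):

```rust
use std::collections::VecDeque;

const CELL_LEN: usize = 512; // placeholder fixed cell size
const WRITE_BUF_CELLS: usize = 32; // bounded per-stream write buffer

// Stand-in for a QUIC stream's per-stream read/write buffers; a real QUIC
// implementation exposes these through its own stream API.
struct Stream {
    read_buf: VecDeque<[u8; CELL_LEN]>,
    write_buf: VecDeque<[u8; CELL_LEN]>,
}

// Placeholder for one layer of onion crypto (not the real cipher).
fn onion_crypt(cell: &mut [u8; CELL_LEN], key: u8) {
    for b in cell.iter_mut() {
        *b ^= key;
    }
}

// Forward at most one cell from `inbound` to `outbound`. Returns false when
// there is nothing to read or no room to write; leaving the cell unread in
// that case is exactly what propagates backpressure to the previous hop,
// with no unbounded relay-side queue.
fn relay_one_cell(inbound: &mut Stream, outbound: &mut Stream, key: u8) -> bool {
    if outbound.write_buf.len() >= WRITE_BUF_CELLS {
        return false; // next hop is full: stop reading, apply backpressure
    }
    match inbound.read_buf.pop_front() {
        Some(mut cell) => {
            onion_crypt(&mut cell, key);
            outbound.write_buf.push_back(cell);
            true
        }
        None => false,
    }
}

fn main() {
    let mut a = Stream { read_buf: VecDeque::new(), write_buf: VecDeque::new() };
    let mut b = Stream { read_buf: VecDeque::new(), write_buf: VecDeque::new() };
    a.read_buf.push_back([0u8; CELL_LEN]);
    println!("forwarded: {}", relay_one_cell(&mut a, &mut b, 0x5a));
}
```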
This action would be scheduled by the existing EWMA circuit scheduler, across circuits that have both a readable stream and a writeable stream (and as permitted by a global outgoing token bucket), providing good quality of service for circuits.
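A minimal sketch of that scheduling rule, with illustrative types and an illustrative per-tick decay constant (Tor's actual EWMA implementation uses a configurable time-based half-life): among circuits that are ready at both ends, service the one with the lowest EWMA of recently sent cells, so quiet interactive circuits get priority over bulk ones.

```rust
// A circuit eligible for scheduling, with an EWMA of its recent cell count.
struct Circuit {
    id: u32,
    ewma: f64,
    readable: bool, // inbound stream has a cell waiting
    writable: bool, // outbound stream has buffer space
}

const DECAY: f64 = 0.9; // illustrative per-tick decay, not Tor's real half-life

// Periodically decay every circuit's EWMA so old activity is forgotten.
fn tick(circuits: &mut [Circuit]) {
    for c in circuits.iter_mut() {
        c.ewma *= DECAY;
    }
}

// Charge a circuit for one relayed cell.
fn on_cell_sent(c: &mut Circuit) {
    c.ewma += 1.0;
}

// Pick the next circuit to service: lowest EWMA among circuits that have
// both a readable stream and a writeable stream.
fn pick_next(circuits: &[Circuit]) -> Option<usize> {
    circuits
        .iter()
        .enumerate()
        .filter(|(_, c)| c.readable && c.writable)
        .min_by(|(_, a), (_, b)| a.ewma.partial_cmp(&b.ewma).unwrap())
        .map(|(i, _)| i)
}

fn main() {
    let circuits = vec![
        Circuit { id: 1, ewma: 5.0, readable: true, writable: true },
        Circuit { id: 2, ewma: 1.0, readable: true, writable: false },
        Circuit { id: 3, ewma: 3.0, readable: true, writable: true },
    ];
    // Circuit 2 has the lowest EWMA but no writable stream, so circuit 3 wins.
    println!("next: {:?}", pick_next(&circuits).map(|i| circuits[i].id));
}
```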
It’s expected that backpressure implemented in this way would yield significant performance and fairness gains on top of the performance improvement found in this thesis.
One issue for Quux was that it used the Chromium demo QUIC server code as the basis for its implementation, which was adequate for performance research but a poor fit as a production networking stack for Tor.
Several Rust QUIC implementations have since been released that support server-side (not just client-side) usage, so I expect this to be much less of an issue today.
io_uring is also a significant development since Quux was written: it can reduce the overhead of host-TCP syscalls, or of falling back to recvmsg with QUIC where the implementation makes recvmmsg difficult to use on the listener side.
[1] https://www.benthamsgaze.org/wp-content/uploads/2016/09/393617_Alastair_Clar...
The following paper has in-depth discussion, but I don't have a copy to hand unfortunately:
Ali Clark. Tor network performance — transport and flow control. Technical report, University College London, April 2016