Sections 3 and 6 of the Quux paper have some relevant discussion [1].
> Unfortunately, it appears that the Quux QUIC paper studied QUIC at the
> wrong position - between relays, and the QuTor implementation is
> unclear. This means that it may still be an open question as to if
> QUIC's congestion control algorithms will behave optimally in a
> Tor-like network under heavy load.
As Reardon and Goldberg noted in their concluding remarks, approaches
other than hop-by-hop will incur an extra cost for retransmissions, since
these must be rerouted through a larger part of the network [RG09].
As Tschorsch and Scheuermann discuss [TS12], due to the longer RTT of
TCP connections, end-to-end approaches will also take longer to “ramp
up” through slow start and up to a steady state.
Both of these factors (not to mention increased security risk of
information leakage [DM09]) suggest that hop-by-hop designs are likely
to yield better results. In fact, the hop-by-hop approach may be viewed as
an instance of the Split TCP Performance-Enhancing Proxy design, whereby
arbitrary TCP connections are split in two to negate the issues noted
above.
> Unfortunately, the Quux implementation appears to use QUIC at a
> suboptimal position -- they replace Tor's TLS connections with QUIC,
> and use QUIC streams in place of Tor's circuit ID -- but only between
> relays. This means that it does not benefit from QUIC's end-to-end
> congestion control for the entire path of the circuit. Such a design
> will not solve the queuing and associated OOM problems at Tor relays,
> since relays would be unable to drop cells to signal backpressure to
> endpoints. Drops will instead block every circuit on a connection
> between relays, and even then, due to end-to-end reliability, relays
> will still need to queue without bound, subject to Tor's current (and
> inadequate) flow control.
A fully QUIC relay path (with slight modification to fix a limit on
internal buffer sizes) would allow end-to-end backpressure to be used
from the client application TCP stream up to the exit TCP stream.
Leaving aside Tor’s inbound rate limit mechanism but retaining the
global outbound limit, this design would allow max-min fairness to be
achieved in the network, as outlined by Tschorsch and Scheuermann
[TS11].
...
Once implemented, however, backpressure would allow Tor to adopt a
significantly improved internal design. In such a design, a Tor relay
could read a single cell from one QUIC stream’s read buffer, onion crypt
it, and immediately place it onto the write buffer of the next stream in
the circuit. This process could operate at the granularity of a single
cell because QUIC's read and write operations are cheap user-space
function calls, not syscalls as with host TCP.
The scheduling of this action would be governed by the existing EWMA
scheduler, applied to circuits that have both a readable stream and a
writable stream (and as allowed by a global outgoing token bucket),
allowing optimal quality of service for circuits.
It’s expected that backpressure implemented in this way will yield
significant performance and fairness gains on top of the performance
improvement found in this thesis.
One issue for Quux was that it used the Chromium demo QUIC server code as the
basis for its implementation, which was fine for performance research but not
such a good choice for Tor's networking stack.
Several Rust implementations have since been released that support
server-side (not just client-side) usage, so I expect that to be much
less of an issue today.
io_uring is also a significant development since Quux was written: it
can reduce the performance hit of host-TCP syscalls, or of using
recvmsg instead of recvmmsg with QUIC if the implementation makes it
difficult to use recvmmsg on the listener side.
The following paper has in-depth discussion, but I don't have a copy to
hand unfortunately:
Ali Clark. Tor network performance — transport and flow control. Technical report, University College London, April 2016.