In Rome, I held a session about network protocol upgrades. My intent was to cover the switch to two guards, conflux, datagram transports, and QUIC. We ended up touching only briefly on everything but QUIC, but we went into enough depth on QUIC itself that it was a worthwhile and very productive session.
Our notes are here: https://trac.torproject.org/projects/tor/wiki/org/meetings/2018Rome/Notes/Fu...
I wanted to give a bit of background and some historical perspective about datagram transports for Tor, as well as explain those notes in long form, to get everybody on the same page about this idea. With the forthcoming IETF standard ratification of QUIC along with several solid reference implementations (canonical list: https://github.com/quicwg/base-drafts/wiki/Implementations), I believe we are close to the point where we can finally put together a plan (and a funding proposal) to undertake this work.
Consider this mail a pre-proposal to temperature check and solicit early feedback about a Tor-over-QUIC deployment, before we invest the effort to deep dive into the framing, protocol, and implementation details that will be necessary for a full proposal.
# Historical Background
So, first: Tor has been considering UDP transports for a long time. There has been a series of research papers on the topic, dating back to Proposal 100 -- the very first Tor proposal: https://gitweb.torproject.org/torspec.git/tree/proposals/100-tor-spec-udp.tx...
That proposal only examines adding an unreliable DTLS link between relays, to enable transmission of unreliable (UDP) traffic end-to-end. Despite being marked as Dead, it is a good conceptual starting point, because it examines the relay-to-relay link crypto properties we need from a datagram layer and has some good comments at the bottom.
However, adding the ability to simply carry UDP is not sufficient. What we have always really wanted is to transition the network to some form of end-to-end drop-based reliable congestion and flow control. There are numerous benefits from such a transition.
Drop-signaled end-to-end congestion control means that instead of having to maintain arbitrary queues at relays (to make room for Tor's fixed SENDME window sizes for every single Tor circuit -- which, as we have seen, does not scale and is vulnerable to DoS and OOM attacks), we can instead use a global fixed-length queue at each relay, and have each relay simply drop packets when this queue is full. This behavior mirrors how the internet works, and it will not only put an upper bound on the end-to-end queuing latency of the Tor network, but also allow us to better ensure fairness, eliminate OOM conditions, and mitigate DoS and congestion attacks. QUIC's stream management and byte-acked teardown mechanisms will also eliminate a class of side-channel issues in Tor where an adversary gets to send ignored data on invalid and/or recently closed streams: https://trac.torproject.org/projects/tor/ticket/25573
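To make the contrast concrete, here is a minimal sketch of such a global queue, using hypothetical names (relay_queue_t, RELAY_QUEUE_LEN, cell_t) rather than anything in tor today: when the single bounded queue is full, the cell is simply dropped, and the endpoints' QUIC loss recovery treats that drop as the congestion signal.

    #include <stdbool.h>
    #include <stddef.h>

    #define RELAY_QUEUE_LEN 4096      /* one global bound, not per-circuit */

    typedef struct cell_t { unsigned char body[514]; } cell_t;

    typedef struct relay_queue_t {
      cell_t slots[RELAY_QUEUE_LEN];
      size_t head, tail, len;         /* ring buffer; pop side omitted */
    } relay_queue_t;

    /* Enqueue a cell, or drop it if the global queue is full.  The drop
     * itself is the backpressure: no per-circuit bookkeeping, no OOM. */
    static bool
    relay_queue_push(relay_queue_t *q, const cell_t *c)
    {
      if (q->len == RELAY_QUEUE_LEN)
        return false;                 /* drop: end-to-end congestion signal */
      q->slots[q->tail] = *c;
      q->tail = (q->tail + 1) % RELAY_QUEUE_LEN;
      q->len++;
      return true;
    }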
Furthermore, the switch to a secure datagram link protocol will eliminate RST spamming attacks, where an adversary spoofs RSTs to/from relay IP addresses to tear down our TLS links to gain information from the resulting circuit failures and retries across the network.
Prior to the advent of QUIC, we studied many options for achieving end-to-end congestion control, but found all of them lacking in one way or another. In 2011, Steven Murdoch did a literature review of these approaches: https://research.torproject.org/techreports/datagram-comparison-2011-11-07.p...
Section 4 of that review is the most useful thing to skim -- it explains why we found each of the end-to-end transport modes available at the time to be either insufficient for Tor's use, or encumbered by licensing or engineering incompatibilities.
# QUIC Overview
QUIC, however, is different. It basically serves as a complete drop-in replacement for Tor's n:1 streams-on-circuit connection model. QUIC has a connection layer, which corresponds to Tor's circuit concept, and a framing layer, which supports multiplexing for multiple reliable streams inside a single connection (corresponding to Tor's streams).
Each QUIC connection provides an outer layer of drop-based congestion control. Unlike latency-based congestion control such as BitTorrent's uTP, which was designed to compensate for poor queue management at edge routers (as well as to yield to TCP flows), QUIC's drop-based congestion control is designed to find the common bottleneck anywhere on the connection's path, and use packet drops from that queue to signal congestion and ensure fairness among connections at that bottleneck point. QUIC also provides optimizations to handle higher-latency and drop-heavy links (SACK, SACK ranges, early retransmit, reordering tolerance, RTT estimates): https://quicwg.github.io/base-drafts/draft-ietf-quic-recovery.html
Additionally, QUIC provides flow control at the connection level, as well as at the individual stream level. This is analogous to Tor's flow control model, but unlike Tor, QUIC implementations can adjust these window sizes automatically based on RTT measurements and congestion feedback: https://datatracker.ietf.org/doc/html/draft-ietf-quic-transport#section-3.4 https://datatracker.ietf.org/doc/html/draft-ietf-quic-transport#section-11 https://docs.google.com/document/d/1F2YfdDXKpy20WVKJueEf4abn_LVZHhMUMS5gX6Pg...
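As a rough sketch of what that auto-tuning looks like (the constants and structure here are my assumptions, not any implementation's actual logic): if the peer exhausts the advertised window in less than a couple of RTTs, the window itself is the bottleneck and can be grown.

    #include <stdint.h>

    typedef struct flow_ctrl_t {
      uint64_t recv_window_bytes;  /* how much the peer may still send */
      uint64_t last_grow_us;       /* when we last grew the window */
      uint64_t srtt_us;            /* smoothed RTT estimate */
    } flow_ctrl_t;

    /* If the window is exhausted faster than ~2 RTTs, the flow is
     * window-limited rather than path-limited: double it, up to a cap. */
    static void
    maybe_grow_window(flow_ctrl_t *fc, uint64_t now_us, uint64_t cap_bytes)
    {
      if (now_us - fc->last_grow_us < 2 * fc->srtt_us &&
          fc->recv_window_bytes * 2 <= cap_bytes)
        fc->recv_window_bytes *= 2;
      fc->last_grow_us = now_us;
    }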
It is perhaps no small coincidence that QUIC's architecture fits so perfectly into Tor's circuit model. While at an IETF meeting years ago, an attendee quietly whispered in my ear, "You know, we're designing IETF QUIC with Tor-like networks in mind." I always love me a good prospiracy (https://www.urbandictionary.com/define.php?term=prospiracy ;).
# Using QUIC in Tor
While QUIC does indeed solve a great many problems for Tor out of the box, the path to Tor-over-QUIC requires that we make a few design and engineering decisions specific to Tor.
There are two research implementations that study using Tor-over-QUIC: Quux and QuTor. Quux was implemented by wrapping Chromium's C++ QUIC implementation in C bindings: https://www.benthamsgaze.org/2016/09/29/quux-a-quic-un-multiplexing-of-the-t... https://www.benthamsgaze.org/wp-content/uploads/2016/09/393617_Alastair_Clar...
QuTor is unfortunately paywalled at researchgate, so it is hard for me to tell what they did: https://www.researchgate.net/profile/Raik_Aissaoui/publication/292392094_QUT...
Unfortunately, the Quux implementation appears to use QUIC at a suboptimal position -- they replace Tor's TLS connections with QUIC, and use QUIC streams in place of Tor's circuit ID -- but only between relays. This means that it does not benefit from QUIC's end-to-end congestion control for the entire path of the circuit. Such a design will not solve the queuing and associated OOM problems at Tor relays, since relays would be unable to drop cells to signal backpressure to endpoints. Drops will instead block every circuit on a connection between relays, and even then, due to end-to-end reliability, relays will still need to queue without bound, subject to Tor's current (and inadequate) flow control.
The optimal design is to use QUIC end-to-end. (Based on the researchgate abstract, it looks like this might be what QuTor did, but it is hard to tell.) Instead of replacing TLS with QUIC, we should use an unreliable link encryption protocol between relays (such as DTLS), and replace Tor's circuits with QUIC connections. In this design, Tor applications connect to the local SOCKS port using TCP, and the tor client converts it to a QUIC stream on a QUIC connection-circuit that runs all the way to the exit or onion service, which then will convert the stream back to TCP. The exit/service keeps buffering to a minimum by using QUIC pushback on the Tor side, and TCP pushback on the server side.
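Below is a schematic sketch (not an event loop) of the exit-side conversion just described. All quic_* and tcp_* calls are hypothetical stand-ins for whichever QUIC library and socket wrappers we end up using -- none of them are real APIs. The point is the pushback coupling: bytes only move when both sides can make progress.

    #include <stddef.h>
    #include <sys/types.h>

    typedef struct quic_stream_t quic_stream_t;        /* hypothetical */
    ssize_t quic_stream_read(quic_stream_t *, void *, size_t);
    ssize_t quic_stream_write(quic_stream_t *, const void *, size_t);
    int quic_stream_writable(const quic_stream_t *);
    int tcp_can_write(int fd);
    ssize_t tcp_read(int fd, void *, size_t);
    void tcp_write_all(int fd, const void *, size_t);

    static void
    exit_stream_pump(quic_stream_t *qs, int tcp_fd)
    {
      unsigned char buf[4096];
      ssize_t n;

      /* QUIC -> TCP: if the server's TCP socket would block, we stop
       * reading the QUIC stream, so our stream-level flow control
       * window stops opening and the client slows down end-to-end. */
      while (tcp_can_write(tcp_fd) &&
             (n = quic_stream_read(qs, buf, sizeof(buf))) > 0)
        tcp_write_all(tcp_fd, buf, (size_t)n);

      /* TCP -> QUIC: symmetric; when the client's receive window is
       * full, quic_stream_writable() goes false and we stop reading
       * from the server until the window reopens. */
      while (quic_stream_writable(qs) &&
             (n = tcp_read(tcp_fd, buf, sizeof(buf))) > 0)
        quic_stream_write(qs, buf, (size_t)n);
    }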
While IETF QUIC provides its own crypto layer based on TLS 1.3, we won't really need this encryption. Instead, we will layer QUIC inside of Tor's layered onion encryption, with an ntor handshake to each hop in the path.
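To make the layering concrete, here is a minimal sketch, with onion_crypt_layer() standing in for tor's relay crypto and hop_keys_t for ntor-derived key material (both hypothetical names): the QUIC packet rides as the innermost payload, wrapped once per hop.

    #include <stddef.h>

    typedef struct hop_keys_t hop_keys_t;  /* ntor-derived keys; hypothetical */
    void onion_crypt_layer(const hop_keys_t *keys, unsigned char *buf, size_t len);

    /* Client side: apply the exit's layer first so it is innermost; the
     * guard's layer goes on last and is peeled first on the way out. */
    static void
    wrap_quic_packet(unsigned char *pkt, size_t len,
                     const hop_keys_t *hops[], size_t n_hops)
    {
      for (size_t i = n_hops; i > 0; i--)
        onion_crypt_layer(hops[i - 1], pkt, len);
    }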
# QUIC Deployment Considerations
Past the above high-level decision of how to use QUIC, a handful of deployment considerations remain.
## Implementation
As mentioned previously, there are a growing number of QUIC implementations out there, in several languages, with varying degrees of IETF compatibility: https://github.com/quicwg/base-drafts/wiki/Implementations
For our purposes, the most interesting ones are ngtcp2 (native C implementation, BoringSSL, MIT licensed), picoquic (C, MIT license), and mozquic (C++ with C bindings, NSS, MPL licensed). The QUIC in Chromium is not IETF compliant, and is C++ with no C bindings. Sadly, the Rust implementations of QUIC are not yet complete. Google also maintains a complete QUIC implementation in Go, which suggests that experimental designs using one of the golang Tor implementations might be worthwhile.
I'm told that the IETF hopes to finalize the QUIC standard within the year.
## Framing
QUIC actually differentiates between connection-level packet sizes and frame sizes. If we want to preserve Tor's wire behavior, we will want to enforce a one-frame-per-packet policy from our QUIC library, and use QUIC's padding frames to make packet sizes uniform.
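As a sketch of what that policy might look like (the packet size and framing details are assumptions, not decisions): since IETF QUIC PADDING frames encode as single 0x00 bytes, padding each one-frame packet to a uniform size amounts to zero-filling the plaintext payload before packet protection is applied.

    #include <stddef.h>
    #include <string.h>

    #define TOR_PKT_SIZE 1280  /* illustrative uniform datagram size */

    /* Pad a packet containing exactly one frame up to the fixed size.
     * PADDING frames in IETF QUIC are single zero bytes, so zero-filling
     * the tail of the plaintext payload is a valid padding strategy. */
    static size_t
    finalize_uniform_packet(unsigned char pkt[TOR_PKT_SIZE], size_t frame_end)
    {
      if (frame_end < TOR_PKT_SIZE)
        memset(pkt + frame_end, 0x00, TOR_PKT_SIZE - frame_end);
      return TOR_PKT_SIZE;
    }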
## Upgrade Path
The upgrade path to QUIC will require some thorough thinking -- if all relays on a path don't yet support datagram links, should we embed QUIC inside an opaque RELAY cell type, or should we wait for a flag day to switch over once sufficient relays have upgraded? Embedding QUIC in opaque RELAY cells seems like the natural choice, but it will require additional engineering considerations.
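For illustration, the opaque-cell option could look something like the sketch below; the relay command value and field sizes are hypothetical, chosen only to show the shape of the encapsulation.

    /* A new relay command whose body is an opaque fragment of a QUIC
     * datagram; legacy relays forward it like any other RELAY cell.
     * Command number and sizes are hypothetical. */
    #define RELAY_COMMAND_QUIC 42        /* unassigned; illustrative */

    typedef struct relay_quic_body_t {
      unsigned short quic_len;           /* bytes used in quic_data */
      unsigned char quic_data[496];      /* QUIC datagram fragment */
    } relay_quic_body_t;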
Additionally, a small and declining percentage of networks currently block all UDP other than DNS. At last measurement, the rate of QUIC failure was below 5%. However, like Google, we will likely want to retain the ability to fall back to TLS over TCP for these situations.
## Signaling
Because we need to transition to unreliable relay links to get the most out of QUIC, we will need to figure out how to handle circuit setup -- there is currently no mechanism to ensure reliability for onionskins in Tor, since this is provided by TLS today. The simplest solution is to retain TLS between relays to transmit circuit extend cells out-of-band. Padding could be used to obscure timing information of this signaling.
The alternative would be to implement some onionskin-level acking between relays, so that the datagram transport could be used for all traffic. However, since our deployment path will likely involve migration from TCP anyway, and since a small (and shrinking) percentage of networks simply block all non-DNS traffic, we will likely want to retain TCP connections for some time, making out-of-band signaling a natural first choice.
## Native UDP Support
Unfortunately, during the standardization process, the IETF QUIC working group decided to drop support for unreliable datagram streams inside the QUIC connection, to simplify the layering of their implementation. However, because we will still need to layer QUIC inside of an onion layer, we can use that layer to carry either QUIC or UDP (at the expense of an additional field to differentiate the two).
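A sketch of that differentiating field (names and values are assumptions): the innermost onion payload leads with a one-byte type, so the far end knows whether to feed it to the QUIC machinery or forward it as a raw datagram.

    /* Hypothetical inner payload tag distinguishing reliable QUIC
     * traffic from native unreliable UDP. */
    enum onion_payload_type {
      ONION_PAYLOAD_QUIC = 0x01,   /* packet for the QUIC connection */
      ONION_PAYLOAD_UDP  = 0x02,   /* raw UDP datagram, end-to-end */
    };

    typedef struct onion_payload_t {
      unsigned char type;          /* enum onion_payload_type */
      unsigned char data[];        /* QUIC packet or UDP payload */
    } onion_payload_t;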
# QUIC Research Questions
A couple of areas of open research into how QUIC will behave in a Tor deployment remain. It is my belief that while these areas are worth further study, they do not represent blockers to QUIC, as even a naive deployment of QUIC will substantially outperform our current lack of congestion control and poor flow control.
## End-to-end Congestion Control Study
Unfortunately, it appears that the Quux QUIC paper studied QUIC at the wrong position -- between relays -- and the QuTor implementation is unclear. This means that it may still be an open question as to whether QUIC's congestion control algorithms will behave optimally in a Tor-like network under heavy load.
However, Google has conducted extensive internet-scale research into the congestion control algorithm, which very likely covers enough networks with Tor-like latency and drop characteristics. Because of this, I do not expect us to have to do a lot of tuning of their congestion control here, but it is worth investigating.
## Queue Management
Fairness among flows in drop-based congestion control is heavily influenced by how the queue is managed, and particularly by the decision of which packet to drop when the queue is full. Modern favored algorithms involve probabilistic early dropping from the middle of the queue. The most popular approaches are the adaptive variants of RED and Blue: https://en.wikipedia.org/wiki/Random_early_detection https://en.wikipedia.org/wiki/Blue_(queue_management_algorithm)
The simplest possible queue management design would be to set Tor's queues to 0, and allow the OS kernel or upstream router to apply its drop strategy. While this will be sufficient to ensure backpressure under congestion, it is not optimal for fairness.
To enforce fairness, Tor would want to take information about QUIC connection IDs into account when dropping packets, similar to what is done with circuit ids by EWMA currently. This information will be unavailable to the kernel and upstream routers, due to the DTLS link encryption that will conceal the multiplexing of QUIC connections on connections to relays. Moreover, ensuring that sufficient queuing happens inside the Tor daemon (as opposed to the kernel) will require an aggressive socket-reading strategy. Tor's single-threaded nature will make this difficult -- to ensure that queuing happens inside the Tor daemon, packets must be continuously read from the kernel with minimal processing delay or delay from other events. Even in this case, optimality will still depend upon the bottleneck being Tor, and not in the kernel or in an upstream internet router. Thus, optimal queue management in Tor for QUIC remains an open research and engineering problem.
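As a sketch of what connection-ID-aware dropping could look like (structures illustrative, not tor's): track a per-connection EWMA of recent throughput, and when the global queue is full, drop from the busiest connection -- mirroring what circuit EWMA does for scheduling today.

    #include <stddef.h>

    typedef struct conn_stats_t {
      unsigned char conn_id[8];   /* QUIC connection ID (truncated) */
      double ewma_rate;           /* exponentially-weighted recent bytes/sec */
    } conn_stats_t;

    /* When the queue is full, sacrifice a packet from the busiest
     * connection so quiet, interactive flows keep theirs -- the same
     * fairness intuition as circuit EWMA scheduling. */
    static const conn_stats_t *
    pick_drop_victim(const conn_stats_t *conns, size_t n_conns)
    {
      const conn_stats_t *victim = NULL;
      for (size_t i = 0; i < n_conns; i++)
        if (!victim || conns[i].ewma_rate > victim->ewma_rate)
          victim = &conns[i];
      return victim;
    }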
(However, this is not really a deployment blocker. The deployment of QUIC on the internet has similar shortcomings -- the vast majority of internet routers also do not take QUIC connection ID into account when making UDP drop decisions, and yet the protocol still outperforms TCP in link utilization and drop recovery).
## Anonymity
Since a Tor with bounded-length queues will have more predictable latency characteristics, a faster Tor based on QUIC may be more vulnerable to inferences about client location made from connection latency. In particular, middle nodes may be able to tell how far the client is from the guard by measuring RTT. One solution may be to add delay in cases where RTT could be inferred. Another is to use two middle nodes (four-hop paths), so that a relay can no longer tell which middle position it occupies in a QUIC circuit.
There may be other attacks that come out of using end-to-end congestion control, but I am personally having a hard time coming up with cases where such side channels and resource starvation conditions could be any worse than those that already exist, or are unique to the OOM and DoS conditions currently possible due to Tor's lack of congestion control.
I would like to add and advocate that, whatever network protocol upgrade is done, Tor should start supporting NAT traversal for its relays, enabling users to contribute even without a public IP address.
Enabling NATted users to run Tor relays would broaden the base of contributors.
NAT traversal protocols have been used successfully in VoIP network stacks for more than ten years:
ICE - Interactive Connectivity Establishment
https://www.ietfjournal.org/interactive-connectivity-establishment/
NAT Traversal Practices for Client-Server SIP
https://tools.ietf.org/html/rfc6314
Fabio
Thanks for the detailed write-up Mike! Theoretically, moving to QUIC seems great; it seems to solve a lot of problems and has enough advantages that we could just run with it.
I'll reiterate some of my primary concerns that I gave in Rome:
- I think it would be a mistake to push forward with such a significant change to Tor's transport layer without significant testing and experimentation. We have tools that allow us to do full-network-sized experiments (Shadow), and we have interested researchers that want to help (including me).
- However, I am much less optimistic than you that it will just work and instantly improve performance. You mention that Google has done lots of tests, but my guess is they test in environments that look like the Internet - i.e., fast core and slow edges. Do we know how it would perform when the path contains an additional 6 edges sandwiching 3 potentially low-bandwidth Tor relays? Tor is a significantly different environment than the Internet; for example, an end-to-end congestion signal in Tor will take orders of magnitude longer to reach the client than in traditional networks.
- Because of the above, I'm not sure that an end-to-end design is the right way to go. As I mentioned, we have simulators and researchers, so we should be able to learn more and make a more informed decision before committing to a design that will be difficult to change later.
- We should be sure to pay close attention to how this will affect emerging networks and applications, e.g., mobile devices and onion services.
- The DoS attacks will change form, but I don't think they will disappear. I think it would be wise to understand how DoS might change, which is much easier once we have a design to analyze. Your summary helps with that.
I think it would be worth including R&D effort to investigate these issues in any proposal that gets written.
Cheers, Rob
Rob Jansen:
Thanks for the detailed write-up Mike! Theoretically, moving to QUIC seems great; it seems to solve a lot of problems and has enough advantages that we could just run with it.
I'll reiterate some of my primary concerns that I gave in Rome:
- I think it would be a mistake to push forward with such a significant change to Tor's transport layer without significant testing and experimentation. We have tools that allow us to do full-network-sized experiments (Shadow), and we have interested researchers that want to help (including me).
I am not trying to argue against doing the research. My goal was to make enough of a case for QUIC that we can begin work on an implementation and tune it as we study it, or at least identify the minimum set of experiments that are needed before we could commit to such a path.
I expect us to have to tune things like queue length, queue management, slow start, reordering parameters, drop recovery strategies, and the backoff rates when drops happen. QUIC is also extensible enough that things like explicit congestion notification and link capacity estimates can be used to try to avoid drops (though we would need to do this at the onion layer rather than the QUIC layer, because intermediate relays will not be able to add ECN information in-band in our use of QUIC, due to the onion crypto): https://tools.ietf.org/html/draft-johansson-quic-ecn-00
Because of this flexibility, I would be very surprised to discover that QUIC proves impossible to tune in order to outperform our current lack of congestion control.
- However, I am much less optimistic than you that it will just work and instantly improve performance. You mention that Google has done lots of tests, but my guess is they test in environments that look like the Internet - i.e., fast core and slow edges. Do we know how it would perform when the path contains an additional 6 edges sandwiching 3 potentially low-bandwidth Tor relays? Tor is a significantly different environment than the Internet; for example, an end-to-end congestion signal in Tor will take orders of magnitude longer to reach the client than in traditional networks.
In drop-based congestion control, how long the drop signal takes to reach the client is not a function of where the drop happens. It is a function of the total RTT of the path. A drop early on the path takes just as long to discover as one buried in the middle.
As a result, higher RTT latency does impact drop-based schemes quite heavily (and the higher the drop rates, the worse this gets), but Tor's latency is only orders of magnitude greater than the internet's because of queuing. If our queues can be bounded, then the latency multiplier should be proportional to the number of Tor hops (and the average physical distance of these paths).
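To make that concrete with assumed, illustrative numbers: a circuit spans four links (client-guard, guard-middle, middle-exit, exit-destination), so if each link contributes roughly 75ms of RTT, the end-to-end RTT is around 300ms -- roughly 4x a direct connection -- whereas unbounded queuing today can add whole seconds on top of that.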
- Because of the above, I'm not sure that an end-to-end design is the right way to go. As I mentioned, we have simulators and researchers, so we should be able to learn more and make a more informed decision before committing to a design that will be difficult to change later.
I suppose that before we undertake or commit to a full implementation, a couple of basic experiments could inform us as to whether Tor's latency and drop characteristics might severely impact vanilla QUIC performance.
1. What drop rates do fully-utilized QUIC networks tend to see? Are QUIC's backoff and recovery properties sufficient such that packet loss will remain reasonable under heavy use? Is this drop rate a function of the number of concurrent QUIC connections, or of other network properties? (I bet this information is known by groups studying QUIC and similar congestion control schemes, but I am not finding it with casual searching. TCP folklore says "drop rate increases as concurrent connections increase," but I can't find concrete relationships.)
2. Given the above information about the drop rates that fully-utilized QUIC networks see, and under what circumstances, we can then conduct an experiment to tell us what fairness and goodput look like under various link latencies with these drop rates.
#2 will inform us about whether QUIC is acceptable as-is, or if we would need to explore ECN or other non-drop based congestion signals.
We will need to be careful while conducting these experiments, though. I found a few research papers on QUIC, but nearly all of them state limitations with respect to various aspects of the protocol being disabled or enabled depending on the Chromium/QUIC version (presumably due to whatever experiments Google was conducting at the time).
Additionally, it looks like many of the QUIC implementations do not implement all (or any) of the drop detection and recovery strategies mentioned in the spec, and even the Google implementation goes back and forth on things like FEC. So we will need to be careful to test what we intend to use.
- We should be sure to pay close attention to how this will affect emerging networks and applications, e.g., mobile devices and onion services.
- The DoS attacks will change form, but I don't think they will disappear. I think it would be wise to understand how DoS might change, which is much easier once we have a design to analyze. Your summary helps with that.
I agree that DoS will change form. But, the key draw for me is that instead of having DoS attacks that kill the circuit or even bring down the relay due to OOM (which provides a strong, clear signal to the adversary and enables Sniper attacks), DoS attacks will become congestion attacks that slow down service at specific bottlenecks, which I believe that conflux can further mitigate by dynamically avoiding them.
Mike Perry:
## Queue Management
Fairness among flows in drop-based congestion control is heavily influenced by how the queue is managed, and particularly by the decision of which packet to drop when the queue is full. Modern favored algorithms involve probabilistic early dropping from the middle of the queue. The most popular approaches are the adaptive variants of RED and Blue: https://en.wikipedia.org/wiki/Random_early_detection https://en.wikipedia.org/wiki/Blue_(queue_management_algorithm)
An IETFer mailed me and pointed out that the IETF now recommends PIE or FQ-CoDel over RED/BLUE, since they explicitly control queue latency: https://tools.ietf.org/html/rfc8033 https://tools.ietf.org/html/rfc8290
FQ-CoDel is appealing for us because it breaks the queue into separate per-flow queues, and has mechanisms to favor quieter flows over busy ones when deciding which packets to send first, and which to drop. It sounds quite promising for getting properties out of a QUIC-based Tor similar to those circuit EWMA scheduling gives us today.
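To illustrate the per-flow idea (a heavy simplification of RFC 8290's scheme, not an implementation of it; all names are hypothetical): each QUIC connection gets its own short queue, and flows that have been quiet are serviced before busy ones.

    #include <stddef.h>

    typedef struct packet_t packet_t;

    typedef struct flow_queue_t {
      packet_t *head;      /* per-flow FIFO of queued packets */
      int busy;            /* exceeded its sending quantum recently */
    } flow_queue_t;

    /* Dequeue preference: quiet flows first, so a new interactive
     * stream is not stuck behind a bulk transfer -- the property
     * circuit EWMA scheduling gives us today. */
    static flow_queue_t *
    next_flow_to_service(flow_queue_t *flows, size_t n_flows)
    {
      for (size_t i = 0; i < n_flows; i++)
        if (flows[i].head && !flows[i].busy)
          return &flows[i];
      for (size_t i = 0; i < n_flows; i++)
        if (flows[i].head)
          return &flows[i];
      return NULL;
    }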