Hackathon UDP Support - tor-dev

14 Apr 2021


Hello!

During the Hackweek, I spent a little time hacking on support for
UDP-over-Tor. The goal of the project was to support UDP onion services,
leaving Exit support for another time.
I didn't have a working implementation by the end of the week, but life
moves on.
This is an initial mail about the general idea, and I'll follow up with
a concrete proposal in the not-too-distant future.
I have a (currently broken) branch `feature_udp_data` in my gitlab repo
[0]. I started simplifying/refactoring the hack job, but it's still a
work in progress.
I'll briefly describe how the working version [1] is designed. Overall,
this mostly just works. Although UDP is not a reliable protocol,
datagrams are never dropped inside the Tor network in this design. We
delegate drops to the edges' recv/send queues (and any intermediate
network equipment).

Client Side:
The patch continued our dependence on SOCKS5 [2] and (maybe correctly)
implemented the UDP ASSOCIATE command. The client sends a normal SOCKS5
handshake, except the new command is 0x03. The tor client then:
  - Establishes a rendezvous circuit with the onion service whose
    hostname is specified in the handshake
  - On completing that connection, creates and binds a datagram socket
  - Sends the application a Success response and includes the datagram
    socket's bound address and port in that message.
The client then sends all datagrams to the provided (bound)
address:port, and the TCP socket used in the SOCKS5 handshake is only
kept open as a way of tracking the UDP socket's lifetime. Tor knows that
any datagram payload should be transported as a datagram instead of a
stream.
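
As a rough illustration of the handshake described above, here is a
minimal Python sketch that builds the UDP ASSOCIATE request and parses the
reply, following the message layouts in RFC 1928 [2]. The function names
and the example hostname are hypothetical, and placing the onion hostname
in DST.ADDR is my reading of the description above, not something taken
from the patch.

```python
import ipaddress
import struct

SOCKS_VERSION = 0x05
CMD_UDP_ASSOCIATE = 0x03
ATYP_IPV4, ATYP_DOMAIN, ATYP_IPV6 = 0x01, 0x03, 0x04
REP_SUCCEEDED = 0x00


def build_udp_associate_request(hostname: str, port: int) -> bytes:
    """Build VER | CMD | RSV | ATYP | DST.ADDR | DST.PORT (RFC 1928, section 4)."""
    name = hostname.encode("ascii")
    if not 0 < len(name) <= 255:
        raise ValueError("hostname must be 1..255 bytes")
    if not 0 <= port <= 0xFFFF:
        raise ValueError("port out of range")
    header = bytes([SOCKS_VERSION, CMD_UDP_ASSOCIATE, 0x00, ATYP_DOMAIN, len(name)])
    return header + name + struct.pack("!H", port)


def parse_reply(reply: bytes):
    """Parse VER | REP | RSV | ATYP | BND.ADDR | BND.PORT and return (addr, port)."""
    if len(reply) < 4:
        raise ValueError("reply too short")
    version, rep, _rsv, atyp = reply[:4]
    if version != SOCKS_VERSION:
        raise ValueError("not a SOCKS5 reply")
    if rep != REP_SUCCEEDED:
        raise ConnectionError(f"SOCKS5 request failed with code {rep}")
    if atyp == ATYP_IPV4:
        end = 8
        if len(reply) < end + 2:
            raise ValueError("truncated reply")
        addr = str(ipaddress.IPv4Address(reply[4:end]))
    elif atyp == ATYP_IPV6:
        end = 20
        if len(reply) < end + 2:
            raise ValueError("truncated reply")
        addr = str(ipaddress.IPv6Address(reply[4:end]))
    elif atyp == ATYP_DOMAIN:
        if len(reply) < 5:
            raise ValueError("truncated reply")
        end = 5 + reply[4]
        if len(reply) < end + 2:
            raise ValueError("truncated reply")
        addr = reply[5:end].decode("ascii")
    else:
        raise ValueError(f"unknown address type {atyp}")
    (port,) = struct.unpack("!H", reply[end:end + 2])
    return addr, port
```

The application would send the request over the TCP connection after the
usual method negotiation, then direct its datagrams to the (addr, port)
pair returned by parse_reply().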

Tor Internal:
The patch introduces two relay cell commands: DATAGRAM and
DATAGRAM_FRAG. These relay cell types are stateless. Simply, when a tor
client (or general edge, in the future) receives a DATAGRAM_FRAG relay
cell, it knows that the payload is only a fragment of an entire
datagram. As a result, tor queues the data and does not send it to the
application. When a DATAGRAM cell is received, then tor knows the
contained payload is either:
  1) a complete datagram that fit into a single relay cell
  2) the final payload of a fragmented datagram
In either case, tor appends the payload to the (possibly empty) queue
and then sends it to the application. Tor doesn't need to care which one
is the current case. Also, tor doesn't yet account for an MTU mismatch
between the client side and the server side; that is future work.
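
To make the fragmentation rule concrete, here is a small Python sketch of
both directions: splitting a datagram into cells and reassembling it at
the receiving edge. It is not taken from the patch. The commands are
represented as plain strings because the numeric relay command values are
not given here, and the 498-byte payload size is an assumption based on
the current relay cell format.

```python
from typing import List, Optional, Tuple

RELAY_PAYLOAD_SIZE = 498  # assumed per-cell payload capacity
DATAGRAM = "DATAGRAM"            # stand-in for the real relay command value
DATAGRAM_FRAG = "DATAGRAM_FRAG"  # stand-in for the real relay command value


def fragment(datagram: bytes,
             max_payload: int = RELAY_PAYLOAD_SIZE) -> List[Tuple[str, bytes]]:
    """Split one datagram into (command, payload) cells.

    Every cell except the last is DATAGRAM_FRAG; the last is DATAGRAM.
    A datagram that fits in one cell yields a single DATAGRAM cell.
    """
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    chunks = [datagram[i:i + max_payload]
              for i in range(0, len(datagram), max_payload)] or [b""]
    cells = [(DATAGRAM_FRAG, chunk) for chunk in chunks[:-1]]
    cells.append((DATAGRAM, chunks[-1]))
    return cells


class Reassembler:
    """Queue fragment payloads until a DATAGRAM cell completes the datagram."""

    def __init__(self) -> None:
        self._queue = bytearray()

    def receive(self, command: str, payload: bytes) -> Optional[bytes]:
        if command not in (DATAGRAM, DATAGRAM_FRAG):
            raise ValueError(f"unexpected command {command!r}")
        self._queue += payload
        if command == DATAGRAM_FRAG:
            return None  # more cells to come; nothing for the application yet
        # DATAGRAM: the queue now holds a complete datagram either way.
        complete = bytes(self._queue)
        self._queue.clear()
        return complete
```

Note that receive() never checks whether the queue was empty when a
DATAGRAM cell arrives, which mirrors the point above that tor doesn't need
to care which of the two cases it is in.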
As an aside, I considered introducing only a single DATAGRAM relay cell,
and reserving the first byte for metadata (and using a bit indicating if
the payload is a fragment). However, I couldn't convince myself that
losing one byte of payload capacity was worth the advantages. I'll leave
this for discussion during the proposal process.
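
For comparison, the single-command alternative could look roughly like
the following Python sketch. The flag's bit position and the helper names
are hypothetical, and the 498-byte figure is the same assumption about
relay cell payload capacity.

```python
from typing import Tuple

RELAY_PAYLOAD_SIZE = 498             # assumed per-cell payload capacity
MAX_CHUNK = RELAY_PAYLOAD_SIZE - 1   # one byte is spent on metadata
FLAG_FRAGMENT = 0x01                 # hypothetical bit: more fragments follow


def encode_payload(chunk: bytes, is_fragment: bool) -> bytes:
    """Prefix a chunk with a metadata byte carrying the fragment flag."""
    if len(chunk) > MAX_CHUNK:
        raise ValueError("chunk does not fit in one cell")
    flags = FLAG_FRAGMENT if is_fragment else 0x00
    return bytes([flags]) + chunk


def decode_payload(payload: bytes) -> Tuple[bool, bytes]:
    """Return (is_fragment, chunk) from a metadata-prefixed payload."""
    if not payload:
        raise ValueError("payload is missing the metadata byte")
    return bool(payload[0] & FLAG_FRAGMENT), payload[1:]
```

The remaining seven bits of the metadata byte would be free for later
use, which is the main thing this layout buys in exchange for the byte.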

Onion Service Side:
The onion service is configured normally, except now there is a
HiddenServiceType configuration option. It defaults to "stream" when
omitted; set it to "dgram" for a datagram service. This
configuration tells tor that a "stream" opened to this onion service (on
the specified port) is actually connectionless, and any payload data
should be sent and received as datagrams. Following from the previous
section, when a DATAGRAM relay cell is received, the queued payload is
packaged as a datagram and sent to the configured onion service
address:port.
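
A torrc fragment for such a service might look like the sketch below.
HiddenServiceDir and HiddenServicePort are the existing options; the
directory, the port numbers, and the placement of HiddenServiceType
(after the port line, applying to that service) are assumptions for
illustration and may not match the branch.

```
HiddenServiceDir /var/lib/tor/udp_service/
HiddenServicePort 5353 127.0.0.1:5353
HiddenServiceType dgram
```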

Future design considerations:
  - Better client-side interface than SOCKS5 UDP Associate?
  - Relay cell design improvements
  - Cleaner onion service configuration?
  - MTU difference on the client/server-sides
  - What's missing before we can support this on Exits?

[0] https://gitlab.torproject.org/sysrqb/tor/
[1] commit ebfb4a97713d49a5e7f61709658b9d55ac21ec95
[2] https://tools.ietf.org/html/rfc1928