[anti-censorship-team] USERADDR for Turbo Tunnel in Snowflake

6 Feb 2020


      On Fri, Jan 31, 2020 at 07:24:48PM -0700, David Fifield wrote:
...
https://gitweb.torproject.org/user/dcf/snowflake.git/tree/server/server.go?h...
The branch currently lacks client geoip lookup (ExtORPort USERADDR),
because of the difficulty I have talked about before of providing an IP
address for a virtual session that is not inherently tied to any single
network connection or address. I have a plan for solving it, though; it
requires a slight breaking of abstractions. In the server, after reading
the ClientID, we can peek at the first 4 bytes of the first packet.
These 4 bytes are the KCP conversation ID (https://github.com/xtaci/kcp-go/blob/v5.5.5/kcp.go#L120),
a random number chosen by the client, serving roughly the same purpose
in KCP as our ClientID. We store a temporary mapping from the
conversation ID to the IP address of the client making the WebSocket
connection. kcp-go provides a GetConv function that we can call in
handleStream, just as we're about to connect to the ORPort, to look up
the client's IP address in the mapping. The possibility of doing this is
one reason I decided to go with KCP for this implementation rather than
QUIC as I did in the meek implementation: the quic-go package doesn't
expose an accessor for the QUIC connection ID.
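For illustration, the peeking step described above might look something like the following. This is only a sketch, not code from the branch: peekConv is a hypothetical helper name, and it assumes the conversation ID is encoded little-endian at the start of the packet, which is my understanding of how kcp-go serializes its header.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// peekConv returns the KCP conversation ID carried in the first 4 bytes of
// pkt without modifying the packet. The name is hypothetical, and the
// little-endian byte order is an assumption about kcp-go's wire format.
func peekConv(pkt []byte) (uint32, error) {
	if len(pkt) < 4 {
		return 0, errors.New("packet too short to contain a conversation ID")
	}
	return binary.LittleEndian.Uint32(pkt[:4]), nil
}

func main() {
	// A made-up packet whose first 4 bytes encode 0x12345678.
	pkt := []byte{0x78, 0x56, 0x34, 0x12, 0x51, 0x00}
	conv, err := peekConv(pkt)
	fmt.Printf("%#x %v\n", conv, err)
}
```

The server would then record conv alongside the WebSocket client's IP address, and compare it later against the value returned by GetConv.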
https://gitweb.torproject.org/user/dcf/snowflake.git/commit/?h=turbotunnel&a...
This commit adds USERADDR support for turbotunnel sessions. I found a
nicer way to do it than what I proposed above, one that doesn't require
peeking into the packet structure. Instead of using the KCP conversation
ID as the common element linking an IP address and a client session, we
can use the ClientID (the artificial 8-byte value that we tack on at the
beginning of every WebSocket connection). The ServeHTTP function has
access to the ClientID because it's what parses it out, and once you
have a session you can recover the ClientID by calling the RemoteAddr
method. This works because kcp-go uses the address returned from its
ReadFrom calls as the remote address of the session, and we use the
ClientID as the address in those ReadFrom calls.
To summarize:
 * ServeHTTP has an IP address and a ClientID but not a session.
 * acceptStreams has a session and a ClientID but not an IP address.
 * We bridge the gap using a data structure that maps a ClientID to an
   IP address. ServeHTTP stores an entry in the structure, and
   acceptStreams looks it up.
I designed a simple data structure, clientIDMap, to serve as the lookup
table. In spirit it is a map[ClientID]string: you Set(clientID, addr) to
store a mapping, and Get(clientID) to retrieve it. It differs from a
plain map in that it expires old entries when storing new ones: it's a
fixed-size circular buffer. I designed it to be proof against memory
leaks. With a plain map, a client could get as far as sending a ClientID
(storing an entry in the map) but never establish a session (leaving the
entry in the map forever). The tradeoff is that the buffer has to be
large enough not to expire entries before they are needed; how large
depends on the rate of new session creation and the delay between when a
WebSocket connection starts and when a session is established. Even if
an entry is expired before it is used, the worst that happens is that
one session gets attributed to ?? rather than to a specific country. I
added a log message for when this happens, and if it turns out to be a
problem, we can design a more complicated, dynamically sized data
structure.
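To make the idea concrete, here is a minimal sketch of a fixed-size, self-expiring lookup table with the Set/Get interface described above. It is not the code from the commit: the field names, the auxiliary index map, the mutex, and the newClientIDMap constructor are assumptions chosen for the example.

```go
package main

import (
	"fmt"
	"sync"
)

type ClientID [8]byte

type entry struct {
	id   ClientID
	addr string
	used bool
}

// clientIDMap holds at most len(entries) mappings. Storing a new entry
// overwrites the oldest slot, so memory use is bounded no matter how many
// clients send a ClientID without ever establishing a session.
type clientIDMap struct {
	mu      sync.Mutex
	entries []entry          // circular buffer
	next    int              // slot the next Set will overwrite
	index   map[ClientID]int // ClientID -> most recent slot
}

func newClientIDMap(capacity int) *clientIDMap {
	if capacity < 1 {
		capacity = 1
	}
	return &clientIDMap{
		entries: make([]entry, capacity),
		index:   make(map[ClientID]int, capacity),
	}
}

// Set stores id -> addr, expiring whatever occupied the oldest slot.
func (m *clientIDMap) Set(id ClientID, addr string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	slot := m.next
	if old := m.entries[slot]; old.used {
		// Drop the index entry only if it still points at this slot.
		if i, ok := m.index[old.id]; ok && i == slot {
			delete(m.index, old.id)
		}
	}
	m.entries[slot] = entry{id: id, addr: addr, used: true}
	m.index[id] = slot
	m.next = (m.next + 1) % len(m.entries)
}

// Get returns the address stored for id, or false if it was never stored
// or has already been expired.
func (m *clientIDMap) Get(id ClientID) (string, bool) {
	m.mu.Lock()
	defer m.mu.Unlock()
	i, ok := m.index[id]
	if !ok {
		return "", false
	}
	return m.entries[i].addr, true
}

func main() {
	m := newClientIDMap(2)
	a, b, c := ClientID{1}, ClientID{2}, ClientID{3}
	m.Set(a, "192.0.2.1")
	m.Set(b, "192.0.2.2")
	m.Set(c, "192.0.2.3") // capacity is 2, so this expires a
	_, okA := m.Get(a)
	addrC, okC := m.Get(c)
	fmt.Println(okA, addrC, okC)
}
```

In this sketch, ServeHTTP would call Set once it has parsed the ClientID, and acceptStreams would call Get with the ClientID recovered from the session; a failed Get corresponds to the case where the session is attributed to ??.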
