On Tue, Nov 27, 2018 at 08:23:21AM -0500, Nick Mathewson wrote:
> ### Traffic Fingerprinting of TCP-like systems
> Today, because Tor terminates TCP at the guard node, there is limited ability for the exit node to fingerprint client TCP behavior (aside from perhaps measuring some effects on traffic volume, but those are not likely preserved across the Tor network).
> However, when using a TCP-like system for end-to-end congestion control, flow control, and reliability, the exit relay will be able to make inferences about client implementation and conditions based on its behavior.
> Different implementations of TCP-like systems behave differently. Either party on a stream can observe the packets as they arrive to notice cells from an unusual implementation. They can probe the other side of the stream, nmap-style, to see how it responds to various inputs.
> If two TCP-like implementations differ in their retransmit or timeout behavior, an attacker can use this to distinguish them by carefully chosen patterns of dropped traffic. Such an attacker does not even need to be a relay, if it can cause DTLS packets between relays to be dropped or reordered.
> This class of attacks is solvable, especially if the exact same TCP-like implementation is used by all clients, but it also requires careful consideration and additional constraints to be placed on the TCP stack(s) in use that are not usually considered by TCP implementations -- particularly to ensure that they do not depend on OS-specific features or try to learn things about their environment over time, across different connections.
Thanks, this is nice and thoughtful analysis.
Is the word "clients" in the last paragraph meant to exclude servers? Or should I read it as something like "peers," covering both clients and servers? I'm trying to think of how fingerprinting a server could be useful to an attacker. An onion service doesn't count as a server--at the layer of the TCP-like protocol, it's a client, with the RP as server.
Related to implementation differences is configuration. If there are knobs that let a user control, say, the reassembly buffer size, then some users will use them and make their protocol fingerprint differ.
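To make the configuration concern concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration: the `reassembly_buffer` knob name, the buffer values, and the population of 1000 users are made up and not taken from any real implementation. It just counts how many users share each observable configuration, which is the anonymity set an observer of that parameter would see.

```python
from collections import Counter

def anonymity_sets(configs):
    """Map each distinct observable configuration to the number of users sharing it."""
    return Counter(configs)

# Hypothetical knob and values, chosen only for illustration.
DEFAULT = ("reassembly_buffer", 65536)

# Hypothetical population: 997 users keep the default, 3 change the knob.
population = [DEFAULT] * 997 + [
    ("reassembly_buffer", 131072),
    ("reassembly_buffer", 131072),
    ("reassembly_buffer", 262144),
]

sets = anonymity_sets(population)

# Print the smallest anonymity sets first: these are the most identifiable users.
for config, count in sorted(sets.items(), key=lambda kv: kv[1]):
    print(config, count)
```

In this toy population, the user who picked a unique buffer size sits in an anonymity set of one, even though every user runs the same implementation.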