Hi folks,
Here's an email I wrote to a researcher who is working on categorizing anonymity attacks. I figured I should share it with you in case it's useful in some way.
It's also related to my talk at https://www.cosic.esat.kuleuven.be/ecrypt/provpriv2012/program.html
and I expect to use it as background for my discussions at the upcoming Dagstuhl: http://www.dagstuhl.de/no_cache/en/program/calendar/semhp/?semnr=12381
--Roger
----- Forwarded message from Roger Dingledine arma@mit.edu -----
If you have any suggestions about which paper on each attack is most likely to provide such an explanation, please send them to me as soon as possible.
- "Traffic confirmation attack". If he can see/measure the traffic flow
between the user and the Tor network, and also the traffic flow between the Tor network and the destination, he can realize that the two flows correspond to the same circuit: http://freehaven.net/anonbib/#SS03 http://freehaven.net/anonbib/#timing-fc2004 http://freehaven.net/anonbib/#danezis:pet2004 http://freehaven.net/anonbib/#ShWa-Timing06 http://freehaven.net/anonbib/#murdoch-pet2007 http://freehaven.net/anonbib/#ccs2008:wang http://freehaven.net/anonbib/#active-pet2010
It depends on what kind of "more precise" you're after.
I think the #SS03 paper might have the simplest version of the attack ("count up the number of packets you see on each end"). The #timing-fc2004 paper introduces the notion of a sliding window of counts on each side. The #murdoch-pet2007 one looks at how much statistical similarity you can notice between the flows when you are only sampling a small fraction of packets on each side.
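To make the simplest version concrete, here's a toy sketch (mine, not code from any of those papers) of the windowed packet-count idea: bucket the packets observed on each side into fixed time windows and check whether the two count sequences correlate. The window size and threshold are made-up parameters.

# Toy windowed traffic-confirmation sketch; illustrative only.
# Requires Python 3.10+ for statistics.correlation.
from statistics import correlation

def window_counts(timestamps, window=1.0, duration=60.0):
    """Count packets per fixed-size time window."""
    counts = [0] * int(duration / window)
    for t in timestamps:
        if 0 <= t < duration:
            counts[int(t / window)] += 1
    return counts

def looks_like_same_circuit(client_ts, dest_ts, threshold=0.9):
    """Guess whether the two observed flows carry the same circuit."""
    a = window_counts(client_ts)
    b = window_counts(dest_ts)
    return correlation(a, b) > threshold

Real versions also have to slide one sequence against the other to account for network delay, and be robust to noise; that's where the papers above differ.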
- "Congestion attack". An adversary can send traffic through nodes or
links in the network, then try to detect whether the user's traffic flow slows down: http://freehaven.net/anonbib/#torta05 http://freehaven.net/anonbib/#torspinISC08 http://freehaven.net/anonbib/#congestion-longpaths
Section 2 and the first part of Section 3 in #congestion-longpaths are probably your best bet here. It actually provides a pretty good overview of related work, including the passive correlation attacks above.
If by 'more precise' you mean you want to know exactly what the threat model is for this attack, I'm afraid it varies by paper. In #torta05 they assume the adversary runs the website, and when the target user starts to fetch a large file, they congest (DoS) relays one at a time until they see the download slow down.
In #congestion-longpaths they assume the adversary runs the exit relay as well, so they know the middle relay, and the only question is which relay is the guard (first) relay.
In #torspinISC08 on the other hand, they preemptively try to DoS the whole network except the malicious relays, so the target user will end up using malicious relays for her circuit.
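To make the #torta05 flavor concrete, here's a rough sketch of the probing loop. congest_relay() and measure_download_rate() are hypothetical placeholders I made up for illustration, not real tooling, and the slowdown threshold is arbitrary.

# Rough sketch of the one-relay-at-a-time congestion probe (#torta05
# threat model). The helper functions passed in are hypothetical
# placeholders, not real attack tooling.
def find_suspect_relays(candidate_relays, congest_relay,
                        measure_download_rate, slowdown_factor=0.5):
    baseline = measure_download_rate()      # victim's steady-state rate
    suspects = []
    for relay in candidate_relays:
        with congest_relay(relay):          # clog this relay for a while
            rate = measure_download_rate()  # did the victim slow down?
        if rate < slowdown_factor * baseline:
            suspects.append(relay)          # likely on the victim's path
    return suspects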
- "Latency or throughput fingerprinting". While congestion attacks
by themselves typically just learn what relays the user picked (but don't break anonymity as defined above), they can be combined with other attacks: http://freehaven.net/anonbib/#tissec-latency-leak http://freehaven.net/anonbib/#ccs2011-stealthy http://freehaven.net/anonbib/#tcp-tor-pets12
These are three separate attacks.
In #tissec-latency-leak, they assume the above congestion attacks work great to identify Alice's path, and then the attacker builds a parallel circuit using the same path, finds out the latency from them to the (adversary-controlled) website that Alice went to, and then subtracts out to find the latency between Alice and the first hop.
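A back-of-the-envelope version of that subtraction, with made-up numbers just to show the idea:

# Made-up numbers illustrating the #tissec-latency-leak subtraction.
# The adversary-run website sees Alice's round-trip time over Tor, and
# the attacker measures a parallel circuit through the same relays.
rtt_alice_via_tor = 0.850      # seconds, observed at the malicious site
rtt_attacker_via_tor = 0.700   # attacker's parallel circuit to the site
rtt_attacker_to_guard = 0.030  # attacker knows their own link to the guard

# Both paths share the guard->middle->exit->site segment, so subtracting
# leaves an estimate of Alice's latency to her first hop:
shared_segment = rtt_attacker_via_tor - rtt_attacker_to_guard
rtt_alice_to_guard = rtt_alice_via_tor - shared_segment
print(rtt_alice_to_guard)      # ~0.18 s, which narrows down where Alice is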
#ccs2011-stealthy actually proposes a variety of variations on these attacks. They show that if Alice uses two streams on the same circuit, the two websites she visits can use throughput fingerprinting to realize they're the same circuit. They also show that by looking at the throughput Alice gets from her circuit, you can rule out a lot of relays that wouldn't have been able to provide that throughput at that time. And finally, they show that if you build test circuits through the network and then compare the throughput your test circuit gets with the throughput Alice gets, you can guess whether your circuit shares a bottleneck relay with Alice's circuit. Where "show" should probably be in quotes, since it probably works sometimes and not other times, and nobody has explored how robust the attack is.
#tcp-tor-pets12 has the adversary watching Alice's local network, and wanting to know whether she visited a certain website. The adversary exploits vulnerabilities in TCP's window design to spoof RST packets between every exit relay and the website in question. If they do it right, the connection between the exit relay and the website cuts its TCP congestion window in response, leading to a drop in throughput on the flow between the Tor network and Alice. In theory. It also works in the lab, sometimes.
I also left out http://freehaven.net/anonbib/date.html#esorics10-bandwidth which uses a novel remote bandwidth estimation algorithm to try to estimate whether various physical Internet links have less bandwidth when Alice is fetching her file. In theory this lets them walk back towards Alice, one traceroute-style hop at a time. In practice they need an Internet routing map (these are notoriously messy for the same reasons the Decoy Routing people are realizing), and also Alice's flows have to be quite high throughput for a long time.
- "Website fingerprinting". If the adversary can watch the user's
connection into the Tor network, and also has a database of traces of what the user looks like while visiting each of a variety of pages, and the user's destination page is in the database, then in some cases the attacker can guess the page she's going to: http://freehaven.net/anonbib/#hintz02 http://freehaven.net/anonbib/#TrafHTTP http://freehaven.net/anonbib/#pet05-bissias http://freehaven.net/anonbib/#Liberatore:2006 http://freehaven.net/anonbib/#ccsw09-fingerprinting http://freehaven.net/anonbib/#wpes11-panchenko http://freehaven.net/anonbib/#oakland2012-peekaboo
#oakland2012-peekaboo aims to be a survey paper for the topic, so it's probably the right one to look at first.
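As a very rough illustration of what these classifiers are doing (not the feature sets or classifiers from any of the papers above), you can think of each page load as a trace of signed packet sizes and do nearest-neighbor matching against the database:

# Toy nearest-neighbor website fingerprinting sketch. Real attacks use
# much richer features (bursts, orderings, timings) and real classifiers;
# this only illustrates the overall shape of the problem.
def distance(trace_a, trace_b):
    """Crude distance between two traces of signed packet sizes
    (positive = outgoing, negative = incoming); pads the shorter one."""
    n = max(len(trace_a), len(trace_b))
    a = trace_a + [0] * (n - len(trace_a))
    b = trace_b + [0] * (n - len(trace_b))
    return sum(abs(x - y) for x, y in zip(a, b))

def guess_page(observed_trace, database):
    """database maps page URL -> list of previously recorded traces."""
    return min(database,
               key=lambda page: min(distance(observed_trace, t)
                                    for t in database[page]))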
- "Correlating bridge availability with client activity."
If you run a relay and also use it as a client, the fact that the adversary can route traffic through you lets him learn about your client activity. Section 1.1 summarizes:
2. A bridge always accepts connections when its operator is using Tor. Because of this, an attacker can compile a list of times when a given operator was either possibly or certainly not using Tor, by repeatedly attempting to connect to the bridge. This list can be used to eliminate bridge operators as candidates for the originator of a series of connections exiting Tor. We demonstrate empirically that typically, a small set of linkable connections is sufficient to eliminate all but a few bridges as likely originators.
3. Traffic to and from clients connected to a bridge interferes with traffic to and from a bridge operator. We demonstrate empirically that this makes it possible to test via a circuit-clogging attack [17, 15] which of a small number of bridge operators is connecting to a malicious server over Tor. Combined with the previous two observations, this means that any bridge operator that connects several times, via Tor, to a web-site that can link users across visits could be identified by the site's operator.
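The elimination step in item 2 is essentially an intersection attack. A minimal sketch (my illustration, with a made-up data layout, not the paper's code):

# Sketch of the elimination step: a bridge's operator can't be the
# originator of a linked connection made while their bridge was down.
# The data layout here is made up for illustration.
def candidate_operators(bridge_downtimes, connection_times):
    """bridge_downtimes: bridge -> list of (start, end) intervals when
    probes showed it unreachable.
    connection_times: times of the linked connections seen exiting Tor."""
    candidates = set(bridge_downtimes)
    for bridge, downtimes in bridge_downtimes.items():
        if any(start <= t <= end
               for (start, end) in downtimes
               for t in connection_times):
            candidates.discard(bridge)  # down during a linked connection
    return candidates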
I tried to keep this list of "excepts" as small as possible so it's not overwhelming, but I think the odds are very high that if the ratpac comes up with other issues, I'll be able to point to papers on anonbib that discuss these issues too. For example, these two papers are interesting: http://freehaven.net/anonbib/#ccs07-doa
Traditionally, we calculate the risk that Alice's circuit is controlled by the adversary as the chance that she chooses a bad first hop and a bad last hop. They're assumed to be independent. But if an adversary's relay is chosen anywhere in the circuit yet he *doesn't* have both the first and last hop, he should tear down the circuit, forcing Alice to make a new one and roll the dice again. Longer path lengths (once thought to make the circuit safer) *increase* vulnerability to this attack.
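A quick back-of-the-envelope computation of that effect, assuming the adversary controls a fraction c of the (bandwidth-weighted) relay selection probability and hop choices are independent:

# Back-of-the-envelope numbers for the selective denial-of-service effect
# described above (#ccs07-doa). c = probability a given hop is bad.
# Without teardown, a circuit is compromised iff both ends are bad: c**2.
# With teardown, the only circuits that survive are either completely
# clean ((1-c)**n for an n-hop path) or have bad first+last hops (c**2),
# so the compromised fraction of surviving circuits grows with n.
def p_compromised_with_teardown(c, n):
    return c**2 / (c**2 + (1 - c)**n)

c = 0.2
print("baseline:", c**2)                          # 0.04
for n in (3, 4, 5):
    print(n, round(p_compromised_with_teardown(c, n), 3))
# -> 3 hops: 0.072, 4 hops: 0.089, 5 hops: 0.109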
I think the guard node design helps here, but whether that's true is an area of active research.
If you lie about your bandwidth, you can get more traffic than you "should" get based on bandwidth investment. In theory we've solved this by doing active bandwidth measurement: https://blog.torproject.org/blog/torflow-node-capacity-integrity-and-reliabi... but in practice it's not fully solved: https://trac.torproject.org/projects/tor/ticket/2286
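As a rough illustration of why inflated claims matter (assuming simple proportional weighting; the real path-selection weights are more involved):

# Clients pick relays roughly in proportion to consensus bandwidth, so a
# relay claiming 10x its real capacity attracts far more than its share.
# Numbers and the simple proportional weighting are illustrative only.
advertised = {"honest1": 100, "honest2": 100, "liar": 1000}  # claimed KB/s
total = sum(advertised.values())
for relay, bw in advertised.items():
    print(relay, round(bw / total, 2))  # liar ends up with ~0.83 of picks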
--Roger
----- End forwarded message -----