Hello,
Thank you for working on Tor.
I have a suggestion and would appreciate input. Please bear with me as I have a limited understanding of the design of Tor and all the different threats that it is meant to mitigate. Below, a (?) indicates a place where I need some confirmation that my understanding is correct, and N indicates either the number of Tor nodes, the number of end-users, or the amount of traffic (I assume these are all linearly related).
As far as I can tell, the main threat posed by a global passive adversary comes from traffic analysis (?). This attack should become easier as the number of Tor nodes increases (?): Tor uses a clique topology, so the number of edges potentially carrying traffic grows like N^2. A dual way to see this is that not enough mixing can happen around a node for incoming/outgoing edge pairs, short of injecting a huge amount of fake traffic.
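To make the N^2 point concrete, a quick count (my own illustration; the degree d below is an arbitrary small number):

    # Potential links in a clique versus a sparse d-regular topology.
    def clique_edges(n):
        return n * (n - 1) // 2            # every pair of relays is a potential link

    def regular_edges(n, d):
        return n * d // 2                  # each relay keeps only d links

    for n in (1000, 5000, 10000):
        print(n, clique_edges(n), regular_edges(n, d=10))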
To compensate, it seems natural to look for a sparse yet highly mixing network topology. Mathematically, such graphs are called expanders [1]. A typical source of expanders is the Erdos-Renyi model [2] (random graphs drawn from it are expanders with high probability), and indeed I have found suggestions in the literature for basing anonymizing protocols on such a model. The analysis in the presence of an active adversary becomes very difficult, though.
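As a quick numerical sanity check of the expansion claim (my own sketch, assuming networkx/scipy are available; n and p are arbitrary values):

    # Sample G(n, p) and gauge expansion via the Laplacian spectral gap.
    import networkx as nx

    n, p = 500, 0.02                       # roughly 10 expected neighbours per node
    G = nx.erdos_renyi_graph(n, p, seed=1)
    print("connected:", nx.is_connected(G))
    # Second-smallest Laplacian eigenvalue; bounded away from 0 means good expansion.
    print("algebraic connectivity:", nx.algebraic_connectivity(G))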
Alternatively, one could use a different method for constructing that expander topology, working "all at once". This comes from recent mathematics research (<= 5 years old, certainly not my own, see [3]). The graph is then a Cayley graph [4] in a matrix group (the group is fixed and determined by an approximation to the number of Tor nodes, such as the nearest third power of a prime). In some sense this construction interpolates between mixing chains and Tor, and can be seen as a lot of mixing chains interwoven.
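For concreteness, here is a toy version of the construction (my own sketch, with p deliberately tiny and a fixed generating pair instead of random generators; assumes networkx/scipy):

    # Toy Cayley graph of SL_2(F_p); p is tiny only so the script runs instantly.
    from itertools import product
    import networkx as nx

    p = 5

    def mul(x, y):                          # product of 2x2 matrices (a, b, c, d) mod p
        return ((x[0]*y[0] + x[1]*y[2]) % p, (x[0]*y[1] + x[1]*y[3]) % p,
                (x[2]*y[0] + x[3]*y[2]) % p, (x[2]*y[1] + x[3]*y[3]) % p)

    group = [m for m in product(range(p), repeat=4)
             if (m[0]*m[3] - m[1]*m[2]) % p == 1]
    gens = [(1, 1, 0, 1), (1, 0, 1, 1)]     # standard generating pair of SL_2(F_p)

    G = nx.Graph()
    G.add_nodes_from(group)
    for g in group:
        for s in gens:
            G.add_edge(g, mul(s, g))        # undirected, so inverse generators come for free
    print(len(group), "group elements,", G.number_of_edges(), "edges")
    print("algebraic connectivity:", nx.algebraic_connectivity(G))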
In the setting of Tor, constructing the Cayley graph would require making two distributed randomized choices:
- a matching of elements of the group to Tor nodes (possibly 2:1 for some Tor nodes)
- a small subset of generators for the Cayley graph
From my understanding of security protocols, it should be easy to make these two choices safely and quickly, as they amount to choosing a random element of S_N and filling lots of matrix entries with random elements between 1 and a prime p, with some rejection.
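A naive single-machine sketch of what those two choices could amount to (the relay names, p and the number of generators are all placeholders; in reality the randomness would have to come from a distributed, verifiable protocol):

    # The two randomized choices, done naively and centrally for illustration only.
    import random
    from itertools import product

    p = 13                                  # assumed: chosen so |SL_2(F_p)| ~ number of relays
    k = 8                                   # assumed number of random generators

    def random_sl2(rng):
        # Rejection sampling: draw entries mod p until the determinant is 1.
        while True:
            a, b, c, d = (rng.randrange(p) for _ in range(4))
            if (a * d - b * c) % p == 1:
                return (a, b, c, d)

    rng = random.Random(2013)
    generators = [random_sl2(rng) for _ in range(k)]

    # The matching: pair group elements with relay identities, 2:1 where the counts differ.
    group = [m for m in product(range(p), repeat=4)
             if (m[0]*m[3] - m[1]*m[2]) % p == 1]
    relays = ["relay%d" % i for i in range(2000)]        # placeholder relay names
    rng.shuffle(group)                                   # the random element of S_N
    matching = {g: relays[i % len(relays)] for i, g in enumerate(group)}
    print(len(group), "group elements matched to", len(relays), "relays")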
Once that is done, the network topology is fully determined, and with very high probability gives an expander. This means that traffic gets mixed up in very few hops. The number of hops needed grows as log N, with a constant that can be reduced by choosing a large generating set above. This is the only downside I see (apart from the difficulty of explaining the math behind this): the latency would increase, as the number of hops goes from 3 in the current protocol to maybe 10 or so.
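A very rough feel for the constant (my own heuristic; the true value depends on the spectral gap rather than just the degree d):

    # Hop-count heuristic log(N)/log(d) for a few candidate degrees.
    import math

    N = 5000                                # assumed rough number of relays
    for d in (4, 8, 16):
        print("d =", d, "hops ~", math.ceil(math.log(N) / math.log(d)))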
I don't know the details of the behaviour of the constants in the last paragraph, and would appreciate feedback from the list before looking too much into this.
Paul Dehaye
[1] http://en.wikipedia.org/wiki/Expander_graph
[2] http://en.wikipedia.org/wiki/Erd%C5%91s%E2%80%93R%C3%A9nyi_model
[3] http://terrytao.wordpress.com/2011/12/02/245b-notes-1-basic-theory-of-expand... Exercise 15 and remark below
[4] http://en.wikipedia.org/wiki/Cayley_graph
So I don't work for Tor, nor am I a graph theorist, but I'll add a few preliminary thoughts.
On 22 August 2013 21:08, Paul-Olivier Dehaye paul-olivier.dehaye@math.uzh.ch wrote:
As far as I can tell, the main threat posed by a global passive adversary comes from traffic analysis (?).
A Global Passive Adversary is technically outside of Tor's threat model (see https://trac.torproject.org/projects/tor/wiki/doc/TorFAQ#Whatattacksremainag...) - but if there are easy ways to make it more difficult for such an adversary, at a low engineering cost - then Tor tends to be up for them.
This attack should become easier as the number of Tor nodes increases (?)
I'm not sure I agree with that. If the adversary is not global, but only partly global, then network diversity is crucial. If the adversary is truly global, I don't think more nodes would help as much as more _traffic_.
A dual way to see this is that not enough mixing can happen around a node for incoming/outgoing edge pairs, short of injecting a huge amount of fake traffic.
In what sense do you use the word 'mixing'? In the traffic analysis literature, I think it tends to refer to mix networks, and collecting several messages into a pool before releasing some or all of them (http://crypto.is/blog/mix_and_onion_networks).
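(A toy caricature of such a pool, just to make the contrast with onion routing concrete; the threshold is arbitrary:)

    # Minimal threshold mix: buffer messages, then flush a shuffled batch.
    import random

    class ThresholdMix:
        def __init__(self, threshold=3):
            self.threshold = threshold
            self.pool = []

        def submit(self, message):
            self.pool.append(message)
            if len(self.pool) < self.threshold:
                return []                   # keep buffering; nothing leaves yet
            batch, self.pool = self.pool, []
            random.shuffle(batch)           # output order hides arrival order
            return batch

    mix = ThresholdMix()
    print([mix.submit(m) for m in ("a", "b", "c")])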
-tom
I use the word mixing 4 times, with two meanings:
- "...enough mixing can happen...": mathematical sense, on pairs of edges around a node
- "...yet highly mixing network...": mathematical sense, on pairs of entry/exit Tor nodes
- "...between mixing chains and Tor, and can be seen as a lot of mixing chains...": traffic analysis sense; the first 'mixing chains' here should have said 'mix networks' instead
Informally, in the mathematical sense, a topology is mixing if a random walk on it converges exponentially fast to the uniform distribution (which amounts to no information in the security sense). See the part around [3].
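A quick numerical illustration of that definition (arbitrary parameters, assuming networkx/numpy): start a walk at one node of a random d-regular graph and watch its distribution approach uniform.

    # Total-variation distance to uniform of a lazy random walk, step by step.
    import networkx as nx
    import numpy as np

    n, d = 1000, 8
    G = nx.random_regular_graph(d, n, seed=1)
    P = 0.5 * (np.eye(n) + nx.to_numpy_array(G) / d)   # lazy walk avoids parity issues
    dist = np.zeros(n)
    dist[0] = 1.0                                      # walk starts at a single node
    for step in range(1, 11):
        dist = dist @ P
        print(step, round(0.5 * np.abs(dist - 1.0 / n).sum(), 4))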
The short summary of the weakness of Tor here:
- We would like the whole protocol to be mixing (to an observer, the probability of exiting at any node C given entrance at node A is close to 1/N).
- The clique topology leads to a constant leak of information, because at any time traffic analysis around a node X can be performed to recover a non-uniform distribution on entry-exit pairs around that node.

Proposed solution:
- By simply lowering the number of allowed edges around a node, nodes can combine mix-network-like behaviour with Tor-like behaviour, at little cost. They have more or less the same amount of traffic to play with, but they need to make it look constant (to resist traffic analysis) on far fewer edges, so they need to fake a lot less traffic (see the small sketch below).
- Lowering the number of edges increases the danger of other attacks that rely on global information about the topology (if the topology is no longer a clique, in 2 hops I might not be able to reach every node). The point of my email is that for specific network topologies (expanders) this is not really a problem (this was already observed in the literature), and that such expanders can be created easily, with the security guarantees needed (using Cayley graphs of matrix groups).
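To put a crude number on the "fake a lot less traffic" point (my own back-of-the-envelope, assuming every maintained link needs a constant cover-traffic rate r):

    # Per-relay cover-traffic cost if every maintained link must look constant-rate.
    # r is an arbitrary per-link rate; only the clique-vs-sparse ratio matters.
    def padding_cost(num_links, r=1.0):
        return num_links * r

    N, d = 5000, 10                        # assumed network size and per-node degree
    print("clique, links per relay:  ", N - 1, "cost:", padding_cost(N - 1))
    print("degree-d, links per relay:", d, "cost:", padding_cost(d))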
Paul
[3] http://terrytao.wordpress.com/2011/12/02/245b-notes-1-basic-theory-of-expand... Exercise 15 and remark below
On Fri, Aug 23, 2013 at 09:19:32AM +0200, Paul-Olivier Dehaye wrote:
The short summary of the weakness of Tor here:
- We would like the whole protocol to be mixing (to an observer, the probability of exiting at any node C given entrance at node A is close to 1/N).
Right, you're using terminology and threat models from the mixnet literature. Tor doesn't aim to (and doesn't) defend against that.
You might find the explanation in https://blog.torproject.org/blog/one-cell-enough to be useful. The first trouble with mixing in the Tor environment is that "messages" from each user aren't the same size, and it's really expensive to make them the same size ("round up to the largest expected web browsing session").
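(A toy calculation of why that rounding is expensive; the session sizes below are invented, only the ratio matters:)

    # Overhead of padding every "message" (session) up to the largest one.
    # Sizes in KB are made up; real web sessions vary over orders of magnitude.
    sessions = [5, 40, 200, 1500, 80000]          # one very large session dominates
    padded = [max(sessions)] * len(sessions)
    print("real bytes:  ", sum(sessions))
    print("padded bytes:", sum(padded))
    print("overhead:    ", round(sum(padded) / sum(sessions), 1), "x")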
Another key point: it's not about the paths inside the network -- it's about the connections from the users to the network, and from the network to the destinations.
That said, for the beginning of your related work, see http://freehaven.net/anonbib/#danezis:pet2003
And for a much later follow-up, see http://freehaven.net/anonbib/#topology-pet2010
--Roger
On Fri, Aug 23, 2013 at 03:45:31AM -0400, Roger Dingledine wrote:
That said, for the beginning of your related work, see http://freehaven.net/anonbib/#danezis:pet2003
And for a much later follow-up, see http://freehaven.net/anonbib/#topology-pet2010
You might also want to look at the following for a design that tries to address your issues. http://freehaven.net/anonbib/#active-pet2010 See also citations therein for partial solutions.
High-order bit: I think this is about the state of the art for this area, and it's my paper, but we still need a lot of basic research progress in this space before we would have anything worth putting into Tor. And, except for adding small amounts of noise to complicate trawling (beyond uniform cell sizes, which should be a gauge of tolerable overhead for anything we do), I'm not very sanguine about the prospects of this ever making practical sense. You might also consult my "Why I'm not an Entropist" http://www.syverson.org/entropist-final.pdf
aloha, Paul