Hello,
Thank you for working on Tor.
I have a suggestion and would appreciate input. Please bear with me as I have a limited understanding of the design of Tor and all the different threats that it is meant to mitigate. Below, a (?) indicates a place where I need some confirmation that my understanding is correct, and N indicates either the number of Tor nodes, the number of end-users, or the amount of traffic (I assume these are all linearly related).
As far as I can tell, the main threat by a global passive adversary comes from traffic analysis (?). This attack should become easier as the number of Tor nodes increases (?): Tor uses a clique topology, so the number of edges potentially carrying traffic grows like N^2. A dual way to see this is that not enough mixing can happen around a node for incoming/outgoing edge pairs, bar injecting a huge amount of fake traffic.
To compensate, it seems natural to look for a sparse yet highly mixing network topology. Mathematically, those are called expanders [1]. A typical example of a family of expanders would be the Erdos-Renyi model [2], and indeed I have found in the literature suggestions for basing anonymizing protocols on such a model. The analysis in the presence of an active adversary becomes very difficult though.
Alternatively, one could use a different method for constructing that expander topology, working "all at once". This comes from recent mathematics research (<= 5 years, certainly not my own, see [3]). The graph is then a Cailey graph [4] in a matrix group (the group is fixed and determined by an approximation to the number of Tor nodes, such as nearest third power of a prime number). In some sense this construction interpolates between mixing chains and Tor, and can be seen as a lot of mixing chains interwoven.
In the setting of Tor, constructing the Cailey graph would require making two distributed randomize choices: - a matching of elements of the group to Tor nodes (possibly 2:1 for some Tor nodes) - a small subset of generators for the Cailey graph
From my understanding of security protocols, it should be easy to do these
two choices safely and fast, as it amounts to choosing a random element in S_N and filling lots of matrix entries with random elements between 1 and a prime p, with some rejection.
Once that is done, the network topology is fully determined, and with very high probability gives an expander. This means that traffic gets mixed up in very few hops. The number of hops needed grows as log N, with a constant that can be mitigated by chosing a large generating set above. This is the only downside I see (apart from difficulty to explain the math behind this): the latency would increase, from 3 in the current protocol to maybe 10 or so.
I don't know the details of the behaviour of the constants in the last paragraph, and would appreciate feedback from the list before looking too much into this.
Paul Dehaye
[1] http://en.wikipedia.org/wiki/Expander_graph [2] http://en.wikipedia.org/wiki/Erd%C5%91s%E2%80%93R%C3%A9nyi_model [3] http://terrytao.wordpress.com/2011/12/02/245b-notes-1-basic-theory-of-expand... 15 and remark below [4] http://en.wikipedia.org/wiki/Cayley_graph