On 11 Oct 2014, at 23:00 , tor-dev-request@lists.torproject.org wrote:
Date: Fri, 10 Oct 2014 14:33:52 +0100 From: Steven Murdoch Steven.Murdoch@cl.cam.ac.uk To: tor-dev@lists.torproject.org Subject: [tor-dev] Optimising Tor node selection probabilities Message-ID: FDECA8F4-5F99-4738-8391-CD60D156D774@cl.cam.ac.uk Content-Type: text/plain; charset=windows-1252
I've just published a new paper on choosing the node selection probabilities (consensus weights) in Tor. It takes a queuing-theory approach and shows that what Tor used to do (distributing traffic to nodes in proportion to their contribution to network capacity) is not the best approach.
Counter-intuitively, the paper shows that some of the slowest nodes should not be used at all, because if they are used they will slow down the average performance for all users. The proportion of nodes which shouldn't be used depends on the relationship between network usage and network capacity, so will vary over time.
It's not clear that there is a closed-form solution to the problem of calculating node selection probabilities (I couldn't find one), but this paper shows that the optimisation surface is convex and so gradient-based optimisation methods will find the global optimum (rather than some local optimum which depends on the starting position of the optimisation process).
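To make the shape of the problem concrete, here is a minimal sketch of that kind of gradient-based optimisation, assuming each relay behaves as an M/M/1 queue (mean delay T(p) = sum_i p_i/(mu_i - LAM*p_i), which is convex on the probability simplex). The capacities and load below are made-up numbers for illustration, not figures from the paper:

```python
def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = sorted(v, reverse=True)
    css, theta = 0.0, 0.0
    for i, ui in enumerate(u, 1):
        css += ui
        t = (css - 1.0) / i
        if ui > t:
            theta = t
    return [max(x - theta, 0.0) for x in v]

mu = [50.0, 30.0, 10.0, 5.0, 1.0]  # hypothetical relay service rates
LAM = 40.0                         # total offered load, below sum(mu) = 96

# dT/dp[i] = mu[i] / (mu[i] - LAM*p[i])**2
p = [m / sum(mu) for m in mu]      # start at capacity-proportional weights
for _ in range(50_000):            # projected gradient descent on convex T
    grad = [m / (m - LAM * pi) ** 2 for m, pi in zip(mu, p)]
    p = project_simplex([pi - 1e-3 * g for pi, g in zip(p, grad)])

print([round(pi, 3) for pi in p])  # the slowest relays are driven to weight 0
```

Note how the optimum assigns zero probability to the slowest relays, matching the counter-intuitive result above: at these (invented) parameters only the two fastest relays carry any traffic at all.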
Although the process outlined in the paper requires knowing the relationship between network capacity and usage, it isn't highly sensitive to minor inaccuracies in measuring this value. For example, if it is assumed that the network is loaded at 50%, then the solution will outperform Tor's old approach provided the true network load is between 0% and 60%.
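A rough numerical illustration of this robustness claim, with two hypothetical relays under the same assumed M/M/1 delay model (made-up capacities, not the paper's figures): weights optimised on the assumption of 50% load still beat capacity-proportional weights across a range of true loads.

```python
def delay(p, lam, mu=(3.0, 1.0)):
    """Mean M/M/1 delay when a fraction p of load lam goes to the fast relay."""
    if lam * p >= mu[0] or lam * (1 - p) >= mu[1]:
        return float("inf")  # that split overloads one of the queues
    return p / (mu[0] - lam * p) + (1 - p) / (mu[1] - lam * (1 - p))

cap = 4.0  # total capacity mu[0] + mu[1]
# Weights optimised assuming the network is 50% loaded (simple grid search):
p_opt = min((i / 1000 for i in range(1001)), key=lambda p: delay(p, 0.5 * cap))
p_prop = 3.0 / 4.0  # capacity-proportional weight for the fast relay

for frac in (0.1, 0.3, 0.5, 0.6):  # true loads up to 60% of capacity
    assert delay(p_opt, frac * cap) < delay(p_prop, frac * cap)
print(round(p_opt, 3))  # ~0.866: more than the proportional share goes to the fast relay
```

In this toy setting, proportional weighting only regains the advantage at heavier true loads (around 80% here), consistent with the qualitative claim above.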
After this work was done, Tor moved to actively measuring the network performance and manipulating the consensus weights in response to changes. This seems to have ended up with roughly the same outcome. The advantage of Tor's new approach is that it doesn't require knowing network usage and node capacity; however the disadvantage is that it can only react slowly to changes in network characteristics.
For more details, see the paper: http://www.cl.cam.ac.uk/~sjm217/papers/#pub-el14optimising
Note that this is published in IET Electronics Letters, which is a bit different to the usual Computer Science publication venues. It jumps straight into the maths and leaves it to the reader to understand the background and implications. The advantage is that it's 2 pages long; the disadvantage is that you need to know a reasonable amount about Tor and queuing theory to make much sense of it.
Best wishes, Steven
This is fantastic, Steven - and although we've changed Tor's consensus weights algorithm, we still waste bandwidth telling clients about relays that would slow the network down.
Your result further supports recent proposals to remove the slowest relays from the consensus entirely.
teor pgp 0xABFED1AC hkp://pgp.mit.edu/ https://gist.github.com/teor2345/d033b8ce0a99adbc89c5 http://0bin.net/paste/Mu92kPyphK0bqmbA#Zvt3gzMrSCAwDN6GKsUk7Q8G-eG+Y+BLpe7wt...
On Sun, Oct 12, 2014 at 06:43:10AM +1100, teor wrote:
On 11 Oct 2014, at 23:00 , tor-dev-request@lists.torproject.org wrote:
Date: Fri, 10 Oct 2014 14:33:52 +0100 From: Steven Murdoch Steven.Murdoch@cl.cam.ac.uk
I've just published a new paper on choosing the node selection probabilities (consensus weights) in Tor. It takes a queuing-theory approach and shows that what Tor used to do (distributing traffic to nodes in proportion to their contribution to network capacity) is not the best approach.
[snip]
For more details, see the paper: http://www.cl.cam.ac.uk/~sjm217/papers/#pub-el14optimising
[snip]
This is fantastic, Steven - and although we've changed Tor's consensus weights algorithm, we still waste bandwidth telling clients about relays that would slow the network down.
Your result further supports recent proposals to remove the slowest relays from the consensus entirely.
I find this theoretically very interesting and an important contribution, but I'm less sure what conclusions it supports for Tor as implemented and deployed. A first major question is that the results assume FIFO processing of cells at each relay, but Tor currently uses EWMA scheduling and is now moving even further from FIFO as KIST is being adopted.

There are other questions too. For example, the paper assumes it is safe to ignore circuits and streams, not just for FIFO vs. prioritized processing but for the routing and distribution of cells across relays as well; said differently, the model treats cells as independently routable, but Tor's onion routing does not.

But I'm thinking if I'm correct even about this one point, then it would be extremely premature to directly apply the conclusions of this work to practical proposals for improving Tor performance. Then of course there are those pesky security implications to worry about ;>)

My comments are not meant at all to question the value of the paper, which I think contributes to our understanding of such networks. Rather I am cautioning against applying its results outside the scope of its assumptions.
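For readers unfamiliar with EWMA scheduling: the idea (from Tang and Goldberg's circuit-scheduling work) is that each circuit carries an exponentially decaying count of recently relayed cells, and when a connection can write, the circuit with the lowest count is serviced first, so quiet interactive circuits jump ahead of bulk ones. A toy sketch, not Tor's actual implementation (the class name and halflife handling are simplified illustrations):

```python
import math

class EwmaScheduler:
    """Toy sketch of EWMA circuit scheduling; not Tor's real code."""

    def __init__(self, halflife=30.0):
        self.decay = math.log(2) / halflife  # per-second decay rate
        self.state = {}                      # circuit id -> (ewma, last update time)

    def _current(self, circ, now):
        """Cell count for circ, decayed to the present moment."""
        ewma, last = self.state.get(circ, (0.0, now))
        return ewma * math.exp(-self.decay * (now - last))

    def record_cell(self, circ, now):
        """Charge one relayed cell to circ's moving average."""
        self.state[circ] = (self._current(circ, now) + 1.0, now)

    def pick(self, circs, now):
        """Service the circuit with the lowest decayed cell count."""
        return min(circs, key=lambda c: self._current(c, now))

sched = EwmaScheduler()
for _ in range(100):
    sched.record_cell("bulk", 0.0)   # a busy bulk-download circuit
sched.record_cell("web", 0.0)        # a quiet interactive circuit
print(sched.pick(["bulk", "web"], 0.0))  # the quiet circuit wins
```

The point relevant to the paper is simply that under such a policy, per-relay cell handling is no longer FIFO, which is exactly the modelling assumption being questioned above.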
Cf. the KIST paper, which itself cites the EWMA introduction paper and subsequent related work. http://www.nrl.navy.mil/itd/chacs/sites/edit-www.nrl.navy.mil.itd.chacs/file... or http://www.robgjansen.com/publications/kist-sec2014.pdf
aloha, Paul