Andrew Lewman:
I had a conversation with a vendor yesterday. They are interested in including Tor as their "private browsing mode" and basically shipping a re-branded Tor Browser that lets people toggle connectivity to the Tor network on and off.
They very much like Tor Browser and would like to ship it to their customer base. Their product accounts for 10-20% of the global market of roughly 2.8 billion Internet users.
As Tor Browser is open source, they are already working on it. However, their concern is scaling up to handle some percentage of global users with "tor mode" enabled. They're open to offering their resources to help us solve the scalability challenges of handling hundreds of millions of users and relays on Tor.
As this question keeps coming up from businesses that see privacy as the next "must have" feature in their products, I'm trying to compile a list of tasks we would need to solve in order to scale. The old 2008 three-year roadmap looked at performance: https://www.torproject.org/press/2008-12-19-roadmap-press-release.html.en
I've been through the specs, https://gitweb.torproject.org/torspec.git/tree/HEAD:/proposals to see if there are proposals for scaling the network or directory authorities. I didn't see anything directly related.
The most recent research papers I see directly addressing scalability are Torsk (http://www.freehaven.net/anonbib/bibtex.html#ccs09-torsk) and PIR-Tor (http://www.freehaven.net/anonbib/bibtex.html#usenix11-pirtor).
These research papers basically propose a total network overhaul to deal with the problem of Tor relay directory traffic overwhelming the Tor network and/or Tor clients.
However, I believe that with only minor modifications, the current Tor network architecture could support 100M daily directly connecting users, assuming we focus our efforts on higher capacity relays and not simply adding tons of slower relays.
The core problem is that the fraction of network capacity that you spend telling users about the current relays in the network can be written as:
f = D*U/B
D is current Tor relay directory size in bytes per day, U is number of users, and B is the bandwidth per day in bytes provided by this Tor network. Of course, this is a simplification, because of multiple directory fetches per day and partially-connecting/idle clients, but for purposes of discussion it is good enough.
To put some real numbers on this, if you compare https://metrics.torproject.org/bandwidth.html#dirbytes with https://metrics.torproject.org/bandwidth.html#bandwidth, you can see that we're currently devoting about 2% of our network throughput to directory activity (~120MiB/sec out of ~5000MiB/sec). So we're not exactly hurting at this point in terms of our directory bytes per user yet.
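To make that arithmetic concrete, here is a minimal Python sketch that plugs the approximate throughput figures above straight into the ratio (the two byte rates are just the rough values quoted from the metrics pages, nothing more):

# Back-of-envelope check of the directory overhead fraction,
# using the approximate rates quoted from the metrics pages above.
dir_rate = 120 * 2**20       # ~120 MiB/sec of directory traffic
total_rate = 5000 * 2**20    # ~5000 MiB/sec of total network throughput

f = dir_rate / total_rate
print("directory overhead: {:.1%}".format(f))   # ~2.4%, i.e. roughly the 2% above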
But because this fraction rises with both D and U, these research papers rightly point out that you can't keep adding relays *and* users and expect Tor to scale.
However, when you look at this f=D*U/B formula, what it also says is that if you can reduce the relay directory size by a factor c, and also grow the network capacity by this same factor c, then you can multiply the userbase by c, and have the same fraction of directory bytes.
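As a toy check of that claim (all values below are made-up normalized units, with c=8 chosen purely for illustration):

# f = D*U/B stays fixed if the directory size D shrinks by the same factor c
# that the userbase U grows by; the matching capacity growth by c is what
# carries the extra users' traffic on top of that.
def overhead(D, U, B):
    return D * U / B

c = 8
print(overhead(1.0, 1.0, 50.0))          # baseline: 0.02, i.e. 2%
print(overhead(1.0 / c, c * 1.0, 50.0))  # still 0.02 with c times the users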
This means that rather than undertaking a major network overhaul like Torsk or PIR-Tor to support hundreds of thousands of slow, junky relays, we can scale the network by focusing on improving the situation for high capacity relay operators, so that we provide more network bandwidth for the same number of directory bytes per user.
So, let's look at ways to reduce the size of the Tor relay directory, and each way we can find to do so means a corresponding increase in the number of users we can support:
1. Proper multicore support.
Right now, any relay with more than ~100Mbit of capacity really needs to run an additional tor relay instance on that link to make use of it. If they have AES-NI, this might go up to 300Mbit.
Each of these additional instances is basically wasted directory bytes for those relay descriptors.
But with proper multicore support, such high capacity relays could run only one relay instance on links as fast as 2.5Gbit (assuming an 8-core AES-NI machine).
Result: 2-8X reduction in consensus and directory size, depending on the number of high capacity relays on multicore systems we have.
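For a rough sense of the numbers behind this (the ~300Mbit-per-process figure is the AES-NI estimate above; the rest is illustrative):

# One single-threaded tor process tops out around ~300 Mbit/s with AES-NI, so
# an 8-core box currently needs up to 8 instances (and 8 directory entries) to
# fill its link; a multicore-aware tor would need just one entry.
per_process_mbit = 300
cores = 8

print(per_process_mbit * cores / 1000.0)   # ~2.4 Gbit/s from a single box
print(cores, "->", 1)                      # directory entries before -> after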
2. Cut off relays below the median capacity, and turn them into bridges.
Relays in the top 10% of the network are 164 times faster than relays in the 50-60% range, 1400 times faster than relays in the 70-80% range, and 35000 times faster than relays in the 90-100% range.
In fact, many relays are so slow that they provide fewer bytes to the network than it costs to tell all of our users about them. There should be a sweet spot where we can set this cutoff such that the overhead from directory activity balances the loss of capacity from these relays, as a function of userbase size.
Result: ~2X reduction in consensus and directory size.
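To illustrate the sweet-spot idea, here is a hedged sketch of the break-even test; the entry size, fetch frequency, and user count are assumptions picked for illustration, not measured values:

# A relay is only worth listing if the bytes it relays per day exceed the bytes
# the network spends telling every client about it.
def worth_listing(relay_bytes_per_day, users, entry_bytes=500, fetches_per_day=4):
    directory_cost = entry_bytes * fetches_per_day * users   # bytes/day to advertise it
    return relay_bytes_per_day > directory_cost

users = 2500000
print(worth_listing(20 * 1024 * 86400, users))      # ~20 KiB/s relay: False, costs more than it gives
print(worth_listing(10 * 1024**2 * 86400, users))   # ~10 MiB/s relay: True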
3. Switching to ECC keys only.
We're wasting a lot of directory traffic on incompressible RSA1024 keys, which are 4X larger than ECC keys and less secure. Right now, we're also listing both. When we finally remove RSA1024 entirely, the directory should get quite a bit smaller.
Result: ~2-4X reduction in consensus and directory size.
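The 4X figure falls straight out of the raw key sizes (ignoring encoding overhead):

# RSA1024 public keys are 128 bytes of essentially incompressible data;
# Curve25519/Ed25519 public keys are 32 bytes.
rsa1024_bytes = 1024 // 8
ed25519_bytes = 32
print(rsa1024_bytes / ed25519_bytes)   # 4.0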
4. Consensus diffs.
With proposal 140, we can save 60% of the directory activity if we send diffs of the consensus to regularly connecting clients. Calculating the benefit is complicated, since if clients leave the network for just 16 hours, there is very little benefit to this optimization. These numbers are highly dependent on churn, though, and it may be that by removing most of the slow junk relays there is actually less churn in the network, and thus smaller diffs: https://gitweb.torproject.org/torspec.git/blob/HEAD:/proposals/140-consensus...
Let's just ballpark it at 50% for the typical case.
Result: 2X reduction in directory size.
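As a crude way to see how churn drives the benefit, here is a sketch where the consensus size and hourly churn rate are purely assumed values:

# A client that already holds an older consensus only needs the entries that
# changed since then; after enough churn the diff approaches the full document
# and the optimization stops helping.
consensus_bytes = 1500000     # assumed size of a full consensus
hourly_churn = 0.06           # assumed fraction of entries changing per hour

def diff_bytes(hours_since_last_fetch):
    return int(min(1.0, hourly_churn * hours_since_last_fetch) * consensus_bytes)

for hours in (1, 3, 8, 16):
    print(hours, diff_bytes(hours))   # at 16 hours the diff is nearly the full consensus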
5. Invest in the Tor network.
Based purely on extrapolating from the Noisebridge relays, we could add ~300 relays, and double the network capacity for $3M/yr, or about $1 per user per year (based on the user counts from: https://metrics.torproject.org/users.html).
Note that this value should be treated as a minimum estimate. We actually want to ensure diversity as we grow the network, which may make this number higher. I am working on better estimates using replies from: https://lists.torproject.org/pipermail/tor-relays/2014-September/005335.html
Automated donation/funding distribution mechanisms such as https://www.oniontip.com/ are especially interesting ways to do this (and can even automatically enforce our diversity goals) but more traditional partnerships are also possible.
Result: 100% capacity increase for each O($3M/yr), or ~$1 per new user per year.
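The per-relay and per-user costs implied by those numbers (all approximate):

# ~$3M/yr buys ~300 high-capacity relays; spread over the current userbase
# that is on the order of a dollar per user per year.
yearly_cost = 3000000
relays_added = 300
current_users = 2500000   # rough daily-user count from metrics.torproject.org

print(yearly_cost / relays_added)          # ~$10k per relay per year
print(yearly_cost / float(current_users))  # ~$1.2 per current user per year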
So if we chain #1-4 all together, using the low estimates (and conservatively discounting for overlap between them), we should be able to reduce directory size by at least 8X.
Going back to the f=D*U/B formula, this means that we should be able to add capacity to support 8X more users, while *still* maintaining our 2% directory overhead percentage. This would be 2.5M users * 8, or 20M directly connecting users.
If we were willing to tolerate 10% directory overhead, this would allow five times as many users. In other words, 100M daily connecting users.
We would still need to find some way to fund the growth of the network to support this 40X increase, but there are no actual *technical* reasons why it cannot be done.
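Putting the conservative estimates together, the arithmetic behind those figures is simply:

# Chained directory reduction from items 1-4, then the extra headroom from
# accepting 10% rather than 2% directory overhead.
current_users = 2500000
directory_reduction = 8              # conservative combined factor
overhead_headroom = 10 / 2.0         # 2% -> 10% budget

at_2_percent = current_users * directory_reduction       # ~20M users
at_10_percent = int(at_2_percent * overhead_headroom)    # ~100M users
print(at_2_percent, at_10_percent, at_10_percent // current_users)   # 20M, 100M, 40X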