Mike Perry:
Andrew Lewman:
I had a conversation with a vendor yesterday. They are interested in including Tor as their "private browsing mode" and basically shipping a re-branded Tor Browser that lets people toggle connectivity to the Tor network on and off.
They very much like Tor Browser and would like to ship it to their customer base. Their product covers 10-20% of the global market, out of roughly 2.8 billion Internet users worldwide.
The core problem is that the fraction of network capacity that you spend telling users about the current relays in the network can be written as:
f = D*U/B
D is the size of the Tor relay directory in bytes (each client fetches roughly one copy per day), U is the number of users, and B is the bandwidth per day in bytes provided by the Tor network. Of course, this is a simplification, because of multiple directory fetches per day and partially-connecting/idle clients, but for purposes of discussion it is good enough.
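To make the units concrete, here is a minimal Python sketch of that formula. The function name and every input number are placeholders of mine, not real Tor measurements, though I've picked the inputs so the result lands near the real-world figure quoted next:

```python
def directory_fraction(dir_bytes_per_client_per_day, users, network_bytes_per_day):
    """f = D*U/B: the fraction of total network capacity spent
    shipping directory information to clients."""
    return dir_bytes_per_client_per_day * users / network_bytes_per_day

# Placeholder inputs, purely for illustration:
D = 5 * 2**20               # ~5 MiB of directory data per client per day
U = 2_000_000               # clients
B = 5000 * 2**20 * 86400    # ~5000 MiB/s of capacity, in bytes per day

print(f"f = {directory_fraction(D, U, B):.2%}")   # ~2.3%
```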
To put some real numbers on this, if you compare https://metrics.torproject.org/bandwidth.html#dirbytes with https://metrics.torproject.org/bandwidth.html#bandwidth, you can see that we're currently devoting about 2.4% of our network throughput to directory activity (~120MiB/sec out of ~5000MiB/sec). So we're not exactly hurting at this point in terms of our directory bytes per user.
But because this fraction rises with both D and U, these research papers rightly point out that you can't keep adding relays *and* users and expect Tor to scale.
However, when you look at this f=D*U/B formula, what it also says is that if you can reduce the relay directory size by a factor c, and also grow the network capacity by this same factor c, then you can multiply the userbase by c, and have the same fraction of directory bytes.
Actually, there's an obvious and basic dimensional analysis failure in that paragraph. Rather embarrassing.
What I meant to say is that if you increase U and B by the same factor c while keeping the overall directory size D fixed, you get the same f ratio.
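Spelled out, the corrected claim is a one-line calculation:

    f' = D*(c*U)/(c*B) = D*U/B = f

whereas the earlier version, which also divided D by c, would give f/c rather than f.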
That's basically what I'm arguing: we can increase the capacity of the network by cutting directory waste and adding high-capacity relays to replace it, so the overall directory stays the same size while total capacity grows.
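As a toy model of that trade (every number below is invented for illustration): hold the relay count, and therefore the directory size, fixed, and see how many users a fixed directory-overhead budget supports as per-relay capacity grows:

```python
# Toy model with invented numbers: directory size stays fixed because
# the relay *count* stays fixed; capacity grows because the average
# relay is bigger.
ENTRY_BYTES = 2000            # assumed directory bytes per relay
RELAYS = 7000                 # assumed relay count, held constant
D = ENTRY_BYTES * RELAYS      # directory bytes each client fetches per day

def max_users(network_bytes_per_day, f_budget=0.02):
    """Largest U with f = D*U/B <= f_budget, i.e. U = f_budget*B/D."""
    return f_budget * network_bytes_per_day / D

B_now = 4.5e14                # ~5000 MiB/s expressed as bytes per day
print(f"{max_users(B_now):,.0f} users")        # baseline budget
print(f"{max_users(3 * B_now):,.0f} users")    # 3x capacity -> 3x users
```

Same relay count means the same D, so tripling B triples the supportable userbase at the same overhead budget.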
It's possible to formalize all of this more precisely, too. I will think about it when I start working on more detailed cost estimations using data from the tor-relays thread. I think that data is the real key to figuring out what we actually need to make this work (in terms of funding and/or new relays).