On Mon, Feb 17, 2020 at 11:53:28AM +1000, teor wrote:
On 13 Feb 2020, at 22:05, zwiebeln <zwiebeln@online.de> wrote:
depended on a network that is 21 percent controlled by a single person that you don't know?
I agree that it's not best.
But I'll turn it around, and point out that many systems (e.g. most VPNs) are centralized, that is, the number is 100 percent.
(You might turn it back around and say that VPNs are companies and you have an agreement with them so nothing will go wrong. That's a good point too, though that trust should only go so far. It's not clear to me which one is the shakier argument. :)
"Let's encourage people to run more relays." [...] ultimately, if we doubled tor's exit bandwidth, this issue would go away. That's the best solution to this problem.
Agreed.
Though, hm. In the sense that Tor's security comes down to probabilities, it's not obvious that 20% of the network is much worse than 10% of the network. Let's say there is some activity which you do periodically, and you're worried that some relay-running adversary will be in a position to observe your traffic and learn about your activity ("watch an exit stream as it leaves the Tor network", "be chosen as your guard" etc). The actual probabilities depend on the specific attack we're talking about, but I can imagine some situations where the 20% relay operator is twice as likely to be in a position to do the attack compared to the 10% relay operator.
So for recurring behavior, that might be equivalent to saying that the 10% relay operator takes twice as long to succeed at the attack compared to the 20% relay operator. That multiplier doesn't seem like a substantial (qualitative) difference. Or said another way, if I'm not comfortable with the 20% attacker, I shouldn't be much more comfortable with the 10% attacker.
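The "twice as likely, so half as long" intuition can be made concrete with a small sketch. It assumes, as a simplification not taken from this thread, that each occurrence of the recurring activity is an independent attempt in which the adversary is in position with probability equal to its network share. Real attacks depend on guard, exit, and timing details, and the function names here are made up for illustration.

```python
import math


def p_success_within(share: float, attempts: int) -> float:
    """Chance that the adversary is in position at least once in `attempts`
    independent attempts, each succeeding with probability `share`."""
    return 1.0 - (1.0 - share) ** attempts


def attempts_needed(share: float, target: float) -> int:
    """Smallest number of independent attempts after which the adversary's
    cumulative success chance reaches `target` (0 < target < 1)."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - share))


if __name__ == "__main__":
    for share in (0.10, 0.20):
        # 1 / share is the mean number of attempts until the first success
        # (geometric distribution).
        print(f"{share:.0%} operator: mean attempts to first success {1 / share:.0f}, "
              f"attempts for 50% success {attempts_needed(share, 0.5)}, "
              f"attempts for 99% success {attempts_needed(share, 0.99)}")
```

Under these assumptions the 20% operator needs roughly half as many repetitions as the 10% operator to reach any given success chance (for example 4 versus 7 for a 50% chance, 21 versus 44 for 99%), which is a constant factor rather than a qualitative change.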
This analysis reminds me of the discussion from the "Users Get Routed" paper: https://www.freehaven.net/anonbib/#ccs2013-usersrouted https://blog.torproject.org/improving-tors-anonymity-changing-guard-paramete... where the aim is to choose the best parameters to slow down probabilistic attacks for as long as possible.
Centralization definitely makes me uncomfortable, but as you say, we also have to worry about centralization of where traffic goes between relays, centralization of undersea cables, centralization inside countries, etc.
It's times like this when I wish the world knew how to do mixing with streams. That is, there is a whole field out there on how to build stronger anonymity designs, based on mix-nets, but nobody knows how to do that safely when users generate flows of messages rather than just a single message.
I'll also ask our new Network Health team to consider the risk of large operators and large ASes. Hopefully they can recommend some changes to the bandwidth authorities (or sbws maintainers).
I definitely think there is a role to be played here by improved bandwidth scanners. I'm thinking of the tor-relays thread about the quintex relays: https://lists.torproject.org/pipermail/tor-relays/2020-January/018044.html where they're slowly losing their consensus weights, despite having plenty of capacity, and nobody understands why. Making sure that people who want to contribute a lot of bandwidth can actually do it is really important. So I do continue to think that accurate consensus weights are a huge part of making useful progress here, which is why I told GeKo that in my opinion that's the #1 priority item for the network health team to get a handle on.
And then, once we have some confidence in our bandwidth weights, that's a great point to start exploring reducing centralization -- along several axes, not just relay operator concentration.
But even then, we would want to do it carefully. For example, let's say we declare that no relay family should get more than 10% of the total consensus weight for any relay role (guard, exit, etc). By adopting a policy like that, we could accidentally *increase* the total weight that actual bad relays receive, since the weight trimmed from honest, correctly declared families would flow to everyone else, including bad relays that don't declare a family at all. That would give attackers yet another incentive to declare their families incorrectly.
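A minimal sketch of how such a cap could backfire, assuming a hypothetical policy that caps each declared family's share at 10% and hands the trimmed weight to every other entry in proportion to its weight. The function name `cap_family_weights` and all the weights are made up for illustration; this is not how the bandwidth authorities or sbws actually compute consensus weights.

```python
def cap_family_weights(weights: dict[str, float], cap: float) -> dict[str, float]:
    """Normalize `weights` to shares so that no single key exceeds `cap`.

    Each key is one declared family, or one relay that declares no family.
    Weight trimmed from over-cap keys is redistributed proportionally among
    the remaining keys, repeating until nothing exceeds the cap."""
    if cap * len(weights) < 1.0:
        raise ValueError("cap too small: shares could not sum to 1")
    total = sum(weights.values())
    shares = {k: w / total for k, w in weights.items()}
    capped: set[str] = set()
    while True:
        over = [k for k, s in shares.items() if k not in capped and s > cap]
        if not over:
            return shares
        capped.update(over)
        free = 1.0 - cap * len(capped)  # share left over for uncapped keys
        uncapped_total = sum(w for k, w in weights.items() if k not in capped)
        for k in shares:
            shares[k] = cap if k in capped else free * weights[k] / uncapped_total


if __name__ == "__main__":
    # Hypothetical network: one honest operator declares a 30% family,
    # eleven independent honest relays, and an attacker who runs three
    # relays without declaring them as a family.
    weights = {"honest_family": 30.0}
    weights.update({f"honest_{i}": 5.0 for i in range(11)})
    weights.update({f"bad_{i}": 5.0 for i in range(3)})
    before = sum(w for k, w in weights.items() if k.startswith("bad_")) / sum(weights.values())
    after = sum(s for k, s in cap_family_weights(weights, 0.10).items() if k.startswith("bad_"))
    print(f"attacker share before cap: {before:.1%}, after cap: {after:.1%}")
```

In this made-up network the honest family drops from 30% to 10%, while the three undeclared bad relays together rise from 15% to about 19.3% (0.9 × 15/70), purely because they never declared themselves as a family.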
See also the tickets on whether MyFamily is a harmful idea, because it pulls traffic away from honest relay operators (who declare their families, so clients avoid putting two of their relays in one circuit) and sends it to dishonest ones (who don't): https://bugs.torproject.org/6676 https://bugs.torproject.org/15060
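The MyFamily effect shows up even in a deliberately tiny, simplified model: two-hop circuits, entry and exit both chosen in proportion to weight, and the exit never drawn from the entry's declared family. This ignores Tor's real three-hop paths, position weights, and other path restrictions; `exit_probabilities` and the relay names are hypothetical. Exact fractions keep the arithmetic checkable.

```python
from fractions import Fraction


def exit_probabilities(weights: dict[str, int], family: dict[str, str]) -> dict[str, Fraction]:
    """Exact probability that each relay ends up as the exit.

    Simplified two-hop model: pick the entry in proportion to its (positive
    integer) weight, then pick the exit in proportion to weight among relays
    that are neither the entry nor in the entry's declared family."""
    total = sum(weights.values())
    probs = {r: Fraction(0) for r in weights}
    for entry, w_entry in weights.items():
        p_entry = Fraction(w_entry, total)
        fam = family.get(entry)  # None means "declares no family"
        eligible = {r: w for r, w in weights.items()
                    if r != entry and (fam is None or family.get(r) != fam)}
        eligible_total = sum(eligible.values())
        for r, w in eligible.items():
            probs[r] += p_entry * Fraction(w, eligible_total)
    return probs


if __name__ == "__main__":
    # Two equal operators with two equal relays each; only the honest one
    # declares MyFamily.
    weights = {"honest_1": 1, "honest_2": 1, "sneaky_1": 1, "sneaky_2": 1}
    family = {"honest_1": "honest_op", "honest_2": "honest_op"}
    for relay, p in exit_probabilities(weights, family).items():
        print(relay, p)
```

Both operators provide half the capacity and are equally likely to be picked as the entry, yet the honest operator ends up with only 1/3 of the exit selections (1/6 per relay) while the undeclared operator gets 2/3, and only the undeclared operator can ever hold both hops of a circuit.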
So in summary: (a) yes, we should get more relays and more capacity, and (b) yes, it is super important for us to get better at making the consensus weights accurate, predictable, and well understood, but also (c) there are a bunch of interconnected reasons why these two steps are important to do, and I think "help relay operators contribute and feel good about it" is much more urgent than, and ultimately a more productive fix for, "omg one family is currently 17% of the network."
Thanks! --Roger