Mike Perry mikeperry@torproject.org writes:
In-line below for ease of comment. Also available at: https://gitweb.torproject.org/user/mikeperry/torspec.git/tree/proposals/xxx-...
===========================
Filename: xxx-two-guard-nodes.txt
Title: The move to two guard nodes
Author: Mike Perry
Created: 2018-03-22
Supersedes: Proposal 236
<snip>
3.1. Eliminate path restrictions entirely
If Tor decided to stop enforcing /16, node family, and also allowed the guard node to be chosen twice in the path, then under normal conditions, it should retain the use of its primary guard.
This approach is not as extreme as it seems on its face. In fact, it is hard to come up with arguments against removing these restrictions. Tor's /16 restriction is of questionable utility against monitoring, and it can be argued that since only good actors use node family, it gives influence over path selection to bad actors in ways that are worse than the benefit it provides to paths through good actors[10,11].
However, while removing path restrictions will solve the immediate problem, it will not address other instances where Tor temporarily opts to use a second guard due to congestion, OOM, or failure of its primary guard, and we're still running into bugs where this can be adversarially controlled or just happen randomly[5].
Hello Mike,
IMO we should not portray removing the above path restrictions as something extreme until we have good evidence that those path restrictions offer something positive in the cases we are examining. Personally, I see this proposal's effect of making Sybil attacks twice as fast (section 2.3) as an equally radical result.
That said, I feel that this proposal is valuable, and I'm not trying to say that I don't like it or that I don't buy the arguments. I'm trying to say that I don't know how to weigh the tradeoffs here so that I gain confidence, because I'm not sure how people are trying to attack Tor clients right now.
The way I see it is that if we adopt this proposal:
+ We are better defended against active attacks like congestion attacks and OOM/DoS attacks.
+ We improve network health by reducing congestion to certain guards.
- Sybil attacks can be performed two times more quickly.
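The "two times more quickly" cost can be sanity-checked with a toy calculation. This is only a rough sketch: it assumes each guard is picked independently and that the adversary holds a fraction f of guard selection weight. Both are simplifying assumptions of mine, not taken from section 2.3, and the function name is hypothetical.

```python
def p_compromised(f: float, n_guards: int) -> float:
    """Probability that at least one of n_guards independently chosen
    guards belongs to an adversary holding fraction f of guard weight.
    (Toy model; independence is an assumption.)"""
    return 1.0 - (1.0 - f) ** n_guards


for f in (0.01, 0.05, 0.10):
    one = p_compromised(f, 1)
    two = p_compromised(f, 2)
    # The ratio works out to (2 - f), so it approaches 2 for small f.
    print(f"f={f:.2f}  one guard: {one:.4f}  two guards: {two:.4f}  ratio: {two / one:.2f}")
```

In this model the ratio is 2 - f, so for a small adversary the per-client exposure roughly doubles, which is in line with the estimate above.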
IMO, we should not rush this decision for 034, given that it's a consensus parameter change that can happen instantaneously. However, we should do the following soon:
1) Accept that there is no single best guard topology, and fix our codebase to work well with either one guard or two guards, so that we are ready for when we flip the switch. Perhaps we can fix #25753/#25705/etc. in a way that works well both now and in the 2-guard future?
2) Investigate our current prop#271 codebase and make sure that the paragraph below will work as intended if we do this proposal.
3) Involve more people in this (Roger, NRL, etc.) and have them think about it, to gain more confidence.
Do you think this approach is too slow or backwards?
Just to speed it up, I just did (2) below:
Note that for this analysis to hold, we have to ensure that nodes that are at RESOURCELIMIT or otherwise temporarily unresponsive do not cause us to consider other primary guards beyond the two we have chosen. This is accomplished by setting guard-n-primary-guards to 2 (in addition to setting guard-n-primary-guards-to-use to 2). With this parameter set, the proposal 271 algorithm will avoid considering more than our two guards, unless *both* are down at once.
OK, the above paragraph is basically the juice of this proposal! I spent all day today investigating how this would work! The results are very positive, but also not 100% straightforward because of the various intricacies of prop#271.
[First of all, there is no way to simulate the above topology using the config file, because if you set NumEntryGuards=2 in your torrc, Tor will set up 4 primary guards because of the way get_n_primary_guards() works. So I hacked my Tor client to *have* 2 primary guards (guard-n-primary-guards), and *use* 2 primary guards (guard-n-primary-guards-to-use).]
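A toy model of why the torrc route does not give the 2/2 topology. This is a sketch that is merely consistent with the behavior described above (NumEntryGuards=2 yielding 4 primaries); it is not a copy of get_n_primary_guards(). The doubling rule, the default of 3, the treatment of 0 as "unset", and all Python names are assumptions for illustration.

```python
from typing import Optional

# Assumed default; the email only says primaries would drop "from three to two".
DFLT_N_PRIMARY_GUARDS = 3


def n_primary_guards(num_entry_guards: int = 0,
                     consensus_param: Optional[int] = None) -> int:
    """Hypothetical model of how the primary guard count is derived."""
    if num_entry_guards >= 1:
        # Assumed rule: an explicit NumEntryGuards is doubled, which
        # matches the "4 primary guards" observed for NumEntryGuards=2.
        return 2 * num_entry_guards
    if consensus_param is not None:
        # Otherwise the guard-n-primary-guards consensus parameter applies.
        return consensus_param
    return DFLT_N_PRIMARY_GUARDS


print(n_primary_guards(num_entry_guards=2))  # 4: torrc cannot express "2 primaries"
print(n_primary_guards(consensus_param=2))   # 2: what the hacked client emulates
```

Under these assumptions, only the consensus parameter (or a patched client) reaches exactly two primaries, which is why the test needed a hack.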
The good part: This topology works exactly how the proposal wants it to work. Because of the way primary guards work, you will have 2 primary guards, and if one of them goes down you will always use the other primary, instead of falling back to a third guard. That's excellent, but it is also an abuse of the primary guard feature: a beneficial one, but not how we intended the feature to be used.
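A toy model of the behavior described above. It assumes the client builds circuits through the first guard-n-primary-guards-to-use reachable primaries in priority order, which is a simplification of prop#271; the guard names and the helper are hypothetical.

```python
def guards_in_use(primaries, reachable, n_to_use):
    """Return the first n_to_use reachable primaries, in priority order.
    An empty list means no primary is usable and the client would have
    to look beyond its primary guards."""
    return [g for g in primaries if g in reachable][:n_to_use]


two = ["G1", "G2"]
print(guards_in_use(two, {"G1", "G2"}, 2))  # ['G1', 'G2']
print(guards_in_use(two, {"G2"}, 2))        # ['G2']: no third guard is touched
print(guards_in_use(two, set(), 2))         # []: both down, leave the primary set

# With three primaries, losing G1 pulls G3 into use, which is the
# behavior the two-primary setting avoids.
three = ["G1", "G2", "G3"]
print(guards_in_use(three, {"G2", "G3"}, 2))  # ['G2', 'G3']
```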
Here are the side-effects from this abuse:
- By reducing the number of primaries from three to two, it's more likely that all primaries are down at a given time. Prop#271 was written with an inherent assumption that one of the primaries will always be reachable, because when all of them are down the code goes into an "oh shit! bad reachability!" mode which was mainly designed for network-down scenarios (like no-internet-land, or tunnels).
I'm referring to the UPDATE_WAITING section of prop#271 and entry_guards_upgrade_waiting_circuits() in our codebase, which take care of this situation. This behavior will basically delay circuits on non-primary guards until a primary guard comes back online. You can test this behavior by blocking connections to all your primaries using iptables. I did this today, and while Tor worked fine after some time, there were delays and broken circuits. It's very likely we can optimize this behavior if we want, so this is not really a blocker for this proposal, but something we should think about and experiment with...
We might also want to consider writing code to block clients from skipping to lower-priority primary guards if higher-priority primary guards are still reachable and guard-n-primary-guards-to-use > 1, so that we can have more primary guards than we need without skipping to them when one of the others goes down. That would allow us to get the effect of prop#291 while maintaining the original use of primary guards.
- If we set the number of primary guards to 2 and we leave NumDirectoryGuards at 3, then NumDirectoryGuards will not work as intended, and we will actually always use our two primary guards for dirinfo as long as one of them is reachable. This is not a huge problem, and might be a feature, but it is not the way we were intending to use NumDirectoryGuards (see #13908 and https://lists.torproject.org/pipermail/tor-dev/2014-May/006820.html).
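The "do not skip to lower-priority primaries" idea from the first side-effect could look roughly like the sketch below. This is my reading of that suggestion, not existing Tor behavior; the function name, guard names, and the exact fallback rule are all hypothetical.

```python
def guards_in_use_no_skip(primaries, reachable, n_to_use):
    """Hypothetical selection rule: stay on the top n_to_use primaries
    while any of them is reachable; consider the spare primaries only
    when every preferred primary is down."""
    preferred = primaries[:n_to_use]
    up = [g for g in preferred if g in reachable]
    if up:
        return up
    # All preferred primaries are down: fall back to spare primaries.
    return [g for g in primaries[n_to_use:] if g in reachable][:n_to_use]


primaries = ["G1", "G2", "G3"]
print(guards_in_use_no_skip(primaries, {"G1", "G2", "G3"}, 2))  # ['G1', 'G2']
print(guards_in_use_no_skip(primaries, {"G2", "G3"}, 2))        # ['G2']: G3 stays unused
print(guards_in_use_no_skip(primaries, {"G3"}, 2))              # ['G3']: spare used only now
```

The point of a rule like this is that a third primary still exists as a spare, so losing both preferred guards does not immediately push the client out of its primary set.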
Other than the above side-effects, Tor worked fine all day and only connected to the primary guards, even when I blocked connections to one of them. It was actually quite nice to see!
---
Hope this was useful and let me know if you have questions!