Roger Dingledine:
On Sat, Mar 31, 2018 at 06:52:51AM +0000, Mike Perry wrote:
3.1. Eliminate path restrictions entirely
I'm increasingly a fan of this option, the more I read these threads.
Let's examine the two attacker assumptions behind two of the attacks we're worried about.
Attack one: the client's local ISP collects coarse netflow logs. These logs aren't detailed enough to allow a traffic volume detection attack on an existing long-lived TLS flow, so the connection to the first guard is safe. But a connection to the second guard will be unusual, not multiplexed, and made at exactly the time of the adversary-controlled circuit that triggered it, so that second guard, because it is used so rarely, is dangerous to use.
Attack two: if the client uses its guard as the first hop of its circuit and also the adversary-requested fourth hop, then the guard can do pairwise traffic correlation attacks on all of its circuits and realize that these two circuits it has are really two pieces of the same circuit.
This second attack seems weird to me. One reason is because in attack one we're brushing aside the traffic analysis as hard, whereas in attack two we're assuming it's trivial and perfect. But the simpler reason is: if your guard is going to participate in a traffic correlation attack against you, then it could just as easily team up with some other relay that the adversary picked. That is, avoiding reusing your guard on the other end of the circuit isn't going to save you if your guard is out to get you.
I agree. I am not concerned about attack two. But we're not choosing between just these two attacks.
To be clear, the design I've been considering here is simply allowing reuse between the guard hop and the final hop, when it can't be avoided. I don't mean to allow the guard (or its family) to show up as all four hops in the path. Is that the same as what you meant, or did you mean something more thorough?
By all path restrictions I mean for the last hop of the circuit and the first (though vanguards would be simpler if we got rid of them for other hops, too). But I do mean all restrictions, not just guard node choice. The adversary also gets to force you to use a second network path whenever they want via the /16 and node family restrictions. And it happens naturally all the time.
We're not using one guard in the current Tor. We're using two, and the second one is only used for unmultiplexed activity. That is one property I don't like about our "let's pretend to use one guard" status quo.
The second thing I don't like is that one guard is fragile, which enables confirmation attacks when it can be made to go down.
I think "can't be avoided" means HSDir, IP, RP -- which I note are all onion service related circuits.
I'd like to hear more about the "cleverly crafted exit policy" attack, and I wonder if we can't solve that differently. For example, if it's about making you do a request to a port that only one exit relay allows, and ha ha whoops your guard was on the same /16 as that exit relay... maybe it's time for the dir auths to not advertise super rare ports? This was one of the topics in the users-get-routed paper too.
Yes that is the one I was talking about.
However, another way to do this type of exit rotation attack is to cause a client to look up a DNS name where you control the resolver, and keep timing out on the DNS response. The client will then retry the stream request with a new exit. The same thing can also be done by timing out the TCP handshake to a server you control. Both of these attacks can be done with only the ability to inject an img tag into a page.
You repeat this until an exit is chosen that is in the same /16 or family as the guard, and then the client uses a second network path for an unmultiplexed request at a time you control.
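To make the condition the adversary is waiting for concrete, here is a minimal Python sketch of a /16-or-family conflict check, plus a helper that counts how many forced exit retries it takes before a conflicting exit comes up. The relay dictionaries, the family labels, and the function names are all hypothetical and are not Tor's actual data structures; the addresses are documentation-range placeholders.

```python
import ipaddress

def same_slash16(ip_a: str, ip_b: str) -> bool:
    """True if both IPv4 addresses fall inside the same /16 network."""
    net_a = ipaddress.ip_network(f"{ip_a}/16", strict=False)
    return ipaddress.ip_address(ip_b) in net_a

def conflicts(guard: dict, exit_relay: dict) -> bool:
    """True if a path restriction forbids using this guard with this exit.

    Assumption: 'family' is a set of shared labels, a simplification
    of how relay families are really expressed.
    """
    if same_slash16(guard["ip"], exit_relay["ip"]):
        return True
    return bool(guard["family"] & exit_relay["family"])

def retries_until_conflict(guard: dict, exit_sequence: list):
    """Return the 1-based attempt at which the first conflicting exit
    appears in exit_sequence, or None if none of them conflict."""
    for attempt, exit_relay in enumerate(exit_sequence, start=1):
        if conflicts(guard, exit_relay):
            return attempt
    return None

guard = {"ip": "203.0.113.5", "family": {"famA"}}
exits = [
    {"ip": "192.0.2.10", "family": set()},
    {"ip": "198.51.100.7", "family": set()},
    {"ip": "203.0.200.9", "family": set()},   # same /16 as the guard
]
print(retries_until_conflict(guard, exits))
```

In this toy sequence the third exit shares the guard's /16, so that is the attempt at which the client would have to reach for a second guard.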
One non-starter idea would be to move onion-service-related Tors to two guards, and leave other Tors at one guard. It's a non-starter because of course advertising which you are to your local network is no good. But that idea gave me a different perspective on this discussion: I wonder how much this design decision comes down to making all Tors use two guards in order to protect the onion-service-related Tors, which are the only ones who actually need it?
Our path restrictions also cause normal exiting clients to use a second guard for unmultiplexed activity, at adversary-controlled times, or just periodically at random.
However, while removing path restrictions will solve the immediate problem, it will not address other instances where Tor temporarily opts to use a second guard due to congestion, OOM, or failure of its primary guard, and we're still running into bugs where this can be adversarially controlled or just happen randomly [5].
I continue to think we need to fix these. I'm glad to see that George has been putting some energy into looking more at them. The bugs that we don't understand are especially worrying, since it's hard to know how bad they are. Moving to two guards might put a bit of a bandaid on the issues, but it can't be our long-term plan for fixing them.
We're choosing fixes for these bugs that enable an adversary to deny service to clients at a particular guard, *without* letting those clients move to a second guard. This enables confirmation attacks, and these confirmation attacks can be extended to guard discovery attacks by DoSing guards one at a time until an onion service fails.
Bringing back CREATE_FAST could help with this piece, I suppose, but it doesn't solve OOM attacks...
Note that for this analysis to hold, we have to ensure that nodes that are at RESOURCELIMIT or otherwise temporarily unresponsive do not cause us to consider other primary guards beyond the two we have chosen. This is accomplished by setting guard-n-primary-guards to 2 (in addition to setting guard-n-primary-guards-to-use to 2). With this parameter set, the proposal 271 algorithm will avoid considering more than our two guards, unless *both* are down at once.
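As a rough illustration of the behavior described above, here is a minimal Python sketch. It is a heavy simplification and not the proposal 271 algorithm itself: the function name, the ordered `sampled` list, and the `reachable` set are hypothetical, and `n_primary` only stands in for the effect of setting guard-n-primary-guards to 2.

```python
def pick_guard(sampled, reachable, n_primary=2):
    """Pick a guard from an ordered sample of candidates.

    Assumption: the first n_primary entries of `sampled` are the
    primary guards. Later entries are considered only when every
    primary guard is unreachable.
    """
    primary = sampled[:n_primary]
    for guard in primary:
        if guard in reachable:
            return guard
    # Both primaries are down at once: only now look further.
    for guard in sampled[n_primary:]:
        if guard in reachable:
            return guard
    return None

sampled = ["A", "B", "C"]
print(pick_guard(sampled, {"B", "C"}))  # A is down, B is still primary
print(pick_guard(sampled, {"C"}))       # both primaries are down
```

The point of the sketch is only that a single unresponsive primary guard never causes a third guard to be tried.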
I like this general idea of not immediately replacing guards so long as you have a working one. In fact, we used to do something similar back in the day: https://blog.torproject.org/improving-tors-anonymity-changing-guard-paramete... says (emphasis mine) """ Tor 0.2.3's entry guard behavior is "choose three guards, ***adding another one if two of those three go down*** but going back to the original ones if they come back up, and also throw out (aka rotate) a guard 4-8 weeks after you chose it." """
There are still some fiddly decisions to make here. For example, as you say we probably shouldn't replace a guard just because we failed to connect to one of our guards once. We might decide that it's time to add a new second guard if the consensus tells us that one of them is down (so we have confirmation that it isn't down for just us, it's down for everybody). Or we might decide to wait on adding a new one even if it really is down, because maybe it'll come back soon. But how long do we wait? And if, while we're down to one, we encounter one of these situations where the requested fourth hop overlaps with our remaining guard, what do we do?
If I were to drop everything to build the Tor I think should exist, I would do the following:
1. Use two guards, replacing them only when both are unreachable, or when one leaves the consensus.
2. Make path restrictions not as strict (for cases like the one above).
3. Use conflux (which also needs less strict/no path restrictions).
4. Build it on QUIC.
I would do them in that order because I think we get the most benefit from #1, and we get some benefit from #2 still (as you point out above).
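One possible reading of the replacement rule in #1, as a minimal Python sketch. The function name and the set-based inputs are hypothetical, and the sketch leaves open the questions raised earlier in the thread about how long to wait before acting.

```python
def needs_new_guard(guards, reachable, consensus):
    """Decide whether the two-guard set should change at all.

    Assumption: `reachable` and `consensus` are sets of relay
    identifiers supplied by the caller.
    """
    # A guard that has left the consensus is replaced.
    if any(g not in consensus for g in guards):
        return True
    # Otherwise, act only when *no* guard is reachable.
    return not any(g in reachable for g in guards)

guards = ["A", "B"]
print(needs_new_guard(guards, {"A"}, {"A", "B", "C"}))   # one guard still works
print(needs_new_guard(guards, set(), {"A", "B", "C"}))   # both are unreachable
```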
You keep focusing on the performance aspects of conflux, but that is not the argument I am making. My arguments for conflux in Section 4 are about resilience to congestion, downtime, circuit killing, and DoS, as well as traffic analysis resistance. I see the performance benefits as secondary.
(I also think the best arguments for QUIC are in the reliability direction, because fixed queues mean no adversary-provoked OOMing.)
In fact, here's a hopefully useful insight that I've just realized: you're not concerned about one guard vs two guards, you're concerned about *transitioning* between guards. It's that moment when you're starting to use a new guard, if the attacker can observe that you're doing it, and especially if the attacker can make you do it, that is vulnerable. And starting with two guards can help, in that it postpones the time until you're forced to transition, and maybe also because if we do it right it can make the transition less visible.
The transition aspect is a big piece of it, but I think we're also running into a fragility problem, which makes the transition signal very loud in many cases.
But I wonder if we're looking at this backwards, and the primary question we should be asking is "How can we protect the transition between guards?" Then one of the potential answers to consider is "Maybe we should start out with two guards rather than just one." Framing it that way, are there more options that we should consider too? For example, removing the ability of the non-local attacker to trigger a transition? Then there would still be visibility of a transition, but the (non-local) attacker can't impact the timing of the transition. How much does that solve? Need to think more.
One guard is inherently more fragile than two, and no matter what we do, it means that there will be a risk of attacks that can confirm guard choice, because the downtime during this transition can never be hidden without at least some redundancy.
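A back-of-the-envelope way to see the redundancy argument, as a Python sketch. It assumes each guard is unavailable independently with the same probability, which an adversary who can DoS both guards at once would violate, and the value 0.05 is made up purely for illustration.

```python
def p_all_guards_down(p_down: float, n_guards: int) -> float:
    """Probability that all n guards are down at the same moment,
    under the (strong) assumption that failures are independent."""
    return p_down ** n_guards

p = 0.05  # made-up per-guard unavailability, not a measurement
one = p_all_guards_down(p, 1)
two = p_all_guards_down(p, 2)
print(f"one guard: {one:.4f}, two guards: {two:.4f}")
```

Under these assumptions, a client with two guards is left with no working guard far less often than a client with one, which is the sense in which redundancy can hide downtime.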
In summary:
(1) I think we should fix the bug from #14917 where the attacker can push us off our guard just by naming our guard as the HSDir/IP/RP, and I think we should fix it by being willing to reuse our guard when it can't be avoided. That step will resolve some, but not all, of the pressure about moving to two guards. Then
Without removing all path restrictions that apply to first and last hop, we're still actually using two guards, and using them at times that the adversary gets to control if they want, or just randomly otherwise.
(2) Hopefully the above discussion has helped us move forward on the remaining reasons for switching to two guards. To me the two biggest questions left to resolve are (a) how best to protect the vulnerable transition to a new guard, and if two guards is the best idea we've got for that, and (b) how big an issue is it really that having only one guard can sometimes give you a low-performance guard, and if two guards is the best idea we've got for that one too.
Transitions will always be noisy with one guard, because it is fragile to DoS, congestion, OOM, circuit failure, onionskin overload, etc etc etc. How can you provide resiliency under arbitrary and partial failure without any redundancy?