Re: [tor-dev] Proposal: The move to two guard nodes

11 Apr 2018


      Roger Dingledine:
...
On Sat, Mar 31, 2018 at 06:52:51AM +0000, Mike Perry wrote:
...
3.1. Eliminate path restrictions entirely
I'm increasingly a fan of this option, the more I read these threads.
Let's examine the two attacker assumptions behind two of the attacks
we're worried about.
Attack one: the client's local ISP collects coarse netflow logs, and these
logs aren't detailed enough to allow a traffic volume detection attack on
an existing long-lived TLS flow, so the connection to that first guard
is safe; but a connection to that second guard will be unusual and not
multiplexed and at exactly the time of the adversary-controlled circuit
that triggered it, so that second guard, because it is used so rarely,
is dangerous to use.
Attack two: if the client uses its guard as the first hop of its circuit
and also the adversary-requested fourth hop, then the guard can do
pairwise traffic correlation attacks on all of its circuits and realize
that these two circuits it has are really two pieces of the same circuit.
This second attack seems weird to me. One reason is because in attack
one we're brushing aside the traffic analysis as hard, whereas in attack
two we're assuming it's trivial and perfect. But the simpler reason is:
if your guard is going to participate in a traffic correlation attack
against you, then it could just as easily team up with some other relay
that the adversary picked. That is, avoiding reusing your guard on the
other end of the circuit isn't going to save you if your guard is out
to get you.
I agree. I am not concerned about attack two. But we're not choosing
between just these two attacks.
...
To be clear, the design I've been considering here is simply allowing
reuse between the guard hop and the final hop, when it can't be avoided. I
don't mean to allow the guard (or its family) to show up as all four
hops in the path. Is that the same as what you meant, or did you mean
something more thorough?
By all path restrictions I mean for the last hop of the circuit and the
first (though vanguards would be simpler if we got rid of them for other
hops, too). But I do mean all restrictions, not just guard node choice.
The adversary also gets to force you to use a second network path
whenever they want via the /16 and node family restrictions. And it
happens naturally all the time.
We're not using one guard in the current Tor. We're using two, and the
second one is only used for unmultiplexed activity. That is one property
I don't like about our "let's pretend to use one guard" status quo.
The second thing I don't like is that one guard is fragile, which
enables confirmation attacks when it can be made to go down.
...
I think "can't be avoided" means HSDir, IP, RP -- which I note are all
onion service related circuits.
I'd like to hear more about the "cleverly crafted exit policy" attack, and
I wonder if we can't solve that differently. For example, if it's about
making you do a request to a port that only one exit relay allows, and
ha ha whoops your guard was on the same /16 as that exit relay... maybe
it's time for the dir auths to not advertise super rare ports? This was
one of the topics in the users-get-routed paper too.
Yes that is the one I was talking about.
However, another way to do this type of exit rotation attack is to cause
a client to look up a DNS name where you control the resolver, and keep
timing out on the DNS response. The client will then retry the stream
request with a new exit. The same thing can also be done by timing out
the TCP handshake to a server you control. Both of these attacks can be
done with only the ability to inject an img tag into a page.
You repeat this until an exit is chosen that is in the same /16 or
family as the guard, and then the client uses a second network path for
an unmultiplexed request at a time you control.
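
To make the restriction this attack leans on concrete, here is a minimal Python sketch of a /16-and-family conflict check between a guard and a candidate exit. This is not Tor's actual code: the function names, the dict layout, and the choice to count a family only when both relays list each other are assumptions for illustration, and IPv6 is ignored.

```python
import ipaddress

def same_slash16(addr_a, addr_b):
    """Return True if two IPv4 addresses fall in the same /16 network."""
    net_a = ipaddress.ip_network(f"{addr_a}/16", strict=False)
    return ipaddress.ip_address(addr_b) in net_a

def conflicts_with_guard(guard, exit_relay):
    """Hypothetical path-restriction check (illustrative only).

    Each relay is a dict with keys "fp" (fingerprint), "addr" (IPv4
    string) and "family" (set of fingerprints it declares).
    """
    if guard["fp"] == exit_relay["fp"]:
        return True
    if same_slash16(guard["addr"], exit_relay["addr"]):
        return True
    # Assumption: a family only counts when both sides declare it.
    return (exit_relay["fp"] in guard["family"]
            and guard["fp"] in exit_relay["family"])
```

Each forced retry gives the adversary another draw of exit_relay. Once conflicts_with_guard() returns True for the client's guard, the client has to build that circuit through a different first hop.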
...
One non-starter idea would be to move onion-service-related Tors to two
guards, and leave other Tors at one guard. It's a non-starter because of
course advertising which you are to your local network is no good. But
that idea gave me a different perspective on this discussion: I wonder
how much this design decision comes down to making all Tors use two
guards in order to protect the onion-service-related Tors, which are
the only ones who actually need it?
Our path restrictions also cause normal exiting clients to use a second
guard for unmultiplexed activity, at adversary-controlled times, or just
periodically at random.
...
...
However, while removing path restrictions will solve the immediate
  problem, it will not address other instances where Tor temporarily opts
  to use a second guard due to congestion, OOM, or failure of its primary
  guard, and we're still running into bugs where this can be adversarially
  controlled or just happen randomly[5].
I continue to think we need to fix these. I'm glad to see that George
has been putting some energy into looking more at them. The bugs that
we don't understand are especially worrying, since it's hard to know
how bad they are. Moving to two guards might put a bit of a bandaid on
the issues, but it can't be our long-term plan for fixing them.
We're choosing fixes for these bugs that enable an adversary to deny
service to clients at a particular guard, *without* letting those
clients move to a second guard. This enables confirmation attacks, and
these confirmation attacks can be extended to guard discovery attacks by
DoSing guards one at a time until an onion service fails.
Bringing back CREATE_FAST could help with this piece, I suppose, but it
doesn't solve OOM attacks...
...
...
Note that for this analysis to hold, we have to ensure that nodes that
  are at RESOURCELIMIT or otherwise temporarily unresponsive do not cause
  us to consider other primary guards beyond the two we have chosen.
  This is accomplished by setting guard-n-primary-guards to 2 (in addition
  to setting guard-n-primary-guards-to-use to 2). With this parameter
  set, the proposal 271 algorithm will avoid considering more than our two
  guards, unless *both* are down at once.
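
The selection behaviour described in the quoted paragraph can be sketched roughly as follows. This is a simplified model under stated assumptions, not the proposal 271 algorithm itself: the three-state GuardState enum, the choose_guard() name, and the "return None and retry" convention are all hypothetical, and only the default of 2 mirrors the guard-n-primary-guards setting mentioned above.

```python
from enum import Enum

class GuardState(Enum):
    UP = "up"
    BUSY = "busy"   # e.g. RESOURCELIMIT or temporarily unresponsive
    DOWN = "down"   # treated as actually unreachable

def choose_guard(sampled, states, n_primary=2):
    """Pick a guard id, or None meaning "wait and retry the primaries".

    sampled: guard ids in preference order (hypothetical representation).
    states:  dict mapping guard id -> GuardState.
    """
    primary = sampled[:n_primary]
    for guard in primary:
        if states[guard] is GuardState.UP:
            return guard
    # Look past the primaries only when *all* of them are down at once.
    if all(states[g] is GuardState.DOWN for g in primary):
        for guard in sampled[n_primary:]:
            if states[guard] is GuardState.UP:
                return guard
    return None
```

In this model a busy primary never causes a third guard to be considered: with one primary busy and the other down, choose_guard() returns None rather than exposing a new network path.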
I like this general idea of not immediately replacing guards so long as
you have a working one. In fact, we used to do something similar back
in the day:
https://blog.torproject.org/improving-tors-anonymity-changing-guard-paramete...
says (emphasis mine)
"""
Tor 0.2.3's entry guard behavior is "choose three guards, ***adding
another one if two of those three go down*** but going back to the
original ones if they come back up, and also throw out (aka rotate)
a guard 4-8 weeks after you chose it."
"""
There are still some fiddly decisions to make here. For example, as you
say we probably shouldn't replace a guard just because we failed to
connect to one of our guards once. We might decide that it's time to add
a new second guard if the consensus tells us that one of them is down
(so we have confirmation that it isn't down for just us, it's down for
everybody). Or we might decide to wait on adding a new one even if it
really is down, because maybe it'll come back soon. But how long do
we wait? And if, while we're down to one, we encounter one of these
situations where the requested fourth hop overlaps with our remaining
guard, what do we do?
If I were to drop everything to build the Tor I think should exist, I
would do the following:
1. Use two guards, replacing them only when both are unreachable, or
   when one leaves the consensus.
2. Make path restrictions not as strict (for cases like the one above).
3. Use conflux (which also needs less strict/no path restrictions)
4. Build it on QUIC.
I would do them in that order because I think we get the most benefit
from #1, and we get some benefit from #2 still (as you point out above).
You keep focusing on the performance aspects of conflux, but that is not
the argument I am making. My arguments for conflux in Section 4 are
about resilience to congestion, downtime, circuit killing, and DoS, as
well as traffic analysis resistance. I see the performance benefits as
secondary.
(I also think the best arguments for QUIC are in the reliability
direction, because fixed queues mean no adversary-provoked OOMing.)
...
In fact, here's a hopefully useful insight that I've just realized:
you're not concerned about one guard vs two guards, you're concerned
about *transitioning* between guards. It's that moment when you're
starting to use a new guard, if the attacker can observe that you're
doing it, and especially if the attacker can make you do it, that is
vulnerable. And starting with two guards can help, in that it postpones
the time until you're forced to transition, and maybe also because if
we do it right it can make the transition less visible.
The transition aspect is a big piece of it, but I think we're also
running into a fragility problem, which makes the transition signal very
loud in many cases.
...
But I wonder if we're looking at this backwards, and the primary
question we should be asking is "How can we protect the transition between
guards?" Then one of the potential answers to consider is "Maybe we should
start out with two guards rather than just one." Framing it that way,
are there more options that we should consider too? For example, removing
the ability of the non-local attacker to trigger a transition? Then
there would still be visibility of a transition, but the (non-local)
attacker can't impact the timing of the transition. How much does that
solve? Need to think more.
One guard is inherently more fragile than two, and no matter what we do,
it means that there will be a risk of attacks that can confirm guard
choice, because the downtime during this transition can never be hidden
without at least some redundancy.
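
As a back-of-the-envelope illustration of the redundancy argument, the sketch below computes how often a client would have no working primary guard, assuming each guard is unavailable independently with probability q. The independence assumption and the function name are mine, not from the thread, and a targeted DoS against both guards would violate it, so treat the numbers as a rough intuition only.

```python
def prob_no_working_guard(q, n_guards):
    """Probability that all n_guards are unavailable at the same time.

    Assumes each guard fails independently with probability q, which is
    an illustrative simplification.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be a probability")
    if n_guards < 1:
        raise ValueError("need at least one guard")
    return q ** n_guards

one_guard = prob_no_working_guard(0.05, 1)
two_guards = prob_no_working_guard(0.05, 2)
```

Under this toy model, with q = 0.05 a single-guard client is pushed onto a new guard in 5% of cases, while a two-guard client loses both in 0.25% of cases.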
...
In summary:
(1) I think we should fix the bug from #14917 where the attacker can
push us off our guard just by naming our guard as the HSDir/IP/RP,
and I think we should fix it by being willing to reuse our guard when
it can't be avoided. That step will resolve some, but not all, of the
pressure about moving to two guards. Then
Without removing all path restrictions that apply to first and last hop,
we're still actually using two guards, and using them at times that the
adversary gets to control if they want, or just randomly otherwise.
...
(2) Hopefully the above discussion has helped us move forward on the
remaining reasons for switching to two guards. To me the two biggest
questions left to resolve are (a) how best to protect the vulnerable
transition to a new guard, and if two guards is the best idea we've got
for that, and (b) how big an issue is it really that having only one
guard can sometimes give you a low-performance guard, and if two guards
is the best idea we've got for that one too.
Transitions will always be noisy with one guard, because it is fragile
to DoS, congestion, OOM, circuit failure, onionskin overload, etc etc
etc. How can you provide resiliency under arbitrary and partial failure
without any redundancy?
-- 
Mike Perry
