Mike Perry mikeperry@torproject.org writes:
George Kadianakis:
Mike Perry <mikeperry at torproject.org> writes:
<snip>
I have mixed feelings about this.
- If client guard discovery is the main reason we are doing this, I think we should first look into these guard discovery vectors individually, figure out how concerning they are, and see if there is anything else we can do to block them,
I agree this is worthwhile, if only to better understand the design space. However, I think we're going to find that most applications we envision can be induced into violating many of the ad-hoc mitigations we try to bake in.
OK. Let's see. I feel that these guard discovery attacks can be blocked with:
a) If an IP listed on an HS descriptor tells you that it doesn't know the HS, then ignore it for this hidden service today.
b) If an HSDir that should have an HS descriptor tells you that it doesn't have it, then don't ask it again this hour.
I think we do both checks right now in the Tor codebase, and we also have caches so that we don't retry the same nodes. If we are serious, we could even write those caches to disk.
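Rules (a) and (b) amount to a per-service negative cache with an expiry. Below is a minimal Python sketch, assuming a simple in-memory dictionary; `FailureCache` and its methods are hypothetical names, and this is not how tor's C code is actually structured. The 24-hour TTL for rule (a) is our approximation of "today".

```python
import time


class FailureCache:
    """Remembers (relay, onion service) pairs that recently failed, for a fixed TTL."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the expiry logic can be tested
        self._expiry = {}  # (relay_id, onion_address) -> time the entry expires

    def note_failure(self, relay_id, onion_address):
        """Record that relay_id did not know / did not have onion_address."""
        self._expiry[(relay_id, onion_address)] = self.clock() + self.ttl

    def should_skip(self, relay_id, onion_address):
        """True if this relay failed for this service and the entry has not expired."""
        key = (relay_id, onion_address)
        expiry = self._expiry.get(key)
        if expiry is None:
            return False
        if self.clock() >= expiry:
            del self._expiry[key]  # stale entry: forget it and allow a retry
            return False
        return True


# Rule (a): an intro point that does not know the service is ignored for a day.
intro_point_failures = FailureCache(ttl_seconds=24 * 3600)
# Rule (b): an HSDir that lacks the descriptor is not asked again for an hour.
hsdir_failures = FailureCache(ttl_seconds=3600)
```

Writing such a cache to disk would need wall-clock timestamps instead of `time.monotonic()`, whose reference point is not meaningful across restarts.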
I feel that if an application restarts Tor or flushes those caches because a hidden service does not work, then the application is doing it wrong.
Also, even with client vanguards, I think the checks above will still have to be implemented. I could imagine an application that flushes the entire DataDirectory if the hidden service stops working, and then even vanguards won't save it.
In general, I'm not sure how much sanity we can assume from third-party applications.
...before complicating path selection even more.
I feel like you're actually going to end up complicating the implementation more with this position. If we have to have separate path selection modes for service side and client side, we then have to maintain three different path selection mechanisms in Tor: normal exit, onion services, and onion clients.
If we gave the same options to both hidden services and clients, we would at least be down to two systems (exit vs. non-exit), with some minor options for each.
Hmm, maybe. But onion client circuits would look very much like normal exit circuits, except that they would connect to RPs/IPs instead of exits. That's how the code works now.
Also, with vanguards, if we end up doing something like:
HSDir: C - L - S - E - HSDir
IP: C - L - S - E - IP
Rend: C - L - M - RP -- S - M - L - HS
we have three different path types here. We would need to write very clean interfaces if we want all of them to be handled by the same code.
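One way to keep such path types in a single code path is to describe each one as data and have one builder walk it. A hypothetical Python sketch: the pool labels follow the notation above (L, M, S for long-, medium- and short-lived vanguards; we assume E means an ordinary relay drawn from the whole network), while `PathTemplate`, `build_path` and the pool contents are illustrative only. The endpoint (HSDir, IP or RP) is assumed to be chosen separately.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class PathTemplate:
    purpose: str
    hops: tuple[str, ...]  # pool label per hop, starting next to the circuit origin


# The path types from the example, expressed as data rather than code branches.
TEMPLATES = {
    "hsdir": PathTemplate("hsdir", ("L", "S", "E")),
    "intro": PathTemplate("intro", ("L", "S", "E")),
    "rend_client": PathTemplate("rend_client", ("L", "M")),
    "rend_service": PathTemplate("rend_service", ("L", "M", "S")),
}


def build_path(template, pools, rng=random):
    """Pick one relay per hop from the labelled pool, never reusing a relay."""
    path = []
    for label in template.hops:
        candidates = [relay for relay in pools[label] if relay not in path]
        if not candidates:
            raise ValueError(f"no unused relay left in pool {label!r}")
        path.append(rng.choice(candidates))
    return path
```

With this shape, adding or changing a path type is a table edit rather than a new branch in path selection, though real path selection has many more constraints (families, subnets, flags) than this sketch checks.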
- Also, I like symmetry myself, but I wouldn't change path selection and security just for that _if I can help it_.
<snip>
Hsdir post/fetch:
1. C - L - M - S - E - H
2. C - L - S - E - H
3. C - L - S - H
Intro:
1. C - L - M - S - E -- I - S - M - L - H
2. C - L - S - E -- I - S - L - H
*3. C - L - S -- I&S - L - H (* IP Intersection attack!)
Rend:
1. C - L - M - S - R -- E - S - M - L - H
2. C - L - S - R -- E - S - L - H
3. C - L - R&S -- S - L - H
What is R&S here? Do clients use static, short-lifespan rendezvous points?
Yes. Similarly for I&S (which we should not do - it's bad in every variation of Vanguards).
I don't see any such problems with R&S, though. Since R is not associated with any publicly viewable information, I don't think it is as big of a problem. At most, it's a linkability risk for the client. But maybe I missed something.
Hmm, the only problem I can see here is that R&S can link clients based on their L node. For example, in the crazy edge case where only one client connects to hidden services through R&S over L, R&S could count: "Ah, this client has done 42 rendezvous through me in the past 5 hours." And if that's a Ricochet client with 42 contacts, maybe it's a selector. But I think this is a pretty far-fetched example...
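The counting in this edge case is trivial for the R&S relay, since the client's L guard is its previous hop on every rendezvous circuit. A hypothetical Python illustration; the event format is made up, and a per-guard count only maps to a single client if that client is the only one reaching this R&S through that L.

```python
from collections import Counter


def rendezvous_per_guard(events, start_hours, end_hours):
    """Count rendezvous circuits per previous hop, as seen by a static R&S relay.

    events: iterable of (time_in_hours, previous_hop) pairs, where previous_hop
    is the client's L guard as observed by the rendezvous point.
    """
    return Counter(prev for t, prev in events if start_hours <= t < end_hours)
```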
Another _big_ gotcha here: let's say we end up doing:
HSDir: C - L - M - S - E - HSDir
IP: C - L - M - S - E - IP
Rend: C - L - S - RP -- S - M - L - HS
and all the 'S' nodes are taken from the same pool, then the 'L' node will be able to learn 'M' by looking at the IP circuits, and learn 'S' by looking at the rend circuit. So it will basically be able to derive the full circuit.
We need to be very careful about which paths we pick, and which "guardsets" we get the nodes from.
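The leak can be checked mechanically: the L guard always sees its immediate next hop, so any vanguard layer that sits deeper in one circuit type but directly after L in another is exposed, provided both come from the same pool. A hypothetical Python sketch over the client-side halves of the paths above; it assumes only M and S are vanguard layers and treats each label as one shared pool, as in the scenario described.

```python
# Assumption: E, HSDir, IP and RP are not vanguard layers; only M and S are.
VANGUARD_LAYERS = {"M", "S"}


def layers_seen_by_L(paths):
    """Labels the L guard observes as its immediate next hop, across all path types."""
    seen = set()
    for path in paths.values():
        i = path.index("L")
        if i + 1 < len(path):
            seen.add(path[i + 1])
    return seen


def leaked_layers(paths):
    """Per path type, the deeper vanguard layers that L sees as a next hop elsewhere.

    Assumes each label names one shared pool, so the S relays L sees directly
    in one circuit type are the ones used deeper in the other types.
    """
    seen = layers_seen_by_L(paths)
    leaks = {}
    for name, path in paths.items():
        i = path.index("L")
        deeper = {hop for hop in path[i + 2:] if hop in VANGUARD_LAYERS}
        if deeper & seen:
            leaks[name] = deeper & seen
    return leaks


# Client-side halves of the HSDir/IP/Rend paths above.
paths = {
    "hsdir": ["C", "L", "M", "S", "E", "HSDir"],
    "intro": ["C", "L", "M", "S", "E", "IP"],
    "rend": ["C", "L", "S", "RP"],
}
print(leaked_layers(paths))  # {'hsdir': {'S'}, 'intro': {'S'}}
```

Since L already sees M directly in the HSDir and IP circuits, learning S from the rend circuit gives it every vanguard hop of those circuits.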
Looking at these, we can see that the second option sacrifices the middle guards, which costs us one layer of protection against compromise attacks (an adversary still needs to compromise the long-lived guard, though). The third option also loses unlinkability, and this actually bites us in Intro 3: the hidden service's L guard can perform a long-term intersection attack, watching for published intro points and matching them to the circuits that H makes to them. So that path length probably should not be used.
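A simplified model of the Intro 3 intersection attack, as a hypothetical Python sketch: in each descriptor period, the service's L guard keeps only those of its clients whose direct next hops cover every published intro point, and intersects that set across periods. The input format is an assumption, and real observations would be noisier than this.

```python
def intersection_candidates(epochs):
    """Narrow down which client behind a guard is the onion service.

    epochs: iterable of (published_intro_points, observed) pairs, one per
    descriptor period. published_intro_points is the set of relays listed in
    the service's descriptor; observed maps each client of this guard to the
    set of relays it extended to directly through the guard in that period.
    """
    candidates = None
    for published, observed in epochs:
        # Under Intro 3 the service side is H - L - I&S, so the guard sees
        # every published intro point as a next hop of the real service.
        matching = {client for client, hops in observed.items() if published <= hops}
        candidates = matching if candidates is None else candidates & matching
    return candidates if candidates is not None else set()
```

Each period removes clients that merely happened to touch some of the intro points, which is why a long-lived guard makes this attack effective over time.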
<snip>
However, I still have mixed feelings about changing client path selection as part of proposal 247:
- My main issue is that I think figuring out the right client path selection will require a _heavy_ amount of security analysis, which will delay prop247 even more. I was hoping that we could treat the client side as an orthogonal problem and tackle it separately in the future. But maybe I'm totally wrong, and I should be more patient and handle these two problems together.
I think patience is best, because if we don't understand this problem really well, we're liable to miss something, or to cut ourselves off from a potential future of interactive HS voice+video. Neither one is a great failure mode.
Agreed.
I think for many applications (esp the browser and ricochet), we're going to find that we need to protect the client just as much as the server.
- If the above changes only happen to HS circuits, we make it harder to make HS circuits indistinguishable from normal circuits in the face of traffic analysis. But maybe we have already lost this game.
We have already lost that game until we have multihop padding. Proposal 247 already outlines how to use it (section 4.1) to help conceal vanguard usage.
It is also worth pointing out that if we fail to conceal the HS vanguard fingerprint entirely with padding, it will be especially valuable to have more than just 30k service-side instances with the vanguard fingerprint. Far better to have all the clients in that anonymity set, too, I think.
Yes, that's true. To me, this seems to be the main argument for doing client vanguards right now.
However, to actually achieve any sort of confusion here, we need to ensure that the paths between clients and HSes are symmetric. For example, if we end up doing:
C - L - S - E -- IP - S - M - L - H
then the L guard could distinguish clients from HSes by looking at whether the second hop is short lived ('S') or medium lived ('M').
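A sketch of how the L guard could tell the two apart, in hypothetical Python: for one of its users, it tracks how long each next-hop relay keeps appearing and compares the longest span with a threshold. The S and M lifetimes are not given in this thread, so `threshold_hours` is a placeholder parameter.

```python
def max_hop_lifetime(observations):
    """Longest span (hours) over which the guard saw the same relay as next hop.

    observations: iterable of (time_in_hours, next_hop) pairs for one user
    of the guard, in any order.
    """
    first, last = {}, {}
    for t, hop in observations:
        first[hop] = min(t, first.get(hop, t))
        last[hop] = max(t, last.get(hop, t))
    return max((last[h] - first[h] for h in first), default=0.0)


def looks_like_service(observations, threshold_hours):
    """With the asymmetric paths above, a long-lived second hop suggests 'M', i.e. an HS.

    threshold_hours is a hypothetical cut-off between the S and M lifetimes.
    """
    return max_hop_lifetime(observations) > threshold_hours
```

If clients and services put the same layer right after L, this signal carries no information, which is the point of keeping the paths symmetric.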
Woohoo! Anonymity!