Heyo.
We're going to have a meeting to discuss Proposal 291. See this thread: https://lists.torproject.org/pipermail/tor-dev/2018-April/013053.html
The meeting will be at 17:00 UTC, on Wednesday, April 18th, in #tor-meeting on irc.oftc.net. (That's 10:00 left coast, 12:00 middle coast, 13:00 right coast, and 19:00 in several socialist paradises that strangely do not have public water fountains.) https://www.timeanddate.com/worldclock/fixedtime.html?iso=20180415T1700
Things we need to decide: 1. Do we abandon Tor's path restrictions? 2. Do we use two guards?
At the end of this meeting, we should commit to one or both of these things long-term. (Surprise twist: we're already doing #2!)
Each of these choices is a nuanced thing. And just picking one or the other doesn't solve everything. I think it's best to think of them as a commitment to a plan over some timescale, based on the information we have available today.
People who mos def should attend: George Kadianakis, Roger, Nick, Me
People who probably maybe should attend: Aaron Johnson, Isis (and others concerned about guard fingerprinting), You?
Mike Perry:
Ok, we had this meeting. High level (amended) action items are:
1. Use patches in https://trac.torproject.org/projects/tor/ticket/25843 to set NumEntryGuards=2 in torrc, and observe results. Please join us! Stuff we are looking for during testing is on that ticket!
2. Merge that patch to make the torrc guard options do what we meant for them to do. Probably backport it.
3. Describe adversary models for our variant proposals from the notes. (Why do we disagree? In Mike's case, my disagreements are because I think each step is an improvement over previous/status quo -- we can decide harder things later and still do better both now and later.)
4. Agree on an order of operations for fixes+changes, ideally such that we don't block forever trying to come up with a perfect solution. Things are pretty bad now. All we really need to do is agree on steps to make it better.
The full meeting logs are here: http://meetbot.debian.net/tor-meeting/2018/tor-meeting.2018-04-18-17.01.log....
Our notes from the pad (https://pad.riseup.net/p/TwoGuardMeeting) are also below, for archival. Please comment further here on list or in the testing ticket, not on the pad. It will disappear eventually (and/or get edited by randos). Please pay particular attention to the proposal variants we have below, and weigh in if you like (especially with adversary differentiation).
===============================
Things to decide:
1. Remove some or all of Tor's path restrictions?
  1a. Remove some, for some hops? (Allow just same node, or same /16 + family too? And for which hops?)
  1b. Remove all?
  1c. Allow "same node, same /16, same family" between guard and last hop. If it's a 3-hop circ (A - B - A), extend it to a 4-hop circ (A - B - C - A).
2. Use two guards?
  2a. Set prop#271 values?
  2b. Modify prop#271 behavior?
  2c. Two directory guards?
3. Alternatives?
  3a. Allow some leakage about the guard, such as dividing guards into sets sharing similar /16 and family restrictions and then choosing exits and middles in a way that violates no path restrictions for any guard in your set. Taken to the extreme, we get the radical solution of two Tors: A-Tor and B-Tor. A-Tor exits, middles, and guards don't conflict with each other, and similarly for B-Tor. Alternately, we can just enforce that no exit is in the same /16 or family as any guard.
Reasons for 1:
1. Eliminates cases where adversary gets to influence your guard choice
2. Doing 1b also makes vanguard implementation simpler (no risk of choosing an impossible set of vanguards)

Blockers to 1:
1. Relay operators may like node family as protection?
2. 1b would make nearly _all_ kinds of path restriction impossible, indefinitely.
3. Circular paths make traffic analysis easier.
4. Circular paths are scary. :/

Reasons for 2:
1. Two guards inherently more resilient to downtime/DoS than one.
2. Helps conceal transition information when adding/removing single guards
3. Conflux will help us in more ways than just performance (reliability, congestion/DoS resistance)

Blockers for 2:
1. Current Prop#271 options may not be what we want (what do we do when two guards go down?)
2. May still need to remove/relax some restrictions, to avoid using 3rd guard if one is down.
3. Sybil time is halved (but still large)
4. Prop#271 mishandles directory guards (but maybe in a way we want it to)
5. Two-equal-guards means 2X external observers on the path for 1/2 of client traffic (but more multiplexed activity)
Relevant tickets related to guard-selection/path-restriction designs:
https://trac.torproject.org/projects/tor/ticket/14917 (Original bug that caused us to use a second guard)
https://trac.torproject.org/projects/tor/ticket/25347 (Clients thrash at one busy guard)
https://trac.torproject.org/projects/tor/ticket/13908 (one directory guard?)
https://trac.torproject.org/projects/tor/ticket/25546 (vanguard patches -- open children are all about restriction issues)
https://trac.torproject.org/projects/tor/ticket/25783 (prop#271 bug we might encounter if we switch to prop#291 (2 primaries) right now. there's probably more where this came from)
https://bugs.torproject.org/17773 (How to transition if a guard loses the Guard flag?)
https://bugs.torproject.org/2998 (Bridge path restriction circuit failure bug)

Other relevant tickets:
https://trac.torproject.org/projects/tor/ticket/24309 (UX for communicating guard purpose / protection to user)
Roger's proposal:
* Remove /16 and family path restrictions between guard and last hop
* Optionally, dir auths don't give you Guard if you're an Exit
* Use first guard but pad to backup guard so the switch isn't as obvious
* First and backup guard are chosen in different /16's and different families

asn proposal:
* Allow "same node, same /16, same family" between guard and last hop. If it's a 3-hop circ (A - B - A), extend it to a 4-hop circ (A - B - C - A).
* Switch to two primary guards; and revisit prop#271 as needed to make this possible and good.

Nick's proposal:
* allow two primary guards
* tweak guard design so that primary guards are not chosen in same /16 or family
* separately, consider relaxing path restriction rules. Not removing.
* separately, consider other proposals for new behavior on guard failure (as modification to guard-spec).
* separately, consider requiring introduce cells to contain >=two possible rendezvous points in separate families.
* separately, require that introduction points be chosen from different families.

Aaron's proposal:
* Use first guard but pad to backup guard so the switch isn't as obvious
* First and backup guard are chosen in different /16's and different families

Mike's proposal:
* Set "num primary guards"=2 and "num primary guards to use"=2
* Make no other changes right now
* File a path selection parent ticket to decide/fix path selection issues
* Tweak prop#271 behavior when both guards are down
* Investigate either favor-one-guard preference, conflux, and/or padding, but do this carefully.
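Several of the proposals above ask for the two guards to sit in different /16's and different families. Here is a minimal Python sketch of what that check could look like. It is not Tor's code: the Relay type, the mutual-declaration family test, and the first-fit selection over an already-ordered candidate list are all simplifying assumptions made for illustration.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Relay:
    # Hypothetical relay record, not a Tor data structure.
    nickname: str
    ipv4: str
    family: frozenset = frozenset()  # nicknames this relay declares as family


def same_slash16(a: Relay, b: Relay) -> bool:
    """True if both IPv4 addresses fall inside the same /16 network."""
    net = ipaddress.ip_network(f"{a.ipv4}/16", strict=False)
    return ipaddress.ip_address(b.ipv4) in net


def same_family(a: Relay, b: Relay) -> bool:
    """Simplified family check: both relays must list each other."""
    return b.nickname in a.family and a.nickname in b.family


def conflicts(a: Relay, b: Relay) -> bool:
    return a.nickname == b.nickname or same_slash16(a, b) or same_family(a, b)


def pick_two_guards(candidates):
    """Return the first candidate plus the first later candidate that does
    not conflict with it, or None if no such pair exists.

    Assumes `candidates` is already in preference order; real guard
    selection is randomized and bandwidth-weighted."""
    if not candidates:
        return None
    first = candidates[0]
    for second in candidates[1:]:
        if not conflicts(first, second):
            return first, second
    return None


g1 = Relay("g1", "10.1.2.3", frozenset({"g3"}))
g2 = Relay("g2", "10.1.200.9")                    # same /16 as g1
g3 = Relay("g3", "192.0.2.7", frozenset({"g1"}))  # same family as g1
g4 = Relay("g4", "198.51.100.1")
print(pick_two_guards([g1, g2, g3, g4]))
```

In this toy example g2 and g3 are skipped (same /16 and same family as g1, respectively), so the pair comes out as g1 and g4.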
Concrete things we can do now:
#1: ourselves set those guard params to 2 and find bugs. once #3 below is done, encourage others, like on tor-talk, to do it too.
#2: enumerate the current situations where we use a guard other than our first guard, especially noting the ones where the attacker can make us use a guard other than our first guard. fix as many as we want to fix. maybe categorize by whether they cause us to mark our first guard as down or not.
#3: merge a patch to make the torrc guard options do what we meant for them to do
#4: Describe adversary models for above proposals? (Why do we disagree? In Mike's case, my disagreements are primarily because I think each step is an improvement over previous/status quo -- we can decide harder things later and still do better).
===================
Mike Perry mikeperry@torproject.org writes:
Mike Perry:
Ok, we had this meeting. High level (amended) action items are:
- Use patches in https://trac.torproject.org/projects/tor/ticket/25843 to set NumEntryGuards=2 in torrc, and observe results. Please join us! Stuff we are looking for during testing is on that ticket!
- Merge that patch to make the torrc guard options do what we meant for them to do. Probably backport it.
Hello,
I wrote the patch on #25843 and I'm now testing 2-guards on my Tor. So far so good, but I think we need people on more unstable connections to test this.
- Describe adversary models for our variant proposals from the notes. (Why do we disagree? In Mike's case, my disagreements are because I think each step is an improvement over previous/status quo -- we can decide harder things later and still do better both now and later.)
Here is my proposal, but please don't consider it set in stone. I actually think these are really complicated issues that take a while to understand, and we should probably not rush it. Even in a short first IRC meeting we came up with new issues and ideas while discussing this topic.
asn proposal:
1) Allow "same node, same /16, same family" between guard and last hop. If it's a 3-hop circ (A - B - A), extend it to a 4-hop circ (A - B - C - A).
2) Switch to two primary guards; and revisit prop#271 as needed to make this possible and good.
Rationale:
I care about an attacker who is trying to deanonymize Tor clients by setting up Tor nodes and comboing various active attacks. In particular, I worry about an adversary who uses guard discovery to learn a client's guard nodes and then uses #14917 or tries to DoS them.
I like two guards because it makes us stronger and more redundant against such attacks, and also because it improves congestion. The "pad-to-backup" idea seems too experimental to me, and not sufficiently specified right now, hence I'm unable to analyze it (e.g. how much do we pad, how often, can this actually mask us against an adversary who launches #14917 repeatedly?).
I propose altering the above path restrictions because that seems to be the only way to concretely defend against #14917 (e.g. see attacks against idle clients on meeting log, etc.). Attackers who have already owned our guard node are not in my threat model wrt these attacks. IMO simple A - B - A path restrictions don't help us against such persistent adversaries; e.g. the attacker can simply spin up another tiny relay C in another data center and do an A - B - C correlation attack.
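To make rule (1) of the proposal above concrete, here is a small Python sketch. It only illustrates the "extend A - B - A to A - B - C - A" idea: relays are plain strings, middles are picked uniformly at random, and the function name is made up for this example. Same-/16 and same-family cases are allowed as-is under this rule, so only node equality triggers the extra hop.

```python
import random


def build_path(guard, last_hop, relays, rng=random):
    """Return a path from guard to last_hop (guard first).

    If guard and last hop are the same node, use two distinct middles so
    the circuit is A - B - C - A instead of A - B - A.
    Raises ValueError if there are not enough other relays to pick from."""
    middles = [r for r in relays if r not in (guard, last_hop)]
    if guard == last_hop:
        b, c = rng.sample(middles, 2)  # two distinct middle relays
        return [guard, b, c, last_hop]
    if not middles:
        raise ValueError("no usable middle relay")
    return [guard, rng.choice(middles), last_hop]


relays = ["A", "B", "C", "D"]
print(build_path("A", "A", relays))  # 4 hops, starts and ends at A
print(build_path("A", "D", relays))  # ordinary 3-hop path
```

The point of the sketch is that the client never has to switch guards because of who the last hop is; it pays with one extra hop instead.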
- Agree on an order of operations for fixes+changes, ideally such that we don't block forever trying to come up with a perfect solution. Things are pretty bad now. All we really need to do is agree on steps to make it better.
I think (1) and (2) above can be considered as orthogonal issues and get done in any order. IMO, here are the prerequisites for doing these tasks:
For path restrictions: Specify current path restrictions through the whole Tor circuit and write a concrete proposal with proposed changes. I think we are looking for 0.3.5 if we want to do this.
For 2-guards: Get the 2-guard design sufficiently tested to ensure that we are not gonna bug out the whole network by switching to 2-guards. I'm particularly worried about clients on bad networks, and clients continuously flapping on-and-off the net. If we toggle the consensus param switch soon, we should be prepared for another round of guard bugs in 0.3.4, and that's fine.
Cheers! :)
Mike Perry:
- Describe adversary models for our variant proposals from the notes. (Why do we disagree? In Mike's case, my disagreements are because I think each step is an improvement over previous/status quo -- we can decide harder things later and still do better both now and later.)
Ok, in the interest of getting closer to an adversary model, let's first start with enumerating the properties the proposals below provide. Properties #1-6 have parentheses at the end of them. When the condition in parentheses is met for property #N, we'll call that "strong #N".
1. Hidden service use can't push you over to an unused guard (at all).
2. Hidden service use can't influence your choice of guard (at all).
3. Exits and websites can't push you over to an unused guard (at all)
4. DoS/Guard node downtime signals are rare (absent)
5. Nodes are not reused for Guard and Exit positions ("any" positions)
6. Information about the guard(s) does not leak to the website/RP (at all).
7. Relays in the same family can't be forced to correlate Exit traffic.
Roger's proposal:
- Remove /16 and family path restrictions between guard and last hop
- Optionally, dir auths don't give you Guard if you're an Exit
- Use first guard but pad to backup guard so the switch isn't as obvious
- First and backup guard are chosen in different /16's and different families
Depending on how good the padding is, this proposal maybe-provides:
1. Hidden service use can't push you over to an unused guard (at all).
3. Exits and websites can't push you over to an unused guard (at all)
Depending on how good the detection mechanism is:
4. DoS/Guard node downtime signals are much more rare (absent)
It provides strong:
5. Nodes are not reused for Guard and Exit positions ("any" positions)
It provides:
7. Relays in the same family can't be forced to correlate Exit traffic.
It does not provide:
2. Hidden service use can't influence your choice of guard (at all).
6. Information about the guard(s) does not leak to the website/RP (at all).
asn proposal:
* Allow "same node, same /16, same family" between guard and last hop. If it's a 3-hop circ (A - B - A), extend it to a 4-hop circ (A - B - C - A).
* Switch to two primary guards; and revisit prop#271 as needed to make this possible and good.
This proposal provides strong:
1. Hidden service use can't push you over to an unused guard (at all).
2. Hidden service use can't influence your choice of guard (at all).
3. Exits and websites can't push you over to an unused guard (at all)
If we fix prop#271's downtime detection for the two primaries, it provides:
4. DoS/Guard node downtime signals are rare (absent)
If the client chooses its primary guards from the same /16 or family, it does not provide #6 (since the hop before the RP won't ever be in that family):
6. Information about the guard(s) does not leak to the website/RP (at all).
It does not provide:
5. Nodes are not reused for Guard and Exit positions ("any" positions)
7. Relays in the same family can't be forced to correlate Exit traffic.
Nick's proposal:
* allow two primary guards
* tweak guard design so that primary guards are not chosen in same /16 or family
* separately, consider relaxing path restriction rules. Not removing.
* separately, consider other proposals for new behavior on guard failure (as modification to guard-spec).
* separately, consider requiring introduce cells to contain >=two possible rendezvous points in separate families.
* separately, require that introduction points be chosen from different families.
In the short term, this proposal provides #1,3-4,6 (not strong, because if one of the primary guards is down, you can be forced into using a third):
1. Hidden service use can't push you over to an unused guard (at all).
3. Exits and websites can't push you over to an unused guard (at all)
4. DoS/Guard node downtime signals are rare (absent)
6. Information about the guard(s) does not leak to the website/RP (at all).
In the short term, it gets strong #5 and #7, though this may change if we relax restrictions:
5. Nodes are not reused for Guard and Exit positions ("any" positions)
7. Relays in the same family can't be forced to correlate Exit traffic.
Changing the introduce cell will provide strong #1-2. Improving guard failure conditions gets it strong #4:
1. Hidden service use can't push you over to an unused guard (at all).
2. Hidden service use can't influence your choice of guard (at all).
4. DoS/Guard node downtime signals are rare (absent)
Aaron's proposal:
- Use first guard but pad to backup guard so the switch isn't as obvious
- First and backup guard are chosen in different /16's and different families
Depending on how good the padding is, this proposal maybe-provides:
1. Hidden service use can't push you over to an unused guard (at all).
3. Exits and websites can't push you over to an unused guard (at all)
Depending on how good the detection mechanism is:
4. DoS/Guard node downtime signals are much more rare (absent)
It provides strong #5:
5. Nodes are not reused for Guard and Exit positions ("any" positions)
It provides #7:
7. Relays in the same family can't be forced to correlate Exit traffic.
It does not provide #2 or #6:
2. Hidden service use can't influence your choice of guard (at all).
6. Information about the guard(s) does not leak to the website/RP (at all).
Mike's proposal from the meeting:
- Set "num primary guards"=2 and "num primary guards to use"=2
- Make no other changes right now
- File a path selection parent ticket to decide/fix path selection issues
- Tweak prop#271 behavior when both guards are down
- Investigate either favor-one-guard preference, conflux, and/or padding, but do this carefully.
In the short term, this proposal provides #1,3-4,6 (not strong, because if one of the primary guards is down, you can be forced into using a third):
1. Hidden service use can't push you over to an unused guard (at all).
3. Exits and websites can't push you over to an unused guard (at all)
4. DoS/Guard node downtime signals are rare (absent)
6. Information about the guard(s) does not leak to the website/RP (at all).
If you get unlucky and choose both primaries from the same /16 or family, you also lose #1,3,6.
In the short term, it gets strong #5 and #7, though this may change if we relax restrictions:
5. Nodes are not reused for Guard and Exit positions ("any" positions)
7. Relays in the same family can't be forced to correlate Exit traffic.
Improving guard failure conditions gets it strong #4:
4. DoS/Guard node downtime signals are rare (absent)
It does not provide:
2. Hidden service use can't influence your choice of guard (at all).
[Mike's rules proposal to Roger in the other thread] In the world where we keep path restrictions, these would be my rules:
1. Two equal guards, chosen from not the same /16 or family
2. Choose each vanguard layer's members such that each layer has at least one node from a unique /16 and family.
3. Build paths in a strict order, from last hop towards guard. If you can't build a path with this ordering, start over with a sampled guard. (With rule #1 and #2, this should be very rare and should mean that a guard is marked down locally but still marked up in the consensus.)
4. No guards as exits (Not needed but do it anyway for other reasons).
Then under these rules, you use a new primary guard only in these cases:
- When a guard leaves the consensus, replace it with a new primary guard.
- Temporarily pick a new guard when your two primaries are locally down or unusable (i.e. step #3 above fails).
This gets #1 and #6, but not strong (if one guard is temporarily down):
1. Hidden service use can't push you over to an unused guard (at all).
6. Information about the guard(s) does not leak to the website/RP (at all).
It gets strong #3-5:
3. Exits and websites can't push you over to an unused guard (at all)
4. DoS/Guard node downtime signals are rare (absent)
5. Nodes are not reused for Guard and Exit positions ("any" positions)
It does not provide:
2. Hidden service use can't influence your choice of guard (at all).
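The "build paths in a strict order, from last hop towards guard" rule above can be sketched roughly as follows. This is only an illustration under stated assumptions: the Relay tuple, the string family label, and the deterministic first-fit choice are invented for the example (real selection is randomized and weighted), and "start over with a sampled guard" is modelled as a second pass over a wider guard list.

```python
from typing import NamedTuple, Optional


class Relay(NamedTuple):
    # Hypothetical relay record for this sketch only.
    name: str
    net16: str             # e.g. "10.1" standing in for 10.1.0.0/16
    family: Optional[str]  # made-up family label, None if unset


def conflicts(a, b):
    same_family = a.family is not None and a.family == b.family
    return a.name == b.name or a.net16 == b.net16 or same_family


def first_compatible(candidates, chosen):
    """First candidate that conflicts with none of the hops chosen so far."""
    for c in candidates:
        if not any(conflicts(c, hop) for hop in chosen):
            return c
    return None


def build_path_backwards(last_hop, middles, primaries, sampled):
    """Pick hops from the last hop towards the guard.

    Try the primary guards first; if none fits, start over and allow any
    sampled guard. Returns the path guard-first, or None on failure."""
    for guards in (primaries, sampled):
        chosen = [last_hop]
        middle = first_compatible(middles, chosen)
        if middle is None:
            return None
        chosen.append(middle)
        guard = first_compatible(guards, chosen)
        if guard is not None:
            chosen.append(guard)
            return chosen[::-1]
    return None


x = Relay("x", "10.1", None)    # last hop, shares a /16 with g1
m = Relay("m", "10.9", None)
g1 = Relay("g1", "10.1", None)
g2 = Relay("g2", "10.2", None)
g3 = Relay("g3", "10.3", None)
print(build_path_backwards(x, [m], [g1, g2], [g1, g2, g3]))
```

Because the two primaries are in different /16's and families, a single last hop can rule out one of them but the other remains usable, which is why the fallback to a sampled guard should be rare.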
======================================================================
Ok, so here's a proposal that gets strong #1-4, and regular #5-7. It is my current favorite:
- Set "num primary guards"=2 and "num primary guards to use"=2
- Don't give Exit nodes the Guard flag.
- Allow "same node, same /16, same family" between guard and last hop, but only for HS circuits (which are at least 4 hops long for these cases).
- Allow same /16 and same family for HS circuits.
- When a primary guard leaves the consensus, pick a new one.
- If both primary guards are down/not completing circuits, pick a new one.
Strong:
1. Hidden service use can't push you over to an unused guard (at all).
2. Hidden service use can't influence your choice of guard (at all).
3. Exits and websites can't push you over to an unused guard (at all)
4. DoS/Guard node downtime signals are rare (absent)
Regular:
5. Nodes are not reused for Guard and Exit positions ("any" positions)
6. Information about the guard(s) does not leak to the website/RP (at all).
7. Relays in the same family can't be forced to correlate Exit traffic.
It gives up on strong #5 to get strong #1 and strong #2, because I don't see a lot of difference between an HS circuit that uses the same Guard as the RP vs one that uses the same Guard node for one of the other side's middle or Guard hops (which we can't prevent).
We don't get strong #6, because if one guard is temporarily down but still in the consensus and the adversarial RP makes enough circuits fast enough, it could theoretically notice that the next node is never the remaining not-down Guard. This window of time can be minimized by more eagerly switching guards when one of them is unresponsive. It could be eliminated by using S - G - L2 - L3 - R paths with vanguards (at the expense of directly exposing service L3 vanguards to the RP, and creating service linkability).
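For anyone who wants to compare proposals mechanically, the ratings in this mail can be written down as a small table. The Python sketch below encodes only two of them, the asn proposal and the favorite proposal just described, exactly as rated above. The Rating scale, its ordering (conditional ranked below regular), and the helper names are assumptions made for this sketch, not something agreed on in the thread.

```python
from enum import IntEnum


class Rating(IntEnum):
    NO = 0           # "does not provide"
    CONDITIONAL = 1  # provided only if some extra condition holds
    REGULAR = 2      # provided, but not the parenthesized strong form
    STRONG = 3


PROPERTIES = {
    1: "Hidden service use can't push you over to an unused guard",
    2: "Hidden service use can't influence your choice of guard",
    3: "Exits and websites can't push you over to an unused guard",
    4: "DoS/Guard node downtime signals are rare",
    5: "Nodes are not reused for Guard and Exit positions",
    6: "Information about the guard(s) does not leak to the website/RP",
    7: "Relays in the same family can't be forced to correlate Exit traffic",
}

S, R, C, N = Rating.STRONG, Rating.REGULAR, Rating.CONDITIONAL, Rating.NO

# Ratings as stated in this mail for two of the proposals.
RATINGS = {
    "asn": {1: S, 2: S, 3: S, 4: C, 5: N, 6: C, 7: N},
    "mike-favorite": {1: S, 2: S, 3: S, 4: S, 5: R, 6: R, 7: R},
}


def provides(proposal, prop, at_least=Rating.REGULAR):
    """True if the proposal is rated at least `at_least` for the property."""
    return RATINGS[proposal][prop] >= at_least


def differing_properties(a, b):
    """Property numbers on which two proposals are rated differently."""
    return [n for n in PROPERTIES if RATINGS[a][n] != RATINGS[b][n]]


for n in differing_properties("asn", "mike-favorite"):
    print(n, PROPERTIES[n], RATINGS["asn"][n].name, RATINGS["mike-favorite"][n].name)
```

Adding the other proposals is a matter of adding rows, and new properties can be appended without renumbering the existing ones.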
Mike Perry mikeperry@torproject.org writes:
Mike Perry:
- Describe adversary models for our variant proposals from the notes. (Why do we disagree? In Mike's case, my disagreements are because I think each step is an improvement over previous/status quo -- we can decide harder things later and still do better both now and later.)
Ok, in the interest of getting closer to an adversary model, let's first start with enumerating the properties the proposals below provide. Properties #1-6 have parentheses at the end of them. When the condition in parentheses is met for property #N, we'll call that "strong #N".
Thanks Mike for this email. I think this moves us forward quite a bit with an adversary model here! Here is some feedback:
1. Hidden service use can't push you over to an unused guard (at all).
2. Hidden service use can't influence your choice of guard (at all).
Can we have a bit more detailed description of the two properties above? (2) seems like a superset of (1), so making these properties clear would be useful.
3. Exits and websites can't push you over to an unused guard (at all)
4. DoS/Guard node downtime signals are rare (absent)
Also, what does property (4) mean exactly?
5. Nodes are not reused for Guard and Exit positions ("any" positions)
6. Information about the guard(s) does not leak to the website/RP (at all).
7. Relays in the same family can't be forced to correlate Exit traffic.
Also it might be useful to rate the current guard design with these properties and see how well we are currently doing.
IIUC, since we use all the primaries for dirguards it provides:
1. Hidden service use can't push you over to an unused guard (at all).
3. Exits and websites can't push you over to an unused guard (at all)
Because of the path restrictions it also provides:
5. Nodes are not reused for Guard and Exit positions ("any" positions)
7. Relays in the same family can't be forced to correlate Exit traffic.
It does *not* provide:
2. Hidden service use can't influence your choice of guard (at all).
4. DoS/Guard node downtime signals are rare (absent)
6. Information about the guard(s) does not leak to the website/RP (at all).
Let me know if I messed it up.
Clearly since everyone in this thread wants to improve the current situation, the properties the current system lacks are important. In particular it seems like (2) and (6) are particularly important properties.
Roger's proposal:
- Remove /16 and family path restrictions between guard and last hop
- Optionally, dir auths don't give you Guard if you're an Exit
- Use first guard but pad to backup guard so the switch isn't as obvious
- First and backup guard are chosen in different /16's and different families
Depending on how good the padding is, this proposal maybe-provides:
1. Hidden service use can't push you over to an unused guard (at all).
3. Exits and websites can't push you over to an unused guard (at all)
Depending on how good the detection mechanism is:
4. DoS/Guard node downtime signals are much more rare (absent)
It provides strong:
5. Nodes are not reused for Guard and Exit positions ("any" positions)
It provides:
7. Relays in the same family can't be forced to correlate Exit traffic.
How does it provide 7?
<snip>
Aaron's proposal:
- Use first guard but pad to backup guard so the switch isn't as obvious
- First and backup guard are chosen in different /16's and different families
Depending on how good the padding is, this proposal maybe-provides:
1. Hidden service use can't push you over to an unused guard (at all).
3. Exits and websites can't push you over to an unused guard (at all)
Depending on how good the detection mechanism is:
4. DoS/Guard node downtime signals are much more rare (absent)
It provides strong #5:
5. Nodes are not reused for Guard and Exit positions ("any" positions)
It provides #7:
7. Relays in the same family can't be forced to correlate Exit traffic.
It does not provide #2 or #6:
2. Hidden service use can't influence your choice of guard (at all).
6. Information about the guard(s) does not leak to the website/RP (at all).
How come Aaron's proposal provides the same benefits as Roger's even though they are different? Am I missing something?
<snip>
Ok, so here's a proposal that gets strong #1-4, and regular #5-7. It is my current favorite:
- Set "num primary guards"=2 and "num primary guards to use"=2
- Don't give Exit nodes the Guard flag.
- Allow "same node, same /16, same family" between guard and last hop, but only for HS circuits (which are at least 4 hops long for these cases).
- Allow same /16 and same family for HS circuits.
- When a primary guard leaves the consensus, pick a new one.
We already do this one. Primary guards come from the filtered set, and filtered set guards need to be listed in the consensus. See entry_guard_passes_filter(). If this is not the case in reality, it's a bug.
- If both primary guards are down/not completing circuits, pick a new one.
Hmm, this is almost impossible to do. People with laptops and unstable networks frequently have both of their primary guards marked as unreachable while Tor is trying to reach the network. Picking new primaries at that point would not be a good move.
Strong:
1. Hidden service use can't push you over to an unused guard (at all).
2. Hidden service use can't influence your choice of guard (at all).
3. Exits and websites can't push you over to an unused guard (at all)
4. DoS/Guard node downtime signals are rare (absent)
Regular:
5. Nodes are not reused for Guard and Exit positions ("any" positions)
6. Information about the guard(s) does not leak to the website/RP (at all).
7. Relays in the same family can't be forced to correlate Exit traffic.
All in all I like the above proposal (modulo the issues above) and I think it's quite sane, and gets the best of most worlds ;) We should perhaps think more about it and try to spec it out! :)
Let's see what other people think.
George Kadianakis:
Mike Perry mikeperry@torproject.org writes:
Mike Perry:
- Describe adversary models for our variant proposals from the notes. (Why do we disagree? In Mike's case, my disagreements are because I think each step is an improvement over previous/status quo -- we can decide harder things later and still do better both now and later.)
Ok, in the interest of getting closer to an adversary model, let's first start with enumerating the properties the proposals below provide. Properties #1-6 have parentheses at the end of them. When the condition in parentheses is met for property #N, we'll call that "strong #N".
Thanks Mike for this email. I think this moves us forward quite a bit with an adversary model here! Here is some feedback:
1. Hidden service use can't push you over to an unused guard (at all).
2. Hidden service use can't influence your choice of guard (at all).
Can we have a bit more detailed description of the two properties above? (2) seems like a superset of (1), so making these properties clear would be useful.
Yes, if a defense provides #2, then it always provides #1. However, a defense can provide #1 without providing #2 (by using two guards equally, for example).
Or said a different way, an attacker who can break #2 can sometimes use that to break #1.
To avoid confusion, I don't think we should change the property wording or numbering until we do another round of proposal comparison, and/or until people propose new properties that some designs satisfy (or failed to satisfy).
3. Exits and websites can't push you over to an unused guard (at all)
4. DoS/Guard node downtime signals are rare (absent)
Also, what does property (4) mean exactly?
Property 4 is the best argument for using two guards as opposed to only fiddling with restrictions. With the current way we handle onionskin failure (#25347), clients will simply lose connectivity by way of endless DESTROY responses before making a valid circuit. This means that the adversary can onionskin-DoS guard nodes one at a time, and wait for a hidden service to become unresponsive. That is what it means to have a DoS (or downtime) signal.
Using two guards dumbly makes this rare. Both are down at the same time by chance much less frequently than one is down, and a two-node DoS search is harder to pull off when the adversary has to keep pairs (or more) of nodes offline at the same time, without taking other services offline and causing false positives.
Using additional guards as soon as things fail makes these signals absent, in theory. If a client is always trying to connect to new guards, as long as the client can connect to the network, it will find a guard that works pretty soon. This is also another way of using two guards dumbly, though.
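A back-of-the-envelope way to see the two-guard argument above, sketched in Python. Every number here is a made-up assumption (per-guard downtime probability, network size), chance failures are assumed independent, and the DoS figure only counts candidate sets for a naive exhaustive search; a smarter adversary could do better, so treat it as intuition rather than analysis.

```python
import math


def all_guards_down_probability(p_one_down: float, n_guards: int = 2) -> float:
    """Chance that all n guards are down at once, assuming independent failures."""
    return p_one_down ** n_guards


def naive_dos_candidates(guards_in_network: int, guards_per_client: int) -> int:
    """Number of guard sets a naive DoS search has to try to cover every
    possible client guard set of the given size."""
    return math.comb(guards_in_network, guards_per_client)


p = 0.02   # hypothetical: a given guard is unreachable 2% of the time
n = 2000   # hypothetical number of guards in the network

print("one guard down:       ", p)
print("both guards down:     ", all_guards_down_probability(p))
print("sets to DoS, 1 guard: ", naive_dos_candidates(n, 1))
print("sets to DoS, 2 guards:", naive_dos_candidates(n, 2))
```

Under these assumptions both the chance signal and the naive search get much worse for the adversary with two guards, which is the sense in which the signal becomes rare rather than absent.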
5. Nodes are not reused for Guard and Exit positions ("any" positions)
6. Information about the guard(s) does not leak to the website/RP (at all).
7. Relays in the same family can't be forced to correlate Exit traffic.
Also it might be useful to rate the current guard design with these properties and see how well we are currently doing.
IIUC, since we use all the primaries for dirguards it provides:
1. Hidden service use can't push you over to an unused guard (at all).
3. Exits and websites can't push you over to an unused guard (at all)
If by current design, you mean the current network as-is without changing any consensus parameters, then these two aren't provided.
Since the current design is "num primary guards to use"=1, the current design tries really hard to use only this guard. This means that as soon as a hidden service chooses that guard as its RP, it will use a second guard. This second guard is normally unused. Hence: Hidden service use pushed the service over to an unused guard.
Similarly, if a website can cause a client to keep connecting through different circuits over and over (via at least 3 different attacks, mentioned in the other thread), then it can eventually cause that client to use a second guard. We want to fix this for other reasons (guard discovery), but that doesn't change it as a property here. And there may be more like them if we fix just these three.
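The "pushed over to an unused guard" effect described above can be shown with a toy Python model. It is deliberately simplified: relays are strings, the only path restriction modelled is "guard must not be the same node as the last hop", guards are tried in list order, and the function and parameter names are invented for this example.

```python
def guard_for_circuit(guards, num_to_use, last_hop):
    """Pick the first guard that may be used with this last hop.

    Returns (guard, pushed), where `pushed` is True when the chosen guard
    lies outside the first `num_to_use` guards, i.e. the circuit forced
    the client onto a guard it does not normally use."""
    for index, guard in enumerate(guards):
        if guard != last_hop:  # toy path restriction: same node only
            return guard, index >= num_to_use
    return None, False


guards = ["G1", "G2", "G3"]

# Adversary arranges for the RP to be the victim's first guard.
print(guard_for_circuit(guards, num_to_use=1, last_hop="G1"))
print(guard_for_circuit(guards, num_to_use=2, last_hop="G1"))
```

With one guard in use, the circuit lands on G2, which was idle until then, so the switch is observable. With two guards in use, G2 is already carrying traffic, so the same RP choice reveals much less.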
Because of the path restrictions it also provides: 5. Nodes are not reused for Guard and Exit positions ("any" positions) 7. Relays in the same family can't be forced to correlate Exit traffic.
Correct. It does provide these.
It does *not* provide: 2. Hidden service use can't influence your choice of guard (at all). 4. DoS/Guard node downtime signals are rare (absent) 6. Information about the guard(s) does not leak to the website/RP (at all).
Correct. It does not provide these.
Let me know if I messed it up.
Clearly since everyone in this thread wants to improve the current situation, the properties the current system lacks are important. In particular it seems like (2) and (6) are particularly important properties.
Roger's proposal:
- Remove /16 and family path restrictions between guard and last hop
- Optionally, dir auths don't give you Guard if you're an Exit
- Use first guard but pad to backup guard so the switch isn't as obvious
- First and backup guard are chosen in different /16's and different families
Depending on how good the padding is, this proposal maybe-provides:
- Hidden service use can't push you over to an unused guard (at all).
- Exits and websites can't push you over to an unused guard (at all)
Depending on how good the detection mechanism is: 4. DoS/Guard node downtime signals are much more rare (absent)
It provides strong: 5. Nodes are not reused for Guard and Exit positions ("any" positions)
It provides: 7. Relays in the same family can't be forced to correlate Exit traffic.
How does it provide 7?
Woops, it does not. All it does is prevent the same *node* from being used in the Guard and Exit position. I mixed that up with an earlier revision of these properties...
Aaron's proposal:
- Use first guard but pad to backup guard so the switch isn't as obvious
- First and backup guard are chosen in different /16's and different families
Depending on how good the padding is, this proposal maybe-provides:
- Hidden service use can't push you over to an unused guard (at all).
- Exits and websites can't push you over to an unused guard (at all)
Depending on how good the detection mechanism is: 4. DoS/Guard node downtime signals are much more rare (absent)
It provides strong #5: 5. Nodes are not reused for Guard and Exit positions ("any" positions)
It provides #7: 7. Relays in the same family can't be forced to correlate Exit traffic.
It does not provide #2 or #6: 2. Hidden service use can't influence your choice of guard (at all). 6. Information about the guard(s) does not leak to the website/RP (at all).
How come Aaron's proposal provides the same benefits as Roger's even though they're different? Am I missing something?
Aaron's proposal actually does provide #7.
The key difference between the two is Roger's "Remove /16 and family path restrictions between guard and last hop". That causes Roger to lose #7. They also differ in the Guard+Exit flag assignment, but in this case that does not change the properties provided, because no node restrictions are removed.
<snip>
Ok, so here's a proposal that gets strong #1-4, and regular #5-7. It is my current favorite:
- Set "num primary guards"=2 and "num primary guards to use"=2
- Don't give Exit nodes the Guard flag.
- Allow "same node, same /16, same family" between guard and last hop, but only for HS circuits (which are at least 4 hops long for these cases).
- Allow same /16 and same family for HS circuits.
- When a primary guard leaves the consensus, pick a new one.
We already do this one. Primary guards come from the filtered set, and filtered set guards need to be listed in the consensus. See entry_guard_passes_filter(). If this is not the case in reality, it's a bug.
Good.
- If both primary guards are down/not completing circuits, pick a new one.
Hmm, this is almost impossible to do. People with laptops and unstable networks frequently have both of their primary guards marked as unreachable while Tor is trying to reach network. Picking new primaries at that point would not be a good move.
Yuck. Well, minimizing this time/chance perhaps. Like if the client has a TLS connection but both are failing all onionskins, then choose a third?
Strong:
- Hidden service use can't push you over to an unused guard (at all).
- Hidden service use can't influence your choice of guard (at all).
- Exits and websites can't push you over to an unused guard (at all)
- DoS/Guard node downtime signals are rare (absent)
Regular: 5. Nodes are not reused for Guard and Exit positions ("any" positions) 6. Information about the guard(s) does not leak to the website/RP (at all). 7. Relays in the same family can't be forced to correlate Exit traffic.
All in all I like the above proposal (modulo the issues above) and I think it's quite sane, and gets the best of most worlds ;) We should perhaps think more about it and try to spec it out! :)
I would prefer a proposal that has strong #6, but I think we are close to that. All we need to do is prevent the case where "one guard down && guards can be chosen next to the RP."
Right now, I am leaning towards a hack that says "Vanguards can choose a guard before the RP." We'd still be S - G - L2 - L3 - G - RP in that case, though. As I said, an alternative is S - G - L2 - L3 - RP, but I think I would rather preserve unlinkability for services run on the same Tor client. A third alternative is trying to minimize the "only one guard down" time. Such downtime minimization does seem tricky, though.
I would also like to try to beef up #4 as much as we can. If we can't make all node downtime signals absent, we should aim to minimize them.
On 25 Apr 2018, at 18:30, Mike Perry mikeperry@torproject.org wrote:
- Hidden service use can't push you over to an unused guard (at all).
- Hidden service use can't influence your choice of guard (at all).
- Exits and websites can't push you over to an unused guard (at all)
- DoS/Guard node downtime signals are rare (absent)
- Nodes are not reused for Guard and Exit positions ("any" positions)
- Information about the guard(s) does not leak to the website/RP (at all).
- Relays in the same family can't be forced to correlate Exit traffic.
I think this list is missing some important user-visible properties, or it's not clear which property above corresponds to these properties:
* Is Tor reliable and responsive when guards go down, or when I move networks, or when I have lost and regained service?
I also think it's missing an implicit property, which we should make explicit:
* Can Tor users be fingerprinted by their set of guards or directory guards?
Perhaps this property is out of scope.
teor:
On 25 Apr 2018, at 18:30, Mike Perry mikeperry@torproject.org wrote:
- Hidden service use can't push you over to an unused guard (at all).
- Hidden service use can't influence your choice of guard (at all).
- Exits and websites can't push you over to an unused guard (at all)
- DoS/Guard node downtime signals are rare (absent)
- Nodes are not reused for Guard and Exit positions ("any" positions)
- Information about the guard(s) does not leak to the website/RP (at all).
- Relays in the same family can't be forced to correlate Exit traffic.
I think this list is missing some important user-visible properties, or it's not clear which property above corresponds to these properties:
- Is Tor reliable and responsive when guards go down, or when I move networks, or when I have lost and regained service?
I think this is implicitly provided by #4. Downtime is a security issue. If (any of) a client Guard(s) are down, and the adversary can detect this based on client behavior, well, that is a side channel signal that provides information about the Guard. So by satisfying #4, we also satisfy the weaker conditions of general reliability and responsiveness.
I also think it's missing an implicit property, which we should make explicit:
- Can Tor users be fingerprinted by their set of guards or directory guards?
Perhaps this property is out of scope.
I think it is worth considering. We should add it if we need to do another round of evaluation.
But remember that we are already in the situation where Tor is using two guards for a lot of (or all) users right now: it uses a second guard whenever an RP or Exit is the same as the Guard node, or is chosen from the same /16 or family as the Guard node. Depending on how unlucky you are, you could be using 2 guards pretty often right now. Just not often enough to benefit from any multiplexing and netflow padding.
Tor also currently uses 3 directory guards, and unless we set "num entry guards to use" and "num entry guards" to the same number, these are different nodes than the primary guard. Miraculously, if we set this to two, then Tor uses those two primary guards *as* its directory guards. This means that any proposal that said "Set these to 2" has *less* fingerprinting than those that did not. My proposal was the only one that explicitly said this, but I think asn wants this too.
That means if we accept the proposal at the end of my mail, which gets us strong #1-4, non-strong #5, strong #6 (with mods), and #7, then we'll have less guard fingerprintability than today.
Mike Perry:
teor:
On 25 Apr 2018, at 18:30, Mike Perry mikeperry@torproject.org wrote:
- Hidden service use can't push you over to an unused guard (at all).
- Hidden service use can't influence your choice of guard (at all).
- Exits and websites can't push you over to an unused guard (at all)
- DoS/Guard node downtime signals are rare (absent)
- Nodes are not reused for Guard and Exit positions ("any" positions)
- Information about the guard(s) does not leak to the website/RP (at all).
- Relays in the same family can't be forced to correlate Exit traffic.
I think this list is missing some important user-visible properties, or it's not clear which property above corresponds to these properties:
- Is Tor reliable and responsive when guards go down, or when I move networks, or when I have lost and regained service?
I think this is implicitly provided by #4. Downtime is a security issue. If (any of) a client Guard(s) are down, and the adversary can detect this based on client behavior, well, that is a side channel signal that provides information about the Guard. So by satisfying #4, we also satisfy the weaker conditions of general reliability and responsiveness.
I also think it's missing an implicit property, which we should make explicit:
- Can Tor users be fingerprinted by their set of guards or directory guards?
Perhaps this property is out of scope.
I think it is worth considering. We should add it if we need to do another round of evaluation.
Alright, for the sake of argument, let's call this Property #8: 8. Less information from guard fingerprinting (the least information)
I argue that this #8 is also equivalent to a #9 that Roger would ask for: 9. Fewer points of observation into the network (the fewest points).
To avoid TL;DR, that argument is an exercise to the reader ;).
Here is a proposal that beats my previous proposal on Property #8 and #9, while trying to preserve as many of the other properties as possible:
- Set "num primary guards"=1 and "num primary guards to use"=1
- Set "num directory guards"=1 and "num directory guards to use"=1
- Don't give Exit nodes the Guard flag.
- Allow "same node, same /16, same family" between guard and last hop, but only for HS circuits (which are at least 4 hops).
- Allow same /16 and same family for HS circuits.
- When a primary guard leaves the consensus, pick a new one.
- When a primary guard fails circuits, do $MAGIC_FAILURE_HEURISTIC.
This proposal gets strong: 1. Hidden service use can't push you over to an unused guard (at all). 2. Hidden service use can't influence your choice of guard (at all). 3. Exits and websites can't push you over to an unused guard (at all) 8. Less information from guard fingerprinting (the least information)
It loses #4 (and your reliability point above), because if we transition to a second guard too quickly when the first one starts failing, then we lose the winning fingerprinting property we want to keep. Therefore, we must tolerate failure and RESOURCELIMIT issues and suffer through connectivity issues during DoS: 4. DoS/Guard node downtime signals are rare (absent)
It then gets us regular: 5. Nodes are not reused for Guard and Exit positions ("any" positions) 6. Information about the guard(s) does not leak to the website/RP (at all). 7. Relays in the same family can't be forced to correlate Exit traffic.
And again, we could get strong #6 if we allow the guard node for both RP and the node before the RP: 6. Information about the guard(s) does not leak to the website/RP (at all).
So the key thing (in this property list) that forcing one guard causes us to lose is reliability under DoS, which is a guard discovery vector (and probably a source of other side channels, too).
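For a side-by-side view, the ratings claimed in this thread for the two-guard and one-guard proposals can be written down as a small table. This is only a sketch: the dictionary layout and the proposal labels are hypothetical, and the ratings are the unmodified ones (strong #6 "with mods" is not encoded). Property #8 was never rated for the two-guard proposal, so it shows up as "unrated".

```python
# Property numbers follow the list used in this thread.
PROPERTIES = {
    1: "Hidden service use can't push you over to an unused guard",
    2: "Hidden service use can't influence your choice of guard",
    3: "Exits and websites can't push you over to an unused guard",
    4: "DoS/Guard node downtime signals are rare",
    5: "Nodes are not reused for Guard and Exit positions",
    6: "Information about the guard(s) does not leak to the website/RP",
    7: "Relays in the same family can't be forced to correlate Exit traffic",
    8: "Less information from guard fingerprinting",
}

# Ratings as stated in the thread; labels and structure are illustrative only.
PROPOSALS = {
    "two_guards": {"strong": {1, 2, 3, 4}, "regular": {5, 6, 7}, "lost": set()},
    "one_guard": {"strong": {1, 2, 3, 8}, "regular": {5, 6, 7}, "lost": {4}},
}


def rating(proposal: str, prop: int) -> str:
    """Return 'strong', 'regular', 'lost', or 'unrated' for one property."""
    for level, props in PROPOSALS[proposal].items():
        if prop in props:
            return level
    return "unrated"


def differing(a: str, b: str) -> list:
    """Property numbers on which two proposals are rated differently."""
    return [p for p in sorted(PROPERTIES) if rating(a, p) != rating(b, p)]


for p in differing("two_guards", "one_guard"):
    print(f"#{p} {PROPERTIES[p]}: "
          f"two_guards={rating('two_guards', p)}, one_guard={rating('one_guard', p)}")
```

The only rows that differ are #4 and #8, which matches the tradeoff described above: forcing one guard buys fingerprinting resistance at the cost of a louder downtime signal.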
Mike Perry mikeperry@torproject.org writes:
Mike Perry:
teor:
On 25 Apr 2018, at 18:30, Mike Perry mikeperry@torproject.org wrote:
- Hidden service use can't push you over to an unused guard (at all).
- Hidden service use can't influence your choice of guard (at all).
- Exits and websites can't push you over to an unused guard (at all)
- DoS/Guard node downtime signals are rare (absent)
- Nodes are not reused for Guard and Exit positions ("any" positions)
- Information about the guard(s) does not leak to the website/RP (at all).
- Relays in the same family can't be forced to correlate Exit traffic.
I think this list is missing some important user-visible properties, or it's not clear which property above corresponds to these properties:
- Is Tor reliable and responsive when guards go down, or when I move networks, or when I have lost and regained service?
I think this is implicitly provided by #4. Downtime is a security issue. If (any of) a client Guard(s) are down, and the adversary can detect this based on client behavior, well, that is a side channel signal that provides information about the Guard. So by satisfying #4, we also satisfy the weaker conditions of general reliability and responsiveness.
I also think it's missing an implicit property, which we should make explicit:
- Can Tor users be fingerprinted by their set of guards or directory guards?
Perhaps this property is out of scope.
I think it is worth considering. We should add it if we need to do another round of evaluation.
Alright, for the sake of argument, let's call this Property #8: 8. Less information from guard fingerprinting (the least information)
I argue that this #8 is also equivalent to a #9 that Roger would ask for: 9. Fewer points of observation into the network (the fewest points).
If we are actually aiming for 8 and 9 we need to do something about the numdirguard=3 situation, otherwise we still have a huge guard fpr and we still expose ourselves to more of the network even if we keep one guard.
To avoid TL;DR, that argument is an exercise to the reader ;).
Here is a proposal that beats my previous proposal on Property #8 and #9, while trying to preserve as many of the other properties as possible:
- Set "num primary guards"=1 and "num primary guards to use"=1
- Set "num directory guards"=1 and "num directory guards to use"=1
- Don't give Exit nodes the Guard flag.
- Allow "same node, same /16, same family" between guard and last hop, but only for HS circuits (which are at least 4 hops).
- Allow same /16 and same family for HS circuits.
This's for all hops? So all service-side HS circ hops can share the same family? I guess that's OK since we don't know what's happening on the other side of the HS circuit anyhow? Or what?
- When a primary guard leaves the consensus, pick a new one.
- When a primary guard fails circuits, do $MAGIC_FAILURE_HEURISTIC.
What is the $MAGIC_FAILURE_HEURISTIC supposed to do? Also I doubt we can do anything magic here, we even have trouble doing very naive stuff when it comes to network-uptime response.
This proposal gets strong:
- Hidden service use can't push you over to an unused guard (at all).
- Hidden service use can't influence your choice of guard (at all).
- Exits and websites can't push you over to an unused guard (at all)
- Less information from guard fingerprinting (the least information)
It loses #4 (and your reliability point above), because if we transition to a second guard too quickly when the first one starts failing, then we lose the winning fingerprinting property we want to keep. Therefore, we must tolerate failure and RESOURCELIMIT issues and suffer through connectivity issues during DoS: 4. DoS/Guard node downtime signals are rare (absent)
It then gets us regular: 5. Nodes are not reused for Guard and Exit positions ("any" positions) 6. Information about the guard(s) does not leak to the website/RP (at all). 7. Relays in the same family can't be forced to correlate Exit traffic.
And again, we could get strong #6 if we allow the guard node for both RP and the node before the RP: 6. Information about the guard(s) does not leak to the website/RP (at all).
So the key thing (in this property list) that forcing one guard causes us to lose is reliability under DoS, which is a guard discovery vector (and probably a source of other side channels, too).
George Kadianakis:
Mike Perry mikeperry@torproject.org writes:
Mike Perry:
teor:
On 25 Apr 2018, at 18:30, Mike Perry mikeperry@torproject.org wrote:
- Hidden service use can't push you over to an unused guard (at all).
- Hidden service use can't influence your choice of guard (at all).
- Exits and websites can't push you over to an unused guard (at all)
- DoS/Guard node downtime signals are rare (absent)
- Nodes are not reused for Guard and Exit positions ("any" positions)
- Information about the guard(s) does not leak to the website/RP (at all).
- Relays in the same family can't be forced to correlate Exit traffic.
I think this list is missing some important user-visible properties, or it's not clear which property above corresponds to these properties:
- Is Tor reliable and responsive when guards go down, or when I move networks, or when I have lost and regained service?
I think this is implicitly provided by #4. Downtime is a security issue. If (any of) a client Guard(s) are down, and the adversary can detect this based on client behavior, well, that is a side channel signal that provides information about the Guard. So by satisfying #4, we also satisfy the weaker conditions of general reliability and responsiveness.
I also think it's missing an implicit property, which we should make explicit:
- Can Tor users be fingerprinted by their set of guards or directory guards?
Perhaps this property is out of scope.
I think it is worth considering. We should add it if we need to do another round of evaluation.
Alright, for the sake of argument, let's call this Property #8: 8. Less information from guard fingerprinting (the least information)
I argue that this #8 is also equivalent to a #9 that Roger would ask for: 9. Fewer points of observation into the network (the fewest points).
If we are actually aiming for 8 and 9 we need to do something about the numdirguard=3 situation, otherwise we still have a huge guard fpr and we still expose ourselves to more of the network even if we keep one guard.
Yeah. Hrmm. I suppose this is a way that property #8 differs from property #9... The dirguard usage increases fingerprinting, but if observation for #9 means "observation of relayed application traffic", then not setting the dirguards to 1 costs us #8, but not #9.
To avoid TL;DR, that argument is an exercise to the reader ;).
Here is a proposal that beats my previous proposal on Property #8 and #9, while trying to preserve as many of the other properties as possible:
- Set "num primary guards"=1 and "num primary guards to use"=1
- Set "num directory guards"=1 and "num directory guards to use"=1
- Don't give Exit nodes the Guard flag.
- Allow "same node, same /16, same family" between guard and last hop, but only for HS circuits (which are at least 4 hops).
- Allow same /16 and same family for HS circuits.
This's for all hops? So all service-side HS circ hops can share the same family? I guess that's OK since we don't know what's happening on the other side of the HS circuit anyhow? Or what?
Yeah, that was my reasoning for defining property #7 in terms of Exit traffic only. There may be alterations of this that prevent the same family from being in every position of one end of the circuit, but since we can't prevent the case where the same family is on both entry points across the entire HS connection to correlate the entire circuit, I am not sure how to define this property.
Maybe there is a difference if the same family is allowed to be the IP and HSDIR, though, since that could allow forced correlation to deanonymize the HS itself... We could consider preventing that. With one guard, it definitely will leak information about the choice of IPs over time, though, which is worse (and is the case today :/). With two guards chosen from different families and /16, it should be fine with respect to chosen IPs and used HSDIRs, except in the event that one guard's downtime happens at the same time as an IP or HSDIR is chosen from the same family as the still-up guard. This is a much more rare and less risky event than the similar situation with an RP, though (since the RP cycles frequently and can be adversary controlled).
- When a primary guard leaves the consensus, pick a new one.
- When a primary guard fails circuits, do $MAGIC_FAILURE_HEURISTIC.
What is the $MAGIC_FAILURE_HEURISTIC supposed to do? Also I doubt we can do anything magic here, we even have trouble doing very naive stuff when it comes to network-uptime response.
In order to preserve property #8 (and #9), this failure heuristic has to try really hard not to quickly switch over to the second guard as soon as there is a RESOURCELIMIT or other failure. It needs to be "sure" that the guard is really down. This means waiting for some number of RESOURCELIMITs or other failures to happen before the switch to the second guard, which necessarily introduces some level of downtime signal, which costs us property #4. (We already have decided in https://trac.torproject.org/projects/tor/ticket/25347 that it is preferable to accept large amounts of RESOURCELIMITs before switching guards.)
That was the point of this proposal -- I wanted to demonstrate that with only one guard, we basically have to accept either a louder downtime signal, or we have to accept cases where we use two guards more often.
I still believe that two always-on guards is the better choice (and gives us more flexibility with alternate ways to handle things like family restrictions above), but I also wanted to compare apples to apples in terms of one guard vs two guard proposals.
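The kind of failure heuristic discussed above ("wait for some number of RESOURCELIMITs or other failures before the switch") could be sketched roughly as follows. This is not what tor does; the class name, the threshold value, and the choice to reset the count on any success are assumptions made for illustration.

```python
class GuardFailureTracker:
    """Counts failures on the first guard and only gives up after a threshold."""

    def __init__(self, threshold: int):
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self.failures = 0

    def record_failure(self) -> bool:
        """Note one failure; return True once we should switch guards."""
        self.failures += 1
        return self.failures >= self.threshold

    def record_success(self) -> None:
        """Assumption: any successful circuit resets the failure count."""
        self.failures = 0


# Hypothetical threshold: tolerate two failures, switch on the third in a row.
tracker = GuardFailureTracker(threshold=3)
for attempt in range(1, 4):
    if tracker.record_failure():
        print(f"attempt {attempt}: switch to the second guard")
    else:
        print(f"attempt {attempt}: keep trying the first guard")
```

A higher threshold keeps the client on one guard for longer (helping #8 and #9) but lengthens the observable outage (hurting #4); a lower threshold does the opposite.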
This proposal gets strong:
- Hidden service use can't push you over to an unused guard (at all).
- Hidden service use can't influence your choice of guard (at all).
- Exits and websites can't push you over to an unused guard (at all)
- Less information from guard fingerprinting (the least information)
It loses #4 (and your reliability point above), because if we transition to a second guard too quickly when the first one starts failing, then we lose the winning fingerprinting property we want to keep. Therefore, we must tolerate failure and RESOURCELIMIT issues and suffer through connectivity issues during DoS: 4. DoS/Guard node downtime signals are rare (absent)
It then gets us regular: 5. Nodes are not reused for Guard and Exit positions ("any" positions) 6. Information about the guard(s) does not leak to the website/RP (at all). 7. Relays in the same family can't be forced to correlate Exit traffic.
And again, we could get strong #6 if we allow the guard node for both RP and the node before the RP: 6. Information about the guard(s) does not leak to the website/RP (at all).
So the key thing (in this property list) that forcing one guard causes us to lose is reliability under DoS, which is a guard discovery vector (and probably a source of other side channels, too).
Mike Perry mikeperry@torproject.org writes:
Mike Perry:
Heyo.
We're going to have a meeting to discuss Proposal 291. See this thread: https://lists.torproject.org/pipermail/tor-dev/2018-April/013053.html
Ok, we had this meeting. High level (amended) action items are:
- Use patches in https://trac.torproject.org/projects/tor/ticket/25843 to set NumEntryGuards=2 in torrc, and observe results. Please join us! Stuff we are looking for during testing is on that ticket!
- Merge that patch to make the torrc guard options do what we meant for them to do. Probably backport it.
- Describe adversary models for our variant proposals from the notes. (Why do we disagree? In Mike's case, my disagreements are because I think each step is an improvement over previous/status quo -- we can decide harder things later and still do better both now and later.)
- Agree on an order of operations for fixes+changes, ideally such that we don't block forever trying to come up with a perfect solution. Things are pretty bad now. All we really need to do is agree on steps to make it better.
<snip>
Concrete things we can do now:
#1: ourselves set those guard params to 2 and find bugs. once #3 below is done, encourage others, like on tor-talk, to do it too.
#2: enumerate the current situations where we use a guard other than our first guard, especially noting the ones where the attacker can make us use a guard other than our first guard. fix as many as we want to fix. maybe categorize by whether they cause us to mark our first guard as down or not.
OK, I did a bit of #2 yesterday as part of an IRC discussion with Mike and Roger. In particular, I attempted to enumerate the places in our codebase where we mark a guard as unreachable and hence skip it for future circuits.
The key functions here are entry_guard_failed() and entry_guard_chan_failed(). These are called in the following places:
1) circuit_build_failed(): We blame the guard if there was an error during path building when we don't have the first hop open on the circuit yet. We don't blame the guard for errors during path selection.
2) connection_dir_request_failed(): We blame the guard if we fail to connect to a dirserver because of network error.
3) connection_or_about_to_close(): We blame the guard when we are closing an OR connection that started at us but never made it to state open. We do this because otherwise we would keep beating our heads against a broken guard.
4) connection_or_client_learned_peer_id(): We blame the guard when we receive the wrong RSA identity key from the guard during the TLS handshake.
The first 3 cases here seem to handle the cases of network errors and unreachable guards. It's interesting how we have to handle this case in three different places. I wonder if we are missing any other places here.
The last case seems to handle the case of network MITM attacks. I don't see anything wrong with that, since encountering an MITM certainly means that something bad is going on, and an MITM adversary could also cause one of the first 3 cases.
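The enumeration above can be kept as a small lookup table while auditing further call sites. This is only a bookkeeping sketch: the function names are the ones listed above, but the two cause labels and the helper name are hypothetical groupings, not anything in the tor codebase.

```python
# Call sites of entry_guard_failed() / entry_guard_chan_failed() listed above,
# tagged with the (hypothetical) cause category used in this discussion.
GUARD_BLAME_SITES = {
    "circuit_build_failed": "network",
    "connection_dir_request_failed": "network",
    "connection_or_about_to_close": "network",
    "connection_or_client_learned_peer_id": "mitm",
}


def sites_by_cause(cause: str) -> list:
    """Return the sorted call sites tagged with the given cause."""
    return sorted(name for name, c in GUARD_BLAME_SITES.items() if c == cause)


for cause in ("network", "mitm"):
    print(cause, "->", ", ".join(sites_by_cause(cause)))
```

Any newly found place that marks a guard as unreachable can be added to the table with its cause, which makes it easier to see which categories an attacker could trigger.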