I spent some time trying to clean up proposal 247 based on everyone's comments, as well as based on my own thoughts. Please have a look if you commented on the original proposal, and complain if I've not taken your thoughts into account.
(Aaron: In particular, I made several tradeoffs in favor of performance and DoS resistance that may be at odds with some of your suggestions, but I think the end result is still OK after looking into the Sybil rates and thinking about the adversary model in more detail. You may disagree).
I've attached my updated version of the proposal inline in this mail, but the canonical updated proposal is in my remote at: https://gitweb.torproject.org/user/mikeperry/torspec.git/tree/proposals/247-...
Here's a summary of the changes (which are also listed in Git):
* Try to make a coherent threat model and specify its assumptions
* Fold in my comments about using disjoint sets ("buckets") for the third level guard.
* Make the parameter discussion subsection its own section, and include tables with far more detail for the Sybil success rates.
* Put the rotation period in a separate subsection from the number of guards
* Switch to using min(X,X) and max(X,X) for the distribution for the second and third layer guard lifespans, respectively. Add a subsection describing this distribution (3.2.3)
* Changed the default parameters based on these tables, and based on my own intuition about Tor's performance properties.
* Move the load balancing, torrc, and other performance considerations to their own section (Section 5).
* Move "3.2. Distinguishing new HS circuits from normal HS circuits" to section 4.1.
* Fold in some of "3.3. Circuit nodes can now be linked to specific hidden services" into 4.1. Some of it I just removed, though, because I did not find it credible.
* Added Roger's concerns about guard linkability to Section 4.2.
* Added a denial of service subsection to Section 4.3.
================================
Filename: 247-hs-guard-discovery.txt
Title: Defending Against Guard Discovery Attacks using Vanguards
Author: George Kadianakis
Created: 2015-07-10
Status: Draft
0. Motivation
A guard discovery attack allows attackers to determine the guard node of a Tor client. The hidden service rendezvous protocol provides an attack vector for guard discovery, since anyone can force an HS to construct a 3-hop circuit to a relay (#9001).
Following the guard discovery attack with a compromise and/or coercion of the guard node can lead to the deanonymization of a hidden service.
1. Overview
This document tries to make the above guard discovery + compromise attack harder to launch. It introduces a configuration option which makes the hidden service also pin the second and third hops of its circuits for a longer duration.
With this new path selection, we force the adversary to perform a Sybil attack and two compromise attacks before succeeding. This is an improvement over the current state where the Sybil attack is trivial to pull off, and only a single compromise attack is required.
With this new path selection, an attacker is forced to perform one or more node compromise attacks before learning the guard node of a hidden service. This increases the attacker's uncertainty: compromise attacks are costly and potentially detectable, so an attacker will have to think twice before beginning a chain of node compromises that they might not be able to complete.
1.1. Visuals
Here is what a hidden service rendezvous circuit currently looks like:
      -> middle_1 -> middle_A
      -> middle_2 -> middle_B
      -> middle_3 -> middle_C
      -> middle_4 -> middle_D
 HS -> guard -> middle_5 -> middle_E -> Rendezvous Point
      -> middle_6 -> middle_F
      -> middle_7 -> middle_G
      -> middle_8 -> middle_H
      -> ...      -> ...
      -> middle_n -> middle_n
This proposal pins the two middle nodes to a much more restricted set, as follows:
                   -> guard_3A_A
      -> guard_2_A -> guard_3A_B
                   -> guard_3A_C -> Rendezvous Point
 HS -> guard_1
                   -> guard_3B_D
      -> guard_2_B -> guard_3B_E
                   -> guard_3B_F -> Rendezvous Point
Note that the third level guards are partitioned into buckets such that they are only used with one specific second-level guard. In this way, we ensure that even if an adversary is able to execute a Sybil attack against the third layer, they only get to learn one of the second-layer Guards, and not all of them. This prevents the adversary from gaining the ability to take their pick of the weakest of the second-level guards for further attack.
2. Design
This feature requires the HiddenServiceGuardDiscovery torrc option to be enabled.
When a hidden service picks its guard nodes, it also picks two additional sets of middle nodes `second_guard_set` and `third_guard_set` of size NUM_SECOND_GUARDS and NUM_THIRD_GUARDS respectively for each hidden service. These sets are unique to each hidden service created by a single Tor client, and must be kept separate and distinct.
When a hidden service needs to establish a circuit to an HSDir, introduction point or a rendezvous point, it uses nodes from `second_guard_set` as the second hop of the circuit and nodes from that second hop's corresponding `third_guard_set` as third hops of the circuit.
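The bucket structure described above can be illustrated with a small sketch. This is not Tor's actual path selection (it ignores bandwidth weighting and guard flags, and all names are ours); it only shows how NUM_THIRD_GUARDS nodes can be partitioned into disjoint buckets, one per second-level guard:

  import random

  def pick_guard_sets(relays, num_second=4, num_third=16, seed=None):
      # Hypothetical sketch: draw distinct relays, then split the third-level
      # guards into one disjoint bucket per second-level guard.
      rng = random.Random(seed)
      picked = rng.sample(relays, num_second + num_third)
      second = picked[:num_second]
      third = picked[num_second:]
      per_bucket = num_third // num_second
      buckets = {g: third[i * per_bucket:(i + 1) * per_bucket]
                 for i, g in enumerate(second)}
      return second, buckets

A circuit through second-level guard g would then only ever use third hops drawn from buckets[g], so a Sybil win at the third layer exposes just that one second-level guard.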
A hidden service rotates nodes from the 'second_guard_set' at a random time between MIN_SECOND_GUARD_LIFETIME hours and MAX_SECOND_GUARD_LIFETIME hours.
A hidden service rotates nodes from the 'third_guard_set' at a random time between MIN_THIRD_GUARD_LIFETIME and MAX_THIRD_GUARD_LIFETIME hours.
These extra guard nodes should be picked with the same path selection procedure that is used for regular middle nodes (though see Section 5.1 for performance reasons to restrict this slightly).
Each node's rotation time is tracked independently, to avoid disclosing the rotation times of the primary and second-level guards.
XXX how should proposal 241 ("Resisting guard-turnover attacks") be applied here?
2.1. Security parameters
We set NUM_SECOND_GUARDS to 4 nodes and NUM_THIRD_GUARDS to 16 nodes (i.e., four sets of four). XXX: 3 and 12 might be another option here, in which case our rotation period for the second guard position can be reduced to 15 days.
We set MIN_SECOND_GUARD_LIFETIME to 1 day, and MAX_SECOND_GUARD_LIFETIME to 33 days, for an average rotation period of ~11 days, using the min(X,X) distribution specified in Section 3.2.3.

We set MIN_THIRD_GUARD_LIFETIME to 1 hour, and MAX_THIRD_GUARD_LIFETIME to 18 hours, for an average rotation period of ~12 hours, using the max(X,X) distribution specified in Section 3.2.3.
XXX make all the above consensus parameters? Yes. Very yes, especially if we decide to change the primary guard lifespan.
See Section 3 for more analysis on these constants.
3. Rationale and Security Parameter Selection
3.1. Threat model, Assumptions, and Goals
Consider an adversary with the following powers:
- Can launch a Sybil guard discovery attack against any node of a rendezvous circuit. The slower the rotation period of the node, the longer the attack takes. Similarly, the higher the percentage of the network is compromised, the faster the attack runs.
- Can compromise any node on the network, but this compromise takes time and potentially even coercive action, and also carries risk of discovery.
We also make the following assumptions about the types of attacks:
1. A Sybil attack is noisy. It will require either large amounts of traffic, multiple test circuits, or both.
2. A Sybil attack against the first or second layer guards will be noisier than one against the third layer guard, since a first or second layer Sybil attack requires a timing side channel to determine success, whereas success against the third layer is almost immediately obvious: the adversary's node will now be returned as a rend point for circuits to the hidden service in question.
3. As soon as the adversary is confident they have won the Sybil attack, an even more aggressive circuit building attack will allow them to determine the next node very fast (an hour or less).
4. The adversary is strongly disincentivized from compromising nodes that may prove useless, as node compromise is even more risky for the adversary than a Sybil attack in terms of being noticed.
Given this threat model, our security parameters were selected so that the first two layers of guards should be hard to attack using a Sybil guard discovery attack and hence require a node compromise attack. Ideally, we want the node compromise attacks to carry a non-negligible probability of being useless to the adversary by the time they complete.
On the other hand, the outermost layer of guards should rotate fast enough to _require_ a Sybil attack.
3.2. Parameter Tuning
3.2.1. Sybil rotation counts for a given number of Guards
The probability of Sybil success for Guard discovery can be modeled as the probability of choosing 1 or more malicious middle nodes for a sensitive circuit over some period of time.
  P(At least 1 bad middle) = 1 - P(All Good Middles)
                           = 1 - P(One Good middle)^(num_middles)
                           = 1 - (1 - c/n)^(num_middles)

where c/n is the fraction of the network the adversary has compromised.
In the case of Vanguards, num_middles is the number of Guards you rotate through in a given time period. This is a function of the number of vanguards in that position (v), as well as the number of rotations (r).
P(At least one bad middle) = 1 - (1 - c/n)^(v*r)
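This formula can be written as a one-line sketch (the function name is ours, not from the proposal):

  def sybil_success(c, v, r):
      # P(at least one bad middle) after r rotations of v vanguards,
      # with adversary compromise fraction c.
      return 1 - (1 - c) ** (v * r)

  # Example: a 1% adversary against a single guard position needs
  # 69 rotations before their success probability crosses 50%.

This matches the tables below: sybil_success(0.01, 1, 68) is just under 0.5, and sybil_success(0.01, 1, 69) just over.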
Here are detailed tables, in terms of the number of rotations required for a given Sybil success rate, for various numbers of guards.
1.0% Network Compromise:

 Sybil Success  One  Two  Three  Four  Five  Six  Eight  Nine  Ten  Twelve  Sixteen
 10%             11    6      4     3     3    2      2     2    2       1        1
 15%             17    9      6     5     4    3      3     2    2       2        2
 25%             29   15     10     8     6    5      4     4    3       3        2
 50%             69   35     23    18    14   12      9     8    7       6        5
 60%             92   46     31    23    19   16     12    11   10       8        6
 75%            138   69     46    35    28   23     18    16   14      12        9
 85%            189   95     63    48    38   32     24    21   19      16       12
 90%            230  115     77    58    46   39     29    26   23      20       15
 95%            299  150    100    75    60   50     38    34   30      25       19
 99%            459  230    153   115    92   77     58    51   46      39       29
5.0% Network Compromise:

 Sybil Success  One  Two  Three  Four  Five  Six  Eight  Nine  Ten  Twelve  Sixteen
 10%              3    2      1     1     1    1      1     1    1       1        1
 15%              4    2      2     1     1    1      1     1    1       1        1
 25%              6    3      2     2     2    1      1     1    1       1        1
 50%             14    7      5     4     3    3      2     2    2       2        1
 60%             18    9      6     5     4    3      3     2    2       2        2
 75%             28   14     10     7     6    5      4     4    3       3        2
 85%             37   19     13    10     8    7      5     5    4       4        3
 90%             45   23     15    12     9    8      6     5    5       4        3
 95%             59   30     20    15    12   10      8     7    6       5        4
 99%             90   45     30    23    18   15     12    10    9       8        6
10.0% Network Compromise:

 Sybil Success  One  Two  Three  Four  Five  Six  Eight  Nine  Ten  Twelve  Sixteen
 10%              2    1      1     1     1    1      1     1    1       1        1
 15%              2    1      1     1     1    1      1     1    1       1        1
 25%              3    2      1     1     1    1      1     1    1       1        1
 50%              7    4      3     2     2    2      1     1    1       1        1
 60%              9    5      3     3     2    2      2     1    1       1        1
 75%             14    7      5     4     3    3      2     2    2       2        1
 85%             19   10      7     5     4    4      3     3    2       2        2
 90%             22   11      8     6     5    4      3     3    3       2        2
 95%             29   15     10     8     6    5      4     4    3       3        2
 99%             44   22     15    11     9    8      6     5    5       4        3
The rotation counts in these tables were generated with:

  import math

  def count_rotations(c, v, success):
      r = 0
      while 1 - math.pow((1 - c), v * r) < success:
          r += 1
      return r
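Equivalently, the same count can be computed in closed form rather than by looping (a sketch, assuming 0 < c < 1 and success < 1; the function name is ours):

  import math

  def count_rotations_closed(c, v, success):
      # Smallest r with 1 - (1-c)^(v*r) >= success, i.e.
      # r >= log(1 - success) / (v * log(1 - c)).
      return math.ceil(math.log(1 - success) / (v * math.log(1 - c)))

  # e.g. count_rotations_closed(0.01, 1, 0.99) reproduces the 459 in
  # the 1% table above.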
3.2.2. Rotation Period
As specified in Section 3.1, the primary driving force for the third layer selection was to ensure that these nodes rotate fast enough that it is not worth trying to compromise them, because it is unlikely for compromise to succeed and yield useful information before the nodes stop being used. For this reason we chose 1 to 18 hours, with a weighted distribution (Section 3.2.3) causing the expected average to be 12 hours.
From the table in Section 3.2.1, it can be seen that this means that the Sybil attack will complete with near-certainty (99%) in 29*12 hours (14.5 days) for the 1% adversary, 3 days for the 5% adversary, and 1.5 days for the 10% adversary.
Since rotation of each node happens independently, the distribution of when the adversary expects to win this Sybil attack in order to discover the next node up is uniform. This means that on average, the adversary should expect that half of the rotation period of the next node is already over by the time that they win the Sybil.
With this fact, we choose our range and distribution for the second layer rotation to be short enough to cause the adversary to risk compromising nodes that are useless, yet long enough that a successful Sybil attack would be noticeable in terms of client activity. For this reason, we choose a minimum second-layer guard lifetime of 1 day, since this gives the adversary an expected minimum of 12 hours during which they can compromise a guard before it might be rotated. If the total expected rotation period is 11 days, then the adversary can expect overall to have 5.5 days remaining after completing their Sybil attack before a second-layer guard rotates away.
3.2.3. Rotation distribution
In order to skew the distribution of the third layer guard towards higher values, we use max(X,X) for the distribution, where X is a random variable that takes on values from the uniform distribution.
In order to increase the risk that compromised nodes prove useless, we skew the distribution of the second layer guard lifespan towards lower values, using min(X,X).
Here is a table of expectations (arithmetic means) for relevant ranges of X, where X is sampled uniformly from the integers 0 through Range-1.
The current choice for second-layer guards is noted with **, and the current choice for third-layer guards is noted with ***.
 Range  Min(X,X)  Max(X,X)
 10     2.85      6.15
 11     3.18      6.82
 12     3.51      7.49
 13     3.85      8.15
 14     4.18      8.82
 15     4.51      9.49
 16     4.84      10.16
 17     5.18      10.82***
 18     5.51      11.49
 19     5.84      12.16
 20     6.18      12.82
 21     6.51      13.49
 22     6.84      14.16
 23     7.17      14.83
 24     7.51      15.49
 25     7.84      16.16
 26     8.17      16.83
 27     8.51      17.49
 28     8.84      18.16
 29     9.17      18.83
 30     9.51      19.49
 31     9.84      20.16
 32     10.17**   20.83
 33     10.51     21.49
 34     10.84     22.16
 35     11.17     22.83
 36     11.50     23.50
 37     11.84     24.16
 38     12.17     24.83
 39     12.50     25.50
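These expectations can be reproduced by direct enumeration. The sketch below (our own, not from the proposal) assumes X ranges over the integers 0..Range-1, which matches the table entries (e.g. Range 32 gives the starred 10.17):

  def minmax_expectations(rng):
      # E[min(X,X)] and E[max(X,X)] for two independent draws of X,
      # X uniform on {0, ..., rng-1}.
      n = rng * rng
      e_min = sum(min(i, j) for i in range(rng) for j in range(rng)) / n
      e_max = sum(max(i, j) for i in range(rng) for j in range(rng)) / n
      return e_min, e_max

Note that e_min + e_max always equals 2*E[X] = rng - 1, since min and max of two draws together contain both values.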
4. Security concerns and mitigations
4.1. Mitigating fingerprinting of new HS circuits
By pinning the middle nodes of rendezvous circuits, we make it easier for all hops of the circuit to detect that they are part of a special hidden service circuit with varying degrees of certainty.
The Guard node is able to recognize a Vanguard client with a high degree of certainty because it will observe a client IP creating the overwhelming majority of its circuits to just a few middle nodes in any given 10-18 day time period.
The middle nodes will be able to tell with a variable certainty that depends both on their traffic volume and on the popularity of the service, because they will see a large number of circuits that tend to pick the same Guard and Exit.

The final nodes will be able to tell with a similar level of certainty that depends on their capacity and the service popularity, because they will see a lot of rend handshakes that all tend to have the same second hop.
The most serious of these is the Guard fingerprinting issue. When proposal xxx-padding-negotiation is implemented, services that enable this feature should use those padding primitives to create fake circuits to random middle nodes that are not their guards, in an attempt to look more like a client.
Additionally, if Tor Browser implements "virtual circuits" based on SOCKS username+password isolation in order to enforce the re-use of paths when SOCKS username+passwords are re-used, then the number of middle nodes in use during a typical user's browsing session will be proportional to the number of sites they are viewing at any one time. This is likely to be much lower than one new middle node every ten minutes, and for some users, may be close to the number of Vanguards we're considering.
This same reasoning is also an argument for increasing the number of second-level guards beyond just two, as it will spread the hidden service's traffic over a wider set of middle nodes, making it both easier to cover, and behave closer to a client using SOCKS virtual circuit isolation.
4.2. Hidden service linkability
Multiple hidden services on the same Tor instance should use separate second and third level guard sets; otherwise, an adversary can trivially determine that the two hidden services are co-located by inspecting their currently chosen rend point nodes.
Unfortunately, if the adversary is still able to determine that two or more hidden services are run on the same Tor instance through some other means, then they are able to take advantage of this fact to execute a Sybil attack more effectively, since there will now be an extra set of guard nodes for each hidden service in use.
For this reason, if Vanguards are enabled, and more than one hidden service is configured, the user should be advised to ensure that they do not accidentally leak that the two hidden services are from the same Tor instance.
4.3. Denial of service
Since it will be fairly trivial for the adversary to enumerate the current set of rend nodes for a hidden service, denial of service becomes a serious risk for Vanguard users.
For this reason, it is important to support a large number of third-level guards, to increase the amount of resources required to bring a hidden service offline by DoSing just a few Tor nodes.
5. Performance considerations
The switch to a restricted set of nodes will very likely cause significant performance issues, especially for high-traffic hidden services. If any of the nodes they select happen to be temporarily overloaded, performance will suffer dramatically until the next rotation period.
5.1. Load Balancing
Since the second and third level "guards" are chosen from the set of all nodes eligible for use in the "middle" hop (as per hidden services today), this proposal should not significantly affect the long-term load on various classes of the Tor network, and should not require any changes to either the node weight equations, or the bandwidth authorities.
Unfortunately, transient load is another matter, as mentioned previously. It is very likely that this scheme will increase instances of transient overload at nodes selected by high-traffic hidden services.
One option to reduce the impact of this transient overload is to restrict the set of middle nodes that we choose from to some percentage of the fastest middle-capable relays in the network. This may have some impact on load balancing, but since the total volume of hidden service traffic is low, it may be unlikely to matter.
5.2. Circuit build timeout
The adaptive circuit build timeout mechanism in Tor is what corrects for instances of transient node overload right now.
The timeout will naturally tend to select the current fastest and least-loaded paths, even through this restricted set of routes. However, it may fail to behave correctly if each guard set is very small, since it is based on assumptions about the current path selection algorithm. It may need to be tuned specifically for Vanguards, especially if the set of possible routes is small.
5.3. OnionBalance
At first glance, it seems that this scheme makes multi-homed hidden services such as OnionBalance[1] even more important for high-traffic hidden services.
Unfortunately, if it is equally damaging to the user for any of their multi-homed hidden service locations to be discovered, then OnionBalance is strictly equivalent to simply increasing the number of second-level guard nodes in use, because an active adversary can perform simultaneous Sybil attacks against all of the rend points offered by the multi-homed OnionBalance introduction points.
5.4. Default vs optional behavior
We suggest that this torrc option be optional, because it changes path selection in a way that may seriously impact hidden service performance, especially for high-traffic services that happen to pick slow guard nodes.

However, having this setting disabled by default makes the hidden services that use it stand out a lot. For this reason, we should in fact enable this feature globally, but only after we verify its viability for high-traffic hidden services, and ensure that it is free of second-order load balancing effects.
Even after that point, until Single Onion Services are implemented, there will likely still be classes of very high traffic hidden services for whom some degree of location anonymity is desired, but for which performance is much more important than the benefit of Vanguards, so there should always remain a way to turn this option off.
6. Future directions
Here are some more ideas for improvements that should be done sooner or later:
- Maybe we should make the size and rotation period of the secondary/third guard sets configurable by the user.
- To make it harder for an adversary, a hidden service MAY extend the path length of its circuits by an additional static hop. This forces the adversary to use another coercion attack to walk the chain up to the hidden service.
7. Acknowledgments
Thanks to Aaron Johnson, John Brooks, Mike Perry and everyone else who helped with this idea.
1. https://onionbalance.readthedocs.org/en/latest/design.html#overview
On 14 Sep 2015, at 09:07, Mike Perry mikeperry@torproject.org wrote:
...
- Security concerns and mitigations
4.1. Mitigating fingerprinting of new HS circuits
By pinning the middle nodes of rendezvous circuits, we make it easier for all hops of the circuit to detect that they are part of a special hidden service circuit with varying degrees of certainty.
...
The most serious of these is the Guard fingerprinting issue. When proposal xxx-padding-negotiation is implemented, services that enable this feature should use those padding primitives to create fake circuits to random middle nodes that are not their guards, in an attempt to look more like a client.
How will this interact with the rate limiting in xxx-padding-negotiation section 4.1?
On 12 Sep 2015, at 10:34, Mike Perry mikeperry@torproject.org wrote:
4.1. Rate limiting
...
We recommend that three consensus parameters be used in the event that the network is being overloaded from padding to such a degree that padding requests should be ignored:
- CircuitPaddingMaxRatio
- The maximum ratio of padding traffic to non-padding traffic (expressed as a percent) to allow on a circuit before ceasing to pad. Ex: 75 means 75 padding packets for every 100 non-padding packets.
- Default: 100
- CircuitPaddingLimitCount
- The number of padding cells that must be transmitted before the ratio limit is applied.
- Default: 500
- CircuitPaddingLimitTime
- The time period in seconds over which to count padding cells for application of the ratio limit.
- Default: 60
XXX: Should we cap padding at these rates, or fully disable it once they're crossed?
If the rate limiting is being applied, that will limit fake middle circuits (with few non-padding packets) to ~500 cells per minute (~4KBytes per second).
Does CircuitPaddingLimitCount reset after CircuitPaddingLimitTime? (I can’t tell from the proposal, but I assume it has to reset, otherwise the limit is quite low, at 500 cells per fake circuit for its entire lifetime [plus whatever dribble it gets from non-padding cells].)
Are those consensus parameters intended to be always set, or just set when there is an issue with padding? (I can see arguments both ways, but having them always set could be useful as a precaution against a quick attack.)
Tim
Tim Wilson-Brown (teor)
teor2345 at gmail dot com PGP: 968F094B (ABFED1AC & A39A9058 expire 15 Sep 2015)
teor at blah dot im OTR CAD08081 9755866D 89E2A06F E3558B7F B5A9D14F (From 1 Sep 2015)
Tim Wilson-Brown - teor:
On 14 Sep 2015, at 09:07, Mike Perry mikeperry@torproject.org wrote:
...
- Security concerns and mitigations
4.1. Mitigating fingerprinting of new HS circuits
By pinning the middle nodes of rendezvous circuits, we make it easier for all hops of the circuit to detect that they are part of a special hidden service circuit with varying degrees of certainty.
...
The most serious of these is the Guard fingerprinting issue. When proposal xxx-padding-negotiation is implemented, services that enable this feature should use those padding primitives to create fake circuits to random middle nodes that are not their guards, in an attempt to look more like a client.
How will this interact with the rate limiting in xxx-padding-negotiation section 4.1?
On 12 Sep 2015, at 10:34, Mike Perry mikeperry@torproject.org wrote:
4.1. Rate limiting
...
We recommend that three consensus parameters be used in the event that the network is being overloaded from padding to such a degree that padding requests should be ignored:
- CircuitPaddingMaxRatio
  - The maximum ratio of padding traffic to non-padding traffic (expressed as a percent) to allow on a circuit before ceasing to pad. Ex: 75 means 75 padding packets for every 100 non-padding packets.
  - Default: 100
- CircuitPaddingLimitCount
  - The number of padding cells that must be transmitted before the ratio limit is applied.
  - Default: 500
- CircuitPaddingLimitTime
  - The time period in seconds over which to count padding cells for application of the ratio limit.
  - Default: 60
XXX: Should we cap padding at these rates, or fully disable it once they're crossed?
If the rate limiting is being applied, that will limit fake middle circuits (with few non-padding packets) to ~500 cells per minute (~4KBytes per second).
Hrmm. More importantly, 500 cells is only roughly 250KB, and apparently the average web page size has ballooned to as much as 1.6MB: http://www.websiteoptimization.com/speed/tweak/average-web-page/
I've boosted the default here to 5000, so we can give this case a chance at pretending to fetch web pages.
Does CircuitPaddingLimitCount reset after CircuitPaddingLimitTime? (I can’t tell from the proposal, but I assume it has to reset, otherwise the limit is quite low, at 500 cells per fake circuit for its entire lifetime [plus whatever dribble it gets from non-padding cells].)
The count is meant to reset after CircuitPaddingLimitTime. I've clarified this and pushed again to my padding_negotiation branch.
Are those consensus parameters intended to be always set, or just set when there is an issue with padding? (I can see arguments both ways, but having them always set could be useful as a precaution against a quick attack.)
The way consensus parameters work in the Tor client code is that they have a hardcoded default value that is used if they are not present in the consensus. If a consensus value is present, it overrides this hardcoded value.
Hence it makes sense to set the defaults fairly liberally, and we'd only reduce them in the consensus if we noticed overload/abuse at relays.
Mike Perry:
I spent some time trying to clean up proposal 247 based on everyone's comments, as well as based on my own thoughts. Please have a look if you commented on the original proposal, and complain if I've not taken your thoughts into account.
I spent yet more time thinking about the new threat model and what the adversary's expectation for how long they will have after the Sybil compromise for the third guard completes before the second Guard rotates away, and I have some awesome results. It turns out that if we make all node rotation times fully independent, then the point at which the adversary wins the Sybil attack on layer 3 will be uniformly distributed with respect to the rotation period of the layer two guards.
With the parameters I specified, this means that when you sum the total remaining rotation expectations, the adversary will have at least a 19% probability that they will have less than a day remaining before the layer two guard changes on them after they win the Sybil, and at least a 32% probability that they will have less than two days before this happens.
IMO, this is great news, as it shows that we can indeed force the adversary to risk compromising/coercing nodes that end up having no utility to them with high probability.
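As a rough sanity check of these figures, here is our own Monte Carlo sketch (not from the proposal). It uses a continuous approximation of the 1-33 day min(X,X) second-layer lifespan rather than the exact discrete CDF, so it lands slightly below the quoted 19%/32%, but in the same ballpark:

  import random

  def remaining_time_probs(trials=200000, lo=1.0, hi=33.0, seed=7):
      random.seed(seed)
      under_1 = under_2 = 0
      for _ in range(trials):
          # Second-layer lifespan in days, skewed low via min(X,X).
          d = lo + min(random.random(), random.random()) * (hi - lo)
          # The layer-3 Sybil win lands uniformly within that lifespan.
          remaining = random.random() * d
          under_1 += remaining < 1.0
          under_2 += remaining < 2.0
      return under_1 / trials, under_2 / trials

Under these simplifying assumptions this yields roughly 16% and 30% for less than one and two days remaining, respectively.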
I've added these results to Section 3.2.3 of the proposal in my remote, and also added the full python script used to generate all tables as Appendix A.
Here's the new piece of Section 3.2.3: https://gitweb.torproject.org/user/mikeperry/torspec.git/tree/proposals/247-...
And Appendix A with the table generation script: https://gitweb.torproject.org/user/mikeperry/torspec.git/tree/proposals/247-...
I also noted in Section 6 that we may want to consider enforcing different GeoIP countries for the first and second layer guards. Against some adversaries, this may increase the expenditure of the attack, but in others it may simply leak info to the adversary that they can use for other attacks. Not sure what the right tradeoff is here.
The git commit of the proposal overhaul I posted to the list was 0d676335088bce2954a68b77e2fca4347ec7ae3d. The new head is currently 158d3b22e27f032ed64091d25205fb941b352e3d. The full history is still preserved in my guard_discovery_dev branch in my remote.
Mike Perry mikeperry@torproject.org writes:
Mike Perry:
I've added these results to Section 3.2.3 of the proposal in my remote, and also added the full python script used to generate all tables as Appendix A.
Here's the new piece of Section 3.2.3: https://gitweb.torproject.org/user/mikeperry/torspec.git/tree/proposals/247-...
Hm. how come the CDF is defined this way?
Specifically, why do we do the following?
For durations d greater than t days, we take the fraction of that rotation period's selection probability and multiply it by t/d and add it to the density. In other words:
I'm more familiar with the CDF definition used in commit acf86d7 but changed afterwards.
George Kadianakis:
Mike Perry mikeperry@torproject.org writes:
Here's the new piece of Section 3.2.3: https://gitweb.torproject.org/user/mikeperry/torspec.git/tree/proposals/247-...
Hm. how come the CDF is defined this way?
Specifically, why do we do the following?
For durations d greater than t days, we take the fraction of that rotation period's selection probability and multiply it by t/d and add it to the density. In other words:
I'm more familiar with the CDF definition used in commit acf86d7 but changed afterwards.
In an attempt to clarify this, we're not "doing" anything new. As the XXX comment in acf86d7 hints at, the new CDF is a more accurate representation of the probability that a certain time remains after the Sybil is successful, but before the second-level guard rotates.
Consider the single question: "What happens if the second-level guard was chosen with a 30 day lifespan? What is the probability in that single case of only 1 day remaining after a Sybil wins when the second-level guard has a 30 day lifespan?"
The answer to this single case is 1/30. Because the Sybil has a uniform probability of winning at any given time point in that 30 day lifespan, the probability of there being less than or equal to 1 day remaining at the point of Sybil success is 1/30 of that 30 day lifespan.
The CDF comes from summing up all of this fractional probability for rotation periods greater than 1 day, and adding the entire selection probability for 1 day.
Does this make any more sense, or less? Not really sure how to clarify the proposal text to explain this. Here is the original text again:
Because the Sybil attack on the third node is expected to complete at any point in the second node's rotation period with uniform probability, if we want to know the probability that a second-level Guard node will still be in use after t days, we simply sum up the fractional probability density for all rotation durations. For rotation durations less than t days, we add the entire probability mass for that period to the density function. For durations d greater than t days, we take the fraction of that rotation period's selection probability and multiply it by t/d and add it to the density. In other words:
def FullCDF(N, ProbFunc, t):
    density = 0.0
    for d in xrange(N):
        if t >= d:
            density += ProbFunc(N, d)
        # The +1's below compensate for 0-indexed arrays:
        else:
            density += ProbFunc(N, d)*(float(t+1))/(d+1)
    return density
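As a quick sanity check of that summation, here is my own toy sketch in Python 3, with a hypothetical uniform duration distribution standing in for the proposal's min(X,X) one (the names here are mine, not from Appendix A):

```python
def uniform_prob(N, d):
    # Hypothetical stand-in pmf: each of the N rotation lengths is
    # equally likely. The real proposal uses a min(X,X) distribution.
    return 1.0 / N

def full_cdf(N, prob_func, t):
    """P(<= t days remain before rotation), durations 0-indexed as above."""
    density = 0.0
    for d in range(N):
        if t >= d:
            # Rotation periods of t days or less contribute their whole mass.
            density += prob_func(N, d)
        else:
            # Longer periods of d+1 days contribute only the fraction of
            # the time that the Sybil lands within t+1 days of rotation.
            density += prob_func(N, d) * float(t + 1) / (d + 1)
    return density

# The resulting CDF is nondecreasing and reaches 1.0 at the longest duration.
cdf = [full_cdf(30, uniform_prob, t) for t in range(30)]
```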
While this is a very interesting idea, I wonder if it can actually hold true with the parameters in the proposal.
Let's say I'm an attacker that just Sybil'ed the third hop. I can trivially verify my success by doing a traffic confirmation attack. Since I own the third hop, I now learn the second hop.
I don't know how much time is left before the second hop rotates but I know that I have to compromise it somehow. So I start preparing my compromise attack which might take a few hours or days (by writing my exploit, getting a warrant, or ordering my goons to visit the data center).
As an attacker, before launching my compromise attack (actually launching the exploit, or entering the data center), I would again run my confirmation attack to ensure that the second hop hasn't rotated.
If the second hop has rotated in the meanwhile, tough luck. As the attacker, I will need to rerun my Sybil attack and start from scratch.
However, if the second hop has not rotated, then I can launch my compromise attack, which usually takes a few minutes to hours to complete. Those minutes or hours *are* the moments of uncertainty for the attacker, since if the hop rotates during that time the attacker is screwed (they just compromised something useless).
Unfortunately, with the values in our proposal, I don't think we can rely on the second hop rotating during those crucial minutes/hours.
What I'm trying to say is that only _very_ unlucky or stupid attackers will end up compromising something useless. What I consider more likely is that if it takes a while to prepare the attack (like write a whole exploit), then maybe the second hop rotates during the preparation time, but that won't expose the attacker.
I'm trying to not be negative here, because I actually like the skewed rotation distribution and everything. I'm just not sure how much we can rely on this.
=================================================================================
Of course the ideal thing here would be to rotate the second-hop a few hours after we rotate the third-hop. That would give the attacker a very small window of compromise.
Unfortunately, since we want the third-hop to rotate fast, this means that the second-hop would also have to rotate fast. This is unacceptable since that would make the second hop vulnerable to Sybil attacks which is very bad (since the attacker would jump directly to the second hop with a Sybil attack).
I think you're right here. The attacker doesn't so much risk compromising something useless as they risk doing whatever prep work against a specific node for no reason. They only risk compromising something useless if the side channel attack *after* compromise takes a significant amount of time, but I think you're right in suspecting that it will not (my own guess is that such a side channel will probably take an hour or less). I'll update the proposal prose to clarify this.
Also note that the CDF I calculated is also an approximation based on discrete 1 day time values. In reality, every unit of time that goes by while the adversary prepares their attack adds to the probability that the node they are targeting will have rotated away during this prep time.
Again, as an approximation, the 1-day-resolution CDF tells us that the probability of the node having less than or equal to 1 day remaining in its rotation period is 19%. The rate of additional probability accumulation is not linear, but at least for that first day, basically every hour that goes by, a little less than 1% probability that the node is now useless has been added (give or take).
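For the back-of-the-envelope version of that claim (my own arithmetic, assuming the 19% first-day mass accrues roughly uniformly across its 24 hours):

```python
# "A little less than 1% per hour": spread the first day's probability
# mass (19.095%, per the 1-day-resolution CDF) evenly over 24 hours.
first_day_mass = 0.19095         # P(<= 1 day remaining before rotation)
per_hour = first_day_mass / 24   # ~0.008, i.e. a bit under 1% per hour
```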
I debated trying to calculate the actual accumulation of probability per hour, but decided against it because it was too cumbersome and confusing to switch time units. I can still try to do this if we think it might help us choose parameters? Who knows, maybe different rotation parameters (such as a minimum of 0.5 days?) will accumulate more probability for the initial hours than others, and knowing that might help guide us (because as you state, those initial few hours are the most important ones).
Let me know what you think about this, and if it is worthwhile or would just confuse things further.
Hi Mike,
Here are some specific comments on the proposal, most of which I didn’t mention at the session yesterday.
Sec. 2: “When a hidden service picks its guard nodes, it also picks two additional sets of middle nodes `second_guard_set` and `third_guard_set` of size NUM_SECOND_GUARDS and NUM_THIRD_GUARDS respectively for each hidden service.” appears to conflict with “When a hidden service needs to establish a circuit to an HSDir, introduction point or a rendezvous point, it uses nodes from `second_guard_set` as the second hop of the circuit and nodes from that second hop's corresponding `third_guard_set` as third hops of the circuit”. I think the former statement should be amended to describe how the third sets are chosen for each guard in the second set.
Sec. 3.1: “A Sybil attack is noisy”. “Noisy” in the sense that it isn’t perfectly accurate or “noisy” in the sense that it is observable?
Sec. 3.1: “the Sybil success is almost immediately obvious to third layer guard, since it will now be returned as a rend point for circuits for the hidden service in question”. The third-layer guard isn’t a rendezvous point. So how is it immediately obvious to the adversary when their relay is selected as a third-layer guard?
Sec. 3.2.3: I’m not sure that FullCDF() correctly computes the rotation CDF (even approximately). It should take into account the probability of observing a previous node that has selected the given rotation time (longer rotation times are more likely to be observed). So the probability that there are exactly t days until rotation of the observed node should be proportional to \sum_{i=t}^N Pr[X=i]. After normalization to make this a probability, the expression is (\sum_{i=t}^N Pr[X=i]) / (\sum_{i=1}^N Pr[X=i]*i).
Best, Aaron
On Sep 14, 2015, at 10:47 PM, Mike Perry mikeperry@torproject.org wrote:
Aaron Johnson:
Hi Mike,
Here are some specific comments on the proposal, most of which I didn’t mention at the session yesterday.
Sec. 2: “When a hidden service picks its guard nodes, it also picks two additional sets of middle nodes `second_guard_set` and `third_guard_set` of size NUM_SECOND_GUARDS and NUM_THIRD_GUARDS respectively for each hidden service.” appears to conflict with “When a hidden service needs to establish a circuit to an HSDir, introduction point or a rendezvous point, it uses nodes from `second_guard_set` as the second hop of the circuit and nodes from that second hop's corresponding `third_guard_set` as third hops of the circuit”. I think the former statement should be amended to describe how the third sets are chosen for each guard in the second set.
Sec. 3.1: “A Sybil attack is noisy”. “Noisy” in the sense that it isn’t perfectly accurate or “noisy” in the sense that it is observable?
Sec. 3.1: “the Sybil success is almost immediately obvious to third layer guard, since it will now be returned as a rend point for circuits for the hidden service in question”. The third-layer guard isn’t the a rendezvous point. So how it is immediately obvious to the adversary when their relay is selected as a third-layer guard?
Ok, I have corrected these in https://gitweb.torproject.org/user/mikeperry/torspec.git/commit/proposals/24...
I have not yet cleaned up the proposal with the discussion from the dev meeting. It still is in there in the form of XXX's.
Sec. 3.2.3: I’m not sure that FullCDF() correctly computes the rotation CDF (even approximately). It should take into account the probability of observing a previous node that has selected the given rotation time (longer rotation times are more likely to be observed). So the probability that there are exactly t days until rotation of the observed node should be proportional to \sum_{i=t}^N Pr[X=i]. After normalization to make this a probability, the expression is (\sum_{i=t}^N Pr[X=i]) / (\sum_{i=1}^N Pr[X=i]*i).
I understand what you're talking about here, and there is indeed a mistake in my original CDF calculations. I forgot to account for the non-uniform time durations of the rotations (ie: weighting their selection probability proportionally by how long the rotation duration is).
However, I came up with a different expression than you did for doing that. I believe that if you pick a time point uniformly at random, then the time-weighted probability of the rotation duration being r is:
P(R==r) = ProbMinXX(N,r)*r / \sum_{i=1}^N ProbMinXX(N,i)*i
Here's the commit that updates the proposal with this explanation in more detail, and uses this new time-weighted P(R==r) to compute the CDF: https://gitweb.torproject.org/user/mikeperry/torspec.git/commit/proposals/24...
As you can see, this does reduce the CDF rate of increase, as you might expect from taking this into account. I'm not sure what this change means with respect to how we pick the constants. I still need to think about that (but that's probably best withheld until I review all of the notes from the dev meeting and take them into consideration first).
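To make that time-weighted expression concrete, here is a small sketch of my own (using a min-of-two-uniform-draws pmf over durations 1..N as a stand-in for the Appendix A ProbMinXX, with 1-based durations):

```python
def prob_min_xx(N, r):
    # P(min(X1, X2) == r) for X1, X2 independent uniform on {1, ..., N}.
    return ((N - r + 1)**2 - (N - r)**2) / float(N * N)

def prob_r(N, r, pmf=prob_min_xx):
    # Time-weighted P(R == r): weight each duration's selection
    # probability by its length, since a uniformly chosen instant is
    # proportionally more likely to fall inside a longer rotation period.
    total = sum(pmf(N, i) * i for i in range(1, N + 1))
    return pmf(N, r) * r / total
```

Both the pmf and the reweighted distribution sum to 1, and the reweighting shifts mass toward the longer durations, which is why the resulting CDF rises more slowly.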
FWIW: Your P(R==r) expression actually causes the CDF to increase *faster* (so that the probabilities are higher than before at lower t values), which makes me more certain that it's not doing the right thing.
Here's a quick table of the first few values of P(SECOND_ROTATION <= t):
  t   OldP(R <= t)   YourP(R <= t)   MyP(R <= t)
  1     0.19095        0.24788        0.07701
  2     0.32222        0.40624        0.15403
  3     0.42456        0.52261        0.22829
Here's the python I used to implement your ProbR. You can replace my ProbR in the script in Appendix A of the proposal to reproduce those numbers (and also verify I understood your expression correctly):
def ProbR(N, r, ProbFunc=ProbMinXX):
    sum1 = 0.0
    i = r+1
    while i < N:
        sum1 += ProbFunc(N, i)
        i += 1
    return sum1/ExpFn(N, ProbFunc)
Here's the current head of the full proposal, for convenience: https://gitweb.torproject.org/user/mikeperry/torspec.git/tree/proposals/247-...
Hi,
I like this very much. See comments inline.
On 9/14/2015 2:07 AM, Mike Perry wrote:
Filename: 247-hs-guard-discovery.txt
Title: Defending Against Guard Discovery Attacks using Vanguards
Author: George Kadianakis
Created: 2015-07-10
Status: Draft
0. Motivation
A guard discovery attack allows attackers to determine the guard node of a Tor client. The hidden service rendezvous protocol provides an attack vector for a guard discovery attack since anyone can force an HS to construct a 3-hop circuit to a relay (#9001).
Following the guard discovery attack with a compromise and/or coercion of the guard node can lead to the deanonymization of a hidden service.
1. Overview
This document tries to make the above guard discovery + compromise attack harder to launch. It introduces an optional configuration option which makes the hidden service also pin the second and third hops of its circuits for a longer duration.
With this new path selection, we force the adversary to perform a Sybil attack and two compromise attacks before succeeding. This is an improvement over the current state where the Sybil attack is trivial to pull off, and only a single compromise attack is required.
With this new path selection, an attacker is forced to do one or more node compromise attacks before learning the guard node of a hidden service. This increases the uncertainty of the attacker, since compromise attacks are costly and potentially detectable, so an attacker will have to think twice before beginning a chain of node compromise attacks that he might not be able to complete.
1.1. Visuals
Here is what a hidden service rendezvous circuit currently looks like:
                 -> middle_1 -> middle_A
                 -> middle_2 -> middle_B
                 -> middle_3 -> middle_C
                 -> middle_4 -> middle_D
    HS -> guard  -> middle_5 -> middle_E -> Rendezvous Point
                 -> middle_6 -> middle_F
                 -> middle_7 -> middle_G
                 -> middle_8 -> middle_H
                 ->   ...    ->   ...
                 -> middle_n -> middle_n
This proposal pins the two middle nodes to a much more restricted set, as follows:
                                  -> guard_3A_A
                     -> guard_2_A -> guard_3A_B
                                  -> guard_3A_C -> Rendezvous Point
    HS -> guard_1
                                  -> guard_3B_D
                     -> guard_2_B -> guard_3B_E
                                  -> guard_3B_F -> Rendezvous Point
Note that the third level guards are partitioned into buckets such that they are only used with one specific second-level guard. In this way, we ensure that even if an adversary is able to execute a Sybil attack against the third layer, they only get to learn one of the second-layer Guards, and not all of them. This prevents the adversary from gaining the ability to take their pick of the weakest of the second-level guards for further attack.
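A rough sketch of that partitioning (the function and names here are hypothetical illustrations of the scheme described above, not code from the tor implementation):

```python
import random

def pick_guard_sets(relays, num_second=4, num_third=16):
    # Draw distinct middle relays for the second- and third-level sets.
    chosen = random.sample(relays, num_second + num_third)
    second, third = chosen[:num_second], chosen[num_second:]
    per_bucket = num_third // num_second
    # Each second-level guard gets its own disjoint bucket of
    # third-level guards, so a Sybil attack on layer 3 exposes only
    # that one second-level guard, not all of them.
    return {g: third[i * per_bucket:(i + 1) * per_bucket]
            for i, g in enumerate(second)}
```

A circuit built through second-level guard g would then pick its third hop only from g's bucket.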
2. Design
This feature requires the HiddenServiceGuardDiscovery torrc option to be enabled.
When a hidden service picks its guard nodes, it also picks two additional sets of middle nodes `second_guard_set` and `third_guard_set` of size NUM_SECOND_GUARDS and NUM_THIRD_GUARDS respectively for each hidden service. These sets are unique to each hidden service created by a single Tor client, and must be kept separate and distinct.
When a hidden service needs to establish a circuit to an HSDir, introduction point or a rendezvous point, it uses nodes from `second_guard_set` as the second hop of the circuit and nodes from that second hop's corresponding `third_guard_set` as third hops of the circuit.
A hidden service rotates nodes from the 'second_guard_set' at a random time between MIN_SECOND_GUARD_LIFETIME hours and MAX_SECOND_GUARD_LIFETIME hours.
A hidden service rotates nodes from the 'third_guard_set' at a random time between MIN_THIRD_GUARD_LIFETIME and MAX_THIRD_GUARD_LIFETIME hours.
These extra guard nodes should be picked with the same path selection procedure that is used for regular middle nodes (though see Section 5.1 for performance reasons to restrict this slightly).
Each node's rotation time is tracked independently, to avoid disclosing the rotation times of the primary and second-level guards.
XXX how should proposal 241 ("Resisting guard-turnover attacks") be applied here?
2.1. Security parameters
We set NUM_SECOND_GUARDS to 4 nodes and NUM_THIRD_GUARDS to 16 nodes (ie four sets of four). XXX: 3 and 12 might be another option here, in which case our rotation period for the second guard position can be reduced to 15 days.
4 nodes and 16 nodes are good values, based on your probability table. And 4 second-layer guards means more capacity as opposed to 3, meaning this can be applied to higher-traffic hidden services more easily.
We set MIN_SECOND_GUARD_LIFETIME to 1 day, and MAX_SECOND_GUARD_LIFETIME to 33 days, for an average rotation rate of ~11 days, using the min(X,X) distribution specified in Section 3.2.2.
We set MIN_THIRD_GUARD_LIFETIME to 1 hour, and MAX_THIRD_GUARD_LIFETIME to 18 hours, for an average rotation rate of ~12 hours, using the max(X,X) distribution specified in Section 3.2.2.
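For illustration, a sketch of drawing lifetimes from those distributions (my own code; it assumes min/max of two independent uniform draws as Section 3.2.2 describes, which matches the quoted ~11 day and ~12 hour averages):

```python
import random

def second_guard_lifetime_days(lo=1.0, hi=33.0):
    # min(X, X): skews second-level guard lifetimes toward the short
    # end. E[min] = lo + (hi - lo)/3 ~= 11.7 days.
    return min(random.uniform(lo, hi), random.uniform(lo, hi))

def third_guard_lifetime_hours(lo=1.0, hi=18.0):
    # max(X, X): skews third-level guard lifetimes toward the long
    # end. E[max] = lo + 2*(hi - lo)/3 ~= 12.3 hours.
    return max(random.uniform(lo, hi), random.uniform(lo, hi))
```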
I think I like the numbers. As you recall, I commented on this proposal when it was submitted by George and suggested a value as high as possible for NUM_THIRD_GUARDS, even considering making the third hop random all the time. The point was to defend against an active observer attack, not a Sybil attack, and I stick to my suggestion that we should not ignore its probability. While we want to rotate nodes slowly enough to make a Sybil attack hard to accomplish, we should also choose nodes from a big pool, in order to force an attacker to watch many places on the internet at once, which we assume is not trivial to do.
With these numbers in front of me and thinking from another perspective, I think an average lifetime of 12 hours for the third hop is ok. The main argument here is that, right now when second hop and third hop are always random, I am able to keep over 80% of my connections to hidden services alive for > 12 hours (in a single circuit). Please let's not make the average lifetime of the third hop more than 12 hours.
I was thinking the third hop should change faster, since we assume the rendezvous point is evil every time (it is selected by the client, and any client could be a potential attacker), and learning one third guard will allow the attacker to learn (at least one of) our second guards and climb the circuit just by watching the in and out traffic of that node, without compromising it in a special way. This is applicable in both cases anyway, so I think having a third guard with an average lifetime of 12 hours is a better defense against a Sybil attack and not a significant drawback against the other attack.
Anyway, if the third hop is random, an attacker can use an evil rendezvous point which only responds and builds circuits with evil middle nodes. This way, an attacker can increase his chances to learn (at least one of) the second guards by connecting to a hidden service at such a rendezvous point (which will only work with the attacker's compromised relays as third hops).
How do we mitigate this with third level guards? If we decide to never drop our third level guards if they can't build circuits with rendezvous points and prefer to scrap circuits entirely and go on, a hidden service would then have a nonzero chance to select bad nodes as third guards which will censor it by not establishing rendezvous circuits at all. If we decide to rotate the third level guards if they fail more than n% of the rendezvous circuits, an attacker can force us to rotate really quick by opening rendezvous circuits at evil rendezvous points that drop circuits from all relays, tricking the client to believe that the third nodes are broken.
XXX make all the above consensus parameters?

Yes. Very yes, especially if we decide to change the primary guard lifespan.
See Section 3 for more analysis on these constants.
Yes, very yes on consensus parameters. This offers us the ability to change these numbers with the least effort in case we come to understand them better at some time in the future.
Consensus parameters, with the possibility of being overridden by the client in torrc, as we currently have for NumEntryGuards.
3. Rationale and Security Parameter Selection
3.1. Threat model, Assumptions, and Goals
Consider an adversary with the following powers:
- Can launch a Sybil guard discovery attack against any node of a
rendezvous circuit. The slower the rotation period of the node, the longer the attack takes. Similarly, the larger the fraction of the network the adversary has compromised, the faster the attack runs.
- Can compromise any node on the network, but this compromise
takes time and potentially even coercive action, and also carries risk of discovery.
We also make the following assumptions about the types of attacks:
- A Sybil attack is noisy. It will require either large amounts of
traffic, multiple test circuits, or both.
- A Sybil attack against the second or first layer Guards will be
more noisy than a Sybil attack against the third layer guard, since the first and second layer Sybil attacks require a timing side channel in order to determine success, whereas success against the third layer is almost immediately obvious: the adversary's node will now be returned as a rend point for circuits to the hidden service in question.
- As soon as the adversary is confident they have won the Sybil
attack, an even more aggressive circuit building attack will allow them to determine the next node very fast (an hour or less).
- The adversary is strongly disincentivized from compromising
nodes that may prove useless, as node compromise is even more risky for the adversary than a Sybil attack in terms of being noticed.
Given this threat model, our security parameters were selected so that the first two layers of guards should be hard to attack using a Sybil guard discovery attack and hence require a node compromise attack. Ideally, we want the node compromise attacks to carry a non-negligible probability of being useless to the adversary by the time they complete.
On the other hand, the outermost layer of guards should rotate fast enough to _require_ a Sybil attack.
3.2. Parameter Tuning
3.2.1. Sybil rotation counts for a given number of Guards
The probability of Sybil success for Guard discovery can be modeled as the probability of choosing 1 or more malicious middle nodes for a sensitive circuit over some period of time.
  P(At least 1 bad middle) = 1 - P(All Good Middles)
                           = 1 - P(One Good Middle)^(num_middles)
                           = 1 - (1 - c/n)^(num_middles)

where c/n is the fraction of the network the adversary has compromised.
In the case of Vanguards, num_middles is the number of Guards you rotate through in a given time period. This is a function of the number of vanguards in that position (v), as well as the number of rotations (r).
P(At least one bad middle) = 1 - (1 - c/n)^(v*r)
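As a sanity check (a sketch, not part of the proposal), this formula can be evaluated directly and compared against the tables below:

```python
def sybil_success(c, v, r):
    """Probability that at least one of the v*r middle nodes chosen
    across r rotations of v vanguards is controlled by an adversary
    running fraction c (= c/n above) of the network."""
    return 1 - (1 - c) ** (v * r)

# e.g. a 1% adversary against a single guard needs 459 rotations
# before reaching 99% success, matching the first table below.
```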
Here are detailed tables giving the number of rotations required to reach a given Sybil success rate, for various numbers of guards.
 1.0% Network Compromise:
  Sybil Success   One   Two  Three  Four  Five  Six  Eight  Nine  Ten  Twelve  Sixteen
  10%              11     6     4     3     3    2      2     2    2      1       1
  15%              17     9     6     5     4    3      3     2    2      2       2
  25%              29    15    10     8     6    5      4     4    3      3       2
  50%              69    35    23    18    14   12      9     8    7      6       5
  60%              92    46    31    23    19   16     12    11   10      8       6
  75%             138    69    46    35    28   23     18    16   14     12       9
  85%             189    95    63    48    38   32     24    21   19     16      12
  90%             230   115    77    58    46   39     29    26   23     20      15
  95%             299   150   100    75    60   50     38    34   30     25      19
  99%             459   230   153   115    92   77     58    51   46     39      29

 5.0% Network Compromise:
  Sybil Success   One   Two  Three  Four  Five  Six  Eight  Nine  Ten  Twelve  Sixteen
  10%               3     2     1     1     1    1      1     1    1      1       1
  15%               4     2     2     1     1    1      1     1    1      1       1
  25%               6     3     2     2     2    1      1     1    1      1       1
  50%              14     7     5     4     3    3      2     2    2      2       1
  60%              18     9     6     5     4    3      3     2    2      2       2
  75%              28    14    10     7     6    5      4     4    3      3       2
  85%              37    19    13    10     8    7      5     5    4      4       3
  90%              45    23    15    12     9    8      6     5    5      4       3
  95%              59    30    20    15    12   10      8     7    6      5       4
  99%              90    45    30    23    18   15     12    10    9      8       6

 10.0% Network Compromise:
  Sybil Success   One   Two  Three  Four  Five  Six  Eight  Nine  Ten  Twelve  Sixteen
  10%               2     1     1     1     1    1      1     1    1      1       1
  15%               2     1     1     1     1    1      1     1    1      1       1
  25%               3     2     1     1     1    1      1     1    1      1       1
  50%               7     4     3     2     2    2      1     1    1      1       1
  60%               9     5     3     3     2    2      2     1    1      1       1
  75%              14     7     5     4     3    3      2     2    2      2       1
  85%              19    10     7     5     4    4      3     3    2      2       2
  90%              22    11     8     6     5    4      3     3    3      2       2
  95%              29    15    10     8     6    5      4     4    3      3       2
  99%              44    22    15    11     9    8      6     5    5      4       3
The rotation counts in these tables were generated with:

  import math

  def count_rotations(c, v, success):
      # Smallest number of rotations r for which the Sybil
      # success probability reaches the given threshold.
      r = 0
      while 1 - math.pow(1 - c, v*r) < success:
          r += 1
      return r
3.2.2. Rotation Period
As specified in Section 3.1, the primary driving force for the third layer selection was to ensure that these nodes rotate fast enough that it is not worth trying to compromise them, because it is unlikely for compromise to succeed and yield useful information before the nodes stop being used. For this reason we chose 1 to 18 hours, with a weighted distribution (Section 3.2.3) causing the expected average to be 12 hours.
From the table in Section 3.2.1, it can be seen that this means that the Sybil attack will complete with near-certainty (99%) in 29*12 hours (14.5 days) for the 1% adversary, 3 days for the 5% adversary, and 1.5 days for the 10% adversary.
Since rotation of each node happens independently, the distribution of when the adversary expects to win this Sybil attack in order to discover the next node up is uniform. This means that on average, the adversary should expect that half of the rotation period of the next node is already over by the time that they win the Sybil.
With this fact, we choose our range and distribution for the second layer rotation to be short enough to cause the adversary to risk compromising nodes that are useless, yet long enough to require a Sybil attack to be noticeable in terms of client activity. For this reason, we choose a minimum second-layer guard lifetime of 1 day, since this gives the adversary a minimum expected value of 12 hours during which they can compromise a guard before it might be rotated. If the total expected rotation period is 11 days, then the adversary can expect, on average, to have 5.5 days remaining after completing their Sybil attack before a second-layer guard rotates away.
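The "half the lifetime remains" reasoning above can be illustrated with a quick simulation (a sketch, not part of the proposal): if the adversary's Sybil win lands at a uniformly random point within a guard's lifetime, roughly half of that lifetime remains on average.

```python
import random

def avg_remaining(lifetime_days, trials=200_000):
    # The Sybil win occurs at a uniformly random point within the
    # guard's lifetime; measure the average time left afterwards.
    remaining = 0.0
    for _ in range(trials):
        win_time = random.uniform(0.0, lifetime_days)
        remaining += lifetime_days - win_time
    return remaining / trials

# With an 11-day expected lifetime, roughly 5.5 days remain on
# average; with a 1-day minimum lifetime, roughly 12 hours remain.
```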
3.2.3. Rotation distribution
In order to skew the distribution of the third layer guard lifetime towards higher values, we use max(X,X) for the distribution, where X is a random variable drawn from the uniform distribution.

In order to skew the distribution of the second layer guard lifetime towards lower values (increasing the risk that the adversary compromises nodes that are about to rotate away), we use min(X,X).
Here is a table of expectations (arithmetic means) for relevant ranges of X (sampled uniformly from the integers 0..N-1).
The current choice for second-layer guards is noted with **, and the current choice for third-layer guards is noted with ***.
  Range  Min(X,X)  Max(X,X)
   10      2.85      6.15
   11      3.18      6.82
   12      3.51      7.49
   13      3.85      8.15
   14      4.18      8.82
   15      4.51      9.49
   16      4.84     10.16
   17      5.18     10.82***
   18      5.51     11.49
   19      5.84     12.16
   20      6.18     12.82
   21      6.51     13.49
   22      6.84     14.16
   23      7.17     14.83
   24      7.51     15.49
   25      7.84     16.16
   26      8.17     16.83
   27      8.51     17.49
   28      8.84     18.16
   29      9.17     18.83
   30      9.51     19.49
   31      9.84     20.16
   32     10.17**   20.83
   33     10.51     21.49
   34     10.84     22.16
   35     11.17     22.83
   36     11.50     23.50
   37     11.84     24.16
   38     12.17     24.83
   39     12.50     25.50
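Treating X as uniform on the integers 0..N-1 (N equally likely values) reproduces this table exactly. A sketch (not part of the proposal) using the closed-form expectations:

```python
def minmax_expectations(n):
    """E[min(X,Y)] and E[max(X,Y)] for X, Y independent and
    uniform on the integers 0 .. n-1."""
    # P(max(X,Y) = k) = ((k+1)^2 - k^2) / n^2 = (2k+1) / n^2
    e_max = sum(k * (2 * k + 1) for k in range(n)) / n**2
    # min(X,Y) + max(X,Y) = X + Y, so E[min] + E[max] = n - 1
    e_min = (n - 1) - e_max
    return e_min, e_max

# minmax_expectations(10) gives (2.85, 6.15), matching the
# first row of the table above.
```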
4. Security concerns and mitigations
4.1. Mitigating fingerprinting of new HS circuits
By pinning the middle nodes of rendezvous circuits, we make it easier for all hops of the circuit to detect that they are part of a special hidden service circuit with varying degrees of certainty.
The Guard node is able to recognize a Vanguard client with a high degree of certainty because it will observe a client IP creating the overwhelming majority of its circuits to just a few middle nodes in any given 10-18 day time period.
The middle nodes will be able to tell with a variable certainty that depends on both their traffic volume and the popularity of the service, because they will see a large number of circuits that tend to pick the same Guard and Exit.
The final nodes will be able to tell with a similar level of certainty, depending on their capacity and the service's popularity, because they will see a large number of rend handshakes that all tend to have the same second hop.
The most serious of these is the Guard fingerprinting issue. When proposal xxx-padding-negotiation is implemented, services that enable this feature should use those padding primitives to create fake circuits to random middle nodes that are not their guards, in an attempt to look more like a client.
Additionally, if Tor Browser implements "virtual circuits" based on SOCKS username+password isolation in order to enforce the re-use of paths when SOCKS username+passwords are re-used, then the number of middle nodes in use during a typical user's browsing session will be proportional to the number of sites they are viewing at any one time. This is likely to be much lower than one new middle node every ten minutes, and for some users, may be close to the number of Vanguards we're considering.
This same reasoning is also an argument for increasing the number of second-level guards beyond just two, as it will spread the hidden service's traffic over a wider set of middle nodes, making it both easier to cover, and behave closer to a client using SOCKS virtual circuit isolation.
Increasing the number of second-level guards will indeed increase the false positives for fingerprinting, but it will also increase the success probability of a Sybil attack, so we need to balance these two effects somehow.
I think padding is vitally important in this context. We need 'fake' second-level guards and 'fake' third-level guards which we use only for padding circuits and not for real data transfer.
Fingerprinting is even easier at the third-level guards, which always sit directly adjacent to the rendezvous point; an attacker can run many malicious rendezvous points, and when connecting to a hidden service himself he can guarantee with 100% certainty that one of his own relays serves as the rendezvous point. If you combine this with the netflow data capture threat, there is a chance of locating a hidden service by analyzing the historic data and observing what came from where.
4.2. Hidden service linkability
Multiple hidden services on the same Tor instance should use separate second and third level guard sets; otherwise an adversary is trivially able to determine that the two hidden services are co-located by inspecting their currently chosen rend point nodes.
Unfortunately, if the adversary is still able to determine that two or more hidden services are run on the same Tor instance through some other means, then they are able to take advantage of this fact to execute a Sybil attack more effectively, since there will now be an extra set of guard nodes for each hidden service in use.
For this reason, if Vanguards are enabled, and more than one hidden service is configured, the user should be advised to ensure that they do not accidentally leak that the two hidden services are from the same Tor instance.
Yes, a notice / warning here makes sense.
4.3. Denial of service
Since it will be fairly trivial for the adversary to enumerate the current set of rend nodes for a hidden service, denial of service becomes a serious risk for Vanguard users.
For this reason, it is important to support a large number of third-level guards, to increase the amount of resources required to bring a hidden service offline by DoSing just a few Tor nodes.
There are multiple denial of service attacks possible in this context that would force a client to rotate its third-layer guards quickly, increasing the Sybil success probability. This is easy for the client to detect, but unfortunately it leaves us with only two options: a) rotate guards, and increase the Sybil success probability; or b) if too many guards fail, don't rotate, wait for our original choices to become able to handle the traffic again, and remain inaccessible until then.
5. Performance considerations
The switch to a restricted set of nodes will very likely cause significant performance issues, especially for high-traffic hidden services. If any of the nodes they select happen to be temporarily overloaded, performance will suffer dramatically until the next rotation period.
5.1. Load Balancing
Since the second and third level "guards" are chosen from the set of all nodes eligible for use in the "middle" hop (as per hidden services today), this proposal should not significantly affect the long-term load on various classes of the Tor network, and should not require any changes to either the node weight equations, or the bandwidth authorities.
Unfortunately, transient load is another matter, as mentioned previously. It is very likely that this scheme will increase instances of transient overload at nodes selected by high-traffic hidden services.
One option to reduce the impact of this transient overload is to restrict the set of middle nodes that we choose from to some percentage of the fastest middle-capable relays in the network. This may have some impact on load balancing, but since the total volume of hidden service traffic is low, it may be unlikely to matter.
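As an illustration of this idea (hypothetical code, not part of the proposal; the relay representation and the top_fraction parameter are invented for the example), restricting vanguard selection to the fastest portion of middle-capable relays might look like:

```python
import random

def pick_vanguards(relays, count, top_fraction=0.5):
    """Pick `count` distinct vanguards, bandwidth-weighted, from the
    fastest `top_fraction` of middle-capable relays.

    `relays` is a list of (fingerprint, bandwidth) tuples."""
    fastest = sorted(relays, key=lambda r: r[1], reverse=True)
    # Keep at least `count` candidates even if the fraction is tiny.
    pool = fastest[:max(count, int(len(fastest) * top_fraction))]
    chosen = []
    while len(chosen) < count:
        # Bandwidth-weighted draw without replacement.
        total = sum(bw for _, bw in pool)
        pick = random.uniform(0, total)
        for i, (fp, bw) in enumerate(pool):
            pick -= bw
            if pick <= 0:
                chosen.append(fp)
                del pool[i]
                break
    return chosen
```

This keeps selection bandwidth-weighted within the restricted pool, so the fastest relays still absorb proportionally more of the hidden service load.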
5.2. Circuit build timeout
The adaptive circuit build timeout mechanism in Tor is what corrects for instances of transient node overload right now.
The timeout will naturally tend to select the currently fastest and least-loaded paths, even through this restricted set of routes, but it may fail to behave correctly if each guard set contains only a very small number of nodes, since it is based upon assumptions about the current path selection algorithm. It may need to be tuned specifically for Vanguards, especially if the set of possible routes is small.
5.3. OnionBalance
At first glance, it seems that this scheme makes multi-homed hidden services such as OnionBalance[1] even more important for high-traffic hidden services.
Unfortunately, if it is equally damaging to the user for any of their multi-homed hidden service locations to be discovered, then OnionBalance is strictly equivalent to simply increasing the number of second-level guard nodes in use, because an active adversary can perform simultaneous Sybil attacks against all of the rend points offered by the multi-homed OnionBalance introduction points.
5.4. Default vs optional behavior
We suggest that this torrc option be optional, because it changes path selection in a way that may seriously impact hidden service performance, especially for high-traffic services that happen to pick slow guard nodes.
However, by having this setting disabled by default, we make hidden services that use it stand out a lot. For this reason, we should in fact enable this feature globally, but only after we verify its viability for high-traffic hidden services, and ensure that it is free of second-order load balancing effects.
Even after that point, until Single Onion Services are implemented, there will likely still be classes of very high traffic hidden services for whom some degree of location anonymity is desired, but for which performance is much more important than the benefit of Vanguards, so there should always remain a way to turn this option off.
Of course, we should always have the option to turn this on or off. I think it's better to make it optional, because I don't think clients reusing circuits based on SOCKS username+password authentication will be enough to hide this in the crowd.
6. Future directions
Here are some more ideas for improvements that should be done sooner or later:
- Maybe we should make the size and rotation period of the
secondary/third guard sets configurable by the user.
I think so, yes. Consensus parameters by default with option to be overwritten by user in torrc (same as we have for NumEntryGuards).
- To make it harder for an adversary, a hidden service MAY extend
the path length of its circuits by an additional static hop. This forces the adversary to perform another coercion attack to walk the chain up to the hidden service.
7. Acknowledgments
Thanks to Aaron Johnson, John Brooks, Mike Perry and everyone else who helped with this idea.
[1] https://onionbalance.readthedocs.org/en/latest/design.html#overview
Mike Perry:
I spent some time trying to clean up proposal 247 based on everyone's comments, as well as based on my own thoughts. Please have a look if you commented on the original proposal, and complain if I've not taken your thoughts into account.
Ok, I've now folded in our discussion from the dev meeting into the proposal itself. This was done in this commit: https://gitweb.torproject.org/user/mikeperry/torspec.git/commit/proposals/24...
The full proposal is still at: https://gitweb.torproject.org/user/mikeperry/torspec.git/tree/proposals/247-...
I think this is ready for merge back into the main torspec.git as it is a substantial improvement over the version in there. If no one comments within the week, I will prepare a squashed version for merge.
Ok, it's been much longer than a week. I've prepared a squashed version with a full changelog of all changes since origin/master as a single commit here: https://gitweb.torproject.org/user/mikeperry/torspec.git/commit/?h=guard_dis...
I recommend merging this, as it is not helping anyone for the current version of 247 to sit in our main spec repo. We all agree this is an improvement, at least.
I am happy to continue to write more updates if anyone has any additional concerns, corrections, or suggestions. Otherwise, my next plan for this proposal is to try to implement a prototype in txtorcon for simulation/testing/analysis (but I will be working on other things first).
On Thu, Jan 7, 2016 at 9:35 AM, Mike Perry mikeperry@torproject.org wrote:
Merged to torspec.
Hi folks,
Like I did for prop#271, I've done some fixes to prop#247 when I did my recent reread. Here were the easy fixes: https://lists.torproject.org/pipermail/tor-commits/2017-June/123303.html
And below are the parts that I shouldn't just unilaterally fix without the authors of the proposal being involved somehow. :)
On Sun, Sep 13, 2015 at 04:07:30PM -0700, Mike Perry wrote:
With this new path selection, we force the adversary to perform a Sybil attack and two compromise attacks before succeeding. This is an improvement over the current state where the Sybil attack is trivial to pull off, and only a single compromise attack is required.
With this new path selection, an attacker is forced to perform one or more node compromise attacks before learning the guard node of a hidden service. This increases the uncertainty of the attacker, since compromise attacks are costly and potentially detectable, so an attacker will have to think twice before beginning a chain of node compromise attacks that he might not be able to complete.
These look like duplicate paragraphs? We should pick the one we like more, or merge them or something.
- As soon as the adversary is confident they have won the Sybil attack, an even more aggressive circuit building attack will allow them to determine the next node very fast (an hour or less).
Won the Sybil attack against what? There is a lot of "won the Sybil attack" in this proposal, but I worry that each time it means against different nodes or different hops and that's not specified. We should try to be clear what exactly the attacker is attacking each time we talk about an attack.
From the table in Section 3.2.1, it can be seen that this means that the Sybil attack will complete with near-certainty (99%) in 29*12 hours (14.5 days) for the 1% adversary, 3 days for the 5% adversary, and 1.5 days for the 10% adversary.
Sybil attack to achieve what?
4.1. Mitigating fingerprinting of new HS circuits
By pinning the middle nodes of rendezvous circuits, we make it easier for all hops of the circuit to detect that they are part of a special hidden service circuit with varying degrees of certainty.
The Guard node is able to recognize a Vanguard client with a high degree of certainty because it will observe a client IP creating the overwhelming majority of its circuits to just a few middle nodes in any given 10-18 day time period.
The middle nodes will be able to tell with a variable certainty that depends on both its traffic volume and upon the popularity of the service, because they will see a large number of circuits that tend to pick the same Guard and Exit.
And if any of the "exits" have an exit policy of reject *:*, that's a great hint too.
This same reasoning is also an argument for increasing the number of second-level guards beyond just two
Maybe you already did that, since it says 'four' above?
4.3. Denial of service
And, woah, your 4.3 is "Denial of service", and the 4.3 in git is "Long term information leaks", so my final bug report here is that the last sent version of this proposal (sent by Mike on 13 Sept) differs from the one in git (!).
I guess we should figure out how they differ, since people are going to arrive on Monday having read different versions of them.
--Roger