-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256
Hello,
Inspired by asn, dgoulet and Mike Perry discussing alternate path lengths for proposal 247, I wrote a proposal that could fit in nicely with proposal 247, maybe even allow us to make the 3rd hop wild (random) so we don't have to worry about HSDirs + HSDir fetch/post circuits and introduction points + introduction points circuits linking short term layer 3 guards.
Please find the proposal attached as well as inline for comments. Looking forward for reviews.
If I've structured it wrong or something I apologize in advance - will do better next time.
Filename: xxx-malicious-rendezvous-point-filter.txt Title: Filtering malicious rendezvous points at hidden service server side Authors: s7r [ 0x837FA52C81265B11 ] Created: 23 January 2016 Status: Draft
0. Definitions
client = an user running Tor as a client, which connects to a hidden service. hidden service server = the Tor instance hosting the hidden service. rendezvous point = the relay which acts as the rendezvous meeting point in a circuit between a hidden service client and a hidden service server.
1. Motivation
A hidden service circuit (rendezvous circuit) is always initiated by the client so the relay which should act as the rendezvous meeting point is selected by the client. We assume that any client can be an attacker and will try to make the hidden service server connect to a malicious rendezvous point under his control. The attacker is also a Sybil (holds an unknown % of the bandwidth in the Tor network). By making the hidden service server build many circuits to his evil rendezvous points, the attacker gets a high probability that the hidden service server will eventually pick his evil relays in a circuit, so the attacker will trivially perform a successful hidden service guard discovery attack or, with more luck, discover the real location of the hidden service server. The hidden service server can only defend itself by building a 3 hop circuit to the rendezvous point, but in practice this is not always enough.
2. Logic for filtering malicious rendezvous points
To be able to make fair assumptions about which rendezvous point might be malicious or not we are going to use the probability theory and statistics, precisely the binominal distribution equation [0]. In simple words, we count and keep track of how many rendezvous circuits a hidden service server built and to which rendezvous points. Then, based on the weight (middle probability fraction) of each rendezvous point, we determine if one was insanely overpicked by clients. Think of this like: Alice plays a game with Bob in which Bob has the success probability to draw a certain number n%. In practice however, he shows a success rate of more than (n*2)% (2 times bigger), Alice assumes Bob is not playing nice and will refuse to play with him for the next 24 hours or so. Fast example: A hidden service server built 100 rendezvous circuits in the last 24 hours, out of which in 7 of them the rendezvous point was a relay with the middle probability fraction of 0.5%. We assume that rendezvous point is malicious and refuse to build rendezvous circuits to it for the next 24 hours. 3. Equations and formulas
3.1 Probability of different clients genuinely picking the same rendezvous point n = total number of established rendezvous circuits. k = number of clients that could have genuinely picked the same rendezvous point. p = probability of picking that rendezvous point per client (middle probability fraction of the relay) divided to 100. H = Probability mass function expanded. C = The result of the equation, the value we need to determine multiplied with 100 to have the percent value. Formulas: H = n! / k!^(n - k)! C = H * p^k * (1 - p)^n - k Example: n = 100 k = 9 p = 0.5% / 100 = 0.005 H = 100! / 9!^91! = 20023492720 C = 20023492720 * 0.000000000000000000001953125 * 0.63372, ~ 0.00000000002478376524710625 * 100, ~ 0.000000002478376524710625% 3.2 Expected number of clients to pick the same rendezvous point n = Total number of established rendezvous circuits. p = Probability of picking that rendezvous point per client (middle probability fraction of the relay) divided to 100. We multiply this with 2 intentionally so we will allow 2 times more circuits than we would expect based on this formula - this should reduce the error margin enough. C = The result of the equation, the value we need to determine. Formula: C = n * p Example: n = 100 p = 0.5% / 100 = 0.005 * 2 = 0.01
C = 100 * 0.01 = 1 4. Implementation and variables
We keep track in the persistent state of a hidden service server of the total number of built rendezvous circuits as well as a counter for number of rendezvous circuits per rendezvous point. Variables: REND_FILTER_MONITOR_PERIOD = count and track the rendezvous circuits built within this latest period. We set it to 24 hours.
REND_FILTER_BAN_PERIOD = for how long we are going to ban a malicious rendezvous point. We set it to 24 hours. REND_FILTER_TOLERANCE = how many circuits we allow per rendezvous point regardless of its middle probability fraction. We set it to 4. REND_FILTER_BAN_ACTIVATION = when we should ban a malicious rendezvous point. To determine this, we are going to get from REND_FILTER_MONITOR_PERIOD the total number of established rendezvous circuits as well as the number of established rendezvous circuits with a given rendezvous point and apply the formula from 3.2 with approximation to the next whole number (example: 5.01 = 6; 5.99 = 6). Example: A hidden service server built 100 circuits in the last 24 hours out of which 5 to a rendezvous point with the middle probability fraction 1% (p = 0.02), so the expected number of circuits we expect is 2, but allow up to maximum 4 (if REND_FILTER_TOLERANCE > REND_FILTER_BAN_ACTIVATION, REND_FILTER_TOLERANCE prevails). In this case we consider this too much of a coincidence and ban this rendezvous point for the next 24 hours. We keep this ban in our persistent state with a timestamp so we'll know when it expires. 5. Ban malicious rendezvous points just for rendezvous circuits, not for anything else
If we treat as relay as a malicious rendezvous point for a given time period, this only means we are not building circuits with it as the rendezvous point, but we don't exclude it for other purpose circuits, regardless which are those other purpose circuits. We may use them in our exit circuits, consensus fetch circuits, as HSDirs and HSDir fetch / post circuits, as Introduction Points or Introduction Point circuits.
6. Non zero probability to assign the malicious rendezvous point flag to an innocent relay
This is indeed a non-zero value and it can be computed with the formula from 3.1, but it's a value so close to zero that we think it's totally worth it. Even if accidentally (low chances) an innocent relay will be banned, this will be something local to the hidden service server. It won't affect that relay at all, nor how other client or hidden service servers treat that relay. It has nothing to do with the network wide consensus as well. A honest client will always retry with a different rendezvous point, so honest clients should not experience reachability issues.
7. Rendezvous points that aren't included in hidden service server's consensus
Clients and hidden service servers might have different consensuses. This means that a hidden service server can be genuinely asked to build a rendezvous circuit to a rendezvous point for which it has no middle probability fraction because it's not included in that consensus. We should allow REND_FILTER_TOLERANCE number of circuits to the rendezvous points for which we have no weights at all and if that rendezvous point appears in a later consensus within the REND_FILTER_MONITOR_PERIOD update its middle probability fraction value to ensure the filter will activate and work properly once REND_FILTER_TOLERANCE threshold was reached. 8. Security implications for maintaining such records in the persistent state of a hidden service server
An attacker which compromises the location of a hidden service server will access this additional information: total number of rendezvous circuits built within the last REND_FILTER_MONITOR_PERIOD and to which rendezvous points these circuits were. This tradeoff should be worth it as well, since if the location of a hidden service server is compromised it is game over anyway, and this additional information shouldn't be so important to the attacker, except maybe for better determining the popularity of that hidden service (this can be determined many other ways, like the logs from the application running behind the hidden service, network counter, datacenter/ISP level counters and logs, etc.).
To avoid co-location linkability between different hidden services, we should keep rendezvous circuits records and counters individually for each hidden service. The logic is designed to adjust by itself dynamically, based on the popularity and usage of a hidden service. If it's a high profile hidden service, experiencing 10000 connections per 24 hours, the REND_FILTER_BAN_ACTIVATION will increase for every rendezvous point. 9. Default option, with possibility to turn off for testing environments and experiments
We add a new torrc option named HiddenServiceRendFilter 0|1 and set it to 1 by default. It can be turned off by setting it to 0, if this will be desired in some testing environments or experiments.
10. Other considerations, limitations and compatibility with other existing proposals This proposal won't disable an attacker entirely, but it will make it much harder and more expensive, given that the attacker will have to run more malicious relays to act as rendezvous points, increasing the capacity of the Tor network overall and directly the costs of the attack. Also, the attacker will have to grow the popularity of a hidden service so that he can establish more rendezvous circuits using the same rendezvous points. This will at least be kind of hard if other genuine clients pick the same rendezvous points and get noisy to the hidden service operator (much more traffic, same number or slightly more users on his service). This proposal is compatible with all the other hidden service related proposals to date. For tools like OnionBalance, each backend / fallback Tor instance included in the master hidden service descriptor will just transparently and individually maintain their own tracking vales and ban list. With proposal 260 - Rendezvous Single Onion Services this should be disabled since it makes no sense. Set HiddenServiceRendFilter 0 in a RSOS type hidden service. This proposal should be quite helpful once we implement enough padding between relays, where an attacker might have to own the entire path in order to distinguish dummy traffic from real traffic. This proposal should fit in nicely with proposal 247 - Vanguards and could make it safe to leave the 3rd hop wild (no layer 3 short term guards) and use only layer 2 medium term guards + layer 1 long term guard since an attacker will be able to establish less circuits to his malicious rendezvous point, thus drastically decreasing the chances of the hidden service server to also pick his malicious relay as 3rd hop to the malicious rendezvous point (layer 2 guards discovery).
[0]: https://en.wikipedia.org/wiki/Binomial_distribution
On Sat, Jan 23, 2016 at 11:38:00PM +0200, s7r wrote:
The attacker is also a Sybil (holds an unknown % of the bandwidth in the Tor network). By making the hidden service server build many circuits to his evil rendezvous points, the attacker gets a high probability that the hidden service server will eventually pick his evil relays in a circuit, so the attacker will trivially perform a successful hidden service guard discovery attack or, with more luck, discover the real location of the hidden service server.
That 'more luck' would involve becoming the guard of the hidden service, yes? I think at that point it doesn't matter whether the attacker controls the rendezvous point.
The hidden service server can only defend itself by building a 3 hop circuit to the rendezvous point, but in practice this is not always enough.
A few more details about "this is not always enough" would be helpful here. In particular, is it not always enough because sometimes even 3 hops is not safe enough, or not always enough besides sometimes making a 3-hop circuit isn't what the HS wants to do? Or something else?
In simple words, we count and keep track of how many rendezvous circuits a hidden service server built and to which rendezvous points. Then, based on the weight (middle probability fraction) of each rendezvous point, we determine if one was insanely overpicked by clients.
A) Can I deny service to a hidden service by methodically pretending to attack it from each honest relay, one at a time, causing it to become upset at each of these relays?
B) Can I fool your reputation system by raising the total number of rendezvous attempts that I attempt, in effect making the hidden service feel more popular so it's not alarmed as much by any single rendezvous point? I could imagine ways to launch a rendezvous attempt that are quite cheap on the part of a client who has no plans to follow through.
Even if accidentally (low chances) an innocent relay will be banned, this will be something local to the hidden service server. It won't affect that relay at all, nor how other client or hidden service servers treat that relay. It has nothing to do with the network wide consensus as well.
A honest client will always retry with a different rendezvous point, so honest clients should not experience reachability issues.
Actually, I don't think this is client behavior right now. (It could be if somebody changed the design of course.)
--Roger
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256
On 1/24/2016 12:10 AM, Roger Dingledine wrote:
A few more details about "this is not always enough" would be helpful here. In particular, is it not always enough because sometimes even 3 hops is not safe enough, or not always enough besides sometimes making a 3-hop circuit isn't what the HS wants to do? Or something else?
Not enough in the sense that it can theoretically get Sybiled and end up with at least a guard discovery attack. Since an attacker can request unlimited circuits to an evil rendezvous point, his other evil relays will, with enough retries, end up in the path as well.
A) Can I deny service to a hidden service by methodically pretending to attack it from each honest relay, one at a time, causing it to become upset at each of these relays?
Only if you are the only one connecting to that hidden service and make 5 rendezvous circuits with each relay as a rendezvous point. But after little time the total number of rendezvous circuits will grow so large that you'll have to exponentially have to build more rendezvous circuits with each relay as a rendezvous point to ban them all. So it's a whole lot of work, you'll DDoS the hidden service guard or a lot of other things first, before hitting the limit of this protection.
B) Can I fool your reputation system by raising the total number of rendezvous attempts that I attempt, in effect making the hidden service feel more popular so it's not alarmed as much by any single rendezvous point? I could imagine ways to launch a rendezvous attempt that are quite cheap on the part of a client who has no plans to follow through.
Yes, you could I think. But this has costs and is also visible to the hidden service operator. And we keep count of established rendezvous circuits with streams inside, not failed rendezvous circuits. We only count successes, to make it costly for an attacker.
Actually, I don't think this is client behavior right now. (It could be if somebody changed the design of course.)
Hm, thought it would retry at least one time, if the first rend circuit fails. This can be trivially change though.
Thanks!
s7r:
On 1/24/2016 12:10 AM, Roger Dingledine wrote:
A few more details about "this is not always enough" would be helpful here. In particular, is it not always enough because sometimes even 3 hops is not safe enough, or not always enough besides sometimes making a 3-hop circuit isn't what the HS wants to do? Or something else?
Not enough in the sense that it can theoretically get Sybiled and end up with at least a guard discovery attack. Since an attacker can request unlimited circuits to an evil rendezvous point, his other evil relays will, with enough retries, end up in the path as well.
A) Can I deny service to a hidden service by methodically pretending to attack it from each honest relay, one at a time, causing it to become upset at each of these relays?
Only if you are the only one connecting to that hidden service and make 5 rendezvous circuits with each relay as a rendezvous point. But after little time the total number of rendezvous circuits will grow so large that you'll have to exponentially have to build more rendezvous circuits with each relay as a rendezvous point to ban them all. So it's a whole lot of work, you'll DDoS the hidden service guard or a lot of other things first, before hitting the limit of this protection.
Can you give a more rigorous analysis of this in the proposal, perhaps with more examples/math? This seems to suggest that if someone is mounting an attack, it gets harder and harder for the detector to recognize successive malicious RPs. Something seems amiss here.
B) Can I fool your reputation system by raising the total number of rendezvous attempts that I attempt, in effect making the hidden service feel more popular so it's not alarmed as much by any single rendezvous point? I could imagine ways to launch a rendezvous attempt that are quite cheap on the part of a client who has no plans to follow through.
Yes, you could I think. But this has costs and is also visible to the hidden service operator. And we keep count of established rendezvous circuits with streams inside, not failed rendezvous circuits. We only count successes, to make it costly for an attacker.
They may not be able to differentiate this from a sudden spike in interest in the service, or attention from a scraper or DDoS.
I'm also curious if you intend to combine this with Rend#1, Rend#2, or some other version than the previous thread? From your writing, my guess is you want this to apply this to paths like:
4. C - L - M - R&E -- E - M - L - H
(Let's call this Rend#4).
Is that true?
If so, I'm a bit worried about the timing attack version of the Sybil attack in that case. We're unlikely to ever want to pad all the way across the circuit, which means that an adversary doesn't have to control R&E, but merely watch for connections to their chosen non-malicious R&E *from* their malicious E. If they always chose R&E to be very low-traffic nodes, there will be a very low base rate of normal circuits from E to R&E, making it a high probability that if malicious E sees an extend from a middle M to R&E, it is in the right position in the circuit and wins the Sybil, discovering M.
This makes me think that while this detector is useful for Rend#2 or Rend#4 (modulo Roger's concerns), it still doesn't allow us to abandon Rend#1 completely as a high security option. Or, at least, the proposal as written also lacks the math for us to make it comparable to Rend#1.
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256
Hello Mike,
Answers inline.
On 1/24/2016 1:56 AM, Mike Perry wrote:
A) Can I deny service to a hidden service by methodically pretending to attack it from each honest relay, one at a time, causing it to become upset at each of these relays?
Only if you are the only one connecting to that hidden service and make 5 rendezvous circuits with each relay as a rendezvous point. But after little time the total number of rendezvous circuits will grow so large that you'll have to exponentially have to build more rendezvous circuits with each relay as a rendezvous point to ban them all. So it's a whole lot of work, you'll DDoS the hidden service guard or a lot of other things first, before hitting the limit of this protection.
Can you give a more rigorous analysis of this in the proposal, perhaps with more examples/math? This seems to suggest that if someone is mounting an attack, it gets harder and harder for the detector to recognize successive malicious RPs. Something seems amiss here.
The system is designed to filter rendezvous points based on the hidden service's popularity within the last 24 hours. It doesn't get necessarily any harder for the hidden service server, it only increases the costs, time required and efforts for an attacker. As mentioned in the proposal, it is not aimed to 100% eliminate the problem (nor we can ever truly eliminate it totally) but it will make it a lot harder for an attacker with very little costs for the Tor network, and here I mean load balancing and capacity, uniform distribution, alternate path design and complexity, layer 2 guards enumeration.
Consider this. We set the REND_FILTER_TOLERANCE to 4. This means that any relay, regardless of its middle probability fraction, will be able to build 4 +1 different rendezvous circuits with a hidden service server, before it gets banned for 24 hours.
So, an attacker will start by, as a client, building 5 rendezvous circuits with each relay in the consensus in order to try to make the hidden service inaccessible.
So he starts, and gets some relays banned. But after the attacker got 300 relays banned as rendezvous points on a hidden service (not affecting the other clients/hidden services in the network in any way) he would have made the hidden service server build 1500 circuits already.
This means that next time he wants to ban a relay as being used for rendezvous by that hidden service, and that relay has a 0.5% middle probability fraction (1% since we allow 2 times more to reduce error margin) he will have to build 1500 x 0.01 = 15 (+1) circuits to get this relay banned also.
After 100 more relays like this getting banned we have at least 1600 more rendezvous circuits, plus the previous 1500 = 3100 rendezvous circuits. And just 300 relays banned (out of ~7000). To continue, now the attacker has to build 3100 x 0.01 = 31 (+1) circuits to get another relay with middle probability fraction 0.5% banned. And so on.
As you can see, it's quite hard to make a hidden service inaccessible via this method, with over 7000 relays in the consensus and only 24 hours of rend circuit history (e.g. relays banned more than 24 hours ago have the ban lifted now).
There are plenty of things which will fail before this protection enables a DDoS attack for a hidden service, like the hidden service's guard which will fail _way_ before an attacker is able to ban ~7000 relays as rendezvous points for a hidden service. This is already an existing problem (bottleneck of hidden service capacity is its guard) so we are not making anything worse, the attack already exists with or without the rendezvous filter.
B) Can I fool your reputation system by raising the total number of rendezvous attempts that I attempt, in effect making the hidden service feel more popular so it's not alarmed as much by any single rendezvous point? I could imagine ways to launch a rendezvous attempt that are quite cheap on the part of a client who has no plans to follow through.
Yes, you could I think. But this has costs and is also visible to the hidden service operator. And we keep count of established rendezvous circuits with streams inside, not failed rendezvous circuits. We only count successes, to make it costly for an attacker.
They may not be able to differentiate this from a sudden spike in interest in the service, or attention from a scraper or DDoS.
Yes, if the attacker acts as a client which just builds proper rendezvous circuits at random rendezvous points, it increases the popularity and we cannot differentiate this from a sudden spike in interest, so the attacker will end up in being able to build little more circuits to a given rendezvous point. (HS popularity in the last 24 hours * relay middle probability fraction / 100 = circuits allowed for that rendezvous point).
Also, hidden services experiencing sudden spikes in interest from genuine clients will not be affected, and will nicely grow popularity with proper distribution among the relays in the consensus acting as rendezvous points, as math, probability theory and statistics tell us.
I'm also curious if you intend to combine this with Rend#1, Rend#2, or some other version than the previous thread? From your writing, my guess is you want this to apply this to paths like:
- C - L - M - R&E -- E - M - L - H
(Let's call this Rend#4).
Is that true?
No, this is what I intend to avoid, all the rend#n alternate paths. I want us to avoid having to do complex alternate paths and maintain different path selection. My idea is to leave clients as they are now - - no change, and just implement 4 medium term layer 2 guards (vanguards) at hidden service server side. So it will look like this:
C - L - E - R -- E - M - L - H
Plain and simple. If one rendezvous point can initiate a LIMITED number of circuits per 24 hours with one hidden service, based on its middle probability fraction, it becomes less likely to also pick this attacker's evil relay as 3rd hop in a rend circuit (from hidden service server side). Combined with a good random rotation period for the layer 2 vanguards we should keep them safe and maintain an acceptable low probability for an attacker to learn them. Of course the risk is non zero and will always be non zero regardless what we do - - we can only put things in balance (load balancing, complexity, etc.) and make sure we make the success rate of an attacker as low as we can and in the same time increase his costs.
If so, I'm a bit worried about the timing attack version of the Sybil attack in that case. We're unlikely to ever want to pad all the way across the circuit, which means that an adversary doesn't have to control R&E, but merely watch for connections to their chosen non-malicious R&E *from* their malicious E. If they always chose R&E to be very low-traffic nodes, there will be a very low base rate of normal circuits from E to R&E, making it a high probability that if malicious E sees an extend from a middle M to R&E, it is in the right position in the circuit and wins the Sybil, discovering M.
This makes me think that while this detector is useful for Rend#2 or Rend#4 (modulo Roger's concerns), it still doesn't allow us to abandon Rend#1 completely as a high security option. Or, at least, the proposal as written also lacks the math for us to make it comparable to Rend#1.
Still, even if it does not make the risk 0, it makes it way smaller than it currently is, which is good. Aside proposal 247 and aside alternate paths, it's not normal for a relay with 0.01% middle probability to be able to act as rendezvous point in UNLIMITED number of rendezvous circuits with the same hidden service. I am excluding clients because they are the ones who pick it, so they are not at risk here. If we can mitigate this by keeping a table with the rendezvous history for the last 24 hours it should be worth the effort I think.
Thanks for your time and looking forward to answer more questions.
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256
Sorry for posting 2 times, want to highlight something which doesn't read clear.
On 1/24/2016 4:44 AM, s7r wrote:
Consider this. We set the REND_FILTER_TOLERANCE to 4. This means that any relay, regardless of its middle probability fraction, will be able to build 4 +1 different rendezvous circuits with a hidden service server, before it gets banned for 24 hours.
I mean any relay, regardless of its middle probability fraction, will be able to be used as rendezvous point in at least 4 rendezvous circuits in 24 hours with the same hidden service, before it gets banned for 24 hours (the ban will be effective only for that particular hidden service, other hidden services will use it normally until it annoys them too).
So, an attacker will start by, as a client, building 5 rendezvous circuits with each relay in the consensus in order to try to make the hidden service inaccessible.
I mean building at least 5 different rendezvous circuits within 24 hours with the same relay as the rendezvous point to the same hidden service.
So he starts, and gets some relays banned. But after the attacker got 300 relays banned as rendezvous points on a hidden service (not affecting the other clients/hidden services in the network in any way) he would have made the hidden service server build 1500 circuits already.
This means that next time he wants to ban a relay as being used for rendezvous by that hidden service, and that relay has a 0.5% middle probability fraction (1% since we allow 2 times more to reduce error margin) he will have to build 1500 x 0.01 = 15 (+1) circuits to get this relay banned also.
After 100 more relays like this getting banned we have at least 1600 more rendezvous circuits, plus the previous 1500 = 3100 rendezvous circuits. And just 300 relays banned (out of ~7000). To continue, now the attacker has to build 3100 x 0.01 = 31 (+1) circuits to get another relay with middle probability fraction 0.5% banned. And so on.
Sorry, it's 400 relays banned here (300 + 100). I mean the attacker has to build another 31 (+1) different rendezvous circuits with the next relay as the rendezvous point to the same hidden service to get it banned as well.
As you can see, it's quite hard to make a hidden service inaccessible via this method, with over 7000 relays in the consensus and only 24 hours of rend circuit history (e.g. relays banned more than 24 hours ago have the ban lifted now).
There are plenty of things which will fail before this protection enables a DDoS attack for a hidden service, like the hidden service's guard which will fail _way_ before an attacker is able to ban ~7000 relays as rendezvous points for a hidden service. This is already an existing problem (bottleneck of hidden service capacity is its guard) so we are not making anything worse, the attack already exists with or without the rendezvous filter.
Hi s7r,
Great proposal, I really like it - I think it targets the actual behaviour we want to prevent in a less complex manner than the HS 3rd-hop alternatives being discussed for prop247.
Some general questions:
* Do we want to make this behaviour (somewhat) symmetric, so that a client which sees a hidden service choose the same introduction point(s) for N periods will refuse to use that introduction point?
* This will break some Tor2Web installations, which deliberately choose rendezvous points on the same server or network for latency reasons. (Forcing Tor2Web installations to choose multiple RPs may be a worthwhile security tradeoff.)
On 24 Jan 2016, at 08:38, s7r s7r@sky-ip.org wrote:
- Security implications for maintaining such records in the
persistent state of a hidden service server
An attacker which compromises the location of a hidden service server will access this additional information: total number of rendezvous circuits built within the last REND_FILTER_MONITOR_PERIOD and to which rendezvous points these circuits were. This tradeoff should be worth it as well, since if the location of a hidden service server is compromised it is game over anyway, and this additional information shouldn't be so important to the attacker, except maybe for better determining the popularity of that hidden service (this can be determined many other ways, like the logs from the application running behind the hidden service, network counter, datacenter/ISP level counters and logs, etc.).
To reduce the sensitivity of these counters, we should discard them after a certain period of time, perhaps every hidden service period (24 hours?)
We could also add noise and/or round into buckets, like we do for other statistics, but I'm not sure there's much point, as they are never seen outside the hidden service.
Tim
Tim Wilson-Brown (teor)
teor2345 at gmail dot com PGP 968F094B
teor at blah dot im OTR CAD08081 9755866D 89E2A06F E3558B7F B5A9D14F
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256
Hi teor,
On 1/24/2016 12:11 AM, Tim Wilson-Brown - teor wrote:
Hi s7r,
Great proposal, I really like it - I think it targets the actual behaviour we want to prevent in a less complex manner than the HS 3rd-hop alternatives being discussed for prop247.
Thanks!
Some general questions:
- Do we want to make this behaviour (somewhat) symmetric, so that a
client which sees a hidden service choose the same introduction point(s) for N periods will refuse to use that introduction point?
I don't think so. I am not concerned about clients, given that they cannot be made to build circuits, only they initiate them. Plus the client builds normal circuit to the introduction points afaik, and we don't want to overload the client with more stuff to process. For the hidden service servers, it's totally worth the effort.
- This will break some Tor2Web installations, which deliberately
choose rendezvous points on the same server or network for latency reasons. (Forcing Tor2Web installations to choose multiple RPs may be a worthwhile security tradeoff.)
Yes, but there is a HiddenServiceRendFilter 0 in the proposal for this purpose and for RSOS services as well.
- Security implications for maintaining such records in the
persistent state of a hidden service server
An attacker which compromises the location of a hidden service server will access this additional information: total number of rendezvous circuits built within the last REND_FILTER_MONITOR_PERIOD and to which rendezvous points these circuits were. This tradeoff should be worth it as well, since if the location of a hidden service server is compromised it is game over anyway, and this additional information shouldn't be so important to the attacker, except maybe for better determining the popularity of that hidden service (this can be determined many other ways, like the logs from the application running behind the hidden service, network counter, datacenter/ISP level counters and logs, etc.).
To reduce the sensitivity of these counters, we should discard them after a certain period of time, perhaps every hidden service period (24 hours?)
We could also add noise and/or round into buckets, like we do for other statistics, but I'm not sure there's much point, as they are never seen outside the hidden service.
Hmm, we should think about this. The 24 hours will in fact be the latest 24 hours as of $now, something like this.
On 24 Jan 2016, at 09:28, s7r s7r@sky-ip.org wrote:
- This will break some Tor2Web installations, which deliberately
choose rendezvous points on the same server or network for latency reasons. (Forcing Tor2Web installations to choose multiple RPs may be a worthwhile security tradeoff.)
Yes, but there is a HiddenServiceRendFilter 0 in the proposal for this purpose and for RSOS services as well.
But that doesn't help clients to connect to every hidden service using the same rendezvous points for every connection, unless every hidden service sets HiddenServiceRendFilter 0.
Tor2Web instances are Tor clients that are configured as reverse proxies to the entire Tor hidden service address space.
HiddenServiceRendFilter 0 only modifies the behaviour of one hidden service with Tor2Web. But because Tor2Web is a client which is configured to use the same rendezvous point(s) for every hidden service connection, it will get banned if it connects to the same hidden service too many times.
Tim
Tim Wilson-Brown (teor)
teor2345 at gmail dot com PGP 968F094B
teor at blah dot im OTR CAD08081 9755866D 89E2A06F E3558B7F B5A9D14F
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256
On 1/24/2016 1:51 AM, Tim Wilson-Brown - teor wrote:
On 24 Jan 2016, at 09:28, s7r <s7r@sky-ip.org mailto:s7r@sky-ip.org> wrote:
- This will break some Tor2Web installations, which
deliberately choose rendezvous points on the same server or network for latency reasons. (Forcing Tor2Web installations to choose multiple RPs may be a worthwhile security tradeoff.)
Yes, but there is a HiddenServiceRendFilter 0 in the proposal for this purpose and for RSOS services as well.
But that doesn't help clients to connect to every hidden service using the same rendezvous points for every connection, unless every hidden service sets HiddenServiceRendFilter 0.
Tor2Web instances are Tor clients that are configured as reverse proxies to the entire Tor hidden service address space.
HiddenServiceRendFilter 0 only modifies the behaviour of one hidden service with Tor2Web. But because Tor2Web is a client which is configured to use the same rendezvous point(s) for every hidden service connection, it will get banned if it connects to the same hidden service too many times.
Negative.
A Tor2Web instance runs as a normal Tor client and are configured as reverse proxies.
This means that Tor2Web runs as _normal_ and _genuine_ client, and picks rendezvous points randomly as per Tor's path selection. They do not request a hidden service server to connect 100 times to a middle relay (rendezvous point) with consensus weight of 0.01%, which is what we are trying to mitigate. So a super popular Tor2Web service might initiate 10000 rendezvous circuits in few hours, but they will all be at multiple rendezvous points, according to their weight and middle fraction probability, which is OK in our model and will not be affected.
Re-using circuits is not a problem either in our model, because if a Tor2Web client builds a rendezvous circuit with a hidden service to rendezvous point X, and has 500 users that want to access that hidden service, it will maintain the same (a single) rendezvous circuit to point X until it's unused for some period of time and gets killed, which means the hidden service server will count only 1 circuit, also OK in our model and will not be affected.
On 24 Jan 2016, at 13:04, s7r s7r@sky-ip.org wrote:
Signed PGP part
On 1/24/2016 1:51 AM, Tim Wilson-Brown - teor wrote:
On 24 Jan 2016, at 09:28, s7r <s7r@sky-ip.org mailto:s7r@sky-ip.org> wrote:
- This will break some Tor2Web installations, which
deliberately choose rendezvous points on the same server or network for latency reasons. (Forcing Tor2Web installations to choose multiple RPs may be a worthwhile security tradeoff.)
Yes, but there is a HiddenServiceRendFilter 0 in the proposal for this purpose and for RSOS services as well.
...
HiddenServiceRendFilter 0 only modifies the behaviour of one hidden service with Tor2Web. But because Tor2Web is a client which is configured to use the same rendezvous point(s) for every hidden service connection, it will get banned if it connects to the same hidden service too many times.
Negative.
A Tor2Web instance runs as a normal Tor client and are configured as reverse proxies.
This means that Tor2Web runs as _normal_ and _genuine_ client, and picks rendezvous points randomly as per Tor's path selection. They do not request a hidden service server to connect 100 times to a middle relay (rendezvous point) with consensus weight of 0.01%, which is what we are trying to mitigate. So a super popular Tor2Web service might initiate 10000 rendezvous circuits in few hours, but they will all be at multiple rendezvous points, according to their weight and middle fraction probability, which is OK in our model and will not be affected.
Please read the tor man page documentation for the option Tor2webRendezvousPoints. It's implemented in the function pick_tor2web_rendezvous_node().
Under this proposal, what is the smallest number of rendezvous points a Tor2web instance would need to specify in Tor2webRendezvousPoints to avoid being banned, or does it vary depending on the popularity of the hidden service?
Tim
Tim Wilson-Brown (teor)
teor2345 at gmail dot com PGP 968F094B
teor at blah dot im OTR CAD08081 9755866D 89E2A06F E3558B7F B5A9D14F
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256
Hi teor,
On 1/24/2016 6:33 AM, Tim Wilson-Brown - teor wrote:
Please read the tor man page documentation for the option Tor2webRendezvousPoints. It's implemented in the function pick_tor2web_rendezvous_node().
Under this proposal, what is the smallest number of rendezvous points a Tor2web instance would need to specify in Tor2webRendezvousPoints to avoid being banned, or does it vary depending on the popularity of the hidden service?
Ideally pick_tor2web_rendezvous_node() should pick a random relay to act as a rendezvous point, based on its consensus weight / middle probability fraction, exactly the same as any regular client would do.
Obviously a tor2web service can skip the Guard since it doesn't care about anonymity, for lower latency to the hidden service.
For load balancing and capacity and everything, a tor2web service should always choose a random rendezvous relay for every circuit, and ideally (don't know if it's possible in reverse proxy mode) re-use the same circuit to the same hidden service if there are multiple visitors (tor2web clients) asking to connect to the same hidden service. If this is not possible (using one circuit per hidden service destination, regardless if there are more people accessing that hidden service address via tor2web) at least it should pick randomly rendezvous relays for every circuit.
This way, no rendezvous nodes will be banned. And the number of rendezvous circuits we allow with a single relay as rendezvous point is of course dependent on the popularity of the hidden service (total number of rendezvous circuits built by that hidden service in the last 24 hours). So if a hidden service experienced 10000 rendezvous circuits in the last 24 hours, a relay with middle probability fraction of 0.5% is able to build at least 10000 * 0.005 = 50 new rendezvous circuits before might get banned for the next 24 hours by that hidden service.
P.S. Picking the same rendezvous relays in tor2web services sounds like we are hammering them too much.
On 25 Jan 2016, at 03:10, s7r s7r@sky-ip.org wrote:
Signed PGP part Hi teor,
On 1/24/2016 6:33 AM, Tim Wilson-Brown - teor wrote:
Please read the tor man page documentation for the option Tor2webRendezvousPoints. It's implemented in the function pick_tor2web_rendezvous_node().
Under this proposal, what is the smallest number of rendezvous points a Tor2web instance would need to specify in Tor2webRendezvousPoints to avoid being banned, or does it vary depending on the popularity of the hidden service?
Ideally pick_tor2web_rendezvous_node() should pick a random relay to act as a rendezvous point, based on its consensus weight / middle probability fraction, exactly the same as any regular client would do.
This isn't how the current Tor2web implementation works. If your proposal will break the Tor2webRendezvousPoints option, or require a minimum number of relays to be specified in that option, you need to document these implementation changes or new configuration constraints clearly in the proposal.
... For load balancing and capacity and everything, a tor2web service should always choose a random rendezvous relay for every circuit, and ideally (don't know if it's possible in reverse proxy mode) re-use the same circuit to the same hidden service if there are multiple visitors (tor2web clients) asking to connect to the same hidden service.
I'm not sure whether different Tor2web web clients get different hidden service circuits. But as it's critical to the correct functioning of this proposal with Tor2web mode, I'd encourage you to find out, and update the proposal accordingly.
If this is not possible (using one circuit per hidden service destination, regardless if there are more people accessing that hidden service address via tor2web) at least it should pick randomly rendezvous relays for every circuit.
This isn't how Tor2web mode works. If your proposal needs the design to be changed, then please document that.
This way, no rendezvous nodes will be banned. And the number of rendezvous circuits we allow with a single relay as rendezvous point is of course dependent on the popularity of the hidden service (total number of rendezvous circuits built by that hidden service in the last 24 hours). So if a hidden service experienced 10000 rendezvous circuits in the last 24 hours, a relay with middle probability fraction of 0.5% is able to build at least 10000 * 0.005 = 50 new rendezvous circuits before might get banned for the next 24 hours by that hidden service.
So the number of rendezvous points required in Tor2webRendezvousPoints depends on the load on the particular hidden service, the number of web clients connecting to the particular hidden service via the Tor2web instance, and whether those clients get the same rendezvous circuit or not.
Such a non-deterministic requirement essentially breaks the Tor2webRendezvousPoints feature.
P.S. Picking the same rendezvous relays in tor2web services sounds like we are hammering them too much.
That's precisely why the option exists. Tor2web clients can choose a rendezvous relay controlled by the Tor2web operator, on the same machine or same network. This increases the speed of Tor2web connections.
And the operator can control for load on relays they control much more precisely than picking a random rendezvous point.
Tim
Tim Wilson-Brown (teor)
teor2345 at gmail dot com PGP 968F094B
teor at blah dot im OTR CAD08081 9755866D 89E2A06F E3558B7F B5A9D14F
On 23 Jan (23:38:00), s7r wrote:
Hello,
Inspired by asn, dgoulet and Mike Perry discussing alternate path lengths for proposal 247, I wrote a proposal that could fit in nicely with proposal 247, maybe even allow us to make the 3rd hop wild (random) so we don't have to worry about HSDirs + HSDir fetch/post circuits and introduction points + introduction points circuits linking short term layer 3 guards.
Please find the proposal attached as well as inline for comments. Looking forward for reviews.
If I've structured it wrong or something I apologize in advance - will do better next time.
Filename: xxx-malicious-rendezvous-point-filter.txt Title: Filtering malicious rendezvous points at hidden service server side Authors: s7r [ 0x837FA52C81265B11 ] Created: 23 January 2016 Status: Draft
- Definitions
client = an user running Tor as a client, which connects to a hidden service. hidden service server = the Tor instance hosting the hidden service. rendezvous point = the relay which acts as the rendezvous meeting point in a circuit between a hidden service client and a hidden service server.
- Motivation
A hidden service circuit (rendezvous circuit) is always initiated by the client so the relay which should act as the rendezvous meeting point is selected by the client. We assume that any client can be an attacker and will try to make the hidden service server connect to a malicious rendezvous point under his control.
The attacker is also a Sybil (holds an unknown % of the bandwidth in the Tor network). By making the hidden service server build many circuits to his evil rendezvous points, the attacker gets a high probability that the hidden service server will eventually pick his evil relays in a circuit, so the attacker will trivially perform a successful hidden service guard discovery attack or, with more luck, discover the real location of the hidden service server.
The hidden service server can only defend itself by building a 3 hop circuit to the rendezvous point, but in practice this is not always enough.
- Logic for filtering malicious rendezvous points
To be able to make fair assumptions about which rendezvous point might be malicious or not we are going to use the probability theory and statistics, precisely the binominal distribution equation [0].
In simple words, we count and keep track of how many rendezvous circuits a hidden service server built and to which rendezvous points. Then, based on the weight (middle probability fraction) of each rendezvous point, we determine if one was insanely overpicked by clients. Think of this like: Alice plays a game with Bob in which Bob has the success probability to draw a certain number n%. In practice however, he shows a success rate of more than (n*2)% (2 times bigger), Alice assumes Bob is not playing nice and will refuse to play with him for the next 24 hours or so.
Fast example: A hidden service server built 100 rendezvous circuits in the last 24 hours, out of which in 7 of them the rendezvous point was a relay with the middle probability fraction of 0.5%. We assume that rendezvous point is malicious and refuse to build rendezvous circuits to it for the next 24 hours.
Can you explain to me why 24 hours here? Why not 12 hours or one hour? I'm fine if it's some values that popped in your mind but if it's not, explaining why would be helpful.
- Equations and formulas
3.1 Probability of different clients genuinely picking the same rendezvous point n = total number of established rendezvous circuits. k = number of clients that could have genuinely picked the same rendezvous point. p = probability of picking that rendezvous point per client (middle probability fraction of the relay) divided to 100. H = Probability mass function expanded. C = The result of the equation, the value we need to determine multiplied with 100 to have the percent value.
Formulas: H = n! / k!^(n - k)! C = H * p^k * (1 - p)^n - k
Example: n = 100 k = 9 p = 0.5% / 100 = 0.005
H = 100! / 9!^91! = 20023492720 C = 20023492720 * 0.000000000000000000001953125 * 0.63372, ~ 0.00000000002478376524710625 * 100, ~ 0.000000002478376524710625%
3.2 Expected number of clients to pick the same rendezvous point n = Total number of established rendezvous circuits. p = Probability of picking that rendezvous point per client (middle probability fraction of the relay) divided to 100. We multiply this with 2 intentionally so we will allow 2 times more circuits than we would expect based on this formula - this should reduce the error margin enough. C = The result of the equation, the value we need to determine.
Formula: C = n * p
Example: n = 100 p = 0.5% / 100 = 0.005 * 2 = 0.01
C = 100 * 0.01 = 1
- Implementation and variables
We keep track in the persistent state of a hidden service server of the total number of built rendezvous circuits as well as a counter for number of rendezvous circuits per rendezvous point.
Variables: REND_FILTER_MONITOR_PERIOD = count and track the rendezvous circuits built within this latest period. We set it to 24 hours.
REND_FILTER_BAN_PERIOD = for how long we are going to ban a malicious rendezvous point. We set it to 24 hours.
REND_FILTER_TOLERANCE = how many circuits we allow per rendezvous point regardless of its middle probability fraction. We set it to 4.
REND_FILTER_BAN_ACTIVATION = when we should ban a malicious rendezvous point. To determine this, we are going to get from REND_FILTER_MONITOR_PERIOD the total number of established rendezvous circuits as well as the number of established rendezvous circuits with a given rendezvous point and apply the formula from 3.2 with approximation to the next whole number (example: 5.01 = 6; 5.99 = 6).
Example: A hidden service server built 100 circuits in the last 24 hours out of which 5 to a rendezvous point with the middle probability fraction 1% (p = 0.02), so the expected number of circuits we expect is 2, but allow up to maximum 4 (if REND_FILTER_TOLERANCE > REND_FILTER_BAN_ACTIVATION, REND_FILTER_TOLERANCE prevails). In this case we consider this too much of a coincidence and ban this rendezvous point for the next 24 hours. We keep this ban in our persistent state with a timestamp so we'll know when it expires.
What I would like to see in this proposal, and this might ties to what Mike asked, is a table that shows the effort needed for an attacker that tries to blacklist all nodes. How much circuit per RP will be needed as the attack evolve. 10 RP == how many circuits, then 100 then 1000. If I had 200 clients opening circuits, how much time and circuits will I need too blacklist all RP?
Your explanation on another email about this explains how the more circuits the more difficult it gets to black list node but still I'm not convinced that it's that difficult. The overloaded Guard argument can make sense here but 24h is along time so an attacker can try multiple patterns of circuits that might not overload the hs guard.
- Ban malicious rendezvous points just for rendezvous circuits, not
for anything else
If we treat as relay as a malicious rendezvous point for a given time period, this only means we are not building circuits with it as the rendezvous point, but we don't exclude it for other purpose circuits, regardless which are those other purpose circuits. We may use them in our exit circuits, consensus fetch circuits, as HSDirs and HSDir fetch / post circuits, as Introduction Points or Introduction Point circuits.
- Non zero probability to assign the malicious rendezvous point flag
to an innocent relay
This is indeed a non-zero value and it can be computed with the formula from 3.1, but it's a value so close to zero that we think it's totally worth it.
Even if accidentally (low chances) an innocent relay will be banned, this will be something local to the hidden service server. It won't affect that relay at all, nor how other client or hidden service servers treat that relay. It has nothing to do with the network wide consensus as well.
A honest client will always retry with a different rendezvous point, so honest clients should not experience reachability issues.
I confirm that Roger is right. See circuit_get_ready_rend_circ_by_rend_data() and how it's used, client will reuse a rend circuit as long as it's alive.
That can complicates thing a bit here because if by any chance that RP gets blacklisted then the client will retry some more times and ultimately failing to connect. Implementing this defense will probably also required some client tweaks...
Thanks! David
- Rendezvous points that aren't included in hidden service server's
consensus
Clients and hidden service servers might have different consensuses. This means that a hidden service server can be genuinely asked to build a rendezvous circuit to a rendezvous point for which it has no middle probability fraction because it's not included in that consensus. We should allow REND_FILTER_TOLERANCE number of circuits to the rendezvous points for which we have no weights at all and if that rendezvous point appears in a later consensus within the REND_FILTER_MONITOR_PERIOD update its middle probability fraction value to ensure the filter will activate and work properly once REND_FILTER_TOLERANCE threshold was reached.
- Security implications for maintaining such records in the
persistent state of a hidden service server
An attacker which compromises the location of a hidden service server will access this additional information: total number of rendezvous circuits built within the last REND_FILTER_MONITOR_PERIOD and to which rendezvous points these circuits were. This tradeoff should be worth it as well, since if the location of a hidden service server is compromised it is game over anyway, and this additional information shouldn't be so important to the attacker, except maybe for better determining the popularity of that hidden service (this can be determined many other ways, like the logs from the application running behind the hidden service, network counter, datacenter/ISP level counters and logs, etc.).
To avoid co-location linkability between different hidden services, we should keep rendezvous circuits records and counters individually for each hidden service.
The logic is designed to adjust by itself dynamically, based on the popularity and usage of a hidden service. If it's a high profile hidden service, experiencing 10000 connections per 24 hours, the REND_FILTER_BAN_ACTIVATION will increase for every rendezvous point.
- Default option, with possibility to turn off for testing
environments and experiments
We add a new torrc option named HiddenServiceRendFilter 0|1 and set it to 1 by default. It can be turned off by setting it to 0, if this will be desired in some testing environments or experiments.
- Other considerations, limitations and compatibility with other
existing proposals
This proposal won't disable an attacker entirely, but it will make it much harder and more expensive, given that the attacker will have to run more malicious relays to act as rendezvous points, increasing the capacity of the Tor network overall and directly the costs of the attack. Also, the attacker will have to grow the popularity of a hidden service so that he can establish more rendezvous circuits using the same rendezvous points. This will at least be kind of hard if other genuine clients pick the same rendezvous points and get noisy to the hidden service operator (much more traffic, same number or slightly more users on his service).
This proposal is compatible with all the other hidden service related proposals to date. For tools like OnionBalance, each backend / fallback Tor instance included in the master hidden service descriptor will just transparently and individually maintain their own tracking vales and ban list.
With proposal 260 - Rendezvous Single Onion Services this should be disabled since it makes no sense. Set HiddenServiceRendFilter 0 in a RSOS type hidden service.
This proposal should be quite helpful once we implement enough padding between relays, where an attacker might have to own the entire path in order to distinguish dummy traffic from real traffic.
This proposal should fit in nicely with proposal 247 - Vanguards and could make it safe to leave the 3rd hop wild (no layer 3 short term guards) and use only layer 2 medium term guards + layer 1 long term guard since an attacker will be able to establish less circuits to his malicious rendezvous point, thus drastically decreasing the chances of the hidden service server to also pick his malicious relay as 3rd hop to the malicious rendezvous point (layer 2 guards discovery).
Filename: xxx-malicious-rendezvous-point-filter.txt Title: Filtering malicious rendezvous points at hidden service server side Authors: s7r [ 0x837FA52C81265B11 ] Created: 23 January 2016 Status: Draft
Definitions
client = an user running Tor as a client, which connects to a hidden service. hidden service server = the Tor instance hosting the hidden service. rendezvous point = the relay which acts as the rendezvous meeting point in a circuit between a hidden service client and a hidden service server.
Motivation
A hidden service circuit (rendezvous circuit) is always initiated by the client so the relay which should act as the rendezvous meeting point is selected by the client. We assume that any client can be an attacker and will try to make the hidden service server connect to a malicious rendezvous point under his control.
The attacker is also a Sybil (holds an unknown % of the bandwidth in the Tor network). By making the hidden service server build many circuits to his evil rendezvous points, the attacker gets a high probability that the hidden service server will eventually pick his evil relays in a circuit, so the attacker will trivially perform a successful hidden service guard discovery attack or, with more luck, discover the real location of the hidden service server.
The hidden service server can only defend itself by building a 3 hop circuit to the rendezvous point, but in practice this is not always enough.
Logic for filtering malicious rendezvous points
To be able to make fair assumptions about which rendezvous point might be malicious or not we are going to use the probability theory and statistics, precisely the binominal distribution equation [0].
In simple words, we count and keep track of how many rendezvous circuits a hidden service server built and to which rendezvous points. Then, based on the weight (middle probability fraction) of each rendezvous point, we determine if one was insanely overpicked by clients. Think of this like: Alice plays a game with Bob in which Bob has the success probability to draw a certain number n%. In practice however, he shows a success rate of (n*2)% (2 times bigger), Alice assumes Bob is not playing nice and will refuse to play with him for the next 24 hours or so.
Fast example: A hidden service server built 100 rendezvous circuits in the last 24 hours, out of which in 5 of them the rendezvous point was a relay with the middle probability fraction of 0.5%. We assume that rendezvous point is malicious and refuse to build rendezvous circuits to it for the next 24 hours.
Equations and formulas
3.1 Probability of different clients genuinely picking the same rendezvous point n = total number of established rendezvous circuits. k = number of clients that could have genuinely picked the same rendezvous point. p = probability of picking that rendezvous point per client (middle probability fraction of the relay) divided to 100. H = Probability mass function expanded. C = The result of the equation, the value we need to determine multiplied with 100 to have the percent value.
Formulas: H = n! / k!^(n - k)! C = H * p^k * (1 - p)^n - k
Example: n = 100 k = 9 p = 0.5% / 100 = 0.005
H = 100! / 9!^91! = 20023492720 C = 20023492720 * 0.000000000000000000001953125 * 0.63372, ~ 0.00000000002478376524710625 * 100, ~ 0.000000002478376524710625%
3.2 Expected number of clients to pick the same rendezvous point n = Total number of established rendezvous circuits. p = Probability of picking that rendezvous point per client (middle probability fraction of the relay) divided to 100. We multiply this with 2 intentionally so we will allow 2 times more circuits than we would expect based on this formula - this should reduce the error margin enough. C = The result of the equation, the value we need to determine.
Formula: C = n * p
Example: n = 100 p = 0.5% / 100 = 0.005 * 2 = 0.01
C = 100 * 0.01 = 1
Implementation and variables
We keep track in the persistent state of a hidden service server of the total number of built rendezvous circuits as well as a counter for number of rendezvous circuits per rendezvous point.
Variables: REND_FILTER_MONITOR_PERIOD = count and track the rendezvous circuits built within this latest period. We set it to 24 hours.
REND_FILTER_BAN_PERIOD = for how long we are going to ban a malicious rendezvous point. We set it to 24 hours.
REND_FILTER_TOLERANCE = how many circuits we allow per rendezvous point regardless of its middle probability fraction. We set it to 4.
REND_FILTER_BAN_ACTIVATION = when we should ban a malicious rendezvous point. To determine this, we are going to get from REND_FILTER_MONITOR_PERIOD the total number of established rendezvous circuits as well as the number of established rendezvous circuits with a given rendezvous point and apply the formula from 3.2 with approximation to the next whole number (example: 5.01 = 6; 5.99 = 6).
Example: A hidden service server built 100 circuits in the last 24 hours out of which 5 to a rendezvous point with the middle probability fraction 1% (p = 0.01), so the expected number of circuits we expect is 2, but allow up to maximum 4 (if REND_FILTER_TOLERANCE > REND_FILTER_BAN_ACTIVATION, REND_FILTER_TOLERANCE prevails). In this case we consider this too much of a coincidence and ban this rendezvous point for the next 24 hours. We keep this ban in our persistent state with a timestamp so we'll know when it expires.
Ban malicious rendezvous points just for rendezvous circuits, not for anything else
If we treat as relay as a malicious rendezvous point for a given time period, this only means we are not building circuits with it as the rendezvous point, but we don't exclude it for other purpose circuits, regardless which are those other purpose circuits. We may use them in our exit circuits, consensus fetch circuits, as HSDirs and HSDir fetch / post circuits, as Introduction Points or Introduction Point circuits.
Non zero probability to assign the malicious rendezvous point flag to an innocent relay
This is indeed a non-zero value and it can be computed with the formula from 3.1, but it's a value so close to zero that we think it's totally worth it.
Even if accidentally (low chances) an innocent relay will be banned, this will be something local to the hidden service server. It won't affect that relay at all, nor how other client or hidden service servers treat that relay. It has nothing to do with the network wide consensus as well.
A honest client will always retry with a different rendezvous point, so honest clients should not experience reachability issues.
Rendezvous points that aren't included in hidden service server's consensus
Clients and hidden service servers might have different consensuses. This means that a hidden service server can be genuinely asked to build a rendezvous circuit to a rendezvous point for which it has no middle probability fraction because it's not included in that consensus. We should allow REND_FILTER_TOLERANCE number of circuits to the rendezvous points for which we have no weights at all and if that rendezvous point appears in a later consensus within the REND_FILTER_MONITOR_PERIOD update its middle probability fraction value to ensure the filter will activate and work properly once REND_FILTER_TOLERANCE threshold was reached.
Security implications for maintaining such records in the persistent state of a hidden service server
An attacker which compromises the location of a hidden service server will access this additional information: total number of rendezvous circuits built within the last REND_FILTER_MONITOR_PERIOD and to which rendezvous points these circuits were. This tradeoff should be worth it as well, since if the location of a hidden service server is compromised it is game over anyway, and this additional information shouldn't be so important to the attacker, except maybe for better determining the popularity of that hidden service (this can be determined many other ways, like the logs from the application running behind the hidden service, network counter, datacenter/ISP level counters and logs, etc.).
To avoid co-location linkability between different hidden services, we should keep rendezvous circuits records and counters individually for each hidden service.
The logic is designed to adjust by itself dynamically, based on the popularity and usage of a hidden service. If it's a high profile hidden service, experiencing 10000 connections per 24 hours, the REND_FILTER_BAN_ACTIVATION will increase for every rendezvous point.
Default option, with possibility to turn off for testing environments and experiments
We add a new torrc option named HiddenServiceRendFilter 0|1 and set it to 1 by default. It can be turned off by setting it to 0, if this will be desired in some testing environments or experiments.
Other considerations, limitations and compatibility with other existing proposals
This proposal won't disable an attacker entirely, but it will make it much harder and more expensive, given that the attacker will have to run more malicious relays to act as rendezvous points, increasing the capacity of the Tor network overall and directly the costs of the attack. Also, the attacker will have to grow the popularity of a hidden service so that he can establish more rendezvous circuits using the same rendezvous points. This will at least be kind of hard if other genuine clients pick the same rendezvous points and get noisy to the hidden service operator (much more traffic, same number or slightly more users on his service).
This proposal is compatible with all the other hidden service related proposals to date. For tools like OnionBalance, each backend / fallback Tor instance included in the master hidden service descriptor will just transparently and individually maintain their own tracking valuse and ban list.
With proposal 260 - Rendezvous Single Onion Services this should be disabled since it makes no sense. Set HiddenServiceRendFilter 0 in a RSOS type hidden service.
This proposal should be quite helpful once we implement enough padding between relays, where an attacker might have to own the entire path in order to distinguish dummy traffic from real traffic.
This proposal should fit in nicely with proposal 247 - Vanguards and could make it safe to leave the 3rd hop wild (no layer 3 short term guards) and use only layer 2 medium term guards + layer 1 long term guard since an attacker will be able to establish less circuits to his malicious rendezvous point, thus drastically decreasing the chances of the hidden service server to also pick his malicious relay as 3rd hop to the malicious rendezvous point (layer 2 guards discovery).
tor-dev mailing list tor-dev@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev