Hello tor-dev,
While helping design ways to publish statistics about hidden services in a privacy-preserving manner, it has become clear to me that certain statistics cannot be safely reported using the current method of having each relay collect and report measurements. I am going to describe a couple of simple protocols to handle this problem that I think should be implementable without much effort. I'd be happy to get feedback in particular about the security or ease-of-implementation of these protocols.
Two HS statistics that we (i.e. people working on Sponsor R) are interested in collecting are:
1. The number of descriptor fetches received by a hidden-service directory (HSDir)
2. The number of client introduction requests at introduction points (IPs)
The privacy issue with #1 is that the set of HSDirs is (likely) unique to an HS, and so the number of descriptor fetches at its HSDirs could reveal the number of clients it had during a measurement period. Similarly, the privacy issue with #2 is that the set of IPs is (likely) unique to an HS, and so the number of client introductions at its IPs could reveal the number of client connections it received.
An approach to solving this problem would be to anonymize the reported statistics. Doing so raises a couple of challenges, however:
1. Anonymous statistics should be authenticated as coming from some relay. Otherwise, statistics could be polluted by any malicious actor.
2. Statistical inference should be made robust to outliers. Without the relay identities, it will be difficult to detect and remove values that are incorrect, whether due to faulty measurement or malicious action by a relay.
I propose some simple cryptographic techniques to privately collect the above statistics while handling the above challenges. I assume that there exists a set of Statistics Authorities (StatAuths), of which at least one must be honest for the protocol to be secure, but all of which can be "curious" (that is, we want to maintain privacy from them as well and allow their actions to be completely public). The Directory Authorities could serve as StatAuths. A single server could as well, if you trust it to be honest-but-curious. If you have a trusted non-curious server, things become much simpler, of course: just have relays report their values to the server and then have it publish a global aggregate only.
The AnonStats1 protocol to privately publish both statistics if we trust relays not to pollute the statistics (i.e. #2 is not a problem) is as follows:
1. Each StatAuth provides 2k partially-blind signatures on authentication tokens to each relay in a consensus during the measurement period. The blind part of a signed token is simply a random number chosen by the relay. The non-blind part of a token consists of the start time of the current measurement period. The 2k tokens will allow the relay to submit k values to the StatAuths. Note that we could avoid using partially-blind signatures by changing keys at the StatAuths every measurement period and then simply providing blind signatures on random numbers.
2. At the end of the measurement period, for each statistic, each relay uploads the following, each on its own Tor circuit and accompanied by a unique token from each StatAuth:
   1. The count
   2. The ``statistical weight'' of the relay (1/(# HSDirs) for statistic #1 and the probability of selection as an IP for statistic #2)
3. The StatAuths verify that each uploaded value is accompanied by a unique token from each StatAuth that is valid for the current measurement period. To infer the global statistic from the anonymous per-relay statistics, the StatAuths add the counts, add the weights, and divide the former by the latter.
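To make step 3 concrete, here is a minimal sketch (in Python, with hypothetical names) of the aggregation the StatAuths would perform, assuming token verification has already happened and that counts and weights arrive anonymously and unlinked on separate circuits:

# Sketch of the AnonStats1 inference step (step 3). Counts and weights
# are unlinked, so only their sums matter; the ratio extrapolates the
# observed fraction of the network to a network-wide value.

def aggregate_anonstats1(counts, weights):
    """counts: raw per-relay counts; weights: per-relay statistical
    weights (e.g. 1/(# HSDirs) for descriptor fetches). Returns the
    inferred network-wide statistic."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("no valid submissions this period")
    return sum(counts) / total_weight

# Example: three HSDirs (out of a hypothetical 3000) report fetch counts.
counts = [120, 80, 410]
weights = [1 / 3000.0] * 3
print(aggregate_anonstats1(counts, weights))  # ~610000 estimated total fetches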
The AnonStats1 protocol is vulnerable to a relay that publishes manipulated statistics (i.e. challenge #2). If this is a concern, the AnonStats2 protocol mitigates it by using a median statistic instead of a sum:
1. Relays are divided into b bins by consensus weight. The bins have exponentially-increasing length, and the rate of increase c is set such that the ith bin by increasing weights has at least r relays, each of weight at least some minimum min_weight (say, r=500, min_weight=100). The first bin has weights in [0, min_weight), and the ith bin has weights in [min_weight*c^(i-2), min_weight*c^(i-1)).
2. Each StatAuth provides k partially-blind signatures on authentication tokens to each relay in a consensus during the measurement period. The blind part of a signed token is simply a random number chosen by the relay. The non-blind part of a token consists of the start time of the current measurement period and the bin containing the relay's consensus weight.
3. At the end of the measurement period, for each statistic, each relay divides its count by its ``statistical weight'' (1/(# HSDirs) for statistic #1 and the probability of selection as an IP for statistic #2). The result is the relay's estimate of the global value of the statistic. The relay then uploads this value via a new Tor circuit, accompanied by a unique token from each StatAuth.
4. The StatAuths verify that each uploaded value is accompanied by a unique token from each StatAuth that is valid for the current measurement period and that contains the same relay-weight bin. To infer a final global statistic from the anonymous per-relay estimates, the StatAuths use the weighted median of the received estimates, where the weight of an estimate is taken to be the smallest value of its accompanying bin (i.e. the bin's left edge).
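Here is a rough sketch of the binning and weighted-median inference (steps 1 and 4). min_weight=100 comes from the example above, c=2 is an assumed rate of increase, and the helper names are mine, not part of the proposal:

import math

MIN_WEIGHT = 100  # illustrative parameter from the example above
C = 2             # assumed rate of increase between bins

def bin_index(consensus_weight):
    """Bin 1 holds weights in [0, MIN_WEIGHT); bin i >= 2 holds weights
    in [MIN_WEIGHT * C**(i-2), MIN_WEIGHT * C**(i-1))."""
    if consensus_weight < MIN_WEIGHT:
        return 1
    return 2 + int(math.floor(math.log(consensus_weight / MIN_WEIGHT, C)))

def bin_left_edge(i):
    """Smallest weight in bin i; per step 4 this is the weight given to
    an estimate tagged with bin i (so bin 1 contributes weight 0)."""
    return 0 if i == 1 else MIN_WEIGHT * C ** (i - 2)

def weighted_median(estimates_with_bins):
    """estimates_with_bins: list of (estimate, bin_index) pairs taken
    from the uploads. Returns the weighted median of the estimates."""
    if not estimates_with_bins:
        raise ValueError("no estimates submitted")
    items = sorted((est, bin_left_edge(b)) for est, b in estimates_with_bins)
    total = sum(w for _, w in items)
    running = 0.0
    for est, w in items:
        running += w
        if running >= total / 2.0:
            return est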
On Tue, Jan 6, 2015 at 12:14 PM, A. Johnson aaron.m.johnson@nrl.navy.mil wrote:
Hello tor-dev,
While helping design ways to publish statistics about hidden services in a privacy-preserving manner, it has become clear to me that certain statistics cannot be safely reported using the current method of having each relay collect and report measurements. I am going to describe a couple of simple protocols to handle this problem that I think should be implementable without much effort. I'd be happy to get feedback in particular about the security or ease-of-implementation of these protocols.
Two HS statistics that we (i.e. people working on Sponsor R) are interested in collecting are:
- The number of descriptor fetches received by a hidden-service directory (HSDir)
- The number of client introduction requests at introduction points (IPs)
The privacy issue with #1 is that the set of HSDirs is (likely) unique to an HS, and so the number of descriptor fetches at its HSDirs could reveal the number of clients it had during a measurement period. Similarly, the privacy issue with #2 is that the set of IPs is (likely) unique to an HS, and so the number of client introductions at its IPs could reveal the number of client connections it received.
An approach to solving this problem would be to anonymize the reported statistics. Doing so raises a couple of challenges, however:
- Anonymous statistics should be authenticated as coming from some relay. Otherwise, statistics could be polluted by any malicious actor.
- Statistical inference should be made robust to outliers. Without the relay identities, it will be difficult to detect and remove values that are incorrect, whether due to faulty measurement or malicious action by a relay.
You know, when I got to the above paragraph, I asked myself, "Well clearly *I'd* use blind signatures, but I wonder what Aaron is going to suggest?"
And then I saw:
I propose some simple cryptographic techniques to privately collect the above statistics while handling the above challenges.
:)
I think that there are some details to work out, but the general approach you describe sounds reasonable. IMO it doesn't need to be directory authorities who are StatsAuths, and we could use a "blinded token once per relay per period" scheme for other stuff too down the line.
Median-and-quartiles or median-and-deciles sounds smarter than just-average to me.
cheers,
I think that there are some details to work out, but the general approach you describe sounds reasonable. IMO it doesn't need to be directory authorities who are StatsAuths, and we could use a "blinded token once per relay per period" scheme for other stuff too down the line.
I wonder what the minimum requirement for StatAuths would be. Is one StatsAuth too few? With only one, the statistics could be arbitrarily altered by that one, but privacy is still not at risk. Would two be acceptable if one is not? It would be nice to have a fairly minimal infrastructure for this, and I agree that it might be better to avoid loading the DirAuths with more functions.
Also, I had stated that the trust assumption on StatAuths was that all could be curious but one should also be honest. Actually, that wasn’t correct because one malicious StatAuth could refuse to issue tokens to some relays, thereby preventing them from getting any stats accepted. Instead, the relays should require tokens (i.e. blind signatures) from a *majority* of StatAuths. Then an honest majority is required to prevent malicious manipulation of the statistics.
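As a purely illustrative reading of that revised rule, a check on the StatAuth side might look like the following sketch, where verify_token() stands in for whatever partially-blind signature verification scheme ends up being used:

def accept_submission(presented_tokens, statauth_keys, verify_token):
    """presented_tokens: mapping StatAuth id -> unblinded signed token.
    statauth_keys: mapping StatAuth id -> verification key.
    verify_token(key, token) -> bool is the signature check (a stand-in
    for the partially-blind signature scheme). Accept the submission
    only if it carries valid tokens from a strict majority of StatAuths."""
    valid = sum(
        1
        for auth_id, token in presented_tokens.items()
        if auth_id in statauth_keys and verify_token(statauth_keys[auth_id], token)
    )
    return valid > len(statauth_keys) // 2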
Cheers, Aaron
"A. Johnson" aaron.m.johnson@nrl.navy.mil writes:
Hello tor-dev,
Hello,
and thanks for posting this to the list.
While helping design ways to publish statistics about hidden services in a privacy-preserving manner, it has become clear to me that certain statistics cannot be safely reported using the current method of having each relay collect and report measurements. I am going to describe a couple of simple protocols to handle this problem that I think should be implementable without much effort. I'd be happy to get feedback in particular about the security or ease-of-implementation of these protocols.
Two HS statistics that we (i.e. people working on Sponsor R) are interested in collecting are:
- The number of descriptor fetches received by a hidden-service directory (HSDir)
- The number of client introduction requests at introduction points (IPs)
The privacy issue with #1 is that the set of HSDirs is (likely) unique to an HS, and so the number of descriptor fetches at its HSDirs could reveal the number of clients it had during a measurement period. Similarly, the privacy issue with #2 is that the set of IPs is (likely) unique to an HS, and so the number of client introductions at its IPs could reveal the number of client connections it received.
I was wondering, why do we care so much about these two statistics?
From what I see in this post, you also just care about their total
numbers (without connecting them to specific HSDirs or IPs). Is it because you are curious about the total number of HS users?
If that's the case, why not focus on "Number of rendezvous requests" which is not tied to specific hidden services or clients. It seems to me like an easier target (that would still need to be *thoroughly analyzed* of course).
An approach to solving this problem would be to anonymize the reported statistics. Doing so raises a couple of challenges, however:
- Anonymous statistics should be authenticated as coming from some relay. Otherwise, statistics could be polluted by any malicious actor.
- Statistical inference should be made robust to outliers. Without the relay identities, it will be difficult to detect and remove values that are incorrect, whether due to faulty measurement or malicious action by a relay.
I propose some simple cryptographic techniques to privately collect the above statistics while handling the above challenges. I assume that there exists a set of Statistics Authorities (StatAuths), of which at least one must be honest for the protocol to be secure, but all of which can be "curious" (that is, we want to maintain privacy from them as well and allow their actions to be completely public). The Directory Authorities could serve as StatAuths. A single server could as well, if you trust it to be honest-but-curious. If you have a trusted non-curious server, things become much simpler, of course: just have relays report their values to the server and then have it publish a global aggregate only.
The AnonStats1 protocol to privately publish both statistics if we trust relays not to pollute the statistics (i.e. #2 is not a problem) is as follows:
- Each StatAuth provides 2k partially-blind signatures on authentication tokens to each relay in a consensus during the measurement period. The blind part of a signed token is simply a random number chosen by the relay. The non-blind part of a token consists of the start time of the current measurement period. The 2k tokens will allow the relay to submit k values to the StatAuths. Note that we could avoid using partially-blind signatures by changing keys at the StatAuths every measurement period and then simply providing blind signatures on random numbers.
- At the end of the measurement period, for each statistic, each relay uploads the following, each on its own Tor circuit and accompanied by a unique token from each StatAuth: (1) the count, and (2) the ``statistical weight'' of the relay (1/(# HSDirs) for statistic #1 and the probability of selection as an IP for statistic #2).
- The StatAuths verify that each uploaded value is accompanied by a unique token from each StatAuth that is valid for the current measurement period. To infer the global statistic from the anonymous per-relay statistics, the StatAuths add the counts, add the weights, and divide the former by the latter.
I'm going to mainly focus on Anonstats2, since IIUC Anonstats1 leaks the probability of selection as an IP of that relay which leaks the identity of the relay:
The AnonStats1 protocol is vulnerable to a relay that publishes manipulated statistics (i.e. challenge #2). If this is a concern, the AnonStats2 protocol mitigates it by using a median statistic instead of a sum:
- Relays are divided into b bins by consensus weight. The bins have exponentially-increasing
length, and the rate of increase c is set such that the ith bin by increasing weights has at least r relays each of weight at least some minimum min_weight (say, r=500, min_weight=100). The first bin has weights in [0, min_weight), and the ith bin has weights in [min_weight*c^(i-2), min_weight*c^(i-1)).
I'm curious to see how this binning strategy would work with the current Tor network. The network and hence the anonymity set of its relays is not that big. Also, the *big* hidden services are not that secret lately.
Let's consider the descriptor fetches statistic (#1) and assume that you are a StatAuth. Also let's say that there are 5 hidden services that generate most of the descriptor fetches of the network (e.g. botnets). It should be easy for you to find the responsible HSDirs of those hidden services, and then it might be possible to match them up with the blinded relays that report the highest counts of descriptor fetches. Is this wanted?
- Each StatAuth provides k partially-blind signatures on authentication tokens to each relay in a consensus during the measurement period. The blind part of a signed token is simply a random number chosen by the relay. The non-blind part of a token consists of the start time of the current measurement period and the bin containing the relay's consensus weight.
- At the end of the measurement period, for each statistic, each relay divides each statistic by the relay's ``statistical weight'' (1/(# HSDirs) for statistic #1 and the probability of selection as an IP for statistic #2). The result is the relay's estimate of the global value of the statistic. The relay then uploads this value via a new Tor circuit, accompanied by a unique token from each StatAuth.
It's worth mentioning that different relays use different consensuses and hence have different views of the network. So, relays will calculate their <probability of selection as an IP> using different/older consensuses than others.
- The StatAuths verify that each uploaded value is accompanied by a unique token from each
StatAuth that is valid for the current measurement period and that contains the same relay-weight bin. To infer a final global statistic from the anonymous per-relay estimates, the StatAuths use the weighted median of the received estimates, where the weight of an estimate is taken to be the smallest value of its accompanying bin (i.e. the bin's left edge).
I'm now wondering how useful this final statistic will be for us.
How useful is it to learn the median value of "client introduction requests" of all the relays of the network? And in our case we get an extrapolated weighted version of the median. How can such a number be used by us?
statistically yours!
Hi George,
Thanks for the really thoughtful comments.
Two HS statistics that we (i.e. people working on Sponsor R) are interested in collecting are:
- The number of descriptor fetches received by a hidden-service directory (HSDir)
- The number of client introduction requests at introduction points (IPs)
The privacy issue with #1 is that the set of HSDirs is (likely) unique to an HS, and so the number of descriptor fetches at its HSDirs could reveal the number of clients it had during a measurement period. Similarly, the privacy issue with #2 is that the set of IPs is (likely) unique to an HS, and so the number of client introductions at its IPs could reveal the number of client connections it received.
I was wondering, why do we care so much about these two statistics? From what I see in this post, you also just care about their total numbers (without connecting them to specific HSDirs or IPs). Is it because you are curious about the total number of HS users?
If that's the case, why not focus on "Number of rendezvous requests" which is not tied to specific hidden services or clients. It seems to me like an easier target (that would still need to be *thoroughly analyzed* of course).
Yes, there are other ways to estimate the number of clients and/or connections than descriptor fetches or INTRODUCE1 cells. However, I don’t want to give up on these statistics for a couple of reasons. First, none of these alternatives is exactly the same, and they might miss certain events that we want to know about. For example, I believe that there are a ton of “zombie” fetches from botnet clients whose HS has died. We would never see these any other way. Or we might miss DoS attacks that work by flooding IPs with INTRODUCE1. Second, I see this as a first step to improving the privacy/accuracy tradeoff of statistics collection in general. For example, it would be great to be able to add noise just once, to a final network-wide output, rather than once for each relay.
I'm going to mainly focus on Anonstats2, since IIUC Anonstats1 leaks the probability of selection as an IP of that relay which leaks the identity of the relay:
AnonStats1 doesn’t leak the relay identity. The relay probability is sent over a separate circuit (at a random time). I intentionally did that just to avoid the problem you describe.
The AnonStats1 protocol is vulnerable to a relay that publishes manipulated statistics (i.e. challenge #2). If this is a concern, the AnonStats2 protocol mitigates it by using a median statistic instead of a sum:
- Relays are divided into b bins by consensus weight. The bins have exponentially-increasing
length, and the rate of increase c is set such that the ith bin by increasing weights has at least r relays each of weight at least some minimum min_weight (say, r=500, min_weight=100). The first bin has weights in [0, min_weight), and the ith bin has weights in [min_weight*c^(i-2), min_weight*c^(i-1)).
I'm curious to see how this binning strategy would work with the current Tor network. The network and hence the anonymity set of its relays is not that big. Also, the *big* hidden services are not that secret lately.
Here’s a rough estimate. There were 5829 fast relays in the 11/30/2014 00:00 consensus. That consensus has a max consensus weight of 190000. If we set the min_weight=500 and c=2, then we get 9 bins with the following number of relays in each bin: [2927, 545, 660, 416, 463, 439, 199, 142, 37]. There were 3290 fast HSDirs in that consensus, with a max weight also of 190000. With the same bin parameters, we get 9 bins with the following numbers of HSDirs in each bin: [1375, 323, 396, 231, 315, 342, 154, 120, 34].
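For reference, here is a short sketch of how such a histogram could be computed from a list of consensus weights, following the bin formula from the proposal (the exact boundaries behind the numbers above may differ slightly); parse_consensus_weights() is a hypothetical helper:

import math
from collections import Counter

def bin_histogram(consensus_weights, min_weight=500, c=2):
    """Count relays per exponentially sized weight bin, mirroring the
    rough estimate above (min_weight=500, c=2). consensus_weights is the
    list of weights for the relays of interest (all fast relays, or just
    fast HSDirs)."""
    def idx(w):
        if w < min_weight:
            return 1
        return 2 + int(math.floor(math.log(w / min_weight, c)))
    return Counter(idx(w) for w in consensus_weights)

# Hypothetical usage, assuming a parse_consensus_weights() helper that
# extracts the 'w Bandwidth=' values from a cached consensus document:
# hist = bin_histogram(parse_consensus_weights("2014-11-30-00-00-00-consensus"))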
Let's consider the descriptor fetches statistic (#1) and assume that you are a StatAuth. Also let's say that there are 5 hidden services that generate most of the descriptor fetches of the network (e.g. botnets). It should be easy for you to find the responsible HSDirs of those hidden services, and then it might be possible to match them up with the blinded relays that report the highest counts of descriptor fetches. Is this wanted?
You’re right about this issue. If you know in advance which HS is likely to have statistics in a certain position among all HSes (e.g. highest, lowest, middle, etc.), then you may be able to pick out those anonymized estimates that belong to that HS. Then to get the count you would have to guess the relay identity and divide by its statistical weight. A possible solution to this would be to add a bunch of dummy statistics that are distributed randomly but such that they don’t alter the median much. This would be done by the relays themselves.
- Each StatAuth provides k partially-blind signatures on authentication tokens to each relay in a consensus during the measurement period. The blind part of a signed token is simply a random number chosen by the relay. The non-blind part of a token consists of the start time of the current measurement period and the bin containing the relay's consensus weight.
- At the end of the measurement period, for each statistic, each relay divides each statistic by the relay's ``statistical weight'' (1/(# HSDirs) for statistic #1 and the probability of selection as an IP for statistic #2). The result is the relay's estimate of the global value of the statistic. The relay then uploads this value via a new Tor circuit, accompanied by a unique token from each StatAuth.
It's worth mentioning that different relays use different consensuses and hence have different views of the network. So, relays will calculate their <probability of selection as an IP> using different/older consensuses than others.
I think that’s fine. The purpose of using the weighted median is to give those relays that see more traffic more of a say in the outcome. It doesn’t need to be exact.
- The StatAuths verify that each uploaded value is accompanied by a unique token from each
StatAuth that is valid for the current measurement period and that contains the same relay-weight bin. To infer a final global statistic from the anonymous per-relay estimates, the StatAuths use the weighted median of the received estimates, where the weight of an estimate is taken to be the smallest value of its accompanying bin (i.e. the bin's left edge).
I'm now wondering how useful this final statistic will be for us.
How useful is it to learn the median value of "client introduction requests" of all the relays of the network? And in our case we get an extrapolated weighted version of the median. How can such a number be used by us?
This median should be a good estimate of the network-wide total. An example of it is that blue line in the “single” plot of Figure 10 in the stats report that Karsten sent (that blue line was technically a mean rather than a median, but same idea). Each individual relay is actually reporting their estimate of the network-wide total, given their local observations, and the median among these should be pretty accurate.
Best, Aaron
"A. Johnson" aaron.m.johnson@nrl.navy.mil writes:
Hi George,
Thanks for the really thoughtful comments.
Two HS statistics that we (i.e. people working on Sponsor R) are interested in collecting are:
- The number of descriptor fetches received by a hidden-service directory (HSDir)
- The number of client introduction requests at introduction points (IPs)
The privacy issue with #1 is that the set of HSDirs is (likely) unique to an HS, and so the number of descriptor fetches at its HSDirs could reveal the number of clients it had during a measurement period. Similarly, the privacy issue with #2 is that the set of IPs is (likely) unique to an HS, and so the number of client introductions at its IPs could reveal the number of client connections it received.
I was wondering, why do we care so much about these two statistics? From what I see in this post, you also just care about their total numbers (without connecting them to specific HSDirs or IPs). Is it because you are curious about the total number of HS users?
If that's the case, why not focus on "Number of rendezvous requests" which is not tied to specific hidden services or clients. It seems to me like an easier target (that would still need to be *thoroughly analyzed* of course).
Yes, there are other ways to estimate the number of clients and/or connections than descriptor fetches or INTRODUCE1 cells. However, I don’t want to give up on these statistics for a couple of reasons. First, none of these alternatives is exactly the same, and they might miss certain events that we want to know about. For example, I believe that there are a ton of “zombie” fetches from botnet clients whose HS has died. We would never see these any other way. Or we might miss DoS attacks that work by flooding IPs with INTRODUCE1. Second, I see this as a first step to improving the privacy/accuracy tradeoff of statistics collection in general. For example, it would be great to be able to add noise just once, to a final network-wide output, rather than once for each relay.
I'm going to mainly focus on Anonstats2, since IIUC Anonstats1 leaks the probability of selection as an IP of that relay which leaks the identity of the relay:
AnonStats1 doesn’t leak the relay identity. The relay probability is sent over a separate circuit (at a random time). I intentionally did that just to avoid the problem you describe.
Ah, I see, that makes sense.
Some more notes from reading AnonStats1 then:
a) How do relays get more tokens when they deplete the initial 2k tokens? Is it easy for the StatAuth to generate 2k such tokens, or can relays DoS them by asking for tokens repeatedly?
b) It seems a bit weird to assume that all relay operators are good citizens, but still not trust the rest of the Internet at all (that's why we are doing the blind signature scheme, right?).
If an outside attacker wanted to influence the results, he could still sign up 10 relays on the network, get the blind signature tokens, and have them publish anonymized bad statistics, right?
In the median-based approach of AnonStats2 blind signatures make more sense, since they ensure that the adversary won't insert 1000 fake statistics to the network so that they influence the median.
Let's consider the descriptor fetches statistic (#1) and assume that you are a StatAuth. Also let's say that there are 5 hidden services that generate most of the descriptor fetches of the network (e.g. botnets). It should be easy for you to find the responsible HSDirs of those hidden services, and then it might be possible to match them up with the blinded relays that report the highest counts of descriptor fetches. Is this wanted?
You’re right about this issue. If you know in advance which HS is likely to have statistics in a certain position among all HSes (e.g. highest, lowest, middle, etc.), then you may be able to pick out those anonymized estimates that belong to that HS. Then to get the count you would have to guess the relay identity and divide by its statistical weight. A possible solution to this would be to add a bunch of dummy statistics that are distributed randomly but such that they don’t alter the median much. This would be done by the relays themselves.
Also, since the 'number of descriptor fetches' and 'number of IP requests' are related to each other, it might be possible to correlate these two statistics over different reporting relays. So for example, if you know that the network has a *very* popular HS, you can match its HSDir measurements with its IP measurements.
That's because the highest counts of both statistics will likely correspond to the HSDirs and IPs of the most popular hidden service of the network, if the most popular HS has a large user count difference from the least popular ones.
I can't think of a concrete attack exploiting this behavior, but it's something that blind signatures can't really protect us against, and we should think more about it and its consequences.
AnonStats1 doesn’t leak the relay identity. The relay probability is sent over a separate circuit (at a random time). I intentionally did that just to avoid the problem you describe.
Ah, I see, that makes sense.
Some more notes from reading AnonStats1 then:
a) How do relays get more tokens when they deplete the initial 2k tokens? Is it easy for the StatAuth to generate 2k such tokens, or can relays DoS them by asking for tokens repeatedly?
New tokens are issued for each measurement period (e.g. every 24 hours). The relay should be limited to asking for its allotment once per period.
b) It seems a bit weird to assume that all relay operators are good citizens, but still not trust the rest of the Internet at all (that's why we are doing the blind signature scheme, right?).
It doesn’t seem that weird to me. Running a relay requires some level of effort.
If an outside attacker wanted to influence the results, he could still sign up 10 relays on the network, get the blind signature tokens, and have them publish anonymized bad statistics, right?
Right.
That's because the highest counts of both statistics will likely correspond to the HSDirs and IPs of the most popular hidden service of the network, if the most popular HS has a large user count difference from the least popular ones.
I am beginning to think that AnonStats2 is not secure enough to use. The consensus-weight bins were supposed to hide which relays exactly were reporting the statistics, but because the bins of HSDirs and IPs change over time, the adversary could watch the HSDir/IP bins of a target HS to see if the stats tend to be larger or smaller than their average. This remains the case even if there is just one bin of relays that is allowed to report stats, if that bin does not include all HSDirs/IPs that HSes might use.
Also, in AnonStats1 we should perhaps require that counts are reported in constant-size chunks over separate circuits. For example, we could have every 100 unique HS descriptors sent in a different upload. This way, a particularly large statistic wouldn’t identify a particularly large HSDir/IP (otherwise, if that stat is larger than its normally-large value, the difference could reveal the popularity of a target popular HS).
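A tiny sketch of what constant-size chunking might look like on the relay side; the chunk size and the handling of the final partial chunk are open design questions, so this sketch just drops the remainder:

CHUNK = 100  # e.g. 100 descriptor fetches per upload, as suggested above

def chunked_reports(count, chunk=CHUNK):
    """Split a raw count into constant-size chunks, each to be uploaded
    over its own Tor circuit with its own tokens, so no single upload
    reveals an unusually large per-relay total. How to handle the final
    partial chunk (round up, round down, or round probabilistically) is
    left open; here it is simply dropped for illustration."""
    return [chunk] * (count // chunk)

# A relay that saw 437 fetches would send four separate uploads of 100.
print(chunked_reports(437))  # [100, 100, 100, 100]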
Best, Aaron
I am beginning to think that AnonStats2 is not secure enough to use.
But I have come up with a possible replacement. Let’s call it AnonStats3. AnonStats3 works in conjunction with AnonStats1. It provides a rough estimate of the statistic that probably is most useful as a sanity check on AnonStats1. Thus when AnonStats1 is not being messed up by a faulty or malicious relay, we will get a very accurate value, and when AnonStats1 is being messed up, we will notice it if it is too far off.
AnonStats3 works as follows:
0. We start with a seed estimate e_0 for the statistic. This can either be a reasonable guess about the statistic’s likely value (perhaps based on trusted local measurements) or based on some initial results from AnonStats1.
1. The weight of each relay r is rounded down to the nearest power of two. Call this rounded value W_r. The StatAuths each provide to r a number of tokens T_r = floor(W_r/d), where d represents the granularity of the value of a token, say d=100. A token is produced by providing a partially-blind signature on the timestamp for the next measurement period (again, “pure” blind signatures can be used if we simply have the StatAuths change keys each measurement period).
2. After the end of the ith measurement period, each relay r uses its local measurement L_r of the statistic to infer a global measurement G_r by dividing the local measurement by its “statistical weight” S_r (e.g. consensus weight for IP statistics): G_r = L_r/S_r. Let the last estimate for the statistic be e_{i-1}. If G_r < e_{i-1}, then relay r will set its vote to V_r = 0. Otherwise, it will set V_r = 1. The relay will anonymously submit T_r votes with value V_r to each StatAuth. Each vote is accompanied by a token from the StatAuth it is being sent to, the vote is sent on a new Tor circuit, and it is sent at a random time during the (i+1)st measurement period.
3. Each StatAuth independently verifies the signatures on the tokens it receives and counts the votes, then broadcasts the winning outcome to the other StatAuths: a 0 outcome indicates that the statistic is smaller than e_{i-1}, and a 1 outcome indicates that the statistic is at least e_{i-1}.
4. Each StatAuth takes the majority opinion for how to adjust the estimate. If 0 wins, then the estimate e_i is set to e_{i-1}*(1-\epsilon), where \epsilon is the fraction by which we adjust the estimate in each period, say \epsilon = 0.1. If 1 wins, then the estimate e_i is set to e_{i-1}*(1+\epsilon).
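Here is a compact sketch of the AnonStats3 update cycle, under the assumption (not specified above) that a tie in the vote counts as a 1 outcome; the names and default values are illustrative:

EPSILON = 0.1  # fraction by which the estimate moves each period
D = 100        # granularity of one token, in consensus-weight units

def tokens_for_relay(consensus_weight, d=D):
    """Step 1: round the weight down to the nearest power of two (W_r),
    then issue T_r = floor(W_r / d) tokens."""
    w = 1
    while w * 2 <= consensus_weight:
        w *= 2
    return w // d

def relay_vote(local_count, statistical_weight, prev_estimate):
    """Step 2: the relay's estimate of the global value is
    G_r = L_r / S_r; it votes 1 if G_r >= e_{i-1}, else 0."""
    g = local_count / statistical_weight
    return 1 if g >= prev_estimate else 0

def update_estimate(prev_estimate, votes, epsilon=EPSILON):
    """Steps 3-4: tally the token-weighted votes (one 0/1 entry per
    submitted token) and nudge the estimate up or down by epsilon."""
    if sum(votes) * 2 >= len(votes):     # 1 wins: statistic >= e_{i-1}
        return prev_estimate * (1 + epsilon)
    return prev_estimate * (1 - epsilon)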
AnonStats3 provides only a rough estimate of the statistic, but it is secured by the bandwidth-weighted vote. When the statistic is stable, AnonStats3 will oscillate back and forth. When the statistic changes by a lot, AnonStats3 will lag behind. However, AnonStats3 provides a nice sanity check on AnonStats1. If AnonStats1 suddenly spikes, but AnonStats3 has many votes indicating a value below its current estimate, then we can recognize that the AnonStats1 value is probably wrong (even if the AnonStats3 votes sometimes don’t reach one half the total!).
Best, Aaron
"A. Johnson" aaron.m.johnson@nrl.navy.mil writes:
Hello tor-dev,
<snip>
Two HS statistics that we (i.e. people working on Sponsor R) are interested in collecting are:
- The number of descriptor fetches received by a hidden-service directory (HSDir)
- The number of client introduction requests at introduction points (IPs)
The privacy issue with #1 is that the set of HSDirs is (likely) unique to an HS, and so the number of descriptor fetches at its HSDirs could reveal the number of clients it had during a measurement period. Similarly, the privacy issue with #2 is that the set of IPs is (likely) unique to an HS, and so the number of client introductions at its IPs could reveal the number of client connections it received.
<snip>
The AnonStats1 protocol to privately publish both statistics if we trust relays not to pollute the statistics (i.e. #2 is not a problem) is as follows:
- Each StatAuth provides 2k partially-blind signatures on authentication tokens to each relay in a consensus during the measurement period. The blind part of a signed token is simply a random number chosen by the relay. The non-blind part of a token consists of the start time of the current measurement period. The 2k tokens will allow the relay to submit k values to the StatAuths. Note that we could avoid using partially-blind signatures by changing keys at the StatAuths every measurement period and then simply providing blind signatures on random numbers.
- At the end of the measurement period, for each statistic, each relay uploads the following, each on its own Tor circuit and accompanied by a unique token from each StatAuth: (1) the count, and (2) the ``statistical weight'' of the relay (1/(# HSDirs) for statistic #1 and the probability of selection as an IP for statistic #2).
- The StatAuths verify that each uploaded value is accompanied by a unique token from each StatAuth that is valid for the current measurement period. To infer the global statistic from the anonymous per-relay statistics, the StatAuths add the counts, add the weights, and divide the former by the latter.
Some more thoughts on AnonStats1:
- The two statistics proposed here are not independent. I suspect that the two numbers will actually be quite close to each other, since to do an intro request you need to first fetch a descriptor.
(In practice, the numbers *will be* different because a user might do multiple intro requests without fetching the descriptor multiple times. Or maybe a descriptor fetch failed so the client could not follow with an introduction request.)
My worry is that the numbers might be quite close most of the time. This means that about 9 relays (6 HSDirs + 3 IPs) will include that number -- the popularity of the HS -- in their result in the end. Of course, that number will get smudged along with all the other measurements that the reporting relay sees, but if the number is big enough then it will dominate the other measurements and the actual value might be visible in the results.
The above might sound stupid. Here are some brief calculations:
There are 30000 hidden services and 3000 HSDirs. The recent tech report shows that each HSDir is responsible for about 150 hidden services. This means that there are about 150 numbers that get smudged together every time. If *most* of those 30k hidden services are tiny non-popular ones, there is a non-negligible chance that most of those 150 numbers are also going to be tiny, which means that any moderately big number will stand out. And for every measurement period, there are 9 relays that have a chance of making this number stand out.
Another issue here is that if you assume that the popularity of hidden services doesn't change drastically overnight, and you believe the above paragraph, it's even possible to track the popularity of hidden services even if you don't know their actual popularity value. To do that, every day you check the reported measurements to see if there are any numbers close to yesterday's numbers. If this consistently happens over a few days, you can be pretty confident that you have found the popularity of a hidden service.
To take this stupidity one step further, you can model this whole thing as a system of 3000 equations with 150 unknown variables each. Each day you get a new system of equations. It wouldn't surprise me if the value of most variables is negligible (tiny hidden services), which can be ignored. Every time you find the popularity of a hidden service, you learn the value of another variable. If you assume that only 300 hidden services generate a substantial amount of HSDir requests, how many days do you need to find the value of those 300 variables?
Unfortunately, this is one of the worries that is hard to address without first building the whole thing and seeing what the actual numbers look like...
- And even if all the above is garbage, I'm still a bit concerned about the fact that the popularity of the *most popular* hidden service will be trackable using the above scheme. That's because the most popular hidden service will almost always dominate the other measurements.
- Also, the measurement period will have to change. Currently, each relay sends its extrainfo descriptor every 24 hours. For the AnonStats1 scheme to work, the measurement period needs to be non-deterministic, otherwise the StatsAuth can link relay measurements over different days based on when they reported stats.
Well, friends, here’s an attack on the whole anonymization approach to reporting HSDir (or other) statistics:
1. The adversary creates a large number of onion services that almost always cover the entire set of HSDirs.
2. The adversary performs actions with his onion services that add to the statistics (e.g. by doing directory fetches for his onion service “covering” a certain HSDir) such that each HSDir has its statistics value “set” to roughly a certain magnitude.
3. The adversary looks at the anonymized statistics and deanonymizes the HSDirs by locating the statistic with the magnitude he set for that HSDir.
Does this just kill this whole idea? Maybe true aggregation is the only thing that can work.
Sorry! Aaron
On Feb 18, 2015, at 10:42 AM, George Kadianakis desnacked@riseup.net wrote:
"A. Johnson" aaron.m.johnson@nrl.navy.mil writes:
Hello tor-dev,
<snip>
Two HS statistics that we (i.e. people working on Sponsor R) are interested in collecting are:
- The number of descriptor fetches received by a hidden-service directory (HSDir)
- The number of client introduction requests at introduction points (IPs)
The privacy issue with #1 is that the set of HSDirs is (likely) unique to an HS, and so the number of descriptor fetches at its HSDirs could reveal the number of clients it had during a measurement period. Similarly, the privacy issue with #2 is that the set of IPs is (likely) unique to an HS, and so the number of client introductions at its IPs could reveal the number of client connections it received.
<snip>
The AnonStats1 protocol to privately publish both statistics if we trust relays not to pollute the statistics (i.e. #2 is not a problem) is as follows:
- Each StatAuth provides 2k partially-blind signatures on authentication tokens to each relay in a consensus during the measurement period. The blind part of a signed token is simply a random number chosen by the relay. The non-blind part of a token consists of the start time of the current measurement period. The 2k tokens will allow the relay to submit k values to the StatAuths. Note that we could avoid using partially-blind signatures by changing keys at the StatAuths every measurement period and then simply providing blind signatures on random numbers.
- At the end of the measurement period, for each statistic, each relay uploads the following, each on its own Tor circuit and accompanied by a unique token from each StatAuth:
  - The count
  - The ``statistical weight'' of the relay (1/(# HSDirs) for statistic #1 and the probability of selection as an IP for statistic #2)
- The StatAuths verify that each uploaded value is accompanied by a unique token from each StatAuth that is valid for the current measurement period. To infer the global statistic from the anonymous per-relay statistics, the StatAuths add the counts, add the weights, and divide the former by the latter.
Some more thoughts on AnonStats1:
- The two statistics proposed here are not independent. I suspect that
the two numbers will actually be quite close to each other, since to do an intro request you need to first fetch a descriptor.
(In practice, the numbers *will be* different because a user might do multiple intro requests without fetching the descriptor multiple times. Or maybe a descriptor fetch failed so the client could not follow with an introduction request.)
My worry is that the numbers might be quite close most of the time. This means that about 9 relays (6 HSDirs + 3 IPs) will include that number -- the popularity of the HS -- in their result in the end. Of course, that number will get smudged along with all the other measurements that the reporting relay sees, but if the number is big enough then it will dominate the other measurements and the actual value might be visible in the results.
The above might sound stupid. Here are some brief calculations:
There are 30000 hidden services and 3000 HSDirs. The recent tech report shows that each HSDir is responsible for about 150 hidden services. This means that there are about 150 numbers that get smudged together every time. If *most* of those 30k hidden services are tiny non-popular ones, there is a non-negligible chance that most of those 150 numbers are also going to be tiny, which means that any moderately big number will stand out. And for every measurement period, there are 9 relays that have a chance of making this number stand out.
Another issue here is that if you assume that the popularity of hidden services doesn't change drastically overnight, and you believe the above paragraph, it's even possible to track the popularity of hidden services even if you don't know their actual popularity value. To do that, every day you check the reported measurements to see if there are any numbers close to yesterday's numbers. If this consistently happens over a few days, you can be pretty confident that you have found the popularity of a hidden service.
To take this stupidity one step further, you can model this whole thing as a system of 3000 equations with 150 unknown variables each. Each day you get a new system of equations. It wouldn't surprise me if the value of most variables is negligible (tiny hidden services), which can be ignored. Every time you find the popularity of a hidden service, you learn the value of another variable. If you assume that only 300 hidden services generate a substantial amount of HSDir requests, how many days do you need to find the value of those 300 variables?
Unfortunately, this is one of the worries that is hard to address without first building the whole thing and seeing what the actual numbers look like...
- And even if all the above is garbage, I'm still a bit concerned
about the fact that the popularity of the *most popular* hidden service will be trackable using the above scheme. That's because the most popular hidden service will almost always dominate the other measurements.
- Also, the measurement period will have to change. Currently, each
relay sends its extrainfo descriptor every 24 hours. For the AnonStats1 scheme to work, the measurement period needs to be non-deterministic, otherwise the StatsAuth can link relay measurements over different days based on when they reported stats.