Hi! Tim helped me write this draft last week, and I shared it with the PrivCount authors. I've already gotten some good comments from Aaron, which I'll repost in a followup message, with his permission.
================================
Filename: 280-privcount-in-tor.txt
Title: Privacy-Preserving Statistics with PrivCount in Tor
Author: Nick Mathewson, Tim Wilson-Brown
Created: 02-Aug-2017
Status: Draft
0. Acknowledgments
Tariq Elahi, George Danezis, and Ian Goldberg designed and implemented the PrivEx blinding scheme. Rob Jansen and Aaron Johnson extended PrivEx's differential privacy guarantees to multiple counters in PrivCount:
https://github.com/privcount/privcount/blob/master/README.markdown#research-...
Rob Jansen and Tim Wilson-Brown wrote the majority of the experimental PrivCount code, based on the PrivEx secret-sharing variant. This implementation includes contributions from the PrivEx authors, and others:
https://github.com/privcount/privcount/blob/master/CONTRIBUTORS.markdown
1. Introduction and scope
PrivCount is a privacy-preserving way to collect aggregate statistics about the Tor network without exposing the statistics from any single Tor relay.
This document describes the behavior of the in-Tor portion of the PrivCount system. It DOES NOT describe the counter configurations, or any other parts of the system. (These will be covered in separate proposals.)
2. PrivCount overview
Here follows an oversimplified summary of PrivCount, with enough information to explain the Tor side of things. The actual operation of the non-Tor components is trickier than described below.
All values in the scheme below are 64-bit unsigned integers; addition and subtraction are modulo 2^64.
In PrivCount, a Data Collector (in this case a Tor relay) shares numeric data with N different Tally Reporters. (A Tally Reporter performs the summing and unblinding roles of the Tally Server and Share Keeper from experimental PrivCount.)
All N Tally Reporters together can reconstruct the original data, but no (N-1)-sized subset of the Tally Reporters can learn anything about the data.
(In reality, the Tally Reporters don't reconstruct the original data at all! Instead, they will reconstruct a _sum_ of the original data across all participating relays.)
To share data, for each value X to be shared, the relay generates random values B_1 through B_N, and shares each B_i secretly with a single Tally Reporter. The relay then publishes Y = X + SUM(B_i) + Z, where Z is a noise value taken at random from a Gaussian distribution. The Tally Reporters can reconstruct X+Z by securely computing SUM(B_i) across all contributing Data Collectors. (Tally Reporters MUST NOT share individual B_i values: that would expose the underlying relay totals.)
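As a non-normative illustration of the arithmetic above, here is a minimal Python sketch of the blinding and unblinding steps. The function names are hypothetical, the noise value Z is passed in as a plain integer (sampling the Gaussian correctly is out of scope here), and Python's secrets module stands in for whatever CSPRNG a real implementation would use:

    import secrets

    MASK = 2**64 - 1   # all arithmetic below is modulo 2^64

    def share_value(x, num_reporters, z):
        """Blind one counter value x for num_reporters Tally Reporters.

        Returns (y, blinding_values): y is published, and blinding_values[i]
        is sent secretly to Tally Reporter i.  z is the noise value (sampling
        it from the right Gaussian is out of scope for this sketch).
        """
        blinding_values = [secrets.randbits(64) for _ in range(num_reporters)]
        y = (x + sum(blinding_values) + z) & MASK
        return y, blinding_values

    def unblind_sum(published_y_values, per_reporter_sums):
        """Recover SUM(X + Z) over all contributing relays.

        published_y_values: the Y value from each relay.
        per_reporter_sums: for each Tally Reporter, the sum (mod 2^64) of all
        the B_i values it received from those relays.
        """
        return (sum(published_y_values) - sum(per_reporter_sums)) & MASK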
In order to prevent bogus data from corrupting the tally, the Tor relays and the Tally Reporters perform multiple "instances" of this algorithm, each instance covering a randomly sampled subset of the relays. The relay sends multiple Y values for each measurement, built with different sets of B_i. There are R such "instances"; in the document format below they are numbered from 0 to (R - 1).
So that the system will still produce results in the event of a single Tally Reporter failure, these instances are distributed across multiple subsets of Tally Reporters.
Below we describe a data format for this.
3. The document format
This document format builds on the line-based directory format used for other tor documents, described in Tor's dir-spec.txt.
Using this format, we describe two kinds of documents here: a "counters" document that publishes all the Y values, and a "blinding" document that describes the B_i values. But see "An optimized alternative" below.
The "counters" document has these elements:
"privctr-dump-format" SP VERSION SP SigningKey
[At start, exactly once]
Describes the version of the dump format, and provides an ed25519 signing key to identify the relay. The signing key is encoded in base64 with padding stripped. VERSION is "alpha" now, but should be "1" once this document is finalized.
[[[TODO: Do we need a counter version as well?
Noise is distributed across a particular set of counters, to provide differential privacy guarantees for those counters. Reducing noise requires a break in the collection. Adding counters is ok if the noise on each counter monotonically increases. (Removing counters always reduces noise.)
We also need to work out how to handle instances with mixed Tor versions, where some Data Collectors report different counters to other Data Collectors. (The blinding works if we substitute zeroes for missing counters on Tally Reporters. But we also need to add noise in this case.)
-teor ]]]
"starting-at" SP IsoTime
[Exactly once]
The start of the time period when the statistics here were collected.
"ending-at" SP IsoTime
[Exactly once]
The end of the time period when the statistics here were collected.
"num-instances" SP Number
[Exactly once]
The number of "instances" that the relay used (see above.)
"tally-reporter" SP Identifier SP Key SP InstanceNumbers
[At least twice]
The curve25519 public key of each Tally Reporter that the relay believes in. (If the list does not match the list of participating tally reporters, they won't be able to find the relay's values correctly.) The identifiers are non-space, non-nul character sequences. The Key values are encoded in base64 with padding stripped; they must be unique within each counters document. The InstanceNumbers are comma-separated lists of decimal integers from 0 to (num-instances - 1), in ascending order.
Keyword ":" SP Int SP Int SP Int ...
[Any number of times]
The Y values for a single measurement. There are num-instances such Y values for each measurement. They are 64-bit unsigned integers, expressed in decimal.
The "Keyword" denotes which measurement is being shared. Keyword MAY be any sequence of characters other than colon, nul, space, and newline, though implementators SHOULD avoid getting too creative here. Keywords MUST be unique within a single document. Tally Reporters MUST handle unrecognized keywords. Keywords MAY appear in any order.
It is safe to send the blinded totals for each instance to every Tally Reporter. To unblind the totals, a Tally Reporter needs:
   * a blinding document from each relay in the instance, and
   * the per-counter blinding sums from the other Tally Reporters in their instance.
[[[TODO: But is it safer to create a per-instance counters document? -- teor]]]
The semantics of individual measurements are not specified here.
"signature" SP Signature
[At end, exactly once]
The Ed25519 signature of all the fields in the document, from the first byte, up to but not including the "signature" keyword here. The signature is encoded in base64 with padding stripped.
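For illustration only, a counters document might look roughly like this; the keys and signature are truncated placeholders, and the keywords and values are invented for the example:

    privctr-dump-format alpha pAZ5R5cL...
    starting-at 2017-08-01 00:00:00
    ending-at 2017-08-02 00:00:00
    num-instances 3
    tally-reporter moriatally 8Kq1vB2c... 0,1
    tally-reporter exampletally R4nd0mKy... 1,2
    cells-relayed: 90122 88016 91444
    circuits-opened: 1322 1290 1315
    signature k9PzXtR1...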
The "blinding" document has these elements:
"privctr-secret-offsets" SP VERSION SP SigningKey
[At start, exactly once.]
The VERSION and SigningKey parameters are the same as for "privctr-dump-format".
"instances" SP Numbers
[Exactly once]
The instances that this Tally Reporter handles. They are given as comma-separated decimal integers, as in the "tally-reporter" entry in the counters document. They MUST match the instances listed in the counters document.
[[[TODO: this is redundant. Specify the constraint instead? --teor]]]
"num-counters" SP Number
[Exactly once]
The number of counters that the relay used in its counters document. This MUST be equal to the number of keywords in the counters document.
[[[TODO: this is redundant. Specify the constraint instead? --teor]]]
"tally-reporter-pubkey" SP Key
[Exactly once]
The curve25519 public key of the Tally Reporter who is intended to receive and decrypt this document. The key is base64-encoded with padding stripped.
"count-document-digest" SP "sha3" Digest NL "-----BEGIN ENCRYPTED DATA-----" NL Data "-----END ENCRYPTED DATA-----" NL
[Exactly once]
The SHA3-256 digest of the count document corresponding to this blinding document. The digest is base64-encoded with padding stripped. The Data encodes the blinding values (see "The Blinding Values" below), and is encrypted to the Tally Reporter's public key using the hybrid encryption algorithm described below.
"signature" SP Signature
[At end, exactly once]
The Ed25519 signature of all the fields in the document, from the first byte, up to but not including the "signature" keyword here. The signature is encoded in base64 with padding stripped.
4. The Blinding Values
The "Data" field of the blinding documents above, when decrypted, yields a sequence of 64-bit binary values, encoded in network (big-endian) order. There are C * R such values, where C is the number of keywords in the count document, and R is the number of instances that the Tally Reporter participates in. The client generates all of these values uniformly at random.
For each keyword in the count document, in the order specified by the count document, the decrypted data holds R*8 bytes for the specified instance of that keyword's blinded counter.
For example: if the count document lists the keywords "b", "x", "g", and "a" (in that order), and lists instances "0" and "2", then the decrypted data will hold the blinding values in this order:
   b, instance 0
   b, instance 2
   x, instance 0
   x, instance 2
   g, instance 0
   g, instance 2
   a, instance 0
   a, instance 2
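A non-normative Python sketch of parsing this layout, assuming the counters document has already been parsed into a keyword list and an instance list (struct's ">Q" format matches the big-endian 64-bit encoding above):

    import struct

    def unpack_blinding_values(data, keywords, instances):
        """Parse a decrypted Data field into {keyword: {instance: value}}.

        keywords: the keywords, in counters-document order.
        instances: the instance numbers this Tally Reporter handles, ascending.
        data: len(keywords) * len(instances) * 8 bytes of big-endian uint64s.
        """
        count = len(keywords) * len(instances)
        if len(data) != count * 8:
            raise ValueError("expected %d bytes, got %d" % (count * 8, len(data)))
        values = iter(struct.unpack(">%dQ" % count, data))
        return {kw: {inst: next(values) for inst in instances} for kw in keywords}

With the example above, unpack_blinding_values(data, ["b", "x", "g", "a"], [0, 2]) consumes the eight values in exactly the order listed.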
4.1. Implementation Notes
A relay should, when starting a new round, generate all the blinding values and noise values in advance. The relay should then use these values to compute Y_0 = SUM(B_i) + Z for each instance of each counter. Having done this, the relay MUST encrypt the blinding values to the public key of each tally reporter, and wipe them from memory.
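A rough sketch of that setup order, with hypothetical helpers for noise sampling and for the hybrid encryption below; for simplicity it assumes every Tally Reporter participates in every instance, and note that really wiping secrets from memory needs more care than Python's garbage collector provides:

    import secrets

    MASK = 2**64 - 1

    def start_round(keywords, instances, reporters, sample_noise, encrypt_to):
        """Sketch of per-round setup on a relay.

        sample_noise(keyword, instance) and encrypt_to(reporter, values) are
        hypothetical helpers for the noise sampler and the hybrid encryption
        step.  For simplicity, every Tally Reporter gets a value for every
        instance here.
        """
        counters = {}                      # (keyword, instance) -> Y_0
        blinding = {r: [] for r in reporters}
        for kw in keywords:
            for inst in instances:
                b = {r: secrets.randbits(64) for r in reporters}
                z = sample_noise(kw, inst)
                counters[(kw, inst)] = (sum(b.values()) + z) & MASK
                for r in reporters:
                    blinding[r].append(b[r])
        encrypted = {r: encrypt_to(r, blinding[r]) for r in reporters}
        # A real implementation must now wipe the plaintext blinding values;
        # in Python this is only a gesture, since copies may linger in memory.
        blinding.clear()
        return counters, encrypted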
5. The hybrid encryption algorithm
We use a hybrid encryption scheme above, where items can be encrypted to a public key. We instantiate it as follows, using curve25519 public keys.
To encrypt a plaintext M to a public key PK1:

   1. The sender generates a new ephemeral keypair sk2, PK2.

   2. The sender computes the shared Diffie-Hellman secret
      SEED = (sk2 * PK1).

   3. The sender derives 64 bytes of key material as
         SHAKE256(TEXT | SEED)[...64]
      where "TEXT" is "Expand curve25519 for privcount encryption".

      The first 32 bytes of this output are an AES key K1;
      the second 32 bytes are a MAC key K2.

   4. The sender computes a ciphertext C as AES256_CTR(K1, M).

   5. The sender computes a MAC as
         SHA3_256([00 00 00 00 00 00 00 20] | K2 | C).

   6. The hybrid-encrypted text is PK2 | MAC | C.
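A non-normative sender-side sketch in Python, using hashlib and the pyca/cryptography X25519 and AES primitives. The all-zero CTR counter block is an assumption (the text above does not specify one; K1 is single-use), as is the raw serialization chosen for PK2:

    import hashlib
    from cryptography.hazmat.primitives.asymmetric.x25519 import (
        X25519PrivateKey, X25519PublicKey)
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
    from cryptography.hazmat.primitives.serialization import Encoding, PublicFormat

    TEXT = b"Expand curve25519 for privcount encryption"

    def hybrid_encrypt(pk1_bytes, plaintext):
        """Sender side of the hybrid encryption; returns PK2 | MAC | C."""
        pk1 = X25519PublicKey.from_public_bytes(pk1_bytes)
        # 1. Generate an ephemeral keypair sk2, PK2.
        sk2 = X25519PrivateKey.generate()
        pk2 = sk2.public_key().public_bytes(Encoding.Raw, PublicFormat.Raw)
        # 2. SEED = sk2 * PK1.
        seed = sk2.exchange(pk1)
        # 3. Derive K1 (AES key) and K2 (MAC key) from SHAKE256(TEXT | SEED).
        keys = hashlib.shake_256(TEXT + seed).digest(64)
        k1, k2 = keys[:32], keys[32:]
        # 4. C = AES256_CTR(K1, M).  Assumption: all-zero initial counter
        #    block, since K1 is never reused.
        enc = Cipher(algorithms.AES(k1), modes.CTR(b"\x00" * 16)).encryptor()
        c = enc.update(plaintext) + enc.finalize()
        # 5. MAC = SHA3_256([00 00 00 00 00 00 00 20] | K2 | C).
        mac = hashlib.sha3_256(b"\x00" * 7 + b"\x20" + k2 + c).digest()
        # 6. Output PK2 | MAC | C.
        return pk2 + mac + c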
6. An optimized alternative
As an alternative, the sequence of blinding values is NOT transmitted to the Tally Reporters. Instead, the client generates a single ephemeral keypair sk_c, PK_c, and places the public key in its counters document. It does this each time a new round begins.
For each tally reporter with public key PK_i, the client then does the handshake sk_c * PK_i to compute SEED_i.
The client then generates the blinding values for that tally reporter as SHAKE256(SEED_i)[...R*C*8].
After initializing the counters to Y_0, the client can discard the blinding values and sk_c.
Later, the tally reporters can reconstruct the blinding values as SHAKE256(sk_i * PK_c)[...]
This alternative allows the client to transmit only a single public key, when previously it would need to transmit a complete set of blinding factors for each tally reporter. Further, the alternative does away with the need for blinding documents altogether. It is, however, more sensitive to any defects in SHAKE256 than the design above. Like the rest of this design, it would need rethinking if we want to expand this scheme to work with anonymous data collectors, such as Tor clients.
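A rough sketch of this derivation; the big-endian unpacking is carried over from Section 4, and the helper and variable names are illustrative:

    import hashlib
    from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey

    def derive_blinding_values(shared_secret, num_counters, num_instances):
        """Expand one per-Tally-Reporter DH secret into C * R blinding values."""
        raw = hashlib.shake_256(shared_secret).digest(num_counters * num_instances * 8)
        return [int.from_bytes(raw[i:i + 8], "big") for i in range(0, len(raw), 8)]

    # Relay side: one ephemeral keypair per round; PK_c goes in the counters
    # document.  For each Tally Reporter public key pk_i (not shown here):
    #     seed_i = sk_c.exchange(pk_i)
    #     b_values_i = derive_blinding_values(seed_i, C, R)
    sk_c = X25519PrivateKey.generate()
    pk_c = sk_c.public_key()

    # Tally Reporter side, later:
    #     seed_i = sk_i.exchange(pk_c)    # the same shared secret
    #     derive_blinding_values(seed_i, C, R) reproduces the relay's values.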
[reposting this message with permission. It is a reply that I sent to Aaron, where I quoted an email from him about this proposal. Tim and Aaron had additional responses, which I'll let them quote here or not as they think best.]
On Sat, Aug 5, 2017 at 1:38 PM, Aaron Johnson aaron.m.johnson@nrl.navy.mil wrote: [...]
Aaron:

- There are a couple of documents in PrivCount that are missing: the deployment document and the configuration document. These set up things like the identities/public keys of the parties, the planned time of the measurements, the statistics to be computed, and the noise levels to use. These values must be agreed upon by all parties (in some cases, such as disagreement about noise, the security/privacy guarantees could otherwise fail). How do you plan to replace these?
Nick:

So, I hadn't planned to remove these documents, so much as to leave them out of scope for this proposal. Right now, in the code, there's no actual way to configure any of these things.
Thinking aloud:
I think we should engineer that piece by piece. We already have the consensus directory system as a way to communicate information that needs to be securely updated, and where everybody needs to update at once, so I'd like to reuse that to the extent that it's appropriate.
For some parts of it, I think we can use versions and named sets. For other parts, we want to be flexible, so that we can rotate keys frequently, react to tally reporters going offline, and so on. There may need to be more than one distribution mechanism for this metainfo.
These decisions will also be application-dependent: I've been thinking mainly of "always-on" applications, like network metrics, performance measurement, anomaly-detection [*], and so on. But I am probably under-engineering for "time-limited" applications like short-term research experiments.
Aaron:

- I believe that instead of dealing with Tally Reporter (TR) failures using multiple subsets, you could instead simply use (t,n) secret sharing, which would survive any t-1 failures (but also allow any subset of size t to determine the individual DC counts). The DC would create one blinding value B and then use Shamir secret sharing to send a share of B to each TR. To aggregate, each TR would first add together its shares, which would yield a share of the sum of the blinding values from all DCs. Then the TRs could simply reconstruct that sum publicly, which, when subtracted from the public, blinded, noisy counts would reveal the final noisy sum. This would be more efficient than having each TR publish multiple potential inputs to different subsets of TRs.
Nick:

So, I might have misunderstood the purpose here: I thought that the instances were to handle misbehaving DCs as well as malfunctioning TRs.
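A rough sketch of the (t,n) Shamir approach Aaron describes, working over a prime field slightly larger than 2^64; the prime, the helpers, and the aggregation flow shown are illustrative assumptions rather than anything specified for PrivCount:

    import secrets

    P = 2**64 + 13          # a prime slightly larger than 2^64 (assumption)

    def share(secret, t, n):
        """Split `secret` into n Shamir shares with reconstruction threshold t."""
        coeffs = [secret] + [secrets.randbelow(P) for _ in range(t - 1)]
        return [(x, sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P)
                for x in range(1, n + 1)]

    def reconstruct(shares):
        """Lagrange-interpolate the shares at 0 to recover the shared value."""
        total = 0
        for x_j, y_j in shares:
            num, den = 1, 1
            for x_m, _ in shares:
                if x_m != x_j:
                    num = num * (-x_m) % P
                    den = den * (x_j - x_m) % P
            total = (total + y_j * num * pow(den, P - 2, P)) % P
        return total

    # Each DC shares its blinding value B; TR i sums the i-th shares it
    # receives from all DCs, giving a share of SUM(B).  Any t TRs can then
    # reconstruct SUM(B) and subtract it from the published noisy totals.
    b1, b2 = secrets.randbelow(2**63), secrets.randbelow(2**63)
    shares_1, shares_2 = share(b1, 3, 5), share(b2, 3, 5)
    summed = [(x, (y1 + y2) % P) for (x, y1), (_, y2) in zip(shares_1, shares_2)]
    assert reconstruct(summed[:3]) == (b1 + b2) % P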
Aaron:

- Storing at the DC the blinded values encrypted to the TRs seems to violate forward privacy in that if during the measurement the adversary compromises a DC and then later (even after the final release) compromises the key of a TR, the adversary could determine the state of the DC’s counter at the time of compromise. This also applies to the optimization in Sec. 6, where a shared secret is hashed to produce the blinding values.
Nick:

Well, the adversary would need to compromise the key of _every_ TR in at least one instance, or they couldn't recover the actual counters.
I guess we could, as in the original design (IIUC), send the encrypted blinding values (or public DH key in sec 6) immediately from the DC when it generates them, and then throw them away client-side. Now the adversary would need to break into all the TRs while they were holding these encrypted blinding values.
Or, almost equivalently, I think we could make the TR public encryption keys only get used for one round. That's good practice in general, and it's a direction I generally like.
And of course, DCs should use a forward-secure TLS for talking to the TRs, so that an eavesdropper doesn't learn anything.
[*] One anomaly detection mechanism I've been thinking of is to look at different "protocol-warn" log messages. These log messages indicate that some third party is not complying with the protocol. They're usually logged at info, since there's nothing an operator can do about them, but it would be good for us to get notification if some of them spike all of a sudden.
On 8 Aug 2017, at 03:50, Nick Mathewson nickm@torproject.org wrote:
[reposting this message with permission. It is a reply that I sent to Aaron, where I quoted an email from him about this proposal. Tim and Aaron had additional responses, which I'll let them quote here or not as they think best.]
[Re-posting this edited thread with permission. It's a conversation that continues on from the last re-post.]
(The quoted thread below is between Aaron, Nick, and Tim; each turn is labeled with its author.)

...
Aaron:

- I believe that instead of dealing with Tally Reporter (TR) failures using multiple subsets, you could instead simply use (t,n) secret sharing, which would survive any t-1 failures (but also allow any subset of size t to determine the individual DC counts). The DC would create one blinding value B and then use Shamir secret sharing to send a share of B to each TR. To aggregate, each TR would first add together its shares, which would yield a share of the sum of the blinding values from all DCs. Then the TRs could simply reconstruct that sum publicly, which, when subtracted from the public, blinded, noisy counts would reveal the final noisy sum. This would be more efficient than having each TR publish multiple potential inputs to different subsets of TRs.

Nick:

So, I might have misunderstood the purpose here: I thought that the instances were to handle misbehaving DCs as well as malfunctioning TRs.

Aaron:

The mechanism you described (having each DC report different encrypted counters for different subsets of TRs) doesn’t handle failed (i.e. crashed) DCs. To handle failed DCs in the scheme you describe (with the blinding values stored encrypted in a document), you can just have the TRs agree on which DCs succeeded at the end of the measurement and only use blinding values from those DCs. So you don’t need multiple TR subsets to handle failed DCs.

Tim:

Each *subset* of DCs reports to a subset of the TRs. This deals with malicious and outlying DC values, as well as failed DCs. And it deals with failed TRs as well.

Aaron:

This seems unnecessary and inefficient. DC failures can be handled by the TRs at the end. TR failures can be handled using Shamir secret sharing.
Aaron:

Also, I should mention the reasons that Rob and I didn’t mention fault tolerance in the PrivCount design:

- The TRs (aka the SKs) only need to be online long enough to receive their blinding values, add them, and send the sum out. Therefore a measurement can recover from a failed TR if its blinding values are persistently stored somewhere and if *at any point* the TR can be restarted.

Tim:

In the event of key compromise, or operator trust failure, or operator opt-out, the TR can never be restarted (securely).

Aaron:

If these are real concerns, then you should use Shamir secret sharing across the TRs. Honestly, they seem unlikely to me, and the cost of missing one round of statistics seems low. However, the cost of dealing with them is also low, and so you might as well do it!

Aaron:

- Handling DC failures is trivial. As mentioned above, the TRs simply wait until the end to determine which DCs succeeded and should have their blinding values included in the sum.

Tim:

How would you do this securely? Any scheme I think of allows a malicious TR to eliminate particular relays.

Aaron:

A malicious TR can in any case eliminate a particular relay by destroying the outputs of any subsets containing that relay. Destroying an output is done by using a random value as the blinding value, making the output random (and likely obviously so). The privacy comes from the differentially private noise, and because TRs won’t agree on subsets that would reduce the added noise below the desired amount, the adversary couldn’t break privacy by eliminating particular relays. Moreover, if you wanted, you could use a secure broadcast (e.g. the Dolev-Strong protocol) to enable the TRs to agree on the union of DCs that any one of the TRs received the counters documents from. Such a secure broadcast is used in PrivCount to get consensus on the deployment and configuration documents.
Aaron:

Also, one thing I forgot to mention in my last email is that you have removed the Tally Server, which is an untrusted entity that essentially acts as a public bulletin board. Without such a collection point, who obtains the outputs of the TRs and computes the final result?

Tim:

We'll work with Tor metrics to decide on a mechanism for taking the counts from each TR subset, and turning them into a final count.

This would probably be some kind of median, possibly discarding nonsensical values first.

Aaron:

If you plan to release multiple values from different DC subsets to handle nonsensical values, then you will have to increase the noise to handle the additional statistics. This can be done just as with handling DC failures: TRs agree on several DC subsets from among the DCs that didn’t fail and then release a blinding value sum for each subset. Note that DCs actually only need to send one set of blinding values and one set of counters to the TRs.
Aaron:

- Storing at the DC the blinded values encrypted to the TRs seems to violate forward privacy in that if during the measurement the adversary compromises a DC and then later (even after the final release) compromises the key of a TR, the adversary could determine the state of the DC’s counter at the time of compromise. This also applies to the optimization in Sec. 6, where a shared secret is hashed to produce the blinding values.

Nick:

Well, the adversary would need to compromise the key of _every_ TR in at least one instance, or they couldn't recover the actual counters.

Aaron:

That’s true.

Nick:

I guess we could, as in the original design (IIUC), send the encrypted blinding values (or public DH key in sec 6) immediately from the DC when it generates them, and then throw them away client-side. Now the adversary would need to break into all the TRs while they were holding these encrypted blinding values.

Aaron:

Right, that is the original design and would provide a bit more forward security than in the current spec.

Nick:

Or, almost equivalently, I think we could make the TR public encryption keys only get used for one round. That's good practice in general, and it's a direction I generally like.

Aaron:

That would work, too.

Nick:

[*] One anomaly detection mechanism I've been thinking of is to look at different "protocol-warn" log messages. These log messages indicate that some third party is not complying with the protocol. They're usually logged at info, since there's nothing an operator can do about them, but it would be good for us to get notification if some of them spike all of a sudden.

Aaron:

Really interesting idea! Rob and I are interested in looking for attacks on the Tor network using metrics as well. This kind of anomaly reminds me of the RELAY_EARLY attack that you wrote a detector for.
T

--
Tim Wilson-Brown (teor)
teor2345 at gmail dot com
PGP C855 6CED 5D90 A0C5 29F6 4D43 450C BA7F 968F 094B
ricochet:ekmygaiu4rzgsk6n
------------------------------------------------------------------------
Hi Tim,
unfortunately, nobody from the metrics team can attend today's proposal 280 discussion in a few hours.
That's why we decided to provide some written feedback here.
We didn't find anything problematic in the proposal from the view of Tor metrics.
This is due to the narrow scope covering only the communication protocol between tally servers and relays, as we understand it.
All topics related to deriving counts, calculating final results, and anything else that could affect currently running metrics code are explicitly excluded or not mentioned.
If we misunderstood the scope and there is actually a part that covers current or future metrics code, please let us know, and we'll check that again.
Thanks for working on privacy-preserving statistics in Tor!
All the best, Karsten
Hi Karsten and metrics,
On 13 Sep 2017, at 04:27, Karsten Loesing karsten@torproject.org wrote:
Hi Tim,
unfortunately, nobody from the metrics team can attend today's proposal 280 discussion in a few hours.
We turned on meetbot!
The meeting action items are:
• write a k-of-n secret sharing spec
• revise prop280 to use k-of-n secret sharing
• update the proposal to deal with post-submission shared-random-based relay subset selection
• increase the noise added in the spec for each subset of relays that produces a result
• specify how to estimate sensitivity and expected values for each counter, and how to turn that into a set of sigmas
• specify how to safely change the set of counters that is collected (or the noise on those counters) as new tor versions that support new counters are added to the network (and old versions leave)
• specify the privacy budget parameter that we need to turn into consensus parameters
• specify how to maintain privacy guarantees when the set of statistics changes, probably by reducing accuracy
Here is a log of the meeting: http://meetbot.debian.net/tor-dev/2017/tor-dev.2017-09-13-00.16.html
That's why we decided to provide some written feedback here.
We didn't find anything problematic in the proposal from the view of Tor metrics.
This is due to the narrow scope covering only the communication protocol between tally servers and relays, as we understand it.
All topics related to deriving counts, calculating final results, and anything else that could affect currently running metrics code are explicitly excluded or not mentioned.
We mentioned a few of these topics in the meeting.
In particular, we talked about splitting relays into multiple subsets for fault-tolerance. This would give us one result per counter per subset.
We'd appreciate your feedback on these parts of the meeting.
If we misunderstood the scope and there is actually a part that covers current or future metrics code, please let us know, and we'll check that again.
We plan to write these specs separately. We will also make updates to the current prop280 spec.
Thanks for working on privacy-preserving statistics in Tor!
Looking forward to working with you on this.
T

--
Tim Wilson-Brown (teor)
teor2345 at gmail dot com
PGP C855 6CED 5D90 A0C5 29F6 4D43 450C BA7F 968F 094B
ricochet:ekmygaiu4rzgsk6n
------------------------------------------------------------------------