On Jan 19, 2016, at 3:45 AM, Karsten Loesing <karsten@torproject.org> wrote:
On 15/01/16 23:00, Rob Jansen wrote:
Hello,
Hi Rob,
I'm moving this discussion from metrics-team@ to tor-dev@, because I think it's relevant for little-t-tor devs who are not subscribed to metrics-team@. Hope you don't mind.
No problem.
Should Tor still be collecting these things? Should Tor disable the collection of these statistics until we have a more privacy-preserving way to collect and aggregate them?
The good news is that privacy-preserving techniques exist that can reduce information leakage. I'm developing a tool based on the secret-sharing variant of PrivEx [3] to collect some of these types of statistics while providing privacy guarantees. We are currently using it to collect only those stats that are useful for producing Tor traffic models. A great advantage of this tool is that the counters we store during the collection phase have noise added and are randomized at initialization, so only the aggregates are ever learned and revealed by the aggregation server. This limits the information that leaks if a relay is compromised. It is a large improvement over the current collection method, which only adds noise before publication and reveals statistics on a per-relay basis.
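To make the idea concrete, here is a minimal Python sketch of noisy, additively blinded counters. It is a simplified illustration under stated assumptions, not the actual tool and not the exact PrivEx protocol: the modulus Q, the Gaussian noise, the "share keeper" role as modeled here, and every function name are hypothetical choices made for the example.

```python
import random
import secrets

# Hypothetical modulus for the additive shares (an assumption for this sketch).
Q = 2**61 - 1


def init_counter(num_keepers, noise_sigma, rng):
    """Initialize one relay counter with noise and random blinding shares.

    Returns (blinded_counter, shares). The relay stores only blinded_counter
    and sends one share to each share keeper.
    """
    noise = round(rng.gauss(0, noise_sigma))
    shares = [secrets.randbelow(Q) for _ in range(num_keepers)]
    blinded = (noise - sum(shares)) % Q
    return blinded, shares


def increment(counter, amount=1):
    """Count an event on the relay; the stored value stays blinded."""
    return (counter + amount) % Q


def aggregate(blinded_counters, keeper_sums):
    """Combine the relays' blinded counters with the keepers' share sums.

    The blinding shares cancel, leaving the noisy total over all relays.
    """
    total = (sum(blinded_counters) + sum(keeper_sums)) % Q
    # Map back to a signed integer, since the noise may push the total below 0.
    return total - Q if total > Q // 2 else total


def simulate(true_counts, num_keepers=3, noise_sigma=0.0, seed=0):
    """Run one collection round for relays that observe true_counts events."""
    rng = random.Random(seed)
    counters = []
    keeper_sums = [0] * num_keepers
    for count in true_counts:
        blinded, shares = init_counter(num_keepers, noise_sigma, rng)
        # Each keeper accumulates the shares it receives from every relay.
        for k, share in enumerate(shares):
            keeper_sums[k] = (keeper_sums[k] + share) % Q
        counters.append(increment(blinded, count))
    return aggregate(counters, keeper_sums)


if __name__ == "__main__":
    # With the noise disabled, the aggregate equals the true total.
    print(simulate([5, 7, 11], noise_sigma=0.0))
```

In this sketch a compromised relay holds only a blinded value that looks uniformly random, and a per-relay count cannot be recovered unless all share keepers collude. Only the noisy sum across relays comes out of the aggregation step.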
Suggestion: How about we evaluate these statistics published by relays in the past years to see if there are other benefits or risks we didn't think of, and then we decide whether to leave them in, modify them, or take them out?
Sounds great, though I'm not sure how this evaluation will happen.
The reason is that I'd want to avoid removing this code only to realize shortly after that we overlooked a good reason for keeping it.
The problem is that it is unlikely that anyone will speak up until *after* we remove them, so it may be difficult to realize all use cases until they have already been removed. At least for me, it's not just a matter of thinking hard enough about it.
That said, I think that for some of these stats, the risk is such that it is hard to imagine collecting them the way Tor does currently.
These statistics have been collected for years now, and it might take another year or so for relays to upgrade to stop collecting them. So what's another month?
Agreed.
To be clear, I am not suggesting that we simply remove everything and never look back. I'm actually suggesting using secure aggregation to *replace* the current method for counting and aggregating. Maybe the secure counting/aggregation happens occasionally, or maybe continuously. The details there still need to be worked out (working on it).
I would suggest that we wait until those details are in fact worked out and we discuss a transition plan before removing the old collection methods, but I think that some stats have enough risk that it may not be worth waiting. Maybe we can remove the riskiest stats (IP addresses, exit ports, exit bytes) and wait to remove the others until I have more details about a replacement.
Thanks for (re-)starting this discussion!
Cheers, Rob