Re: [tor-dev] Feedback on obfuscating hidden-service statistics

2 Dec 2014


Comments on proposal 238:
1. I’m not convinced that the proposed amount of obfuscation is sufficient for the HS descriptor count. Adding noise to cover the contribution of any single HS in a single period doesn’t cover its vector of contributions across periods. Thus, if the number of HSes stays the same over time (or follows some other pattern that the adversary can guess), then the randomness of the noise in the descriptor counts can effectively be removed by, say, taking the average. The best solution to this that I can think of is to bin every k consecutive integers and report the bin of the count after noise has been added. Then over time an adversary can at worst determine that the number of HSes lies within a range of width k. This applies to the cell counts also.
2. In 2.3, what exactly are “unique hidden-service identities”? .onion addresses?
3. It would hugely improve statistics accuracy to aggregate the statistics and only add noise once. However, this would require that the relays participate in a distributed protocol (e.g. [0]) rather than stick numbers in their extra-info docs.
4. Some possible privacy issues with revealing descriptor publication counts:
  - You wish to use hidden services in a way that involves a lot of .onion addresses for your service. This will blow past our noise, which I am assuming is calibrated to hide any single publication (or a small constant number of them). Then the total count could reveal when this new service appeared and is active (assuming the number of other descriptor publications is stable or otherwise predictable, say because they correspond to public HSes whose status can be determined via a connection attempt).
  - You can factor out the noise over time if the total count is stable or otherwise predictable. This is the same issue as #1 above and using bins could work here as well.
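The averaging concern in #1 can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the proposal: Laplace noise with a made-up scale of 8, a stable true count of 1000, and a bin width k = 64. All function names are hypothetical.

```python
import math
import random
import statistics


def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_count(true_count, scale, rng):
    """One period's published value: the true count plus fresh noise."""
    return true_count + laplace(scale, rng)


def averaged_estimate(true_count, scale, periods, rng):
    """What an adversary gets by averaging reports while the count is stable."""
    return statistics.fmean(
        noisy_count(true_count, scale, rng) for _ in range(periods)
    )


def binned_report(true_count, scale, k, rng):
    """Add noise first, then publish only the lower edge of the width-k bin."""
    return k * math.floor(noisy_count(true_count, scale, rng) / k)


if __name__ == "__main__":
    rng = random.Random(2014)  # fixed seed so the demo is repeatable
    print("single report:   ", noisy_count(1000, 8.0, rng))
    print("10000-period avg:", averaged_estimate(1000, 8.0, 10_000, rng))
    print("binned report:   ", binned_report(1000, 8.0, 64, rng))
```

A single report is off by roughly the noise scale, while the long-run average lands very close to the true count. The binned report only ever takes values that are multiples of k; how k should be chosen relative to the noise scale is left open in this sketch.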
[0] Our Data, Ourselves: Privacy via Distributed Noise Generation
  by Cynthia Dwork, Krishnaram Kenthapadi, Frank McSherry, Ilya Mironov, and Moni Naor
  EUROCRYPT 2006
  http://research.microsoft.com/pubs/65086/odo.pdf
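To give a rough sense of the accuracy gain in #3, here is a small Python simulation comparing the total noise in the network-wide sum when every relay adds its own noise against adding noise once to the aggregate. It does not implement the distributed protocol of [0]; it only compares the resulting error. The relay count, the noise scale, and the assumption that the same scale is used in both cases are illustrative choices, not values from the proposal.

```python
import random
import statistics


def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def per_relay_noise_total(num_relays, scale, rng):
    """Noise in the summed statistic when each relay adds its own noise."""
    return sum(laplace(scale, rng) for _ in range(num_relays))


def aggregate_noise_total(scale, rng):
    """Noise in the summed statistic when noise is added once, after summing."""
    return laplace(scale, rng)


def noise_variances(num_relays, scale, trials, rng):
    """Empirical variance of the total noise under both approaches."""
    per_relay = [per_relay_noise_total(num_relays, scale, rng) for _ in range(trials)]
    once = [aggregate_noise_total(scale, rng) for _ in range(trials)]
    return statistics.pvariance(per_relay), statistics.pvariance(once)


if __name__ == "__main__":
    v_relays, v_once = noise_variances(50, 8.0, 2000, random.Random(238))
    print("variance, noise per relay:", v_relays)
    print("variance, noise added once:", v_once)
```

Since the variances of independent noise terms add, the per-relay variance should be about num_relays times the single-noise variance.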
On Nov 25, 2014, at 5:14 PM, A. Johnson aaron.m.johnson@nrl.navy.mil wrote:
...
Hi George,
...
I posted an initial draft of the proposal here:
https://lists.torproject.org/pipermail/tor-dev/2014-November/007863.html
Any feedback would be awesome.
OK, I’ll have a chance to look at this in the next few days.
...
Specifically, I would be interested in understanding the concept of
additive noise a bit better. As you can see the proposal draft is
still using multiplicative noise, and if you think that additive is
better we should change it. Unfortunately, I couldn't find any good
resources on the Internet explaining the difference between additive
and multiplicative noise. Could you expand a bit on what you said
above? Or link to a paper that explains more? Or link to some other
system that is doing additive noise (or even better its implementation)?
The technical argument for differential privacy is explained in http://research.microsoft.com/en-us/projects/databaseprivacy/dwork.pdf.  The definition appears in Def. 2, the Laplace mechanism is given in Eq. 3 of Sec. 5, and Thm. 4 shows why that mechanism achieves differential privacy.
But that stuff is pretty dry. The basic idea is that you’re trying to cover the contribution of any one sensitive input (e.g. a single user’s data or a single component of a single user’s data). The noise that you need to cover that doesn’t scale with the number of other users, and so you use additive noise.
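As a minimal Python sketch of the difference: the Laplace mechanism adds noise whose scale is sensitivity / epsilon, regardless of how large the true count is. The multiplicative function below is a hypothetical stand-in, since the exact scheme in the proposal draft is not described here. The parameter values are illustrative only.

```python
import random


def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def laplace_mechanism(true_count, sensitivity, epsilon, rng):
    """Additive noise: the scale is sensitivity / epsilon, independent of true_count."""
    return true_count + laplace(sensitivity / epsilon, rng)


def multiplicative_noise(true_count, max_fraction, rng):
    """Hypothetical multiplicative scheme: perturb by up to +/- max_fraction of the count."""
    return true_count * (1 + rng.uniform(-max_fraction, max_fraction))


if __name__ == "__main__":
    for count in (10, 1_000_000):
        rng = random.Random(42)  # same seed so both counts see the same randomness
        add_err = laplace_mechanism(count, 1, 0.5, rng) - count
        mul_err = multiplicative_noise(count, 0.1, rng) - count
        print(f"count={count}: additive error {add_err:.3f}, "
              f"multiplicative error {mul_err:.3f}")
```

With the same randomness, the additive error is the same for a count of 10 and a count of a million, while the multiplicative error grows in proportion to the count. For small counts the multiplicative perturbation can be too small to hide one contribution, and for large counts it is larger than needed.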
Hope that helps,
Aaron
_______________________________________________
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
