"A. Johnson" aaron.m.johnson@nrl.navy.mil writes:
Hi George,
I posted an initial draft of the proposal here: https://lists.torproject.org/pipermail/tor-dev/2014-November/007863.html Any feedback would be awesome.
OK, I’ll have a chance to look at this in the next few days.
Specifically, I would be interested in undertanding the concept of additive noise a bit better. As you can see the proposal draft is still using multiplicative noise, and if you think that additive is better we should change it. Unfortunately, I couldn't find any good resources on the Internet explaining the difference between additive and multiplicative noise. Could you expand a bit on what you said above? Or link to a paper that explains more? Or link to some other system that is doing additive noise (or even better its implementation)?
The technical argument for differential privacy is explained in http://research.microsoft.com/en-us/projects/databaseprivacy/dwork.pdf. The definition appears in Def. 2, the Laplace mechanism is given in Eq. 3 of Sec. 5, and Thm. 4 shows why that mechanism achieves differential privacy.
But that stuff is pretty dry. The basic idea is that you’re trying to the contribution of any one sensitive input (e.g. a single user’s data or a single component of a single user’s data). The noise that you need to cover that doesn’t scale with the number of other users, and so you use additive noise.
Thanks for the resources!
I think I now get the general idea. I don't really understand why it works or why Laplace is the best distribution for this job, but maybe it doesn't matter too much for now.
The next problem is how to find the proper parameters for the Laplace distribution. I guess the mean μ needs to be 0, but the hard part is 'b'. In a few papers I read, they set 'b' to (Δf/ε).
In the above, Δf is the "largest change a single participant could have on the output" of the query. Trying to fit this database paradigm to our use case, the largest change a single HS could cause to the HSDir HS counting stats is change the result by 1. So Δf is 1, and I think that ε is some kind of security (sensitivity) parameter, let's set that to 0.3 or something.
So this gives us approx Laplace(0, 4) which can be seen with blue color here: https://upload.wikimedia.org/wikipedia/commons/0/0a/Laplace_pdf_mod.svg In the end of this post, I put a few samples from this distribution [0]. The generated noise seems reasonable for this job.
Now, I'm wondering how to do the same thing for the RP cell statistics. In this case, Δf would have to be the largest amount of cells we hope to obfuscate in an RP circuit. This is a chicken-and-egg situation, since we don't really know how many cells we usually get without doing these stats first.
Maybe we can use the preliminary stats from #13192, which contain both RP and IP cells (but IP cells will probably be a minority). Or maybe we can fit the distribution dynamically based on the amount of cells we receive every day (does this even make sense)? Or what?
BTW, I plan to start implementantion of this early next week, so that it's ready by mid-December. I hope we have a good solution to this by then, otherwise I will have to do something else (round up the stats to the nearest multiple or something) :/
Thanks!
[0]:for i in xrange(100): print numpy.random.laplace(0,4) ....: 3.75587440621 -4.28136229035 4.76311443928 4.05142557505 1.70198910055 -3.37374208295 1.12837234927 -0.905282823974 7.66083097188 0.246385660561 -3.52939581339 -1.3368353768 -1.7482807282 2.98489896819 2.87155179984 -2.72961210143 -3.04409210121 -1.1975804202 -0.34861261134 0.953918739146 -14.3586324803 0.272984575989 3.41377347603 6.48752681038 -4.74036696099 0.668294672995 3.15847434594 -1.58855932489 1.65921624515 -0.529373859224 1.1739048689 -2.2201602699 -0.510111160097 2.58474424973 -7.4773321899 -13.4406958005 -1.34083931335 0.34051030906 -1.09939649788 0.647560027442 2.05240761873 -0.275439053432 10.1238334205 9.0960448449 3.20236196087 -2.27093832694 -19.187310803 0.894898545361 3.62459774003 2.10979313978 -0.633823085078 3.32591049399 3.11206489604 6.52626921692 2.68590966921 1.64033470377 0.997911309606 2.39357922671 0.308907976786 -3.02768280735 3.07096999256 -0.907608650976 1.72587291595 -0.838153001361 -1.23764100768 -9.56662634071 7.89275256421 10.4346665539 -0.522605672578 1.88585708734 1.77708545023 -0.301420228241 8.69964251692 -3.35490635732 -3.14148766097 1.73070057195 0.0426008469217 -1.74373108092 4.18116416817 0.139266645962 -1.32024236062 -2.40639481448 -0.364266143555 -3.6882489347 5.79025078063 0.386467380832 2.5388775702 1.60630747885 3.53930934459 -3.2270856708 4.15611732496 6.53669582267 3.83838409062 -2.62835636891 -1.36484455975 5.02827935505 0.693370215176 1.91312352565 1.93007931702 3.24710666718