Re: [tor-dev] Feedback on obfuscating hidden-service statistics

26 Nov 2014


      "A. Johnson" aaron.m.johnson@nrl.navy.mil writes:
...
Hi George,
...
I posted an initial draft of the proposal here:
https://lists.torproject.org/pipermail/tor-dev/2014-November/007863.html
Any feedback would be awesome.
OK, I’ll have a chance to look at this in the next few days.
...
Specifically, I would be interested in undertanding the concept of
additive noise a bit better. As you can see the proposal draft is
still using multiplicative noise, and if you think that additive is
better we should change it. Unfortunately, I couldn't find any good
resources on the Internet explaining the difference between additive
and multiplicative noise. Could you expand a bit on what you said
above? Or link to a paper that explains more? Or link to some other
system that is doing additive noise (or even better its implementation)?
The technical argument for differential privacy is explained in
http://research.microsoft.com/en-us/projects/databaseprivacy/dwork.pdf.
The definition appears in Def. 2, the Laplace mechanism is given in
Eq. 3 of Sec. 5, and Thm. 4 shows why that mechanism achieves
differential privacy.
But that stuff is pretty dry. The basic idea is that you’re trying to
the contribution of any one sensitive input (e.g. a single user’s data
or a single component of a single user’s data). The noise that you
need to cover that doesn’t scale with the number of other users, and
so you use additive noise.
Thanks for the resources!
I think I now get the general idea. I don't really understand why it
works or why Laplace is the best distribution for this job, but maybe
it doesn't matter too much for now.
The next problem is how to find the proper parameters for the Laplace
distribution. I guess the mean μ needs to be 0, but the hard part is
'b'. In a few papers I read, they set 'b' to (Δf/ε).
In the above, Δf is the "largest change a single participant could
have on the output" of the query. Trying to fit this database paradigm
to our use case, the largest change a single HS could cause to the
HSDir HS counting stats is change the result by 1. So Δf is 1, and I
think that ε is some kind of security (sensitivity) parameter, let's
set that to 0.3 or something.
So this gives us approx Laplace(0, 4) which can be seen with blue color here:
https://upload.wikimedia.org/wikipedia/commons/0/0a/Laplace_pdf_mod.svg
In the end of this post, I put a few samples from this distribution [0].
The generated noise seems reasonable for this job.
Now, I'm wondering how to do the same thing for the RP cell
statistics.  In this case, Δf would have to be the largest amount of
cells we hope to obfuscate in an RP circuit. This is a chicken-and-egg
situation, since we don't really know how many cells we usually get
without doing these stats first.
Maybe we can use the preliminary stats from #13192, which contain both
RP and IP cells (but IP cells will probably be a minority). Or maybe
we can fit the distribution dynamically based on the amount of cells
we receive every day (does this even make sense)? Or what?
BTW, I plan to start implementantion of this early next week, so that
it's ready by mid-December. I hope we have a good solution to this by
then, otherwise I will have to do something else (round up the stats
to the nearest multiple or something) :/
Thanks!
[0]:for i in xrange(100):
       print numpy.random.laplace(0,4)
   ....:     
3.75587440621
-4.28136229035
4.76311443928
4.05142557505
1.70198910055
-3.37374208295
1.12837234927
-0.905282823974
7.66083097188
0.246385660561
-3.52939581339
-1.3368353768
-1.7482807282
2.98489896819
2.87155179984
-2.72961210143
-3.04409210121
-1.1975804202
-0.34861261134
0.953918739146
-14.3586324803
0.272984575989
3.41377347603
6.48752681038
-4.74036696099
0.668294672995
3.15847434594
-1.58855932489
1.65921624515
-0.529373859224
1.1739048689
-2.2201602699
-0.510111160097
2.58474424973
-7.4773321899
-13.4406958005
-1.34083931335
0.34051030906
-1.09939649788
0.647560027442
2.05240761873
-0.275439053432
10.1238334205
9.0960448449
3.20236196087
-2.27093832694
-19.187310803
0.894898545361
3.62459774003
2.10979313978
-0.633823085078
3.32591049399
3.11206489604
6.52626921692
2.68590966921
1.64033470377
0.997911309606
2.39357922671
0.308907976786
-3.02768280735
3.07096999256
-0.907608650976
1.72587291595
-0.838153001361
-1.23764100768
-9.56662634071
7.89275256421
10.4346665539
-0.522605672578
1.88585708734
1.77708545023
-0.301420228241
8.69964251692
-3.35490635732
-3.14148766097
1.73070057195
0.0426008469217
-1.74373108092
4.18116416817
0.139266645962
-1.32024236062
-2.40639481448
-0.364266143555
-3.6882489347
5.79025078063
0.386467380832
2.5388775702
1.60630747885
3.53930934459
-3.2270856708
4.15611732496
6.53669582267
3.83838409062
-2.62835636891
-1.36484455975
5.02827935505
0.693370215176
1.91312352565
1.93007931702
3.24710666718

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

Re: [tor-dev] Feedback on obfuscating hidden-service statistics