"A. Johnson" aaron.m.johnson@nrl.navy.mil writes:
Hello all,
<snip>
We put in some simple obfuscations in order to not reveal too sensitive data: we multiplied actual values with a random number in [0.9, 1.1] before including those obfuscated values in extra-info descriptors. Maybe there's something smarter we could do? Or is this okay?
I actually think that additive rather than multiplicative noise (i.e. randomness) makes sense here. Let’s suppose that you would like to obscure any individual connection that contains C cells or fewer (obscuring extremely and unusually large connections seems hopeless but unnecessary). That is, you don’t want the (distribution of the) RP cell count from any relay to change by much whether or not C cells are removed. The standard differential privacy approach would be to *add* noise from the Laplace distribution Lap(\epsilon/C), where \epsilon controls how much the statistics *distribution* can multiplicatively differ. I’m not saying that we need to add noise exactly from that distribution (maybe we weaken the guarantee slightly to get better accuracy), but the same idea applies. This would apply the same to both large and small relays. You *want* to learn roughly how much RP traffic each relay has - you just want to obscure the exact number within some tolerance.
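To make the contrast concrete, here is a minimal Python sketch of the two approaches. It is an illustration under stated assumptions, not the proposal's implementation: the function names, the values of C and \epsilon, and the example cell counts are all hypothetical, and it assumes the usual parameterization in which the Laplace noise has scale C/\epsilon (equivalently, rate \epsilon/C).

```python
import random


def laplace_noise(scale, rng):
    """Return one sample from a zero-mean Laplace distribution.

    The difference of two independent exponential samples with mean
    `scale` follows Laplace(0, scale).
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def obfuscate_additive(count, max_cells, epsilon, rng):
    """Add Laplace noise whose scale depends only on C and epsilon."""
    scale = max_cells / epsilon  # assumed parameterization: scale = C / epsilon
    return count + laplace_noise(scale, rng)


def obfuscate_multiplicative(count, rng):
    """Multiply by a uniform factor in [0.9, 1.1], as in the current draft."""
    return count * rng.uniform(0.9, 1.1)


if __name__ == "__main__":
    rng = random.Random(2014)  # fixed seed only so the demo is repeatable
    C = 2048        # hypothetical: largest connection (in cells) to obscure
    EPSILON = 0.3   # hypothetical privacy parameter
    for true_count in (500, 5_000_000):  # a small and a large relay
        add = obfuscate_additive(true_count, C, EPSILON, rng)
        mul = obfuscate_multiplicative(true_count, rng)
        print(f"true={true_count:>9}  additive={add:>12.0f}  multiplicative={mul:>12.0f}")
```

With these example numbers, the multiplicative factor moves the small relay's count by at most 50 cells, which is far less than C, so a single C-cell connection would still stand out. It moves the large relay's count by up to 500,000 cells, which is more noise than needed. The additive noise has the same spread for both relays. Note that an additive result can come out negative for a small count; how to report that (clamping, rounding) is a separate design decision not covered here.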
Hello Aaron,
I posted an initial draft of the proposal here: https://lists.torproject.org/pipermail/tor-dev/2014-November/007863.html Any feedback would be awesome.
Specifically, I would be interested in understanding the concept of additive noise a bit better. As you can see, the proposal draft is still using multiplicative noise, and if you think that additive is better we should change it. Unfortunately, I couldn't find any good resources on the Internet explaining the difference between additive and multiplicative noise. Could you expand a bit on what you said above? Or link to a paper that explains more? Or link to some other system that is doing additive noise (or even better, its implementation)?
Thanks!