Hi everyone,
we'd like to improve directory-request statistics by obfuscating values on relays before they are reported to the directory authorities.
A possible obfuscation method is to add Laplace noise to the request counts for all ~250 countries, so that it's unclear whether a reported request was actually made by a user or is just noise. In fact, we did something similar for onion service statistics two years ago.
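To make the idea concrete, here is a minimal Python sketch of per-country Laplace noise. It is only an illustration, not the code relays would run: the function names, the example counts, and the `sensitivity` and `epsilon` values are all assumptions made up for this example.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace distribution with mean 0.

    The difference of two independent exponential variables with
    mean `scale` is Laplace-distributed with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def obfuscate_counts(counts, sensitivity, epsilon, rng=None):
    """Return a copy of `counts` with Laplace noise added to every value.

    `sensitivity` and `epsilon` are hypothetical parameters; the noise
    scale is sensitivity / epsilon, as is usual for the Laplace mechanism.
    """
    if rng is None:
        rng = random.Random()
    scale = sensitivity / epsilon
    # Every country gets noise, including those with zero real requests.
    return {cc: n + laplace_noise(scale, rng) for cc, n in counts.items()}


# Made-up request counts by country code for a single relay.
counts = {"de": 1200, "us": 3400, "is": 8, "zz": 0}
noisy = obfuscate_counts(counts, sensitivity=10, epsilon=0.3,
                         rng=random.Random(42))
for cc in counts:
    print(cc, counts[cc], round(noisy[cc], 1))
```

Note that countries with few or no real requests can end up with noisy values that are larger than the true count, or even negative. How such values would be rounded or reported is left open in this sketch.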
But before we do this, we need to find out whether obfuscated values would still be useful enough to estimate user numbers in the Tor network.
We ran a simulation using archived descriptors and put our results, including a graph, CSV files, and the simulation code, on this wiki page:
https://trac.torproject.org/projects/tor/wiki/org/teams/MetricsTeam/Obfuscat...
The result is that we can't just go ahead and add this noise, because we'd have to improve our user number estimation algorithms first. Otherwise we would risk losing one of our most important statistics: the number of daily Tor users by country.
If you have any thoughts on these results or want to help make the simulation more accurate, please let us know!
All the best,
Karsten (for the metrics team)