tl;dr: We propose collecting data from exit nodes to improve the Tor
network, using differential privacy and secure multiparty computation to
do it in a privacy-preserving manner.
Hi tor-dev,
In the ongoing effort to make Tor faster, more secure, and more resilient, network data plays an important role. If we know how the network is being used, what its clients' needs are, and what threats it faces, we can address these in an intelligent manner. While the Tor Project does collect some statistics from guards, it does not currently collect or share potentially sensitive exit statistics. These include destination statistics and client timing behaviour, among many other potentially interesting, but privacy-sensitive, data points.
This reluctance to collect data stems from the (well-founded) risk that such data could pose to clients and OR operators, for example through correlation and coercion attacks. This is unfortunate since, as noted above, knowing what is going on inside the network and with its users would help us improve the Tor network and its feature set.
To that end, it would be great if we could learn about network and client trends. Some concrete examples include circuit-level data volumes, guard traffic usage, lengths of internal buffers, and latencies at relays. Indeed, if something can be counted, then we should be able to collect and report it in a privacy-preserving manner.
This brings me to the reason for this email: I have had the good fortune to work with George Danezis at UCL and my supervisor Ian Goldberg at the University of Waterloo on devising a solution to this private data collection problem. We have created a system, PrivEx, that uses modern privacy-preserving techniques such as differential privacy and secure multiparty computation to address this thorny set of challenges. We have written up the details in a tech report, which can be found here: http://cacr.uwaterloo.ca/techreports/2014/cacr2014-08.pdf
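To give a rough feel for how the two techniques fit together for a simple counter, here is a minimal sketch. It is NOT the PrivEx protocol from the tech report; it only assumes that (a) each exit perturbs its own count with Gaussian noise for differential privacy, and (b) it then splits the noisy count into additive secret shares sent to several independent aggregation servers, so no single server learns any individual exit's count. The names MODULUS, noisy_count, share, reconstruct and aggregate, and the parameter sigma, are all hypothetical.

```python
import random
import secrets

# Hypothetical share space: all shares and sums are taken modulo 2^64.
MODULUS = 2**64

def noisy_count(true_count, sigma):
    # Differential privacy (sketch): add rounded Gaussian noise with an
    # assumed standard deviation sigma. A real deployment would need a
    # secure noise source and a properly calibrated sigma.
    return true_count + round(random.gauss(0, sigma))

def share(value, n_servers):
    # Additive secret sharing: n-1 uniformly random shares, plus a final
    # share chosen so that all shares sum to value (mod MODULUS).
    shares = [secrets.randbelow(MODULUS) for _ in range(n_servers - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def reconstruct(shares):
    # Sum the shares and map the result back to a signed integer, since
    # noise can push a total below zero.
    total = sum(shares) % MODULUS
    return total - MODULUS if total >= MODULUS // 2 else total

def aggregate(exit_counts, n_servers, sigma):
    # Each server only ever sees one random-looking share per exit and
    # keeps a running sum; combining the servers' sums reveals only the
    # noisy network-wide total.
    server_totals = [0] * n_servers
    for count in exit_counts:
        for i, s in enumerate(share(noisy_count(count, sigma), n_servers)):
            server_totals[i] = (server_totals[i] + s) % MODULUS
    return reconstruct(server_totals)
```

For example, with the noise switched off (sigma=0), three exits reporting 3, 5 and 7 visits to some destination, aggregated over 3 servers, give aggregate([3, 5, 7], 3, 0) == 15; with sigma > 0 the published total is only approximately 15, which is what provides the privacy guarantee.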
We have also implemented the two variants of PrivEx described in the tech report. We are currently putting the finishing touches on them and will release them soon as open source in a git repo.
We would like to start by rolling out our own PrivEx-enabled exits in the Tor network and collecting destination visit statistics. We expect PrivEx to be useful to exit operators and to the Tor network as a whole, but there is no requirement to deploy it everywhere. We hope to deploy PrivEx on a handful of exits during the June-August timeframe.
What we would most like, in order of importance, is 1) a design review of our proposal and 2) an implementation review (once we release the code). We hope that these reviews will address the main concerns of the community at large, and give both the community and us confidence that collecting data with PrivEx is sound and is being done in a responsible and intelligent manner. We anticipate that this would make PrivEx an attractive addition for the Tor Project and its data collection needs.
Please don't hesitate to give us your feedback, either to the list or to me via email.
Cheers,
Tariq