On 5/7/13 12:44 AM, Sam Burnett wrote:
Hi,
Hi Sam,
I'd like to help improve the Tor Censorship Detector. I've read some background material and think I understand the basics of George Danezis' detection algorithm [1, 2].
Great! Trivial nitpick: here's a better URL for George's tech report:
https://research.torproject.org/techreports/detector-2011-09-09.pdf
Is anyone still working on this? Two tickets from a year ago talk about experimenting with various detection algorithms and turning one of them into a standalone utility [3, 4]. Has anything happened since then?
I don't think that anyone made progress on the detection algorithm or tool. We're still running this code:
https://gitweb.torproject.org/metrics-tasks.git/tree/HEAD:/task-2718
What did change, however, is that we'll soon have better input data for a new detection algorithm available:
https://metrics.torproject.org/csv/userstats.csv
This file contains by-country statistics for directly connecting users and for bridge users. Here are the first five lines, to give you an idea:
date,node,country,transport,version,frac,users
2013-05-06,relay,,,,22,798301
2013-05-06,relay,??,,,22,10045
2013-05-06,relay,a1,,,22,692
2013-05-06,relay,a2,,,22,204
2013-05-06,relay,ad,,,22,162
- The date column is, well, the ISO 8601 date.
- The node column contains either 'relay' or 'bridge'.
- The country column contains either the empty string for all countries combined, an ISO 3166 two-letter lower-case country code, a MaxMind-specific code (such as 'a1' or 'a2'), or '??' for unknown.
- You can safely ignore the transport and version columns for the moment. These are for pluggable transport users and for users by IP version. In the future it may be interesting to see sudden changes in usage by transport, but so far these values are not stable enough.
- You can also ignore the frac column. It says what percentage of relays or bridges we're basing our estimate on, from 0 to 100. A value of 10 should be sufficient for the censorship detector, because we want it to warn as early as possible.
- The users column is the estimated number of users.
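To make the column descriptions above concrete, here is a rough Python sketch of how one might read this format. It is not the code we run. The function name read_user_estimates and the min_frac parameter are made up for illustration, and the sketch assumes that frac and users are always integers, as they are in the sample lines above.

```python
import csv
import io

# The five sample rows quoted above, plus the header line.
SAMPLE = """\
date,node,country,transport,version,frac,users
2013-05-06,relay,,,,22,798301
2013-05-06,relay,??,,,22,10045
2013-05-06,relay,a1,,,22,692
2013-05-06,relay,a2,,,22,204
2013-05-06,relay,ad,,,22,162
"""


def read_user_estimates(lines, node="relay", min_frac=10):
    """Return {(date, country): users} for per-country rows of one node type.

    Hypothetical helper: min_frac=10 follows the suggestion that a frac
    value of 10 should be sufficient for the detector.
    """
    estimates = {}
    for row in csv.DictReader(lines):
        if row["node"] != node:
            continue
        # Skip rows that are broken down by transport or IP version.
        if row["transport"] or row["version"]:
            continue
        # An empty country is the all-countries total, not a single country.
        if not row["country"]:
            continue
        # Assumption: frac and users parse as integers.
        if int(row["frac"]) < min_frac:
            continue
        estimates[(row["date"], row["country"])] = int(row["users"])
    return estimates


estimates = read_user_estimates(io.StringIO(SAMPLE))
```

The same function should work on an open file handle for a downloaded copy of userstats.csv, and passing node="bridge" would select bridge users instead.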
If you want to learn more about how we compute these estimates, here are the code and the tech report that the code is based on:
https://trac.torproject.org/projects/tor/ticket/8462
https://research.torproject.org/techreports/counting-daily-bridge-users-2012...
Just keep in mind that this is still work in progress.
My background: I'm a graduate student at Georgia Tech studying network censorship circumvention and measurement. Although I've met Tor developers on various occasions, I haven't directly contributed to the project; I'd like to change that.
Cool! Let me know if I can give you any more details or provide any assistance. Thanks for working on the censorship detector!
Best, Karsten
Thanks!
Sam
[1] https://lists.torproject.org/pipermail/tor-dev/2011-September/002923.html
[2] https://metrics.torproject.org/papers/detector-2011-08-11.pdf
[3] https://trac.torproject.org/projects/tor/ticket/3718
[4] https://trac.torproject.org/projects/tor/ticket/4180
_______________________________________________
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev