On Thu, Aug 20, 2015 at 09:09:23AM -0400, l.m wrote:
Hi,
As some of you may be aware, the mailing list for censorship events was recently put on hold indefinitely. This appears to be because the detector produces too many false positives in its current implementation. It also raises the question of the purpose of such a mailing list. Who are the stakeholders? What do they gain from an improvement?
I've read some of the documentation about this. As far as I can tell, at a minimum an `improvement` in the event detector would:
- reduce false positives
- distinguish between Tor network reachability and Tor network interference
- enable/promote client participation through the submission of results from an ephemeral test (itself provably correct and valid)
In order to be of use to researchers it needs greater analysis capability. Is it enough to say censorship is detected? By that point the analysis is less interesting, because the discourse which itself led to the Tor use is probably already evident (or has become harder to find). On the other hand, if a researcher is aware of some emerging trend, they may predict the censorship event by predicting the use of Tor. This may also be of use in the analysis of other events.
- should detect more than just censorship
- accept input from researchers
From the tech reports it looks like Philipp has a plan for implementing the tests noted above. Only the format of the results submission is unknown.
- provide client test results to tor project developers
- make decision related data available
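Since the submission format is still open, one hypothetical shape for a client-submitted test result is sketched below. All field names here are illustrative assumptions, not an existing Tor specification:

```json
{
  "test_id": "ephemeral-reachability",
  "test_version": "0.1",
  "timestamp": "2015-08-20T09:09:23Z",
  "vantage_country": "??",
  "result": "interference",
  "evidence": {
    "tcp_connect": "ok",
    "tls_handshake": "reset"
  }
}
```

A structured format like this would let developers aggregate results mechanically and let researchers query the "evidence" fields that motivated each decision.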
Regards --leeroy
Hi,
These are well identified issues. We've been working here on a way to improve the current filtering detection approach, and several of the points above are things that we're actively hoping to work into our approach. Differentiating 'filtering' from 'other events that affect Tor usage' is tricky, and will most likely have to rely on other measurements from outside Tor. We're currently looking at ways to construct models of 'normal' behaviour in a way that incorporates multiple sources of data.
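To make the idea of a multi-source "normal behaviour" model concrete, here is a minimal sketch (not the paper's actual method): compute robust z-scores on per-country daily Tor user counts, and flag a day only when the deviation is corroborated in two independent series (say, direct users and bridge users). Requiring agreement across sources is one simple way to cut false positives; the series names and threshold are assumptions for illustration.

```python
# Minimal sketch, assuming two independent daily count series per
# country (e.g. direct users and bridge users). Not the method from
# the arXiv paper; just an illustration of corroborated detection.
from statistics import median

def robust_z(series):
    """Robust z-scores using the median and MAD instead of mean/stddev,
    so a single extreme day does not distort the baseline itself."""
    m = median(series)
    mad = median(abs(x - m) for x in series) or 1.0  # avoid div by zero
    return [(x - m) / (1.4826 * mad) for x in series]

def flag_anomalies(series_a, series_b, threshold=3.5):
    """Return indices that are anomalous in BOTH series, i.e. events
    corroborated by two independent measurements."""
    za, zb = robust_z(series_a), robust_z(series_b)
    return [i for i, (a, b) in enumerate(zip(za, zb))
            if abs(a) > threshold and abs(b) > threshold]

# Example: a sharp simultaneous drop on day 6 in both series.
direct = [1000, 1020, 990, 1010, 1005, 995, 200, 1000]
bridge = [100, 102, 98, 101, 99, 100, 20, 101]
print(flag_anomalies(direct, bridge))  # -> [6]
```

A noisy dip that appears in only one series would be ignored, which is exactly the behaviour the current detector lacks.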
We have a paper up on arXiv that might be of interest. I'd be interested to be in touch with anyone who's actively working on this. (We have code, and would be very happy to work on getting it into production.) I've shared the paper with a few people directly, but not here on the list.
arXiv link: http://arxiv.org/abs/1507.05819
We were looking at any anomalies, not only pure Tor-based filtering events. For the broader analysis, significant shifts in Tor usage are very interesting. It's therefore useful to detect a range of unusual behaviours occurring around Tor, and to have criteria for differentiating 'hard' filtering events from softer anomalies caused by other factors.
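One simple way to separate the two classes, sketched below under assumed heuristics (the thresholds and rule are illustrative, not from our paper): treat a sustained, sharp drop below the baseline as a 'hard' filtering candidate, and a short-lived dip as a 'soft' anomaly.

```python
# Assumed heuristic: "hard" filtering = counts stay below a fraction of
# the baseline for several consecutive days; "soft" = a shorter dip.
# Spikes and other anomaly shapes would need additional rules.
def classify_event(baseline, window, drop_frac=0.5, min_days=3):
    """
    baseline: typical daily user count before the suspected event
    window:   daily counts during/after the suspected event
    Returns 'hard' for a sustained drop (>= min_days consecutive days
    below drop_frac * baseline), 'soft' for a shorter dip, and 'none'
    if no day falls below the threshold.
    """
    run = best = 0
    for count in window:
        if count < drop_frac * baseline:
            run += 1
            best = max(best, run)
        else:
            run = 0
    if best >= min_days:
        return "hard"
    return "soft" if best > 0 else "none"

print(classify_event(1000, [150, 140, 160, 155]))    # -> hard
print(classify_event(1000, [300, 950, 980, 1000]))   # -> soft
print(classify_event(1000, [990, 1010, 1000, 995]))  # -> none
```

The interesting research question is then what external data (news events, other networks' reachability) should feed into choosing `drop_frac` and `min_days` per country.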
Joss