Thank you all for the feedback and very useful comments!
I will quote replies from another thread and comment on some of the topics there.
At the end, based on the feedback received, I will list the main areas of focus for the next 6 months, some of the relevant existing tickets, and the new tickets that should be created.
If you have ideas on some specific topic/issue/feature that you believe should be tackled, please do append to this list.
On 2/17/15 5:28 PM, Nick Feamster wrote:
There are many interesting ideas in here. Thanks!
Off the top of my head, one could get a better global picture of censorship using data from the “global censorship measurement” tools (Sam’s Encore tool, Roya and Ben’s “Censored Planet” project) to trigger measurements from OONI, and vice versa.
Tools such as Encore and CP can get a signal of filtering from a larger and more diverse set of clients than an OONI deployment could (and, at lower risk); the drawback is a lack of detailed information (typically the information is a binary “yes/no” about whether filtering is taking place in TCP/IP, DNS, or HTTP, but not much else).
I could imagine an OONI deployment using information about observed filtering from Encore or CP to trigger a more extensive and detailed set of tests from the OONI nodes. Likewise, blocking observed at one OONI node at one layer could be fed back into Encore or CP to see whether any observable filtering behavior is observed across a broader range of sites.
I view this as a specific “research use case” for the integration, orchestration, and data analysis points that Arturo lists below.
I think Roya and Ben are probably not quite ready for this with CP (I’d want to ask them), but Encore is very stable and we could look into good ways of passing information back and forth, either to OONI or to a common visualization engine (or both).
Some of the other efforts look interesting, but the research payoff isn’t as readily apparent to me. (I’m suspicious of monthly reports without broader, more global baseline coverage like Encore or CP could provide, but we are also quite interested in visualization tools and reports for that data, so integrating all of that into a single “dashboard” would be something that could be very useful, based on what political scientists and others seem to be asking for.)
This is indeed a very interesting concept. With respect to using ooniprobe data to trigger other tests, this will be easier once we finish implementing the API to the pipeline. We now have all the data collected with ooniprobe inside of a database and will be working (mainly to provide analytics and visualization tools) on exposing access and query functionality.
Work on this has only recently started, but you can find the code for it here: https://github.com/hellais/ooni-app.
With respect to triggering ooniprobe measurements based on data collected by Encore, Censored Planet, etc., I think we would need to support https://trac.torproject.org/projects/tor/ticket/12551, unless we consider it acceptable to wait some time before all deployed probes run the measurement.
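To make the triggering idea more concrete, here is a minimal sketch of how a coarse "filtering seen at layer X" signal could be turned into a list of more detailed measurements to run. The signal format, the test names in `FOLLOW_UP_TESTS`, and the `plan_measurements` function are all assumptions made for illustration; this is not an interface that Encore, Censored Planet or ooniprobe exposes today.

```python
# Hypothetical mapping from the layer at which filtering was reported
# to the more detailed tests we would want probes to run as a follow-up.
FOLLOW_UP_TESTS = {
    "dns": ["dns_consistency"],
    "tcp": ["tcp_connect"],
    "http": ["http_requests"],
}


def plan_measurements(signals):
    """Turn coarse filtering signals into a de-duplicated measurement plan."""
    plan = []
    seen = set()
    for signal in signals:
        # Only positive signals trigger follow-up measurements.
        if not signal["filtered"]:
            continue
        for test in FOLLOW_UP_TESTS.get(signal["layer"], []):
            key = (signal["country"], signal["target"], test)
            if key in seen:
                continue
            seen.add(key)
            plan.append(
                {
                    "country": signal["country"],
                    "target": signal["target"],
                    "test": test,
                }
            )
    return plan


# Made-up example signals (placeholder country codes and domains).
signals = [
    {"country": "XX", "target": "example.com", "layer": "dns", "filtered": True},
    {"country": "XX", "target": "example.com", "layer": "dns", "filtered": True},
    {"country": "YY", "target": "example.org", "layer": "http", "filtered": False},
]
plan = plan_measurements(signals)
```

The plan would then still have to reach the probes somehow, which is the part that depends on the ticket above.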
On 2/17/15 5:36 PM, Phillipa Gill wrote:
I would say the daily measurements from stable nodes is less interesting for us since ICLab is already 99% of the way there on those (ie., running baseline tests from VPN + deployed endpoints).
I think the idea of detecting different censorship circumvention tools could be interesting and if the tests are well specified this could be something that is ported/run in ICLab as well.
Yes, we plan on openly specifying all the tests that we will be deploying in a study in a specific country.
I am not sure how many details I can disclose at the moment on this, but will be sure to update the list when more is known.
On 2/17/15 8:53 PM, Meredith Whittaker wrote:
The biggest issues that I see for OONI, and all efforts in this nascent space are around:
- *Getting consistent measurements at scale, over time, across broad geographies.*
  - Consistent = from a stable set of tests that, *if they do change, are clearly documented as changed*, and this versioning is reflected in collected data and elsewhere. The less change, the better (which isn't to say that new tests are bad, at all, but that there needs to be a set "canon" of core tests that work to set a baseline. Without this, much of the research etc. is much, much less valuable.)
  - at scale = a lot -- a significant number of representative links, mapping diurnal patterns, etc. (omg obvs, but still)
  - over time = for as long as possible -- letting us compare the results of Stable Test X for Place Y in 2015 vs 2017, etc.
  - Across broad geographies = how *are* UK censorship techniques different from those used in Liberia? Etc.
- *Dealing with issues around user consent and risk.*
  - This is huge. We all know that. For now, the more OONI probes and measurement points can be decoupled from individuals (i.e. can be deployed without requiring a user to download, install, flash, access), the better. Impersonal Pis, or similar, seem like the best option for this currently. But, even these pose risk if they can be linked to an individual (scapegoat) in a hostile country. The bigger OONI's impact, the bigger this problem.
I would advocate that any work undertaken focus on these two goals -- which I believe to be fundamental to many of the great other goals that have been put forward here.
With that declaration, my specific comments on the individual ideas in Arturo's thoughtful email:
- *Getting daily OONI measurements from 50 countries*
This is clearly a laudable goal. I am concerned with the means suggested for achieving this. How can it be done without placing users in danger? How can an answer to that question be obtained for 50 countries (and orders of magnitude more regions, factions, climates within each country)? This is also ambitious practically/logistically -- this is an increase in vantage points that will likely require localization, maintenance, support, troubleshooting, and consistent and well-documented updates, such that apples can be compared to apples and data can stand as "proof." A roadmap that narrows the focus, from 50 to "a pilot of 5," and that addresses the above issues, would be welcome.
- *Tests of circumvention tools*
This seems cool, but could as easily be titled "The How Well is My Censorship Working Index." Answering the question "whom does this serve, and how?" would seem to be the next step to assessing the value of this proposition.
- *Orchestrating OONI probes*
This seems HUGELY problematic WRT privacy, user consent, and security. I am also concerned with how this butts up against the need for consistent and durational testing (which, IMO, is more important as a research tool and a means to understand censorship than novel tests run a couple times during a given month).
- *Data analytics and visualizations*
Is there enough consistent available data (and a roadmap that would guarantee a consistent pipeline of consistent data over time) to make this useful and worthwhile currently? If/when yes, I would suggest bringing in people who have professional experience with data visualizations and analysis. With M-Lab this goal has been a continual challenge -- there aren't accepted statistical ways of working with network data (more on that if you want in another thread), and visualizations need to be gorgeous, need to be written in a browser-friendly language, need to be maintained and updated, need to not have [too many] gaps, etc.. You are, in this effort, producing a user-facing product. All of the forever-work that attends a product attends this effort.
- *Pub system*
This seems potentially useful (I don't know enough to be concrete here), but again, my question is, Whose needs does this serve, specifically, and how does serving those needs further a longer-term OONI strategy? More generally, shared storage and transport mechanisms for measured data are something this space could use, for sure. Would this potentially help build those systems?
- *Production-quality OONI Pis. *
I like this, and I like the idea of partnering with CI lab (hey!) or others. Deployment and maintenance is expensive. The more work these Pis can do once they're deployed, before they die, the better. I would be more enthusiastic about this if it involved a deployment partner, as I don't see a huge value in spending dev time to ensure that the handful of people who'd flash a Pi can.
- *OONI on mobile*
I vote to have a more stable OONI before launching a mobile test. We at M-Lab have explored mobile at length, and it's tricky (I'm not sure tests written for a non-mobile environment would be as relevant to a mobile environment), deployment is hard (marketing is key! and, who wants to use their data cap on something that isn't whatsapp? (etc.).)
- *Research based on OONI*
If there's enough consistent data, it could be interesting. But, not sure that there is (?)
- *Monthly reports*
As above, I think this is premature. Getting good data, and getting enough of it, should come first.
- *Adopt an OONI probe*
I worry about consent and permissions here. Once those are figured out, I would suggest getting a big donor to adopt (deploy) a bunch of OONIs, instead of a smaller campaign.
- *Integration with other censorship measurement projects*
Really like the sound of this! The more resources can be shared, the better. I'll let y'all discuss...
- *Reaching out to communities inside censored regions (like the UK?)*
I'm all for this, but I think this should be led by groups like Citizen Lab, maybe Amnesty, and others that have experience in qualitative user studies and access to networks on the ground.
- *Redesign the OONI website*
Definitely necessary. Not clear on its priority. I would suggest that any redesign minimize the focus on "censorship" (and the use of the term). For all the reasons we've discussed. And that a technical writer and someone with some communications training be employed in drafting and tinkering.
- *Internet censorship conference*
What would the goals be? How would things be better/different when these goals were achieved? Without a concrete motive, I worry that this could be another "fly the same people somewhere new and pretend we're innovating" model.
- *Implement a GUI for OONI*
I would prioritize this after the backend is stable, and the consent and permissions issues have been worked through. (I also think this is something that should engage designers and UX experts outside of the OONI core team, because all the reasons
I agree fully with all that you say. In particular I think, although I had not put it amongst the options to vote on, that we should dedicate quite a bit of time working on sorting out the informed consent issue.
On 2/19/15 12:08 AM, Jed Crandall wrote:
Sorry for chiming in late, have been a bit under the weather this week. I'll just second Philipp's comment:
https://lists.torproject.org/pipermail/ooni-dev/2015-February/000253.html
I.e., a "me too" for giving "Implement data analytics and visualization for OONI tests" a 5. I don't mind writing a bit of code to start looking at some data, but there's so much data out there and I'm more likely to start looking at data (or suggest that a research or networking class student do so) if I already have some idea of what's there.
I'll also add that a "wish list" of problems you'd like solved could be helpful, like:
https://research.torproject.org/ideas.html
https://www.torproject.org/getinvolved/volunteer.html.en#Research
For example, I've found IP geolocation services like MaxMind to be pretty bad in certain places. It says something is in a country and it's not, which makes debugging why that data point doesn't behave like others that are supposed to be in that country a pain. Is that a problem that would help OONI if solved? If not, is something else?
FWIW, this is a step in the direction of doing IP geolocation better in parts of the world other than the U.S. and Europe:
http://www.cs.unm.edu/~crandall/infocom2015rtt.pdf
Lastly, in terms of baselines, even very specific baselines like "Tor bridge reachability in Country X" can be very illuminating if the amount of space and time the data is taken over is broad enough. Measuring everything everywhere all the time would be nice, of course, but like Salvador Dali said, "Have no fear of perfection - you'll never reach it."
The suggestion of a wish list of research problems we would like solved is very good.
There are a few of those and I think it would be a good idea to write them down somewhere.
The MaxMind issue is indeed a problem of ours. Luckily, we will also collect the ASN by default, and in most cases you will be able to identify the inconsistency manually by looking up the details of the ASN (in a whois database or similar).
Regarding reachability of Tor bridges, we have at this point about 3-4 months of data (daily measurements) for obfs2/obfs3/scramblesuit/fte bridges in Iran, China, Russia, Ukraine.
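That manual check could be partly automated. Below is a minimal sketch, not code from the pipeline: the record fields `probe_cc` and `probe_asn`, the `ASN_COUNTRY` table (standing in for a whois lookup) and the function name are assumptions made for illustration. The AS numbers are taken from the range reserved for documentation.

```python
# Hypothetical ASN -> registered country table. In practice this would be
# filled from a whois database or similar, not hard-coded.
ASN_COUNTRY = {
    "AS64496": "IR",
    "AS64497": "CN",
}


def find_geoip_mismatches(measurements, asn_country=ASN_COUNTRY):
    """Return measurements whose GeoIP country disagrees with the ASN's country."""
    mismatches = []
    for measurement in measurements:
        expected = asn_country.get(measurement["probe_asn"])
        # ASNs we know nothing about are skipped rather than flagged.
        if expected is not None and expected != measurement["probe_cc"]:
            mismatches.append(measurement)
    return mismatches


# Made-up example records.
measurements = [
    {"probe_cc": "IR", "probe_asn": "AS64496"},  # consistent
    {"probe_cc": "US", "probe_asn": "AS64497"},  # GeoIP and ASN disagree
    {"probe_cc": "RU", "probe_asn": "AS64511"},  # ASN not in the table
]
flagged = find_geoip_mismatches(measurements)
```

A flagged record is only a hint that the geolocation should be looked at by hand, since a single AS can operate in more than one country.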
The latest data is not yet public, because it needs to be scrubbed of the metadata and I have not yet finished re-setting up the pipeline (after our other machine ran out of disk space).
We also have a visualization for it that still needs to be completed. If somebody is interested in hacking on this, I could give them access to the data and code.
....
And now onto the result of the voting session:
Implement data analytics and visualization for OONI tests 4.75
Reach production quality ooni raspberry-pi (beagle-board) images 4.25
Develop OONI tests for censorship circumvention tools 4
Develop scheme for orchestrating ooni-probes 4
Promote and further develop OONI on mobile (Android, iOS) 4
Get daily OONI measurements from 50 countries 3.75
Publish monthly reports about the status of internet censorship in a country 3.75
Implement a GUI for ooniprobes 3.75
Do research based on OONI 3.666666667
Implement pub-sub system for ooni collectors 3.5
Integration with other censorship measurement projects 3.5
Reaching out to communities inside of censored regions 3.5
Run "adopt an ooni-probe" campaign 3.25
Redesign the website for ooni 2.75
Hold an international internet censorship conference 2.25
To me these results are not surprising at all; they reflect more or less what have already been the main areas of work for the past couple of months.
I think we should therefore continue in this direction and focus on three main areas: "Implement data analytics and visualization for OONI tests", "Reach production quality ooni raspberry-pi (beagle-board) images", and "Informed consent research". We will also be working on "Develop OONI tests for censorship circumvention tools" for a project in a specific country.
Here is a list of the existing tickets in these areas, along with some new tickets to be created.
# Implement data analytics and visualization for OONI tests
## Existing tickets
Add generation of reports index to the export task of the pipeline https://trac.torproject.org/projects/tor/ticket/13842
Migrating OONI data-pipeline containers and server configuration on a different server https://trac.torproject.org/projects/tor/ticket/13825
Better and more efficient database schema https://trac.torproject.org/projects/tor/ticket/13803
Mongodb queries for the nettest visualization https://trac.torproject.org/projects/tor/ticket/13759
Brainstorm ideas for possible visualisations https://trac.torproject.org/projects/tor/ticket/13731
Investigate possible performance improvements to the ooni-pipeline https://trac.torproject.org/projects/tor/ticket/13720
Align the dates in the visual timeline https://trac.torproject.org/projects/tor/ticket/13639
Better tokening in the output json data format for bridge reachability visualisation https://trac.torproject.org/projects/tor/ticket/13638
## New tickets
### Design and implement OONI reports explorer
This will allow users of OONI to explore the data that we have so far collected, by filtering and searching it.
# Reach production quality ooni raspberry-pi (beagle-board) images
## Existing tickets
OONI on Raspberry Pi https://trac.torproject.org/projects/tor/ticket/13870
## New tickets
### Embedded device configuration wizard
Set up an OONI Wi-Fi network on the Raspberry Pi to configure the device at first start. This will allow the user to configure how ooni-probe should connect to the internet and what measurements should be run.
It would also be useful to provide an informed consent information page.
# Informed consent research
## Existing tickets
Write documentation of benefits for running ooniprobe https://trac.torproject.org/projects/tor/ticket/14760
Brainstorm on possible ways of minimizing the risks involved with running ooniprobe while keeping the benefits https://trac.torproject.org/projects/tor/ticket/14761
Redesign how we inform the user of the risks of running ooniprobe and get informed consent from them https://trac.torproject.org/projects/tor/ticket/14762
## New tickets
Get legal feedback for the risks of running ooniprobe in a set of specific countries
Thanks for taking the time to read this.
I will soon send out an email to schedule next week's IRC meeting, since we have skipped it this week.
Have fun!
~ Arturo