== What is bridge reachability data? ==
By bridge reachability data I'm referring to information about which Tor bridges are censored in different parts of the world.
The OONI project has been developing a test that allows probes in censored countries to test which bridges are blocked and which are not. The test simply takes as input a list of bridges and tests whether they work. It's also able to test obfuscated bridges with various pluggable transports (PTs).
== Why do we care about this bridge reachability data? ==
A few different parties care about the results of the bridge reachability test [0]. Some examples:
Tor developers and censorship researchers can study the bridge reachability data to learn which PTs are currently useful around the world, by seeing which pluggable transports get blocked and where. We can also learn which bridge distribution mechanisms are busted and which are not.
Bridge operators, the press, funders and curious people can learn which countries conduct censorship and how advanced the technology they use is. They can also learn how long it takes jurisdictions to block public bridges. And in general, they can get a better understanding of how well Tor is doing in censorship circumvention around the world.
Finally, censored users and world travelers can use the data to learn which PTs are safe to use in a given jurisdiction.
== Visualizing bridge reachability data ==
So let's look at the data.
Currently, OONI bridge reachability reports look like this: https://ooni.torproject.org/reports/0.1/CN/bridge_reachability-2014-07-02T00... and you can retrieve them from this directory listing: https://ooni.torproject.org/reports/0.1/
That's nice, but I doubt that many people will be able to access (let alone understand) those reports. Hence, we need some kind of visualization (and better dir listing) to conveniently display the data to human beings.
However, a simple x-to-y graph will not suffice: our problem is multidimensional. There are many use cases for the data, and bridges have various characteristics (obfuscation method, distribution method, etc.), hence there is more than one useful way to visualize this dataset.
To give you an idea, I will show you two mockups of visualizations that I would find useful. Please don't pay attention to the data itself, I just made some things up while on a train.
Here is one that shows which PTs are blocked in which countries: https://people.torproject.org/~asn/bridget_vis/countries_pts.jpg The list would only include countries that are blocking at least a bridge. Green is "works", red is "blocked". Also, you can imagine the same visualization, but instead of PT names for columns it has distribution methods ("BridgeDB HTTP distributor", "BridgeDB mail distributor", "Private bridge", etc.).
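As a sketch of the data behind such a matrix (an illustrative structure, not OONI's actual report format), the per-country, per-transport grid could be built like this, keeping only the countries that block at least one bridge, as described above:

```python
# Build a country x transport reachability matrix from test results.
# Input format is illustrative: a list of (country, transport, reachable).
def blocking_matrix(results):
    matrix = {}
    for country, transport, ok in results:
        matrix.setdefault(country, {})[transport] = ok
    # Keep only countries where at least one transport is blocked,
    # matching the mockup's "only countries blocking at least a bridge".
    return {cc: row for cc, row in matrix.items()
            if not all(row.values())}
```

The same function works unchanged if the columns are distribution methods instead of PT names; only the second element of each input tuple changes.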
And here is another one that shows how fast jurisdictions block the default TBB bridges: https://people.torproject.org/~asn/bridget_vis/tbb_blocked_timeline.jpg
These visualizations could be helpful, but they are not the only ones.
What other use cases do you imagine using this dataset for?
What graphs or visualizations would you like to see?
[0]: Here are some use cases:
Tor developers / Researchers:
*** Which pluggable transports are blocked and where?
*** Do they do DPI? Or did they just block the TBB hardcoded bridges?
*** Which jurisdictions are most aggressive and what blocking technology do they use?
*** Do they block based on IP or on (IP && PORT)?
Users:
*** Which pluggable transport should I use in my jurisdiction?
Bridge operators / Press / Funders / Curious people:
*** Which jurisdictions conduct Tor censorship? (block pluggable transports/distribution methods)
*** How quickly do jurisdictions block bridges?
*** How many users/traffic (and which locations) did the blocked bridges serve?
**** Can be found out through extrainfo descriptors.
*** How well are Tor bridges doing in censorship circumvention?
George Kadianakis transcribed 4.1K bytes:
Currently, OONI bridge reachability reports look like this: https://ooni.torproject.org/reports/0.1/CN/bridge_reachability-2014-07-02T00... and you can retrieve them from this directory listing: https://ooni.torproject.org/reports/0.1/
A few concerns:
1. The tests have no control.
I am concerned that the test has no real control. One cannot say, "The experiment is testing if these bridges are reachable from China, and the control is whether or not they are reachable from the US." The problem with that is that there is absolutely no way to determine whether the act of measurement is affecting the data being measured. How do you know that the test isn't causing the bridges to get blocked?
2. This test is attempting to connect simultaneously to multiple bridges with multiple different PT protocols.
That is, this test is doing precisely what we all decided that Tor Browser should *not* do, because the Great Firewall probably can't ask for better filter training material. :(
3. That test still isn't able to reliably start some transports, e.g. fteproxy.
4. The fingerprint should always be in the bridge line; otherwise you've got no proof that you've actually connected to the bridge. :)
5. There is unnecessarily unsafe data in the report output.
BridgeDB sends the bridge descriptors to the Metrics backend, so that Metrics can process them, come up with all the rest of the graphs we have, and put the sanitised data in Onionoo. What if these reports were to contain only data which is public, such as the data which Onionoo currently has?
To play it safe, I would prefer not to have a bunch of bridge fingerprints and ip:ports lying around, on a thousand poorly maintained machines all over the planet. The generated reports could instead output:
* The hashed fingerprint (as is the case for bridges in Onionoo)
* The hashed ip:port
* The transport name
* [true|false|null] for whether the test was successful
This way, the data can be added to the rest of the bridge's data in Onionoo, and all the visualisation/metrics tools which use Onionoo (all of them, I believe) won't need to do anything different. BridgeDB could then get the data from Onionoo.
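The proposed sanitised entry can be sketched like this. The field names and the use of plain SHA-1 are illustrative assumptions, not a fixed schema; note also that the hashed ip:port idea is retracted later in this thread.

```python
# Sketch of one sanitised report entry, per the list above.
# Field names are illustrative; SHA-1 hex is an assumed encoding.
import hashlib

def sanitise_entry(fingerprint, ip, port, transport, success):
    return {
        "hashed_fingerprint": hashlib.sha1(fingerprint.encode()).hexdigest(),
        "hashed_address": hashlib.sha1(f"{ip}:{port}".encode()).hexdigest(),
        "transport": transport,
        # True = reachable, False = blocked, None = test did not run
        "success": success,
    }
```

The point of the exercise is that nothing in the output reveals the raw address or fingerprint directly, so the reports match what is already public in Onionoo.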
6. Your tests would give more accurate data if they didn't use "real" bridges.
I've mentioned this in #ooni on IRC, but for everyone else: To figure out if a PT protocol is blocked, you do not need to use "real" bridges from Tor Browser or BridgeDB. If you (ideally in an automated way) set up a couple of bridges for each protocol, this would:
* Reduce the number of test inputs, making test runs complete faster and use less memory.
* Eliminate the potential to get "real" bridges blocked through testing.
* Test both sides of the connection, thus reducing false negatives.
* Allow us to more accurately control variables while attempting to determine if a PT protocol is blocked by a certain country.
Here is one that shows which PTs are blocked in which countries: https://people.torproject.org/~asn/bridget_vis/countries_pts.jpg The list would only include countries that are blocking at least a bridge. Green is "works", red is "blocked". Also, you can imagine the same visualization, but instead of PT names for columns it has distribution methods ("BridgeDB HTTP distributor", "BridgeDB mail distributor", "Private bridge", etc.).
To be honest, I don't care which pool. Also, that data is already publicly available in Onionoo (or deducible via its lack of availability).
And here is another one that shows how fast jurisdictions block the default TBB bridges: https://people.torproject.org/~asn/bridget_vis/tbb_blocked_timeline.jpg
Neat idea!
These visualizations could be helpful, but they are not the only ones.
What other use cases do you imagine using this dataset for?
In order to better hand out bridges, it would be quite excellent if BridgeDB could someday have something like:
{
  hashed_bridge_address: SHA1('IP:PORT'),
  hashed_bridge_fingerprint: SHA1('FINGERPRINT'),
  pt_method: PT_METHOD|'vanilla',
  regions: {
    ...,
    BR: { reachable: false, since: TIMESTAMP_WHEN_IT_FIRST_BECAME_UNREACHABLE },
    ...,
    CA: { reachable: true, since: TIMESTAMP_WHEN_IT_FIRST_BECAME_REACHABLE },
    CN: { reachable: false, since: TIMESTAMP_WHEN_IT_FIRST_BECAME_UNREACHABLE },
    ...,
  },
},
...,
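A minimal sketch of building one such record from a test run, under the assumption (made up here for illustration) that a run delivers a per-region dict of reachability booleans. The schema and helper name are hypothetical, mirroring the structure above.

```python
# Sketch: assemble the per-region reachability record BridgeDB might keep.
import hashlib
import time

def make_record(ip_port, fingerprint, pt_method, region_results):
    """region_results: e.g. {'CN': False, 'CA': True} from one test run."""
    now = int(time.time())
    return {
        "hashed_bridge_address": hashlib.sha1(ip_port.encode()).hexdigest(),
        "hashed_bridge_fingerprint": hashlib.sha1(fingerprint.encode()).hexdigest(),
        "pt_method": pt_method or "vanilla",
        "regions": {cc: {"reachable": ok, "since": now}
                    for cc, ok in region_results.items()},
    }
```

In a real implementation the `since` timestamp would only be updated when the reachability value flips, not on every run; that bookkeeping is omitted here.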
isis transcribed 6.6K bytes:
- The hashed fingerprint (as is the case for bridges in onionoo)
- The hashed ip:port
Actually, my apologies, I was quite tired when I wrote this and totally completely wrong.
A hashed ip:port would be a terrible idea because IPv4 space is only 2^32 and ports are 2^16. In total that's a 2^48 message space. Hashing for a preimage to get the bridge addresses is quite feasible within those constraints, and the attack can also be precomputed offline.
We should come up with a different way to hide ip:ports.
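A back-of-the-envelope check of the 2^48 claim. The hash rate below is an assumed, conservative single-core figure, purely for illustration; real attackers with GPUs or precomputed tables do far better, and the effective search space is much smaller than 2^48 since bridges use routable addresses and a handful of common ports.

```python
# IPv4 space (2^32) times port space (2^16) = full ip:port message space.
space = 2**32 * 2**16
assert space == 2**48

# At an assumed 10 million hashes/second on a single core, exhausting
# the space takes under a year; the search parallelises trivially.
hashes_per_second = 10**7
seconds = space / hashes_per_second
years = seconds / (365 * 24 * 3600)
print(f"exhaustive search: {years:.1f} years on one core")
```

This is why an unkeyed hash offers essentially no protection here, whereas a keyed construction (as CollecTor uses, discussed below in the thread) or randomized encryption does.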
On 24/10/14 01:53, isis wrote:
isis transcribed 6.6K bytes:
- The hashed fingerprint (as is the case for bridges in onionoo)
- The hashed ip:port
Actually, my apologies, I was quite tired when I wrote this and totally completely wrong.
A hashed ip:port would be a terrible idea because IPv4 space is only 2^32 and ports are 2^16. In total that's a 2^48 message space. Hashing for a preimage to get the bridge addresses is quite feasible within those constraints, and the attack can also be precomputed offline.
We should come up with a different way to hide ip:ports.
I'm lacking context, but just in case this is even remotely relevant, here's how CollecTor sanitizes bridge IP addresses:
https://collector.torproject.org/formats.html#bridge-descriptors
All the best, Karsten
On Sat, Oct 25, 2014 at 01:01:52PM +0200, Karsten Loesing wrote:
On 24/10/14 01:53, isis wrote:
isis transcribed 6.6K bytes:
- The hashed fingerprint (as is the case for bridges in onionoo)
- The hashed ip:port
Actually, my apologies, I was quite tired when I wrote this and totally completely wrong.
A hashed ip:port would be a terrible idea because IPv4 space is only 2^32 and ports are 2^16. In total that's a 2^48 message space. Hashing for a preimage to get the bridge addresses is quite feasible within those constraints, and the attack can also be precomputed offline.
We should come up with a different way to hide ip:ports.
I'm lacking context, but just in case this is even remotely relevant, here's how CollecTor sanitizes bridge IP addresses:
https://collector.torproject.org/formats.html#bridge-descriptors
Hey Karsten,
Yes, this is very relevant, thanks! Currently our plan involves keying the JSON dataset using unsanitized "IP Address:port" internally and the sanitized public version will replace this key with H(H(fingerprint)). This seems like the easiest way to avoid the problem of leaking the IP address.
At this point, we don't think we need an IP address in the resulting dataset, so a unique, linkable fingerprint seems sufficient. If we find that IP addresses are useful, then CollecTor's algorithm seems like a good starting point.
- Matt
On Sat, Oct 25, 2014 at 11:26:50AM +0000, Matthew Finkel wrote:
On Sat, Oct 25, 2014 at 01:01:52PM +0200, Karsten Loesing wrote:
On 24/10/14 01:53, isis wrote:
isis transcribed 6.6K bytes:
- The hashed fingerprint (as is the case for bridges in onionoo)
- The hashed ip:port
Actually, my apologies, I was quite tired when I wrote this and totally completely wrong.
A hashed ip:port would be a terrible idea because IPv4 space is only 2^32 and ports are 2^16. In total that's a 2^48 message space. Hashing for a preimage to get the bridge addresses is quite feasible within those constraints, and the attack can also be precomputed offline.
We should come up with a different way to hide ip:ports.
I'm lacking context, but just in case this is even remotely relevant, here's how CollecTor sanitizes bridge IP addresses:
https://collector.torproject.org/formats.html#bridge-descriptors
Hey Karsten,
Yes, this is very relevant, thanks! Currently our plan involves keying the JSON dataset using unsanitized "IP Address:port" internally and the sanitized public version will replace this key with H(H(fingerprint)). This seems like the easiest way to avoid the problem of leaking the IP address.
Whoops, that should be H(fingerprint), nothing special. Sorry, I got a little hashing happy.
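Pulling the corrected scheme together, a minimal sketch: reports are keyed internally by the raw "IP:port", and the published version replaces that key with H(fingerprint). Plain hex SHA-1 is assumed here; Onionoo's exact hashed-fingerprint encoding may differ, and the helper name is hypothetical.

```python
# Sketch: replace internal 'IP:port' keys with H(fingerprint) on publish.
import hashlib

def publish(internal_reports, fingerprints):
    """internal_reports: {'1.2.3.4:443': {...}, ...};
    fingerprints maps the same address keys to bridge identity fingerprints."""
    return {
        hashlib.sha1(fingerprints[addr].encode()).hexdigest(): report
        for addr, report in internal_reports.items()
    }
```

The published dataset then carries no raw address at all, while anyone holding the fingerprint-to-address mapping (BridgeDB, Metrics) can still link each record back to its bridge.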
Matthew Finkel transcribed 1.6K bytes:
On Sat, Oct 25, 2014 at 01:01:52PM +0200, Karsten Loesing wrote:
On 24/10/14 01:53, isis wrote:
isis transcribed 6.6K bytes:
- The hashed fingerprint (as is the case for bridges in onionoo)
- The hashed ip:port
Actually, my apologies, I was quite tired when I wrote this and totally completely wrong.
A hashed ip:port would be a terrible idea because IPv4 space is only 2^32 and ports are 2^16. In total that's a 2^48 message space. Hashing for a preimage to get the bridge addresses is quite feasible within those constraints, and the attack can also be precomputed offline.
We should come up with a different way to hide ip:ports.
I'm lacking context, but just in case this is even remotely relevant, here's how CollecTor sanitizes bridge IP addresses:
https://collector.torproject.org/formats.html#bridge-descriptors
Yes, this is very relevant, thanks! Currently our plan involves keying the JSON dataset using unsanitized "IP Address:port" internally and the sanitized public version will replace this key with H(H(fingerprint)). This seems like the easiest way to avoid the problem of leaking the IP address.
At this point, we don't think we need an IP address in the resulting dataset, so a unique, linkable fingerprint seems sufficient. If we find that IP addresses are useful, then CollecTor's algorithm seems like a good starting point.
I agree that we could probably do without any IP:port information in the resulting reports. The hashed fingerprint is enough for BridgeDB to deduce a bridge's IP:ports; it should also be enough for Metrics to deduce which bridge a particular set of additional reachability information concerns, without needing to do any additional processing of either the IP:ports or the fingerprints.
With respect to CollecTor's algorithms for sanitising bridge IP:ports (should we decide to instead keep the bridge address information in OONI's bridge reachability reports and wish to sanitise those reports), Robert Ransom spoke with me on the 24th of October, and made the following points and suggestions:
Robert Ransom transcribed 1.0K bytes:
The Metrics system currently sanitizes bridge TCP addresses (IP+port) by HMACing them with a secret key stored on the server. That won't work for the reachability testing system for two reasons:
- The reachability-testing bridge clients should not know the key needed to obfuscate TCP (or UDP, or other) addresses deterministically. (A deterministic public-key encryption would be just as bad as a hash.)
- BridgeDB must be able to learn the address for which a bridge's reachability test was performed, so that it can decide whether the reachability-test results are valid for the bridge's current address.
I would suggest that the reachability-testing bridge client report a (randomized) public-key encryption of the address, where the decryption key is held by BridgeDB (so it can check whether the reachability test is relevant to the current ‘Bridge line’) and the Metrics sanitization server (so it can compute and publish a deterministically sanitized address, following the current sanitization procedure).
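For contrast, the existing Metrics-style sanitisation mentioned above can be sketched as a keyed HMAC over the address. This illustrates why it cannot be pushed out to probes: the token is deterministic only for holders of the secret key, so shipping the key to reachability-test clients would let any compromised probe de-anonymise addresses, while withholding it leaves probes unable to produce linkable tokens; hence the randomized public-key approach suggested here.

```python
# Sketch of deterministic keyed sanitisation (HMAC over the address),
# in the spirit of what the Metrics system does; key and encoding are
# illustrative assumptions, not CollecTor's actual parameters.
import hashlib
import hmac

def sanitise_address(secret_key: bytes, ip_port: str) -> str:
    """Same key + same address always yields the same token."""
    return hmac.new(secret_key, ip_port.encode(), hashlib.sha256).hexdigest()
```

Because the output is stable under a fixed key, the same bridge maps to the same sanitised token across reports, which is exactly the linkability Metrics needs and exactly the property a keyless probe cannot provide.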