Over at https://bugs.torproject.org/9316, we are working on having BridgeDB export metrics. The patch is almost done and I deployed the work-in-progress code on BridgeDB, so we can take a look at the metrics and think of ways to improve them. The metrics format encodes the approximate number of requests per distribution mechanism per transport per country per success/fail. All numbers are rounded up to the next multiple of 10. The last field, "none", will be used for an anomaly score and is currently unused.
For example, the line
bridgedb-metric-count email.obfs4.riseup.success.none 10
tells us that there have been 1-10 successful email requests for obfs4 coming from Riseup addresses.
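To make the format concrete, here is a minimal Python sketch of how such a line could be parsed. The field labels are my own, not identifiers from the BridgeDB code:

```python
# Sketch of parsing a bridgedb-metric-count line. Field labels below are
# illustrative, not identifiers from the BridgeDB source.
def parse_metric(line):
    prefix, key, count = line.rsplit(" ", 2)
    assert prefix == "bridgedb-metric-count"
    mechanism, transport, origin, outcome, anomaly = key.split(".")
    return {
        "mechanism": mechanism,  # email, https, or moat
        "transport": transport,  # e.g. obfs4 or vanilla
        "origin": origin,        # country code, or email provider for email
        "outcome": outcome,      # success or fail
        "anomaly": anomaly,      # reserved for an anomaly score; currently "none"
        "count": int(count),     # rounded up to the next multiple of 10
    }

m = parse_metric("bridgedb-metric-count email.obfs4.riseup.success.none 10")
# Because counts are rounded up, a published value of 10 means 1-10 requests.
lower, upper = m["count"] - 9, m["count"]
```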
I attached 24 hours' worth of metrics to this email. Keep the following issues in mind:
* My feature branch hasn't been reviewed yet and likely still has bugs, so take all numbers with a grain of salt.
* The country codes are based on Debian stretch's geoip-database, which is slightly outdated and uses Maxmind's far-from-perfect GeoLite database.
* The country code "??" refers to geo-location failure or lack of IP addresses (in the case of moat). The country code "zz" refers to a request from a Tor exit relay.
Some observations:
* Gmail sees much more use than Riseup. That's no surprise.
* The email distributor sees more vanilla than obfs4 requests. I wonder to what degree this is caused by the poor UX of the email distributor.
* For HTTPS, many countries have a fail and success bucket of 10 each. This is likely a single user who failed the captcha at least once before finally getting it right.
* The captcha success rate for obfs4 over moat is 54%. That's very low and must cause lots of frustration for users. This is a known issue that's tracked in this ticket: https://bugs.torproject.org/29695
* Ignore the large number of HTTPS requests from "zz" -- I expect the vast majority of these to come from a bot that's interacting with BridgeDB over exit relays.
After a cursory look at the numbers, I would like to aggregate the data, to make it easier to compare distributors, transports, and countries. For example: how do moat, email, and HTTPS rank in popularity? I'll improve the patch to keep track of these numbers in separate metrics.
Any thoughts or suggestions?
Cheers, Philipp
That's awesome, and will shine a lot of light on user demand patterns and how well things are actually working through various channels. Could some metrics be added to summarize how the bridges and queries are distributed across the hashrings? As in, at the end of the day, roughly how many bridges are in each hashring, and how many requests were served from each hashring? I've seen behavior in the past that made me wonder if the internal HMAC/modulo partitioning method is actually uniformly distributed or not, like perhaps there are some hashrings with most of the bridges and others with too few (within a given distribution method), or maybe there are too many requests being pulled from certain hashrings, leaving others under-utilized. This might not need to be a permanent stat dump, but seeing it for at least a few days would help a lot to confirm that the db's guts are working as intended.
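Rick's question about whether HMAC-then-modulo partitioning is uniform can be probed with a toy simulation. This mimics the general technique, not BridgeDB's actual partitioning code; the key, ring count, and ID format are made up for illustration:

```python
import hmac
import hashlib
import os
from collections import Counter

# Toy check: does HMAC-then-modulo spread items uniformly across hashrings?
key = os.urandom(32)   # illustrative key; BridgeDB derives its own
NUM_RINGS = 4          # illustrative ring count

def ring_for(bridge_id: bytes) -> int:
    # Hash the ID with HMAC-SHA256, then reduce modulo the ring count.
    digest = hmac.new(key, bridge_id, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % NUM_RINGS

# Assign 100,000 random 20-byte IDs and count how many land in each ring.
counts = Counter(ring_for(os.urandom(20)) for _ in range(100_000))
# With a sound HMAC, each ring should hold roughly 25,000 items.
```

A heavily skewed distribution in a check like this would point at the partitioning step; roughly equal counts would suggest the imbalance Rick saw comes from somewhere else.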
On Mon, Jul 29, 2019 at 09:22:52PM -0700, Rick Huebner wrote:
Could some metrics be added to summarize how the bridges and queries are distributed across the hashrings?
Thanks for this suggestion. I agree that it would be helpful and I'll look into incorporating it into the metrics.
Cheers, Philipp
On 2019-07-30 00:01, Philipp Winter wrote:
[...]
After a cursory look at the numbers, I would like to aggregate the data, to make it easier to compare distributors, transports, and countries. For example: how do moat, email, and HTTPS rank in popularity? I'll improve the patch to keep track of these numbers in separate metrics.
Any thoughts or suggestions?
Looks like a great start!
I have two questions and one suggestion, based on a quick read:
You say that you're planning to add aggregate statistics like numbers by distributor without drilling down to transports or countries. Keep in mind that this is going to reduce the noise that you added when rounding up to multiples of 10. For example, knowing that the total by country is closer to $entries_in_that_country * 1 or $entries_in_that_country * 10 will tell you something about the average noise added per entry. It would be more privacy-preserving (and also less accurate) to keep all the noise in the statistics and do the aggregation in a separate step.
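Karsten's point can be illustrated with a quick simulation. The counts below are made up; only the round-up-to-10 rule matches BridgeDB's:

```python
import random

random.seed(1)  # reproducible made-up data

def round_up(n, k=10):
    # Round n up to the next multiple of k.
    return -(-n // k) * k

# Made-up true request counts for 235 (country, transport, outcome) entries.
true_counts = [random.randint(1, 30) for _ in range(235)]

# Aggregating the already-rounded entries keeps all the per-entry noise:
# each entry contributes up to 9 spurious requests.
noisy_total = sum(round_up(c) for c in true_counts)

# A separately published, once-rounded total is much closer to the truth...
accurate_total = round_up(sum(true_counts))

# ...which is exactly why publishing both lets an observer estimate the
# average noise added per entry by comparing them.
```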
What is obs4 in bridgedb-metric-count email.obs4.gmail.fail.none 10 (as opposed to obfs4)?
Would it make sense to add a line like bridge-stats-version to include a version number of some sort, just in case you want to change the format at a later time?
Cheers, Philipp
All the best, Karsten
On Tue, Jul 30, 2019 at 05:42:11PM +0200, Karsten Loesing wrote:
You say that you're planning to add aggregate statistics like numbers by distributor without drilling down to transports or countries. Keep in mind that this is going to reduce the noise that you added when rounding up to multiples of 10. For example, knowing that the total by country is closer to $entries_in_that_country * 1 or $entries_in_that_country * 10 will tell you something about the average noise added per entry. It would be more privacy-preserving (and also less accurate) to keep all the noise in the statistics and do the aggregation in a separate step.
That's a great point. I was originally concerned about the decrease in accuracy but, after running the numbers, it seems tolerable. Let's have a look at the lower and upper bound of the total number of HTTPS requests. Summing up all bins (and ignoring bot requests) gives us the upper bound:
grep https bridgedb-metrics.log | grep -v zz | cut -d ' ' -f 3 | paste -sd+ | bc
3850
To determine the lower bound, we first calculate the number of bins:
grep https bridgedb-metrics.log | grep -c -v zz
235
Then, we multiply the number of bins by 9 and subtract the result from the upper bound, which gives us a lower bound of 1,735.
Applying this method to all three distribution mechanisms results in the following table:
        Lower bound   Upper bound
        -----------   -----------
Moat          4,576         4,630
HTTPS         1,735         3,850
Email           303           420
Despite the inaccuracy caused by the binning, we can be certain that moat is more popular than HTTPS (moat's lower bound > HTTPS's upper bound) and email is an order of magnitude less popular than both HTTPS and moat. HTTPS is the most inaccurate because of the large number of bins.
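The bound arithmetic can be condensed into a tiny helper. This is a sketch using the HTTPS figures from the shell commands above:

```python
def bounds(upper, bins):
    # Each published bin value overstates the true count by at most 9
    # (a value of 10 means 1-10 actual requests), so with `bins` bins
    # the true total lies in [upper - 9 * bins, upper].
    return upper - 9 * bins, upper

# HTTPS (excluding "zz"): 235 bins summing to 3,850.
lower, upper = bounds(3850, 235)  # -> (1735, 3850)
```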
What is obs4 in bridgedb-metric-count email.obs4.gmail.fail.none 10 (as opposed to obfs4)?
That's a typo that a user made when requesting the transport. I had not yet changed the code to only consider transports that are supported by BridgeDB. All unsupported transport types should result in a log message and not affect the metrics.
Interestingly, there's another metrics line that shows that there were 1-10 successful requests for the invalid obs4 transport. When requesting an invalid transport, BridgeDB tells you that there are currently no bridges available. Instead, it should tell you that the requested transport does not exist.
Would it make sense to add a line like bridge-stats-version to include a version number of some sort, just in case you want to change the format at a later time?
Yes, that's a good idea. I will do that.
Thanks, Philipp