Over at https://bugs.torproject.org/9316, we are working on having BridgeDB export metrics. The patch is almost done and I deployed the work-in-progress code on BridgeDB, so we can take a look at the metrics and think of ways to improve them. The metrics format encodes the approximate number of requests per distribution mechanism per transport per country per success/fail. All numbers are rounded up to the next multiple of 10. The last field, "none", will be used for an anomaly score and is currently unused.
For example, the line
bridgedb-metric-count email.obfs4.riseup.success.none 10
tells us that there have been 1-10 successful email requests for obfs4 coming from Riseup addresses.
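To make the format concrete, here is a minimal Python sketch of how such a line could be parsed. The field labels are my own, not identifiers from the BridgeDB code:

```python
# Sketch of parsing a bridgedb-metric-count line. Field labels below are
# illustrative, not identifiers from the BridgeDB source.
def parse_metric(line):
    prefix, key, count = line.rsplit(" ", 2)
    assert prefix == "bridgedb-metric-count"
    mechanism, transport, origin, outcome, anomaly = key.split(".")
    return {
        "mechanism": mechanism,  # email, https, or moat
        "transport": transport,  # e.g. obfs4 or vanilla
        "origin": origin,        # country code, or email provider for email
        "outcome": outcome,      # success or fail
        "anomaly": anomaly,      # reserved for an anomaly score; currently "none"
        "count": int(count),     # rounded up to the next multiple of 10
    }

m = parse_metric("bridgedb-metric-count email.obfs4.riseup.success.none 10")
# Because counts are rounded up, a published value of 10 means 1-10 requests.
lower, upper = m["count"] - 9, m["count"]
```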
I attached 24 hours' worth of metrics to this email. Keep the following issues in mind:
* My feature branch hasn't been reviewed yet and likely still has bugs, so take all numbers with a grain of salt.
* The country codes are based on Debian stretch's geoip-database, which is slightly outdated and uses Maxmind's far-from-perfect GeoLite database.
* The country code "??" refers to geo-location failure or lack of IP addresses (in the case of moat). The country code "zz" refers to a request from a Tor exit relay.
Some observations:
* Gmail sees much more use than Riseup. That's no surprise.
* The email distributor sees more vanilla than obfs4 requests. I wonder to what degree this is caused by the poor UX of the email distributor.
* For HTTPS, many countries have a fail and success bucket of 10 each. This is likely a single user who failed the captcha at least once before finally getting it right.
* The captcha success rate for obfs4 over moat is 54%. That's very low and must cause lots of frustration for users. This is a known issue that's tracked in this ticket: https://bugs.torproject.org/29695
* Ignore the large number of HTTPS requests from "zz" -- I expect the vast majority of these to come from a bot that's interacting with BridgeDB over exit relays.
After a cursory look at the numbers, I would like to aggregate the data, to make it easier to compare distributors, transports, and countries. For example: how do moat, email, and HTTPS rank in popularity? I'll improve the patch to keep track of these numbers in separate metrics.
Any thoughts or suggestions?
Cheers, Philipp
That's awesome, and will shine a lot of light on user demand patterns and how well things are actually working through various channels. Could some metrics be added to summarize how the bridges and queries are distributed across the hashrings? As in, at the end of the day, roughly how many bridges are in each hashring, and how many requests were served from each hashring? I've seen behavior in the past that made me wonder if the internal HMAC/modulo partitioning method is actually uniformly distributed or not, like perhaps there are some hashrings with most of the bridges and others with too few (within a given distribution method), or maybe there are too many requests being pulled from certain hashrings, leaving others under-utilized. This might not need to be a permanent stat dump, but seeing it for at least a few days would help a lot to confirm that the db's guts are working as intended.
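Rick's question about whether HMAC-then-modulo partitioning is uniform can be probed with a toy simulation. This mimics the general technique, not BridgeDB's actual partitioning code; the key, ring count, and ID format are made up for illustration:

```python
import hmac
import hashlib
import os
from collections import Counter

# Toy check: does HMAC-then-modulo spread items uniformly across hashrings?
key = os.urandom(32)   # illustrative key; BridgeDB derives its own
NUM_RINGS = 4          # illustrative ring count

def ring_for(bridge_id: bytes) -> int:
    # Hash the ID with HMAC-SHA256, then reduce modulo the ring count.
    digest = hmac.new(key, bridge_id, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % NUM_RINGS

# Assign 100,000 random 20-byte IDs and count how many land in each ring.
counts = Counter(ring_for(os.urandom(20)) for _ in range(100_000))
# With a sound HMAC, each ring should hold roughly 25,000 items.
```

A heavily skewed distribution in a check like this would point at the partitioning step; roughly equal counts would suggest the imbalance Rick saw comes from somewhere else.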
On Mon, Jul 29, 2019 at 09:22:52PM -0700, Rick Huebner wrote:
Could some metrics be added to summarize how the bridges and queries are distributed across the hashrings?
Thanks for this suggestion. I agree that it would be helpful and I'll look into incorporating it into the metrics.
Cheers, Philipp
On 2019-07-30 00:01, Philipp Winter wrote:
[...]
After a cursory look at the numbers, I would like to aggregate the data, to make it easier to compare distributors, transports, and countries. For example: how do moat, email, and HTTPS rank in popularity? I'll improve the patch to keep track of these numbers in separate metrics.
Any thoughts or suggestions?
Looks like a great start!
I have two questions and one suggestion, based on a quick read:
You say that you're planning to add aggregate statistics like numbers by distributor without drilling down to transports or countries. Keep in mind that this is going to reduce the noise that you added when rounding up to multiples of 10. For example, knowing that the total by country is closer to $entries_in_that_country * 1 or $entries_in_that_country * 10 will tell you something about the average noise added per entry. It would be more privacy-preserving (and also less accurate) to keep all the noise in the statistics and do the aggregation in a separate step.
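Karsten's point can be illustrated with a quick simulation. The counts below are made up; only the round-up-to-10 rule matches BridgeDB's:

```python
import random

random.seed(1)  # reproducible made-up data

def round_up(n, k=10):
    # Round n up to the next multiple of k.
    return -(-n // k) * k

# Made-up true request counts for 235 (country, transport, outcome) entries.
true_counts = [random.randint(1, 30) for _ in range(235)]

# Aggregating the already-rounded entries keeps all the per-entry noise:
# each entry contributes up to 9 spurious requests.
noisy_total = sum(round_up(c) for c in true_counts)

# A separately published, once-rounded total is much closer to the truth...
accurate_total = round_up(sum(true_counts))

# ...which is exactly why publishing both lets an observer estimate the
# average noise added per entry by comparing them.
```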
What is obs4 in bridgedb-metric-count email.obs4.gmail.fail.none 10 (as opposed to obfs4)?
Would it make sense to add a line like bridge-stats-version to include a version number of some sort, just in case you want to change the format at a later time?
Cheers, Philipp
All the best, Karsten
On Tue, Jul 30, 2019 at 05:42:11PM +0200, Karsten Loesing wrote:
You say that you're planning to add aggregate statistics like numbers by distributor without drilling down to transports or countries. Keep in mind that this is going to reduce the noise that you added when rounding up to multiples of 10. For example, knowing that the total by country is closer to $entries_in_that_country * 1 or $entries_in_that_country * 10 will tell you something about the average noise added per entry. It would be more privacy-preserving (and also less accurate) to keep all the noise in the statistics and do the aggregation in a separate step.
That's a great point. I was originally concerned about the decrease in accuracy but, after running the numbers, it seems tolerable. Let's have a look at the lower and upper bound of the total number of HTTPS requests. Summing up all bins (and ignoring bot requests) gives us the upper bound:
grep https bridgedb-metrics.log | grep -v zz | cut -d ' ' -f 3 | paste -sd+ | bc
3850
To determine the lower bound, we first calculate the number of bins:
grep https bridgedb-metrics.log | grep -c -v zz
235
Then, we multiply the number of bins by 9 and subtract the result from the upper bound, which gives us a lower bound of 1,735.
Applying this method to all three distribution mechanisms results in the following table:
        Lower bound   Upper bound
        -----------   -----------
Moat          4,576         4,630
HTTPS         1,735         3,850
Email           303           420
Despite the inaccuracy caused by the binning, we can be certain that moat is more popular than HTTPS (moat's lower bound > HTTPS's upper bound) and email is an order of magnitude less popular than both HTTPS and moat. HTTPS is the most inaccurate because of the large number of bins.
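The bound arithmetic can be condensed into a tiny helper. This is a sketch using the HTTPS figures from the shell commands above:

```python
def bounds(upper, bins):
    # Each published bin value overstates the true count by at most 9
    # (a value of 10 means 1-10 actual requests), so with `bins` bins
    # the true total lies in [upper - 9 * bins, upper].
    return upper - 9 * bins, upper

# HTTPS (excluding "zz"): 235 bins summing to 3,850.
lower, upper = bounds(3850, 235)  # -> (1735, 3850)
```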
What is obs4 in bridgedb-metric-count email.obs4.gmail.fail.none 10 (as opposed to obfs4)?
That's a typo that a user made when requesting the transport. I had not yet changed the code to only consider transports that are supported by BridgeDB. All unsupported transport types should result in a log message and not affect the metrics.
Interestingly, there's another metrics line that shows that there were 1-10 successful requests for the invalid obs4 transport. When requesting an invalid transport, BridgeDB tells you that there are currently no bridges available. Instead, it should tell you that the requested transport does not exist.
Would it make sense to add a line like bridge-stats-version to include a version number of some sort, just in case you want to change the format at a later time?
Yes, that's a good idea. I will do that.
Thanks, Philipp