On Tue, Jul 30, 2019 at 05:42:11PM +0200, Karsten Loesing wrote:
> You say that you're planning to add aggregate statistics like numbers by distributor without drilling down to transports or countries. Keep in mind that this is going to reduce the noise that you added when rounding up to multiples of 10. For example, knowing that the total by country is closer to $entries_in_that_country * 1 or $entries_in_that_country * 10 will tell you something about the average noise added per entry. It would be more privacy-preserving (and also less accurate) to keep all the noise in the statistics and do the aggregation in a separate step.
That's a great point. I was originally concerned about the decrease in accuracy but, after running the numbers, it seems tolerable. Let's have a look at the lower and upper bound of the total number of HTTPS requests. Summing up all bins (and ignoring bot requests) gives us the upper bound:
  $ grep https bridgedb-metrics.log | grep -v zz | cut -d ' ' -f 3 | paste -sd+ | bc
  3850
To determine the lower bound, we first calculate the number of bins:
  $ grep https bridgedb-metrics.log | grep -c -v zz
  235
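Each bin is rounded up to the next multiple of 10, so it overstates the true count by at most 9. We therefore multiply the number of bins by 9 (235 * 9 = 2,115) and subtract the product from the upper bound, which gives us a lower bound of 3,850 - 2,115 = 1,735.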
Applying this method to all three distribution mechanisms results in the following table:
         Lower bound   Upper bound
         -----------   -----------
  Moat         4,576         4,630
  HTTPS        1,735         3,850
  Email          303           420
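As a rough sketch, the two pipelines above could be folded into a single pass that prints both bounds for every distribution mechanism. This assumes that each metrics line looks like "bridgedb-metric-count <distributor>.<rest of key> <count>" and that the distributor is the first dot-separated component of the key; the function name bounds is made up for illustration.

```shell
# Print "<distributor> <lower bound> <upper bound>" for each distributor.
# Assumption: lines look like
#   bridgedb-metric-count <distributor>.<rest of key> <count>
bounds() {
    awk '
        # Skip bot requests, mirroring "grep -v zz" in the pipelines above.
        /zz/ { next }
        $1 == "bridgedb-metric-count" {
            split($2, key, ".")     # key[1] is assumed to be the distributor
            upper[key[1]] += $3     # sum of the rounded-up bin values
            bins[key[1]]++          # each bin overstates by at most 9
        }
        END {
            for (d in upper)
                printf "%s %d %d\n", d, upper[d] - 9 * bins[d], upper[d]
        }
    ' "$@" | sort
}
```

It would be called as "bounds bridgedb-metrics.log". Note that the sketch groups by the first key component rather than matching "https" anywhere in the line, so the two approaches could differ if that string ever showed up elsewhere in a key.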
Despite the inaccuracy caused by the binning, we can be certain that moat is more popular than HTTPS (moat's lower bound > HTTPS's upper bound). Email is far less popular than both: by at least a factor of 10 compared to moat (4,576 / 420) and by at least a factor of 4 compared to HTTPS (1,735 / 420). HTTPS has the widest range because it has the largest number of bins.
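The comparison boils down to checking that two ranges don't overlap. A minimal sketch, using the numbers from the table above; the function name is hypothetical:

```shell
# Succeeds only if A is certainly more popular than B, that is, if
# the lower bound of A exceeds the upper bound of B.
certainly_more_popular() {
    lower_a=$1
    upper_b=$2
    [ "$lower_a" -gt "$upper_b" ]
}

# Moat (lower bound 4576) versus HTTPS (upper bound 3850).
if certainly_more_popular 4576 3850; then
    echo "moat > https"
fi

# HTTPS (lower bound 1735) versus email (upper bound 420).
if certainly_more_popular 1735 420; then
    echo "https > email"
fi
```

If the ranges overlapped, the function would fail and we could not tell which mechanism is more popular from the binned numbers alone.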
> What is obs4 in
>
>   bridgedb-metric-count email.obs4.gmail.fail.none 10
>
> (as opposed to obfs4)?
That's a typo that a user made when requesting the transport. I had not yet changed the code to only consider transports that are supported by BridgeDB. All unsupported transport types should result in a log message and not affect the metrics.
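To illustrate the intended behaviour (this is a shell sketch, not BridgeDB's actual code), a check along these lines would log unsupported transports and keep them out of the metrics. The list of supported transports and the function name are assumptions made up for the example.

```shell
# Hypothetical list; the transports BridgeDB actually supports may differ.
SUPPORTED_TRANSPORTS="obfs3 obfs4 scramblesuit fte"

# Prints the transport and succeeds if it is in the list. Otherwise,
# logs a message to stderr and fails, so that the caller can skip the
# metrics update.
check_transport() {
    for t in $SUPPORTED_TRANSPORTS; do
        if [ "$1" = "$t" ]; then
            printf '%s\n' "$1"
            return 0
        fi
    done
    printf 'Ignoring unsupported transport: %s\n' "$1" >&2
    return 1
}
```

With this, "check_transport obfs4" succeeds while "check_transport obs4" only produces a log message.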
Interestingly, there's another metrics line that shows that there were 1-10 successful requests for the invalid obs4 transport. When requesting an invalid transport, BridgeDB tells you that there are currently no bridges available. Instead, it should tell you that the requested transport does not exist.
> Would it make sense to add a line like bridge-stats-version to include a version number of some sort, just in case you want to change the format at a later time?
Yes, that's a good idea. I will do that.
Thanks,
Philipp