While brainstorming for a recent funding proposal, I wrote up this list of "Things we should measure to track our impacts/success" in the context of the Salmon bridge distribution strategy.
Or to put it another way, these are questions I'll be asking phw et al to understand how things are going, once we deploy it. Some of the questions are variations on others, i.e. a single data source can answer several of them. So it's probably better framed as "questions to track" rather than "data sources we should collect".
(1a) how many obfs4 / httpt bridges are running in total?
Bridges report their existence to the bridge authority, and bridgedb/rdsys aggregate them and send them to the metrics datasets. So we should already have these numbers.
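For example, something like this (an untested sketch; the Onionoo parameters may need tweaking) could pull today's counts per transport from the Onionoo service:

    import requests
    from collections import Counter

    # Count currently-running bridges by pluggable transport, using the
    # public Onionoo service that exposes the Tor metrics data.
    resp = requests.get(
        "https://onionoo.torproject.org/details",
        params={"type": "bridge", "running": "true", "fields": "transports"},
        timeout=60,
    )
    resp.raise_for_status()

    counts = Counter()
    for bridge in resp.json().get("bridges", []):
        counts.update(bridge.get("transports", []))
    print(dict(counts))  # e.g. {"obfs4": ..., "httpt": ...}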
(1b) out of those, how many are we distributing via Salmon?
This is a parameter that we choose, and we will choose it based on how successful Salmon is compared to our existing distribution strategies. If we choose a larger fraction of our bridges to be used for Salmon, it's a good indication that we're finding Salmon to be an effective option.
(2a) how often are we reachability-testing each bridge?
We'll probably start off doing daily testing, but we should aim to get the frequency higher, and we might end up doing more targeted just-in-time testing, where whenever a user reports that a bridge is blocked, we launch a test for it right then so we can decide how to respond to the user.
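To sketch what that just-in-time flow might look like (all of these helper functions are hypothetical, not an existing API):

    # Hypothetical sketch: when a user reports a blocked bridge, test it
    # right away from their region before deciding how to respond.
    # run_reachability_test, mark_bridge_blocked, etc. don't exist yet.

    def handle_blocked_report(user, bridge):
        result = run_reachability_test(bridge, vantage=user.region)
        if result.reachable:
            # Bridge looks fine from their region: a possible false
            # positive (see 5c below), so log it rather than burning a
            # fresh bridge on the user right away.
            log_suspected_false_positive(user, bridge, result)
        else:
            mark_bridge_blocked(bridge, region=user.region)
            assign_replacement_bridge(user)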
(2b) from how many vantage points are we doing this reachability-testing?
We will want to start with n+1 vantage points: one for each of our n target regions, plus one "control" outside any censored area. That might be sufficient for a long time, or we might learn that we need to split up our testing across more vantage points, or test more in particular regions.
(2c) how many of our bridges are currently reachable from each of these vantage points?
Ideally the answer will be "all of them", but the reality is that some blocking will occur. So the higher the fraction here the better, and in some sense it is a measure of the health of our plans, and/or of the intensity of the censor's attention.
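As a sketch of how 2b and 2c fit together, assuming we keep the latest test result for each (bridge, vantage point) pair (the region list and results layout here are placeholders):

    # n target regions plus one control vantage point (placeholders).
    VANTAGES = ["cn", "ir", "ru", "control"]

    def reachable_fraction(results, vantage):
        # results maps (bridge_id, vantage) -> bool (reachable?)
        tested = [ok for (b, v), ok in results.items() if v == vantage]
        return sum(tested) / len(tested) if tested else None

    # reachable_fraction(results, "cn") near 1.0 means little blocking;
    # a drop relative to "control" points at censor attention.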
(2d) when a bridge stops working in China, what fraction of the time is it because the bridge went down, i.e. normal churn?
One of the tradeoffs with a community of volunteer bridges is that bridges naturally come and go over time. Understanding the dynamics of our bridge population is key because it impacts the rate at which users need fresh bridges even when there is no censorship.
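One simple heuristic for separating churn from blocking, reusing the per-vantage-point results from 2c above (again, the data layout is a placeholder):

    def classify_outage(bridge, region, results):
        # A bridge that is down everywhere, including from the control
        # vantage point, most likely just went offline (normal churn).
        # Down only inside the region is the signature of blocking.
        down_in_region = not results[(bridge, region)]
        down_at_control = not results[(bridge, "control")]
        if down_in_region and down_at_control:
            return "churn"
        if down_in_region:
            return "blocked"
        return "up"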
(3a) how many total users have registered with the Salmon system?
This long-term measure of how many people have tried to use our system lets us see how well our outreach is working.
(3b) how many of those users do we think are recently active?
Tracking how many people are still using it is a key indicator for both usability and blocking-resistance: "do people actually find it useful?"
(3c) what is the rate of new users registering with the system?
First of all, this helps us understand growth in interest, for example from our outreach efforts; but it also helps us understand how many fresh bridge addresses we need to support that growth. It is tied into the next item, which is the other side of the question:
(3d) how many bridges do we have in reserve (not yet filled with users)?
The trouble comes when this number reaches 0: it means we are not keeping up with the rate of new user registrations, and we need to activate more of our partners to get fresh bridges. So the target is to always have some bridges in reserve.
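As a back-of-the-envelope sketch of how 3c and 3d interact (all of these numbers are made up):

    # How long does the reserve pool last at the current registration
    # rate? Placeholder numbers throughout.
    new_users_per_day = 200     # 3c: registration rate
    users_per_bridge = 10       # how many users we assign per bridge
    bridges_in_reserve = 50     # 3d: bridges not yet filled

    bridges_consumed_per_day = new_users_per_day / users_per_bridge
    days_until_empty = bridges_in_reserve / bridges_consumed_per_day
    print(days_until_empty)     # 2.5 days: time to recruit more bridges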
(4a) how many high-reputation users are currently assigned at least one bridge that we think works?
This number summarizes Salmon's success for our high-value or established users. That is, if this number remains high, then Salmon is succeeding at providing availability to its core set of users, even if the other numbers aren't doing well.
(4b) how many low-reputation users are currently assigned at least one bridge that we think works?
This number measures the other side: how healthy is the Salmon system at adding new users, compared to the effort the censor is putting into adding fake users in order to find and block our bridges?
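A sketch of how 4a and 4b could both fall out of one query, assuming user records carry a reputation score and a list of assigned bridges (the threshold and record layout are placeholders):

    def fraction_served(users, working_bridges, min_rep=None, max_rep=None):
        # Fraction of users in a reputation band who hold at least one
        # bridge we currently believe is working.
        cohort = [u for u in users
                  if (min_rep is None or u.reputation >= min_rep)
                  and (max_rep is None or u.reputation < max_rep)]
        served = [u for u in cohort
                  if any(b in working_bridges for b in u.bridges)]
        return len(served) / len(cohort) if cohort else None

    # 4a: fraction_served(users, working, min_rep=HIGH_REP)
    # 4b: fraction_served(users, working, max_rep=HIGH_REP)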
(5a) at what rate are we filling bridges with new Salmon users?
This one is closely related to 3c above, but it approaches the question from the bridge availability side: the best rate here is the highest one we can sustain without 3d going to 0.
(5b) at what rate are we filling bridges with existing Salmon users?
By looking at the rate at which established users need new bridges, we can understand how much stability we have in the system.
(5c) at what rate are users reporting bridges as failed when we think those bridges are working and reachable for them?
This point aims to measure our false positives, which could stem from logic errors inside Tor and Tor Browser (e.g. reporting bridges down when actually our internet isn't on), or from non-uniform blocking within countries, or probably many other reasons. If this rate gets much above 0, it's a bug report that we need to track down and understand.
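To make the measurement concrete, one sketch: re-test each reported bridge from the reporter's region (as in the 2a sketch above) and count how often it comes back reachable:

    def false_positive_rate(reports):
        # reports: list of (user, bridge, reachable_from_their_region)
        # tuples collected over some window; layout is a placeholder.
        if not reports:
            return None
        false_pos = sum(1 for (_, _, reachable) in reports if reachable)
        return false_pos / len(reports)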
--Roger
anti-censorship-team@lists.torproject.org