While brainstorming for a recent funding proposal, I wrote up this list of
"Things we should measure to track our impacts/success" in the context
of the Salmon bridge distribution strategy.
Or to put it another way, these are questions I'll be asking phw et
al to understand how things are going, once we deploy it. Some of the
questions are variations on others, i.e. a single data source can answer
several of them. So it's probably better framed as "questions to track"
rather than "data sources we should collect".
(1a) how many obfs4 / httpt bridges are running in total?
Bridges report their existence to the bridge authority, and bridgedb/rdsys
aggregate them and send them to the metrics datasets. So we should
already have these numbers.
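As a concrete starting point, here is a rough Python sketch of pulling
that count from the public Onionoo service; the parameters and fields
are as I understand Onionoo's current interface, so treat it as a
sketch rather than a finished tool:

    import requests

    # Count currently-running bridges by pluggable transport, via Onionoo.
    resp = requests.get(
        "https://onionoo.torproject.org/details",
        params={"type": "bridge", "running": "true", "fields": "transports"},
        timeout=60,
    )
    counts = {}
    for bridge in resp.json().get("bridges", []):
        for transport in bridge.get("transports", []):
            counts[transport] = counts.get(transport, 0) + 1
    print(counts)  # e.g. {'obfs4': ..., 'httpt': ...}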
(1b) out of those, how many are we distributing via salmon?
This is a parameter that we choose, and we will choose it based on how
successful Salmon is compared to our existing distribution strategies. If
we choose a larger fraction of our bridges to be used for Salmon, it's
a good indication that we're finding Salmon to be an effective option.
(2a) how often are we reachability-testing each bridge?
We'll probably start off doing daily testing, but we should aim to get
the frequency higher. We might also end up doing more targeted
just-in-time testing: whenever a user reports that a bridge is blocked,
we launch a test for it right then, so we can decide how to respond to
the user.
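Here is a minimal sketch of what that combined loop could look like.
The test_bridge and record functions are hypothetical placeholders for
the real prober and dataset, not anything that exists yet:

    import queue
    import time

    on_demand = queue.Queue()

    def report_blocked(bridge, region):
        # Called when a user reports this bridge as blocked in `region`.
        on_demand.put((bridge, region))

    def test_bridge(bridge, region="control"):
        # Placeholder: really we'd attempt an obfs4/httpt connection
        # from a vantage point in `region`.
        return True

    def record(bridge, ok):
        # Placeholder: feed the result into our reachability dataset.
        print(bridge, "reachable" if ok else "unreachable")

    def testing_loop(all_bridges, sweep_interval=24 * 3600):
        next_sweep = time.time()
        while True:
            # Scheduled sweep over the whole bridge pool.
            if time.time() >= next_sweep:
                for bridge in all_bridges:
                    record(bridge, test_bridge(bridge))
                next_sweep += sweep_interval
            # Between sweeps, service just-in-time requests on demand.
            try:
                bridge, region = on_demand.get(timeout=60)
                record(bridge, test_bridge(bridge, region=region))
            except queue.Empty:
                pass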
(2b) from how many vantage points are we doing this reachability testing?
We will want to start with n+1 vantage points, one for each target region
and one "control" outside the censored area. That might be sufficient
for a long time, or we might learn that we need to split up our testing
across more vantage points, or test more in particular regions.
(2c) how many of our bridges are currently reachable from each of these
vantage points?
Ideally the answer will be "all of them", but the reality is that some
blocking will occur. So the higher the fraction here the better: in
some sense it is a measure of the health of our plans, and/or of the
intensity of the censor's attention.
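Computing that health fraction per vantage point is simple once we have
the raw test results; here is a sketch, where the nested-dict layout of
the results is just an assumption for illustration:

    # results maps vantage point -> {bridge: reachable?}
    def reachable_fraction(results):
        return {
            vantage: sum(ok.values()) / len(ok)
            for vantage, ok in results.items()
            if ok
        }

    example = {
        "control": {"bridge1": True, "bridge2": True},
        "cn": {"bridge1": True, "bridge2": False},
    }
    print(reachable_fraction(example))  # {'control': 1.0, 'cn': 0.5}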
(2d) When a bridge stops working in China, what fraction of the time is
it because the bridge went down, i.e. normal churn?
One of the tradeoffs with a community of volunteer bridges is that
bridges naturally come and go over time. Understanding the dynamics of
our bridge population is key because it impacts the rate at which users
need fresh bridges even when there is no censorship.
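The control vantage point from 2b is what lets us tell these cases
apart. The decision rule below is my assumption about how we'd
classify, not a settled design:

    def classify_outage(up_from_control, up_from_region):
        if up_from_region:
            return "working"
        if up_from_control:
            return "blocked"  # alive elsewhere, unreachable in-region
        return "churn"        # down everywhere: the bridge went away

The 2d fraction is then the number of "churn" outages divided by all
outages observed in that region.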
(3a) how many total users have registered with the Salmon system?
This long-term measure of how many people have tried to use our system
lets us see how well our outreach is working.
(3b) how many of those users do we think are recently active?
Tracking how many people are still using it is a key indicator for both
usability and blocking-resistance: "do people actually find it useful?"
(3c) what is the rate of new users registering with the system?
First of all this helps us understand growth in interest, for example
from our outreach efforts, but it also helps us understand how many
fresh bridge addresses we need to support this growth. It is tied into
the next item, which is the other side of the question:
(3d) how many bridges do we have in reserve (not yet filled with users)?
The trouble comes when this number reaches 0. So the target is that we
always have some bridges in reserve, which means we are keeping up with
the rate of new user registrations. If this number hits zero, it means
we need to activate more of our partners to get fresh bridges.
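To tie 3c and 3d together, here is a back-of-the-envelope check; the
users-per-bridge capacity and headroom window are made-up numbers for
illustration:

    USERS_PER_BRIDGE = 8  # assumed capacity we assign per bridge

    def bridges_needed(new_users_per_day, reserve, days_of_headroom=14):
        demand = (new_users_per_day * days_of_headroom) / USERS_PER_BRIDGE
        shortfall = max(0, demand - reserve)
        if shortfall:
            print(f"ask partners for ~{shortfall:.0f} fresh bridges")
        return shortfall

    # 40 users/day * 14 days / 8 users per bridge = 70 bridges needed;
    # with 30 in reserve, we're short 40.
    bridges_needed(new_users_per_day=40, reserve=30)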
(4a) how many high-reputation users are currently assigned at least one
bridge that we think works?
This number summarizes Salmon's success for our high-value or established
users. That is, if this number remains high, then Salmon is succeeding
at providing availability to its core set of users, even if the other
numbers aren't doing well.
(4b) how many low-reputation users are currently assigned at least one
bridge that we think works?
This number measures the other side: how healthy is the Salmon system
at adding new users, compared to the effort the censor is putting into
registering fake users in order to find and block our bridges?
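A sketch covering both 4a and 4b: split the users by reputation band
and ask what fraction hold at least one bridge we believe works. The
record format and the threshold of 3 are arbitrary assumptions here;
Salmon's actual trust levels are more fine-grained:

    def coverage_by_reputation(users, working_bridges):
        tallies = {"high": [0, 0], "low": [0, 0]}  # [covered, total]
        for user in users:
            band = "high" if user["reputation"] >= 3 else "low"
            covered = any(b in working_bridges for b in user["bridges"])
            tallies[band][0] += covered
            tallies[band][1] += 1
        return {band: c / t for band, (c, t) in tallies.items() if t}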
(5a) at what rate are we filling bridges with new Salmon users?
This one is closely related to 3c above, but from the bridge
availability side: the best rate here is the highest possible rate such
that 3d doesn't go to 0.
(5b) at what rate are we filling bridges with existing Salmon users?
By looking at the rate at which established users need new bridges, we
can understand how much stability we have in the system.
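Both 5a and 5b fall out of the same assignment log, if we tag each
assignment by whether it is the user's first; the log format here is an
assumption:

    from collections import Counter

    def fill_rates(assignments, period_days=7):
        # assignments: list of (user_id, is_first_assignment) pairs
        counts = Counter("new" if first else "existing"
                         for _, first in assignments)
        return {kind: n / period_days for kind, n in counts.items()}

    log = [("u1", True), ("u2", True), ("u1", False), ("u3", False)]
    print(fill_rates(log))  # ~0.29 new and ~0.29 existing per day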
(5c) at what rate are users reporting failed bridges but we think the
bridges are working and reachable for them?
This point aims to measure our false positives, which could stem from
logic errors inside Tor and Tor Browser (e.g. reporting bridges down
when actually the user's own network connection is down), or from
non-uniform blocking within countries, or probably many other reasons.
If this rate gets much above 0, it's effectively a bug report that we
need to track down and understand.
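One way to quantify it, as a sketch (the report format is an
assumption): of all "bridge failed" reports, count the ones where our
own tests say the bridge is reachable from that user's region:

    def false_positive_rate(reports, reachable):
        # reports: list of (bridge, region) complaints;
        # reachable: set of (bridge, region) pairs our tests confirm.
        if not reports:
            return 0.0
        return sum(1 for r in reports if r in reachable) / len(reports)

    reports = [("bridge1", "cn"), ("bridge2", "cn")]
    reachable = {("bridge1", "cn")}
    print(false_positive_rate(reports, reachable))  # 0.5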
--Roger