On Tue, Feb 20, 2018 at 05:51:44PM +0100, Karsten Loesing wrote:
> FWIW, we collected all feedback from this thread, discussed this change in the metrics team, and forwarded our planned change to the Tor Research Safety Board. I don't know how fast that will move, but I could imagine it's a matter of weeks, not days.
I just put in a review over on the safety board page, but I'm publishing it here too for completeness / efficiency:
Thought 1: I wouldn't worry that much about whether published contactinfo would help an adversary do blocking. There are many ways that bridge enumeration might happen, and this one seems pretty tame and limited.
But thought 1b: Is there a way to discover if we were wrong and it *is* helping an adversary? It would be nice to have some way to validate this decision not to worry, and some way to detect if it turns out we were wrong. I can't think of a good way, and the lack of a feedback mechanism makes the assumption more risky to act on.
Thought 2: Ordinarily, research groups would do the analysis privately on their data set, and publish only the results. That is, the safety board question would be "Can I collect this data? I'll throw it away afterwards and only publish my analysis." But this is a different situation: the goal is to provide a public data set so others can do their own analysis. It's a tradeoff: potential surprises to bridge operators vs. potential benefits to the community. This is really a community growth strategy decision. When phrased that way, you might be able to include some more concrete points in the "positive" category, such as the ability for more external researchers to get involved, and an increased chance that a community of bridge operators develops. And speaking of community-building, are there volunteers lined up who would contact bridge operators if given the chance, or is this more of a theoretical "maybe it would happen"?
Thought 3: I think sending mail to the current contactinfos, telling them that starting in a few weeks their contactinfo will go public, is a fine approach on the "notice / consent" spectrum -- especially since as you say they technically already got notice when they were editing the torrc file, so this follow-up attempt wouldn't be the first try.
Thought 4: In retrospect, it would be good to have some initial analysis of the (currently secret) data set. For example, how many bridges set contactinfo, and how many don't? How many of each of those are 'fast' (popular) bridges? What fraction of the contactinfos are actually a usable email address? How many bridge families are there now, i.e. bridges that use the same contact email address? Maybe most bridges don't set it currently, so this whole question doesn't matter much, or maybe many of them set it but obfuscate it, which will make your notification plan harder than you predicted.
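To make the questions in Thought 4 concrete, here is a rough sketch of the kind of counting I have in mind. Everything about the input is an assumption: the (fingerprint, contact, fast) tuples, the sample values, and the names BRIDGES, EMAIL_RE, extract_email and summarize are made up for illustration and are not the metrics team's actual data format or tooling. The email pattern is deliberately loose, so obfuscated addresses count as "not usable".

```python
import re
from collections import Counter

# Hypothetical records: (fingerprint, contact string or None, has Fast flag).
# These values are invented for illustration, not real bridge data.
BRIDGES = [
    ("A1", "Alice <alice@example.org>", True),
    ("B2", "alice@example.org", False),
    ("C3", "bob AT example DOT net", True),
    ("D4", None, True),
    ("E5", "", False),
    ("F6", "carol@example.com", False),
]

# Loose pattern: it only finds plain, unobfuscated addresses.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_email(contact):
    """Return the first plain email address in contact, lowercased, or None."""
    if not contact:
        return None
    match = EMAIL_RE.search(contact)
    return match.group(0).lower() if match else None


def summarize(bridges):
    """Count bridges by contactinfo, Fast flag, usable email, and shared email."""
    with_contact = [b for b in bridges if b[1] and b[1].strip()]
    without_contact = [b for b in bridges if not (b[1] and b[1].strip())]
    emails = [extract_email(b[1]) for b in with_contact]
    usable = [e for e in emails if e is not None]
    # A "family" here means two or more bridges sharing one email address.
    families = {e: n for e, n in Counter(usable).items() if n > 1}
    return {
        "total": len(bridges),
        "with_contact": len(with_contact),
        "without_contact": len(without_contact),
        "fast_with_contact": sum(1 for b in with_contact if b[2]),
        "fast_without_contact": sum(1 for b in without_contact if b[2]),
        "usable_email": len(usable),
        "families": families,
    }


if __name__ == "__main__":
    for key, value in summarize(BRIDGES).items():
        print(f"{key}: {value}")
```

The gap between with_contact and usable_email is roughly the number of operators the notification mail would not reach without manual de-obfuscation.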
--Roger