On Thu, Jul 29, 2021 at 04:46:37PM -0400, Cecylia Bocovich wrote:
> I would like to propose that we remove the CAPTCHAs from BridgeDB entirely, but I'd like to know whether there is research out there *specifically that fits with the anti-censorship context* showing that these CAPTCHAs are actually doing useful work to prevent Bridge enumeration. But, even if the CAPTCHAs are preventing a small number of censors from enumerating more bridges, is the usability impact worth what marginal benefit we get from it?
Right. As another data point, the original bridge distribution design did not intend for the https bridge bucket to use captchas:
https://svn-archive.torproject.org/svn/projects/design-paper/blocking.html#t...
The original plan around captchas was to rely on Gmail's captcha, or whatever Gmail uses as an account creation rate limiter, for the email distribution bucket. That way *they* keep up with captcha research rather than forcing us to become (and stay) captcha experts.
Thought #1: While of course we don't necessarily need to stick to the vision from 15 years ago, I think there's a lot of merit to the let-a-thousand-flowers-bloom approach to distribution strategies, where we don't need to glue captchas onto every one of them. I support your goal of dropping captchas from the https distributor, on the theory that they are implicitly included (and done better!) for the email distributor.
Thought #2: Are there adversaries who would happily scrape the https distributor if it were trivial to do, and only the barrier of solving the captchas dissuades them? I'm thinking of the Belarus A1 censorship event, for example:
https://gitlab.torproject.org/tpo/anti-censorship/censorship-analysis/-/blob...
where our analysis indicates that they scraped the Gmail distributor but not the https distributor. Maybe they already had the Gmail accounts in place from some other attack, so it was cheap to use them for scraping.
Thought #3: We added captchas for the https distributor, but then when we added the Moat distributor we put captchas on it too. And the Moat distributor doesn't have any *other* rate-limiting or defense (compare to the isolation-by-address-block for answers from the https distributor). So Moat seems extra vulnerable to cheap full enumeration.
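To make the isolation-by-address-block idea concrete, here is a minimal sketch of how such a defense could work. It is not BridgeDB's actual code: the function names, the /16 and /32 prefix lengths, the secret key, and the keyed-hash ranking are all assumptions chosen for illustration. The point is only that every client in the same address block gets the same answer, so an adversary needs addresses in many blocks to enumerate many bridges.

```python
import hashlib
import hmac
import ipaddress


def address_block(ip: str, v4_prefix: int = 16, v6_prefix: int = 32) -> str:
    """Map a client address to its enclosing block (prefix lengths are assumed)."""
    addr = ipaddress.ip_address(ip)
    prefix = v4_prefix if addr.version == 4 else v6_prefix
    # strict=False masks off the host bits instead of raising an error.
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))


def bridges_for_client(ip: str, bridges: list[str], secret: bytes, count: int = 3) -> list[str]:
    """Return the bridges handed to this client (hypothetical helper).

    Bridges are ranked by a keyed hash of (address block, bridge), so the
    ranking depends only on the block, not on the individual address.
    """
    block = address_block(ip).encode()

    def score(bridge: str) -> bytes:
        return hmac.new(secret, block + b"|" + bridge.encode(), hashlib.sha256).digest()

    return sorted(bridges, key=score)[:count]


if __name__ == "__main__":
    pool = [f"bridge-{i}" for i in range(20)]  # placeholder bridge names
    key = b"example-secret"                    # placeholder key
    # Two clients in the same /16 receive identical answers.
    print(bridges_for_client("192.0.2.7", pool, key))
    print(bridges_for_client("192.0.200.9", pool, key))
```

With something like this in place, repeated requests from one block reveal at most `count` bridges. Moat, as described above, has no comparable limit behind its captcha.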
Thought #4: We would be in a much better position to experiment here if we had a better measurement and feedback infrastructure in place. Like, if we removed the captchas today, how would we know what the impacts are in terms of higher risk of blocking?
So, I too am tempted to get rid of the captchas, but especially since we use them in the Moat distributor too, it is unclear how much losing them would impact usability and security, and it is unclear how we would learn the answer to that in practice.
My suggestion would be to focus on getting that measurement and feedback infrastructure in place first, before considering improving the captchas. We know we need it to know how things are going now, and we're going to need it to understand the impact of any changes we make.
--Roger