Hi everyone,
We've been working on improving the usability of BridgeDB lately, and our CAPTCHAs have been a constant, thorny problem. They are not accessible to blind users [0], and we've gotten many complaints over the years that they are hard to use [1]; I'm sure Gus and the community team members can vent about the impact they've had on users.
We even have some evidence that bots have been able to get past our CAPTCHAs just fine [2]. There are other anti-enumeration defences on BridgeDB that are perhaps more useful, including:

- Partitioning bridges into buckets based on IP subnet: a user who requests bridges from a single IP address, or from multiple IP addresses in the same subnet, won't be able to see every bridge.
- Time-locking access to bridges: the set of bridges that BridgeDB distributes to a user during a single time period is locked to a small number. Repeated requests during that period will return the same 3 bridges to the user.

We haven't done much experimenting with tuning these parameters to slow bridge enumeration attempts.
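For illustration, the combination of subnet partitioning and time-locking could be sketched roughly as follows. This is a toy model, not BridgeDB's actual code: the bridge IDs, period length, answer size, and hashing scheme are all made up.

```python
import hashlib
import ipaddress

BRIDGES = [f"bridge-{i}" for i in range(40)]  # stand-in bridge IDs
PERIOD_SECONDS = 24 * 3600   # length of one time-lock period (made up)
BRIDGES_PER_ANSWER = 3       # the "same 3 bridges" per subnet per period

def bridges_for(client_ip: str, now: float) -> list:
    """Deterministically map (client /24, time period) to a small fixed
    set of bridges, so repeated requests reveal nothing new."""
    subnet = ipaddress.ip_network(client_ip + "/24", strict=False)
    period = int(now // PERIOD_SECONDS)
    seed = f"{subnet}|{period}".encode()
    # Rank all bridges by a keyed hash and answer with the top few.
    ranked = sorted(BRIDGES,
                    key=lambda b: hashlib.sha256(seed + b.encode()).digest())
    return ranked[:BRIDGES_PER_ANSWER]
```

Two clients in the same /24 asking twice in the same period see the same three bridges; only a new time period or a new subnet yields a different answer.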
For the most part, at the country level, censors that are willing to put the effort into blocking BridgeDB bridges seem to be pretty effective at it regardless of CAPTCHAs. The GFW blocked all of the new bridges from our 2019 bridge campaign in less than a month [3]. There was a recent case of censorship in Belarus where state censors blocked all of BridgeDB's email-distributed bridges, but weren't able to enumerate bridges distributed over Moat or HTTPS [4]. It's possible that the CAPTCHAs were the reason behind this, but it's hard to know for sure.
I would like to propose that we remove the CAPTCHAs from BridgeDB entirely, but I'd like to know whether there is research out there *specifically in the anti-censorship context* showing that these CAPTCHAs are actually doing useful work to prevent bridge enumeration. And even if the CAPTCHAs are preventing a small number of censors from enumerating more bridges, is that marginal benefit worth the usability cost?
Options for how to move forward:
Option 1: Just remove the CAPTCHAs already!
We're tired of waiting and just want our bridges.
Option 2: Do some science?
We could make a new distribution bucket in BridgeDB that distributes bridges through Moat without a CAPTCHA and have new versions of Tor Browser pull from this bucket. We can watch and perform measurements in places we know enumeration attempts have occurred in the past and see whether these bridges are enumerated more quickly and more completely than the old-school Moat bucket.
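As a hedged sketch of how that comparison might be scored (the record layout and bucket names here are hypothetical, not an existing measurement pipeline): probe each bridge from vantage points where enumeration has occurred, record when it first becomes unreachable, and then compare the buckets.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Optional

@dataclass
class BridgeRecord:
    bucket: str                            # e.g. "moat-captcha" or "moat-nocaptcha"
    first_seen: float                      # when the bridge entered distribution
    first_blocked: Optional[float] = None  # first failed probe, if ever

def summarize(records: List[BridgeRecord], bucket: str) -> dict:
    """Fraction of a bucket's bridges blocked, and median days-to-block."""
    in_bucket = [r for r in records if r.bucket == bucket]
    blocked = [r for r in in_bucket if r.first_blocked is not None]
    days = [(r.first_blocked - r.first_seen) / 86400 for r in blocked]
    return {
        "total": len(in_bucket),
        "blocked_fraction": len(blocked) / len(in_bucket) if in_bucket else 0.0,
        "median_days_to_block": median(days) if days else None,
    }
```

Comparing `summarize(records, "moat-nocaptcha")` against the CAPTCHA bucket over the same window would show whether the new bucket is enumerated faster or more completely.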
Option 3: Keep doing what we're doing but try to make the CAPTCHAs more usable.
This is the work we've had planned, but will only get us so far.
Endpoint enumeration is a tricky topic, and we do have some other alternatives in the pipeline. Conjure is more of a "blocking resistance through collateral damage" approach that's somewhat similar to domain fronting [5]. We've been looking at reputation-based bridge distribution [6] and hope to do more work on it in the future. I see these as more promising than CAPTCHAs in the long run, and CAPTCHA-less BridgeDB bridges still seem to fill a need that built-in bridges and private bridges don't.
I'd appreciate any thoughts, comments, or experiences others have!
Cecylia
[0] https://gitlab.torproject.org/tpo/anti-censorship/bridgedb/-/issues/10831
[1] https://gitlab.torproject.org/tpo/anti-censorship/bridgedb/-/issues/24607
[2] https://gitlab.torproject.org/tpo/anti-censorship/bridgedb/-/issues/32117
[3] https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/obfs4...
[4] https://gitlab.torproject.org/tpo/anti-censorship/censorship-analysis/-/blob...
[5] https://gitlab.torproject.org/tpo/anti-censorship/team/-/issues/9
[6] https://gitlab.torproject.org/tpo/anti-censorship/bridgedb/-/issues/31873
Cecylia Bocovich:
Option 1: Just remove the CAPTCHAs already!
We're tired of waiting and just want our bridges.
Option 2: Do some science?
We could make a new distribution bucket in BridgeDB that distributes
bridges through Moat without a CAPTCHA and have new versions of Tor Browser pull from this bucket. We can watch and perform measurements in places we know enumeration attempts have occurred in the past and see whether these bridges are enumerated more quickly and more completely than the old-school Moat bucket.
Hi Cecylia,
I understand that your Option 2 would remove all CAPTCHAs for all Tor Browser users.
I don't know much about bridge distribution, so my idea is most likely flawed. But what about combining Option 1 and Option 2 into a bigger experiment that would already remove the CAPTCHAs for a significant number of users:
Split the current CAPTCHA bridges 50/50 into 2 buckets:
- Bridges in the 1st bucket would be distributed without a CAPTCHA.
- Bridges in the 2nd bucket would be distributed with a CAPTCHA.
New versions of Tor Browser could pick from either of the 2 buckets, maybe based on a silly metric like whether the 3rd octet of the IP address is odd or even (so the choice is consistent within the same local network), or maybe something smarter.
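The parity idea could look something like this (purely illustrative; a real deployment would presumably want something less guessable):

```python
import ipaddress

def pick_bucket(client_ip: str) -> str:
    """Choose a bucket from the parity of the 3rd octet, so every
    machine on the same /24 lands in the same bucket."""
    third_octet = ipaddress.IPv4Address(client_ip).packed[2]
    return "no-captcha" if third_octet % 2 == 0 else "captcha"
```

For example, `pick_bucket("203.0.113.7")` and `pick_bucket("203.0.113.250")` both yield "captcha", since 113 is odd.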
You get the science while already sparing 50% of users the CAPTCHAs, and without risking all your CAPTCHA bridges in the gamble. It might be easier to measure how much CAPTCHAs really prevent enumeration by comparing both buckets over the same period of time. All Tor Browsers remain the same, and the current UI could display or skip the CAPTCHA when requesting a bridge without a lot of change.
Option 3: Keep doing what we're doing but try to make the CAPTCHAs more usable.
This is the work we've had planned, but will only get us so far.
I'd keep Option 3 for if the experiment proves that CAPTCHAs are really useful at preventing enumeration.
On Thu, Jul 29, 2021 at 04:46:37PM -0400, Cecylia Bocovich wrote:
Hi everyone,
We've been working on improving the usability of BridgeDB lately, and our CAPTCHAs have been a constant, thorny problem. They are not accessible to blind users [0], and we've gotten many complaints over the years that they are hard to use [1]; I'm sure Gus and the community team members can vent about the impact they've had on users.
Yeah :/
We even have some evidence that bots have been able to get past our CAPTCHAs just fine [2].
Many moons ago, BridgeDB proxied CAPTCHA challenges from ReCAPTCHA [7], instead of creating and serving its own. Eventually, Isis implemented the current custom (GIMP) CAPTCHA system themselves because ReCAPTCHA served impossible challenges [8].
There are other anti-enumeration defences on BridgeDB that are perhaps more useful including:
[snip]
I would like to propose that we remove the CAPTCHAs from BridgeDB entirely, but I'd like to know whether there is research out there *specifically in the anti-censorship context* showing that these CAPTCHAs are actually doing useful work to prevent bridge enumeration. And even if the CAPTCHAs are preventing a small number of censors from enumerating more bridges, is that marginal benefit worth the usability cost?
Options for how to move forward:
Option 1: Just remove the CAPTCHAs already!
We're tired of waiting and just want our bridges.
For mostly obvious reasons, this option worries me.
Option 2: Do some science?
We could make a new distribution bucket in BridgeDB that distributes
bridges through Moat without a CAPTCHA and have new versions of Tor Browser pull from this bucket. We can watch and perform measurements in places we know enumeration attempts have occurred in the past and see whether these bridges are enumerated more quickly and more completely than the old-school Moat bucket.
I like this idea.
Option 3: Keep doing what we're doing but try to make the CAPTCHAs more usable.
This is the work we've had planned, but will only get us so far.
I'd be interested in this one, too, combined with a bit of (2) and some science. We could conduct an(other) experiment where the current CAPTCHA system is the control and BridgeDB serves challenges from, e.g., hCAPTCHA 50% of the time, with that new experimental CAPTCHA system protecting a new, independent bridge bucket (like in (2)).
There are three measurements we could take:

1. Success/failure rates of challenges per connection (summarized by quartiles?)
2. How many new bridges are blocked from within the countries identified in (2) after some time period?
3. How quickly are new bridges blocked from within the countries identified in (2)?
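For the first of these, the per-connection quartile summary could be as simple as this sketch (the input layout is assumed, not an existing BridgeDB format):

```python
from statistics import quantiles

def success_rate_quartiles(per_connection):
    """Quartiles of per-connection CAPTCHA success rates.
    Input: list of (solved, attempted) pairs, one per connection."""
    rates = [solved / attempted
             for solved, attempted in per_connection if attempted > 0]
    return quantiles(rates, n=4)  # [Q1, median, Q3]
```

For example, `success_rate_quartiles([(1, 1), (1, 2), (0, 3), (2, 2)])` summarizes four connections with one call.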
At the end of the day, my primary concern is whether *people* have access to the resources they need. I appreciate the difficulty of this situation and of running a service like this (and I don't envy you :)).
Endpoint enumeration is a tricky topic, and we do have some other alternatives in the pipeline. Conjure is more of a "blocking resistance through collateral damage" approach that's somewhat similar to domain fronting [5]. We've been looking at reputation-based bridge distribution [6] and hope to do more work on it in the future. I see these as more promising than CAPTCHAs in the long run, and CAPTCHA-less BridgeDB bridges still seem to fill a need that built-in bridges and private bridges don't.
I agree, and this sounds good to me.
Thanks!
I'd appreciate any thoughts, comments, or experiences others have!
Cecylia
[0] https://gitlab.torproject.org/tpo/anti-censorship/bridgedb/-/issues/10831
[1] https://gitlab.torproject.org/tpo/anti-censorship/bridgedb/-/issues/24607
[2] https://gitlab.torproject.org/tpo/anti-censorship/bridgedb/-/issues/32117
[3] https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/obfs4...
[4] https://gitlab.torproject.org/tpo/anti-censorship/censorship-analysis/-/blob...
[5] https://gitlab.torproject.org/tpo/anti-censorship/team/-/issues/9
[6] https://gitlab.torproject.org/tpo/anti-censorship/bridgedb/-/issues/31873
[7] https://gitlab.torproject.org/tpo/anti-censorship/bridgedb/-/issues/5481
[8] https://gitlab.torproject.org/tpo/anti-censorship/bridgedb/-/issues/10809
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
On Thu, Jul 29, 2021 at 04:46:37PM -0400, Cecylia Bocovich wrote:
I would like to propose that we remove the CAPTCHAs from BridgeDB entirely, but I'd like to know whether there is research out there *specifically in the anti-censorship context* showing that these CAPTCHAs are actually doing useful work to prevent bridge enumeration. And even if the CAPTCHAs are preventing a small number of censors from enumerating more bridges, is that marginal benefit worth the usability cost?
Right. As another data point, the original bridge distribution design did not intend for the https bridge bucket to use captchas: https://svn-archive.torproject.org/svn/projects/design-paper/blocking.html#t... The original plan around captchas was to rely on Gmail's captcha, or whatever Gmail uses as an account creation rate limiter, for the email distribution bucket. That way *they* keep up with captcha research rather than forcing us to become (and stay) captcha experts.
Thought #1: While of course we don't necessarily need to stick to the vision from 15 years ago, I think there's a lot of merit to the let-a-thousand-flowers-bloom approach to distribution strategies, where we don't need to glue captchas onto every one of them. I support your goal of dropping captchas from the https distributor, on the theory that they are implicitly included (and done better!) for the email distributor.
Thought #2: Are there adversaries who would happily scrape the https distributor if it were trivial to do, and just the barrier of solving the captchas dissuades them? I'm thinking of the Belarus A1 censorship event for example: https://gitlab.torproject.org/tpo/anti-censorship/censorship-analysis/-/blob... where our analysis indicates that they scraped the gmail distributor but not the https distributor. Maybe they already had the gmail accounts in place from some other attack, so it was cheap to use them for scraping.
Thought #3: We added captchas for the https distributor, but then when we added the Moat distributor we put captchas on it too. And the Moat distributor doesn't have any *other* rate-limiting or defense (compare to the isolation-by-address-block for answers from the https distributor). So Moat seems extra vulnerable to cheap full enumeration.
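For comparison, address-block isolation for Moat could look roughly like this sketch: rate-limit answers per /16 so that an enumerator needs addresses in many different networks. All names and parameters here are illustrative, not an existing defence.

```python
import ipaddress
from collections import defaultdict

MAX_ANSWERS = 5         # answers per /16 per window (made-up quota)
WINDOW_SECONDS = 3600

_recent = defaultdict(list)  # /16 network -> timestamps of answered requests

def allow_request(client_ip: str, now: float) -> bool:
    """Answer a Moat request only while this client's /16 address
    block is under its quota for the current window."""
    block = ipaddress.ip_network(client_ip + "/16", strict=False)
    _recent[block] = [t for t in _recent[block] if now - t < WINDOW_SECONDS]
    if len(_recent[block]) >= MAX_ANSWERS:
        return False
    _recent[block].append(now)
    return True
```

A censor with addresses in only a handful of /16s would then hit the quota quickly, instead of enumerating at full speed from one network.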
Thought #4: We would be in a much better position to experiment here if we had a better measurement and feedback infrastructure in place. Like, if we removed the captchas today, how would we know what the impacts are in terms of higher risk of blocking?
So, I too am tempted to get rid of the captchas. But especially since we use them in the Moat distributor too, it is unclear how much losing them would impact usability and security, and it is unclear how we would learn the answer to that in practice.
My suggestion would be to focus on getting that measurement and feedback infrastructure in place first, before considering improving the captchas. We know we need it to know how things are going now, and we're going to need it to understand the impact of any changes we make.
--Roger