Hi everyone,
We've been working on improving the usability of BridgeDB lately, and our CAPTCHAs have been a constant thorny problem. They are not accessible for blind users [0]. We've gotten many complaints over the years that they are hard to use [1], and I'm sure Gus and the community team members can vent about the impact they've had on users.
We even have some evidence that bots have been able to enumerate our CAPTCHAs just fine [2]. There are other anti-enumeration defences on BridgeDB that are perhaps more useful, including:
- partitioning bridges into buckets based on IP subnet: A user who requests bridges from a single IP address or multiple IP addresses from the same subnet won't be able to see every bridge.
- time-locking access to bridges: The set of bridges that BridgeDB distributes to users during a single time period is locked to a small number. Repeated requests for bridges during that time period will return the same 3 bridges to the user.
We haven't done a lot of experimenting with tuning these parameters to slow bridge enumeration attempts.
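To make the two defences above concrete, here is a minimal sketch of how subnet partitioning and time-locking can combine. This is not BridgeDB's actual code: the secret key, the /16 and /32 prefix lengths, the three-hour period, and the names `subnet_of` and `select_bridges` are all assumptions chosen for illustration. Only the "same 3 bridges" figure comes from the description above.

```python
import hashlib
import hmac
import ipaddress

# All of these values are hypothetical, picked only for illustration.
SECRET = b"hypothetical-secret-key"
PERIOD_SECONDS = 3 * 60 * 60      # assumed length of one time period
BRIDGES_PER_REQUEST = 3           # "the same 3 bridges" from the text
PREFIX_V4 = 16                    # assumed IPv4 subnet size
PREFIX_V6 = 32                    # assumed IPv6 subnet size


def subnet_of(ip: str) -> str:
    """Map a client IP to the subnet it is grouped under."""
    addr = ipaddress.ip_address(ip)
    prefix = PREFIX_V4 if addr.version == 4 else PREFIX_V6
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))


def select_bridges(bridges, client_ip, now):
    """Pick bridges deterministically from (subnet, time period).

    Every request from the same subnet in the same period gets the
    same answer, so repeated requests reveal nothing new.
    """
    ordered = sorted(bridges)
    period = int(now // PERIOD_SECONDS)
    msg = f"{subnet_of(client_ip)}|{period}".encode()
    digest = hmac.new(SECRET, msg, hashlib.sha256).digest()
    start = int.from_bytes(digest[:8], "big") % len(ordered)
    count = min(BRIDGES_PER_REQUEST, len(ordered))
    return [ordered[(start + i) % len(ordered)] for i in range(count)]
```

In this sketch the tunable parameters are the subnet prefix length, the period length, and the number of bridges per request. An enumerator would need addresses in many distinct subnets, or many time periods, to see a large share of the pool.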
For the most part, on the country level, censors that are willing to put the effort into blocking BridgeDB bridges seem to be pretty effective at it regardless of CAPTCHAs. The GFW blocked all of the new bridges from our 2019 bridge campaign in less than a month [3]. There was a recent case of censorship in Belarus where state censors blocked all of BridgeDB's email-distributed bridges, but weren't able to enumerate bridges distributed over Moat or HTTPS [4]. It's possible that CAPTCHAs were the reason behind this, but it's hard to know for sure.
I would like to propose that we remove the CAPTCHAs from BridgeDB entirely, but first I'd like to know whether there is research out there *specifically that fits with the anti-censorship context* showing that these CAPTCHAs are actually doing useful work to prevent bridge enumeration. And even if the CAPTCHAs are preventing a small number of censors from enumerating more bridges, is the usability impact worth the marginal benefit we get from them?
Options for how to move forward:
Option 1: Just remove the CAPTCHAs already!
We're tired of waiting and just want our bridges.
Option 2: Do some science?
We could make a new distribution bucket in BridgeDB that distributes bridges through Moat without a CAPTCHA and have new versions of Tor Browser pull from this bucket. We can watch and perform measurements in places we know enumeration attempts have occurred in the past and see whether these bridges are enumerated more quickly and more completely than the old-school Moat bucket.
Option 3: Keep doing what we're doing but try to make the CAPTCHAs more usable.
This is the work we've had planned, but will only get us so far.
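For Option 2, the comparison between the CAPTCHA-less bucket and the old-school Moat bucket could be summarised along these lines. This is only a sketch under assumptions: it supposes we record, for each bridge, the first day it was observed blocked from a given vantage point (or `None` if never), and the function names `enumeration_curve` and `days_to_fraction` are made up for illustration.

```python
from typing import Dict, List, Optional


def enumeration_curve(first_blocked_day: Dict[str, Optional[int]],
                      horizon_days: int) -> List[float]:
    """Fraction of a bucket's bridges seen blocked by each day.

    first_blocked_day maps a bridge identifier to the day index on
    which it was first observed blocked, or None if never blocked.
    """
    total = len(first_blocked_day)
    if total == 0:
        return [0.0] * (horizon_days + 1)
    curve = []
    for day in range(horizon_days + 1):
        blocked = sum(1 for d in first_blocked_day.values()
                      if d is not None and d <= day)
        curve.append(blocked / total)
    return curve


def days_to_fraction(curve: List[float], target: float) -> Optional[int]:
    """First day on which at least `target` of the bucket was blocked."""
    for day, fraction in enumerate(curve):
        if fraction >= target:
            return day
    return None
```

Computing one curve per bucket would let us compare both how quickly (days to reach some fraction) and how completely (the final value of the curve) each bucket was enumerated.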
Endpoint enumeration is a tricky topic and we do have some other alternatives in the pipeline. Conjure is more of a "blocking resistance through collateral damage" approach that's somewhat similar to domain fronting [5]. We've been looking at and hope to do more work in the future on reputation-based bridge distribution [6]. I see these as more promising than CAPTCHAs in the long run, and CAPTCHA-less BridgeDB bridges seem to still fill a need that built-in bridges and private bridges don't fill.
I'd appreciate any thoughts, comments, or experiences others have!
Cecylia
[0] https://gitlab.torproject.org/tpo/anti-censorship/bridgedb/-/issues/10831
[1] https://gitlab.torproject.org/tpo/anti-censorship/bridgedb/-/issues/24607
[2] https://gitlab.torproject.org/tpo/anti-censorship/bridgedb/-/issues/32117
[3] https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/obfs4...
[4] https://gitlab.torproject.org/tpo/anti-censorship/censorship-analysis/-/blob...
[5] https://gitlab.torproject.org/tpo/anti-censorship/team/-/issues/9
[6] https://gitlab.torproject.org/tpo/anti-censorship/bridgedb/-/issues/31873