Hi everyone,
We've been working on improving the usability of BridgeDB lately, and our CAPTCHAs have been a constant, thorny problem. They are not accessible to blind users [0], and we've gotten many complaints over the years that they are hard to use [1]; I'm sure Gus and the community team members can vent about the impact they've had on users.
We even have some evidence that bots have been able to get past our CAPTCHAs just fine [2]. There are other anti-enumeration defences on BridgeDB that are perhaps more useful, including:

- Partitioning bridges into buckets based on IP subnet: a user who requests bridges from a single IP address, or from multiple IP addresses in the same subnet, won't be able to see every bridge.
- Time-locking access to bridges: the set of bridges that BridgeDB distributes to a user during a single time period is locked to a small number. Repeated requests during that period will return the same 3 bridges to the user.

We haven't done much experimenting with tuning these parameters to slow bridge enumeration attempts.
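For illustration, the combination of subnet partitioning and time-locking could be sketched roughly as follows. This is a toy model, not BridgeDB's actual code: the bridge IDs, period length, answer size, and hashing scheme are all made up.

```python
import hashlib
import ipaddress

BRIDGES = [f"bridge-{i}" for i in range(40)]  # stand-in bridge IDs
PERIOD_SECONDS = 24 * 3600   # length of one time-lock period (made up)
BRIDGES_PER_ANSWER = 3       # the "same 3 bridges" per subnet per period

def bridges_for(client_ip: str, now: float) -> list:
    """Deterministically map (client /24, time period) to a small fixed
    set of bridges, so repeated requests reveal nothing new."""
    subnet = ipaddress.ip_network(client_ip + "/24", strict=False)
    period = int(now // PERIOD_SECONDS)
    seed = f"{subnet}|{period}".encode()
    # Rank all bridges by a keyed hash and answer with the top few.
    ranked = sorted(BRIDGES,
                    key=lambda b: hashlib.sha256(seed + b.encode()).digest())
    return ranked[:BRIDGES_PER_ANSWER]
```

Two clients in the same /24 asking twice in the same period see the same three bridges; only a new time period or a new subnet yields a different answer.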
For the most part, at the country level, censors that are willing to put the effort into blocking BridgeDB bridges seem to be pretty effective at it regardless of CAPTCHAs. The GFW blocked all of the new bridges from our 2019 bridge campaign in less than a month [3]. There was a recent case of censorship in Belarus where state censors blocked all of BridgeDB's email-distributed bridges, but weren't able to enumerate bridges distributed over Moat or HTTPS [4]. It's possible that the CAPTCHAs were the reason behind this, but it's hard to know for sure.
I would like to propose that we remove the CAPTCHAs from BridgeDB entirely, but I'd like to know whether there is research out there *specifically in the anti-censorship context* showing that these CAPTCHAs are actually doing useful work to prevent bridge enumeration. And even if the CAPTCHAs are preventing a small number of censors from enumerating more bridges, is that marginal benefit worth the usability cost?
Options for how to move forward:
Option 1: Just remove the CAPTCHAs already!
We're tired of waiting and just want our bridges.
Option 2: Do some science?
We could make a new distribution bucket in BridgeDB that distributes bridges through Moat without a CAPTCHA and have new versions of Tor Browser pull from this bucket. We can watch and perform measurements in places we know enumeration attempts have occurred in the past and see whether these bridges are enumerated more quickly and more completely than the old-school Moat bucket.
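As a hedged sketch of how that comparison might be scored (the record layout and bucket names here are hypothetical, not an existing measurement pipeline): probe each bridge from vantage points where enumeration has occurred, record when it first becomes unreachable, and then compare the buckets.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Optional

@dataclass
class BridgeRecord:
    bucket: str                            # e.g. "moat-captcha" or "moat-nocaptcha"
    first_seen: float                      # when the bridge entered distribution
    first_blocked: Optional[float] = None  # first failed probe, if ever

def summarize(records: List[BridgeRecord], bucket: str) -> dict:
    """Fraction of a bucket's bridges blocked, and median days-to-block."""
    in_bucket = [r for r in records if r.bucket == bucket]
    blocked = [r for r in in_bucket if r.first_blocked is not None]
    days = [(r.first_blocked - r.first_seen) / 86400 for r in blocked]
    return {
        "total": len(in_bucket),
        "blocked_fraction": len(blocked) / len(in_bucket) if in_bucket else 0.0,
        "median_days_to_block": median(days) if days else None,
    }
```

Comparing `summarize(records, "moat-nocaptcha")` against the CAPTCHA bucket over the same window would show whether the new bucket is enumerated faster or more completely.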
Option 3: Keep doing what we're doing but try to make the CAPTCHAs more usable.
This is the work we've had planned, but will only get us so far.
Endpoint enumeration is a tricky topic, and we do have some other alternatives in the pipeline. Conjure is more of a "blocking resistance through collateral damage" approach that's somewhat similar to domain fronting [5]. We've been looking at reputation-based bridge distribution [6] and hope to do more work on it in the future. I see these as more promising than CAPTCHAs in the long run, and CAPTCHA-less BridgeDB bridges still seem to fill a need that built-in bridges and private bridges don't.
I'd appreciate any thoughts, comments, or experiences others have!
Cecylia
[0] https://gitlab.torproject.org/tpo/anti-censorship/bridgedb/-/issues/10831
[1] https://gitlab.torproject.org/tpo/anti-censorship/bridgedb/-/issues/24607
[2] https://gitlab.torproject.org/tpo/anti-censorship/bridgedb/-/issues/32117
[3] https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/obfs4...
[4] https://gitlab.torproject.org/tpo/anti-censorship/censorship-analysis/-/blob...
[5] https://gitlab.torproject.org/tpo/anti-censorship/team/-/issues/9
[6] https://gitlab.torproject.org/tpo/anti-censorship/bridgedb/-/issues/31873
Cecylia Bocovich:
Option 1: Just remove the CAPTCHAs already!
We're tired of waiting and just want our bridges.
Option 2: Do some science?
We could make a new distribution bucket in BridgeDB that distributes
bridges through Moat without a CAPTCHA and have new versions of Tor Browser pull from this bucket. We can watch and perform measurements in places we know enumeration attempts have occurred in the past and see whether these bridges are enumerated more quickly and more completely than the old-school Moat bucket.
Hi Cecylia,
I understand that your Option 2 would remove all CAPTCHAs for all Tor Browser users.
I don't know much about bridge distribution, so my idea is most likely flawed. But what about combining Option 1 and Option 2 into a bigger experiment that would already remove the CAPTCHAs for a significant number of users:
Split the current CAPTCHA bridges 50/50 into 2 buckets:
- Bridges in the 1st bucket would be distributed without a CAPTCHA.
- Bridges in the 2nd bucket would be distributed with a CAPTCHA.
New versions of Tor Browser could pick from either of the 2 buckets, maybe based on a silly metric like whether the 3rd octet of the IP address is odd or even (so the choice is consistent within the same local network), or maybe something smarter.
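The parity idea could look something like this (purely illustrative; a real deployment would presumably want something less guessable):

```python
import ipaddress

def pick_bucket(client_ip: str) -> str:
    """Choose a bucket from the parity of the 3rd octet, so every
    machine on the same /24 lands in the same bucket."""
    third_octet = ipaddress.IPv4Address(client_ip).packed[2]
    return "no-captcha" if third_octet % 2 == 0 else "captcha"
```

For example, `pick_bucket("203.0.113.7")` and `pick_bucket("203.0.113.250")` both yield "captcha", since 113 is odd.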
You get the science while already sparing 50% of users the CAPTCHAs, and without risking all your CAPTCHA bridges in the gamble. It might be easier to measure how much CAPTCHAs really prevent enumeration by comparing both buckets over the same period of time. All Tor Browsers remain the same, and the current UI could display or skip the CAPTCHA when requesting a bridge without a lot of change.
Option 3: Keep doing what we're doing but try to make the CAPTCHAs more usable.
This is the work we've had planned, but will only get us so far.
I'd keep Option 3 for if the experiment proves that CAPTCHAs are really useful at preventing enumeration.
On Thu, Jul 29, 2021 at 04:46:37PM -0400, Cecylia Bocovich wrote:
Hi everyone,
We've been working on improving the usability of BridgeDB lately, and our CAPTCHAs have been a constant, thorny problem. They are not accessible to blind users [0], and we've gotten many complaints over the years that they are hard to use [1]; I'm sure Gus and the community team members can vent about the impact they've had on users.
Yeah :/
We even have some evidence that bots have been able to get past our CAPTCHAs just fine [2].
Many moons ago, BridgeDB proxied CAPTCHA challenges from ReCAPTCHA [7], instead of creating and serving its own. Eventually, Isis implemented the current custom (GIMP) CAPTCHA system themselves because ReCAPTCHA served impossible challenges [8].
There are other anti-enumeration defences on BridgeDB that are perhaps more useful including:
[snip]
I would like to propose that we remove the CAPTCHAs from BridgeDB entirely, but I'd like to know whether there is research out there *specifically in the anti-censorship context* showing that these CAPTCHAs are actually doing useful work to prevent bridge enumeration. And even if the CAPTCHAs are preventing a small number of censors from enumerating more bridges, is that marginal benefit worth the usability cost?
Options for how to move forward:
Option 1: Just remove the CAPTCHAs already!
We're tired of waiting and just want our bridges.
For mostly obvious reasons, this option worries me.
Option 2: Do some science?
We could make a new distribution bucket in BridgeDB that distributes
bridges through Moat without a CAPTCHA and have new versions of Tor Browser pull from this bucket. We can watch and perform measurements in places we know enumeration attempts have occurred in the past and see whether these bridges are enumerated more quickly and more completely than the old-school Moat bucket.
I like this idea.
Option 3: Keep doing what we're doing but try to make the CAPTCHAs more usable.
This is the work we've had planned, but will only get us so far.
I'd be interested in this one, too, combined with a bit of (2) and some science. We could conduct an(other) experiment where the current CAPTCHA system is the control and BridgeDB serves challenges from, e.g., hCAPTCHA 50% of the time, with that new experimental CAPTCHA system protecting a new, independent bridge bucket (like in (2)).
There are three measurements we could take:

1. Success/failure rates of challenges per connection (summarized by quartiles?)
2. How many new bridges are blocked from within the countries identified in (2) after some time period?
3. How quickly are new bridges blocked from within the countries identified in (2)?
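For the first of these, the per-connection quartile summary could be as simple as this sketch (the input layout is assumed, not an existing BridgeDB format):

```python
from statistics import quantiles

def success_rate_quartiles(per_connection):
    """Quartiles of per-connection CAPTCHA success rates.
    Input: list of (solved, attempted) pairs, one per connection."""
    rates = [solved / attempted
             for solved, attempted in per_connection if attempted > 0]
    return quantiles(rates, n=4)  # [Q1, median, Q3]
```

For example, `success_rate_quartiles([(1, 1), (1, 2), (0, 3), (2, 2)])` summarizes four connections with one call.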
At the end of the day, my primary concern is whether *people* have access to the resources they need. I appreciate the difficulty of this situation and of running a service like this (and I don't envy you :)).
Endpoint enumeration is a tricky topic, and we do have some other alternatives in the pipeline. Conjure is more of a "blocking resistance through collateral damage" approach that's somewhat similar to domain fronting [5]. We've been looking at reputation-based bridge distribution [6] and hope to do more work on it in the future. I see these as more promising than CAPTCHAs in the long run, and CAPTCHA-less BridgeDB bridges still seem to fill a need that built-in bridges and private bridges don't.
I agree, and this sounds good to me.
Thanks!
I'd appreciate any thoughts, comments, or experiences others have!
Cecylia
[0] https://gitlab.torproject.org/tpo/anti-censorship/bridgedb/-/issues/10831
[1] https://gitlab.torproject.org/tpo/anti-censorship/bridgedb/-/issues/24607
[2] https://gitlab.torproject.org/tpo/anti-censorship/bridgedb/-/issues/32117
[3] https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/obfs4...
[4] https://gitlab.torproject.org/tpo/anti-censorship/censorship-analysis/-/blob...
[5] https://gitlab.torproject.org/tpo/anti-censorship/team/-/issues/9
[6] https://gitlab.torproject.org/tpo/anti-censorship/bridgedb/-/issues/31873
[7] https://gitlab.torproject.org/tpo/anti-censorship/bridgedb/-/issues/5481
[8] https://gitlab.torproject.org/tpo/anti-censorship/bridgedb/-/issues/10809
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
On Thu, Jul 29, 2021 at 04:46:37PM -0400, Cecylia Bocovich wrote:
I would like to propose that we remove the CAPTCHAs from BridgeDB entirely, but I'd like to know whether there is research out there *specifically in the anti-censorship context* showing that these CAPTCHAs are actually doing useful work to prevent bridge enumeration. And even if the CAPTCHAs are preventing a small number of censors from enumerating more bridges, is that marginal benefit worth the usability cost?
Right. As another data point, the original bridge distribution design did not intend for the https bridge bucket to use captchas: https://svn-archive.torproject.org/svn/projects/design-paper/blocking.html#t... The original plan around captchas was to rely on Gmail's captcha, or whatever Gmail uses as an account creation rate limiter, for the email distribution bucket. That way *they* keep up with captcha research rather than forcing us to become (and stay) captcha experts.
Thought #1: While of course we don't necessarily need to stick to the vision from 15 years ago, I think there's a lot of merit to the let-a-thousand-flowers-bloom approach to distribution strategies, where we don't need to glue captchas onto every one of them. I support your goal of dropping captchas from the https distributor, on the theory that they are implicitly included (and done better!) for the email distributor.
Thought #2: Are there adversaries who would happily scrape the https distributor if it were trivial to do, and just the barrier of solving the captchas dissuades them? I'm thinking of the Belarus A1 censorship event for example: https://gitlab.torproject.org/tpo/anti-censorship/censorship-analysis/-/blob... where our analysis indicates that they scraped the gmail distributor but not the https distributor. Maybe they already had the gmail accounts in place from some other attack, so it was cheap to use them for scraping.
Thought #3: We added captchas for the https distributor, but then when we added the Moat distributor we put captchas on it too. And the Moat distributor doesn't have any *other* rate-limiting or defense (compare to the isolation-by-address-block for answers from the https distributor). So Moat seems extra vulnerable to cheap full enumeration.
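For comparison, address-block isolation for Moat could look roughly like this sketch: rate-limit answers per /16 so that an enumerator needs addresses in many different networks. All names and parameters here are illustrative, not an existing defence.

```python
import ipaddress
from collections import defaultdict

MAX_ANSWERS = 5         # answers per /16 per window (made-up quota)
WINDOW_SECONDS = 3600

_recent = defaultdict(list)  # /16 network -> timestamps of answered requests

def allow_request(client_ip: str, now: float) -> bool:
    """Answer a Moat request only while this client's /16 address
    block is under its quota for the current window."""
    block = ipaddress.ip_network(client_ip + "/16", strict=False)
    _recent[block] = [t for t in _recent[block] if now - t < WINDOW_SECONDS]
    if len(_recent[block]) >= MAX_ANSWERS:
        return False
    _recent[block].append(now)
    return True
```

A censor with addresses in only a handful of /16s would then hit the quota quickly, instead of enumerating at full speed from one network.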
Thought #4: We would be in a much better position to experiment here if we had a better measurement and feedback infrastructure in place. Like, if we removed the captchas today, how would we know what the impacts are in terms of higher risk of blocking?
So, I too am tempted to get rid of the captchas. But especially since we use them in the Moat distributor too, it is unclear how much losing them would impact usability and security, and it is unclear how we would learn the answer to that in practice.
My suggestion would be to focus on getting that measurement and feedback infrastructure in place first, before considering improving the captchas. We know we need it to know how things are going now, and we're going to need it to understand the impact of any changes we make.
--Roger