isis:
Mike Perry transcribed 5.1K bytes:
[…]
- Perhaps cleaner: if BridgeDB itself were accessible through a domain
front, we could export its captcha and bridge distribution through an API on this domain front. Once your IP forwarding in https://trac.torproject.org/projects/tor/ticket/13171 is solved, BridgeDB could even still make use of its IP-based hashring logic.
Maybe don't set the HTTP header name for the forwarded client IP to "X-Forwarded-For". Otherwise, it will probably get overridden by the Apache server which acts as a reverse proxy in front of BridgeDB's Twisted servers. Just set it to something else, e.g. "X-Domain-Fronted-For".
Then, on the BridgeDB side, it's easy: I'd need to add logic to BridgeDB to handle preferring "X-Domain-Fronted-For", "X-Forwarded-For", then request IP, in that order.
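To make that lookup order concrete, here is a minimal sketch. It is not BridgeDB's actual code: the function names are hypothetical, and taking the last valid entry of a comma-separated header value is an assumption about how the proxies in front of BridgeDB append addresses.

```python
import ipaddress

# Header names in order of preference. "X-Domain-Fronted-For" is the name
# suggested above; nothing here is confirmed to exist in BridgeDB.
PREFERRED_HEADERS = ("X-Domain-Fronted-For", "X-Forwarded-For")


def _last_valid_ip(value):
    """Return the last parseable IP in a comma-separated header value.

    Assumption: the nearest trusted proxy appends its view of the client
    address to the end of the list.
    """
    for candidate in reversed([part.strip() for part in value.split(",")]):
        try:
            return str(ipaddress.ip_address(candidate))
        except ValueError:
            continue
    return None


def client_ip(headers, request_ip):
    """Pick the client IP from headers, falling back to the request IP.

    `headers` is a plain dict of header name to value; names are matched
    case-insensitively.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    for name in PREFERRED_HEADERS:
        value = lowered.get(name.lower())
        if value:
            ip = _last_valid_ip(value)
            if ip is not None:
                return ip
    return request_ip
```

A header that is present but unparseable is skipped rather than trusted, so a garbled "X-Domain-Fronted-For" falls through to "X-Forwarded-For" and then to the request IP.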
If we make use of this API in Tor Launcher (and we will, as soon as it exists — I'd even pull a crazy and roll it out in the middle of a stable, given the rapid rate of increase in these costs), users would not need to know the magic incantations to access this front, and new bridges could be obtained behind the scenes for them. All they would have to do is keep solving captchas until something worked (until we also implement some kind of fancy crypto like RBridge).
Perhaps the "BridgeDB API" part of what you want is the Tor Browser bridge distributor that I mentioned in §3.1, SOW.9., in my Statement of Work [0] for OTF?
Yes, this is exactly what I want. With respect to SOW.9.1, consider it feasible! Mission Accomplished! ;)
Additionally, SOW.9. is actually the chronological precursor to SOW.10., the latter of which is implementing rBridge (or at least getting started on it). (Work on this is still waiting on OTF to officially grant me the fellowship, along with the other prerequisite tasks getting finished.)
But just to be clear — since it sounds like you've asked for several new things in that last paragraph :) — which do you want:
1. Tor Browser users use meek to get to BridgeDB, to get non-meek bridges by:
   1.a. Retrieving and solving a CAPTCHA inside Tor Launcher.
   1.b. Solving a CAPTCHA on a BridgeDB web page.
2. Tor Browser users use BridgeDB's domain front, to get non-meek bridges by:
   2.a. Retrieving and solving a CAPTCHA inside Tor Launcher.
   2.b. Solving a CAPTCHA on a BridgeDB web page.
If you want #2, then we're essentially transferring the domain-fronting costs (and the DDoS risks) from meek to BridgeDB, and we'd need to decide who is going to maintain that service, and who is going to pay for it. Could The Tor Project fund BridgeDB domain fronting?
I proposed two things in my original email. My #1 is your #1.b. My #2 is your #2.a.
For my #2 (your #2.a), what I want is a separate domain front for BridgeDB. It makes the most sense to me for Tor to run its own domain front for this.
If for some reason #2.a can't be done, we could do #1.a and use all of meek+Tor, but this seems excessive, slow, and potentially confusing for users (their Tor client would have to bootstrap twice for each bridge set they test).
I only consider my #1 and #1.b emergency stopgaps, though. In fact, if any aspect of this process is too slow and/or confusing, we won't take any load off of meek (unless the browser also starts regularly yelling at meek users to donate or something).
As far as maintenance goes, the threat to any of our domain fronts, including meek and any BridgeDB domain fronts, from China's Great Cannon waging economic counter-counter-warfare by attacking us (like they did to GreatFire.org) is something which must be taken into account. Will the maintainer of this service need to wake up to emergency, the-request-rate-is-skyrocketing, emails at 4AM to shut the service down?
I would love to hear how David deals with this risk since the Great Cannon incident.
Honestly, though, I think this is less likely now. If China wasn't somehow discouraged from this behavior via some diplomatic backchannel or just general public backlash, GreatFire.org would probably still be under attack right now.
Either way, it does seem wise to structure this such that multiple people can respond to emergencies here, and that individuals like you and/or David aren't on the hook for the financial damages.
Or do we already have technical measures to detect DDoS and prevent $30,000+/day CDN bills? Further, what happens when #2 is being DDoS-ed? Should we fall back to #1? Should we have both, and some strategy for balancing between the two?
I think trying to fall back or balance between the two is unlikely to save us much, and will just introduce excessive implementation complexity.
If they're going to attack domain fronting usage of Tor, it seems to me that they will attack both meek and BridgeDB.
Now that we have a browser updater, I think it is also OK for us to provide autoprobing options for Tor Launcher, so long as the user is informed what this means before they select it, and it only happens once.
Probing all of the different Pluggable Transport types simultaneously provides an excellent training classifier for DPI boxes to learn what new Pluggable Transport traffic looks like.
As long as it happens only once, and only uses the bridges bundled in Tor Browser, I don't see any issue with auto-selecting from the drop-down of transport methodnames in a predefined order. It's what users do anyway.
Oh, yes. I am still against "connect to all of the things at the same time." The probing I had in mind was to cycle through the transport list and try each type, except also obtain the bridges for each type from BridgeDB.
I also think we should be careful about the probing order. I want to probe the most popular and resilient transports (such as obfs4) first.
The autoprobing could then keep asking for non-meek bridges for either a given type of the user's choice, or optionally all non-meek types (with an additional warning that this increases their risk of being discovered as a Tor user).
If the autoprobing is going to include asking BridgeDB (multiple times?) for different types of bridges in the process, whether through a BridgeDB domain front or not, then I think there needs to be more discussion…
- Do you think you could explain more about the steps this autoprobing entails?
1. User starts a fresh Tor Browser (or one that fails to bootstrap)
2. User clicks "Configure" instead of "Connect"
3. User says they are censored
4. User selects a third radio button on the bridge dialog "Please help me obtain bridges".
5. Tor Browser launches a JSON-RPC request to BridgeDB's domain front for bridges of type $TYPE
6. BridgeDB responds with a Captcha
7. User solves captcha; response is posted back to BridgeDB.
8. BridgeDB responds with bridges (or a captcha error)
9. Tor Launcher attempts to bootstrap with these bridges.
10. If bootstrap fails, goto step 5.
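Steps 5-10 can be sketched roughly as the loop below. This is only an illustration of the control flow: the four callables stand in for the JSON-RPC request, the captcha dialog, the captcha submission, and the bootstrap attempt. All of them are hypothetical names rather than an existing Tor Launcher or BridgeDB API, and `max_rounds_per_type` is a placeholder for the loop count that still has to be decided.

```python
def autoprobe(transport_types, request_captcha, solve_captcha,
              submit_solution, try_bootstrap, max_rounds_per_type=3):
    """Probe one transport type at a time, in the given order.

    Returns (transport, bridges) for the first set of bridges that
    bootstraps, or None if every type runs out of rounds.
    """
    for transport in transport_types:
        for _ in range(max_rounds_per_type):
            challenge = request_captcha(transport)        # steps 5-6
            answer = solve_captcha(challenge)             # step 7
            bridges = submit_solution(challenge, answer)  # step 8
            if not bridges:
                # Captcha error: this still uses up one round.
                continue
            if try_bootstrap(bridges):                    # step 9
                return transport, bridges
            # Step 10: bootstrap failed, so go back to step 5.
    return None
```

Passing the list as, say, `["obfs4", ...]` puts the most popular and resilient transport first, and because each request names exactly one type, bridges of different types are never tested in combination.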
The number of loops for steps 5-10 for each $TYPE probably requires some intuition on how frequently we expect bridges that we hand out to be blocked due to scraping, and how many bridge addresses we really want to hand out per Captcha+IP address combination.
Later, we can replace Captchas with future RBridge-style crypto, though we should design the domain front independently from RBridge, IMO.
- Is the autoprobing meant to solve the issue of not knowing which transport will work? Or the problem of not knowing whether the bridges in Tor Browser are already blocked? Or some other problem?
Both problems at once, though I suspect (or at least hope) that the current transport types included with Tor Browser are more likely to be blocked by scraping BridgeDB for IP addresses than by DPI.
If we're shipping transports known to be blocked by DPI, we should be phasing them out of Tor Browser, and definitely not using them for this autoprobing business.
- Does BridgeDB continue to always normally answer with one transport methodname at a time, unless the "russianroulette" meta-transport type is requested?
Yes, only one transport should be tested at a time, to avoid the possibility of bad transports revealing the IP addresses of the good ones by testing them in combination.
If we follow BridgeDB's spec [1], and we wish for the logic controlling how Tor Browser users are handled to be separate (and thus more maintainable), then this will require a new bridge Distributor, and we should probably start thinking about the threat model/security requirements, and behaviours, of the new Distributor. Some design questions we'll need to answer include:
Should all points on the Distributor's hashring be reachable at a given time (i.e., should there be some feasible way, at any given point in time, to receive any and every Bridge allocated to the Distributor)?
Or should the Distributor's hashring rotate per time period? Or should it have sub-hashrings which rotate in and out of commission?
Should it attempt to balance the distribution of clients to Bridges, so that a (few) Bridge(s) at a time aren't hit with tons of new clients?
Should it treat users coming from the domain front as separate from those coming from elsewhere? (Is it even possible for clients to come from elsewhere? Can clients use Tor to reach this distributor? Can Tor Browser connect directly to BridgeDB, not through the domain front?)
If we're going to do autoprobing, should it still give out a maximum of three Bridges per request? More? Less?
Personally, I think the domain fronting distributor should behave identically to the closest equivalent distributor that isn't domain fronted, both to reduce implementation complexity, and to keep the system easy to reason about.
Before RBridge is implemented, this would mean using the X-Domain-Fronted-For header's IP address as if it were the real IP address, and index into the hashrings in the same way as we do with the web distributor.
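As a rough illustration of what "index into the hashrings" could look like, here is a toy hashring keyed on the forwarded IP. It is a sketch under stated assumptions, not BridgeDB's implementation: grouping clients by /24 (IPv4) or /32 (IPv6), using HMAC-SHA256 for ring positions, and the function names are all assumptions made for the example. The default of three bridges matches the per-request maximum mentioned above.

```python
import bisect
import hashlib
import hmac
import ipaddress


def area_of(ip):
    """Collapse an IP into its network "area" so neighbours share bridges.

    Assumption: /24 for IPv4 and /32 for IPv6.
    """
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 32
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))


def ring_position(key, data):
    """Map a string onto the ring using a keyed hash."""
    return hmac.new(key, data.encode(), hashlib.sha256).hexdigest()


def bridges_for(ip, bridges, key, count=3):
    """Return up to `count` bridges following the client's ring position."""
    ring = sorted((ring_position(key, bridge), bridge) for bridge in bridges)
    if not ring:
        return []
    positions = [position for position, _ in ring]
    start = bisect.bisect_left(positions, ring_position(key, area_of(ip)))
    wanted = min(count, len(ring))
    # Walk clockwise from the client's position, wrapping around the ring.
    return [ring[(start + i) % len(ring)][1] for i in range(wanted)]
```

With this shape, the only change for the domain front is which IP gets passed in: the address from the X-Domain-Fronted-For header instead of the connection's own address.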
I could see an argument that the set of bridges held by the domain fronting distributor should be kept separate from the web distributor, because heck, way more people should be able to access the domain fronted version, and maybe we want to drastically reduce the web distributor's pool because nobody can reach it (except for whitelisted scrapers and people who don't really need bridges).
However, if you do keep the domain front pool separate from the web distributor pool, you should ensure that you also properly handle the case where Tor IP addresses appear in the X-Domain-Fronted-For header. Again, for this case, I think the simplest answer is "use the same rules as the current web distributor does", though if the domain front pool is separate, perhaps the Tor fraction should be much smaller.
Would you and/or Isis be able to work on this on the backend? If not, can either of you recommend someone that might be able to help with the domain fronting bits and other bits involved?
I'm in. Yawning mentioned wanting to work on this too. :)
Great!