Hi everyone,
I've been interning at Cloudflare for the past 3 months and have been working on developing an implementation of the blinded tokens spec that George and Filippo developed a while back.
We've updated aspects of the original spec, the newest spec can now be viewed here: https://github.com/cloudflare/challenge-bypass-specification. We're happy to hear comments from any of you on the design and on the capacity of the solution for preserving anonymity.
We've managed to build a test copy of the protocol along with an extension that carries out the required operations. The extension is not completely finished yet, but we're looking towards making that open source as well when it is done.
Thanks, Alex
Hi,
Alex Davidson:
Hi everyone,
I've been interning at Cloudflare for the past 3 months and have been working on developing an implementation of the blinded tokens spec that George and Filippo developed a while back.
We've updated aspects of the original spec, the newest spec can now be viewed here: https://github.com/cloudflare/challenge-bypass-specification. We're happy to hear comments from any of you on the design and on the capacity of the solution for preserving anonymity.
Thanks for the update. While I am still mulling over parts of the specification let me start with an issue I found while reading. I am wondering about how your specification is supposed to work for users of Private Browsing Modes in general and Tor Browser with its Disk Avoidance requirement[1] in particular.
Looking at section 7.1, it seems to me all those tokens (blinded/signed) are saved somewhere on the user's machine. As far as I can see, the specification does not elaborate on this point further, but I guess they are supposed to be saved on disk. That does not work for us, at least. Thus, the other option would be to have them in memory. But then the user would be recreating tokens and getting those new tokens signed with every visit to a Cloudflare-guarded site after Tor Browser got started.
Additionally, and in contrast to what your spec is claiming in section 8.1, we have the New Identity feature[2] that is e.g. aimed at dealing with powerful first party entities and their long-term linkability capabilities. That feature ensures that by invoking it a user gets a clean, new browsing session. Thus, new tokens would need to get created, blinded and signed in this case again. A New Identity is easy to get and we encourage users to request one regularly: this option is the first one in our Torbutton menu and has keyboard shortcuts too, as users suggested over and over, to make it easier and more convenient to mitigate long-term linkability concerns.
And, as a final angle, we plan to get Tor Browser on mobile to feature parity with the one for desktop rather soon because there are large regions of the world that access the Internet mainly (or even only) over the phone. Often this is accompanied by unreliable and slow connections and it is probably still not uncommon that users in those countries have to pay for transferred bytes. Moreover, battery power is a scarce resource on mobile devices and public key cryptography is expensive. Thus, it seems to me that getting the tokens created, blinded and signed again and again after New Identity and restart of Tor Browser is especially problematic for those users.
As a last and more general point in my mail I thought it might be good to point out that we need to have a discussion about whether your blinded token idea is actually a good solution to the problem at hand. That might be the case or it might be a good solution to a different problem. Personally, I am not sure about that yet and I may have missed some of the previous discussions outside of this list. Sorry if that's the case. But I think it is important to get this more fundamental issue sorted out (in a different thread), especially if we want to implement and ship a solution in Tor Browser.
Georg
[1] https://www.torproject.org/projects/torbrowser/design/#disk-avoidance [2] https://www.torproject.org/projects/torbrowser/design/#new-identity
We've managed to build a test copy of the protocol along with an extension that carries out the required operations. The extension is not completely finished yet, but we're looking towards making that open source as well when it is done.
Thanks, Alex
I'll split this off into a separate thread.
On Mon, 2016-10-03 at 09:43 +0000, Georg Koppen wrote:
As a last and more general point in my mail I thought it might be good to point out that we need to have a discussion about whether your blinded token idea is actually a good solution to the problem at hand.
I'm concerned that CloudFlare's concerns over token stockpiling, coupled with not doing things Roger asked for previously, like free GET requests, will result in a scheme that improves matters but still basically feels unusable.
I'd worry less if CloudFlare's crypto folk were confident they could push through previous Tor requests like free GETs, or similar, either before or in tandem with deploying the token scheme. It'd be unfortunate if people spent oodles of time only for parameter choices to make the scheme remain quite painful.
If I understand CloudFlare's published blocking statistics for Tor relays correctly, then CloudFlare sees roughly *two* Tor circuits as being bad at any given time, out of *all* Tor circuits, fewer if the bad Tor clients rotate circuits faster. This is quite a small set to detect.
I cannot estimate the bad page loads from their published data, though; presumably the detected bad page loads come from honeypot sites, so the actual bad page loads should be quite numerous, which helps.
It'd be helpful if CloudFlare could provide some data from which we can estimate bad page loads, so that we can meaningfully discuss issues like token stockpiling.
Best, Jeff
On Mon, Oct 3, 2016 at 6:25 PM, Jeff Burdges burdges@gnunet.org wrote:
I'm concerned that CloudFlare's concerns over token stockpiling, coupled with not doing things Roger asked for previously, like free GET requests, will result in a scheme that improves matters but still basically feels unusable.
Can you define "free GET requests"? I'm concerned that there's a perception that a GET request is 'safe' in some way when that's far from the truth.
On Mon, 2016-10-03 at 19:55 +0100, John Graham-Cumming wrote:
Can you define "free GET requests"? I'm concerned that there's a perception that a GET request is 'safe' in some way when that's far from the truth.
Ok. GETs are not supposed to modify resources, right? So they should be considerably safer than POSTs, right?
What are the concerns for GETs? Also, do those concerns even apply to truly static content?
Apologies if this was discussed previously with Tor people. I was not privy to those conversations.
Best, Jeff
On Mon, Oct 3, 2016 at 8:18 PM, Jeff Burdges burdges@gnunet.org wrote:
Ok. GETs are not supposed to modify resources, right? So they should be considerably safer than POSTs, right?
When we are thinking about security (rather than the functionality of a web application) there is often little difference between a GET and a POST. Consider the following examples:
1. Benign GET / repeated 1000 times per second. That's a DoS on the server
2. Shellshock. Looks like a benign GET / but nasty payload in User-Agent header
3. Simple GET but with SQLi in the URI
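To make #2 and #3 concrete, here's a hypothetical sketch (the endpoint and payloads are purely illustrative, not taken from real traffic):

    import requests  # third-party HTTP library, assumed installed

    # #2: a plain GET for /, but the User-Agent header carries a
    # Shellshock-style payload aimed at a CGI handler behind the server.
    requests.get(
        "https://example.com/",
        headers={"User-Agent": "() { :; }; /bin/cat /etc/passwd"},
    )

    # #3: a simple GET whose query string carries a SQL injection
    # attempt; the URI alone is the attack.
    requests.get("https://example.com/item", params={"id": "1 OR 1=1 --"})

Both are GETs; neither is safe just because the method is nominally read-only.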
What are the concerns for GETs? Also, do those concerns even apply to truly static content?
Depends what you mean by 'static content'. If a web site was served entirely from Cloudflare's cache then I wouldn't be worried about #1, #2 or #3 above. Any time we hit the origin web server I would worry.
On Mon, 2016-10-03 at 20:28 +0100, John Graham-Cumming wrote:
Depends what you mean by 'static content'. If a web site was served entirely from Cloudflare's cache then I wouldn't be worried about #1, #2 or #3 above. Any time we hit the origin web server I would worry.
Ok. Just a silly question: is there a way to rewrite requests so that they can only hit the cache and fail if they need to hit the origin?
Jeff
On Mon, Oct 3, 2016 at 8:55 PM, Jeff Burdges burdges@gnunet.org wrote:
Ok. Just a silly question: is there a way to rewrite requests so that they can only hit the cache and fail if they need to hit the origin?
I guess, but that seems sad. We should filter out bad requests and allow legitimate users to get access to web sites (static or not).
On Mon, 2016-10-03 at 20:28 +0100, John Graham-Cumming wrote:
- Benign GET / repeated 1000 times per second. That's a DoS on the server
Is this going to work over Tor anyways? I suppose your concern would be PHP, etc., which falls over much faster than the web server calling it, no?
- Shellshock. Looks like a benign GET / but nasty payload in User-Agent header
- Simple GET but with SQLi in the URI
I suppose you're not worried about targeted attacks per se here, as they can always solve the current CAPTCHA, but about automated attackers who attempt attacks on many servers, no?
Are these serious concerns? I suppose they're more serious than the DoS concerns, so that sounds bad from the token stockpiling perspective.*
On Mon, 2016-10-03 at 20:57 +0100, John Graham-Cumming wrote:
I guess, but that seems sad. We should filter out bad requests and allow legitimate users to get access to web sites (static or not).
I'm wondering if the extension should first attempt to load the page in a way that ensures it does not need to spend a token. If that fails, there might be tricks to avoid the request entirely. Images could be dropped when the cache fails to serve them, for example.
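As a sketch of the first idea (assuming the edge honors the standard "only-if-cached" request directive from RFC 7234, which would need checking against CloudFlare's actual behavior):

    import requests

    # Hypothetical cache-only fetch: per RFC 7234, a cache receiving
    # "only-if-cached" should answer 504 rather than contact the
    # origin when it cannot serve the response locally.
    resp = requests.get(
        "https://example.com/image.png",
        headers={"Cache-Control": "only-if-cached"},
    )
    if resp.status_code == 504:
        pass  # cache miss: drop the image instead of spending a token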
Jeff
* If this becomes an issue, there is an approach that might work: just use multiple signing keys, one system-wide key C for all CloudFlare sites, and individual site keys for each site CloudFlare protects. If you solve a CAPTCHA then you withdraw a moderate stack of C tokens. If you visit site X then you spend an X token if you have one, but if you do not then you spend a single C token to withdraw tens of thousands of X tokens. So solving a CAPTCHA is worth hundreds of thousands of page loads, but only across a moderate number of sites. We could have separate Cbig and Csmall keys such that the extension first withdraws with Csmall, but if the user blows through that quickly then it withdraws with Cbig.
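A sketch of the spend logic I have in mind (everything here is hypothetical, including the withdraw() helper standing in for the blind-signing protocol):

    # wallet maps a key label ("C" or a site name) to a list of tokens.
    def spend(wallet, site, withdraw):
        # withdraw(key, n, proof) is an assumed helper that spends
        # `proof` to run the blind-signing protocol against the named
        # key and returns n fresh unlinkable tokens.
        if not wallet.get(site):
            # No per-site tokens left: spend one system-wide C token
            # to withdraw a large stack of site-specific X tokens.
            # Assumes a C token is available (else solve a CAPTCHA).
            c = wallet["C"].pop()
            wallet[site] = withdraw(site, 10000, proof=c)
        return wallet[site].pop()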
On Tue, Oct 4, 2016 at 4:15 AM, Jeff Burdges burdges@gnunet.org wrote:
On Mon, 2016-10-03 at 20:28 +0100, John Graham-Cumming wrote:
- Benign GET / repeated 1000 times per second. That's a DoS on the server
Is this going to work over Tor anyways? I suppose your concern would be PHP, etc., which falls over much faster than the web server calling it, no?
It turns out that does work over Tor. We see this type of DoS happen across the Tor network. The network has quite a lot of capacity, certainly enough to knock over smaller web sites. A related tool is "Tor's Hammer", which performs DoS using a slightly different method over Tor.
- Shellshock. Looks like a benign GET / but nasty payload in User-Agent header
- Simple GET but with SQLi in the URI
I suppose you're not worried about targeted attacks per se here, as they can always solve the current CAPTCHA, but about automated attackers who attempt attacks on many servers, no?
Right. For example the popular sqlmap tool for finding SQLi vulnerabilities in a web site has a --tor option to run through the Tor network. Running attack tools via Tor is very common.
Are these serious concerns? I suppose they're more serious than the DoS concerns, so that sounds bad from the token stockpiling perspective.*
Yes, these are serious concerns. If they weren't I would have just dropped CAPTCHA for Tor exit nodes and been done with it. We know from watching attacks come through Tor that doing so would expose people's web sites.
* If this becomes an issue, there is an approach that might work: just use multiple signing keys, one system-wide key C for all CloudFlare sites, and individual site keys for each site CloudFlare protects. If you solve a CAPTCHA then you withdraw a moderate stack of C tokens. If you visit site X then you spend an X token if you have one, but if you do not then you spend a single C token to withdraw tens of thousands of X tokens. So solving a CAPTCHA is worth hundreds of thousands of page loads, but only across a moderate number of sites. We could have separate Cbig and Csmall keys such that the extension first withdraws with Csmall, but if the user blows through that quickly then it withdraws with Cbig.
I'll let the crypto-heads explore that.
John Graham-Cumming:
On Tue, Oct 4, 2016 at 4:15 AM, Jeff Burdges burdges@gnunet.org wrote:
On Mon, 2016-10-03 at 20:28 +0100, John Graham-Cumming wrote:
- Benign GET / repeated 1000 times per second. That's a DoS on the server
Are these serious concerns? I suppose they're more serious than the DoS concerns, so that sounds bad from the token stockpiling perspective.*
Yes, these are serious concerns. If they weren't I would have just dropped CAPTCHA for Tor exit nodes and been done with it. We know from watching attacks come through Tor that doing so would expose people's web sites.
Hey John, at the CloudFlare Internet Summit, we spoke briefly about your efforts to work on a WAF-based approach for actively filtering out obviously bad requests, letting through obviously good requests, and then using this blind signed token scheme for the requests that were difficult to classify for whatever reason.
I am still convinced that this combination (or something like it) is the winning solution here, especially given what Georg pointed out about it being difficult for us to store tokens for very long (depending on user behavior, New Identity usage, and if they want to store disk history or not). On top of that, with concerns about token farming/hoarding and the need to expire keys/tokens somewhat frequently on CloudFlare's side, I'm not seeing a terribly high multiplier/CAPTCHA reduction for the tokens by themselves.
But I still do see a potentially high multiplier effect if we can do better on request filtering, and also add the crypto on top of that, even if the Tor Browser defaults work against us somewhat.
It sounded to me like you folks were really close to the WAF approach working. Can you say if that is still the case, and what timelines we might expect?
P.S. I hope I'm not stealing your thunder by talking about that project, but I know it made me a lot less skeptical of the blind token idea as a whole, and I suspect others here would also be comforted by that news as well. Knowing that general trajectory would help everybody get closer to being on the same page with this, I think :)
We already started that filtering approach. It's live for millions of web sites (although not all sites using Cloudflare) and has been for some time. It has resulted in a large drop in the use of CAPTCHA.
I was waiting to report the results to this group as I wanted to let it run for a while so we can see how well the abuse filtering works. But it's just fine that you mention it now.
I'm very interested in the problem of filtering abusive Tor traffic because the Tor system is unique in its approach to privacy, making it a challenge to filter well. This is an interesting engineering problem and has great benefits for us, as being good at filtering abusive traffic from Tor makes us better at filtering abuse from the wider internet.
John.
On 8 Oct 2016, at 00:17, Mike Perry mikeperry@torproject.org wrote:
John Graham-Cumming:
On Tue, Oct 4, 2016 at 4:15 AM, Jeff Burdges burdges@gnunet.org wrote:
On Mon, 2016-10-03 at 20:28 +0100, John Graham-Cumming wrote:
- Benign GET / repeated 1000 times per second. That's a DoS on the server
Are these serious concerns? I suppose they're more serious than the DoS concerns, so that sounds bad from the token stockpiling perspective.*
Yes, these are serious concerns. If they weren't I would have just dropped CAPTCHA for Tor exit nodes and been done with it. We know from watching attacks come through Tor that doing so would expose people's web sites.
Hey John, at the CloudFlare Internet Summit, we spoke briefly about your efforts to work on a WAF-based approach for actively filtering out obviously bad requests, letting through obviously good requests, and then using this blind signed token scheme for the requests that were difficult to classify for whatever reason.
I am still convinced that this combination (or something like it) is the winning solution here, especially given what Georg pointed out about it being difficult for us to store tokens for very long (depending on user behavior, New Identity usage, and if they want to store disk history or not). On top of that, with concerns about token farming/hoarding and the need to expire keys/tokens somewhat frequently on CloudFlare's side, I'm not seeing a terribly high multiplier/CAPTCHA reduction for the tokens by themselves.
But I still do see a potentially high multiplier effect if we can do better on request filtering, and also add the crypto on top of that, even if the Tor Browser defaults work against us somewhat.
It sounded to me like you folks were really close to the WAF approach working. Can you say if that is still the case, and what timelines we might expect?
P.S. I hope I'm not stealing your thunder by talking about that project, but I know it made me a lot less skeptical of the blind token idea as a whole, and I suspect others here would also be comforted by that news as well. Knowing that general trajectory would help everybody get closer to being on the same page with this, I think :)
-- Mike Perry
John Graham-Cumming:
We already started that filtering approach. It's live for millions of web sites (although not all sites using Cloudflare) and has been for some time. It has resulted in a large drop in the use of CAPTCHA.
This is great news! Thank you!
I was waiting to report the results to this group as I wanted to let it run for a while so we can see how well the abuse filtering works. But it's just fine that you mention it now.
I'm very interested in the problem of filtering abusive Tor traffic because the Tor system is unique in its approach to privacy, making it a challenge to filter well. This is an interesting engineering problem and has great benefits for us, as being good at filtering abusive traffic from Tor makes us better at filtering abuse from the wider internet.
That is good to hear, also.
I do still notice CAPTCHAs for bare requests to the top-level domains of paid sites. Is there a plan to roll this out for all CloudFlare sites at some point for Tor traffic?
Also, I will see what we can do on our side about reviewing the spec and the browser extension, taking Georg and Jeff's comments into consideration. Is there a timeline for when that system will be ready on your side?
On Wed, Oct 12, 2016 at 10:41 PM, Mike Perry mikeperry@torproject.org wrote:
I do still notice CAPTCHAs for bare requests to the top-level domains of paid sites. Is there a plan to roll this out for all CloudFlare sites at some point for Tor traffic?
That's correct. This is not enabled for every single Cloudflare site yet. We are working on the full rollout plan.
Also, I will see what we can do on our side about reviewing the spec and the browser extension, taking Georg and Jeff's comments into consideration. Is there a timeline for when that system will be ready on your side?
Nick Sullivan is driving that but we plan on it being available this quarter.
John.
On Mon, 2016-10-03 at 09:43 +0000, Georg Koppen wrote:
Thanks for the update. While I am still mulling over parts of the specification let me start with an issue I found while reading. I am wondering about how your specification is supposed to work for users of Private Browsing Modes in general and Tor Browser with its Disk Avoidance requirement[1] in particular.
Looking at section 7.1, it seems to me all those tokens (blinded/signed) are saved somewhere on the user's machine. As far as I can see, the specification does not elaborate on this point further, but I guess they are supposed to be saved on disk. That does not work for us, at least. Thus, the other option would be to have them in memory. But then the user would be recreating tokens and getting those new tokens signed with every visit to a Cloudflare-guarded site after Tor Browser got started.
This is an interesting objection, and it impacts Taler as well. For Tails, we might try to integrate with Tails directly and tweak Taler to ask for persistence before doing the withdrawal. I should read [1] to better understand disk avoidance issues beyond Tails.*
Additionally, and in contrast to what your spec is claiming in section 8.1, we have the New Identity feature[2] that is e.g. aimed at dealing with powerful first party entities and their long-term linkability capabilities. That feature ensures that by invoking it a user gets a clean, new browsing session. Thus, new tokens would need to get created, blinded and signed in this case again.
If properly implemented, then blind signatures from one session can safely be used with another session.**
There are a bunch of caveats about the "properly implemented" part in my previous email, like not using a full-domain blinding factor, and some additional ones I neglected to mention, like key validation with GCD checks.
Yet the RSA blind signature primitive itself seems information-theoretically secure, assuming your blinding factor is a random oracle and the mint's public key is honestly a product of two large primes.
Against Taler's anonymity, I believe the best attack is to detect SHA2-512 output as non-random without knowing anything about the value being hashed. If an adversarial mint can do O(m^2) such recognition attacks on SHA2-512, where m is the total number of coins processed system-wide during a denomination key's lifetime, say 6 months, then they gain 1 bit of deanonymizing information per coin, which lets them perform an intersection attack.
Anyways, there were notable gaps between theory and practice for blind signatures, which we've fixed in Taler afaik. If these gaps are addressed, as we addressed them, then tokens from one session should be safe to use with another session.
And, as a final angle, we plan to get Tor Browser on mobile to feature parity with the one for desktop rather soon because there are large regions of the world that access the Internet mainly (or even only) over the phone. Often this is accompanied by unreliable and slow connections and it is probably still not uncommon that users in those countries have to pay for transferred bytes. Moreover, battery power is a scarce resource on mobile devices and public key cryptography is expensive. Thus, it seems to me that getting the tokens created, blinded and signed again and again after New Identity and restart of Tor Browser is especially problematic for those users.
This should be okay for Taler itself as coins represent money.
Assuming persistence, users' devices do their RSA computations during withdrawal, but in CloudFlare's case they're doing them for future page loads. In Taler, a single token requires two exponentiations mod n by the smallish public exponent e, one for encrypting the blinding factor and one for signature verification, plus two extra GCD computations to protect against malicious mint keys with hidden small prime factors***, and two cheap modular multiplications. Afaik, there need not be additional public-key crypto at page load time.
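For concreteness, a minimal sketch of those per-token client-side operations (schoolbook RSA blinding only, no padding or FDH; sign_blinded stands in for the mint's signing oracle):

    import math, secrets

    def withdraw_token(n, e, m, sign_blinded):
        # n, e: the mint's public key; m: the hashed token, 0 <= m < n.
        # Full-domain blinding factor via rejection sampling.
        while True:
            r = secrets.randbits(n.bit_length())
            if 1 < r < n:
                break
        # The two GCD checks: a malicious key with hidden small prime
        # factors could otherwise leak deanonymizing information.
        assert math.gcd(r, n) == 1 and math.gcd(m, n) == 1
        blinded = (m * pow(r, e, n)) % n   # exponentiation 1 + multiply
        s_blind = sign_blinded(blinded)    # mint signs blindly
        s = (s_blind * pow(r, -1, n)) % n  # unblind + multiply
        assert pow(s, e, n) == m           # exponentiation 2: verify
        return s

All of the public-key work happens here, at withdrawal time; spending the token later needs no client-side public-key crypto.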
I'd expect battery usage is improved by withdrawing more tokens at once. Yes, this drains the battery more at the time, possibly forcing a recharge, but overall this cost becomes less random. So the question becomes: would CloudFlare make withdrawal sufficiently rare by allowing enough coins to be withdrawn at once?
Best, Jeff
[1] https://www.torproject.org/projects/torbrowser/design/#disk-avoidance [2] https://www.torproject.org/projects/torbrowser/design/#new-identity
* For Taler, an option might be to return all the coins to the user's reserve, which conceivably could be printed as a QR code and scanned back in by Taler. That's problematic because the reserve deanonymizes the user. We could however create a second, anonymous reserve and construct the Taler coins so that they can only be refunded to that reserve via some analog of our refresh protocol. I think this still kinda sucks though, because withdrawal produces correlated activity with a non-anonymous reserve. Persistence seems preferable.
** There were ideas about going with more efficient schemes, like maybe whatever Brave uses. We could look into that, but my superficial take is: anything like that will be doing this dance around pairing-based crypto, so one must worry seriously about the concerns around decisional Diffie-Hellman that I mentioned in my previous email. I'd expect the issues can be dealt with, because there are a bunch of serious folks like Matt Green behind ZCash, but they require rather deep expertise with pairing-based crypto. Also, these schemes all move public-key crypto from the withdrawal phase to page load time, so maybe that's bad for battery life.
*** I neglected these GCD computations in my previous email, but they are crucial for anonymity. I actually missed them myself, even after finding the more subtle hash recognition attacks; a fellow named CodesInChaos asked me about GCDs at some point. We could ask him if he is interested in being added to this list.
Jeff Burdges:
On Mon, 2016-10-03 at 09:43 +0000, Georg Koppen wrote:
[snip]
Additionally, and in contrast to what your spec is claiming in section 8.1, we have the New Identity feature[2] that is e.g. aimed at dealing with powerful first party entities and their long-term linkability capabilities. That feature ensures that by invoking it a user gets a clean, new browsing session. Thus, new tokens would need to get created, blinded and signed in this case again.
If properly implemented, then blind signatures from one session can safely be used with another session.**
Well, I assumed that blind signatures get properly implemented when writing my mail. There is more, though. The idea behind New Identity is clearing browser state as well, as this state risks leaking into the new identity. "State" in this particular case would mean "having been on a Cloudflare customer website before and having blinded tokens ready for spending". Having done New Identity might even be detectable by the edge in this case, given that it could send a cookie after performing the CAPTCHA request and signing the blinded tokens, which would get cleared by New Identity.
[snip]
And, as a final angle, we plan to get Tor Browser on mobile to feature parity with the one for desktop rather soon because there are large regions of the world that access the Internet mainly (or even only) over the phone. Often this is accompanied by unreliable and slow connections and it is probably still not uncommon that users in those countries have to pay for transferred bytes. Moreover, battery power is a scarce resource on mobile devices and public key cryptography is expensive. Thus, it seems to me that getting the tokens created, blinded and signed again and again after New Identity and restart of Tor Browser is especially problematic for those users.
This should be okay for Taler itself as coins represent money.
Assuming persistence, users' devices do their RSA computations during withdrawal, but in CloudFlare's case they're doing them for future page loads. In Taler, a single token requires two exponentiations mod n by the smallish public exponent e, one for encrypting the blinding factor and one for signature verification, plus two extra GCD computations to protect against malicious mint keys with hidden small prime factors***, and two cheap modular multiplications. Afaik, there need not be additional public-key crypto at page load time.
I'd expect battery usage is improved by withdrawing more tokens at once. Yes, this drains the battery more at the time, possibly forcing a recharge, but overall this cost becomes less random. So the question becomes: would CloudFlare make withdrawal sufficiently rare by allowing enough coins to be withdrawn at once?
I don't know. I guess seeing numbers about how both schemes fare under day-to-day Tor Browser usage might be interesting and could help us get a better understanding of the constraints at play.
Georg
[snip]
On Thu, 2016-10-06 at 12:49 +0000, Georg Koppen wrote:
If properly implemented, then blind signatures from one session can safely be used with another session.**
Well, I assumed that blind signatures get properly implemented when writing my mail. There is more, though. The idea behind New Identity is clearing browser state as well, as this state risks leaking into the new identity. "State" in this particular case would mean "having been on a Cloudflare customer website before and having blinded tokens ready for spending".
Yes, it leaks roughly a bit of information about the bipartite graph between users and site visits. And I mentioned a layered approach to Alex that leaks more than one.
These bits cannot compound across multiple page loads or site visits, as anyone who visits the site gets them, but certainly there are concerns:
- These bits obviously compound with any information TBB or the user leaks to the site.
- If multiple CDNs, etc. adopt this token-based approach, then users can easily be deanonymized by the CDNs they have or have not used.
- There is no way to safely use per-site tokens, as the differences across sites can be used to tag users.
- We'd leak more if CloudFlare rotated their key.
- The layered scheme for token withdrawal that I mentioned to Alex sounds more fragile now.
Very messy..
Thanks for pointing this out. :)
Having done New Identity might even be detectable by the edge in this case, given that it could send a cookie after performing the CAPTCHA request and signing the blinded tokens, which would get cleared by New Identity.
I dunno if I understand this part, but there is an existing problem that the edge sees cookies from many sites, allowing them to correlate traffic and deanonymize users purely from the cookies. I dunno if these new edge cookies make that so much worse than the cookies sites use anyways.
Ideas for fixing that sound pretty drastic: do not send cookies, site data, etc. to sites protected by CloudFlare without user consent. Attempt to load them as static pages from CloudFlare's cache without revealing cookies. Attempt to use Ceno, etc. to get a static version of any page that is not itself static. Require that users click through some dialog to access dynamic content on a page. It ain't just CloudFlare that weakens TLS in that way, though.
Jeff
Jeff Burdges:
On Thu, 2016-10-06 at 12:49 +0000, Georg Koppen wrote:
If properly implemented, then blind signatures from one session can safely be used with another session.**
Well, I assumed that blind signatures get properly implemented when writing my mail. There is more, though. The idea behind New Identity is clearing browser state as well, as this state risks leaking into the new identity. "State" in this particular case would mean "having been on a Cloudflare customer website before and having blinded tokens ready for spending".
Yes, it leaks roughly a bit of information about the bipartite graph between users and site visits. And I mentioned a layered approach to Alex that leaks more than one.
These bits cannot compound across multiple page loads or site visits, as anyone who visits the site gets them, but certainly there are concerns:
- These bits obviously compound with any information TBB or the user leaks to the site.
- If multiple CDNs, etc. adopt this token-based approach, then users can easily be deanonymized by the CDNs they have or have not used.
- There is no way to safely use per-site tokens, as the differences across sites can be used to tag users.
- We'd leak more if CloudFlare rotated their key.
- The layered scheme for token withdrawal that I mentioned to Alex sounds more fragile now.
Very messy..
Thanks for pointing this out. :)
You are welcome. :) I am not sure yet how much of the information leakage you outlined above would still be an issue in case users did a New Identity and only the signed tokens remained. But, as I said, that is easily solvable: the tokens represent browser state and need to get treated accordingly (i.e. deleted if a user requests a new identity).
Having done New Identity might even be detectable by the edge in this case, given that it could send a cookie after performing the CAPTCHA request and signing the blinded tokens, which would get cleared by New Identity.
I donno if I understand this part, but there is an existing problem that the edge sees cookies from many sites, allowing them to correlate traffic to deanonymize users with purely the cookies. I donno if these new edges cookies make that so much worse than cookies sites use anyways.
Sorry for being a bit dense. What I meant is: a user who has signed tokens and tries to redeem one, but is *not* sending cookies back to the edge (they would have been cleared by New Identity), is a good indication for the edge that the user just requested a New Identity. And I'd like to avoid leaking that fact as well. (The probability that a Tor Browser user crawls into some obscure Firefox menu to delete the cookies more or less manually, when New Identity is the better option anyway, seems pretty low to me.)
Georg
On Fri, 2016-09-30 at 20:53 +0000, Alex Davidson wrote:
We've updated aspects of the original spec, the newest spec can now be viewed here: https://github.com/cloudflare/challenge-bypass-specification. We're happy to hear comments from any of you on the design and on the capacity of the solution for preserving anonymity.
I noticed that full domain hash (FDH) does not appear. Instead I'm reading:
For both PK and SIGN: 2048-bit RSA using OAEP and PSS (respectively) for normal operations.
For H: SHA256
For MAC: HMAC-SHA256
PSS is completely incompatible with blind signatures because the signer must provide randomness. You could maybe fix this with some sort of cut-and-choose or zero-knowledge scheme for choosing the randomness, but...
All the security proofs for RSA blind signatures just replace PSS with FDH anyways. In fact, CloudFlare might not need an FDH for verification, because hash factoring attacks sound implausible, but worse...
There is nothing about how the blinding factors get chosen!
There are absolutely brutal deanonymization attacks on blind signatures where the blinding factor is not created using a full-domain PRNG, probably your FDH for the signature. In this case, I really mean full domain, where you (1) generate a random 2048-bit number, (2) test that it's less than the RSA modulus n, and (3) throw it away and start again if it is not. On average, this requires generating two 2048-bit numbers, because n should lie halfway between 2^2047 and 2^2048, but obviously a malicious exchange could pick a small n to make the clients do a bit more work.
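In code, that rejection-sampling loop is tiny (a sketch; n is the mint's 2048-bit modulus):

    import secrets

    def full_domain_random(n):
        while True:
            r = secrets.randbits(n.bit_length())  # (1) fresh 2048-bit number
            if 0 < r < n:                         # (2) less than n?
                return r                          # (3) otherwise retry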
There are more issues with blind Schnorr signatures, but they look susceptible to this attack too. The blind BLS signature scheme comes with different concerns:
- I'm worried that, if the mint can afford to pick a weak public key, then they have interesting attacks, so maybe clients must validate that the mint's public key does not lie in some dangerous subgroup somehow.
- Pairing-based schemes are dangerous for anonymity applications because the pairing makes them fail decisional Diffie-Hellman, which anonymity tends to require. As I understand the BLS paper, we have e( H(M)*g^r, sigma ) = e( (H(M)*g^r)^x, H(M) ), so the mint sees the two right-hand sides of the pairing during signing, and sees the two left-hand sides during verification, so it can perform an intersection attack to deanonymize the user. I imagine this is prevented by the pairing being asymmetric, with no way to move sigma or H(M) from G_1 to G_2, so no way to compute these particular pairings. I'm nervous about treating the asymmetry of the pairing as a security property, though; the little pairing-based crypto work I've read does not cover it. Worse, another pairing e between G_1 and G_1 works just fine, so if the curve is already pairing-friendly then how do you know this bad pairing does not exist? Also, an H : {0,1}* -> G_2 compatible with the original H : {0,1}* -> G_1 might still work for the original good pairing e. I'm happy to ask our local experts on pairings, but ultimately I'd need to write up an array of theoretical attacks and ask Tanja Lange who I should ask.
I know it's annoying to work with someone else's specs, codebase, etc., but these are the sort of issues I've dealt with in Taler, so it might behoove you guys to track us a bit more closely. Aside from ways you might benefit from our experience, we might benefit from you guys wanting to do stuff in the HTTP header; actually, we've maybe already moved that way for our Javascript-free HTTP API.
Best, Jeff
Alex Davidson:
Hi everyone,
I've been interning at Cloudflare for the past 3 months and have been working on developing an implementation of the blinded tokens spec that George and Filippo developed a while back.
We've updated aspects of the original spec, the newest spec can now be viewed here: https://github.com/cloudflare/challenge-bypass-specification. We're happy to hear comments from any of you on the design and on the capacity of the solution for preserving anonymity.
I have a quick question about this comment in the spec in Section 8.1: https://github.com/cloudflare/challenge-bypass-specification/blob/master/cap...
"Since Cloudflare controls the origins, it could currently correlate user sessions across multiple circuits using these cookies. This is a gap in the Tor Browser threat model- the design explicitly ignores linking within a session by malicious first parties, but Cloudflare has effectively first-party control over a large proportion of the web."
The Tor Browser threat model actually does cover this. We call this "Cross-Origin Identifier Unlinkability". We isolate all browser state to the first party to prevent exactly this type of tracking.
This *should*[1] mean that even though CloudFlare controls a lot of first parties, it still can't track users from first party to first party simply through cookies/supercookies, etc. So even if CloudFlare controls both abc.com and xyz.com, the cookies, DOM Storage, and even TLS state set on abc.com are not visible on xyz.com and vice versa (even if abc.com and xyz.com source each other as third parties on their respective sites).
If, through your research into this work, or elsewhere in CloudFlare, you folks have actually found ways around this tracking protection, we would like to know about them. Any details would be appreciated.
[1]. As far as I know, the only major issue we are aware of is that we currently don't differentiate between fully automated first-party redirect loops and user-initiated federated login clicks. We've been thinking about building some mechanism for differentiating these, so we could still support federated login but prevent automated redirect tracking; however, first-party redirect loops like this are pretty noticeable, and so far we have not seen this abused. If you think otherwise, or if there are other ways to circumvent our tracking protection if first parties cooperate, that would be useful to know.
Hi,
While re-reading the specification after talking to Cloudflare people at CCC, I realized there is even an IETF draft of the blinded token idea in the Git repository. Looking at it more closely, I found that it mentions a potential anti-DDoS protection that is missing from the specification itself.
The idea is to include a proof-of-work (PoW) mechanism to remove the option of sending many viable-looking but invalid tokens to the Cloudflare servers, engaging them in lots of expensive public-key operations before the tokens finally get dismissed.
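As a rough hashcash-style sketch of what such a gate could look like (the difficulty, hash, and encoding here are my guesses, not necessarily what the draft specifies):

    import hashlib, os

    DIFFICULTY = 20  # required leading zero bits; a made-up parameter

    def pow_stamp(token):
        # Client: grind a nonce so that SHA256(token || nonce) has
        # DIFFICULTY leading zero bits.
        while True:
            nonce = os.urandom(8)
            h = hashlib.sha256(token + nonce).digest()
            if int.from_bytes(h, "big") >> (256 - DIFFICULTY) == 0:
                return nonce

    def pow_ok(token, nonce):
        # Server: one cheap hash before any expensive public-key
        # operation on the token.
        h = hashlib.sha256(token + nonce).digest()
        return int.from_bytes(h, "big") >> (256 - DIFFICULTY) == 0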
Now, I've heard the specification is about to get updated significantly but I am wondering whether that PoW feature will be in it or not.
In general, I feel that this issue, the token stockpiling problem (section 8.3 in the spec), and anything else making it likely that the current CAPTCHA-based "solution" gets resurrected needs to get addressed; note that the CAPTCHA is still a fallback ("We also leave the door open to an elevated threat response that does not offer to accept bypass tokens."). The last thing I want to happen is investing all the effort in getting the blinded token solution properly deployed, just to realize one week (or N weeks) later that we are back at square one because a single jerk used newly created attack opportunities over Tor, forcing Cloudflare into disabling that bypass token solution again.
This is not just some theoretical issue which might be entertaining to discuss in an academic context. Rather, there are very likely attackers out there that are interested in it, and we have seen that Cloudflare-style CAPTCHAs lead to users abandoning Tor, which a) makes those users more vulnerable and b) lowers the protections Tor provides to all of the remaining users.
Georg