Hello,
Thanks for the feedback so far.
[ PEOPLE THAT HAVE BIG SCARY ADVERSARIES IN THEIR THREAT MODEL STILL SHOULD NOT USE THIS. ]
New version with some changes that add functionality and some code quality stuff, hence a version bump to 0.0.2, especially since it'll probably be a while before I can focus on tackling the TODO items.
Source: https://git.schwanenlied.me/yawning/cfc
XPI: https://people.torproject.org/~yawning/volatile/cfc-20160327/
Major changes:
* Properly deregister the HTTP event listeners on addon unload.
* Toned down the snark when I rewrite the CloudFlare captcha page, since I wasn't very nice.
* Additional quality of life/privacy improvements courtesy of Will Scott, both optional and enabled by default.
  * (QoL) Skip useless landing pages (github.com/twitter.com will be auto-redirected to the "search" pages).
  * (Privacy) Kill twitter's outbound link tracking (t.co URLs) by rewriting the DOM to go to the actual URL when possible. Since DOM changes made from content scripts are isolated from page scripts, this shouldn't substantially alter behavior.
* (Code quality) Use a pref listener to handle preference changes.
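A minimal sketch of the t.co rewrite, written in Python purely for illustration (the add-on itself is a Firefox extension and this is not its code). It assumes, hypothetically, that each tweet link exposes its real destination in a `data-expanded-url` attribute; `is_tco` and `rewrite_anchor` are made-up names. Links without a usable expanded URL are left untouched, which is the "when possible" part.

```python
from urllib.parse import urlsplit


def is_tco(url: str) -> bool:
    """True if the URL points at Twitter's t.co link wrapper."""
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and parts.hostname == "t.co"


def rewrite_anchor(attrs: dict) -> dict:
    """Return an anchor's attributes with a t.co href replaced by the
    real destination when one is available; otherwise return them as-is."""
    href = attrs.get("href", "")
    # Assumption: the page exposes the destination in data-expanded-url.
    expanded = attrs.get("data-expanded-url", "")
    if not is_tco(href) or not expanded:
        return attrs
    # Only accept plain web URLs, never javascript: or another t.co hop.
    if urlsplit(expanded).scheme not in ("http", "https") or is_tco(expanded):
        return attrs
    rewritten = dict(attrs)
    rewritten["href"] = expanded
    return rewritten


anchor = {"href": "https://t.co/abc123",
          "data-expanded-url": "https://example.com/article"}
print(rewrite_anchor(anchor)["href"])  # https://example.com/article
```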
TODO:
* Try to figure out a way to mitigate the ability for archive.is to track you. The IFRAME-based approach might work here, but needs more investigation.
* Handle custom CloudFlare captcha pages (in general, my philosophy is to prioritize minimizing false positives over avoiding false negatives). Looking at the regexes in dcf's post, disabling the title check may be all that's needed.
* Handle CloudFlare 503 pages.
* Get samples of other common blanket CDN-based Tor blocking and of major sites that block Tor, and implement bypass methods similar to how CloudFlare is handled.
* Look into adding a "contact site owner" button as suggested by Jeff Burdges et al (Difficult?).
* Support a user specified "always use archive.is for these sites" list.
* UI improvements.
* More Quality of Life/Privacy improvements (Come for the Street Signs, stay for the user scripts).
* I will eventually get annoyed enough at being linked to mobile wikipedia that I will rewrite URLs to strip out the ".m.".
* Test this on Fennec.
* Maybe throw this up on addons.mozilla.org.
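The mobile Wikipedia item is mostly a host rewrite. A Python sketch for illustration (`desktop_wikipedia` is a hypothetical helper, not add-on code); it only handles `<lang>.m.wikipedia.org` hosts and ignores the other Wikimedia projects.

```python
from urllib.parse import urlsplit, urlunsplit


def desktop_wikipedia(url: str) -> str:
    """Rewrite <lang>.m.wikipedia.org URLs to <lang>.wikipedia.org."""
    parts = urlsplit(url)
    labels = (parts.hostname or "").split(".")
    # e.g. "en.m.wikipedia.org" -> ["en", "m", "wikipedia", "org"]
    if len(labels) != 4 or labels[1] != "m" or labels[2:] != ["wikipedia", "org"]:
        return url
    host = ".".join([labels[0], "wikipedia", "org"])
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    # Scheme, path, query and fragment are kept unchanged.
    return urlunsplit(parts._replace(netloc=netloc))


print(desktop_wikipedia("https://en.m.wikipedia.org/wiki/Tor_(network)"))
# https://en.wikipedia.org/wiki/Tor_(network)
```

Matching on whole host labels, rather than deleting the first ".m." substring, avoids touching look-alike hosts such as `en.m.wikipedia.org.example.com`.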
Regards,
* Yawning Angel wrote on 2016-03-27 at 08:12:
> - (QoL) Skip useless landing pages (github.com/twitter.com will be auto-redirected to the "search" pages).
When you're logged into Twitter, https://twitter.com/ shows you your stream of tweets. With the current version, users can't see their own stream anymore. Can you redirect to the search page only for non-logged-in users?
On Tue, 29 Mar 2016 10:09:15 +0200 Jens Kubieziel maillist@kubieziel.de wrote:
> * Yawning Angel wrote on 2016-03-27 at 08:12:
> > - (QoL) Skip useless landing pages (github.com/twitter.com will be auto-redirected to the "search" pages).
> When you're logged into Twitter, https://twitter.com/ shows you your stream of tweets. With the current version, users can't see their own stream anymore. Can you redirect to the search page only for non-logged-in users?
Probably, since I should be able to tell that you're logged in based on the request made (I never used twitter.com to view my own stream, I just went to twitter.com/blahblahblah).
That said, debugging/implementing something like that requires a twitter account, which I don't have anymore (nuked because it's a waste of time, a horrible medium, etc.).
For now, this feature can be disabled from the addon preferences.
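Jens's request could be handled by skipping the redirect whenever the request already carries a session cookie. A rough Python sketch, not the add-on's code: the `LANDING_PAGES` table, `landing_redirect`, and the cookie names `auth_token` (Twitter) and `user_session` (GitHub) are all assumptions that would need checking against a real logged-in session.

```python
from http.cookies import SimpleCookie
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical table: landing-page host -> (search page, cookie name taken
# to mean "logged in"). The cookie names are unverified assumptions.
LANDING_PAGES = {
    "twitter.com": ("https://twitter.com/search", "auth_token"),
    "github.com": ("https://github.com/search", "user_session"),
}


def landing_redirect(url: str, cookie_header: str = "") -> Optional[str]:
    """Return the search URL to redirect to, or None to leave the request alone."""
    parts = urlsplit(url)
    entry = LANDING_PAGES.get(parts.hostname or "")
    # Only the bare landing page is redirected, never deeper paths.
    if entry is None or parts.path not in ("", "/") or parts.query:
        return None
    search_url, session_cookie = entry
    cookies = SimpleCookie()
    cookies.load(cookie_header)
    if session_cookie in cookies:
        return None  # looks like a logged-in user: keep their stream
    return search_url


print(landing_redirect("https://twitter.com/"))  # https://twitter.com/search
print(landing_redirect("https://twitter.com/", "auth_token=abc; lang=en"))  # None
```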
Regards,
I'm impressed with how much nicer the web gets with this. Thank you Yawning! :)
On Sun, 2016-03-27 at 06:12 +0000, Yawning Angel wrote:
> - (QoL) Skip useless landing pages (github.com/twitter.com will be auto-redirected to the "search" pages).
Ahh that's why that happened. lol
> - (Privacy) Kill twitter's outbound link tracking (t.co URLs) by rewriting the DOM to go to the actual URL when possible. Since DOM changes made from content scripts are isolated from page scripts, this shouldn't substantially alter behavior.
Nice!
> TODO:
> - Try to figure out a way to mitigate the ability for archive.is to track you. The IFRAME-based approach might work here, but needs more investigation.
Interesting point.
> - Handle custom CloudFlare captcha pages (in general, my philosophy is to prioritize minimizing false positives over avoiding false negatives). Looking at the regexes in dcf's post, disabling the title check may be all that's needed.
I've noticed some hiccups with Medium in auto mode, for example https://medium.com/@octskyward/the-resolution-of-the-bitcoin-experiment-dabb... It sometimes works if you hit refresh, though.
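For the custom-captcha item, here is a Python sketch of a detector where the title check is an explicit switch, so the false-positive/false-negative trade-off is visible. The markers (an HTTP 403 status, a form posting to `/cdn-cgi/l/chk_captcha`, and the stock "Attention Required! | CloudFlare" title) are assumptions about CloudFlare's page, not the regexes from dcf's post, and `looks_like_cloudflare_captcha` is a hypothetical name.

```python
import re

# Assumed markers of a stock CloudFlare captcha page (not dcf's regexes).
CAPTCHA_FORM = re.compile(r'action="/cdn-cgi/l/chk_captcha"', re.IGNORECASE)
STOCK_TITLE = re.compile(
    r"<title>\s*Attention Required! \| CloudFlare\s*</title>", re.IGNORECASE)


def looks_like_cloudflare_captcha(status: int, html: str,
                                  check_title: bool = True) -> bool:
    """Heuristic: is this response a CloudFlare captcha interstitial?"""
    # Assumption: the captcha page is served with HTTP 403.
    if status != 403 or not CAPTCHA_FORM.search(html):
        return False
    # With the title check on, customized captcha pages are missed (false
    # negatives) but unrelated pages are unlikely to match (few false positives).
    return not check_title or STOCK_TITLE.search(html) is not None
```

Passing `check_title=False` corresponds to "disabling the title check": customized pages are caught, at the cost of relying on the form marker alone.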
> - Look into adding a "contact site owner" button as suggested by Jeff Burdges et al (Difficult?).
Just noticed this minimalist whois client in Node.js: https://github.com/carlospaulino/node-whoisclient/blob/master/index.js
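The protocol such a client speaks (WHOIS, RFC 3912) is tiny: open TCP port 43, send the query terminated by CRLF, and read until the server closes the connection. A Python sketch of the same idea, with hypothetical names `whois_query` and `whois`; whois.iana.org typically answers with a `refer:` line naming the registry's own server, which a fuller client would follow before looking for registrant or abuse contacts.

```python
import socket


def whois_query(domain: str) -> bytes:
    """Encode a whois query line (RFC 3912): the query followed by CRLF."""
    # IDNA-encode so internationalized names become xn-- labels.
    return domain.encode("idna") + b"\r\n"


def whois(domain: str, server: str = "whois.iana.org", port: int = 43,
          timeout: float = 10.0) -> str:
    """Send one query and read until the server closes the connection."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(whois_query(domain))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

Usage is just `whois("example.com")`; note that from Tor Browser this would have to go through the Tor SOCKS port rather than a direct socket.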
> - Support a user specified "always use archive.is for these sites" list.
> - UI improvements.
A task bar icon might find several uses:
- A "View this page through archive.is" button for when CFC misses a CAPTCHA, or even if the CAPTCHA is not CloudFlare's.
- A "contact site" button that works even after passing to archive.is.
- A "Give me the CAPTCHA" button for those who configure CFC to automatically load archive.is.
I'm currently using another browser profile for this last point. In fact, it fits perfectly into my existing pattern of browser profiles. Yet browser profiles are not user-friendly, especially in TBB, so this would benefit people who do not use profiles.
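Both the "always use archive.is for these sites" list and a "View this page through archive.is" button come down to matching the current host and building an archive.is URL. A Python sketch under stated assumptions: the `https://archive.is/newest/<url>` form for "most recent snapshot" is assumed, and `ALWAYS_ARCHIVE`, `host_matches`, and `archive_url_for` are hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical user preference: sites that should always go to archive.is.
ALWAYS_ARCHIVE = frozenset({"example.org", "example.net"})
# Assumed archive.is form for "most recent snapshot of <url>".
ARCHIVE_PREFIX = "https://archive.is/newest/"


def host_matches(host: str, domains) -> bool:
    """True if host is one of the listed domains or a subdomain of one."""
    host = host.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in domains)


def archive_url_for(url: str, domains=ALWAYS_ARCHIVE) -> Optional[str]:
    """archive.is URL to load instead of url, or None to load url directly."""
    host = urlsplit(url).hostname or ""
    return ARCHIVE_PREFIX + url if host_matches(host, domains) else None


print(archive_url_for("https://blog.example.org/post"))
# https://archive.is/newest/https://blog.example.org/post
```

A toolbar button would skip the list check and use `ARCHIVE_PREFIX + url` directly; matching on a "." boundary keeps `notexample.org` from matching `example.org`.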
Wonderful extension!
Jeff
Are there any more sites where CloudFlare appears on archive.is?
https://www.aei.org/publication/gen-michael-hayden-on-apple-the-fbi-and-data...
https://archive.is/7u5P8
Is it perhaps some particularly harsh CloudFlare configuration?
Jeff
On Fri, 01 Apr 2016 18:21:10 +0200 Jeff Burdges burdges@gnunet.org wrote:
> Are there any more sites where CloudFlare appears on archive.is?
> https://www.aei.org/publication/gen-michael-hayden-on-apple-the-fbi-and-data...
> https://archive.is/7u5P8
> Is it perhaps some particularly harsh CloudFlare configuration?
Without knowing how archive.is works, and how CloudFlare works, it's hard to tell.
Since archive.is sets "X-Forwarded-For", it's not particularly hard to figure out if a Tor user is the one requesting a snapshot. I requested a new snapshot and the captcha error page in the archive shows the IP of my exit, so at least part of the CloudFlare infrastructure peeks at the header.
I'll probably add support for other (user-configurable?) cached content providers when I have time. The archive.is person doesn't seem to want to respond to e-mail, so asking them to optionally not set X-F-F seems like it'll go absolutely nowhere.
Regards,
On 2016-04-01 18:06, Yawning Angel wrote:
> On Fri, 01 Apr 2016 18:21:10 +0200 Jeff Burdges burdges@gnunet.org wrote:
> > Are there any more sites where CloudFlare appears on archive.is?
> > https://www.aei.org/publication/gen-michael-hayden-on-apple-the-fbi-and-data...
> > https://archive.is/7u5P8
> > Is it perhaps some particularly harsh CloudFlare configuration?
> Without knowing how archive.is works, and how CloudFlare works, it's hard to tell.
> Since archive.is sets "X-Forwarded-For", it's not particularly hard to figure out if a Tor user is the one requesting a snapshot. I requested a new snapshot and the captcha error page in the archive shows the IP of my exit, so at least part of the CloudFlare infrastructure peeks at the header.
> I'll probably add support for other (user-configurable?) cached content providers when I have time. The archive.is person doesn't seem to want to respond to e-mail, so asking them to optionally not set X-F-F seems like it'll go absolutely nowhere.
> Regards,
webcitation.org is an archive.is alternative. Potentially it doesn't forward request headers (?)
On Sat, 02 Apr 2016 17:00:10 +0000 bancfc@openmailbox.org wrote:
> webcitation.org is an archive.is alternative. Potentially it doesn't forward request headers (?)
It's not a request header set by the browser. archive.is is acting like an HTTP proxy and explicitly setting X-F-F.
From the FAQ:
> But take in mind that when you archive a page, your IP is being sent to the website you archive as though you are using a proxy (in X-Forwarded-For header). This feature allows websites (e.g shops or the sites with weather forecast) target your region, not mine.
If there's an easy way to automate requests to other cache/archive services, I will integrate them when I have time[0].
As far as I've seen, archive.is has a fairly principled stance on what they will and will not host (the takedown policies listed on webcitation.org don't particularly give me warm and happy feelings), so unless things become unworkable, I'm likely to leave it as the default for the foreseeable future.
Regards,
On Sat, 2 Apr 2016 18:14:26 -0400 Ian Goldberg iang@cs.uwaterloo.ca wrote:
> On Sat, Apr 02, 2016 at 07:19:30PM +0000, Yawning Angel wrote:
> > It's not a request header set by the browser. archive.is is acting like an HTTP proxy and explicitly setting X-F-F.
> I wonder what would happen if the browser *also* set X-F-F...?
Unfortunately, it appears that archive.is tramples over X-F-F if it is already set. Maybe others will have better luck engaging with the operator(s) of archive.is than I have.
Regards,
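For reference, the usual X-Forwarded-For convention is that each proxy appends the address it received the request from, and a server that trusts the header typically takes the leftmost entry as the client. A Python sketch with hypothetical helper names and documentation-range addresses, contrasting that with overwriting (what archive.is appears to do), and showing why a browser-supplied value would only matter under the append behaviour.

```python
from typing import Optional


def forwarded_for(existing: Optional[str], peer_ip: str, overwrite: bool) -> str:
    """X-Forwarded-For value a proxy sends upstream for a request from peer_ip."""
    if overwrite or not existing:
        # Overwriting discards anything the client supplied.
        return peer_ip
    # Conventional behaviour: append the address the proxy saw.
    return f"{existing}, {peer_ip}"


def claimed_client(xff: str) -> str:
    """The leftmost entry, which servers that trust the header treat as the client."""
    return xff.split(",")[0].strip()


browser_value = "203.0.113.9"  # what a browser might inject
exit_ip = "198.51.100.7"       # the Tor exit archive.is actually sees
print(claimed_client(forwarded_for(browser_value, exit_ip, overwrite=False)))  # 203.0.113.9
print(claimed_client(forwarded_for(browser_value, exit_ip, overwrite=True)))   # 198.51.100.7
```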
On Fri, Apr 01, 2016 at 06:06:18PM +0000, Yawning Angel wrote:
> I'll probably add support for other (user-configurable?) cached content providers when I have time. The archive.is person doesn't seem to want to respond to e-mail, so asking them to optionally not set X-F-F seems like it'll go absolutely nowhere.
This is some kind of meta-archive service. Their about page lists many web archives (some of them specialized):
http://timetravel.mementoweb.org/about/
http://www.mementoweb.org/guide/quick-intro/
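Memento aggregators expose a TimeGate: you request a URL with an `Accept-Datetime` header (RFC 7089) and get redirected to the closest archived copy across the archives they know about. A Python sketch that only builds such a request; the `timetravel.mementoweb.org/timegate/` endpoint is an assumption, and `memento_request` is a hypothetical name.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request

# Assumed aggregator TimeGate endpoint; the target URL is appended as-is.
TIMEGATE = "http://timetravel.mementoweb.org/timegate/"


def memento_request(url: str, when: datetime) -> Request:
    """Build (but do not send) a TimeGate request for url as of `when`."""
    # RFC 7089: Accept-Datetime is an HTTP-date, always expressed in GMT.
    accept_datetime = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return Request(TIMEGATE + url, headers={"Accept-Datetime": accept_datetime})


req = memento_request("http://example.com/",
                      datetime(2016, 4, 2, 17, 0, 10, tzinfo=timezone.utc))
print(req.full_url)
# urllib normalizes header names with str.capitalize(), hence "Accept-datetime".
print(req.get_header("Accept-datetime"))  # Sat, 02 Apr 2016 17:00:10 GMT
```

Unlike archive.is, whether an aggregator forwards anything identifying to the origin would still need checking before pointing Tor users at it.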