Hi all,
I'm looking for a guide on how to take an existing service and convert it into an .onion-too service (what Facebook and ProPublica did).
Problem: web services tend to respond to a single URL only (like http://clearservice.com/) and won't deal gracefully with requests to http://onionservice.onion/; i.e. they might redirect to the clearservice address.
ProPublica has published their nginx setup to deal with this, but it looks a bit scary.
For one, they don't seem to rewrite protocol-relative URLs like href="//sub.clearservice.com/".
And then there is the general question of ensuring that no clearnet URL escapes the rewriting. I guess for that, you'd need to implement a more thorough link checker and not just some nginx filter rules.
Thoughts on that?
(It also occurred to me that you don't actually need to be the clearservice org to set up an onion for them, as long as no https is enforced/needed on the onion side.)
Andreas
Andreas Krey a.krey@gmx.de wrote:
> I'm looking for a guide on how to take an existing service and convert it into an .onion-too service (what Facebook and ProPublica did).
> Problem: web services tend to respond to a single URL only (like http://clearservice.com/) and won't deal gracefully with requests to http://onionservice.onion/; i.e. they might redirect to the clearservice address.
> ProPublica has published their nginx setup to deal with this, but it looks a bit scary.
> For one, they don't seem to rewrite protocol-relative URLs like href="//sub.clearservice.com/".
> And then there is the general question of ensuring that no clearnet URL escapes the rewriting. I guess for that, you'd need to implement a more thorough link checker and not just some nginx filter rules.
> Thoughts on that?
Here's a brief description of the onion services that are currently being tested for freiheitsfoo: http://74i677egkh3zlk6r.onion/pmwiki.php?n=Main.Freiheitsfoo%dcberTor-Unsere...
The page is partly in German, but I assume that's not a problem for you.
Fabian
Hey all,
> I'm looking for a guide on how to take an existing service and convert it into an .onion-too service (what Facebook and ProPublica did).
Part of why I wrote [1] *is* because I was having just this problem. (There aren't many guides on providing onion access to clearnet sites, and I found the knowledge spread out around the internet: subdomains, vanity onion domains, what to do about hosting onion services in production, and so on.)
It was my first stab at it, and I hope that now that we've published it, people are talking about it, and this mailing list exists, the guide situation will get better.
> Problem: web services tend to respond to a single URL only (like http://clearservice.com/) and won't deal gracefully with requests to http://onionservice.onion/; i.e. they might redirect to the clearservice address.
Before settling on a proxy, I thought of the ways I could maybe handle this.
1) Update your application to generate .onion URIs when it sees that a request is coming from the onion service.
2) An HTTP proxy at the onion service rewrites your application's responses to turn your clearnet URIs into onion URIs.
3) Rely on the client application to rewrite the URIs, e.g. with something like the darkweb-everywhere extension (or similar https-everywhere ruleset manipulation), or with something yet to be decided on and built; see [2].
#1 was a non-starter because of the size of our site (years and years of content) and the issue of CMS maintenance (I'm not the one who does it, nor did I want to become the person to hack it and own it going forward).
The non-ubiquity (or non-existence) of #3 makes it undesirable, since we want all of our onion users to get the same clearnet-avoidance. (But FWIW, I very quietly submitted a pull request to add our onion site to darkweb-everywhere, many months before it became publicly known.)
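(For the curious, a darkweb-everywhere rule is just an HTTPS Everywhere-style ruleset. Sketched from memory rather than copied from the actual repo, ours has roughly this shape:)

<ruleset name="ProPublica (onion)">
  <target host="www.propublica.org" />
  <rule from="^https?://www\.propublica\.org/"
        to="http://www.propub3r6espa33w.onion/" />
</ruleset>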
So, #2 it is: proxying. (Looks like the freiheitsfoo config does the same thing.) A bonus with #2 is that it addresses the concern about a service that only responds properly to one domain: the proxy sets the upstream "Host" HTTP header to the expected clearnet domain (ours sets it to "www.propublica.org"), so the original application behaves normally. Keeps your application code simple.
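(Stripped to its core, the #2 approach in nginx looks something like the sketch below; domain names are placeholders and all of the real-world complications from our gist are omitted. subs_filter comes from the third-party ngx_http_substitutions_filter_module.)

server {
    listen 127.0.0.1:8080;                          # tor forwards the onion's port 80 here
    location / {
        proxy_pass https://www.clearservice.com;
        proxy_set_header Host www.clearservice.com; # backend sees the domain it expects
        proxy_set_header Accept-Encoding "";        # keep upstream bodies uncompressed for rewriting
        subs_filter (http:|https:)?//(www\.)?clearservice\.com/ //someonionaddress.onion/ gir;
    }
}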
> ProPublica has published their nginx setup to deal with this, but it looks a bit scary.
> For one, they don't seem to rewrite protocol-relative URLs like href="//sub.clearservice.com/".
The config in question is [3]. Yep, admittedly it is a bit scary; the sheer amount of content rewriting is possibly a bit dangerous. (Ours is particularly complicated because I was worried about rewriting things that shouldn't be rewritten, and because of some inconsistent cases where the same thing could live at multiple domain-name variants. I like freiheitsfoo's if you want a simpler version of the same technique, though they don't handle protocol-relative URIs on input, presumably because their clearnet site doesn't use them; there are other very minor differences too.)
I'll note that our config *does* try to rewrite protocol-relative URIs; it could be simpler and the config grouped a bit better. But for any of our domains, the regular expressions attempt to rewrite all three cases (http, https, and protocol-relative) of clearnet to protocol-relative onion, i.e. this line and others like it:
subs_filter (http:|https:)?//(www\.)?propublica\.org/ //www.propub3r6espa33w.onion/ gir;
(I chose protocol-relative onion since we're currently testing SSL with a self-signed cert while we work on getting an EV SSL certificate for our onion site.)
The rules are a bit worse than that because we have some inconsistent conventions around our static assets in Amazon S3 (CloudFront CDN domain vs. s3.amazonaws.com/<bucketname> vs. <bucketname>.s3.amazonaws.com, etc.) and have many domains and things like that. (And now that I look, the gist is slightly messier and out of date too. Will see about cleaning up the publicly posted version.) But I think this is maybe a point worth making, too: for a large enough general-purpose clearnet site, rewrite rules like these are going to evolve to become specific to your site, your application, and your users.
> And then there is the general question of ensuring that no clearnet URL escapes the rewriting. I guess for that, you'd need to implement a more thorough link checker and not just some nginx filter rules.
In theory, *maybe* we'd want something smarter than nginx with the substitutions filter. A limitation of that module is that it can't rewrite strings inside the HTTP headers; we can only add or remove headers. But otherwise it does the job well.
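(One partial workaround: stock nginx's proxy_redirect directive does rewrite the Location and Refresh response headers, so HTTP redirects at least can be pointed back into onionspace; placeholder names again:)

proxy_redirect https://www.clearservice.com/ http://someonionaddress.onion/;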
For something that has to play the role of HTTP server or proxy, it'd have to be pretty performant, and nginx fits the bill quite well. (And I wonder what would accomplish this better than some smart regular expressions against the partial URI or hostname? Something to test your site's content after the fact would be nice, though wouldn't that also rely on some string/regex matching to see what was missed?)
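(A crude after-the-fact check could be as simple as fetching a few key pages over Tor and grepping for leftover clearnet hostnames; torsocks, the URL, and the pattern here are assumptions to adapt for your own site:)

torsocks wget -qO- http://www.propub3r6espa33w.onion/ \
  | grep -Eo '(https?:)?//[a-z0-9.-]*propublica\.org[^" ]*' \
  | sort -u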
And last: for our site (and I'm sure many others), outbound links (and even some assets/multimedia/etc.) are always going to be a problem, since not all of the content we reference or use, and not all of the partners we work with, have onion sites. Such is life for clearnet websites navigating the space between "pure" clearnet and "pure" onionspace.
Anyway, sorry for the long reply here.
Very interested to hear others' thoughts about this sort of proxying and keeping onion-users in onion space.
Best,
Mike Tigas
News Applications Developer, ProPublica
https://www.propublica.org/
@mtigas | https://mike.tig.as/ | 0x6E0E9923
[1]: https://www.propublica.org/nerds/item/a-more-secure-and-anonymous-propublica...
[1-onion]: http://www.propub3r6espa33w.onion/nerds/item/a-more-secure-and-anonymous-pro...
[2]: https://lists.torproject.org/pipermail/tor-talk/2016-January/039899.html
[3]: https://gist.github.com/mtigas/9a7425dfdacda15790b2#file-2-nginx
On Thu, 28 Jan 2016 03:19:18 +0000, Mike Tigas wrote: ...
> I'll note that our config *does* try to rewrite protocol-relative URIs; it could be simpler and the config grouped a bit better. But for any of our domains, the regular expressions attempt to rewrite all three cases (http, https, and protocol-relative) of clearnet to protocol-relative onion, i.e. this line and others like it:
> subs_filter (http:|https:)?//(www\.)?propublica\.org/ //www.propub3r6espa33w.onion/ gir;
Yeah, me reading regexps... But it modifies all the URLs, even outside hrefs and in plain text. I don't know whether that's bad.
...
> For something that has to play the role of HTTP server or proxy, it'd have to be pretty performant, and nginx fits the bill quite well.
I may look into proxying in Go sometime.
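(If I do, a minimal starting point, untested and with placeholder names, might be net/http/httputil's ReverseProxy with a body-rewriting hook:)

// minimal sketch, untested: a rewriting reverse proxy in Go.
// clearservice.com and someonionaddress.onion are placeholders.
package main

import (
    "bytes"
    "io"
    "log"
    "net/http"
    "net/http/httputil"
    "net/url"
    "strconv"
    "strings"
)

func main() {
    upstream, err := url.Parse("https://www.clearservice.com")
    if err != nil {
        log.Fatal(err)
    }
    proxy := httputil.NewSingleHostReverseProxy(upstream)

    director := proxy.Director
    proxy.Director = func(r *http.Request) {
        director(r)
        r.Host = "www.clearservice.com" // the Host header the backend expects
        r.Header.Del("Accept-Encoding") // keep bodies uncompressed so we can rewrite them
    }

    // rewrite clearnet references to the onion address in textual bodies
    proxy.ModifyResponse = func(resp *http.Response) error {
        if !strings.HasPrefix(resp.Header.Get("Content-Type"), "text/") {
            return nil
        }
        body, err := io.ReadAll(resp.Body)
        resp.Body.Close()
        if err != nil {
            return err
        }
        body = bytes.ReplaceAll(body, []byte("clearservice.com"), []byte("someonionaddress.onion"))
        resp.Body = io.NopCloser(bytes.NewReader(body))
        resp.ContentLength = int64(len(body))
        resp.Header.Set("Content-Length", strconv.Itoa(len(body)))
        return nil
    }

    // tor's HiddenServicePort would point at this listener
    log.Fatal(http.ListenAndServe("127.0.0.1:8080", proxy))
}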
...
> Anyway, sorry for the long reply here.
No worries. I can press 'd' anytime, and this text may well go into the config gist.
> Very interested to hear others' thoughts about this sort of proxying and keeping onion-users in onion space.
I have two proto-projects: one is to externally onionify our publicly visible bug tracker, and the other is a bunch of Raspberry Pis where some pages point to each other. Making it so that the onions reliably refer to the onions is... interesting.
Anyway, thanks for the responses. I was afraid that I was simply too early on the new list to get anything at all. :-)
Andreas
So, for fun, a real conversation that happened about an hour after we launched:
- "This is great - the box hosting the Onion is only 4.6% busy!" - "That's not so good." - "Wat?" - "The box has 20 cores and Tor is basically single-threaded." - "Oh. Right."
...i.e. at 4.6% of 20 cores we were about 92% busy on the one core Tor could use, but everything worked out okay in the end. :-)
We launched with totally stock, unmodified Tor 0.2.6 code and ran it for a year.
This was adequately performant, though the user experience was quite affected by latency.
For clarity's sake: we actually run daemons for three onion addresses - one serves the "www" role, another is "cdn", and the third is "sbx", for uploads.
The basic maths is right: we actually run 3 addresses x 2 datacentres = 6 daemons, all on separate hardware, so that the servers running the Tor daemons don't have to think about very much at all. Mailing-list software tends to swallow attachments, so instead there's a diagram of how this is all laid out in my (slightly out of date) notes at https://storify.com/AlecMuffett/tor-tips
All the Tor daemons have to do is pass HTTPS traffic outbound to a VIP which fans out to our SSL termination tier.
About halfway through the year the onion site was impacted by scheduled DR-testing; this led to a "how do we fix this?" discussion, and we decided "why not just run two copies of each onion, in separate datacentres? what's the worst that could happen?" - and that's what we still do.
Running replica daemons seems to mostly work. People receive and cache a descriptor for one or the other datacentre and then use it for a while, yielding a coarse load-balancing effect. If one goes offline, the other eventually picks up the slack.
A few months ago, we integrated RSOS (Rendezvous Single Onion Service) support into 0.2.6.10 (thanks, teor!) and deployed it.
With RSOS, the latency issues in the UX are much reduced, and it's arguable that the onion site runs as-fast-as-or-perhaps-marginally-faster-than the site accessed over normal Tor. The argument for why the onion site might be a little faster is essentially "same number of hops, but no resource contention for exit-node usage". That's a general sense from usage rather than a scientific claim of performance.
On 28 Jan 2016, at 03:19, Mike Tigas mike@tig.as wrote:
...
> Before settling on a proxy, I thought of the ways I could maybe handle this.
> 1) Update your application to generate .onion URIs when it sees that a request is coming from the onion service.
This is what we do: when a request inbound to our reverse-proxy tier (see "proxygen" in the diagram linked in the storify above) is sourced from the servers which handle the Tor daemons, the inbound "Host:" headers are rewritten from "onionaddress.onion" to "sitename.com" (preserving the subdomain), and an extra "magic" header is injected to denote that the responses to this request need "Onionification".
Then, when the request is actually handled by the web tier, essentially everything proceeds as normal for the rest of the site. When a URI/cookie/JS is being rendered to send back to the requester, the "magic" header is checked for and (if found) the ".onion" TLD is used rather than the ".com" one. There are a couple of gotchas - e.g. don't onionify URIs which are used for internal data fetches necessary to serve the request - but generally the code is remarkably straightforward.
It's simply like serving a different TLD in a consistent manner.
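(To make that concrete, here is an illustrative sketch in Go of the web-tier half; the header name and both domains are inventions for this example, not what we actually run:)

// illustrative sketch only: "magic header" TLD selection at the web tier.
// X-Onionify and the domains are made up; the real header name isn't public.
package main

import (
    "fmt"
    "log"
    "net/http"
)

// baseDomain picks the domain rendered into outbound URLs/cookies for this
// request, based on the header injected by the reverse-proxy tier.
func baseDomain(r *http.Request) string {
    if r.Header.Get("X-Onionify") != "" {
        return "someonionaddress.onion"
    }
    return "sitename.com"
}

func handler(w http.ResponseWriter, r *http.Request) {
    // everything else about serving the request proceeds as normal;
    // only outbound URL generation consults baseDomain.
    fmt.Fprintf(w, `<a href="https://www.%s/about">About</a>`, baseDomain(r))
}

func main() {
    log.Fatal(http.ListenAndServe("127.0.0.1:8000", http.HandlerFunc(handler)))
}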
> 2) An HTTP proxy at the onion service rewrites your application's responses to turn your clearnet URIs into onion URIs.
This was what we did as a proof of concept; I fired up an instance, built "mitmproxy" (mitmproxy.org) on it, and did something like:
# configure Tor to forward Hidden Service to localhost:443
# for inspiration only, this probably won't work/is a bad idea/go read the manual
SITE="domain.com"
ONION="somekindofonion.onion"
mitmproxy -p 443 -P "https://www.${SITE}" --anticache \
  --replace ":~hq:${ONION}:${SITE}" \
  --replace ":~hs:${SITE}:${ONION}" \
  --replace ":~bs ~t \"application/json\":${SITE}:${ONION}" \
  --replace ":~bs ~t \"application/x-javascript\":${SITE}:${ONION}" \
  --replace ":~bs ~t \"text/css\":${SITE}:${ONION}" \
  --replace ":~bs ~t \"text/html\":${SITE}:${ONION}" \
  --replace ":~s:${SITE}:${ONION}"
...the idea being to listen on port 443 locally, connecting that onwards to the backend site.
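(The Tor side of that PoC is just the standard two torrc lines, e.g.:)

HiddenServiceDir /var/lib/tor/onion_poc/
HiddenServicePort 443 127.0.0.1:443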
It took an afternoon to test, and I was impressed how much stuff "just worked". Maybe 90% of the site.
Given our volumes, we felt it would not be viable for us to do rewriting on every request, hence fixing the codebase à la "solution 1" above. YMMV.
Someday I would like to adapt WordPress to support "solution 1"-style rewriting; WordPress strikes me as a good target platform for onion-enablement. Maybe a plugin would work.
== Increasing Aggregate Bandwidth ==
So, at the moment, we are running 2x daemons with RSOS per onion.
Upcoming plans are approximately:
1) Build an "RSOS Onionbalance Appliance": 10 RSOS daemons with random-ish onion addresses on a single box (use more of those cores!), wrapped in onionbalance to publish a unified descriptor for them; a config sketch follows after this list. Deploy and test that.
2) Deploy a second replica Onionbalance Appliance for coarse load-balancing/failover.
3) Build an RSOS Onionbalance-NG cluster: up to 60 daemons across several servers, using the new UCL-inspired upcoming OnionBalance features to publish up to 6 distinct descriptors at different points in the HSDir. (NB: 6 descriptors * 10 intro points per descriptor = the 60-daemon limit.)
4) Talk more to Tom about Rendezvous-Callback Handoff, and integrate that into one of the above architectures when it eventually lands.
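(For step 1, the onionbalance side is roughly a config.yaml mapping a master key to the backend instance addresses. A sketch from memory of the documented format, with invented addresses; check the onionbalance docs for the real thing:)

services:
- key: /etc/onionbalance/master.key
  instances:
  - address: backendonion1
  - address: backendonion2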
—
Alec Muffett
Security Infrastructure, Facebook Engineering, London
On 28 Jan 2016, at 00:06, Andreas Krey a.krey@gmx.de wrote:
> (It also occurred to me that you don't actually need to be the clearservice org to set up an onion for them, as long as no https is enforced/needed on the onion side.)
Yes, which is a bit of a security nightmare. Malicious onion sites proxying clearnet or onion sites is a known issue.
There was a post on tor-talk about it recently: https://lists.torproject.org/pipermail/tor-talk/2016-January/040038.html
Tim
Tim Wilson-Brown (teor)
teor2345 at gmail dot com PGP 968F094B
teor at blah dot im OTR CAD08081 9755866D 89E2A06F E3558B7F B5A9D14F