Re: [tor-onions] web service onionification

28 Jan 2016


> Hey all,
> ...
> I'm looking for a guide on how to take an existing service
> and convert it into an .onion-too service (what Facebook
> and ProPublica did).
Part of why I wrote [1] *is* because I was having just this problem.
(Not many guides on providing onion access to clearnet sites and I found
a lot of knowledge spread out around the internet, like subdomains and
vanity onion domains and what to do about hosting onion services in
production.)
It was my first stab at it, and now that we've published it, people are
talking about it, and this mailing list exists, I hope the guide
situation will get better.
...
> Problem: Webservices tend to respond to a single URL
> only (like http://clearservice.com/), and won't deal
> gracefully with requests to http://onionservice.onion/;
> i.e., they might redirect to the clearservice address.
Before settling on a proxy, I thought of the ways I could maybe handle this.
1) You update your application to generate .onion URIs when it sees
that a request is coming from the onion service.
2) An HTTP proxy at the onion service rewrites your application's
responses to turn your clearnet URIs into onion URIs.
3) Rely on the client application to rewrite the URIs, such as with
the darkweb-everywhere extension (or similar https-everywhere ruleset
manipulation) or something yet to be decided on and built; see [2].
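For comparison, option #1 boils down to choosing the link base per request. Here is a minimal sketch. It assumes the application can see the incoming Host header, and it uses the hypothetical hostnames clearservice.com and onionservice.onion:

```python
# Hypothetical hostnames, for illustration only.
CLEARNET_BASE = "http://clearservice.com"
ONION_BASE = "http://onionservice.onion"


def base_url_for(request_host: str) -> str:
    """Return the base URL to use for links, based on the request's Host header."""
    # Strip any port and normalize case before checking the TLD.
    host = request_host.split(":", 1)[0].lower()
    return ONION_BASE if host.endswith(".onion") else CLEARNET_BASE


def absolute_link(request_host: str, path: str) -> str:
    """Build an absolute link that keeps the visitor on the side they came from."""
    return base_url_for(request_host) + path
```

Every place the application emits an absolute URL would have to go through a helper like this.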
#1 was a non-starter because of the size of our site (years and years of
content) and the issue of CMS maintenance (I'm not the one who does it,
nor did I want to become the person to hack it and own it going forward).
Relying on #3 is not desirable given its non-ubiquity (or non-existence),
since we want all of our onion users to get the same clearnet avoidance.
(But FWIW, I very quietly submitted a pull request to add our onion site
to darkweb-everywhere, many months before it became publicly known.)
So, #2, proxying, it is. (Looks like the freiheitsfoo config does the same
thing.) A bonus with #2 is that it addresses the concern about a service
that only responds properly to one domain: the proxy sets the upstream
"Host" HTTP header to the expected clearnet domain (ours sets it to
"www.propublica.org"), so the original application acts normally. This
keeps your application code simple.
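In nginx, the Host override would typically be a `proxy_set_header Host www.propublica.org;` line. The same idea is shown below as a small Python sketch for anyone writing their own proxy. The helper name `upstream_headers` is hypothetical:

```python
UPSTREAM_HOST = "www.propublica.org"  # the clearnet name the application expects


def upstream_headers(client_headers: dict[str, str]) -> dict[str, str]:
    """Copy request headers for the upstream, forcing the clearnet Host."""
    # Header names are case-insensitive, so drop any existing Host variant.
    headers = {k: v for k, v in client_headers.items() if k.lower() != "host"}
    headers["Host"] = UPSTREAM_HOST
    return headers
```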
...
> ProPublica has published their nginx setup to deal
> with this, but this looks a bit scary.
> For one, they don't seem to rewrite protocol-relative
> URLs like href="//sub.clearservice.com/".
The config in question is [3]. Yep, admittedly it is a bit scary; the
sheer amount of content rewriting is possibly a bit dangerous. (Ours is
particularly complicated because I was worried about rewriting things
that shouldn't be rewritten, and because of some inconsistent cases where
the same thing could live at multiple domain name variants. If you want a
simpler version of the same technique, I like freiheitsfoo's, though it
doesn't handle protocol-relative URIs on input, since I assume their
clearnet site doesn't use them, plus a few other very minor differences.)
I'll note that our config *does* try to rewrite protocol-relative URIs;
it could be simpler and the config grouped a bit better. But for any of
our domains, the regular expressions attempt to rewrite all three cases
of http/https/proto-relative clearnet to protocol-relative onion, i.e.
this line and others like it:
    subs_filter (http:|https:)?//(www.)?propublica.org/ //www.propub3r6espa33w.onion/ gir;
(Chose protocol-relative onion since we're currently testing SSL out
with a self-signed cert as we work on getting an EV SSL certificate for
our onion site.)
The rules are a bit worse than that because we have some inconsistent
conventions around our static assets in Amazon S3 (cloudfront CDN domain
vs s3.amazonaws/<bucketname> vs <bucketname>.s3.amazonaws etc) and have
many domains and things like that. (And now that I look, the gist is
slightly messier and out of date too. I'll see about cleaning up the
publicly posted version.) But I think maybe this is a point to bring up,
too: for a large enough general-purpose clearnet site, any rewrite rules
like this are perhaps going to evolve to become more specific to your
site, your application, and your users.
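To experiment with the rule outside nginx, here is a rough Python equivalent of the subs_filter line above. It is a sketch, not the production config. The `g`, `i` and `r` flags map to replace-all, case-insensitive and regex matching. The dots here are escaped so that `.` only matches a literal dot; the nginx rule leaves them unescaped.

```python
import re

# Mirrors the subs_filter rule: optional "http:"/"https:", then "//",
# optional "www.", then the clearnet domain and a trailing slash.
CLEARNET_RE = re.compile(r"(?:https?:)?//(?:www\.)?propublica\.org/", re.IGNORECASE)
ONION_PREFIX = "//www.propub3r6espa33w.onion/"


def rewrite_to_onion(body: str) -> str:
    """Replace every matching clearnet prefix with the protocol-relative onion one."""
    return CLEARNET_RE.sub(ONION_PREFIX, body)


sample = (
    '<a href="http://www.propublica.org/a">'
    '<a href="HTTPS://propublica.org/b">'
    '<a href="//www.propublica.org/c">'
    '<a href="//projects.propublica.org/d">'
)
print(rewrite_to_onion(sample))
```

The last link is left untouched. Any other subdomain needs its own rule, which is part of why these configs grow site-specific.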
...
> And then there generally is the question of ensuring
> that no clearnet URL escapes the rewriting. I guess
> for that, you'd need to implement a more thorough
> link checker and not just some nginx filter rules.
In theory, *maybe* we'd want something smarter than nginx with the
substitutions filter. A limitation of that module is that we have no
ability to rewrite strings inside the HTTP headers, we can only
add/remove headers. But otherwise it does the job well.
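Header rewriting is the piece subs_filter can't do. nginx's separate `proxy_redirect` directive covers the `Location` and `Refresh` headers, but not arbitrary ones. If a custom proxy were ever written, the header pass might look roughly like the sketch below. It reuses the body-rewrite regex, and the set of URL-bearing header names is a hypothetical choice:

```python
import re

# Same pattern as the body rewrite; hostnames as in the config above.
CLEARNET_RE = re.compile(r"(?:https?:)?//(?:www\.)?propublica\.org/", re.IGNORECASE)
ONION_PREFIX = "//www.propub3r6espa33w.onion/"

# Hypothetical choice of response headers that commonly carry URLs.
URL_HEADERS = {"location", "content-location", "refresh", "link"}


def rewrite_headers(headers: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Rewrite clearnet URLs inside URL-bearing response headers only."""
    out = []
    for name, value in headers:
        if name.lower() in URL_HEADERS:
            value = CLEARNET_RE.sub(ONION_PREFIX, value)
        out.append((name, value))
    return out
```

Headers outside that set, such as Set-Cookie with a Domain attribute, are passed through unchanged by this sketch and would need their own handling.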
For something that has to play the role of HTTP server or proxy, it'd
have to be pretty performant, and nginx fits the bill quite well. (And I
wonder what would accomplish this better than some smart regular
expressions against the partial URI or hostname? Although something to
test your site's content after the fact would be nice -- though wouldn't
this also rely on some string/regex matching to see what was missed?)
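An after-the-fact check does still need some matching, but it can at least parse the HTML instead of regex-matching the raw text. Below is a rough sketch using Python's standard `html.parser`. It collects URL-bearing attributes and splits absolute links into two groups: links to domains you own (missed rewrites) and everything else (outbound links). The owned-domain list is an input you'd supply. `srcset`, inline CSS and JavaScript are not covered.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Attributes that commonly hold URLs; srcset, inline CSS and JS are not covered.
URL_ATTRS = {"href", "src", "action", "poster"}


class LinkCollector(HTMLParser):
    """Collect raw URL values from tags in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in URL_ATTRS and value:
                self.urls.append(value)


def audit(html: str, owned_domains: list[str]) -> tuple[list[str], list[str]]:
    """Return (missed_rewrites, outbound) absolute links found in html."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    missed, outbound = [], []
    for url in parser.urls:
        host = urlsplit(url).hostname or ""
        # Relative links and onion links are fine as they are.
        if not host or host.endswith(".onion"):
            continue
        if any(host == d or host.endswith("." + d) for d in owned_domains):
            missed.append(url)
        else:
            outbound.append(url)
    return missed, outbound
```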
And last: for our site (and I'm sure many other sites), outbound links
(and even some assets/multimedia/etc.) are always going to be a problem,
since not all of the content we reference or use, and not all of the
partners we work with, have onion sites. Such is life on clearnet
websites, and when navigating the space between "pure" clearnet and
"pure" onionspace.
Anyway, sorry for the long reply here.
Very interested to hear others' thoughts about this sort of proxying and
keeping onion-users in onion space.
Best,
Mike Tigas
News Applications Developer, ProPublica
https://www.propublica.org/
@mtigas | https://mike.tig.as/ | 0x6E0E9923
[1]: https://www.propublica.org/nerds/item/a-more-secure-and-anonymous-propublica...
[1-onion]: http://www.propub3r6espa33w.onion/nerds/item/a-more-secure-and-anonymous-pro...
[2]: https://lists.torproject.org/pipermail/tor-talk/2016-January/039899.html
[3]: https://gist.github.com/mtigas/9a7425dfdacda15790b2#file-2-nginx