So, for fun, a real conversation that happened about an hour after we launched:

- "This is great - the box hosting the Onion is only 4.6% busy!"
- "That's not so good."
- "Wat?"
- "The box has 20 cores and Tor is basically single-threaded." 
- "Oh. Right."

...i.e. the one core that Tor was actually using was about 92% busy, but everything worked out okay in the end. :-)
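
The arithmetic behind that exchange, as a rough sketch. It assumes the 4.6% figure is an average across all 20 cores and that essentially all of the load lands on one core; the function name is made up for illustration.

```python
def busiest_core_percent(box_busy_percent: float, cores: int) -> float:
    """Load on a single core if all of the box's work runs on that one core.

    Assumption: box_busy_percent is averaged over every core in the box.
    The result is capped at 100 because one core cannot be busier than that.
    """
    return min(100.0, box_busy_percent * cores)


# 4.6% of a 20-core box, all of it on one core: about 92
print(round(busiest_core_percent(4.6, 20), 1))
```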

We launched with totally stock, unmodified Tor 0.2.6 code and ran it for a year. 

This was adequately performant, though the user experience was quite affected by latency. 

For clarity's sake, we actually run daemons for three onion addresses - one serves the "www" role, another is "cdn", and the third is "sbx", which handles uploads.

The basic maths is right: we run 3 addresses x 2 datacentres = 6 daemons, all on separate hardware, so that the servers that run the tor daemons don't have to think about very much at all. Mailing-list software tends to swallow attachments, so instead there's a diagram of how this is all laid out in my (slightly out of date) notes at https://storify.com/AlecMuffett/tor-tips

All the Tor daemons have to do is pass HTTPS traffic outbound to a VIP which fans out to our SSL termination tier.

About halfway through the year the onion site was impacted by scheduled disaster-recovery (DR) testing; this led to a "how do we fix this?" discussion, and we decided "why not just run two copies of each onion, in separate datacentres? what's the worst that could happen?" - and that's what we still do. 

Running replica daemons seems to mostly work. People receive and cache a descriptor for one-or-other datacentre, and then use it for a while, yielding a coarse load-balancing effect.  If one goes offline, the other eventually picks up the slack.

A few months ago, we integrated RSOS (Rendezvous Single Onion Services) into Tor 0.2.6.10 (thanks, teor!) and deployed it. 

With RSOS the latency issues in the UX are much-reduced, and it's arguable that the onion site runs as-fast-as-or-perhaps-marginally-faster-than the site when accessed over normal Tor. The argument for why the onion site might be a little faster is essentially "same number of hops but no resource-contention for exit-node-usage". It's a general sense from usage rather than some scientific claim of performance.


On 28 Jan 2016, at 03:19, Mike Tigas <mike@tig.as> wrote:
...
> Before settling on a proxy, I thought of the ways I could maybe handle this.
>
> 1) You update your application to generate .onion URIs when it sees
> that a request is coming from the onion service.


This is what we do; when a request is inbound to our reverse-proxy tier (see "proxygen" in the diagram linked in the storify above) and is sourced from the servers which handle the Tor daemons, inbound "Host:" headers are rewritten from "onionaddress.onion" to "sitename.com" (preserving the subdomain) and an extra "magic" header is injected to denote that the responses to this request need "Onionification".

Then when the request is actually handled by the web tier, essentially everything proceeds as normal for the rest of the site. When a URI/cookie/JS is being rendered to send back to the requester, the "magic" header is checked-for and (if found) the ".onion" TLD is used, rather than the ".com" one.  There are a couple of gotchas - eg: don't onionify URIs which are used for internal data fetches necessary to serve the request - but generally the code is remarkably straightforward.

It's simply like serving a different TLD in a consistent manner.
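
As a rough illustration of those two steps (this is not our production code): the sketch below assumes a hypothetical header name "X-Onionify", uses the placeholder domains from the paragraphs above, and invents the function names. Dropping any client-supplied copy of the magic header is my own assumption about what a sensible proxy would do.

```python
from urllib.parse import urlsplit, urlunsplit

ONION_DOMAIN = "onionaddress.onion"  # placeholder from the text above
SITE_DOMAIN = "sitename.com"         # placeholder from the text above
MAGIC_HEADER = "X-Onionify"          # hypothetical header name


def _is_under(host, domain):
    """True if host is the domain itself or a subdomain of it."""
    return host == domain or host.endswith("." + domain)


def rewrite_inbound(headers, from_tor_tier):
    """Reverse-proxy step: map the onion Host: to the normal one and tag the request."""
    out = dict(headers)
    out.pop(MAGIC_HEADER, None)  # assumption: never trust a client-supplied copy
    host = out.get("Host", "")
    if from_tor_tier and _is_under(host, ONION_DOMAIN):
        # Swap only the domain suffix, so the subdomain is preserved.
        out["Host"] = host[: -len(ONION_DOMAIN)] + SITE_DOMAIN
        out[MAGIC_HEADER] = "1"
    return out


def render_url(url, headers, internal=False):
    """Web-tier step: emit the .onion form of a URL for tagged requests only."""
    if internal or headers.get(MAGIC_HEADER) != "1":
        return url  # internal data fetches and normal requests stay on .com
    parts = urlsplit(url)
    host = parts.hostname or ""
    if not _is_under(host, SITE_DOMAIN):
        return url
    new_host = host[: -len(SITE_DOMAIN)] + ONION_DOMAIN
    netloc = new_host if parts.port is None else f"{new_host}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


request = rewrite_inbound({"Host": "www.onionaddress.onion"}, from_tor_tier=True)
print(request["Host"])
print(render_url("https://cdn.sitename.com/a.js?x=1", request))
```

The "internal" flag stands in for the gotcha mentioned above: URIs used for internal fetches while serving the request are left alone.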


> 2) An HTTP proxy at the onion service rewrites your application's
> responses to turn your clearnet URIs into onion URIs.


This was what we did as a proof of concept; I fired up an instance and built "mitmproxy" (mitmproxy.org) on it, and did something like:

  # configure Tor to forward Hidden Service to localhost:443
  # for inspiration only, this probably won't work/is a bad idea/go read the manual
  SITE="domain.com"
  ONION="somekindofonion.onion"
  mitmproxy -p 443 -P "https://www.${SITE}" --anticache \
    --replace ":~hq:${ONION}:${SITE}" \
    --replace ":~hs:${SITE}:${ONION}" \
    --replace ":~bs ~t \"application/json\":${SITE}:${ONION}" \
    --replace ":~bs ~t \"application/x-javascript\":${SITE}:${ONION}" \
    --replace ":~bs ~t \"text/css\":${SITE}:${ONION}" \
    --replace ":~bs ~t \"text/html\":${SITE}:${ONION}" \
    --replace ":~s:${SITE}:${ONION}"

...the idea being to listen on port 443 locally and connect that onwards to the backend site.

It took an afternoon to test, and I was impressed how much stuff "just worked".  Maybe 90% of the site.

Given our volumes we felt it would not be viable for us to do rewriting for every request, hence fixing the codebase a-la "solution-1", above.  YMMV.

Some day I would like to adapt WordPress to support such "solution-1" rewriting.  WordPress strikes me as a good target platform for Onion-enablement.  Maybe a plugin would work.

== Increasing Aggregate Bandwidth ==

So, at the moment, we are running 2x daemons with RSOS per onion.

Upcoming plans are approximately:

1) build an "RSOS Onionbalance Appliance" - 10 RSOS daemons, with random-ish Onion addresses, on a single box (use more of those cores!) and wrap them in onionbalance to publish a unified descriptor for them. Deploy and test that.

2) deploy a second replica Onionbalance Appliance for coarse loadbalancing/failover.

3) build an RSOS Onionbalance-NG cluster - up to 60 daemons across several servers, using upcoming, UCL-inspired OnionBalance features to publish up to 6 distinct descriptors at different points in the HSDir. (NB: 6 descriptors * 10 IntroPoints per descriptor = 60 daemon limit)

4) Talk more to Tom about Rendezvous-Callback Handoff, and integrate that into one of the above architectures when it eventually lands.
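
The 60-daemon limit in step 3 is just descriptors multiplied by introduction points. A small sketch, assuming each daemon contributes exactly one introduction point to exactly one descriptor; the function and the daemon names are invented for illustration and are not OnionBalance's API.

```python
def group_into_descriptors(daemons, max_descriptors=6, intro_points_per_descriptor=10):
    """Split daemons into descriptor-sized groups, refusing to exceed the limit.

    Assumption: one daemon supplies one introduction point in one descriptor.
    """
    limit = max_descriptors * intro_points_per_descriptor
    if len(daemons) > limit:
        raise ValueError(f"{len(daemons)} daemons exceeds the limit of {limit}")
    size = intro_points_per_descriptor
    return [daemons[i:i + size] for i in range(0, len(daemons), size)]


daemons = [f"daemon-{i:02d}" for i in range(60)]  # hypothetical names
descriptors = group_into_descriptors(daemons)
print(len(descriptors), [len(group) for group in descriptors])
```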

Alec Muffett
Security Infrastructure
Facebook Engineering
London