Re: [tor-dev] Proposal: Load-balancing hidden services by splitting introduction from rendezvous

3 Oct 2015

      On 02/10/15 at 14:56, Tim Wilson-Brown - teor wrote:
...
...
On 2 Oct 2015, at 14:43, Tom van der Woerdt <info@tvdw.eu> wrote:
Hi Tim,
Thanks for your great comments, very much appreciated!
Comments inline.
On 30/09/15 at 19:40, Tim Wilson-Brown - teor wrote:
...
...
On 30 Sep 2015, at 17:27, Tom van der Woerdt <info@tvdw.eu> wrote:
...
Filename: xxx-intro-rendezvous-controlsocket.txt
Title: Load-balancing hidden services by splitting introduction from
     rendezvous
Author: Tom van der Woerdt
Created: 2015-09-30
Status: draft

Overview and motivation

To address scaling concerns with the onion web, we want to be able to
spread the load of hidden services across multiple machines.
OnionBalance is a great stab at this, and it can currently give us 60x
the capacity by publishing 6 separate descriptors, each with 10
introduction points, but more is better. This proposal aims to address
hidden service scaling up to a point where we can handle millions of
concurrent connections.
The basic idea involves splitting the 'introduce' from the
'rendezvous', in the tor implementation, and adding new events and
commands to the control specification to allow intercepting
introductions and transmitting them to different nodes, which will then
take care of the actual rendezvous.
…
...
In general, I’m concerned that we need to think through the
implementation of this proposal more carefully, because it will help us
decide whether it’s compatible with:

Current Hidden Services
Next-Generation Hidden Services

And perhaps make changes to any of these proposals to make them work
together.
Thoughts welcome! I don't think I'm the right person to address those.
...
I’d also note that it’s definitely not compatible with Single Onion
Services as specified in Proposal #252, as there is no rendezvous in
that protocol.
Indeed.
Splitting the introduction and rendezvous is another use case for
NAT-punching single-rendezvous-hop onion services.
However, splitting hidden services into multiple different
implementations provides less cover for users who really need three-hop
hidden services. We’ll need to decide what the tradeoff is here.
Tim
Tim Wilson-Brown (teor)
teor2345 at gmail dot com
PGP 968F094B
teor at blah dot im
OTR CAD08081 9755866D 89E2A06F E3558B7F B5A9D14F
Had some time, implemented the rest of the proposal as well to check 
feasibility and updated the proposal with some new insights. Looking 
forward to all comments on this!
Patch 1, splitting the intro and rendezvous code (basically a no-op): 
https://github.com/TvdW/tor/commit/0443e38d776a114458e2f56e435f324c38e7a17a
Patch 2, actually implementing the proposal: 
https://github.com/TvdW/tor/commit/b8d41f66efdb856b7813c4394f8a81c82e1f2e07
Simple controller implementation that proves my proposal works: 
https://gist.github.com/TvdW/3f720b9c6ffcd71967c1
Tom
==================================================================
Filename: TBD.txt
Title: Load-balancing hidden services by splitting introduction from
        rendezvous
Author: Tom van der Woerdt
Created: 2015-09-30
Status: draft
1. Overview and motivation
To address scaling concerns with the onion web, we want to be able to
spread the load of hidden services across multiple machines.
OnionBalance is a great stab at this, and it can currently give us 60x
the capacity by publishing 6 separate descriptors, each with 10
introduction points, but more is better. This proposal aims to address
hidden service scaling up to a point where we can handle millions of
concurrent connections.
The basic idea involves splitting the 'introduce' from the
'rendezvous', in the tor implementation, and adding new events and
commands to the control specification to allow intercepting
introductions and transmitting them to different nodes, which will then
take care of the actual rendezvous. External controller code could
relay the data to another node or a pool of nodes, all of which are run by
the hidden service operator, effectively distributing the load of
hidden services over multiple processes.
By cleverly utilizing the current descriptor methods through
OnionBalance, we could publish up to sixty unique introduction points,
which could translate to many thousands of parallel tor workers after
implementing this proposal. This should allow hidden services to go
multi-threaded with a few small changes, and continue scaling for a
long time.
2. Specification
We propose two additions to the control specification, of which one is
an event and the other is a new command. We also introduce two new
configuration options.
2.1. HiddenServiceAutomaticRendezvous configuration option
The syntax is:
     "HiddenServiceAutomaticRendezvous" SP [1|0] CRLF
This configuration option is defined to be a boolean toggle which, if
zero, stops the tor implementation from automatically doing a rendezvous
when an INTRODUCE2 cell is received. Instead, an event will be sent to
the controllers. If no controllers are present, the introduction cell
should be dropped, as acting on it instead of dropping it could open a
window for a DoS.
This configuration option can be specified on a per-hidden service
level, and can be set through the controller for ephemeral hidden
services as well.
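
The decision described in 2.1 can be sketched as a small function. This
is only an illustration in Python, not the code from the patches; the
names Action and on_introduce2 are made up for this example.

```python
from enum import Enum


class Action(Enum):
    RENDEZVOUS = "rendezvous"  # tor performs the rendezvous itself
    EMIT_EVENT = "emit_event"  # hand the introduction to the controllers
    DROP = "drop"              # discard the introduction cell


def on_introduce2(automatic_rendezvous: bool, controllers_listening: int) -> Action:
    """Decide what to do with an incoming INTRODUCE2 cell (hypothetical helper)."""
    if automatic_rendezvous:
        # HiddenServiceAutomaticRendezvous 1: current behaviour, unchanged.
        return Action.RENDEZVOUS
    if controllers_listening > 0:
        # Option is 0 and someone is listening: send the INTRODUCE event.
        return Action.EMIT_EVENT
    # Option is 0 and nobody is listening: drop rather than act on the cell.
    return Action.DROP
```

Dropping in the last case mirrors the DoS concern above: with nobody to
take the introduction, the node does nothing further with it.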
2.2. HiddenServiceTag configuration option
The syntax is:
     "HiddenServiceTag" SP 1*[a-zA-Z0-9] CRLF
To identify groups of hidden services more easily across nodes, a
name/tag can be given to a hidden service. Defaults to the storage path
of the hidden service (HiddenServiceDir).
2.3. The "INTRODUCE" event
The syntax is:
     "650" SP "INTRODUCE" SP HSTag SP RendezvousData CRLF
     HSTag = the tag of the hidden service
     RendezvousData = implementation-specific, but must not contain
                      whitespace, must only contain human-readable
                      characters, and should be no longer than 2048 bytes
The INTRODUCE event should contain sufficient data to allow continuing
the rendezvous from another Tor instance. The exact format is
unspecified and left up to the implementation. It follows that
only matching versions can be used safely to coordinate the rendezvous
of hidden service connections.
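
For controller authors, a minimal Python sketch of parsing this event
line might look as follows. It assumes "human-readable characters"
means printable ASCII and treats the 2048-byte recommendation as a hard
limit; both are choices made for this example, and the names
IntroduceEvent and parse_introduce_event are hypothetical.

```python
import re
from typing import NamedTuple

MAX_RENDEZVOUS_DATA = 2048
# Assumption: "human-readable" is taken to mean printable ASCII without space.
_PRINTABLE = re.compile(r"[\x21-\x7e]+")


class IntroduceEvent(NamedTuple):
    hs_tag: str
    rendezvous_data: str


def parse_introduce_event(line: str) -> IntroduceEvent:
    """Parse '650 INTRODUCE HSTag RendezvousData' into its two fields."""
    parts = line.rstrip("\r\n").split(" ")
    if len(parts) != 4 or parts[0] != "650" or parts[1] != "INTRODUCE":
        raise ValueError("not an INTRODUCE event")
    hs_tag, data = parts[2], parts[3]
    if not hs_tag:
        raise ValueError("empty HSTag")
    if not _PRINTABLE.fullmatch(data):
        raise ValueError("RendezvousData must be printable and without whitespace")
    if len(data) > MAX_RENDEZVOUS_DATA:
        raise ValueError("RendezvousData longer than 2048 bytes")
    # The data is opaque: pass it on unchanged, do not try to interpret it.
    return IntroduceEvent(hs_tag, data)
```

The parser deliberately never looks inside RendezvousData, since its
format is implementation-specific.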
2.4. "PERFORM-RENDEZVOUS" command
The syntax is:
   "PERFORM-RENDEZVOUS" SP HSTag SP RendezvousData CRLF
This command allows a controller to perform a rendezvous using data
received through an INTRODUCE event. The format of RendezvousData is
not specified other than that it must not contain whitespace, and
should be no longer than 2048 bytes.
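
A controller could build the command line roughly like this. Again a
Python sketch under assumptions: build_perform_rendezvous is a made-up
helper, the data is assumed to be ASCII, and the 2048-byte
recommendation is enforced as a hard limit here.

```python
MAX_RENDEZVOUS_DATA = 2048


def build_perform_rendezvous(hs_tag: str, rendezvous_data: str) -> bytes:
    """Return the PERFORM-RENDEZVOUS line, ready to write to a control socket."""
    for name, value in (("HSTag", hs_tag), ("RendezvousData", rendezvous_data)):
        if not value or any(ch.isspace() for ch in value):
            raise ValueError(f"{name} must be non-empty and contain no whitespace")
    # Assumption: the data is ASCII; non-ASCII input raises UnicodeEncodeError,
    # which is a subclass of ValueError.
    if len(rendezvous_data.encode("ascii")) > MAX_RENDEZVOUS_DATA:
        raise ValueError("RendezvousData longer than 2048 bytes")
    return f"PERFORM-RENDEZVOUS {hs_tag} {rendezvous_data}\r\n".encode("ascii")
```

Rejecting whitespace before sending matters because the fields are
space-separated, so a stray space would change how the line is split.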
3. Compatibility and security
The implementation of these methods should, ideally, not change
anything in the network, and all control changes are opt-in, so this
proposal is fully backwards compatible.
Controllers handling this data must be careful to not leak rendezvous
data to untrusted parties, as it could be used to intercept and
manipulate hidden service traffic.
4. Example
Let's take an example where a client (Alice) tries to contact Bob's
hidden service. To do this, Bob follows the normal hidden service
specification, except he sets up ten servers to do this. One of these
publishes the descriptor, the others have this disabled. When the
INTRODUCE2 cell arrives at the node which published the descriptor, that
node does not immediately perform the rendezvous, but instead sends an
INTRODUCE event to its controller. Through an out-of-band process this
message is relayed to the controller of another of Bob's nodes, which
sends the "PERFORM-RENDEZVOUS" command to that node. This node finally
performs the rendezvous, and will continue to serve data to Alice,
whose client will now not have to talk to the introduction point
anymore.
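
The relay step in this example could be sketched in Python as below. It
is not the controller from the gist linked earlier: relay_introductions
is a hypothetical function, each of Bob's other nodes is represented by
a plain "send" callable, and the real socket handling and control-port
authentication are left out.

```python
from itertools import cycle
from typing import Callable, Iterable, List

Send = Callable[[str], None]  # hypothetical: writes one line to a node's control port


def relay_introductions(event_lines: Iterable[str], backends: List[Send]) -> int:
    """Forward each INTRODUCE event to the next backend in turn.

    Returns the number of introductions that were relayed.
    """
    if not backends:
        raise ValueError("at least one backend is required")
    next_backend = cycle(backends)  # simple round-robin over Bob's nodes
    relayed = 0
    for line in event_lines:
        parts = line.strip().split(" ")
        if len(parts) != 4 or parts[:2] != ["650", "INTRODUCE"]:
            continue  # some other control-port line; not our concern here
        hs_tag, data = parts[2], parts[3]
        next(next_backend)(f"PERFORM-RENDEZVOUS {hs_tag} {data}\r\n")
        relayed += 1
    return relayed
```

In a real deployment the out-of-band channel between the nodes must be
trusted, for the reasons given in section 3.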
5. Other considerations
We have left the actual format of the rendezvous data in the control
protocol unspecified, so that controllers do not need to worry about
the various types of hidden service connections, most notably proposal
224.
The decision to not implement the actual cell relaying in the tor
implementation itself was taken to allow more advanced configurations,
and to leave the actual load-balancing algorithm to the implementor of
the controller. The developer of the tor implementation should not
have to choose between a round-robin algorithm and something that could
pull CPU load averages from a centralized monitoring system.
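
To illustrate why this choice belongs in the controller, here are the
two policies mentioned above as interchangeable Python functions. This
is a sketch only: the names are made up, and load_of stands in for
whatever monitoring system the operator actually has.

```python
from itertools import count
from typing import Callable, Sequence


def make_round_robin(nodes: Sequence[str]) -> Callable[[], str]:
    """Return a picker that cycles through the nodes in order."""
    if not nodes:
        raise ValueError("no nodes configured")
    counter = count()

    def pick() -> str:
        return nodes[next(counter) % len(nodes)]

    return pick


def make_least_loaded(nodes: Sequence[str],
                      load_of: Callable[[str], float]) -> Callable[[], str]:
    """Return a picker that chooses the node with the lowest reported load."""
    if not nodes:
        raise ValueError("no nodes configured")

    def pick() -> str:
        # load_of is hypothetical, e.g. a lookup of CPU load averages.
        return min(nodes, key=load_of)

    return pick
```

Both pickers have the same shape, so a controller can swap one for the
other without any change to the tor implementation.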
