On 30 Sep 2015, at 17:27, Tom van der Woerdt <info@tvdw.eu> wrote:

...

Filename: xxx-intro-rendezvous-controlsocket.txt
Title: Load-balancing hidden services by splitting introduction from
      rendezvous
Author: Tom van der Woerdt
Created: 2015-09-30
Status: draft

1. Overview and motivation

To address scaling concerns with the onion web, we want to be able to
spread the load of hidden services across multiple machines.
OnionBalance is a great stab at this, and it can currently give us 60x
the capacity by publishing 6 separate descriptors, each with 10
introduction points, but more is better. This proposal aims to address
hidden service scaling up to a point where we can handle millions of
concurrent connections.

The basic idea involves splitting the 'introduce' from the
'rendezvous', in the tor implementation, and adding new events and
commands to the control specification to allow intercepting
introductions and transmitting them to different nodes, which will then
take care of the actual rendezvous.

2.1. DisableAutomaticRendezvous configuration option

The syntax is:
   "DisableAutomaticRendezvous" SP [1|0] CRLF

This configuration option is defined to be a boolean toggle which, if
set, stops the tor implementation from automatically doing a rendezvous
when an INTRODUCE2 cell is received. Instead, an event will be sent to
the controllers. If no controllers are present, the introduction cell
should be dropped, as acting on it instead of dropping it could open a
window for a DoS.

For security reasons, the configuration should be made available only
in the configuration files, and not as an option settable by the
controller.

I’m not sure it’s necessary to prevent the controller from setting this option.
We trust the controller, and might need it to be able to set this option for compatibility with ephemeral hidden services.

What is the threat model where a controller could set this option, but not do things that are much worse?

2.2. The "INTRODUCE" event

The syntax is:
   "650" SP "INTRODUCE" SP RendezvousData CRLF

   RendezvousData = implementation-specific, but must not contain
                    whitespace, must only contain human-readable
                    characters, and should be no longer than 512 bytes

I don’t think 512 bytes is enough for the current implementation. I recommend at least 2048 bytes. (See below.)
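
To make the syntax and the length limit concrete, here is a minimal Python sketch of how a controller might validate this event. It is only an illustration under my own assumptions: the function name and the max_len parameter are hypothetical, "human-readable" is read as printable ASCII without whitespace, and the length limit is enforced as a hard limit even though the proposal only says "should".

```python
import string

# Assumption: "human-readable" means printable ASCII, minus whitespace.
ALLOWED = set(string.printable) - set(string.whitespace)


def parse_introduce_event(line, max_len=512):
    """Return RendezvousData from a '650 INTRODUCE <data>' line, or None.

    max_len defaults to the 512 bytes in the proposal; pass 2048 to try
    the larger limit suggested in this reply.
    """
    if line.endswith("\r\n"):
        line = line[:-2]
    parts = line.split(" ")
    if len(parts) != 3 or parts[0] != "650" or parts[1] != "INTRODUCE":
        return None
    data = parts[2]
    if not data or len(data) > max_len:
        return None
    if any(ch not in ALLOWED for ch in data):
        return None
    return data
```

With the default limit, an 830-character RendezvousData is rejected; the same line is accepted with max_len=2048.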

The INTRODUCE event should contain sufficient data to allow continuing
the rendezvous from another Tor instance. The exact format is left
unspecified and left up to the implementation. From this follows that
only matching versions can be used safely to coordinate the rendezvous
of hidden service connections.

I would appreciate a list of the data needed by the current version of the hidden service protocol to rendezvous, even if we don’t want to specify the exact format, or specify data items for future implementations. This helps ensure that the limits in the proposal are sane, and that the proposal doesn’t have any unexpected implementation issues.

From reading rend_service_receive_introduction, I think the data is at least:
* service_id - the hidden service address (16 base32 characters)
* intro_key - the introduction-point specific key (128 binary bytes, 171 base64 characters)
* request - the encrypted portion of the INTRODUCE2 cell (up to 476 binary bytes(?), 635 base64 characters)
Therefore, I think the minimum for the current hidden service implementation is around 830 bytes, at least if we want to offload the maximum processing to the rendezvous instances by sending the entire encrypted INTRODUCE2 cell. A limit of 2048 bytes would be much more reasonable for future-proofing this proposal.
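
The arithmetic behind that estimate can be checked with a short Python sketch. It assumes unpadded base64 and a single separator character between fields; both are my assumptions, since the proposal leaves the encoding unspecified, and the field names are just the labels used in the list above.

```python
def base64_len(n_bytes):
    """Length of the unpadded base64 encoding of n_bytes of binary data."""
    return (n_bytes * 4 + 2) // 3


# Sizes taken from the list above; the 476-byte figure is itself an estimate.
fields = {
    "service_id": 16,               # already base32 text
    "intro_key": base64_len(128),   # 171
    "request": base64_len(476),     # 635
}

payload = sum(fields.values())              # 822
with_separators = payload + len(fields) - 1  # 824
```

That gives 822 characters of payload, or 824 with separators, which is where the "around 830 bytes" figure comes from and why 512 bytes cannot hold it.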

It also looks like you might need to split rend_service_t into:
* introduction point-specific data
* rendezvous-specific data
* shared data
Does any data need to be shared, and, if so, how do you intend to keep the shared data synchronised?
(Putting it in the RendezvousData each time might blow out the size considerably.)

I’d also appreciate an example of which parts of rend_service_receive_introduction could be performed by each of the cooperating tor instances. I assume that sending the data “as early as possible” would offload the most processing to the rendezvous side. I think that the split could happen right before the decryption of the cell, at the lines:
  stage_descr = "decryption";
  /* Now try to decrypt it */

This would avoid having to share the intro point encrypted replay cache (intro_point->accepted_intro_rsa_parts), but there’s still the hidden service Diffie-Hellman handshake cache (service->accepted_intro_dh_parts). If we don’t share that:
* two backend instances could accidentally compete for the same rendezvous point if the client times out
* a client could more easily DoS the hidden service by using the same Diffie-Hellman handshake
We’d have to decide if this security issue outweighs the benefit of doing the decryption on multiple rendezvous-side instances.
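
As a toy model of the concern, the following Python sketch shows the difference between per-instance and shared replay caches. It is not tor's replay cache code: the class, the function, and the use of a SHA-1 digest as the cache key are all hypothetical, chosen only to illustrate the point.

```python
import hashlib


class ReplayCache:
    """Toy replay cache keyed on a digest of the client's DH handshake part."""

    def __init__(self):
        self._seen = set()

    def accept(self, dh_part):
        """Return True the first time dh_part is seen, False on a replay."""
        digest = hashlib.sha1(dh_part).digest()
        if digest in self._seen:
            return False
        self._seen.add(digest)
        return True


def accepted_count(caches, dh_part):
    """Count how many backends would act on the same replayed handshake."""
    return sum(1 for cache in caches if cache.accept(dh_part))


# Two backends with separate caches both accept the replayed handshake.
separate = accepted_count([ReplayCache(), ReplayCache()], b"client-dh-part")

# Two backends consulting one shared cache accept it only once.
shared_cache = ReplayCache()
shared = accepted_count([shared_cache, shared_cache], b"client-dh-part")
```

In this model, every additional backend with its own cache is another rendezvous that a client can trigger with one replayed handshake, which is the DoS amplification described above.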

In general, I’m concerned that we need to think through the implementation of this proposal more carefully, because it will help us decide whether it’s compatible with:
* Current Hidden Services
* Next-Generation Hidden Services
We might also need to change some of these proposals to make them work together.

I’d also note that it’s definitely not compatible with Single Onion Services as specified in Proposal #252, as there is no rendezvous in that protocol.

Tim

Tim Wilson-Brown (teor)

teor2345 at gmail dot com
PGP 968F094B

teor at blah dot im
OTR CAD08081 9755866D 89E2A06F E3558B7F B5A9D14F