0. Introduction

The current Hidden Service design allows a single node to upload a
descriptor to the set of applicable Hidden Service Directories. This
descriptor describes how a client can establish a connection with the
hidden service. A question arises when the hidden service receives more
connections than it can handle. Perhaps this performance issue is not at
the hidden service itself, but at the Introduction Points. In either
case, the current design does not sufficiently handle this situation;
as a result, some users have implemented ad-hoc solutions. Any
improvements to the hidden service design should include mechanisms for
providing scalable hidden service solutions. It is our hope that this
proposal will provide such an option.

1. Overview

With the implementation of this proposal a hidden service operator will
have the ability to choose a method by which connections to their hidden
service are load balanced across more than one node servicing a single
hidden service.

We begin by describing the necessary modifications to the hidden service
descriptor in Section 2, then in Section 3 we explain a Master-Slave
configuration, and follow that with a homogeneous-nodes configuration in
Section 4. Section 5 delves briefly into a limited-scope threat model,
and Section 6 concludes with the references. Sections 3 and 4 each
present the basic design of their system and then describe two possible
configurations based on the operator's use case.

Let it be understood that this proposal is largely based on, and
supersedes parts of, the previously presented proposal "Migration to
ed25519 HS identity keys and privacy-preserving directory documents"
[0]. The timeline for implementation of scalable hidden services also
follows from the implementation of that proposal, as well as the
implementation of the migration plan described in "On upgrading HS
identity keys and on a new HS directory scheme that does not leak" [1].
Please see those proposals for more details.

Scalable hidden services will not affect the usability of the system and
will not significantly change the maintainability of a multi-node hidden
service. Single-node hidden services will not be impacted.

2. Descriptor Modifications

To accomplish load balancing across multiple nodes, two new optional
fields are added to the hidden service descriptor [2]:

   "additional-data" SP public-key NL

      [Any number]

      An ephemeral key that may be used to obtain a second descriptor
      from the designated hidden service directory server.

   "alternate-addresses" SP address NL

      [Any number]

      A secondary address by which the hidden service may be reached.

3. Master-Slave Configuration

The first optional configuration consists of a master node and one or
more slave nodes. When a master node exists, it is charged with
maintaining the hidden service descriptor and servicing clients, while
the slave nodes only service clients. In this proposal we describe two
possible scenarios for implementing this paradigm. The first involves
distributing a shared master secret key, and the second takes advantage
of the hidden service infrastructure to distribute client traffic to
"sub"-hidden services.

3.1. Common Initialization

The two scenarios have some common procedures. These include torrc
configuration settings and master node selection. We will use the term
'peer' to refer to the slave nodes.

All nodes must create a new hidden service that will be used for
communication between nodes.
In this configuration all communication will take place between the
master and a slave node; the slave nodes never communicate with each
other.

To configure this "management" hidden service we extend the use of the
HiddenServicePort torrc option. Currently it is defined as

   HiddenServicePort VIRTPORT [TARGET]

where VIRTPORT is a virtual port. If the value of VIRTPORT is a negative
integer, we now define the hidden service as a management hidden
service. The hidden service address and key management are unchanged.

After all nodes have created a management hidden service, every other
node must be told about it. To facilitate this, we introduce a new torrc
option. The HiddenServicePeers option takes a list of hidden service
addresses:

   HiddenServicePeers hostname1,hostname2,...

One or more hostnames are expected on this line. The first node listed
is the master node. The hidden service operator is expected to configure
all nodes with the same hostnames, listed in the same order. Failure to
do so will result in undefined behavior.

After this is configured, the master node fetches the hidden service
descriptor of every slave node. When it has retrieved all the
descriptors, or a 5 minute timeout period is exceeded, it proceeds to
the next phase. If any descriptors were unavailable, then their
respective hidden services are considered unavailable for a 1 hour
period. The master node will attempt to fetch all descriptors again
after 1 hour. This ensures the master can always communicate with its
peers. Similarly, all slave nodes fetch the master node's descriptor
every hour.

The next phase of initialization requires that the master node generate
the key pair for the public hidden service to which users will connect.

3.2. Distributed Key

3.2.1. Key Distribution

The first method we propose to distribute load among the peers is to
simply distribute the master secret key to every node. An operator
should only choose this scenario if a malicious peer node is not within
their threat model. The only difference between a slave node and the
master node is that the slave nodes do not publish a descriptor.

To transfer the key we introduce two new cell types:

   50 -- RELAY_COMMAND_KEY_FETCH
   51 -- RELAY_COMMAND_KEY_FETCHED

KEY_FETCH is sent from the slave to the master, and KEY_FETCHED is sent
from the master to the slave. The cell format of KEY_FETCH is:

   VER    Version                    [1 octet]
   HN     Hostname                   [52 octets]
   XL     X point length             [2 octets]
   XP     X point                    [XL octets]
   SIG    Signature of above fields  [variable]

Upon receiving this cell, the master node checks that the signature is
correct using the public key included with the descriptor that maps to
the hostname provided in HN. If HN was not specified on the
HiddenServicePeers line, or the master node was unable to retrieve its
descriptor, or the signature is invalid, then the master node discards
the cell.

Otherwise, the master node forms a first-degree polynomial with x0 set
to the public hidden service's private key and x1 set to a randomly
chosen value. We will use a key transport protocol [3] to send the key
to the slave node. We will also choose an appropriate modulus.

[ Should we hardcode the modulus? ]
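One plausible reading of this exchange is a Shamir-style, degree-one
sharing: the master evaluates the polynomial at the slave's XP and at a
randomly chosen X1, and the slave later recovers the key as the
polynomial's value at x = 0. The sketch below illustrates that
arithmetic only; the prime modulus and the function names are
illustrative assumptions, not part of this proposal or of the protocol
in [3].

   import secrets

   # Illustrative prime modulus; the proposal leaves the real choice open.
   P = 2**255 - 19

   def master_make_points(private_key, xp):
       # Master side: f(x) = private_key + a1*x (mod P).  Return f(xp)
       # plus a second, randomly chosen point (x1, f(x1)).
       a1 = secrets.randbelow(P)
       f = lambda x: (private_key + a1 * x) % P
       x1 = secrets.randbelow(P)
       return f(xp), x1, f(x1)

   def slave_recover_key(xp, y0, x1, y1):
       # Slave side: interpolate the line through (xp, y0) and (x1, y1)
       # and evaluate it at x = 0 to recover the constant term.
       slope = (y1 - y0) * pow(x1 - xp, -1, P) % P
       return (y0 - slope * xp) % P

   # Example round trip: the slave picks XP, the master answers with the
   # KEY_FETCHED material, and the slave recovers the private key.
   key = secrets.randbelow(P)
   xp = secrets.randbelow(P)
   y0, x1, y1 = master_make_points(key, xp)
   assert slave_recover_key(xp, y0, x1, y1) == key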
The master node then constructs a KEY_FETCHED cell containing:

   VER    Version                    [1 octet]
   PHN    Public hostname            [52 octets]
   Y0L    Y0 point length            [2 octets]
   Y0P    Y0 point                   [Y0L octets]
   X1L    X1 point length            [2 octets]
   X1P    X1 point                   [X1L octets]
   Y1L    Y1 point length            [2 octets]
   Y1P    Y1 point                   [Y1L octets]
   MDL    Modulus length             [3 octets]
   MD     Modulus                    [MDL octets]
   SIG    Signature of above fields  [variable]

This cell is then sent back to the slave node. The slave node checks the
signature using the public key it obtained from the descriptor it has
for the master node. If the signature does not validate, the node
destroys the circuit. On success, the node reconstructs the curve using
the two points it now has. It also verifies that the public hostname in
PHN is derived from the public key it computes. If any of these checks
fail, the node attempts to connect to the next hostname it has listed on
the HiddenServicePeers line. If they all fail, then the node waits a
full descriptor period, refetches the descriptors, and tries again.

3.2.2. Distributed-Key Descriptor Creation

The only prerequisite a master node must fulfill before it can publish
the hidden service descriptor is to retrieve the introduction points
from its peers so it can add them to the descriptor for the public
hidden service. To accomplish this we introduce two additional cell
types:

   52 -- RELAY_COMMAND_SEND_INTROPNT
   53 -- RELAY_COMMAND_INTROPNT_RECVD

The slave node sends a SEND_INTROPNT cell composed of:

   VER    Version                        [1 octet]
   NIP    Number of Introduction Points  [1 octet]
   IPS    Introduction Points            [variable]

The master node replies with an INTROPNT_RECVD cell that has an empty
payload. The slave node SHOULD use the same circuit for sending a
KEY_FETCH cell and a SEND_INTROPNT cell.

When the master node has received introduction points from each of its
slave nodes, or 5 minutes have passed since it published the descriptor
for its management hidden service, it creates a new descriptor for the
public hidden service. If the master node has more than N introduction
points it chooses one of:

   A) Publish the first N introduction points in the descriptor and the
      remainder in a separate descriptor, linked to by the address on
      the additional-data line.

   B) Publish only N introduction points. Select a different set of N
      introduction points when the descriptor is next published.

Then it publishes the descriptor for the public hidden service.

3.3. Layered Addressing

3.3.1. Address Distribution

The second method we propose for distributing load among multiple nodes
is to create an abstraction layer. Similar to the design proposed in
Section 3.2, we maintain the master-slave configuration, but the secret
key is distributed in a need-to-know fashion. This potentially increases
the security of the key for a short time, and therefore limits the
threat of compromise; however, it also increases the probability that a
node failure will result in the hidden service being unavailable.

Every slave node generates a second hidden service, in addition to its
management hidden service. Each slave then sends this address to the
master node. To do this, we introduce two new cell types:

   54 -- RELAY_COMMAND_TELL_HSADDR
   55 -- RELAY_COMMAND_HSADDR_LEARNED

A slave node sends the TELL_HSADDR cell as:

   VER    Version                    [1 octet]
   HN     Hostname                   [52 octets]
   LEN    Address length             [1 octet]
   ADR    Address                    [LEN octets]
   SIG    Signature of above fields  [variable]
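As an illustration of this layout, a slave node might serialize the cell
as sketched below. This is only a sketch: the version value of 1 is an
assumption, and the signing primitive is left abstract since it belongs
to the node's management-service key rather than to anything defined
here.

   def build_tell_hsaddr(hostname, address, sign):
       # Pack a TELL_HSADDR payload per the field list above.  `sign` is
       # a stand-in for whatever signing primitive the node uses; it is
       # not an API defined by this proposal.
       assert len(hostname) == 52      # HN is fixed-width
       assert len(address) < 256       # LEN is a single octet
       body = bytes([1])               # VER (assumed version 1)
       body += hostname                # HN
       body += bytes([len(address)])   # LEN
       body += address                 # ADR
       return body + sign(body)        # SIG covers all preceding fields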
Upon receiving this cell, the master node checks that the signature is
correct using the public key included in the descriptor that maps to the
hostname provided in HN. If HN was not specified on the
HiddenServicePeers line, or the master node was unable to retrieve its
descriptor, or the signature is invalid, then the master node discards
the cell. If the cell is validated, then the address is accepted and the
master node replies with an HSADDR_LEARNED cell that has an empty
payload.

3.3.2. Layered-Addressing Descriptor Creation

When the master node has received an address from each of its slave
nodes, or 5 minutes have passed since it published the descriptor for
its management hidden service, it creates a new descriptor for the
public hidden service, using the "alternate-addresses" line to specify
the hidden service addresses of the slave nodes. If the master node has
more than N addresses it chooses one of:

   A) Publish the first N addresses in the descriptor and the remainder
      in a separate descriptor, linked to by the address on the
      additional-data line.

   B) Publish only N addresses. Select a different set of N addresses
      when the descriptor is next published.

Then it publishes the descriptor for the public hidden service.

3.3.3. Master Node

If the master node becomes unavailable then the hidden service will not
be able to publish new descriptors. To prevent this from occurring, we
reuse the logic from Section 3.2 and share the master secret key with
the lesser of "the next two nodes listed on the HiddenServicePeers line
following the current master, in a circular list" and "the next 1/3 of
all nodes on the HiddenServicePeers line following the current master,
in a circular list". All nodes that do not meet this criterion SHOULD
securely expunge knowledge of the key. All slave nodes will attempt the
next potential master node if the current master is unreachable.

4. Homogeneous Configuration

Section 3 discussed scaling hidden services using a master-slave model.
This is useful, and easier, in some cases, but it is not ideal or safe
in all of them. As such, we describe a solution involving only peer
nodes. Whereas we previously had a predefined master node, now we will
have an agreed-upon captain.

4.1. Common Initialization

The two scenarios in this section have some common procedures. These
include torrc configuration settings and captain selection. We will use
the term 'peer' to refer to all other nodes that comprise a multi-node
hidden service.

All nodes must create a new hidden service that will be used for
communication between nodes. In this configuration all communication
will take place between all nodes, creating a clique. Exactly as
described in Section 3.1, to configure this "management" hidden service
we extend the use of the HiddenServicePort torrc option. Currently it is
defined as

   HiddenServicePort VIRTPORT [TARGET]

where VIRTPORT is a virtual port. If the value of VIRTPORT is a negative
integer, we now define the hidden service as a management hidden
service. The hidden service address and key management are unchanged.

After all nodes have created a management hidden service, every other
node must be told about it. To facilitate this, we introduce a new torrc
option, also described in Section 3.1. The HiddenServicePeers option
takes a list of hidden service addresses:

   HiddenServicePeers hostname1,hostname2,...

At least one hostname is expected on this line. The hidden service
operator is expected to configure all nodes using the same hostnames in
the same order. Failure to do so will result in undefined behavior.

We also define 4 new relay commands:

   56 -- RELAY_COMMAND_PREAUTH
   57 -- RELAY_COMMAND_CONTRIB_KEY
   58 -- RELAY_COMMAND_PRELIM_KEY
   59 -- RELAY_COMMAND_KEY
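Putting the configuration pieces of this section together, the
management-side torrc of one node in a three-node homogeneous service
might, for illustration only, look like the fragment below. The
directory path is a placeholder, hostname1 through hostname3 stand in
for the peers' management hidden service addresses, and every node
would carry an identical HiddenServicePeers line:

   HiddenServiceDir /var/lib/tor/management_hs/
   HiddenServicePort -1
   HiddenServicePeers hostname1,hostname2,hostname3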
4.2.1. Peer-Contributed Key Generation

The public hidden service's private key will be created from input
provided by all available peer nodes on first launch. The resulting
private key will be known by all nodes.

At initialization time every peer will request the hidden service
descriptor for every address on its HiddenServicePeers line. If a
descriptor is unfetchable for a D minute period then the peer is
considered unavailable and is not included in the key generation process
by the node. After a node retrieves all descriptors or reaches the D
minute limit, it then establishes a connection with all available peers.
If a connection cannot be established with a peer within a period of E
minutes then the peer is considered unavailable and is not included in
the key generation process by the node.

When a connection is successfully established, the node sends a PREAUTH
cell to the hidden service containing the following:

   VER    Version    [1 octet]
   HN     Hostname   [32 octets]
   SIG    Signature  [64 octets]

[ XXX We don't want the hidden service to initiate authentication with
  an AUTH_CHALLENGE so we use this as a seed. ]

When the hidden service receives this cell, it checks that the specified
hostname was listed on its HiddenServicePeers line. If it was not, then
it destroys the circuit. Otherwise it checks that the signature is valid
for the public key provided in the hidden service descriptor. If it is
not valid then the hidden service destroys the circuit. Otherwise it
creates a temporary mapping between the circuit and the specified
hostname. It then responds with an AUTH_CHALLENGE cell.

The node then creates an AUTHENTICATE cell as defined in Section 4 of
Proposal 220 [ Migrate server identity keys to Ed25519 ], but the CID,
SID, SCERT and TLSSECRETS fields MUST be zero-filled. The hidden service
then verifies the content. The circuit is destroyed if authentication
fails.

If successful, the node sends a CONTRIB_KEY cell to the hidden service.
The content of this cell is the peer's contribution to the master key. A
CONTRIB_KEY's payload contains:

   VER    Version       [1 octet]
   KEY    Key Material  [64 octets]

After the node's contribution is sent to all available peers and the
node receives contributions from all available peers, the node computes
a preliminary master key by XORing all contributions together. The node
then sends this preliminary key to all its peers in a PRELIM_KEY cell
and likewise receives a cell from all peers. These transfers MUST occur
over the same circuits that were previously authenticated. If the
circuit was destroyed after successful authentication then a new circuit
must be established and reauthenticated, as defined by a PREAUTH,
AUTH_CHALLENGE, and AUTHENTICATE transaction. A PRELIM_KEY's payload is
the same as the CONTRIB_KEY's payload. The KEY cell will also have the
same format.

When the node has received a PRELIM_KEY cell from its peers it
determines which value was sent by at least two-thirds of its peers. If
no value reaches this ratio then the process is restarted. If a common
value exists then this is the private key for the public hidden service.
The node then sends this key to its peers in a KEY cell, thus confirming
that all nodes agree on the private key. Again, if two-thirds of the
keys are not in agreement then the process is restarted.

4.2.2. Single Node Key Generation

A simpler alternative to Peer-Contributed Key Generation is to allow a
single node to generate the private key and share it with its peers.
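The combination and agreement steps of Section 4.2.1 can be sketched as
follows; this assumes, as the CONTRIB_KEY payload does, that each
contribution is 64 octets, and the helper names are illustrative (cell
handling, circuits and authentication are omitted):

   from collections import Counter

   def combine_contributions(contributions):
       # XOR all 64-octet CONTRIB_KEY contributions into the node's
       # preliminary master key.
       prelim = bytes(64)
       for c in contributions:
           assert len(c) == 64
           prelim = bytes(a ^ b for a, b in zip(prelim, c))
       return prelim

   def agreed_value(received, total_peers):
       # Return the value sent by at least two-thirds of the peers, or
       # None if no value reaches that ratio (the process then restarts).
       if not received:
           return None
       value, count = Counter(received).most_common(1)[0]
       return value if 3 * count >= 2 * total_peers else None

A node would run agreed_value once over the received PRELIM_KEY values
and once over the received KEY values, restarting key generation
whenever it returns None.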
4.2.3. Key Retrieval

If a node is added to the set of peers after key generation takes place
then it must be able to seamlessly retrieve the key. This is done using
the same method introduced in Section 3.2, with the addition that any
node may be queried.

4.2.4. Captain Selection

When all nodes hold the secret key they are all capable of generating
the hidden service descriptor. The node that creates the descriptor is
the first available node listed on the HiddenServicePeers line, in a
ring. When the current captain is not available, the peers declare the
next node as the new captain.

4.2.5. Introduction Point Distribution

Before a new descriptor can be published the captain should obtain the
introduction points of its peers. To accomplish this, every node sends
the captain its set of introduction points using the SEND_INTROPNT cell
and receives an acknowledgement with an INTROPNT_RECVD cell, as defined
in Section 3.2. The captain includes a subset of these introduction
points in the descriptor.

4.2.6. Descriptor Publication

As described in Section 3.2.2, the captain follows the same logic as the
master node.

4.3.1. Layered Addressing

This is a mix of Sections 3.3 and 4.2, wherein each node has the private
key and can create a hidden service descriptor, but scalability is
achieved by a level of indirection: clients connect to a secondary
hidden service specific to a node, rather than to the public hidden
service address derived from the private key generated in Section 4.2.1
or 4.2.2.

[ XXX This section will remain mostly unspecified unless it is decided
  that this is an advantageous design. ]

5. Threat Model

The security of scalable hidden services is bounded by the protections
that Tor can provide. As such, users of hidden services and inter-hidden
service communication are currently assumed to be susceptible to
end-to-end correlation and traffic confirmation attacks; scalable hidden
services do not solve this problem or any other known attack. Like
single-node hidden services, scalable hidden services do not prevent
deanonymization attacks at the application layer; this is true for both
client and server targets (i.e. hidden services do not prevent
deanonymization due to vulnerabilities in Firefox or PHP).

Scalable hidden services introduce additional attack vectors compared to
single-node hidden services. As a result, the protocol should be
designed under the assumption that the nodes are not compromised, but
also so that a single node cannot influence the availability of another
node, except when the compromised node was selected as the node that
creates the hidden service descriptor.

It is difficult to hide the size of a multi-node hidden service. These
designs reveal the same amount of information to the Tor network as
single-node hidden services, but may allow clients to estimate the size
of the service. By placing the burden of attacking hidden services at
the client end, each node in a scalable hidden service maintains the
same level of security as an individual single-node hidden service.

6. References

[0] https://lists.torproject.org/pipermail/tor-dev/2013-October/005536.html
[1] https://lists.torproject.org/pipermail/tor-dev/2013-October/005534.html
[2] https://gitweb.torproject.org/torspec.git/blob/HEAD:/rend-spec.txt#l219
[3] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.12.5552&rep=rep1&type=pdf