On Mon, Oct 28, 2013 at 08:49:46PM +0000, George Kadianakis wrote:
Christopher Baines cbaines8@gmail.com writes:
On 28/10/13 13:19, Matthew Finkel wrote:
This is a proposal I wrote to implement scalable hidden services. It's by no means finished (there are some slight inconsistencies which I will be correcting later today or tomorrow) but I want to make it public in the meantime. I'm also working on some additional security measures that can be used, but those haven't been written yet.
Great, I will try to link this in to the earlier thread for some continuity.
It seems to me that this is a description of "Alternative 3" in Nick's email. Multiple instances, with multiple sets of introduction points, somehow combined in to one service descriptor? I haven't managed to fully comprehend your proposal yet, but I though I would try and continue the earlier discussion.
So, going back to the goals, this "alternative" can have master nodes, but, can have you can also just have this "captain" role dynamically self assigned. Why did you include an alternative here, do you see these being used differently? It seems like the initial mode does not fulfil goal 2 or 3?
One of the differences between the alternatives that keeps coming up, is who (if anyone) can determine the number of nodes. Alternative 3 can keep this secret to the service operator by publishing a combined descriptor. I also discussed in the earlier thread how you could do this in the "Alternative 4: Single hidden service descriptor, multiple service instances per intro point." design, by having the instances connect to each introduction point 1, or more times, and possibly only connecting to a subset of the introduction points (possibly didn't consider this in the earlier thread).
So far, we have avoided defining our adversaries. Here is an example of some adversaries (wrt distinguishing multi-node HSes from single-node HSes and finding the number of nodes and their status):
- HS Client. This is a client who knows the descriptor of an HS, and hence all its IPs.
- Introduction Point. This is an introduction point of the HS. This is a naive version of the next adversary and I will not consider it in the attacks below.
- Introduction Point + Client: This is an adversary who knows the descriptor of an HS, hence all its IPs, and is also an IP of the Hidden Service. This is superior to the simple 'Introduction Point' adversary and more realistic (since not many Introduction Points will target random HSes, but if you know an HS, you will try targetting it by becoming its IP).
Let's see how these adversaries are doing if they want to determine the number of HS-nodes:
- Alternative 3: Single hidden service descriptor, one service instance per intro point:
-- From PoV of a client: If a client suspects that an HS is multi-node, then its number of nodes is simply the number of its introduction points. -- Same thing applies for IP+Client.
Based on the proposal it could be even easier than that, but it is definitely easy.
- Alternative 4: Single hidden service descriptor, multiple service instances per intro point.
-- From PoV of an IP+Client: It's trivial for an IP+Client to distinguish a multi-node HS from a single-node HS, by looking at the number of introduction circuits to it. Single-node HSes only have a single IP circuit (IIRC).
Yup, but I think that with the current design, we will need to decide how much information leakage we can accept. Is it acceptable for a client or IP to know that we are scaling without knowing the number of nodes or does the mere fact that we are scaling reveal enough information to an attacker? I completely understand why we want to limit the amount of information we reveal, but maybe we should also categorize these properties into "We can not leak this information" and "This would be a nice property to have in a design just because the probability that an attack is successful is proportional to the amount of information they have about the target."
Also, depending on how we assign HS-nodes to IPs it might be possible to find the number of HS-nodes too (or at least a lower or upper limit of them).
I don't see a way for a client to get the number of nodes of an HS in Alternative 4. However, an IP+Client is able to do so in both alternatives.
BTW, as Paul said, if we try to hide the number of nodes (from an IP+Client adversary) by establishing multiple circuits from a single HS-node to the IP, we should be careful because multiple "same source same destination" circuits might lead to nasty attacks.
Finally, if we go with the "Alternative 4: Single hidden service descriptor, multiple service instances per intro point." (which currently seems as the best idea to me), we should think of how many IPs each HS-node will connect to. There are at least three ways: a) An HS-node establishes circuits to all the IPs. b) An HS-node establishes circuits to a k-subset of the IPs. c) An HS-node establishes circuits to a random number of the IPs.
From the above, a) trivially reveals the number of nodes to all IPs and also establishes too many circuits which is bad for the network. I think b) and c) our best options here. We should think of how various values of 'k' change our security and availability here, and we should think whether randomization actually adds any useful obfuscation wrt the number/uptime of HS-nodes.
We should also think of how we assign HS-nodes to IPs. Lars Luthman started doing so in https://lists.torproject.org/pipermail/tor-dev/2013-October/005615.html. We should think more!
We should also think about how many IPs are selected, because as far as I can tell, each node is blind to how many peers it has. So, if the initial hidden service chooses three IPs, and then 5 more nodes are added, then each of those 5 nodes connect to k out of 3 IPs. How does a node decide that it should add an IP to the set?
Another recurring point for comparison, is can anyone determine if a particular service instance is down. Alternative 4 can get around this by hiding the instances behind the introduction points, and to keep the information from the introduction points, each instance (as described above) can keep multiple connections open, occasionally dropping some to keep the introduction point guessing. I think this would work, providing that the introduction point cannot work out what connections correspond with what instances. If each instance has a disjoint set of introduction points, of which some subset (possibly total) is listed in the descriptor, it would be possible to work out both if a instance goes down, and what introduction points correspond to that instance, just by repeatedly trying to connect through all the introduction points? If you start failing to connect for a particular subset of the introduction points, this could suggest a instance failure. Correlating this with power or network outages could give away the location of that instance?
Indeed.
It also seems hard to me to obfuscate the number and status of HS-nodes by randomly disconnecting introduction circuits from IPs. Especially so if we want to do it without influencing the performance of the HS (i.e. avoiding disconnecting circuits when clients are using them). Naive solutions will probably allow IPs to distinguish random decoy failure from an actual permanent power-outage-like failure of an HS-node.
I'll ignore the random-disconnects idea for now and analyze our alternatives with respect to recognizing the status (uptime) of HS-nodes:
- Alternative 4: Single hidden service descriptor, multiple service instances per intro point.
-- From the PoV of IP+Client: An IP+Client will be able to detect changes in the status of HS-nodes by monitoring its introduction circuits.
- Alternative 3: Single hidden service descriptor, one service instance per intro point.
-- From the PoV of a client: Clients can distinguish uptime of HS peers, since they know that each peer has one IP (and they know all the IPs of a hidden service). -- Same thing applies for IP+Client.
It seems to me that an IP+Client adversary is always able to find the number and status of HS-nodes. The proposed ways to fix this is to add measures like random-circuit-disconnects and connecting to IPs multiple times from a single HS-node. Both of these solutions seems easy to get wrong and hard to prove secure. We should think more about them!
Perhaps we should look at hybrid solutions. None of them seem worthwhile at this point, but they are worth considering.
- Shared Introduction Points, Multiple Descriptors -- From POV of IP+Client: The client can retrieve all descriptors from the responsible HSDirs thus revealing all of the Introduction Points. The IP will receive connections from a subset of the hidden service nodes. This reveals the fact that it is a multi-node hidden service, but does not directly reveal how many nodes it contains.
Another, similar, option is for a HS to publish different descriptors for each period which include different IPs. Initially this will limit the likelihood of correlation and therefore prevent leaking the HS is multi-node and how many nodes. However, over time, this will be broken, as well, so this is only acceptable for short lived (possibly single period) hidden services.
Similarly, an HS publishes multiple descriptors to the same HSDir and the HSDir randomly chooses one of the descriptors everytime it is queried. However, this suffers the same fate.
- Shared Introduction Points, Combined descriptors -- This provides the HSDir with too much power but for the sake of completeness we can see that this will improve the simplicity of setting up and configuring the hidden services but it is no more secure than the Shared Introduction Point design.
The more I think about this problem the more I convince myself we need another layer of abstraction between the client and the IP, perhaps proxy the INTRODUCE1 cell via the HSDir that is responible for the HS's descriptor and the client never need know the IPs. This requires a lot more thought (more than I can give it right now, at least).
Thanks for the analysis!