Christopher Baines cbaines8@gmail.com writes:
On 28/10/13 13:19, Matthew Finkel wrote:
This is a proposal I wrote to implement scalable hidden services. It's by no means finished (there are some slight inconsistencies which I will be correcting later today or tomorrow) but I want to make it public in the meantime. I'm also working on some additional security measures that can be used, but those haven't been written yet.
Great, I will try to link this in to the earlier thread for some continuity.
It seems to me that this is a description of "Alternative 3" in Nick's email. Multiple instances, with multiple sets of introduction points, somehow combined into one service descriptor? I haven't managed to fully comprehend your proposal yet, but I thought I would try and continue the earlier discussion.
So, going back to the goals, this "alternative" can have master nodes, but you can also just have this "captain" role dynamically self-assigned. Why did you include an alternative here? Do you see these being used differently? It seems like the initial mode does not fulfil goal 2 or 3?
One of the differences between the alternatives that keeps coming up is who (if anyone) can determine the number of nodes. Alternative 3 can keep this secret to the service operator by publishing a combined descriptor. I also discussed in the earlier thread how you could do this in the "Alternative 4: Single hidden service descriptor, multiple service instances per intro point." design, by having the instances connect to each introduction point one or more times, and possibly only connecting to a subset of the introduction points (I possibly didn't consider this in the earlier thread).
So far, we have avoided defining our adversaries. Here is an example of some adversaries (wrt distinguishing multi-node HSes from single-node HSes and finding the number of nodes and their status):
- HS Client: This is a client who knows the descriptor of an HS, and hence all its IPs.
- Introduction Point: This is an introduction point of the HS. This is a naive version of the next adversary and I will not consider it in the attacks below.
- Introduction Point + Client: This is an adversary who knows the descriptor of an HS, hence all its IPs, and is also an IP of the Hidden Service. This is superior to the simple 'Introduction Point' adversary and more realistic (since not many Introduction Points will target random HSes, but if you know an HS, you will try targeting it by becoming its IP).
Let's see how these adversaries are doing if they want to determine the number of HS-nodes:
- Alternative 3: Single hidden service descriptor, one service instance per intro point:
  -- From PoV of a client: If a client suspects that an HS is multi-node, then its number of nodes is simply the number of its introduction points.
  -- Same thing applies for IP+Client.
- Alternative 4: Single hidden service descriptor, multiple service instances per intro point.
  -- From PoV of an IP+Client: It's trivial for an IP+Client to distinguish a multi-node HS from a single-node HS, by looking at the number of introduction circuits to it. Single-node HSes only have a single IP circuit (IIRC).
Also, depending on how we assign HS-nodes to IPs it might be possible to find the number of HS-nodes too (or at least a lower or upper limit of them).
I don't see a way for a client to get the number of nodes of an HS in Alternative 4. However, an IP+Client is able to do so in both alternatives.
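To make the counting argument concrete, here is a minimal Python sketch of what each adversary could infer. It is only an illustration under stated assumptions: the function names and the `max_circuits_per_node` parameter are hypothetical, and the bound for Alternative 4 only holds if every HS-node opens at most that many circuits to the observing IP.

```python
import math

def alt3_node_count(descriptor_ips):
    """Alternative 3: one instance per IP, so anyone holding the
    descriptor can count the nodes by counting the listed IPs."""
    return len(set(descriptor_ips))

def alt4_lower_bound(circuits_at_ip, max_circuits_per_node=1):
    """Alternative 4, seen from a single IP: a lower bound on the number
    of HS-nodes, assuming (hypothetically) that no node opens more than
    `max_circuits_per_node` circuits to this IP."""
    if circuits_at_ip < 0 or max_circuits_per_node < 1:
        raise ValueError("invalid circuit counts")
    return math.ceil(circuits_at_ip / max_circuits_per_node)

def alt4_looks_multi_node(circuits_at_ip, max_circuits_per_node=1):
    """True if the observed circuits cannot all come from one node."""
    return alt4_lower_bound(circuits_at_ip, max_circuits_per_node) > 1
```

With `max_circuits_per_node=1`, two circuits already give the service away as multi-node; if a single node may open two circuits, the same observation is ambiguous, which is the intuition behind connecting to an IP multiple times.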
BTW, as Paul said, if we try to hide the number of nodes (from an IP+Client adversary) by establishing multiple circuits from a single HS-node to the IP, we should be careful because multiple "same source same destination" circuits might lead to nasty attacks.
Finally, if we go with the "Alternative 4: Single hidden service descriptor, multiple service instances per intro point." (which currently seems like the best idea to me), we should think of how many IPs each HS-node will connect to. There are at least three ways:
  a) An HS-node establishes circuits to all the IPs.
  b) An HS-node establishes circuits to a k-subset of the IPs.
  c) An HS-node establishes circuits to a random number of the IPs.
From the above, a) trivially reveals the number of nodes to all IPs and also establishes too many circuits, which is bad for the network. I think b) and c) are our best options here. We should think of how various values of 'k' change our security and availability, and we should think about whether randomization actually adds any useful obfuscation wrt the number/uptime of HS-nodes.
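The three strategies can be compared with a small Python sketch that simulates which IPs each HS-node picks and how many introduction circuits each IP then observes. This is a toy model, not Tor code: all names are hypothetical, IPs are chosen uniformly at random, and each node opens at most one circuit per IP.

```python
import random
from collections import Counter

def assign_all(nodes, ips):
    """Strategy a): every HS-node connects to every IP."""
    return {node: list(ips) for node in nodes}

def assign_k_subset(nodes, ips, k, rng):
    """Strategy b): every HS-node connects to a random k-subset of the IPs."""
    return {node: rng.sample(ips, k) for node in nodes}

def assign_random_count(nodes, ips, rng):
    """Strategy c): every HS-node connects to a random number (at least
    one) of randomly chosen IPs."""
    return {node: rng.sample(ips, rng.randint(1, len(ips))) for node in nodes}

def circuits_seen_by_ip(assignment):
    """Count how many introduction circuits each IP observes."""
    return Counter(ip for chosen in assignment.values() for ip in chosen)

nodes = ["node1", "node2", "node3", "node4"]
ips = ["ip1", "ip2", "ip3"]
rng = random.Random(0)  # fixed seed so the toy run is repeatable

seen_all = circuits_seen_by_ip(assign_all(nodes, ips))
seen_k = circuits_seen_by_ip(assign_k_subset(nodes, ips, 2, rng))
seen_rand = circuits_seen_by_ip(assign_random_count(nodes, ips, rng))
```

In this model `seen_all` shows every IP observing exactly `len(nodes)` circuits, which is the leak in a). Under b) the total number of circuits is `len(nodes) * k`, so a single IP only sees a share of them, but colluding IPs that know `k` could still recover the node count.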
We should also think of how we assign HS-nodes to IPs. Lars Luthman started doing so in https://lists.torproject.org/pipermail/tor-dev/2013-October/005615.html. We should think more!
Another recurring point for comparison is whether anyone can determine if a particular service instance is down. Alternative 4 can get around this by hiding the instances behind the introduction points, and to keep the information from the introduction points, each instance (as described above) can keep multiple connections open, occasionally dropping some to keep the introduction point guessing. I think this would work, providing that the introduction point cannot work out which connections correspond to which instances. If each instance has a disjoint set of introduction points, of which some subset (possibly total) is listed in the descriptor, it would be possible to work out both if an instance goes down, and what introduction points correspond to that instance, just by repeatedly trying to connect through all the introduction points? If you start failing to connect for a particular subset of the introduction points, this could suggest an instance failure. Correlating this with power or network outages could give away the location of that instance?
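The probing attack described above can be sketched in a few lines of Python. This is a hypothetical illustration only: it assumes each instance has a disjoint set of introduction points, that a client records for every probing round whether a connection through each introduction point succeeded, and that introduction points which fail in exactly the same rounds belong to the same instance. The function name and data layout are made up for this example.

```python
from collections import defaultdict

def group_by_failure_pattern(rounds):
    """Group introduction points that failed in exactly the same probing
    rounds. `rounds` is a list of dicts mapping an introduction point
    name to True (reachable) or False (connection failed)."""
    if not rounds:
        return []
    patterns = defaultdict(list)
    for ip in sorted(rounds[0]):
        pattern = tuple(r[ip] for r in rounds)
        patterns[pattern].append(ip)
    # Introduction points that never failed tell the prober nothing.
    return [group for pattern, group in patterns.items() if not all(pattern)]

rounds = [
    {"ip_a": True, "ip_b": True, "ip_c": True, "ip_d": True},
    {"ip_a": False, "ip_b": False, "ip_c": True, "ip_d": True},
    {"ip_a": True, "ip_b": True, "ip_c": True, "ip_d": True},
]
suspected = group_by_failure_pattern(rounds)
```

Here `ip_a` and `ip_b` fail together in the second round, so the prober would suspect that they are served by one instance that was down at that time. A real adversary would need many rounds and would have to tolerate unrelated circuit failures, which this sketch ignores.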
Indeed.
It also seems hard to me to obfuscate the number and status of HS-nodes by randomly disconnecting introduction circuits from IPs. Especially so if we want to do it without influencing the performance of the HS (i.e. avoiding disconnecting circuits when clients are using them). Naive solutions will probably allow IPs to distinguish random decoy failure from an actual permanent power-outage-like failure of an HS-node.
I'll ignore the random-disconnects idea for now and analyze our alternatives with respect to recognizing the status (uptime) of HS-nodes:
- Alternative 4: Single hidden service descriptor, multiple service instances per intro point.
  -- From the PoV of IP+Client: An IP+Client will be able to detect changes in the status of HS-nodes by monitoring its introduction circuits.
- Alternative 3: Single hidden service descriptor, one service instance per intro point.
  -- From the PoV of a client: Clients can distinguish uptime of HS peers, since they know that each peer has one IP (and they know all the IPs of a hidden service).
  -- Same thing applies for IP+Client.
It seems to me that an IP+Client adversary is always able to find the number and status of HS-nodes. The proposed ways to fix this are to add measures like random-circuit-disconnects and connecting to IPs multiple times from a single HS-node. Both of these solutions seem easy to get wrong and hard to prove secure. We should think more about them!