Hi everyone,
This is a proposal I wrote to implement scalable hidden services. It's by no means finished (there are some slight inconsistencies which I will be correcting later today or tomorrow) but I want to make it public in the meantime. I'm also working on some additional security measures that can be used, but those haven't been written yet.
Many thanks to George for his initial feedback. I pushed this version to my public repo, and will continue to push updates there.
In reality this is really 3.2 proposals, so the end result, if accepted, will look significantly different, but I'm indecisive at this point about which of the designs is least dangerous.
Thanks, Matt
On Mon, Oct 28, 2013 at 9:19 AM, Matthew Finkel matthew.finkel@gmail.com wrote:
Hi everyone,
This is a proposal I wrote to implement scalable hidden services. It's by no means finished (there are some slight inconsistencies which I will be correcting later today or tomorrow) but I want to make it public in the meantime. I'm also working on some additional security measures that can be used, but those haven't been written yet.
Many thanks to George for his initial feedback. I pushed this version to my public repo, and will continue to push updates there.
In reality this is really 3.2 proposals, so the end result, if accepted, will look significantly different, but I'm indecisive at this point about which of the designs is least dangerous.
Looks cool!
Generally, I think that requiring a single node to be master is fine in initial designs only: if we consider such a design, it's important to have a migration path to a design where the whole service doesn't fall over when a single leader goes down. It's less important to me that there be no master key, especially if that master key can be kept offline.
I also like designs where it's not immediately obvious how many hosts a hidden service has just from looking at its descriptor.
FWIW, I'm working on a complete revision to rend-spec.txt, since I think the scope of hidden service changes under consideration is broad enough that it makes sense to reconsider the entire system as a whole rather than trying to do it one piece at a time. I haven't even gotten to the spec parts yet -- just the preliminaries -- but you can see the draft in progress as it was on my desktop this morning in branch "rendspec-ng" in my torspec repo. I'll be putting more updates there as I go.
I mention that here because some of the design work I'm recording there should mitigate the dangers of distributing keys to different hosts.
yrs,
Hi,
you good people do still have a VM on its own IPv4 address sat twiddling its thumbs. It is pencilled in to be a 2nd relay for beta testing. If you'd like to put it to a different use, please feel more than free to ask.
Regards,
Phill.
On 28 October 2013 16:53, Nick Mathewson nickm@alum.mit.edu wrote:
On Mon, Oct 28, 2013 at 9:19 AM, Matthew Finkel matthew.finkel@gmail.com wrote:
Hi everyone,
This is a proposal I wrote to implement scalable hidden services. It's by no means finished (there are some slight inconsistencies which I will be correcting later today or tomorrow) but I want to make it public in the meantime. I'm also working on some additional security measures that can be used, but those haven't been written yet.
Many thanks to George for his initial feedback. I pushed this version to my public repo, and will continue to push updates there.
In reality this is really 3.2 proposals, so the end result, if accepted, will look significantly different, but I'm indecisive at this point about which of the designs is least dangerous.
Looks cool!
Generally, I think that requiring a single node to be master is fine in initial designs only: if we consider such a design, it's important to have a migration path to a design where the whole service doesn't fall over when a single leader goes down. It's less important to me that there be no master key, especially if that master key can be kept offline.
I also like designs where it's not immediately obvious how many hosts a hidden service has just from looking at its descriptor.
FWIW, I'm working on a complete revision to rend-spec.txt, since I think the scope of hidden service changes under consideration is broad enough that it makes sense to reconsider the entire system as a whole rather than trying to do it one piece at a time. I haven't even gotten to the spec parts yet -- just the preliminaries -- but you can see the draft in progress as it was on my desktop this morning in branch "rendspec-ng" in my torspec repo. I'll be putting more updates there as I go.
I mention that here because some of the design work I'm recording there should mitigate the dangers of distributing keys to different hosts.
yrs,
Nick
On Mon, Oct 28, 2013 at 12:53:35PM -0400, Nick Mathewson wrote:
On Mon, Oct 28, 2013 at 9:19 AM, Matthew Finkel matthew.finkel@gmail.com wrote:
Hi everyone,
This is a proposal I wrote to implement scalable hidden services. It's by no means finished (there are some slight inconsistencies which I will be correcting later today or tomorrow) but I want to make it public in the meantime. I'm also working on some additional security measures that can be used, but those haven't been written yet.
I forgot to note that it is in the hs_scaling branch.
Many thanks to George for his initial feedback. I pushed this version to my public repo, and will continue to push updates there.
In reality this is really 3.2 proposals, so the end result, if accepted, will look significantly different, but I'm indecisive at this point about which of the designs is least dangerous.
Looks cool!
I'm glad you think so!
Generally, I think that requiring a single node to be master is fine in initial designs only: if we consider such a design, it's important to have a migration path to a design where the whole service doesn't fall over when a single leader goes down. It's less important to me that there be no master key, especially if that master key can be kept offline.
I agree. To be honest, the master-slave design is included more for completeness and comparison than for implementation. I think it is a reasonable design, but in general I much prefer the peer-nodes design.
I also decided to propose a simpler method of selecting a captain. The original process resulted in a random node being selected, but I decided it was unnecessarily complex, at this point.
Allowing offline key storage would be ideal, and with inter-node communication a key can easily be distributed to the other nodes as needed. But, depending on the design, it really only buys a hidden service operator the ability to upload a key on one node and have it automagically distributed to the others, whereas the operator would need to do this manually otherwise.
I also like designs where it's not immediately obvious how many hosts a hidden service has just from looking at its descriptor.
This would also be ideal; however, after spending some time working on this problem, I don't like the currently available solutions for solving it. At its core, we are trying to take a one-to-one mapping and turn it into a one-to-many mapping. The proposed solutions are more like hacks; if this is to be done correctly, I think we should try to redesign the components that are causing problems instead of trying to mangle the current design in a (hopefully) secure way.
FWIW, I'm working on a complete revision to rend-spec.txt, since I think the scope of hidden service changes under consideration is broad enough that it makes sense to reconsider the entire system as a whole rather than trying to do it one piece at a time. I haven't even gotten to the spec parts yet -- just the preliminaries -- but you can see the draft in progress as it was on my desktop this morning in branch "rendspec-ng" in my torspec repo. I'll be putting more updates there as I go.
Great, I'll take a look. I think this will result in a better (more secure, usable, cohesive) product.
I mention that here because some of the design work I'm recording there should mitigate the dangers of distributing keys to different hosts.
Oh, this reminds me, I forgot to thank George for his suggestion to use the Key Transport protocol (thanks again George!).
Mitigating the danger will be nice. I'm also not sure how severe the danger is or whether a secret sharing scheme is really necessary for this, but I added it in any case as a defense-in-depth measure.
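To make the secret-sharing idea concrete (this is purely an illustration; the proposal relies on the Key Transport protocol mentioned above, not on this toy scheme), an n-of-n split of a key could look roughly like this in Python:

    import os
    from functools import reduce

    def xor_bytes(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    def split_secret(secret, n):
        # n-of-n split: all shares are needed to reconstruct the secret.
        shares = [os.urandom(len(secret)) for _ in range(n - 1)]
        shares.append(reduce(xor_bytes, shares, secret))
        return shares

    def recover_secret(shares):
        return reduce(xor_bytes, shares)

    key = os.urandom(32)                  # stand-in for a service key
    shares = split_secret(key, 3)         # e.g. one share per HS node
    assert recover_secret(shares) == key  # any n-1 shares reveal nothing

A real deployment would likely want a threshold (k-of-n) variant and authenticated distribution of the shares, which is part of what still needs to be written up.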
Thanks for your thoughts/comments!
- Matt
On 28/10/13 13:19, Matthew Finkel wrote:
This is a proposal I wrote to implement scalable hidden services. It's by no means finished (there are some slight inconsistencies which I will be correcting later today or tomorrow) but I want to make it public in the meantime. I'm also working on some additional security measures that can be used, but those haven't been written yet.
Great, I will try to link this in to the earlier thread for some continuity.
It seems to me that this is a description of "Alternative 3" in Nick's email. Multiple instances, with multiple sets of introduction points, somehow combined into one service descriptor? I haven't managed to fully comprehend your proposal yet, but I thought I would try and continue the earlier discussion.
So, going back to the goals, this "alternative" can have master nodes, but you can also just have this "captain" role dynamically self-assigned. Why did you include an alternative here; do you see these being used differently? It seems like the initial mode does not fulfil goals 2 or 3?
One of the differences between the alternatives that keeps coming up is who (if anyone) can determine the number of nodes. Alternative 3 can keep this a secret known only to the service operator by publishing a combined descriptor. I also discussed in the earlier thread how you could do this in the "Alternative 4: Single hidden service descriptor, multiple service instances per intro point" design, by having the instances connect to each introduction point one or more times, and possibly only connecting to a subset of the introduction points (I possibly didn't consider this in the earlier thread).
Another recurring point for comparison is whether anyone can determine if a particular service instance is down. Alternative 4 can get around this by hiding the instances behind the introduction points, and to keep the information from the introduction points, each instance (as described above) can keep multiple connections open, occasionally dropping some to keep the introduction point guessing. I think this would work, providing that the introduction point cannot work out what connections correspond with what instances. If each instance has a disjoint set of introduction points, of which some subset (possibly the full set) is listed in the descriptor, it would be possible to work out both whether an instance goes down and what introduction points correspond to that instance, just by repeatedly trying to connect through all the introduction points? If you start failing to connect for a particular subset of the introduction points, this could suggest an instance failure. Correlating this with power or network outages could give away the location of that instance?
Also, compared to the setup of other alternatives, this seems more complex for the hidden service operator, both in terms of understanding what they are doing and debugging failures? I think it would be good to partition the goals (as there are quite a lot, which is not inherently bad). In particular, one subset of goals would be as follows:
Operator (the person, or people controlling the service) Usability
- Simple Initial Setup
- Simple Addition of new Instances
- Simple Removal of Instances
- Graceful behaviour regarding instance failure (with respect to the operator)
- Well defined failure modes
- If something doesn’t work, it should be possible for the operator to work out what, and how to resolve it.
Now, obviously, these are minor compared to the more technical goals, but I think they are worth considering, as we have a few technically workable proposals on the table.
As for what I am doing on this currently, I have been reading lots of the related, and not so related papers. I hope to begin doing some more Hidden Service related Chutney stuff this or next week, such that I have something to test with when I start implementing something (note: what I implement, might not be adopted/adoptable by the project, but I am doing it anyway as part of my degree). I am available on #tor-dev as cbaines for any and all discussion.
Christopher Baines cbaines8@gmail.com writes:
On 28/10/13 13:19, Matthew Finkel wrote:
This is a proposal I wrote to implement scalable hidden services. It's by no means finished (there are some slight inconsistencies which I will be correcting later today or tomorrow) but I want to make it public in the meantime. I'm also working on some additional security measures that can be used, but those haven't been written yet.
Great, I will try to link this in to the earlier thread for some continuity.
It seems to me that this is a description of "Alternative 3" in Nick's email. Multiple instances, with multiple sets of introduction points, somehow combined into one service descriptor? I haven't managed to fully comprehend your proposal yet, but I thought I would try and continue the earlier discussion.
So, going back to the goals, this "alternative" can have master nodes, but you can also just have this "captain" role dynamically self-assigned. Why did you include an alternative here; do you see these being used differently? It seems like the initial mode does not fulfil goals 2 or 3?
One of the differences between the alternatives that keeps coming up is who (if anyone) can determine the number of nodes. Alternative 3 can keep this a secret known only to the service operator by publishing a combined descriptor. I also discussed in the earlier thread how you could do this in the "Alternative 4: Single hidden service descriptor, multiple service instances per intro point" design, by having the instances connect to each introduction point one or more times, and possibly only connecting to a subset of the introduction points (I possibly didn't consider this in the earlier thread).
So far, we have avoided defining our adversaries. Here is an example of some adversaries (wrt distinguishing multi-node HSes from single-node HSes and finding the number of nodes and their status):
- HS Client. This is a client who knows the descriptor of an HS, and hence all its IPs.
- Introduction Point. This is an introduction point of the HS. This is a naive version of the next adversary and I will not consider it in the attacks below.
- Introduction Point + Client: This is an adversary who knows the descriptor of an HS, hence all its IPs, and is also an IP of the Hidden Service. This is superior to the simple 'Introduction Point' adversary and more realistic (since not many Introduction Points will target random HSes, but if you know an HS, you will try targeting it by becoming its IP).
Let's see how these adversaries are doing if they want to determine the number of HS-nodes:
- Alternative 3: Single hidden service descriptor, one service instance per intro point:
-- From PoV of a client: If a client suspects that an HS is multi-node, then its number of nodes is simply the number of its introduction points.
-- Same thing applies for IP+Client.
- Alternative 4: Single hidden service descriptor, multiple service instances per intro point.
-- From PoV of an IP+Client: It's trivial for an IP+Client to distinguish a multi-node HS from a single-node HS, by looking at the number of introduction circuits to it. Single-node HSes only have a single IP circuit (IIRC).
Also, depending on how we assign HS-nodes to IPs it might be possible to find the number of HS-nodes too (or at least a lower or upper limit of them).
I don't see a way for a client to get the number of nodes of an HS in Alternative 4. However, an IP+Client is able to do so in both alternatives.
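To spell out that observation (nothing below is Tor code; the class and method names are invented purely for illustration), the IP+Client side of this is essentially just per-service-key bookkeeping:

    from collections import defaultdict

    class IntroPointMonitor:
        """Toy model of an introduction point counting intro circuits."""
        def __init__(self):
            self.live = defaultdict(set)  # service key -> live circuit ids

        def on_establish_intro(self, service_key, circ_id):
            self.live[service_key].add(circ_id)

        def on_circuit_closed(self, service_key, circ_id):
            self.live[service_key].discard(circ_id)

        def looks_multi_node(self, service_key):
            # A single-node HS keeps only one introduction circuit open to a
            # given IP (IIRC), so more than one concurrent circuit for the
            # same service key suggests multiple instances.
            return len(self.live[service_key]) > 1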
BTW, as Paul said, if we try to hide the number of nodes (from an IP+Client adversary) by establishing multiple circuits from a single HS-node to the IP, we should be careful because multiple "same source same destination" circuits might lead to nasty attacks.
Finally, if we go with "Alternative 4: Single hidden service descriptor, multiple service instances per intro point" (which currently seems like the best idea to me), we should think about how many IPs each HS-node will connect to. There are at least three ways:
a) An HS-node establishes circuits to all the IPs.
b) An HS-node establishes circuits to a k-subset of the IPs.
c) An HS-node establishes circuits to a random number of the IPs.
From the above, a) trivially reveals the number of nodes to all IPs and also establishes too many circuits, which is bad for the network. I think b) and c) are our best options here. We should think about how various values of 'k' change our security and availability here, and we should think about whether randomization actually adds any useful obfuscation wrt the number/uptime of HS-nodes.
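As a strawman, options b) and c) are trivial to sketch (illustrative Python only; how `intro_points` gets agreed on between the nodes is exactly the open question, and a real implementation would want a cryptographically strong RNG rather than `random`):

    import random

    def pick_ips_fixed_k(intro_points, k):
        # Option b): establish circuits to a k-subset of the IPs.
        return random.sample(intro_points, min(k, len(intro_points)))

    def pick_ips_random_count(intro_points):
        # Option c): establish circuits to a random number of the IPs
        # (assumes the list is non-empty).
        k = random.randint(1, len(intro_points))
        return random.sample(intro_points, k)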
We should also think of how we assign HS-nodes to IPs. Lars Luthman started doing so in https://lists.torproject.org/pipermail/tor-dev/2013-October/005615.html. We should think more!
Another recurring point for comparison is whether anyone can determine if a particular service instance is down. Alternative 4 can get around this by hiding the instances behind the introduction points, and to keep the information from the introduction points, each instance (as described above) can keep multiple connections open, occasionally dropping some to keep the introduction point guessing. I think this would work, providing that the introduction point cannot work out what connections correspond with what instances. If each instance has a disjoint set of introduction points, of which some subset (possibly the full set) is listed in the descriptor, it would be possible to work out both whether an instance goes down and what introduction points correspond to that instance, just by repeatedly trying to connect through all the introduction points? If you start failing to connect for a particular subset of the introduction points, this could suggest an instance failure. Correlating this with power or network outages could give away the location of that instance?
Indeed.
It also seems hard to me to obfuscate the number and status of HS-nodes by randomly disconnecting introduction circuits from IPs. Especially so if we want to do it without influencing the performance of the HS (i.e. avoiding disconnecting circuits when clients are using them). Naive solutions will probably allow IPs to distinguish random decoy failure from an actual permanent power-outage-like failure of an HS-node.
I'll ignore the random-disconnects idea for now and analyze our alternatives with respect to recognizing the status (uptime) of HS-nodes:
- Alternative 4: Single hidden service descriptor, multiple service instances per intro point.
-- From the PoV of IP+Client: An IP+Client will be able to detect changes in the status of HS-nodes by monitoring its introduction circuits.
- Alternative 3: Single hidden service descriptor, one service instance per intro point.
-- From the PoV of a client: Clients can distinguish uptime of HS peers, since they know that each peer has one IP (and they know all the IPs of a hidden service).
-- Same thing applies for IP+Client.
It seems to me that an IP+Client adversary is always able to find the number and status of HS-nodes. The proposed ways to fix this are to add measures like random-circuit-disconnects and connecting to IPs multiple times from a single HS-node. Both of these solutions seem easy to get wrong and hard to prove secure. We should think more about them!
On Mon, Oct 28, 2013 at 08:49:46PM +0000, George Kadianakis wrote:
Christopher Baines cbaines8@gmail.com writes:
On 28/10/13 13:19, Matthew Finkel wrote:
This is a proposal I wrote to implement scalable hidden services. It's by no means finished (there are some slight inconsistencies which I will be correcting later today or tomorrow) but I want to make it public in the meantime. I'm also working on some additional security measures that can be used, but those haven't been written yet.
Great, I will try to link this in to the earlier thread for some continuity.
It seems to me that this is a description of "Alternative 3" in Nick's email. Multiple instances, with multiple sets of introduction points, somehow combined into one service descriptor? I haven't managed to fully comprehend your proposal yet, but I thought I would try and continue the earlier discussion.
So, going back to the goals, this "alternative" can have master nodes, but you can also just have this "captain" role dynamically self-assigned. Why did you include an alternative here; do you see these being used differently? It seems like the initial mode does not fulfil goals 2 or 3?
One of the differences between the alternatives that keeps coming up is who (if anyone) can determine the number of nodes. Alternative 3 can keep this a secret known only to the service operator by publishing a combined descriptor. I also discussed in the earlier thread how you could do this in the "Alternative 4: Single hidden service descriptor, multiple service instances per intro point" design, by having the instances connect to each introduction point one or more times, and possibly only connecting to a subset of the introduction points (I possibly didn't consider this in the earlier thread).
So far, we have avoided defining our adversaries. Here is an example of some adversaries (wrt distinguishing multi-node HSes from single-node HSes and finding the number of nodes and their status):
- HS Client. This is a client who knows the descriptor of an HS, and hence all its IPs.
- Introduction Point. This is an introduction point of the HS. This is a naive version of the next adversary and I will not consider it in the attacks below.
- Introduction Point + Client: This is an adversary who knows the descriptor of an HS, hence all its IPs, and is also an IP of the Hidden Service. This is superior to the simple 'Introduction Point' adversary and more realistic (since not many Introduction Points will target random HSes, but if you know an HS, you will try targeting it by becoming its IP).
Let's see how these adversaries are doing if they want to determine the number of HS-nodes:
- Alternative 3: Single hidden service descriptor, one service instance per intro point:
-- From PoV of a client: If a client suspects that an HS is multi-node, then its number of nodes is simply the number of its introduction points.
-- Same thing applies for IP+Client.
Based on the proposal it could be even easier than that, but it is definitely easy.
- Alternative 4: Single hidden service descriptor, multiple service instances per intro point.
-- From PoV of an IP+Client: It's trivial for an IP+Client to distinguish a multi-node HS from a single-node HS, by looking at the number of introduction circuits to it. Single-node HSes only have a single IP circuit (IIRC).
Yup, but I think that with the current design we will need to decide how much information leakage we can accept. Is it acceptable for a client or IP to know that we are scaling without knowing the number of nodes, or does the mere fact that we are scaling reveal enough information to an attacker? I completely understand why we want to limit the amount of information we reveal, but maybe we should also categorize these properties into "we cannot leak this information" and "this would be a nice property to have in a design, just because the probability that an attack is successful is proportional to the amount of information the attacker has about the target."
Also, depending on how we assign HS-nodes to IPs it might be possible to find the number of HS-nodes too (or at least a lower or upper limit of them).
I don't see a way for a client to get the number of nodes of an HS in Alternative 4. However, an IP+Client is able to do so in both alternatives.
BTW, as Paul said, if we try to hide the number of nodes (from an IP+Client adversary) by establishing multiple circuits from a single HS-node to the IP, we should be careful because multiple "same source same destination" circuits might lead to nasty attacks.
Finally, if we go with "Alternative 4: Single hidden service descriptor, multiple service instances per intro point" (which currently seems like the best idea to me), we should think about how many IPs each HS-node will connect to. There are at least three ways:
a) An HS-node establishes circuits to all the IPs.
b) An HS-node establishes circuits to a k-subset of the IPs.
c) An HS-node establishes circuits to a random number of the IPs.
From the above, a) trivially reveals the number of nodes to all IPs and also establishes too many circuits, which is bad for the network. I think b) and c) are our best options here. We should think about how various values of 'k' change our security and availability here, and we should think about whether randomization actually adds any useful obfuscation wrt the number/uptime of HS-nodes.
We should also think of how we assign HS-nodes to IPs. Lars Luthman started doing so in https://lists.torproject.org/pipermail/tor-dev/2013-October/005615.html. We should think more!
We should also think about how many IPs are selected because, as far as I can tell, each node is blind to how many peers it has. So, if the initial hidden service chooses three IPs, and then 5 more nodes are added, then each of those 5 nodes connects to k out of 3 IPs. How does a node decide that it should add an IP to the set?
Another recurring point for comparison is whether anyone can determine if a particular service instance is down. Alternative 4 can get around this by hiding the instances behind the introduction points, and to keep the information from the introduction points, each instance (as described above) can keep multiple connections open, occasionally dropping some to keep the introduction point guessing. I think this would work, providing that the introduction point cannot work out what connections correspond with what instances. If each instance has a disjoint set of introduction points, of which some subset (possibly the full set) is listed in the descriptor, it would be possible to work out both whether an instance goes down and what introduction points correspond to that instance, just by repeatedly trying to connect through all the introduction points? If you start failing to connect for a particular subset of the introduction points, this could suggest an instance failure. Correlating this with power or network outages could give away the location of that instance?
Indeed.
It also seems hard to me to obfuscate the number and status of HS-nodes by randomly disconnecting introduction circuits from IPs. Especially so if we want to do it without influencing the performance of the HS (i.e. avoiding disconnecting circuits when clients are using them). Naive solutions will probably allow IPs to distinguish random decoy failure from an actual permanent power-outage-like failure of an HS-node.
I'll ignore the random-disconnects idea for now and analyze our alternatives with respect to recognizing the status (uptime) of HS-nodes:
- Alternative 4: Single hidden service descriptor, multiple service instances per intro point.
-- From the PoV of IP+Client: An IP+Client will be able to detect changes in the status of HS-nodes by monitoring its introduction circuits.
- Alternative 3: Single hidden service descriptor, one service instance per intro point.
-- From the PoV of a client: Clients can distinguish uptime of HS peers, since they know that each peer has one IP (and they know all the IPs of a hidden service).
-- Same thing applies for IP+Client.
It seems to me that an IP+Client adversary is always able to find the number and status of HS-nodes. The proposed ways to fix this are to add measures like random-circuit-disconnects and connecting to IPs multiple times from a single HS-node. Both of these solutions seem easy to get wrong and hard to prove secure. We should think more about them!
Perhaps we should look at hybrid solutions. None of them seem worthwhile at this point, but they are worth considering.
- Shared Introduction Points, Multiple Descriptors
-- From POV of IP+Client: The client can retrieve all descriptors from the responsible HSDirs, thus revealing all of the Introduction Points. The IP will receive connections from a subset of the hidden service nodes. This reveals the fact that it is a multi-node hidden service, but does not directly reveal how many nodes it contains.
Another, similar, option is for an HS to publish different descriptors for each period, each of which includes different IPs. Initially this will limit the likelihood of correlation and therefore prevent leaking that the HS is multi-node and how many nodes it has. However, over time this will be broken as well, so it is only acceptable for short-lived (possibly single-period) hidden services.
Similarly, an HS could publish multiple descriptors to the same HSDir, and the HSDir randomly chooses one of the descriptors every time it is queried. However, this suffers the same fate.
- Shared Introduction Points, Combined Descriptors
-- This provides the HSDir with too much power, but for the sake of completeness: it improves the simplicity of setting up and configuring the hidden services, yet it is no more secure than the Shared Introduction Points design.
The more I think about this problem, the more I convince myself that we need another layer of abstraction between the client and the IP; perhaps proxy the INTRODUCE1 cell via the HSDir that is responsible for the HS's descriptor, so that the client never needs to know the IPs. This requires a lot more thought (more than I can give it right now, at least).
Thanks for the analysis!
On Wed, Oct 30, 2013 at 08:11:16AM +0000, Matthew Finkel wrote:
On Mon, Oct 28, 2013 at 08:49:46PM +0000, George Kadianakis wrote:
...
It seems to me that an IP+Client adversary is always able to find the number and status of HS-nodes. The proposed ways to fix this are to add measures like random-circuit-disconnects and connecting to IPs multiple times from a single HS-node. Both of these solutions seem easy to get wrong and hard to prove secure. We should think more about them!
...
The more I think about this problem, the more I convince myself that we need another layer of abstraction between the client and the IP; perhaps proxy the INTRODUCE1 cell via the HSDir that is responsible for the HS's descriptor, so that the client never needs to know the IPs. This requires a lot more thought (more than I can give it right now, at least).
To solidify this description, the idea is to take George's Intermediate descriptor and encrypt everything using the symmetric key except the Intro Point details. These would need to be encrypted with a key that was exchanged with the HSDirs. Then, when a client requests a descriptor from an HSDir, it receives everything it needs to send the INTRODUCE1 cell, except that it will not know the introduction points. At this point the client sends the INTRODUCE1 payload encapsulated in another cell (along with the hidden service's address so the HSDir knows where to connect) to the HSDir. The HSDir then establishes a connection with one of the introduction points and forwards the INTRODUCE1 payload to the hidden service.
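A toy, self-contained sketch of that flow (every name here is made up for illustration: the in-memory HSDir class, the proxied-introduce call, the placeholder strings; the real cells, keys, and encodings would have to be specified properly):

    import random

    class ToyHSDir:
        """Stand-in for an HSDir that proxies introductions."""
        def __init__(self):
            self.client_part = {}   # onion address -> client-visible descriptor
            self.intro_points = {}  # onion address -> IPs, readable only by the HSDir

        def publish(self, onion, client_part, intro_points):
            # The HS uploads both sections; only the first is ever served.
            self.client_part[onion] = client_part
            self.intro_points[onion] = intro_points

        def fetch_descriptor(self, onion):
            # Clients get what they need to build INTRODUCE1, minus the IPs.
            return self.client_part[onion]

        def proxy_introduce(self, onion, introduce1_payload):
            # The HSDir picks an IP and forwards the (still opaque) payload.
            ip = random.choice(self.intro_points[onion])
            return ip, introduce1_payload

    hsdir = ToyHSDir()
    hsdir.publish("x.onion", {"enc_key": "..."}, ["ip-A", "ip-B", "ip-C"])
    desc = hsdir.fetch_descriptor("x.onion")        # no intro points in here
    chosen_ip, fwd = hsdir.proxy_introduce("x.onion", b"INTRODUCE1-ciphertext")

The point is only to show who learns what: the client never sees the introduction points, and the HSDir never sees the INTRODUCE1 plaintext, which stays encrypted to the hidden service as usual.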
I want to like this tweak, but I don't think we gain enough from it, alas. The main improvement is that a client will never know the introduction points for a hidden service, and therefore it will never be able to estimate the size of the hidden service solely from its descriptor, assuming the intro point section is padded to a prespecified length or is sent to the HSDir separately. However, this is only a slight hindrance for a Client+IP and will not prevent the information leakage described by George over time.
Assuming the HSDirs for a hidden service are sufficiently unpredictable prior to the period, the main improvement this provides is the ability to scale with respect to IPs without significant leakage to a Client+IP, but this does nothing to prevent any correlation attacks that may exist. But, again, over time with the one-hidden-service-per-intro-point design, a client will still be able to recognize a hidden service failure even if it may not know the size of the hidden service. Furthermore, this all strictly depends on the randomness by which HSDirs become responsible for a hidden service's descriptor.
Can this idea be improved to make it worthwhile and safer? What other modifications can we make to the rend protocol to improve the "hiddenness" of an HS (without needing to go as far as the valet design) while also improving usability/speed?
On Mon, Oct 28, 2013 at 07:40:12PM +0000, Christopher Baines wrote:
On 28/10/13 13:19, Matthew Finkel wrote:
This is a proposal I wrote to implement scalable hidden services. It's by no means finished (there are some slight inconsistencies which I will be correcting later today or tomorrow) but I want to make it public in the meantime. I'm also working on some additional security measures that can be used, but those haven't been written yet.
Great, I will try to link this in to the earlier thread for some continuity.
Sounds good. For those just joining this discussion, the previous thread can be found at https://lists.torproject.org/pipermail/tor-dev/2013-October/005556.html
It seems to me that this is a description of "Alternative 3" in Nick's email. Multiple instances, with multiple sets of introduction points, somehow combined into one service descriptor?
Yes, mostly. The proposal describes Nick's "Alternative 3", but there is no technical reason why the instances cannot coordinate their IPs and share some subset, assuming some additional modifications to the HS design. This obviously was not included in the prop, but it would be easy to extend the protocol to include this.
I haven't managed to fully comprehend your proposal yet, but I thought I would try and continue the earlier discussion.
That's fine; we can work through it, but you seem to understand it pretty well.
So, going back to the goals, this "alternative" can have master nodes, but you can also just have this "captain" role dynamically self-assigned.
It's not really self-assigned, more like "assigned by the operator but it is a result of previous node failures". I think we can think of it as an "awareness" rather than an "assignment", in this scenario.
Why did you include an alternative here; do you see these being used differently? It seems like the initial mode does not fulfil goals 2 or 3?
Yes, I see the Master-Slave design as an alternative to the Peer-Nodes design. I think they do satisfy Nick's goal 3, but they obviously don't satisfy goal 2.
One of the differences between the alternatives that keeps coming up is who (if anyone) can determine the number of nodes. Alternative 3 can keep this a secret known only to the service operator by publishing a combined descriptor. I also discussed in the earlier thread how you could do this in the "Alternative 4: Single hidden service descriptor, multiple service instances per intro point" design, by having the instances connect to each introduction point one or more times, and possibly only connecting to a subset of the introduction points (I possibly didn't consider this in the earlier thread).
Out of the 4 designs in the proposal, I think section 4.2, the Homogeneous shared-key peer-nodes design, is the best and is the most versatile (but the most complex, as a result). So, technically, our two proposals can be merged without much difficulty. However, there are still some issues that I'm having some trouble solving in a sane way. When you make the introduction point the focal point there are some tradeoffs. I'm still not sure if these are the right tradeoffs just to disguise the size of the hidden service.
Another recurring point for comparison is whether anyone can determine if a particular service instance is down.
Absolutely. This is a problem I hope we can solve.
Alternative 4 can get around this by hiding the instances behind the introduction points, and to keep the information from the introduction points, each instance (as described above) can keep multiple connections open, occasionally dropping some to keep the introduction point guessing. I think this would work, providing that the introduction point cannot work out what connections correspond with what instances.
True, but this sounds extremely risky and error prone. I really hope we can do better than this to solve the problem.
If each instance has a disjoint set of introduction points, of which some subset (possibly the full set) is listed in the descriptor, it would be possible to work out both whether an instance goes down and what introduction points correspond to that instance, just by repeatedly trying to connect through all the introduction points? If you start failing to connect for a particular subset of the introduction points, this could suggest an instance failure. Correlating this with power or network outages could give away the location of that instance?
Sure, but as the proposal describes, the security of the multi-node hidden service is reduced to the security of the single-node hidden service. As such, the location-via-correlation attack you describe (there's probably a real/better name for it) is a result of the design, and I decided not to fix it for fear of introducing another, more dangerous, attack.
Also, compared to the setup of other alternatives, this seems more complex for the hidden service operator, both in terms of understanding what they are doing and debugging failures?
It is more complex, no argument there, but I don't think that it is unfair to impose this on the operator (nothing in the world is free). If an op wants to set up a multi-node hidden service, then it will not be much more effort than setting up a single-node service. If the application running behind the hidden service can support scaling, then its configuration will likely be a lot more complicated than configuring a multi-node HS. The way the proposal describes it, it's very repetitive. So, on the bright side, the operator will be very familiar with the torrc config lines when they're done. :)
I think it would be good to partition the goals (as there are quite a lot, which is not inherently bad). In particular, one subset of goals would be as follows:
Operator (the person, or people controlling the service) Usability
- Simple Initial Setup
From the proposal, every node will contain two hidden services: one for the public hidden service, and one used for inter-node communication. I don't think this will be a barrier to entry, as the operator will likely be following step-by-step directions in any case.
- Simple Addition of new Instances
As soon as the management hidden service has been created on the new node, the operator simply appends it to the HiddenServicePeers line on every node. I agree that this will require more work than most people want to spend on something like this, but there are ways we can solve this, with a little more complexity in the protocol. For example, we can allow peers to sync configs with each other and then rewrite portions of the torrc. This is something I am very hesitant to do, though.
- Simple Removal of Instances
The opposite of addition: just remove the hidden service address from the HiddenServicePeers line on each node.
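For concreteness, here is roughly what I imagine one node's torrc fragment looking like (HiddenServiceDir and HiddenServicePort are existing options; HiddenServicePeers is the option from the draft proposal, and the exact syntax shown here is only a guess):

    # Public-facing hidden service on this node
    HiddenServiceDir /var/lib/tor/public_hs/
    HiddenServicePort 80 127.0.0.1:8080

    # Management hidden service used for inter-node communication
    HiddenServiceDir /var/lib/tor/mgmt_hs/

    # Management addresses of the peer nodes; adding or removing an
    # instance means editing this line on every node
    HiddenServicePeers peeraddr1.onion peeraddr2.onion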
- Graceful behaviour regarding instance failure (with respect to the
operator)
- Well defined failure modes
- If something doesn’t work, it should be possible for the operator
to work out what, and how to resolve it.
This is tricky but doable within the scope of this proposal. The tricky part arises when returning any type of error code/message in the event of an authentication/validation failure. But we can still produce useful information that an operator can use to troubleshoot the configuration.
As an example, say we have a multi-node configuration that contains nodes A and B. The HiddenServicePeers line in A's torrc contains the hidden service for B, but the HiddenServicePeers line in B's torrc does not contain A; as a result, when A tries to connect to B, the circuit is destroyed during authentication. The two possible failure reasons here are that B doesn't have A's hidden service descriptor or that A is not in B's torrc. It is very easy for an operator to rule out the latter case, and log messages from B should allow the operator to determine any other problem.
I think it's a given that this configuration is much more complex than the one you proposed, but the failure mode is not much worse, because in both proposals *some node*, somewhere, will always publish a descriptor. It will certainly be difficult for the operator to determine which node actually published it (without adding a new mechanism to Tor for this), and the load balancing may not work as expected until the operator investigates and fixes it. But, as far as I can tell, neither of them fails closed, for better or worse. I'm reanalyzing my design to see if I missed a case that should require it.
Now, obviously, these are minor compared to the more technical goals, but I think they are worth considering, as we have a few technically workable proposals on the table.
As for what I am doing on this currently, I have been reading lots of the related, and not so related papers. I hope to begin doing some more Hidden Service related Chutney stuff this or next week, such that I have something to test with when I start implementing something (note: what I implement, might not be adopted/adoptable by the project, but I am doing it anyway as part of my degree). I am available on #tor-dev as cbaines for any and all discussion.
Awesome! Are you planning on revising your proposal and sending it to tor-dev again? I know I am interested in seeing what changes you decided to make.
Thanks for your feedback and thoughts on information leakage!