I have been looking at doing some work on Tor as part of my degree, and more specifically, looking at Hidden Services. One of the issues where I believe I might be able to make some progress is the Hidden Service Scaling issue as described here [1].
So, before I start trying to implement a prototype, I thought I would set out my ideas here to check they are reasonable (I have also been discussing this a bit on #tor-dev). The goal of this is twofold: to reduce the probability of failure of a hidden service and to increase hidden service scalability.
I think what I am planning distils down to two main changes. Firstly, when an OP initialises a hidden service: currently, if you start a hidden service using an existing keypair and address, the new OP's introduction points replace the existing introduction points [2]. This does provide some redundancy (if slow), but no load balancing.
My current plan is to change this such that if the OP has an existing public/private keypair and address, it would attempt to look up the existing introduction points (probably over a Tor circuit). If found, it then establishes introduction circuits to those Tor servers.
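To make the intended behaviour concrete, here is a minimal Python sketch of the startup flow I have in mind. It is only a sketch under my assumptions: the callables passed in (fetch_descriptor, pick_new_intro_points, establish_intro_circuit, publish_descriptor) are hypothetical placeholders standing in for tor's real machinery, not existing APIs.

    def start_instance(onion_address, private_key,
                       fetch_descriptor, pick_new_intro_points,
                       establish_intro_circuit, publish_descriptor):
        # Look for a descriptor already published by another instance,
        # in the same way a connecting client would (over a Tor circuit).
        descriptor = fetch_descriptor(onion_address)

        if descriptor is not None:
            # Reuse the introduction points already advertised,
            # instead of replacing them as tor does today.
            intro_points = descriptor.introduction_points
        else:
            # No existing descriptor: behave like a single-instance service.
            intro_points = pick_new_intro_points()

        circuits = [establish_intro_circuit(ip, private_key) for ip in intro_points]

        if descriptor is None:
            publish_descriptor(onion_address, intro_points, private_key)
        return circuits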
Then comes the second problem: following the above, the introduction point would then disconnect from any other connected OP using the same public key (unsure why, as a reason is not given in the rend-spec). This would need to change such that an introduction point can talk to more than one instance of the hidden service.
These two changes combined should help with the two goals. Reliability is improved by having multiple OPs providing the service, and having all of these accessible from the introduction points. Scalability is also improved, as you are not limited to one OP (as described above, you can currently have more than one, but only one will receive most of the traffic, and failover is slow).
I am aware that there are several undefined parts of the above description (e.g. how does an introduction point choose which circuit to use?), but at the moment I am more interested in the wider picture. It would be good to get some feedback on this.
1: https://blog.torproject.org/blog/hidden-services-need-some-love 2: http://tor.stackexchange.com/questions/13/can-a-hidden-service-be-hosted-by-...
On Tue, Oct 8, 2013 at 1:52 AM, Christopher Baines cbaines8@gmail.com wrote:
I have been looking at doing some work on Tor as part of my degree, and more specifically, looking at Hidden Services. One of the issues where I believe I might be able to make some progress is the Hidden Service Scaling issue as described here [1].
So, before I start trying to implement a prototype, I thought I would set out my ideas here to check they are reasonable (I have also been discussing this a bit on #tor-dev). The goal of this is twofold: to reduce the probability of failure of a hidden service and to increase hidden service scalability.
I think what I am planning distils down to two main changes. Firstly, when an OP initialises a hidden service: currently, if you start a hidden service using an existing keypair and address, the new OP's introduction points replace the existing introduction points [2]. This does provide some redundancy (if slow), but no load balancing.
My current plan is to change this such that if the OP has an existing public/private keypair and address, it would attempt to lookup the existing introduction points (probably over a Tor circuit). If found, it then establishes introduction circuits to those Tor servers.
Then comes the second problem, following the above, the introduction point would then disconnect from any other connected OP using the same public key (unsure why as a reason is not given in the rend-spec). This would need to change such that an introduction point can talk to more than one instance of the hidden service.
So, let's figure out all our possibilities before we pick one, and talk about requirements a little.
Alternative 1: Multiple hidden service descriptors.
Each instance of a hidden service picks its own introduction points, and uploads a separate hidden service descriptor to a subset of the HSDir nodes handling that service.
Alternative 2: Combined hidden service descriptors in the network.
Each instance of a hidden service picks its own introduction points, and uploads something to every appropriate HSDir node. The HSDir nodes combine those somethings, somehow, into a hidden service descriptor.
Alternative 3: Single hidden service descriptor, one service instance per intro point.
Each instance of a hidden service picks its introduction points, and somehow they coordinate so that they, together, get a single unified list of all their introduction points. They use this list to make a single signed hidden service descriptor, and upload that to the appropriate HSDirs.
Alternative 4: Single hidden service descriptor, multiple service instances per intro point.
This is your design above, where there's one descriptor chosen by a single hidden service instance (or possibly made collaboratively?), and the rest of the service instances fetch it, learn which intro points they're supposed to be at, and parasitically establish fallback introduction circuits there.
There are probably other alternatives too; let's see if we can think of some more.
Here are some possible desirable things. I don't know if they're all important, or all worth it. Let's discuss!
Goal 1) Obscure number of hidden service instances.
Goal 2) No "master" hidden service instance.
Goal 3) If there is a "master" hidden service instance, clean fail-over from one master to the next, undetectable by the network.
Goal 4) Obscure which instances are up and which are down.
What other goals should we have in this kind of design?
On 08/10/13 23:41, Nick Mathewson wrote:
Here are some possible desirable things. I don't know if they're all important, or all worth it. Let's discuss!
So, I think it makes more sense to cover the goals first.
Goal 1) Obscure number of hidden service instances.
Good to have, as it probably helps with the anonymity of a hidden service. This is a guess based on the assumption that attacks based on traffic analysis are harder if you don't know the number of servers that you are looking for.
Goal 2) No "master" hidden service instance. Goal 3) If there is a "master" hidden service instance, clean fail-over from one master to the next, undetectable by the network.
That sounds reasonable.
Goal 4) Obscure which instances are up and which are down.
I think it would be good to make the failure of a hidden service server (perhaps alone, or one of many for that service) indistinguishable from a breakage in any of the relays. If you don't have this property, distributing the service does little to help with attacks based on correlating server downtime with public events (power outages, network outages, ...). This is a specific form of this goal that applies if you are in communication with an instance that goes down.
What other goals should we have in this kind of design?
Goal 5) It should cope (all the goals hold) with taking down (planned downtime), and bringing up instances.
Goal 6) Adding instances should not reduce the performance.
I can see problems here: if you have a large, powerful server, adding a smaller server could actually reduce performance if the load is distributed equally.
Goal 7) It should be possible to move between a single-instance and a multiple-instance service easily. (This might be a specific case of goal 5, or just need consolidating.)
Alternative 1: Multiple hidden service descriptors.
Each instance of a hidden service picks its own introduction points, and uploads a separate hidden service descriptor to a subset of the HSDir nodes handling that service.
This is close to breaking goal 1, as each instance would have to have >= 1 introduction point, which puts an upper bound on the number of instances. The way the OP picks the number of introduction points to create would have to be thought about with respect to this.
Also, goal 4 could be broken: if the service becomes unreachable through a subset of the introduction points, this probably means that one or more of the instances have gone down (assuming that an attacker can discover all of the introduction points).
Alternative 2: Combined hidden service descriptors in the network.
Each instance of a hidden service picks its own introduction points, and uploads something to every appropriate HSDir node. The HSDir nodes combine those somethings, somehow, into a hidden service descriptor.
Same problem with goal 4 as Alternative 1. Probably also has problems obscuring the number of instances from the HSDirs.
Alternative 3: Single hidden service descriptor, one service instance per intro point.
Each instance of a hidden service picks its introduction points, and somehow they coordinate so that they, together, get a single unified list of all their introduction points. They use this list to make a single signed hidden service descriptor, and upload that to the appropriate HSDirs.
Same problem with goal 4 as Alternative 1. I don't believe this has the same problem with the number of instances as Alternative 1, though.
Alternative 4: Single hidden service descriptor, multiple service instances per intro point.
This is your design above, where there's one descriptor chosen by a single hidden service instance (or possibly made collaboratively?), and the rest of the service instances fetch it, learn which intro points they're supposed to be at, and parasitically establish fallback introduction circuits there.
I don't really see how choosing introduction points collaboratively would work, as it could lead to a separation between single-instance services and multiple-instance services, which could break goal 7. It would also require the instances to interact, which adds some complexity.
As for the fallback circuits, they are probably better off being just circuits. This would be what provides the scaling. The way you do this would have to be thought out though, to avoid breaking goal 6.
A simple algorithm would be for the introduction point to just use a round robin over all the circuits to that service, but allow a service to reject a connection (if it has too much load); the introduction point would then continue to the next circuit.
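A small, self-contained sketch of that round-robin-with-rejection idea, as seen from the introduction point (the try_introduce interface and the FakeCircuit class are made up for illustration; this is not tor code):

    from collections import deque

    class IntroPointDispatcher:
        """Round-robin over all circuits for one service; a circuit may refuse
        a request (e.g. its instance is overloaded), and we move on."""

        def __init__(self, circuits):
            self.circuits = deque(circuits)

        def dispatch(self, introduce_cell):
            for _ in range(len(self.circuits)):
                circuit = self.circuits[0]
                self.circuits.rotate(-1)              # round robin
                if circuit.try_introduce(introduce_cell):
                    return circuit                    # an instance accepted
            return None                               # every instance refused

    class FakeCircuit:
        def __init__(self, name, accepts):
            self.name, self.accepts = name, accepts
        def try_introduce(self, cell):
            return self.accepts

    dispatcher = IntroPointDispatcher([FakeCircuit("busy", False), FakeCircuit("ok", True)])
    print(dispatcher.dispatch("INTRODUCE1").name)     # "ok": the first circuit refused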
The introduction point would also know the number of instances, if each instance only connected once. This could be masked by having instances make multiple connections to each introduction point (in both single-instance and multiple-instance services).
While an external attacker might not be able to detect individual instance failure by continuously trying to connect through all the introduction points, the introduction points themselves would probably be able to work out if one or more instances had just failed. To combat this, the service could inject random failures (some kind of non-response, of the sort that would be given if the service had actually failed) into some of the circuits, to keep the introduction point guessing. This hopefully would not have too much of a detrimental effect, as the introduction point would just try the next circuit.
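On the instance side, the failure injection could be as simple as the following sketch (the 1% rate is an arbitrary number for illustration and would need proper analysis):

    import random

    FAKE_FAILURE_RATE = 0.01   # arbitrary example value

    def handle_introduce(cell, process_real_introduction):
        # Occasionally behave exactly as if this instance were down, so the
        # introduction point cannot tell injected failures from real ones.
        if random.random() < FAKE_FAILURE_RATE:
            return None        # no response / refusal, same as a dead instance
        return process_real_introduction(cell)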
There are probably other alternatives too; let's see if we can think of some more.
I believe that to mask the state and possibly number of instances, you would have to at least have some of the introduction points connecting to multiple instances.
You could also have some random or coordinated shuffling of the connections, such that the instance(s) behind each introduction point keeps changing (this might address the above concerns).
Combining this with having multiple circuits to introduction points (from the same instance), and random failures (to hide real failures), might give the required level of security.
I will try and develop some full alternatives, once I have had some sleep... I realise that I have only commented negatively regarding the alternatives that you gave, but thanks enormously for talking to me about this, as it has really helped me.
Hi Christopher,
It's great that you started thinking about a design (and the potential obstacles). I will try not to reiterate what Nick already said, though.
On Tue, Oct 08, 2013 at 06:52:39AM +0100, Christopher Baines wrote:
I have been looking at doing some work on Tor as part of my degree, and more specifically, looking at Hidden Services. One of the issues where I believe I might be able to make some progress is the Hidden Service Scaling issue as described here [1].
So, before I start trying to implement a prototype, I thought I would set out my ideas here to check they are reasonable (I have also been discussing this a bit on #tor-dev). The goal of this is twofold: to reduce the probability of failure of a hidden service and to increase hidden service scalability.
These are excellent goals. It would be even better if you made a stronger statement about hidden service failure. Something closer to "increase hidden service availability", but I won't bikeshed on the wording.
I think what I am planning distils down to two main changes. Firstly, when an OP initialises a hidden service: currently, if you start a hidden service using an existing keypair and address, the new OP's introduction points replace the existing introduction points [2]. This does provide some redundancy (if slow), but no load balancing.
So an interesting thing to note about this hack is that it does provide *some* load balancing. Not much, but some. The reason for this is because Tor clients cache hidden service descriptors so that they don't need to refetch every time they want to connect to it.
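A rough, purely illustrative simulation of why the cache gives some spread: two instances take turns overwriting the descriptor, and each client keeps using whichever copy it fetched first (the client count and the hourly republish interval are made up).

    import random

    # Which instance's descriptor is "current" in each of 24 hours.
    published_by_hour = ["instance-A" if h % 2 == 0 else "instance-B" for h in range(24)]

    assignments = []
    for _ in range(1000):                      # 1000 clients
        first_fetch = random.randrange(24)     # hour the client first fetched
        assignments.append(published_by_hour[first_fetch])   # cached copy is reused

    print({name: assignments.count(name) for name in set(assignments)})
    # Roughly half the clients end up on each instance, but only for as long
    # as their cached descriptor remains valid.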
My current plan is to change this such that if the OP has an existing public/private keypair and address, it would attempt to lookup the existing introduction points (probably over a Tor circuit). If found, it then establishes introduction circuits to those Tor servers.
Then comes the second problem, following the above, the introduction point would then disconnect from any other connected OP using the same public key (unsure why as a reason is not given in the rend-spec). This would need to change such that an introduction point can talk to more than one instance of the hidden service.
It's important to think about the current design based on the assumption that a hidden service is a single node. Any modifications to this assumption will change the behavior of the various components.
These two changes combined should help with the two goals. Reliability is improved by having multiple OPs providing the service, and having all of these accessible from the introduction points. Scalability is also improved, as you are not limited to one OP (as described above, you can currently have more than one, but only one will receive most of the traffic, and failover is slow).
Do you see any disadvantages to this design?
I am aware that there are several undefined parts of the above description (e.g. how does an introduction point choose which circuit to use?), but at the moment I am more interested in the wider picture. It would be good to get some feedback on this.
1: https://blog.torproject.org/blog/hidden-services-need-some-love 2: http://tor.stackexchange.com/questions/13/can-a-hidden-service-be-hosted-by-...
This is a good start! Some important criteria you might also think about include how much you trust each component/node and which nodes do you want to be responsible for deciding where connections are routed. Also seriously think about how something like a botnet that uses hidden services might impact the reliability of your design (crazy idea, I know).
I'll defer to Nick's email for other thought provoking ideas.
- Matt
On 09/10/13 01:16, Matthew Finkel wrote:
So, before I start trying to implement a prototype, I thought I would set out my ideas here to check they are reasonable (I have also been discussing this a bit on #tor-dev). The goal of this is twofold: to reduce the probability of failure of a hidden service and to increase hidden service scalability.
These are excellent goals. It would be even better if you made a stronger statement about hidden service failure. Something closer to "increase hidden service availability", but I won't bikeshed on the wording.
I agree, that is clearer.
I think what I am planning distils down to two main changes. Firstly, when an OP initialises a hidden service: currently, if you start a hidden service using an existing keypair and address, the new OP's introduction points replace the existing introduction points [2]. This does provide some redundancy (if slow), but no load balancing.
So an interesting thing to note about this hack is that it does provide *some* load balancing. Not much, but some. The reason for this is because Tor clients cache hidden service descriptors so that they don't need to refetch every time they want to connect to it.
My current plan is to change this such that if the OP has an existing public/private keypair and address, it would attempt to lookup the existing introduction points (probably over a Tor circuit). If found, it then establishes introduction circuits to those Tor servers.
Then comes the second problem, following the above, the introduction point would then disconnect from any other connected OP using the same public key (unsure why as a reason is not given in the rend-spec). This would need to change such that an introduction point can talk to more than one instance of the hidden service.
It's important to think about the current design based on the assumption that a hidden service is a single node. Any modifications to this assumption will change the behavior of the various components.
The only interactions I currently believe can be affected are the Hidden Service instance <-> Introduction point(s) and Hidden Service instance <-> directory server. I need to go and read more about the latter, as I don't have all the information yet.
These two changes combined should help with the two goals. Reliability is improved by having multiple OPs providing the service, and having all of these accessible from the introduction points. Scalability is also improved, as you are not limited to one OP (as described above, you can currently have more than one, but only one will receive most of the traffic, and failover is slow).
Do you see any disadvantages to this design?
So, care needs to be taken around the interaction between the hidden service instances, and the introduction points. If each instance just makes one circuit, then this reveals the number of instances.
There is also uncertainty around the replacement of failing introduction points. New ones have to be chosen, but as the service instances do not directly communicate, there could be some interesting behaviour unless this is done carefully.
I am also unsure how the lack of direct communication between the hidden service instances could affect the usability of this. I think what would be good to do is take some large, open source, distributed web applications and look at how/how not to set them up using various possible implementations of distributed hidden services.
I am aware that there are several undefined parts of the above description (e.g. how does an introduction point choose which circuit to use?), but at the moment I am more interested in the wider picture. It would be good to get some feedback on this.
1: https://blog.torproject.org/blog/hidden-services-need-some-love 2: http://tor.stackexchange.com/questions/13/can-a-hidden-service-be-hosted-by-...
This is a good start! Some important criteria you might also think about include how much you trust each component/node and which nodes do you want to be responsible for deciding where connections are routed. Also seriously think about how something like a botnet that uses hidden services might impact the reliability of your design (crazy idea, I know).
I assume the characteristics of this are: 1 or more hidden service instances, connected to by very large numbers of clients, sending and receiving small amounts of information?
On Wed, Oct 09, 2013 at 09:58:07AM +0100, Christopher Baines wrote:
On 09/10/13 01:16, Matthew Finkel wrote:
Then comes the second problem, following the above, the introduction point would then disconnect from any other connected OP using the same public key (unsure why as a reason is not given in the rend-spec). This would need to change such that an introduction point can talk to more than one instance of the hidden service.
It's important to think about the current design based on the assumption that a hidden service is a single node. Any modifications to this assumption will change the behavior of the various components.
The only interactions I currently believe can be affected are the Hidden Service instance <-> Introduction point(s) and Hidden Service instance <-> directory server. I need to go and read more about the latter, as I don't have all the information yet.
Indeed. Lots of issues there.
These two changes combined should help with the two goals. Reliability is improved by having multiple OPs providing the service, and having all of these accessible from the introduction points. Scalability is also improved, as you are not limited to one OP (as described above, you can currently have more than one, but only one will receive most of the traffic, and failover is slow).
Do you see any disadvantages to this design?
So, care needs to be taken around the interaction between the hidden service instances, and the introduction points. If each instance just makes one circuit, then this reveals the number of instances.
You said something similar in response to Nick, specifically you said
I believe that to mask the state and possibly number of instances, you would have to at least have some of the introduction points connecting to multiple instances.
I didn't understand why you said this in either place. Someone would have to know they had a complete list of introduction points to know the number of instances, but that would depend on how HS descriptors are created, stored, and distributed. From whom is this being hidden? You didn't state the adversary. Is it HS directory servers, intro point operators, potential clients of a hidden service? I don't see why any of these necessarily learns the state or number of instances simply because each intro point is chosen by a single instance (ignoring coincidental collisions if these choices are not coordinated).
Also, in your response to Nick you said that not having instances share intro points in some way would place an upper bound on the number of instances. True, but if the number of available intro points >> likely number of instances, this is a nonissue. And come to think of it, not true: if the instances are sometimes choosing the same intro points then this does not bound the number of instances possible (ignoring the number of HSes or instances for which a single intro point can serve as intro point at one time).
Also, above you said "If each instance just makes one circuit". Did you mean if there is a single intro point per instance?
Hard to say specifically without exploring more, but in general I would be more worried about what is revealed because circuits are built to common intro points by different instances, and the intro points can recognize and manage these (e.g., dropping redundant ones), than I would be about the number of intro points putting an upper bound on instances.
HTH, Paul
On 09/10/13 11:41, Paul Syverson wrote:
These two changes combined should help with the two goals. Reliability is improved by having multiple OPs providing the service, and having all of these accessible from the introduction points. Scalability is also improved, as you are not limited to one OP (as described above, you can currently have more than one, but only one will receive most of the traffic, and failover is slow).
Do you see any disadvantages to this design?
So, care needs to be taken around the interaction between the hidden service instances, and the introduction points. If each instance just makes one circuit, then this reveals the number of instances.
You said something similar in response to Nick, specifically you said
I believe that to mask the state and possibly number of instances, you would have to at least have some of the introduction points connecting to multiple instances.
I didn't understand why you said this in either place. Someone would have to know they had a complete list of introduction points to know the number of instances, but that would depend on how HS descriptors are created, stored, and distributed. From whom is this being hidden? You didn't state the adversary. Is it HS directory servers, intro point operators, potential clients of a hidden service? I don't see why any of these necessarily learns the state or number of instances simply because each intro point is chosen by a single instance (ignoring coincidental collisions if these choices are not coordinated).
To clarify, I was interpreting the goal as only the service operator should know the number of instances. In particular, the adversary here is the introduction point. If hidden service instances only ever create one circuit to each introduction point, each introduction point knows the number of instances of every service it is an introduction point for, as this is the same as the number of circuits for that service.
Also, in your response to Nick you said that not having instances share intro points in some way would place an upper bound on the number of instances. True, but if the number of available intro points >> likely number of instances, this is a nonissue.
I don't really follow your reasoning.
And come to think of it, not true: if the instances are sometimes choosing the same intro points then this does not bound the number of instances possible (ignoring the number of HSes or instances for which a single intro point can serve as intro point at one time).
Ok, but I was assuming the current behaviour of Tor, which I believe prevents instances using some of the same introduction points.
Also, above you said "If each instance just makes one circuit". Did you mean if there is a single intro point per instance?
No, as you could have one instance that makes say 3 circuits to just one introduction point. This can help, as it can hide the number of instances from the introduction point.
Hard to say specifically without exploring more, but in general I would be more worried about what is revealed because circuits are built to common intro points by different instances, and the intro points can recognize and manage these (e.g., dropping redundant ones), than I would be about the number of intro points putting an upper bound on instances.
I don't quite understand the last part, but regarding introduction points handling more than one circuit for the same service: I think that having this helps possibly hide information (like the number of instances). This does depend on also allowing one instance to use multiple circuits, otherwise some information would be given away.
I might try creating a wiki page on the Tor wiki to collect all of the information in this thread, as it might be a nice reference for discussion.
On Wed, Oct 09, 2013 at 03:02:47PM +0100, Christopher Baines wrote:
On 09/10/13 11:41, Paul Syverson wrote:
These two changes combined should help with the two goals. Reliability is improved by having multiple OPs providing the service, and having all of these accessible from the introduction points. Scalability is also improved, as you are not limited to one OP (as described above, you can currently have more than one, but only one will receive most of the traffic, and failover is slow).
Do you see any disadvantages to this design?
So, care needs to be taken around the interaction between the hidden service instances, and the introduction points. If each instance just makes one circuit, then this reveals the number of instances.
You said something similar in response to Nick, specifically you said
I believe that to mask the state and possibly number of instances, you would have to at least have some of the introduction points connecting to multiple instances.
I didn't understand why you said this in either place. Someone would have to know they had a complete list of introduction points to know the number of instances, but that would depend on how HS descriptors are created, stored, and distributed. From whom is this being hidden? You didn't state the adversary. Is it HS directory servers, intro point operators, potential clients of a hidden service? I don't see why any of these necessarily learns the state or number of instances simply because each intro point is chosen by a single instance (ignoring coincidental collisions if these choices are not coordinated).
To clarify, I was interpreting the goal as only the service operator should know the number of instances. In particular, the adversary here is the introduction point. If hidden service instances only ever create one circuit to each introduction point, each introduction point knows the number of instances of every service it is an introduction point for, as this is the same as the number of circuits for that service.
I'm missing something. Suppose there is a hidden service with ten instances, each of which runs its own introduction point. How do any of these ten introduction points know the number of instances because they each see a single circuit from the hidden service?
Also, in your response to Nick you said that not having instances share intro points in some way would place an upper bound on the number of instances. True, but if the number of available intro points >> likely number of instances, this is a nonissue.
I don't really follow your reasoning.
If there are a thousand possible introduction points for a given HS, and each instance uses, say, two intro points, then that bounds the number of instances at 500 (ignoring that the intro points for different instances might overlap, q.v. below).
And come to think of it, not true: if the instances are sometimes choosing the same intro points then this does not bound the number of instances possible (ignoring the number of HSes or instances for which a single intro point can serve as intro point at one time).
Ok, but I was assuming the current behaviour of Tor, which I believe prevents instances using some of the same introduction points.
Why? If two different instances of the same HS operated completely independently (just for the sake of argument, I'm assuming there are good reasons this wouldn't happen in reality) then they wouldn't even know they were colliding on intro points. And neither would the intro points.
Also, above you said "If each instance just makes one circuit". Did you mean if there is a single intro point per instance?
No, as you could have one instance that makes say 3 circuits to just one introduction point. This can help, as it can hide the number of instances from the introduction point.
Off the top of my head, I'm guessing this would be a bad idea, since the multiple circuits with the same source and destination will create more observation opportunities for either compromised Tor nodes or the underlying ASes, routers, etc. I don't have a specific attack in mind, but this seems a greater threat to locating a hidden service than would be revealing the number of instances to an intro point (and I still don't understand your argument for how this gets revealed anyway).
Hard to say specifically without exploring more, but in general I would be more worried about what is revealed because circuits are built to common intro points by different instances, and the intro points can recognize and manage these (e.g., dropping redundant ones), than I would be about the number of intro points putting an upper bound on instances.
I don't quite understand the last part, but regarding introduction points handling more than one circuit for the same service: I think that having this helps possibly hide information (like the number of instances). This does depend on also allowing one instance to use multiple circuits, otherwise some information would be given away.
I think our miscommunication above is just reiterated here. Hopefully something I have said will spark you to recognize the confusion (and indicate to you which one of us is having it) and you can tell me.
I might try creating a wiki page on the Tor wiki to collect all of the information in this thread, as it might be a nice reference for discussion.
Sure. I tend to do things more via email, but to each their own. Note that I'm swamped for at least the next week, so sorry if I don't respond any time soon.
aloha, Paul
On 10/10/13 23:28, Paul Syverson wrote:
On Wed, Oct 09, 2013 at 03:02:47PM +0100, Christopher Baines wrote:
On 09/10/13 11:41, Paul Syverson wrote:
These two changes combined should help with the two goals. Reliability is improved by having multiple OPs providing the service, and having all of these accessible from the introduction points. Scalability is also improved, as you are not limited to one OP (as described above, you can currently have more than one, but only one will receive most of the traffic, and failover is slow).
Do you see any disadvantages to this design?
So, care needs to be taken around the interaction between the hidden service instances, and the introduction points. If each instance just makes one circuit, then this reveals the number of instances.
You said something similar in response to Nick, specifically you said
I believe that to mask the state and possibly number of instances, you would have to at least have some of the introduction points connecting to multiple instances.
I didn't understand why you said this in either place. Someone would have to know they had a complete list of introduction points to know the number of instances, but that would depend on how HS descriptors are created, stored, and distributed. From whom is this being hidden? You didn't state the adversary. Is it HS directory servers, intro point operators, potential clients of a hidden service? I don't see why any of these necessarily learns the state or number of instances simply because each intro point is chosen by a single instance (ignoring coincidental collisions if these choices are not coordinated).
To clarify, I was interpreting the goal as only the service operator should know the number of instances. In particular, the adversary here is the introduction point. If hidden service instances only ever create one circuit to each introduction point, each introduction point knows the number of instances of every service it is an introduction point for, as this is the same as the number of circuits for that service.
I'm missing something. Suppose there is a hidden service with ten instances, each of which runs its own introduction point. How do any of these ten introduction points know the number of instances because they each see a single circuit from the hidden service?
Ah, I have not been explicit enough when describing the behaviour I want to implement. In my original email, I set out that each instance of the service connects to each introduction point (this has also developed/changed a bit since that email). Unfortunately, I did not state the resultant behaviour I was looking to achieve (the above), just the changes to the protocol that would result in this behaviour.
Also, in your response to Nick you said that not having instances share intro points in some way would place an upper bound on the number of instances. True, but if the number of available intro points >> likely number of instances, this is a nonissue.
I don't really follow your reasoning.
If there are a thousand possible introduction points for a given HS, if each instance runs say two intro points, then that bounds the number of instances at 500 (ignoring that the intro points for different instances overlap q.v. below).
I think this resolves itself with the above clarification.
And come to think of it, not true: if the instances are sometimes choosing the same intro points then this does not bound the number of instances possible (ignoring the number of HSes or instances for which a single intro point can serve as intro point at one time).
Ok, but I was assuming the current behaviour of Tor, which I believe prevents instances using some of the same introduction points.
Why? If two different instances of the same HS operated completely independently (just for the sake of argument, I'm assuming there are good reasons this wouldn't happen in reality) then they wouldn't even know they were colliding on intro points. And neither would the intro points.
Given my above clarification, the instances perform some coordination via the hidden service directories. When a new instance starts, it finds existing introduction points in exactly the same way a client (who wants to connect to the hidden service) does.
Also, above you said "If each instance just makes one circuit". Did you mean if there is a single intro point per instance?
No, as you could have one instance that makes say 3 circuits to just one introduction point. This can help, as it can hide the number of instances from the introduction point.
Off the top of my head, I'm guessing this would be a bad idea since the multiple circuits with the same source and destination will create more observation opportunities for either compromised Tor nodes or underlying ASes routers, etc. I don't have a specific attack in mind, but this seems a greater threat to locating a hidden service than would be revealing the number of instances to an intro point (which I still don't understand your argument that this gets revealed anyway).
That is a concern. This will need more thought.
Hard to say specifically without exploring more, but in general I would be more worried about what is revealed because circuits are built to common intro points by different instances, and the intro points can recognize and manage these (e.g., dropping redundant ones), than I would be about the number of intro points putting an upper bound on instances.
I don't quite understand the last part, but regarding introduction points handling more than one circuit for the same service: I think that having this helps possibly hide information (like the number of instances). This does depend on also allowing one instance to use multiple circuits, otherwise some information would be given away.
I think our miscommunication above is just reiterated here. Hopefully something I have said will spark you to recognize the confusion (and indicate to you which one of us is having it) and you can tell me.
I think this has now been addressed, let me know if it has not.
If the goal is to prevent introduction points from guessing the number of instances because of multiple instances using the same introduction points, shouldn't this scheme work?
1. On deployment, all instances of a hidden service have a copy of a secret bitstring (maybe the private key for the hidden service, maybe an additional secret) and the number of instances N. Every instance also has a unique instance ID k in the range [0, N-1].
2. When selecting an introduction point, an instance only considers candidates for which
hash(introduction-point-address || shared-secret) = k mod N.
With this system no two instances will ever connect to the same introduction point, and it doesn't require any synchronisation between the instances other than the initial instance ID assignation. But it relies on there being enough potential introduction points for which the equality holds.
This will also mean that an introduction point knows that it is always being used by the same instance of a hidden service. If you want to avoid this you could add the current day or hour or random time period to the hashed value, but then you might get a collision when a new time period begins.
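A small, self-contained sketch of that selection rule (SHA-256, the byte encoding, and the relay names are illustrative choices, not a concrete proposal):

    import hashlib

    def eligible_intro_points(candidates, shared_secret, instance_id, n_instances):
        """Keep only relays whose hash falls in this instance's residue class,
        so no two instances ever pick the same introduction point."""
        chosen = []
        for address in candidates:
            digest = hashlib.sha256(address.encode() + shared_secret).digest()
            if int.from_bytes(digest, "big") % n_instances == instance_id:
                chosen.append(address)
        return chosen

    relays = ["relay-one", "relay-two", "relay-three", "relay-four"]
    print(eligible_intro_points(relays, b"example-secret", instance_id=0, n_instances=2))
    # Adding a time period, as suggested above, would just mean appending it
    # to the bytes passed to sha256.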
Apologies if this has already been discussed.
--ll
Christopher Baines cbaines8@gmail.com writes:
On 10/10/13 23:28, Paul Syverson wrote:
On Wed, Oct 09, 2013 at 03:02:47PM +0100, Christopher Baines wrote:
On 09/10/13 11:41, Paul Syverson wrote:
These two changes combined should help with the two goals. Reliability is improved by having multiple OPs providing the service, and having all of these accessible from the introduction points. Scalability is also improved, as you are not limited to one OP (as described above, you can currently have more than one, but only one will receive most of the traffic, and failover is slow).
Do you see any disadvantages to this design?
So, care needs to be taken around the interaction between the hidden service instances, and the introduction points. If each instance just makes one circuit, then this reveals the number of instances.
You said something similar in response to Nick, specifically you said
I believe that to mask the state and possibly number of instances, you would have to at least have some of the introduction points connecting to multiple instances.
I didn't understand why you said this in either place. Someone would have to know they had a complete list of introduction points to know the number of instances, but that would depend on how HS descriptors are created, stored, and distributed. From whom is this being hidden? You didn't state the adversary. Is it HS directory servers, intro point operators, potential clients of a hidden service? I don't see why any of these necessarily learns the state or number of instances simply because each intro point is chosen by a single instance (ignoring coincidental collisions if these choices are not coordinated).
To clarify, I was interpreting the goal as only the service operator should know the number of instances. In particular, the adversary here is the introduction point. If hidden service instances only ever create one circuit to each introduction point, each introduction point knows the number of instances of every service it is an introduction point for, as this is the same as the number of circuits for that service.
I'm missing something. Suppose there is a hidden service with ten instances, each of which runs its own introduction point. How do any of these ten introduction points know the number of instances because they each see a single circuit from the hidden service?
Ah, I have not been explicit enough when describing the behaviour I want to implement. In my original email, I set out that each instance of the service connects to each introduction point (this has also developed/changed a bit since that email). Unfortunately, I did not state the resultant behaviour I was looking to achieve (the above), just the changes to the protocol that would result in this behaviour.
That's from the PoV of Introduction Points. On the other hand, a client that knows the onion address of an HS (or an HSDir before https://lists.torproject.org/pipermail/tor-dev/2013-October/005534.html gets implemented) can still get the list of all IPs (at least with the current directory design). If some of the "HS peers" that correspond to those IPs are down, then a client can notice this by sending INTRODUCE1 cells to all the IPs and seeing which ones fail.
As a more conditional attack (from the IP PoV), let's think of a super-HS with two "HS peers", where each of them has one Introduction Point. If one of the "HS peers" goes down, then the other IP might be able to figure this out using the number of introductions it conducts (if we assume that each IP used to do half of the introductions of the HS, then the number of introductions will increase when one "HS peer" goes down).
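Illustratively, the inference that the remaining IP could make is just a rate comparison (the numbers and the threshold are made up):

    def other_peer_probably_down(intros_per_hour_before, intros_per_hour_now, threshold=1.5):
        # If this IP used to handle roughly half of the introductions and now
        # sees close to all of them, the other "HS peer" has likely gone away.
        return intros_per_hour_now > threshold * intros_per_hour_before

    print(other_peer_probably_down(50, 95))   # True: the rate nearly doubled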
On Wed, Oct 09, 2013 at 09:58:07AM +0100, Christopher Baines wrote:
On 09/10/13 01:16, Matthew Finkel wrote:
Then comes the second problem, following the above, the introduction point would then disconnect from any other connected OP using the same public key (unsure why as a reason is not given in the rend-spec). This would need to change such that an introduction point can talk to more than one instance of the hidden service.
It's important to think about the current design based on the assumption that a hidden service is a single node. Any modifications to this assumption will change the behavior of the various components.
The only interactions I currently believe can be affected are the Hidden Service instance <-> Introduction point(s) and Hidden Service instance <-> directory server. I need to go and read more about the latter, as I don't have all the information yet.
Also, to be fair, one of the devs has already started working on upgrading various components of hidden services [3][4]. You may also want to read through these so you have an idea of some of the future plans.
Also, keep in mind that the current design may not work well for this (scaling) use case. Perhaps also thinking about modifications to the current design that are backwards compatible will help.
These two changes combined should help with the two goals. Reliability is improved by having multiple OPs providing the service, and having all of these accessible from the introduction points. Scalability is also improved, as you are not limited to one OP (as described above, you can currently have more than one, but only one will receive most of the traffic, and failover is slow).
Do you see any disadvantages to this design?
So, care needs to be taken around the interaction between the hidden service instances, and the introduction points. If each instance just makes one circuit, then this reveals the number of instances.
Does it?
There is also uncertainty around the replacement of failing introduction points. New ones have to be chosen, but as the service instances do not directly communicate, there could be some interesting behaviour unless this is done carefully.
Is there a reason they shouldn't communicate with each other?
I am also unsure how the lack of direct communication between the hidden service instances could affect the usability of this. I think what would be good to do is take some large, open source, distributed web applications and look at how/how not to set them up using various possible implementations of distributed hidden services.
I am aware that there are several undefined parts of the above description (e.g. how does an introduction point choose which circuit to use?), but at the moment I am more interested in the wider picture. It would be good to get some feedback on this.
1: https://blog.torproject.org/blog/hidden-services-need-some-love 2: http://tor.stackexchange.com/questions/13/can-a-hidden-service-be-hosted-by-...
This is a good start! Some important criteria you might also think about include how much you trust each component/node and which nodes do you want to be responsible for deciding where connections are routed. Also seriously think about how something like a botnet that uses hidden services might impact the reliability of your design (crazy idea, I know).
I assume the characteristics of this are: 1 or more hidden service instances, connected to by very large numbers of clients, sending and receiving small amounts of information?
Perhaps, but just think about the load an intro point can handle and sustain. If Introduction Points are where load balancing takes place, then does this affect the difficulty of attacking a hidden service? (for some undefined definition of 'attack'.)
[3] https://lists.torproject.org/pipermail/tor-dev/2013-October/005534.html [4] https://lists.torproject.org/pipermail/tor-dev/2013-October/005536.html
On 09/10/13 18:05, Matthew Finkel wrote:
These two changes combined should help with the two goals. Reliability is improved by having multiple OPs providing the service, and having all of these accessible from the introduction points. Scalability is also improved, as you are not limited to one OP (as described above, you can currently have more than one, but only one will receive most of the traffic, and failover is slow).
Do you see any disadvantages to this design?
So, care needs to be taken around the interaction between the hidden service instances, and the introduction points. If each instance just makes one circuit, then this reveals the number of instances.
Does it?
Given the above (that is, each instance of the hidden service connects once to each introduction point), the number of instances of a hidden service is equal to the number of connections that each introduction point sees with that key.
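Put another way, the bookkeeping needed at the introduction point would be trivial; a sketch (the cell handling and names are made up for illustration):

    from collections import defaultdict

    intro_circuits_per_service = defaultdict(set)

    def on_establish_intro(service_public_key, circuit_id):
        # One entry per live circuit that presented this service key.
        intro_circuits_per_service[service_public_key].add(circuit_id)

    def apparent_instance_count(service_public_key):
        # Only equals the real instance count if every instance opens exactly
        # one circuit to this introduction point.
        return len(intro_circuits_per_service[service_public_key])

    on_establish_intro("service-key-X", circuit_id=7)
    on_establish_intro("service-key-X", circuit_id=12)
    print(apparent_instance_count("service-key-X"))   # 2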
There is also uncertainty around the replacement of failing introduction points. New ones have to be chosen, but as the service instances do not directly communicate, there could be some interesting behaviour unless this is done carefully.
Is there a reason they shouldn't communicate with each other?
I have avoided it so far, as it increases the complexity of both the implementation and the setup. However, this is probably a minor issue; the major question is how service providers would want to use this. Complex hidden services (compared to hidden services with static content) will probably require either communication between instances, or communication from all instances to another server (or set of servers)?
I am aware that there are several undefined parts of the above description (e.g. how does an introduction point choose which circuit to use?), but at the moment I am more interested in the wider picture. It would be good to get some feedback on this.
1: https://blog.torproject.org/blog/hidden-services-need-some-love 2: http://tor.stackexchange.com/questions/13/can-a-hidden-service-be-hosted-by-...
This is a good start! Some important criteria you might also think about include how much you trust each component/node and which nodes do you want to be responsible for deciding where connections are routed. Also seriously think about how something like a botnet that uses hidden services might impact the reliability of your design (crazy idea, I know).
I assume the characteristics of this are: 1 or more hidden service instances, connected to by very large numbers of clients, sending and receiving small amounts of information?
Perhaps, but just think about the load an intro point can handle and sustain. If Introduction Points are where load balancing takes place, then does this affect the difficulty of attacking a hidden service? (for some undefined definition of 'attack'.)
At the moment, I am really considering the redundancy and scalability of the service. Both of these could be helped by allowing for multi-instance hidden services (in a planned and thought-through manner). Hopefully allowing for this will increase the difficulty of attacking a hidden service, not directly, but by allowing the operators to use this functionality.
On Sun, Oct 13, 2013 at 10:22:29PM +0100, Christopher Baines wrote:
On 09/10/13 18:05, Matthew Finkel wrote:
These two changes combined should help with the two goals. Reliability is improved by having multiple OPs providing the service, and having all of these accessible from the introduction points. Scalability is also improved, as you are not limited to one OP (as described above, you can currently have more than one, but only one will receive most of the traffic, and failover is slow).
Do you see any disadvantages to this design?
So, care needs to be taken around the interaction between the hidden service instances, and the introduction points. If each instance just makes one circuit, then this reveals the number of instances.
Does it?
Given the above (that is, each instance of the hidden service connects once to each introduction point), the number of instances of a hidden service is equal to the number of connections that each introduction point sees with that key.
Ah, I missed something earlier, this makes more sense now. Thanks for reiterating that point.
So, this having been said, do you have some thoughts on how to counter this? At this point, introduction points are selected at random; it would be unfortunate if they could build a profile of a hidden service's usage over time.
There is also uncertainty around the replacement of failing introduction points. New ones have to be chosen, but as the service instances do not directly communicate, there could be some interesting behaviour unless this is done carefully.
Is there a reason they shouldn't communicate with each other?
I have avoided it so far, as it increases the complexity of both the implementation and the setup. However, this is probably a minor issue; the major question is how service providers would want to use this. Complex hidden services (compared to hidden services with static content) will probably require either communication between instances, or communication from all instances to another server (or set of servers)?
It will surely increase the complexity, however allowing the hidden service peers to coordinate their introduction points (and/or other information) could be a useful feature. This could be especially true if we want to address the "all introduction points know the number of hidden service instances that constitute a hidden service address" problem.
As a general rule, we want to minimize the number of nodes that are given a privileged position within the network. As an example, if we go back to my earlier comment and assume all instances of a hidden service use the same introduction points, then a client will use any one of the introduction points with equal probability. Given this, an introduction point 1) knows the size (number of instances) of the hidden service, 2) can influence which hidden service instances are used by clients, 3) can communicate with a HS without knowing who it is, and 4) can potentially determine the geographical location of the hidden service's users (based on when it is used). These last few points are not unique to your design and the last point is not unique to introduction points, but these leakages are important and we should try to account for them (and plug them, if possible). (This is not an exhaustive list.)
I am aware that there are several undefined parts of the above description (e.g. how does an introduction point choose which circuit to use?), but at the moment I am more interested in the wider picture. It would be good to get some feedback on this.
1: https://blog.torproject.org/blog/hidden-services-need-some-love 2: http://tor.stackexchange.com/questions/13/can-a-hidden-service-be-hosted-by-...
This is a good start! Some important criteria you might also think about include how much you trust each component/node and which nodes do you want to be responsible for deciding where connections are routed. Also seriously think about how something like a botnet that uses hidden services might impact the reliability of your design (crazy idea, I know).
I assume the characteristics of this are: 1 or more hidden service instances, connected to by very large numbers of clients, sending and receiving small amounts of information?
Perhaps, but just think about the load an intro point can handle and sustain. If Introduction Points are where load balancing takes place, then does this affect the difficulty of attacking a hidden service? (for some undefined definition of 'attack'.)
At the moment, I am really considering the redundancy and scalability of the service. Both of these could be helped by allowing for multi-instance hidden services (in a planned and thought-through manner). Hopefully allowing for this will increase the difficulty of attacking a hidden service, not directly, but by allowing the operators to use this functionality.
Understood, and I appreciate this concentration, but also try to keep in mind that, while you work on this design, if you are faced with a tradeoff between scalability/reliability and anonymity, default to anonymity.
On 08/10/13 06:52, Christopher Baines wrote:
I have been looking at doing some work on Tor as part of my degree, and more specifically, looking at Hidden Services. One of the issues where I believe I might be able to make some progress is the Hidden Service Scaling issue as described here [1].
So, before I start trying to implement a prototype, I thought I would set out my ideas here to check they are reasonable (I have also been discussing this a bit on #tor-dev). The goal of this is twofold: to reduce the probability of failure of a hidden service and to increase hidden service scalability.
Previous threads on this subject: https://lists.torproject.org/pipermail/tor-dev/2013-October/005556.html https://lists.torproject.org/pipermail/tor-dev/2013-October/005674.html
I have now implemented a prototype for one possible design of how to allow distribution in hidden services. While developing this, I also made some modifications to chutney to allow for the tests I wanted to write.
In short, I modified tor such that:
- The service's public key is used in the connection to introduction points (a return to the state as of the v0 descriptor)
- Multiple connections from one service to an introduction point are allowed (previously, existing ones were closed)
- Tor will check for a descriptor when it needs to establish all of its introduction points, and connect to the ones in the descriptor (if it is available)
- Use an approach similar to the selection of the HSDirs for the selection of new introduction points, instead of a random selection (see the sketch below)
- Attempt to reconnect to an introduction point if the connection is lost
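For the fourth point, a toy version of what I mean by "an approach similar to the HSDir selection": order the candidate relays on a hash ring and take the first few at or after the service's position, so every instance arrives at the same choice (SHA-1 and the relay names here are only for illustration; this is not the real descriptor-ID derivation or the actual tor code):

    import hashlib

    def select_intro_points(service_id, relay_fingerprints, count=3):
        def ring_position(value):
            return hashlib.sha1(value.encode()).hexdigest()

        ring = sorted(relay_fingerprints, key=ring_position)
        start = ring_position(service_id)
        after = [r for r in ring if ring_position(r) >= start]
        return (after + ring)[:count]          # wrap around the ring if needed

    relays = ["relayA", "relayB", "relayC", "relayD", "relayE"]
    print(select_intro_points("exampleonionaddress", relays))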
With chutney, I added support for interacting with the nodes through Stem. I also moved control over starting the nodes into the tests, as this allows for more complex behaviour.
Currently the one major issue is that using an approach similar to the HSDir selection means that introduction points suffer from the same issue that HSDirs currently do [1]. I believe any satisfactory solution to the HSDir issue would also resolve this problem.
One other thing of note: tor currently allows circuits to an introduction point to be built through another of the service's introduction points, and allows introduction points to be selected on circuits used to connect to other introduction points. These two issues mean that a failure in one introduction point can currently cause tor to change two introduction points. (I am not saying this needs changing, but you could adjust the circuit creation to prevent some extra work later if a failure occurs.)
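For illustration only (this is not how tor's path selection actually works, and the relay names are made up), the kind of exclusion I mean looks roughly like this:

    import random

    def pick_intro_circuit_path(relays, intro_point, other_intro_points, hops=2):
        # keep the service's other introduction points out of the path, so one
        # intro point failing cannot take two intro circuits down at once
        usable = [r for r in relays
                  if r != intro_point and r not in other_intro_points]
        return random.sample(usable, hops) + [intro_point]

    # toy usage
    relays = ["relay%d" % i for i in range(10)]
    intros = {"relay1", "relay2", "relay3"}
    path = pick_intro_circuit_path(relays, "relay1", intros - {"relay1"})
    assert not set(path[:-1]) & intros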
Any comments regarding the above would be welcome.
I have put the code for this up, but it should not be used for anything other than private testing (and will not work properly outside of chutney at the moment anyway).
The modifications to tor can be found in the disths branch of:
  git://git.cbaines.net/tor.git
The modifications and additional tests for chutney can be found in the disths branch of:
  git://git.cbaines.net/chutney.git

To run the tests against the new code, you would do something along the lines of:

  git clone -b disths git://git.cbaines.net/tor.git
  git clone -b disths git://git.cbaines.net/chutney.git

  cd tor
  ./autogen.sh
  ./configure
  make clean all

  cd ../chutney
  git submodule update --init
  export PATH=../tor/src/or:../tor/src/tools/:$PATH

  ls networks/hs-* | xargs -n 1 ./chutney configure
  ls networks/hs-* | xargs -n 1 ./chutney --quiet start

The last command should yield some output similar to:

  networks/hs-dual-intro-fail-3 PASS
  networks/hs-intro-fail-2 PASS
  networks/hs-intro-fail-3 PASS
  networks/hs-intro-select-2 PASS
  networks/hs-start-3 PASS
  networks/hs-stop-3 PASS
  networks/hs-tripple-intro-fail-3 PASS
On 30/04/14 17:06, Christopher Baines wrote:
On 08/10/13 06:52, Christopher Baines wrote:
I have been looking at doing some work on Tor as part of my degree, and more specifically, looking at Hidden Services. One of the issues where I believe I might be able to make some progress, is the Hidden Service Scaling issue as described here [1].
So, before I start trying to implement a prototype, I thought I would set out my ideas here to check they are reasonable (I have also been discussing this a bit on #tor-dev). The goal of this is two fold, to reduce the probability of failure of a hidden service and to increase hidden service scalability.
Previous threads on this subject: https://lists.torproject.org/pipermail/tor-dev/2013-October/005556.html https://lists.torproject.org/pipermail/tor-dev/2013-October/005674.html
I have now implemented a prototype, for one possible design of how to allow distribution in hidden services. While developing this, I also made some modifications to chutney to allow for the tests I wanted to write.
In short, I modified tor such that:
- The services public key is used in the connection to introduction
points (a return to the state as of the v0 descriptor)
- multiple connections for one service to an introduction point is
allowed (previously, existing were closed)
- tor will check for a descriptor when it needs to establish all of its
introduction points, and connect to the ones in the descriptor (if it is available)
- Use a approach similar to the selection of the HSDir's for the
selection of new introduction points (instead of a random selection)
- Attempt to reconnect to an introduction point, if the connection is lost
I appreciate your work, since hidden services are in really bad shape, hard to reach at times. But... how do you do this in detail? Sorry, but walking through your sources could be challenging if I don't know the original codebase you used, and it is going to take more time than if I just ask you. I also can't test, as I don't have enough resources/know-how/time.
I am worried about an attack coming from an evil IP, based on forced disconnection of the HS from the IP. I don't know if this is possible, but I am worried that picking a new circuit randomly could be highly problematic. Let's say I am the NSA and I own 10% of the routers, and I keep disconnecting your HS from an IP I control. If you select a new circuit randomly, even if the probabilities are low, it is only a matter of time until I force you onto a specific circuit convenient to me, one (out of many) that leaks your original IP address as metadata through cooperating routers I own, and so do away with the anonymity of the hidden service.
The big question I have is: what is the probability of this happening with the current Tor network size? If things are as I describe, is it a matter of seconds or thousands of years?
With chutney, I added support for interacting with the nodes through Stem, I also moved the control over starting the nodes to the test, as this allows for more complex behaviour.
Currently the one major issue is that using an approach similar to the HSDir selection means that introduction points suffer from the same issue as HSDir's currently [1]. I believe any satisfactory solution to the HSDir issue would resolve this problem also.
One other thing of note, tor currently allows building circuits to introduction points, through existing introduction points, and selecting introduction points on circuits used to connect to other introduction points. These two issues mean that a failure in one introduction point, can currently cause tor to change two introduction points. (I am not saying this needs changing, but you could adjust the circuit creation, to prevent some extra work later if a failure occurs).
Any comments regarding the above would be welcome.
I have put the code for this up, but it should not be used for anything other than private testing (and will not work properly outside of chutney at the moment anyway).
The modifications to tor can be found in the disths branch of: git://git.cbaines.net/tor.git The modifications and additional tests for chutney can be found in the disths branch of: git://git.cbaines.net/chutney.git
To run the tests against the new code, you would do something along the lines of: git clone -b disths git://git.cbaines.net/tor.git git clone -b disths git://git.cbaines.net/chutney.git
cd tor ./autogen.sh ./configure make clean all
cd ../chutney git submodule update --init export PATH=../tor/src/or:../tor/src/tools/:$PATH
ls networks/hs-* | xargs -n 1 ./chutney configure ls networks/hs-* | xargs -n 1 ./chutney --quiet start
The last command should yield some output similar to: networks/hs-dual-intro-fail-3 PASS networks/hs-intro-fail-2 PASS networks/hs-intro-fail-3 PASS networks/hs-intro-select-2 PASS networks/hs-start-3 PASS networks/hs-stop-3 PASS networks/hs-tripple-intro-fail-3 PASS
1: https://trac.torproject.org/projects/tor/ticket/8244
On 02/05/14 00:45, waldo wrote:
On 30/04/14 17:06, Christopher Baines wrote:
On 08/10/13 06:52, Christopher Baines wrote:
I have been looking at doing some work on Tor as part of my degree, and more specifically, looking at Hidden Services. One of the issues where I believe I might be able to make some progress, is the Hidden Service Scaling issue as described here [1].
So, before I start trying to implement a prototype, I thought I would set out my ideas here to check they are reasonable (I have also been discussing this a bit on #tor-dev). The goal of this is two fold, to reduce the probability of failure of a hidden service and to increase hidden service scalability.
Previous threads on this subject: https://lists.torproject.org/pipermail/tor-dev/2013-October/005556.html https://lists.torproject.org/pipermail/tor-dev/2013-October/005674.html
I have now implemented a prototype, for one possible design of how to allow distribution in hidden services. While developing this, I also made some modifications to chutney to allow for the tests I wanted to write.
In short, I modified tor such that:
- The services public key is used in the connection to introduction
points (a return to the state as of the v0 descriptor)
- multiple connections for one service to an introduction point is
allowed (previously, existing were closed)
- tor will check for a descriptor when it needs to establish all of its
introduction points, and connect to the ones in the descriptor (if it is available)
- Use a approach similar to the selection of the HSDir's for the
selection of new introduction points (instead of a random selection)
- Attempt to reconnect to an introduction point, if the connection
is lost
I appreciate your work since Hidden services are really bad. Hard to reach ATM sometimes. But ... how you do this in details? Sorry but walking over your sources could be challenging if I don't know the original codebase you used and is gonna take more time than if I just ask you. I also can't test as I don't have enough resources/know how/time.
In terms of the code: when the circuit to an introduction point has failed, tor just tries to establish another one. I am unsure if I have taken the best approach in the code, but it does seem to work.
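Very roughly, the behaviour is shaped like this (a Python sketch of the logic only; the real change lives in tor's C circuit handling, and the names and limits here are invented):

    import time

    def maintain_intro_circuit(build_circuit, intro_point, max_failures=3, wait=1):
        # on losing the circuit to an intro point, retry the same intro point
        # a few times before treating it as failed
        failures = 0
        while failures < max_failures:
            circuit = build_circuit(intro_point)
            if circuit is not None:
                return circuit
            failures += 1
            time.sleep(wait)
        return None   # give up on this introduction point

    # toy usage: an intro point that comes back on the second attempt
    attempts = iter([None, "circuit-to-intro-1"])
    result = maintain_intro_circuit(lambda ip: next(attempts), "intro-1", wait=0)
    assert result == "circuit-to-intro-1"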
I am worried about an attack coming from evil IP based on forced disconnection of the HS from the IP. I don't know if this is possible but I am worried that if you pick a new circuit randomly could be highly problematic. Lets say I am NSA and I own 10% of the routers and disconnecting your HS from an IP I control, if you select a new circuit randomly, even if the probabilities are low, eventually is a matters of time until I force you to use an specific circuit from those convenient to me in order to have a possible circuit(out of many) that transfers your original IP as metadata through cooperative routers I own and then do away with the anonymity of the hidden service.
Yeah, that does appear plausible. Are there guard nodes used for hidden service circuits (I forget)?
The big question I have is what is the probability with current Tor network size of this happening? If things are like I describe, is a matter of seconds or thousand of years?
I am unsure. I implemented this because it was quite probable when testing with a small network using chutney. When testing the behaviour of the network when an introduction point fails, you need reconnection; otherwise, instances which connect to other introduction points through that failed introduction point will also see those working introduction points as failing, leading to the instances using different introduction points (which is what I was trying to avoid).
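As a very rough back-of-the-envelope, ignoring bandwidth weighting and guard behaviour entirely (so treat the numbers as illustrative only):

    # assumed adversary share of relays; hops chosen independently at random
    p = 0.10
    hops = 3
    print("expected rebuilds for a fully hostile path:", 1 / p**hops)    # ~1000
    print("expected rebuilds to learn the node next to the HS:", 1 / p)  # ~10

With entry guards pinned, the hop next to the service is not re-rolled on every rebuild, which changes this picture considerably.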
On 02/05/14 02:34, Christopher Baines wrote:
On 02/05/14 00:45, waldo wrote:
On 30/04/14 17:06, Christopher Baines wrote:
On 08/10/13 06:52, Christopher Baines wrote:
I have been looking at doing some work on Tor as part of my degree, and more specifically, looking at Hidden Services. One of the issues where I believe I might be able to make some progress, is the Hidden Service Scaling issue as described here [1].
So, before I start trying to implement a prototype, I thought I would set out my ideas here to check they are reasonable (I have also been discussing this a bit on #tor-dev). The goal of this is two fold, to reduce the probability of failure of a hidden service and to increase hidden service scalability.
Previous threads on this subject: https://lists.torproject.org/pipermail/tor-dev/2013-October/005556.html https://lists.torproject.org/pipermail/tor-dev/2013-October/005674.html
I have now implemented a prototype, for one possible design of how to allow distribution in hidden services. While developing this, I also made some modifications to chutney to allow for the tests I wanted to write.
In short, I modified tor such that:
- The services public key is used in the connection to introduction
points (a return to the state as of the v0 descriptor)
- multiple connections for one service to an introduction point is
allowed (previously, existing were closed)
- tor will check for a descriptor when it needs to establish all of its
introduction points, and connect to the ones in the descriptor (if it is available)
- Use a approach similar to the selection of the HSDir's for the
selection of new introduction points (instead of a random selection)
- Attempt to reconnect to an introduction point, if the connection
is lost
I appreciate your work since Hidden services are really bad. Hard to reach ATM sometimes. But ... how you do this in details? Sorry but walking over your sources could be challenging if I don't know the original codebase you used and is gonna take more time than if I just ask you. I also can't test as I don't have enough resources/know how/time.
In terms of the code, just when the circuit to an introduction point has failed, try to establish another one. I am unsure if I have taken the best approach in terms of code, but it does seem to work.
I am worried about an attack coming from evil IP based on forced disconnection of the HS from the IP. I don't know if this is possible but I am worried that if you pick a new circuit randomly could be highly problematic. Lets say I am NSA and I own 10% of the routers and disconnecting your HS from an IP I control, if you select a new circuit randomly, even if the probabilities are low, eventually is a matters of time until I force you to use an specific circuit from those convenient to me in order to have a possible circuit(out of many) that transfers your original IP as metadata through cooperative routers I own and then do away with the anonymity of the hidden service.
Yeah, that does appear plausible. Are there guard nodes used for hidden service circuits (I forget)?
No idea; according to these docs https://www.torproject.org/docs/hidden-services.html.en there aren't guards in the circuits to the IP in step one (they are not mentioned). They are definitely used in step five, to protect against a timing attack with a corrupt entry node.

Even if they are used, I still see some problems. It looks convenient to try to reconnect to the same IP, but in real life you are going to find nodes that fail a lot, so if you picked an IP that has bad connectivity, reconnecting to it is not going to contribute anything to the scalability or availability of your HS; on the contrary.

Maybe a good idea would be to try to reconnect and, if it is failing too often, select another IP.

If the IP is doing it on purpose, the HS is going to go away, so the control the IP gains by disconnecting your HS is capped for any attack, known or unknown. If it is not on purpose, the HS keeps throwing away failing nodes until it picks a good node as an IP. I think this would, over time, cause the Tor network to re-balance and adapt itself to new conditions; for instance, an IP being overloaded (maybe by DoS) would cause the HS to move away from that IP.

I would also rotate the IPs after using them for some time. I don't think it is good to keep one IP for too long; it doesn't sound good to me. If, for instance, I am big daddy and know your IPs, I could go there, seize the computers, and start gathering funny statistics about your HS. Or simply censor your HS by dropping messages from clients trying to send you the rendezvous point (is this possible? It looks like it is, if I drop introduce messages and generate fake ones). You wouldn't even know, because I can keep you connected and receiving fake connections. You might only notice if you try to check the IP by sending a rendezvous point from your HS to your own HS (this IP quality test would be great if tor did it periodically). I somehow do it manually when I notice the HS is super hard to reach. Sometimes it works great; sometimes, even with the server turned on and online, it is not visible, so you have to take down tor, restart it, and wait again for a while.

I was thinking maybe you could select new ones, inform the HSDirs about the change, and, once the new ones are known, end the circuits to the previous IPs, and with that avoid the overhead of the rotation.

I would rebuild circuits to the IP from time to time (originating from the HS). Multiple connections to the same IP would permit doing this better, since I can make a new circuit and afterwards kill the previous one, remaining connected the whole time.

In some previous messages about the subject I saw that the HSDirs provide all of an HS's IPs. I don't like this way of doing things: let's say my HS has 6 IPs, available to everyone. To cause a DoS to your HS, it seems all I have to do is cause a DoS to the IPs. And there is no need for everyone to know all the IPs of one HS all the time; all one user needs to connect is some of them, maybe a few for redundancy, but not all.

Is there some way to only provide part of the IPs of one HS to one user? To avoid enumeration? Maybe distribute partial information to the HSDirs? I don't know, just thinking. Maybe "abuse" some caching effect on the HSDirs and publish partial IP information at one end and partial at another, so that it only reaches all users in its entirety over time.
The big question I have is what is the probability with current Tor network size of this happening? If things are like I describe, is a matter of seconds or thousand of years?
I am unsure. I implemented this, as it was quite probable when testing with a small network using chutney. When testing the behaviour of the network when an introduction point fails, you need to have reconnection, otherwise instances which connect to other introduction points through that failed introduction point, will also see those working introduction points as failing. Leading to the instances using different introduction points (what I was trying to avoid).
On 04/05/14 11:43, waldo wrote:
On 02/05/14 02:34, Christopher Baines wrote:
On 02/05/14 00:45, waldo wrote:
I am worried about an attack coming from evil IP based on forced disconnection of the HS from the IP. I don't know if this is possible but I am worried that if you pick a new circuit randomly could be highly problematic. Lets say I am NSA and I own 10% of the routers and disconnecting your HS from an IP I control, if you select a new circuit randomly, even if the probabilities are low, eventually is a matters of time until I force you to use an specific circuit from those convenient to me in order to have a possible circuit(out of many) that transfers your original IP as metadata through cooperative routers I own and then do away with the anonymity of the hidden service.
Yeah, that does appear plausible. Are there guard nodes used for hidden service circuits (I forget)?
No idea, according to this docs https://www.torproject.org/docs/hidden-services.html.en there aren't guards in the circuits to the IP in step one(not mentioned). They are definitely used on step five to protect against a timing attack with a corrupt entry node.
Even if they are used, I still see some problems. I mean it looks convenient to try to reconnect to the same IP but in real life you are going to find nodes that fail a lot so if you picked an IP that has bad connectivity reconnecting to it is not gonna contribute at all with the HS scalability or availability of your HS, on the contrary.
I don't think a minority of bad IPs will do much to hurt a hidden service. Clients will try connecting through all of the IPs before giving up, and this only happens when they initially connect.
Maybe a good idea would be to try to reconnect and if it is failing too much select another IP.
It currently does do this, but probably over a shorter time period than you are suggesting. It keeps a count of connection failures while trying to reconnect, but this is reset once a new connection is established.

This gets complicated, as you need to ensure that each instance of the service is using the same introduction points; it seems to me that tracking connectivity failures over the long term and changing IP on some threshold could break this.
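For reference, the failure accounting currently behaves roughly like this (a Python sketch of the behaviour described above, not the actual code; the class name and threshold are invented):

    class IntroPointHealth:
        def __init__(self, threshold=3):
            self.threshold = threshold
            self.consecutive_failures = 0

        def record_failure(self):
            self.consecutive_failures += 1
            return self.consecutive_failures >= self.threshold  # True = give up

        def record_success(self):
            self.consecutive_failures = 0   # any success wipes the history

    # a flaky intro point that fails twice, then succeeds, never crosses the threshold
    state = IntroPointHealth()
    state.record_failure(); state.record_failure(); state.record_success()
    assert state.consecutive_failures == 0

So longer-term flakiness that is interleaved with successes is not tracked at the moment.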
If the IP is doing it on purpose the HS Is going to go away so the control the IP has disconnecting your HS is capped for any attack known or unknown. If is not on purpose the HS goes throwing away failing nodes until it picks a good node as IP. I think it would cause over time, the tor network re-balance/readapt to new conditions itself. For instance in the case some IP is overloaded (maybe by DoS) causes the HS to go away from the IP.
I would also rotate the IPs after using them some time. I don't think is good to have one IP for too long. Doesn't sounds good to me. If for instance I am big daddy and know your IPs I could go there seize the computers and start gathering funny statistics about your HS. Or simply censor your HS by dropping messages from clients trying to send you the rendezvous point (is this possible? looks like it is if I drop introduce messages and generate fake ones). You wouldn't even know cause I can keep your connected and receiving fake connections. Only maybe if you try to check the IP by trying to send a rendezvous point from your HS to your HS (this IP quality test would be great if tor would do it periodically). I somehow do it myself manually when I notice the HS is superhard to reach. Sometimes it works great, sometimes even being turned on the server and online, is not visible. So you have to take down tor and restart it and wait again for a while.
I was thinking maybe you could select new ones and inform HSDirs about the change and after the new ones are known end circuits to the previous IPs and with that avoid the overhead of the rotation.
I would rebuild circuits to the IP from time to time (originating from the HS). Multiple connections to the same IP would permit to do this better since I can make a new one and afterwards kill a previous circuit remaining connected all the time.
Lots of things here. Generally, some of them seem quite hard to do in an uncoordinated, distributed manner (e.g. IP rotation). And I am not too sure that things like IP rotation and rebuilding circuits to IPs will even help with the anonymity issues.
In some previous messages about the subject I saw that HSDirs provide all the HS IPs. I don't like this way of doing things since let's say I have 6 IPs to my HS available to everyone. To cause a DoS to your HS seems to me all I have to do is cause a DoS to the IPs. And there is no need for everyone to know all the IPs of one HS all the time. All one user needs to connect is just some maybe for redundancy but not all.
Is there some way to only provide part of the IPs of one HS to one user? Avoid enumeration? Maybe distribute partial information to HSDirs? Don't know, just thinking. Maybe "abuse" some caching effect on HSDirs and publish partial IP information on one end and partial in another end that only reaches all users in entirety over time.
As the set of IPs is so small, I cannot think of any practical way to do this without it being trivial to break.
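To illustrate why: even if each descriptor fetch only revealed a random subset, a client making a handful of fetches would see the whole set. A runnable toy, with made-up numbers:

    import random

    random.seed(0)
    intro_points = set(range(6))   # say, 6 intro points, 3 revealed per fetch
    seen, fetches = set(), 0
    while seen != intro_points:
        seen |= set(random.sample(sorted(intro_points), 3))
        fetches += 1
    print("fetches needed to see every intro point:", fetches)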
On 04/05/14 07:42, Christopher Baines wrote:
On 04/05/14 11:43, waldo wrote:
On 02/05/14 02:34, Christopher Baines wrote:
On 02/05/14 00:45, waldo wrote:
I am worried about an attack coming from evil IP based on forced disconnection of the HS from the IP. I don't know if this is possible but I am worried that if you pick a new circuit randomly could be highly problematic. Lets say I am NSA and I own 10% of the routers and disconnecting your HS from an IP I control, if you select a new circuit randomly, even if the probabilities are low, eventually is a matters of time until I force you to use an specific circuit from those convenient to me in order to have a possible circuit(out of many) that transfers your original IP as metadata through cooperative routers I own and then do away with the anonymity of the hidden service.
Yeah, that does appear plausible. Are there guard nodes used for hidden service circuits (I forget)?
No idea, according to this docs https://www.torproject.org/docs/hidden-services.html.en there aren't guards in the circuits to the IP in step one(not mentioned). They are definitely used on step five to protect against a timing attack with a corrupt entry node.
Even if they are used, I still see some problems. I mean it looks convenient to try to reconnect to the same IP but in real life you are going to find nodes that fail a lot so if you picked an IP that has bad connectivity reconnecting to it is not gonna contribute at all with the HS scalability or availability of your HS, on the contrary.
I don't think a minority of bad IP's will do much to hurt a hidden service.
Hi Christopher. You are correct that a minority can't do much harm, but they don't contribute either, so what's the point in keeping them? I don't mean to be rude, but "minority" is also relative. Can you please tell us what the total number of IPs is? I ask because you were working in that code, so you likely know better. If it is 3, then one bad node is 33% of failed connections; if they are 50, one is only 2%.
Clients will try connecting through all IP's until giving up, and this will only happen when they initially connect.
What I've noticed is that the initial connection is what causes most trouble. Once you establish a rendezvous with the HS, things go smoothly. That's my personal experience; I don't know what others experience. I've also noticed that highly used services like the hidden wiki tend to behave a whole lot better, no idea why. It could be related to the HSDir query and this: http://donncha.is/2013/05/trawling-tor-hidden-services/. I don't know if that was fixed.
Maybe a good idea would be to try to reconnect and if it is failing too much select another IP.
It currently does do this, but on probably a shorter time period than you are suggesting. It keeps a count of connection failures while trying to reconnect, but this is reset once a new connection is established.
Yes, I meant measuring over a larger time period and over different circuits, as the cause of a disconnection could have been the circuit failing and not the IP.

What happens if the HS just goes offline for a while? Does it keep trying to connect, find that it can't connect to the IPs, and pick another set? Are you differentiating that case?

How do they coordinate which one publishes the descriptor? Which one puts up the first descriptor?
This gets complicated, as you need to ensure that each instance of the service is using the same introduction points,
I saw your answer to another person, and it seems to me it is related to what you are saying here:
If the "service key"'s (randomly generated keys per introduction point) are used, then this would complicate/cause problems with the multiple instances connecting to one introduction point. Only one key would be listed in the descriptor, which would only allow one instance to get the traffic.
What if the instances exchange keys and use the same one? Master/slave, for example, with one of them taking the master role if the master goes offline. Let's say the master instance creates the IPs and sends a message to the rest to connect there.

How about changing the descriptor to hold several keys per IP, in case the previous idea is not possible or too difficult?

Why does this need to be ensured? Does it break something? I understand it could be convenient to have at least two on each IP to avoid correlation attacks from the IP, but why all of them? What would happen if one instance goes offline? Does the whole thing break? The desirable behaviour, I think, is that in that case the other instances take over and split the load as if nothing had happened.

I also think it is highly desirable that the instances are indistinguishable, to give away less information (enumeration, etc.).

If they use the same key, you could send the rendezvous message to all instances from the IP, as all would have the same private key and could decrypt it (if the IP is not shared by different HSs; I don't know if this is currently possible). That way the message doesn't get lost in a failed circuit. If the IP is shared by several HSs, some routing could be convenient, but it is not necessary IMHO, as the other services don't have the key to decrypt it, and statistics gathering could be blinded by the HS sending bogus RV messages that it later discards.

If instances talk to each other, they could negotiate which one is going to answer, even if the one that answers is not connected to the IP.

Let's say I could have a master that receives load information from the slaves, and receives the RV messages that all instances receive, and then instructs the instance with the least load to answer.
it seems to me that tracking connectivity failures over the long term, and changing IP on some threshold could break this.
Why would it break it? I would create new IPs, connect to them, and keep the old ones until the new ones become accessible (I could detect this by monitoring whether I receive messages, or by querying the HSDirs). The IP could act non-cooperatively and send me bogus messages to try to confuse the HS and stop it going away, so probably checking both would be a good idea. I could also test whether I start receiving messages through the new IPs. Just ideas.
If the IP is doing it on purpose the HS Is going to go away so the control the IP has disconnecting your HS is capped for any attack known or unknown. If is not on purpose the HS goes throwing away failing nodes until it picks a good node as IP. I think it would cause over time, the tor network re-balance/readapt to new conditions itself. For instance in the case some IP is overloaded (maybe by DoS) causes the HS to go away from the IP.
I would also rotate the IPs after using them some time. I don't think is good to have one IP for too long. Doesn't sounds good to me. If for instance I am big daddy and know your IPs I could go there seize the computers and start gathering funny statistics about your HS. Or simply censor your HS by dropping messages from clients trying to send you the rendezvous point (is this possible? looks like it is if I drop introduce messages and generate fake ones). You wouldn't even know cause I can keep your connected and receiving fake connections. Only maybe if you try to check the IP by trying to send a rendezvous point from your HS to your HS (this IP quality test would be great if tor would do it periodically). I somehow do it myself manually when I notice the HS is superhard to reach. Sometimes it works great, sometimes even being turned on the server and online, is not visible. So you have to take down tor and restart it and wait again for a while.
I was thinking maybe you could select new ones and inform HSDirs about the change and after the new ones are known end circuits to the previous IPs and with that avoid the overhead of the rotation.
I would rebuild circuits to the IP from time to time (originating from the HS). Multiple connections to the same IP would permit to do this better since I can make a new one and afterwards kill a previous circuit remaining connected all the time.
Lots of things here, generally, some things seem quite hard to do in a uncoordinated, distributed manor (e.g. IP rotation).
Why uncoordinated? It looks to me like it would be convenient if instances could talk: load balancing, taking over from failed instances, etc. It would take work to do that, for sure, but it doesn't seem impossible to me. I guess the HSDir code would have to be modified to be able to host new signed information coming from the same HS while maintaining the old information, and maybe to follow signed commands from the HS (delete these IPs, add these other IPs, with some limit to stop an HS attacking HSDirs by flooding them with bogus info). A new question here: could anyone flood an HSDir by posting a zillion descriptors for a zillion bogus HSs? The HS would have to select new IPs, publish them, and wait until they become available before dropping circuits to the old ones.
And I am not to sure that things like IP rotation and rebuilding circuits to IP's will even help with anonymity issues.
Regarding entry guards, this is one article about them; I don't know if it is up to date:

https://blog.torproject.org/category/tags/entry-guards

It seems they are always used for all sorts of circuits, including HS ones. So if you reused the code, they are being selected.
I am still concerned that if things stay the same way for too long, big players (antidemocratic governments, for instance) could do things. Keep in mind that if you have more running instances of your HS, the chance of locating one of them increases, since I only have to locate one of your instances to know who you are.

OK, take a look at this attack, and correct me if I am wrong and some point is not possible (I invite anyone to prove me wrong). As I said, I appreciate your work, but it needs to be challenged to be accepted by the community, so that it doesn't stay in a limbo of "I don't know", and it is better to patch every possible hole before it becomes mainstream.

Suppose I am a totalitarian government and you are a dissident running an HS over Tor in the same country.

1 - I start introducing high-availability, high-bandwidth corrupt nodes to the network across the globe (I could rent servers in case you decide to connect to nodes offshore, or simply deploy nodes in another country); the more I add, the higher the chance of being a stone in your anonymity path. To lower the budget I could host several Tor routers on one computer with several network interfaces and a fast CPU/crypto hardware for OpenSSL.

2 - I see you are using some IPs (I can query the HSDirs to get some), so if I am not lucky enough to be one of your HS's IPs at the start, I flood the ones you selected to take them offline and force you to switch, so that you eventually pick one of my corrupt nodes as an IP. It is not clear to me whether choosing them in a deterministic way gets in the way of this or helps. If it is deterministic, I could precalculate how many nodes I have to force offline until you pick one of mine, so I could at least shut down those nodes that will never be selected and lower my budget. I could bribe some ISP to rent servers with specific IP numbers (I don't know if you select on this), or bribe the IP operator if he/she publishes a contact email. You can't even get suspicious if you don't know the flooded IP's operator, as it is totally normal for a node to go offline, so no dust is raised (is there a way for a Tor router to publicly announce that it was attacked, so other people can be warned?).
3 - I become at least one of your IPs. This is a good achievement. From now on I know that, when I disconnect you, you are going to connect back to me using some circuit that may or may not contain corrupt nodes. When you connect back to me the failure counter is reset, so I can disconnect you as often as I want.

4 - I know your last node, so if it is not mine, I disconnect your circuit until you select a circuit that has one of my corrupt nodes as the last hop.

5 - When you do, I can see the previous node in your circuit. If it is not mine, I disconnect you and go back to step 4. If it is, I learn one of your guards. I stay for a while, going back to step 4, to enumerate your guards. The more you have, the longer it takes; the fewer guards, the easier for me. But I can continue the attack as long as I know one of the guards, to gain time and see if I have success. I could flood your guard to force you to select another guard and accelerate the process, or globally block access to the guard.

6 - Once I learn all your guards, there are several things I can do with that.

- Since your instance is going to connect to those nodes for a long while, I could censor your instance by flooding those nodes, at least until you notice and select new guard nodes (I can be insidious here, repeating the attack over and over again, and for each instance). I could wait to enumerate the guard nodes of all of your instances, since all of them connect to my IP.

- Since I am a big government and I control the ISP you are using, I could monitor incoming connections to your guard nodes. I record, for a while, all the network IP numbers connecting to those nodes. Maybe I could filter some nodes here with heuristics (nodes that only connect to those guards, since the probability of another node connecting to those specific guards should be low), but not necessarily, as I am going to filter later anyway. Notice that HSs stay connected for very long periods, so the connection time of a server should be longer than a client's, and I could discard nodes using that information.

Once I have enough information, I disconnect you from the net for a small amount of time, short enough that you don't leave my IP, leaving some room for random failed circuits. I can tell the ISPs to do this for me. A disconnection is totally normal, so no dust is raised. I can take my time too: disconnect you today, wait some time, and disconnect you again.
I could do some things for the case of an HS hosted offshore: bribe ISP employees, DoS ISPs or individual computers, sell backdoored routers, backdoor router firmware; but that is less realistic and harder. Not discardable IMHO, but I am going for the easy case here.

Notice I don't care which circuit you use from now on to reconnect back to me, even if you select new guards. I could also monitor whether your HS answers RV messages, using several preselected RV points.

If, once I disconnect those nodes, I don't see any instance go away, I discard those nodes as hosting the HS.

If I see some of the instances go away, then your node is in that subset.

I perform a binary search here, disconnecting half of the nodes every time, so the number of disconnection rounds it takes me is O(log(n)), where n is the total number of nodes I see connecting to those guards.

I repeat each of these steps several times, to filter out with statistics the occasional circuit failures between your HS and my IP.
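To put numbers on the O(log(n)) claim (a toy calculation, assuming I really can knock out any chosen half of the candidates in each round):

    import math

    # n candidate hosts seen connecting to the guards, halved each round
    for n in (1_000, 100_000, 1_000_000):
        print(n, "candidates ->", math.ceil(math.log2(n)), "rounds of disconnections")

So even a very large candidate set only takes a few tens of rounds.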
If two or more instances use the same IP, I would still see some instances staying or going, so it seems this doesn't protect against the attack at all, even if they are indistinguishable. If you close circuits and reopen new ones from time to time, I would get noise here, but maybe I could filter it out with statistics.
- I could in some cases seize a guard node or bribe the operator.
If there are no flaws so far, I can spy on you to find out where you are hosting the computers, and seize them without giving you time to turn them off, use plausible-deniability crypto software (truecrypt), and be able to claim you were routing instead of hosting the HS (by cloning the router without the HS data).

With multiple instances, it seems to me it now becomes desirable to host at least one Tor router per instance, to be able to deny you were hosting and claim you were routing, as it looks like it won't be possible to correlate a router going down with the HS going down (the other instances would hide that, if they are indistinguishable and take over when one instance goes down).

Now, if you instead rotate the IPs from time to time, I would be forced to go back to step 2 of the attack, but on the other hand, your chances of selecting a corrupt IP or a corrupt node in your circuits increase. As the host of one of your IPs, I would have less control, since the position is not going to last forever and would be time limited.

Changing circuits could introduce noise in step 6 to some extent; on the other hand, it increases the chances of selecting a corrupt node in your circuit.

So probably all of this would have to be studied with statistics and the current Tor network size.

It looks to me like this could have implications in other areas too.
In some previous messages about the subject I saw that HSDirs provide all the HS IPs. I don't like this way of doing things since let's say I have 6 IPs to my HS available to everyone. To cause a DoS to your HS seems to me all I have to do is cause a DoS to the IPs. And there is no need for everyone to know all the IPs of one HS all the time. All one user needs to connect is just some maybe for redundancy but not all.
Is there some way to only provide part of the IPs of one HS to one user? Avoid enumeration? Maybe distribute partial information to HSDirs? Don't know, just thinking. Maybe "abuse" some caching effect on HSDirs and publish partial IP information on one end and partial in another end that only reaches all users in entirety over time.
As the set of IP's is so small,
Again, "small" is relative. Earlier you mentioned that some nodes failing was not going to affect the service too much, so this seems contradictory to me. Can you please mention numbers? Can't this number be increased?
I cannot think of any practical way to do this without it being trivial to break.
This is not directly related to your work, but could be worth discussing. I was thinking that one property that could maybe be exploited is the fact that the whole Tor network has a lot of computational power, which is hard for a single player to match (unless the player is really big). This is a rough idea that could contain flaws and could maybe be improved.

What if, let's say, the IP information is encrypted by the HS, which doesn't provide the key, making Tor clients "brute force" it to open the encrypted message containing the IP? All the IP keys would be scattered through a keyspace that could be larger or smaller depending on the time you want clients to spend looking for them, so any IP would have an equal chance of being found. The key would be passed through an ASIC-resistant function and the result used to encrypt, so big players would have to use at most GPUs, and different CPU powers would be somewhat equalised through the memory bandwidth limit. All Tor clients would start looking at a different random position of the keyspace until they find one key that decrypts one IP.

From there they start to communicate with that IP, if it is available, and keep looking for the rest of the IPs (to be able to reconnect if the RV and the initially used IP go offline). Maybe reconnection to the RV could be desirable too, to some extent, to improve the availability of the HS in case the circuit fails; I don't know if that is currently possible.

Maybe add 1 bit of the key for another encrypted IP inside each encrypted message, so that following a route of decryptions is more feasible than starting from zero (this could be a good idea or not), therefore forcing anyone to follow a decryption path depending on where they start decrypting the IP information.

To explain the idea better, let's say I have 3 introduction points, A, B and C, but I don't see why there couldn't be more, especially since the bulk of the traffic goes through the rendezvous and not through the IP, IPs could be shared by several HSs, and it is harder to cause a DoS to many nodes than to just a few.

Let's say one Tor client starts looking at a random position and finds the key for B (now it can connect to B, if available, and pass the RV message). Once decrypted, it gets one bit of the key for C, so it starts looking for the key of C, as it is now easier to find than the key for A. Then it gets C, and that gives it bits for A.

Another Tor client starts at random, finds A, and gets 1 bit for B (now it can connect to A and pass the RV message, if available), goes on to B and gets the bit for C, then finds C.

After a while the HS rotates the IPs and the process starts again (not necessarily all at once; it could be a flow, where some get replaced while keeping part of the old ones). So anyone trying to cause a DoS would have to perform all of that work again, and would have to be very quick to find all of the IPs before they are rotated again by the HS, in order to flood all of them. So at most, all they could do is make the HS intermittent (given enough CPU power and enough bandwidth). On the other side, the Tor swarm would be very effective at finding them all.
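A runnable toy of the idea, just to make it concrete (this is not a vetted construction: the XOR "cipher", the tiny keyspace, the scrypt parameters and the plaintext marker are all placeholders):

    import hashlib, os

    def kdf(key_int, salt):
        # memory-hard-ish key stretching; parameters are tiny, purely for the demo
        return hashlib.scrypt(key_int.to_bytes(2, "big"), salt=salt,
                              n=2**14, r=8, p=1, dklen=32)

    def toy_encrypt(plaintext, key_int, salt):
        pad = kdf(key_int, salt)
        return bytes(a ^ b for a, b in zip(plaintext.ljust(32, b"\0"), pad))

    def brute_force(ciphertext, salt, keyspace=256):
        # a client searches the whole (tiny) keyspace for recognisable plaintext
        for k in range(keyspace):
            candidate = bytes(a ^ b for a, b in zip(ciphertext, kdf(k, salt)))
            if candidate.startswith(b"IP:"):
                return k, candidate.rstrip(b"\0")

    salt = os.urandom(16)
    ciphertext = toy_encrypt(b"IP:relay-fingerprint-here", 57, salt)
    print(brute_force(ciphertext, salt))   # slow on purpose; that is the point

A real version would need proper authenticated encryption and parameters tuned to how long you want clients to spend searching.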
The big question that remains for me here is how much this makes a big player waste a load of resources without pushing small players (mobile devices, for instance) out of the network. Memory-hard functions equalise devices somewhat through the memory bandwidth limit, but there can still be large differences. It seems to me that the more IPs are selected for an HS, the better this can be achieved.

One option I was thinking of to fix that problem: the HS could function normally when everything is running smoothly, and if it detects that it can't create circuits to its IPs, switch to a protective mode, so that things work as usual when there is no attack, but the system changes to a defensive mode when there is one.

Let me explain better. I, as an HS, work as normal. Suddenly I see that none of the circuits to my IPs can be established (or the router operators of my IPs publish that they are being attacked). I create new ones and again find that eventually I can't connect. That probably means someone is attacking my IPs. I switch to defensive mode and publish encrypted IPs. Clients notice they are encrypted and each start, on their side, to look for the IP keys.

I could have degrees here, and increase the computational complexity to fight off the attack. Let's say that maybe only mobile devices get pushed off the net, but CPU and GPU nodes can still connect, so the attack only affects part of the users.

One problem that could appear here is the time it takes for information published to the HSDirs to reach clients. I am missing a lot of information about that part, and I believe it is under active research ATM.

The scheme doesn't push away big players with large computational power and bandwidth, but it can push away some medium players. Also, it doesn't protect against other attacks, for instance flooding through an RV (could puzzle solving be applied here? Let's say the harder the puzzle, the more bandwidth I give you).
Regards Waldo
On 09/05/14 20:05, waldo wrote:
On 04/05/14 07:42, Christopher Baines wrote:
On 04/05/14 11:43, waldo wrote:
On 02/05/14 02:34, Christopher Baines wrote:
On 02/05/14 00:45, waldo wrote:
I am worried about an attack coming from evil IP based on forced disconnection of the HS from the IP. I don't know if this is possible but I am worried that if you pick a new circuit randomly could be highly problematic. Lets say I am NSA and I own 10% of the routers and disconnecting your HS from an IP I control, if you select a new circuit randomly, even if the probabilities are low, eventually is a matters of time until I force you to use an specific circuit from those convenient to me in order to have a possible circuit(out of many) that transfers your original IP as metadata through cooperative routers I own and then do away with the anonymity of the hidden service.
Yeah, that does appear plausible. Are there guard nodes used for hidden service circuits (I forget)?
No idea, according to this docs https://www.torproject.org/docs/hidden-services.html.en there aren't guards in the circuits to the IP in step one(not mentioned). They are definitely used on step five to protect against a timing attack with a corrupt entry node.
Even if they are used, I still see some problems. I mean it looks convenient to try to reconnect to the same IP but in real life you are going to find nodes that fail a lot so if you picked an IP that has bad connectivity reconnecting to it is not gonna contribute at all with the HS scalability or availability of your HS, on the contrary.
I don't think a minority of bad IP's will do much to hurt a hidden service.
Hi Christopher. You are correct a minority can't do much harm, but they don't contribute. What's the point on keeping them? I don't meant to be rude, but also minority is relative. Can you please tell us what is the total number of IPs? I ask you because you were working there so you likely know better. If is 3 then one bad node is 33% of failed connections, If they are 50 one is only 2%.
I agree that it would be good if the service could detect and avoid "bad" IPs, but I don't yet see a good method for doing so (that fits within the rest of the design).

Regarding the number of IPs, unfortunately I also don't know. It is possible to look this up though, as you could modify a node running in the real network to log the number of nodes it considers choosing for an IP (I just haven't had time to do that atm). There might also be an easier way to do it with some existing Tor network stats tool.
Maybe a good idea would be to try to reconnect and if it is failing too much select another IP.
It currently does do this, but on probably a shorter time period than you are suggesting. It keeps a count of connection failures while trying to reconnect, but this is reset once a new connection is established.
Yes I meant measuring a larger time period and over different circuits as the cause of disconnection could have being the circuit failing and not the IP.
What happens if the HS just goes offline for a while? It keeps trying to connect, finds that it can't connect to the IPs and picks another set? You are differentiating that case?
I am unsure what you mean here; can you clarify whether you do mean the "HS", and what "It" refers to?
How do they coordinate which one publishes the descriptor? Wich one puts the first descriptor?
So, starting from a state where you have no instances of a hidden service running.
You start instance 1; it comes up and checks for a descriptor. This fails, as the service is new and has not been published before. It picks some introduction points (it does not matter how) and publishes a descriptor.

You then start instance 2; like instance 1, it comes up and checks for a descriptor. This succeeds, and instance 2 then connects to each of the introduction points in the descriptor.

So in terms of coordination, there is none. You just have to start the instances one after the other (in the general case, you just have to start one before the rest).

Thinking through this now has also brought up another point of interesting behaviour which I don't think I have tested: what happens if the descriptor contains 1 or more unreachable IPs at the time the second instance retrieves it... (just thought I would note this here).
This gets complicated, as you need to ensure that each instance of the service is using the same introduction points,
I saw your answer to another person and seems to me is related to this you are saying
If the "service key"'s (randomly generated keys per introduction point) are used, then this would complicate/cause problems with the multiple instances connecting to one introduction point. Only one key would be listed in the descriptor, which would only allow one instance to get the traffic.
What if the instances interchange keys and use the same? Master/slave for example and one of them take the master role if the master goes offline. Lets say master instance creates the IPs and sends a message to the rest to connect there.
When designing this, I chose to try and avoid any direct instance to instance communication or master/slave relationships. This decision has advantages and disadvantages.
How about changing the descriptor to host several keys per IP in case previous is not possible/too difficult?
That would reveal some information in the descriptor about the number of service instances; it would also require some HSDir logic to combine the descriptors uploaded by different instances.
Why this needs to be ensured? Does it breaks something? I understand it could be convenient to have at least two to avoid correlation attacks from the IP but why all? What would happen if one instance goes offline? The whole thing breaks? The desirable behavior I think in that case is that other instances take over and split load as if nothing happened.
If the private part of the public key included for an IP is only held by one of the n instances, then only that one instance with the private part of the key will get any of the clients.
If they use the same key, you could send the rendezvous message to all instances from the IP as all would have the same private key and can decrypt it (if the IP is not shared by different HS, don't know if this is possible currently). So the message doesn't gets lost in a failed circuit. If it is shared by several HSs some routing could be convenient but not necessary IMHO as they don't have the key to decrypt and statistics gathering could be blinded by the HS sending bogus RV messages that later discards.
Instances could negotiate which one is going to answer even if the one who is going to answer is not connected to the IP if instances talk to each other.
Let's say I could have a master that receives information of the load of slaves and receives the RV messages all instances receive. Later instructs the instance with less load to answer.
This is something interesting that this design allows, as the instances could communicate with the IP's to dynamically allocate new clients.
it seems to me that tracking connectivity failures over the long term, and changing IP on some threshold could break this.
Why would break it? I would create new IPs connect to them and would keep the old ones until the new ones become accessible (I could detect this by monitoring if I receive messages or by querying the HSDirs). The IP could act non cooperatively and send me bogus messages to try to confuse the HS and avoid it going away so probably checking both could be a good idea. I could also test if I start receiving messages through the new IPs. Just ideas.
If it is not a problem with the IP, but with an instance's local connection, that one instance would decide to switch that IP out, upload a new descriptor, and thus break the consistency between the different instances (the different instances would not be using the same IPs).
If the IP is doing it on purpose the HS Is going to go away so the control the IP has disconnecting your HS is capped for any attack known or unknown. If is not on purpose the HS goes throwing away failing nodes until it picks a good node as IP. I think it would cause over time, the tor network re-balance/readapt to new conditions itself. For instance in the case some IP is overloaded (maybe by DoS) causes the HS to go away from the IP.
I would also rotate the IPs after using them some time. I don't think is good to have one IP for too long. Doesn't sounds good to me. If for instance I am big daddy and know your IPs I could go there seize the computers and start gathering funny statistics about your HS. Or simply censor your HS by dropping messages from clients trying to send you the rendezvous point (is this possible? looks like it is if I drop introduce messages and generate fake ones). You wouldn't even know cause I can keep your connected and receiving fake connections. Only maybe if you try to check the IP by trying to send a rendezvous point from your HS to your HS (this IP quality test would be great if tor would do it periodically). I somehow do it myself manually when I notice the HS is superhard to reach. Sometimes it works great, sometimes even being turned on the server and online, is not visible. So you have to take down tor and restart it and wait again for a while.
I was thinking maybe you could select new ones and inform HSDirs about the change and after the new ones are known end circuits to the previous IPs and with that avoid the overhead of the rotation.
I would rebuild circuits to the IP from time to time (originating from the HS). Multiple connections to the same IP would permit to do this better since I can make a new one and afterwards kill a previous circuit remaining connected all the time.
Lots of things here, generally, some things seem quite hard to do in a uncoordinated, distributed manor (e.g. IP rotation).
Why uncoordinated?
Simply because that is how I have chosen to approach the issue. There will be advantages and disadvantages.
Looks to me it would be convenient instances could talk. Load balance, taking over of failed instances, etc. Would take work to do that for sure but doesn't seems impossible to me. I guess the HSDir code would have to be modified to be able to host new signed information coming for the same HS while maintaining the old one. Maybe follow signed commands from the HS. Delete this IPs, add this other IPs with some limit to avoid the HS attack HSDirs flooding them to store bogus info). New question here could anyone flood an HSDir by posting a zillion descriptors for a zillion bogus HSs? The HS would have to select new ones, publish them wait until they become available before dropping circuits to the old ones.
I think it is possible to load balance new clients without needing direct instance to instance communication.
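For example (an illustration of the idea only, not current tor behaviour): the introduction point already holds one circuit per connected instance for the same service key, so it could simply spread incoming INTRODUCE cells over them:

    import itertools

    class IntroPoint:
        def __init__(self):
            self.circuits = {}    # service key -> circuits to connected instances
            self.rotation = {}

        def register_instance(self, service_key, circuit):
            self.circuits.setdefault(service_key, []).append(circuit)
            self.rotation[service_key] = itertools.cycle(self.circuits[service_key])

        def deliver_introduce(self, service_key, cell):
            # hand each incoming INTRODUCE cell to the next instance, round-robin
            return next(self.rotation[service_key]), cell

    # toy usage: two instances of the same service share new clients
    ip = IntroPoint()
    ip.register_instance("svc-key", "circuit-to-instance-1")
    ip.register_instance("svc-key", "circuit-to-instance-2")
    print([ip.deliver_introduce("svc-key", "introduce-%d" % i)[0] for i in range(4)])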
And I am not to sure that things like IP rotation and rebuilding circuits to IP's will even help with anonymity issues.
Regarding entry guards this is one article speaking about them, don't know if up to date:
https://blog.torproject.org/category/tags/entry-guards
Seems they are always used for all sort of circuits including HS. So if you reused the code they are being selected.
I am still concerned that if things stay too long in one way, big players (antidemocratic governments for instance) could do things. Keep in mind that if you have more running instances of your HS the chances to locate one of them increases since I only have to locate one of your instances to know who you are.
There are also other factors that make it harder to locate any instance of a service with multiple instances. For example, it becomes harder to correlate data center power failures with hidden service failures if that service is hosted in multiple physical locations.
Ok, take a look at this attack; correct me if I am wrong and some point is not possible (I invite anyone to prove me wrong). As I said, I appreciate your work, but it needs to be challenged to be accepted by the community, so that it doesn't stay in a limbo of "I don't know"; it is better to patch every possible hole before it becomes mainstream.
Suppose I am a totalitarian government and you are a dissident running an HS over Tor in the same country.
1 - I start introducing high-availability, high-bandwidth corrupt nodes into the network across the globe (I could rent servers in case you decide to connect to nodes offshore, or simply deploy nodes in another country); the more I add, the higher the chance of being a stone in your anonymity path. To lower the budget, I could host several Tor routers on one computer with several network interfaces and a fast CPU/crypto hardware for OpenSSL.
2 - I see you are using some IPs (I can query the HSDirs to get them), so if I am not lucky enough to be your HS's IP at the start, I flood the ones you select to take them offline, forcing you to switch until you eventually pick one of my corrupt nodes as an IP. It is not clear to me whether choosing them in a deterministic way gets in the way of this or helps. If it is deterministic, I could precalculate how many nodes I have to force offline until you pick one of mine, so I could shut down the nodes that will never be selected and lower my budget. I could bribe some ISP to rent servers with specific addresses (I don't know if you select on this), or bribe the IP operator if they publish a contact email. You can't even get suspicious if you don't know the flooded IP's operator, as a node going offline is totally normal. So no dust is raised (is there a way for a Tor router to publicly announce it was attacked, so other people can be warned?).
With the code I have currently published, the selection being deterministic helps this attack, as you could create nodes with identities in the right regions (just like the attack against HSDirs).
3 - I become at least one of your IPs. This is a good achievement. From now on I know that, when I disconnect you, you are going to connect back to me using some circuit that may or may not contain corrupt nodes. When you connect back to me your failure counter is reset, so I can disconnect you as much as I want.
4 - I know your last node, so if it is not mine I disconnect your circuit until you select a circuit whose last node is one of my corrupt nodes.
5 - When you do, I can see the previous node in your circuit. If it is not mine, I disconnect you and go back to step 4. If it is, I learn one of your guards. I keep repeating step 4 for a while to enumerate your guards. The more you have, the longer it takes; the fewer guards, the easier for me. But I can continue the attack as soon as I know one of the guards, to save time and see whether I succeed. I could flood your guard to force you to select another guard and accelerate the process, or globally block access to the guard.
6 - Once I learn all your guards, I can do several things with that knowledge.
- Since your instance is going to connect to those nodes for a long while, I could censor your instance by flooding those nodes, at least until you notice and select new guard nodes (I can be insidious here, repeating the attack over and over again for each instance). I could also wait and enumerate the guard nodes of all of your instances, since they all connect to my IP.
- Since I am a big government and I control the ISP you are using, I could monitor incoming connections to your guard nodes. I record, for a while, all the network addresses connecting to those nodes. Maybe I could already filter some hosts here with heuristics (hosts that only connect to those guards, since the probability of another host connecting to those specific guards should be low), but not necessarily, as I am going to filter later. Notice that HSs stay connected for very long periods, so the connection time of a server should be longer than that of a client, and I could discard hosts using that information.
Once I have enough information, I disconnect you from the net for a short time (short enough that you don't abandon my IP, leaving some room for random circuit failures). I can tell the ISPs to do this for me. A disconnection is totally normal, so no dust is raised. I can take my time too: disconnect you today, wait some time, and disconnect you again.
For an HS hosted offshore I could do other things: bribe ISP employees, DoS ISPs or individual computers, sell backdoored routers, backdoor router firmware; but that is less realistic and harder. Not discardable IMHO, but I am going for the easy case here.
Notice that I don't care which circuit you use from now on to reconnect back to me, even if you select new guards. I could also monitor whether your HS answers RV messages, using several preselected RV points.
If, once I disconnect those hosts, I don't see any instance go away, I discard those hosts as candidates for hosting the HS.
If I see some of the instances go away, then your host is in that subset.
I perform a binary search here, disconnecting half of the hosts each time, so the number of disconnections it takes me is O(log(n)), where n is the total number of hosts I see connecting to those guards.
I repeat each of these steps several times, to statistically filter out ordinary failures of the circuit from your HS to my IP.
If two or more instances use the same IP, I would still see some instances staying or going, so it seems this doesn't protect against the attack at all, even if the instances are indistinguishable. If you close circuits and open new ones from time to time, I would get noise here, but maybe I could filter it out with statistics.
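To illustrate the O(log(n)) claim, here is a minimal sketch of the partitioning search, where cut_and_observe is a hypothetical oracle standing for the attacker's ability to cut connectivity for a chosen subset of hosts and see whether any HS instance becomes unreachable (all names are made up for illustration):

    # Rough sketch of the host-partitioning search described above.
    def narrow_down(candidates, cut_and_observe):
        suspects = list(candidates)
        while len(suspects) > 1:
            half = suspects[:len(suspects) // 2]
            if cut_and_observe(half):   # an instance vanished, so it is hosted within `half`
                suspects = half
            else:                       # otherwise it is hosted in the other half
                suspects = suspects[len(suspects) // 2:]
        return suspects  # each round would be repeated to filter ordinary circuit failures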
- I could in some cases seize a guard node or bribe the operator.
If there are no flaws so far, I can spy on you to learn where you are hosting the computers and seize them without giving you time to turn them off, use plausible-deniability crypto software (TrueCrypt), or claim you were routing instead of hosting the HS (by presenting a clone of the router without the HS data).
With multiple instances, it seems to me it now becomes desirable to host at least one router per instance, to be able to deny you were hosting and claim you were routing, as it looks like it won't be possible to correlate the router going down with the HS going down (the other instances would hide that, if they are indistinguishable and take over when one instance goes down).
Now, if you instead rotate the IPs from time to time, I would be forced to go back to step 2 of the attack; on the other hand, your chances of selecting a corrupt IP or a corrupt node in your circuits increase. As the host of one of your IPs I would have less control, since the position is not going to last forever but is time limited.
Changing circuits could introduce noise in step 6 to some extent; on the other hand, it increases the chances of selecting a corrupt node in your circuit.
So all of this would probably have to be studied statistically, taking the current Tor network size into account.
It looks to me like this could have implications in other areas too.
I didn't go through that very thoroughly, but it sounds reasonable.
In some previous messages on the subject I saw that HSDirs provide all of an HS's IPs. I don't like this way of doing things: say my HS has 6 IPs, available to everyone. To DoS your HS, it seems all I have to do is DoS the IPs. And there is no need for everyone to know all the IPs of one HS all the time; to connect, one user only needs some of them (maybe more than one for redundancy), but not all.
Is there some way to only provide part of one HS's IPs to each user, to avoid enumeration? Maybe distribute partial information to the HSDirs? I don't know, just thinking. Maybe "abuse" some caching effect on HSDirs and publish partial IP information in one place and the rest in another, so that the full set only reaches all users over time.
As the set of IPs is so small,
Again, small is relative. Earlier you mentioned that some nodes failing was not going to affect the service too much, so this seems contradictory to me. Can you please give numbers? Can't this number be increased?
Sorry, I should have been more specific. The set of IPs I am referring to here are those used by a service. The number is determined by an algorithm that adjusts it based on the service's load. I think that this is around the 3 to 10 range (but this is a guess).
If this is roughly correct, it becomes very hard to distribute strict subsets of the 10 IPs in such a way that no one can learn about all 10.
I cannot think of any practical way to do this without it being trivial to break.
This is not directly related to your work, but could be worth discussing. I was thinking that one property that could maybe be exploited is the fact that the whole Tor network has a lot of computational power, which is hard to match for a single player (unless the player is really big). This is a rough idea that could contain flaws and could maybe be improved.
What if, let's say, the IP information is encrypted by the HS, which does not provide the key, forcing Tor clients to "brute-force" the keys to open the encrypted messages containing the IPs? All the IP keys would be scattered through a keyspace that could be larger or smaller depending on how long I want you to spend looking, so any IP would have an equal chance of being found. The key would be passed through an ASIC-resistant function before being used to encrypt, so that big players would have to use at most GPUs, and different CPU powers would be somewhat equalized through the memory-bandwidth limit. All Tor clients would start looking at a different random position of the keyspace until they find one key that decrypts one IP.
From there they start to communicate with that IP, if it is available, and keep looking for the rest of the IPs (to be able to reconnect if the RV and the initially used IP go offline). Maybe reconnection to the RV could be desirable too, to some extent, to improve the availability of the HS in case the circuit fails; I don't know if that is currently possible.
Maybe each encrypted message could include one bit of the key for another encrypted IP, so that following a chain of decryptions is more feasible than starting from zero (this could be a good idea or not), thereby forcing everyone to follow a decryption path that depends on where they start decrypting the IP information.
To explain the idea better, let's say I have 3 introduction points, A, B and C (though I don't see why there couldn't be more, especially since the bulk of the traffic goes through the rendezvous and not through the IP, IPs could be shared by several HSes, and it is harder to DoS many nodes than just a few).
Let's say one Tor client starts looking at a random position and finds the key for B (now it can connect to B, if available, and pass the RV message). Once B's entry is decrypted, it gets one bit of the key for C, so it starts looking for the key of C, which is now easier to find than the key for A. Then it gets C, and that gives it bits for A.
Another Tor client starts at a random position, finds A and gets one bit for B (now it can connect to A and pass the RV message, if available), then goes on to B and gets the bit for C, then finds C.
After a while the HS rotates its IPs and the process starts again (not necessarily all at once; it could be a rolling process where some get replaced while part of the old ones are kept). So anyone trying to cause a DoS would have to perform all of that work again, and be very quick to find all of the IPs before the HS rotates them again, in order to flood all of them. At most, all they could do is make the HS intermittent (given enough CPU power and bandwidth). On the other side, the Tor swarm as a whole would be very effective at finding them all.
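A minimal sketch of the brute-force idea, assuming a deliberately tiny keyspace, using scrypt purely as a stand-in for the ASIC-resistant function, a toy XOR cipher, and ignoring the bit-chaining refinement (every name and parameter here is illustrative):

    import hashlib, os

    KEY_BITS = 16          # deliberately tiny; the HS would tune this to set the search time
    MARKER = b"IP:"        # lets a client recognise a successful decryption

    def kdf(key_int, salt):
        # scrypt stands in for the memory-hard, ASIC-resistant function mentioned above
        return hashlib.scrypt(key_int.to_bytes(4, "big"), salt=salt,
                              n=2**12, r=8, p=1, dklen=64)

    def encrypt_ip_entry(ip_info, salt):
        key = int.from_bytes(os.urandom(4), "big") % 2**KEY_BITS   # never published
        pad = kdf(key, salt)
        plaintext = MARKER + ip_info                               # must fit within dklen bytes
        return bytes(a ^ b for a, b in zip(plaintext, pad))        # toy XOR "cipher"

    def brute_force(ciphertext, salt, start):
        # each client starts at its own random offset and walks the keyspace
        for i in range(2**KEY_BITS):
            pad = kdf((start + i) % 2**KEY_BITS, salt)
            plaintext = bytes(a ^ b for a, b in zip(ciphertext, pad))
            if plaintext.startswith(MARKER):
                return plaintext[len(MARKER):]
        return None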
The big question that remains for me here is how much this makes a big player waste resources without pushing small players (mobile devices, for instance) out of the network. Memory-hard functions somewhat equalize devices through the memory-bandwidth limit, but there can still be large differences. It seems to me that the more IPs are selected for an HS, the better this can work.
One option I was thinking of to fix that problem is that the HS could function normally while everything is running smoothly, and switch to a protective mode if it detects that it can't create circuits to its IPs: things work as usual if there is no attack, but the system changes to a defensive mode when there is one.
Let me explain better. I, as an HS, work as normal. Suddenly I see that none of the circuits to my IPs can be established (or the router operators of my IPs announce they are being attacked). I create new ones and again find that eventually I can't connect. That probably means someone is attacking my IPs. I switch to defensive mode and publish encrypted IPs. Clients notice they are encrypted and each start looking for the IP keys on their own.
I could have degrees here, increasing the computational complexity to fend off the attack. Maybe only mobile devices get pushed off the net while CPU and GPU nodes can still connect, so the collateral damage affects only a part of the users.
One problem that could appear here is the time it takes for information published to the HSDirs to reach clients. I don't know much about that part, and I believe it is under active research at the moment.
The scheme doesn't push away big players with large computational power and bandwidth, but it can push away some medium players. Also, it doesn't protect against other attacks, for instance flooding through an RV (could puzzle solving be applied here? Say, the harder the puzzle you solve, the more bandwidth I give you).
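For the puzzle idea, a minimal hashcash-style sketch, where the verifier could scale the bandwidth it grants with the difficulty solved (the difficulty value and field layout are made up):

    import hashlib, os

    def solve(challenge, difficulty):
        # grind nonces until the hash has `difficulty` leading zero bits
        target = 1 << (256 - difficulty)
        nonce = 0
        while True:
            digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
            if int.from_bytes(digest, "big") < target:
                return nonce
            nonce += 1

    def verify(challenge, nonce, difficulty):
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        return int.from_bytes(digest, "big") < (1 << (256 - difficulty))

    challenge = os.urandom(16)       # e.g. handed out with the rendezvous request
    nonce = solve(challenge, 16)     # harder puzzles could buy more bandwidth
    assert verify(challenge, nonce, 16)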
It's an interesting approach, I might try reading it again in a bit.
On 09/05/14 16:03, Christopher Baines wrote:
Maybe a good idea would be to try to reconnect, and if it keeps failing too much, select another IP.
It currently does do this, but probably over a shorter time period than you are suggesting. It keeps a count of connection failures while trying to reconnect, but this is reset once a new connection is established.
Yes, I meant measuring over a larger time period and over different circuits, as the cause of the disconnection could have been the circuit failing and not the IP.
What happens if the HS just goes offline for a while? Does it keep trying to connect, find that it can't reach the IPs, and pick another set? Are you differentiating that case?
I am unsure what you mean here, can you clarify that you do mean the "HS", and what "It" refers to?
Sorry, I meant the HS instance. If the instance goes offline, let's say the network interface goes down, it keeps running but can't create any circuits to the IPs. It looks, according to your other explanations, like it reads the HSDir and tries to reconnect to the older IPs.
waldo waldoalvarez00@yahoo.com writes:
On 02/05/14 02:34, Christopher Baines wrote:
On 02/05/14 00:45, waldo wrote:
On 30/04/14 17:06, Christopher Baines wrote:
On 08/10/13 06:52, Christopher Baines wrote:
I have been looking at doing some work on Tor as part of my degree, and more specifically, looking at Hidden Services. One of the issues where I believe I might be able to make some progress, is the Hidden Service Scaling issue as described here [1].
<snip>
I am worried about an attack coming from an evil IP, based on forced disconnection of the HS from the IP. I don't know if this is possible, but I am worried that picking a new circuit randomly could be highly problematic. Let's say I am the NSA, I own 10% of the routers, and I keep disconnecting your HS from an IP I control. If you select a new circuit randomly, even if the probabilities are low, it is eventually a matter of time until I force you onto a specific circuit, out of many, that is convenient to me: one that reveals your original IP address as metadata through cooperating routers I own, and so does away with the anonymity of the hidden service.
Yeah, that does appear plausible. Are there guard nodes used for hidden service circuits (I forget)?
No idea; according to these docs, https://www.torproject.org/docs/hidden-services.html.en, there are no guards in the circuits to the IP in step one (they are not mentioned). They are definitely used in step five, to protect against a timing attack with a corrupt entry node.
Hello waldo,
as far as I can tell, that circuit does use entry guards. And that's good, because I think that an HS circuit that doesn't use entry guards would be bad news.
I think choose_good_entry_server() is used for all circuits, and if guards are used, it picks an entry guard for all circuits (except for the ones with purpose CIRCUIT_PURPOSE_TESTING) [0].
<snip>
I would also rotate the IPs after using them for some time; I don't think it is good to keep one IP for too long. If, for instance, I am big daddy and know your IPs, I could go there, seize the computers and start gathering funny statistics about your HS. Or simply censor your HS by dropping messages from clients trying to send you the rendezvous point (is this possible? It looks like it is, if I drop introduce messages and generate fake ones). You wouldn't even know
It's interesting that you say this, because we pretty much took the opposite approach with guard nodes. That is, the plan is to extend their rotation period to 9 months (from the current 2-3 months). See: https://gitweb.torproject.org/torspec.git/blob/HEAD:/proposals/236-single-gu...
I was even planning on writing an extension to rend-spec-ng.txt to specify how IPs should be picked and to extend their rotation period. That's for the same reason we do it for entry guards:
We assume that it's preferable to never get owned, as long as you picked good guards/IPs, rather than to surely get owned eventually but only some of the time (and ownage _will_ occur if you frequently rotate guards/IPs). The main argument for preferring the former behavior (at least for guards) is that if the adversary can deanonymize you some of the time, it's likely that they can extrapolate to total ownage using statistics and behavior analysis (is this true?).
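To put rough numbers behind the "ownage _will_ occur" intuition: if a fraction c of candidate nodes is adversarial, the chance of having picked at least one bad guard/IP after k independent selections is 1 - (1 - c)^k. A toy calculation (c = 0.05 and the rotation counts are made-up parameters):

    # Probability of ever selecting an adversarial guard/IP after k rotations.
    def p_owned(c, k):
        return 1 - (1 - c) ** k

    for k in (1, 12, 52, 365):   # one pick vs. monthly/weekly/daily rotation for a year
        print(k, round(p_owned(0.05, k), 3))
    # prints roughly 0.05, 0.46, 0.931 and 1.0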
I admit that we didn't take much into consideration an adversary who can lawfully seize boxes; mainly because it's hard to quantify such a threat. For example, if a motivated attacker has that power, does it make much difference if he has 2 months or 9 months to act? He can probably "get a warrant" and seize the servers in hours/days if needed.
Also note that even though it makes sense to say "these are different behaviors that facilitate different threat models, and each user should be able to select the behavior that suits her", it's not that easy to support both behaviors because that will divide the anonymity set. Hm. At least that's true for guards, is it also true for HSes and the rotation period of their IPs?
(For example, I think it's wise to assume that a motivated deanonymizing HS attacker has access to the HS descriptor and hence to the list of IPs. So maybe rotating IPs faster than other HSes doesn't harm your anonymity set from this perspective... Don't know really... This is hard stuff.)
PS: I changed the subject of this thread, because the thread was becoming immense and hard to dig through.
[0]:
    if (state && options->UseEntryGuards &&
        (purpose != CIRCUIT_PURPOSE_TESTING || options->BridgeRelay)) {
      /* This request is for an entry server to use for a regular circuit,
       * and we use entry guard nodes.  Just return one of the guard nodes.  */
      return choose_random_entry(state);
    }
On 10/05/14 21:09, George Kadianakis wrote:
It's interesting that you say this, because we pretty much took the opposite approach with guard nodes. That is, the plan is to extend their rotation period to 9 months (from the current 2-3 months). See: https://gitweb.torproject.org/torspec.git/blob/HEAD:/proposals/236-single-gu...
I was even planning on writing an extension to rend-spec-ng.txt to specify how IPs should be picked and to extend their rotation period. That's for the same reason we do it for entry guards:
Hi George,
Is there an analysis somewhere of why it would be better to change IPs less frequently? I think it would be good for the performance of mobile hidden services, but I'm concerned about the attack waldo described earlier in this thread, in which a malicious IP breaks circuits until the service builds a circuit through a malicious middle node, allowing the attacker to discover the service's entry guard.
Perhaps the attack could be mitigated by keeping the same middle node and IP for as long as possible, then choosing a new middle node *and* a new IP when either of them became unavailable? Then a malicious IP that broke a circuit would push the circuit onto a new IP.
However, that might require all three nodes in the circuit to be picked from the high-uptime pool.
Cheers, Michael
Michael Rogers michael@briarproject.org writes:
On 10/05/14 21:09, George Kadianakis wrote:
It's interesting that you say this, because we pretty much took the opposite approach with guard nodes. That is, the plan is to extend their rotation period to 9 months (from the current 2-3 months). See: https://gitweb.torproject.org/torspec.git/blob/HEAD:/proposals/236-single-gu...
I was even planning on writing an extension to rend-spec-ng.txt to specify how IPs should be picked and to extend their rotation period. That's for the same reason we do it for entry guards:
Hi George,
Is there an analysis somewhere of why it would be better to change IPs less frequently?
No, this analysis hasn't been done yet. I'd have to do the analysis before writing the patch to rend-spec-ng.txt and that's why I haven't written it yet.
It's still unclear to me whether keeping IPs for longer periods of time is a good idea; I suggested it as a possible approach because we recently took a similar decision for guard nodes (see my previous mail). More analysis must be done.
Also, note, that if the scaling ideas get implemented, the IPs become more important in the HS threat model. For example, many of the suggested scaling schemes allow the IPs to learn the number of HS nodes, or to decide which HS node should receive a given connection.
I think it would be good for the performance of mobile hidden services, but I'm concerned about the attack waldo described earlier in this thread, in which a malicious IP breaks circuits until the service builds a circuit through a malicious middle node, allowing the attacker to discover the service's entry guard.
I couldn't find the attack you described in this thread. This thread is quite big.
However, I'm not sure how rotating IPs _more frequently_ can help against the guard discovery attack you described. It would seem to me that the contrary is true (the fewer IPs you go through, the less probability you have for one of them to be adversarial).
Perhaps the attack could be mitigated by keeping the same middle node and IP for as long as possible, then choosing a new middle node *and* a new IP when either of them became unavailable? Then a malicious IP that broke a circuit would push the circuit onto a new IP.
Also see https://lists.torproject.org/pipermail/tor-dev/2013-October/005621.html .
Unfortunately, it seems to me that the 'virtual circuit' idea is messier than we imagine, and taking the 'guard layers' approach might be less dangerous and easier to analyze.
On 11/05/14 17:36, George Kadianakis wrote:
I think it would be good for the performance of mobile hidden services, but I'm concerned about the attack waldo described earlier in this thread, in which a malicious IP breaks circuits until the service builds a circuit through a malicious middle node, allowing the attacker to discover the service's entry guard.
I couldn't find the attack you described in this thread. This thread is quite big.
The attack was described here: https://lists.torproject.org/pipermail/tor-dev/2014-May/006807.html
However, I'm not sure how rotating IPs _more frequently_ can help against the guard discovery attack you described. It would seem to me that the contrary is true (the fewer IPs you go through, the less probability you have for one of them to be adversarial).
I'm not suggesting that fast rotation would be better than slow rotation, but there are some possibilities that don't involve periodic rotation at all.
One possibility (which might be the current behaviour?) is that if the circuit to an IP fails, you build a new circuit to a new IP rather than a new circuit to the same IP. Advantage: not vulnerable to waldo's attack. Disadvantage: rapid turnover of IPs.
Another possibility is to rebuild the circuit through the same nodes if possible, or if not, build an entirely new circuit to a new IP. This would prevent waldo's attack, but it might still cause rapid turnover of IPs unless all nodes in the circuit were chosen from the high-uptime pool.
A third possibility (which might be the virtual circuits idea?) is to reuse the nodes in the circuit up to the point of failure, and pick new nodes beyond that point. But that would seem to be vulnerable to a selective DoS attack where a bad node vetoes any attempt to extend the circuit to a good node, thus causing any circuit that hits a bad node to pass through bad nodes from that point on (similar to MorphMix).
A fourth possibility is to rank the candidate nodes for each position in the circuit, and build the circuit through the highest-ranked candidate for each position that's currently online. Thus whenever the circuit's rebuilt it will pass through the same nodes if possible, or mostly the same nodes if some of the favourite candidates are offline. Over time the circuit will gradually move to new nodes due to churn - but that will happen as slowly for this design as it can happen for any design.
The ranking for each position should be secret so that if an attacker knows what a client's favourite node is, she can't make one of her nodes the second-favourite and wait for the favourite to fail. And the ranking for different circuits should be independent so the client's circuits don't leak information about each other.
One way to achieve those properties would be to generate a secret key for each circuit and rank the candidates for each position according to a pseudo-random function of the key, the candidate's fingerprint and the position. The key wouldn't be shared with anyone, it would just be used to rank the candidates.
A MAC function such as HMAC could serve as the pseudo-random function: sort by HMAC(key,fingerprint|position). HASH(key|fingerprint|position) would be another possibility, but MAC functions are explicitly designed to keep the key secret.
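A minimal sketch of that ranking, assuming HMAC-SHA256 and treating fingerprints as opaque byte strings (the names and calling convention are illustrative, not existing Tor code):

    import hmac, hashlib

    def ranked(key, position, candidates):
        # the per-circuit key is never shared; it only orders the candidates
        def score(fingerprint):
            msg = fingerprint + b"|" + str(position).encode()
            return hmac.new(key, msg, hashlib.sha256).digest()
        return sorted(candidates, key=score)

    def pick(key, position, candidates, online):
        # highest-ranked candidate for this position that is currently reachable
        for fp in ranked(key, position, candidates):
            if fp in online:
                return fp
        return None

    # e.g. pick(secret_key, 1, [b"fp_A", b"fp_B", b"fp_C"], online={b"fp_B", b"fp_C"})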
The key would be preserved across sessions for as long as the circuit was wanted - I guess the lifetime would be unlimited for IP circuits, and application-determined for ordinary client circuits. It would be great if an application could signal to the OP "this circuit belongs to long-lived identity X" and get the same circuit (churn permitting) that was previously used for identity X.
As far as I can see, the same entry guards should be used for all circuits, regardless of application-layer identity - otherwise a local observer watching a user could make observations like "Every Wednesday morning, the user connects to guard Y" and correlate those with the activity of a pseudonym (emails, blog updates, etc). So when choosing nodes for a circuit, the candidates for the first position should be the client's entry guards.
If this idea hasn't already been proposed, I suggest we call it persistent circuits.
Perhaps the attack could be mitigated by keeping the same middle node and IP for as long as possible, then choosing a new middle node *and* a new IP when either of them became unavailable? Then a malicious IP that broke a circuit would push the circuit onto a new IP.
Also see https://lists.torproject.org/pipermail/tor-dev/2013-October/005621.html .
Unfortunately, it seems to me that the 'virtual circuit' idea is messier than we imagine, and taking the 'guard layers' approach might be less dangerous and easier to analyze.
Interesting, thanks for the link! Has anything been written about how the guard layers approach would work other than Mike Perry's comment on ticket #9001?
Cheers, Michael
On 13/05/14 18:28, Michael Rogers wrote:
A fourth possibility is to rank the candidate nodes for each position in the circuit, and build the circuit through the highest-ranked candidate for each position that's currently online. Thus whenever the circuit's rebuilt it will pass through the same nodes if possible, or mostly the same nodes if some of the favourite candidates are offline. Over time the circuit will gradually move to new nodes due to churn - but that will happen as slowly for this design as it can happen for any design.
Sorry for the self-reply. I've realised this has the same problem as one of the other possibilities - a bad node can veto any attempt to extend the circuit to a good node, so any circuit that hits a bad node will pass through bad nodes from that point on.
It seems the only safe thing is to rebuild the circuit using all the same nodes, or if that isn't possible, build an entirely new circuit.
Cheers, Michael
On Sun, May 11, 2014 at 11:33 AM, Michael Rogers michael@briarproject.org wrote:
However, that might require all three nodes in the circuit to be picked from the high-uptime pool.
Can't be certain but I think Phantom picks the entire circuit and then camps on it till it breaks somewhere. Their paper might give other views/ideas on such things.
Another thought: the benefits of 'longer' camping on an entry seem sound to me (you're either safe or not straight away, vs. likely not safe sometime in the future, or even straight away). Does the EG camp time come down to estimating the number of EGs in the environment that are unsafe? Also, if you camp on a whole circuit, your usage is more timeable by a PA who might catch your tail. So perhaps make the circuit an anchored flagellum: farther from the EG, more flipping about, sometimes with just a shimmy in the middle.
Christopher Baines cbaines8@gmail.com writes:
On 08/10/13 06:52, Christopher Baines wrote:
I have been looking at doing some work on Tor as part of my degree, and more specifically, looking at Hidden Services. One of the issues where I believe I might be able to make some progress, is the Hidden Service Scaling issue as described here [1].
So, before I start trying to implement a prototype, I thought I would set out my ideas here to check they are reasonable (I have also been discussing this a bit on #tor-dev). The goal of this is two fold, to reduce the probability of failure of a hidden service and to increase hidden service scalability.
Previous threads on this subject: https://lists.torproject.org/pipermail/tor-dev/2013-October/005556.html https://lists.torproject.org/pipermail/tor-dev/2013-October/005674.html
I have now implemented a prototype, for one possible design of how to allow distribution in hidden services. While developing this, I also made some modifications to chutney to allow for the tests I wanted to write.
Great! Here are a few small comments from quickly reading your post.
In short, I modified tor such that:
- The service's public key is used in the connection to introduction points (a return to the state as of the v0 descriptor)
Ah, this means that now IPs know which HSes they are serving (even if they don't have the HS descriptor). Why was this change necessary?
- multiple connections from one service to an introduction point are allowed (previously, existing ones were closed)
- tor will check for a descriptor when it needs to establish all of its
introduction points, and connect to the ones in the descriptor (if it is available)
- Use an approach similar to the selection of HSDirs for the selection of new introduction points (instead of a random selection)
As you note below, this suffers from the same issue that HSDirs suffer from. Why was this necessary? Is it to avoid race conditions?
Based on the previous point, I thought that the second node of an HS would be able to get the list of IPs by reading the descriptor of the first node.
- Attempt to reconnect to an introduction point, if the connection is lost
With chutney, I added support for interacting with the nodes through Stem, I also moved the control over starting the nodes to the test, as this allows for more complex behaviour.
Currently the one major issue is that using an approach similar to the HSDir selection means that introduction points suffer from the same issue as HSDir's currently [1]. I believe any satisfactory solution to the HSDir issue would resolve this problem also.
One other thing of note, tor currently allows building circuits to introduction points, through existing introduction points, and selecting introduction points on circuits used to connect to other introduction points. These two issues mean that a failure in one introduction point, can currently cause tor to change two introduction points. (I am not saying this needs changing, but you could adjust the circuit creation, to prevent some extra work later if a failure occurs).
Any comments regarding the above would be welcome.
On 03/05/14 11:21, George Kadianakis wrote:
Christopher Baines cbaines8@gmail.com writes:
On 08/10/13 06:52, Christopher Baines wrote: In short, I modified tor such that:
- The service's public key is used in the connection to introduction points (a return to the state as of the v0 descriptor)
Ah, this means that now IPs know which HSes they are serving (even if they don't have the HS descriptor). Why was this change necessary?
If the "service key"'s (randomly generated keys per introduction point) are used, then this would complicate/cause problems with the multiple instances connecting to one introduction point. Only one key would be listed in the descriptor, which would only allow one instance to get the traffic.
Using the same key is good; using the service's key is not great. One possible improvement might be to generate a key for an introduction point based on the identity of the introduction point, plus some other material to make it secure.
- multiple connections from one service to an introduction point are allowed (previously, existing ones were closed)
- tor will check for a descriptor when it needs to establish all of its
introduction points, and connect to the ones in the descriptor (if it is available)
- Use an approach similar to the selection of HSDirs for the selection of new introduction points (instead of a random selection)
As you note below, this suffers from the same issue that HSDirs suffer from. Why was this necessary? Is it to avoid race conditions?
The existing random selection algorithm was not suitable as each instance would pick differently. If you used a pseudorandom number generator, which produced a consistent output between instances, then this would make the results similar, but the results could then still be thrown off by each instance considering slightly different candidates due to different knowledge about the network state.
The approach used for HSDir selection seemed promising, as if each node after the start point is considered (regardless of local knowledge), then each instance should converge on the first suitable node.
So, not race conditions, but to account for local network state information.
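For illustration, a minimal sketch of that kind of convergent selection, with nodes placed on a hash ring by a digest of their identity and every instance walking the ring from a start point derived only from the service (the derivation and names are made up, not the actual HSDir formula):

    import hashlib

    def ring_pos(node_id):
        return hashlib.sha1(node_id).hexdigest()

    def pick_intro_points(service_id, replica, known_nodes, count):
        # the start point depends only on the service, so every instance computes the same one
        start = hashlib.sha1(service_id + b"|" + bytes([replica])).hexdigest()
        ring = sorted(known_nodes, key=ring_pos)
        # walk the ring from the start point and wrap around; instances with slightly
        # different network views only disagree if one is missing a node near the start
        after = [n for n in ring if ring_pos(n) >= start]
        before = [n for n in ring if ring_pos(n) < start]
        return (after + before)[:count]

    # e.g. pick_intro_points(b"onionaddress", 0, [b"relay1", b"relay2", b"relay3"], 2)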
Based on the previous point, I thought that the second node of an HS would be able to get the list of IPs by reading the descriptor of the first node.
Yes, and no. It depends on its internal state.
There are two scenarios for an instance:
- I have had no introduction points, and have not just removed some
- I have 0 to n introduction points, and have recently removed some
In the first scenario, a descriptor lookup is used, falling back to selecting introduction points if that fails.
In the second scenario, you pick using the HSDir-like approach.
You could attempt to use the descriptor to coordinate the selection of new introduction points, but you then have the problem of "whose job is it to choose".
On Sat, May 3, 2014 at 5:58 AM, Christopher Baines cbaines8@gmail.com wrote:
On 03/05/14 11:21, George Kadianakis wrote:
On 08/10/13 06:52, Christopher Baines wrote: In short, I modified tor such that:
- The service's public key is used in the connection to introduction points (a return to the state as of the v0 descriptor)
Ah, this means that now IPs know which HSes they are serving (even if they don't have the HS descriptor). Why was this change necessary?
If the "service key"'s (randomly generated keys per introduction point) are used, then this would complicate/cause problems with the multiple instances connecting to one introduction point. Only one key would be listed in the descriptor, which would only allow one instance to get the traffic.
Using the same key is good; using the service's key is not great. One possible improvement might be to generate a key for an introduction point based on the identity of the introduction point, plus some other material to make it secure.
Would it make sense to solve this problem using a similar approach to the key blinding described in proposal 224? For example, if the public key is g^x and the introduction point has identity (e.g. fingerprint) y, then the IP blinding factor would be
t_{IP} = Hash(y | g^x)
and the IP-specific public key would be
P_{IP} = g^{x*t_{IP}}
This way the IP doesn't learn what HS it's serving if it doesn't know the descriptor, but any HS server that knows the secret key (x) can compute the IP secret key x*t.
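A toy numeric sketch of that blinding, using modular exponentiation in a small multiplicative group purely for illustration (a real design, as in proposal 224, would use Ed25519; all parameters here are made up):

    import hashlib

    p = 2**127 - 1      # a Mersenne prime; toy group, not cryptographically appropriate
    g = 5

    def H(data):
        return int.from_bytes(hashlib.sha256(data).digest(), "big")

    x = 123456789012345                      # HS long-term secret key
    P = pow(g, x, p)                         # public key g^x
    y = b"intro-point-fingerprint"           # IP identity

    t = H(y + P.to_bytes(16, "big")) % (p - 1)   # blinding factor t = Hash(y | g^x)
    P_ip = pow(P, t, p)                          # IP-specific public key g^(x*t)

    x_ip = (x * t) % (p - 1)                     # only someone holding x can derive this
    assert pow(g, x_ip, p) == P_ip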
On 06/05/14 20:13, Nicholas Hopper wrote:
On Sat, May 3, 2014 at 5:58 AM, Christopher Baines cbaines8@gmail.com wrote:
On 03/05/14 11:21, George Kadianakis wrote:
On 08/10/13 06:52, Christopher Baines wrote: In short, I modified tor such that:
- The service's public key is used in the connection to introduction points (a return to the state as of the v0 descriptor)
Ah, this means that now IPs know which HSes they are serving (even if they don't have the HS descriptor). Why was this change necessary?
If the "service key"'s (randomly generated keys per introduction point) are used, then this would complicate/cause problems with the multiple instances connecting to one introduction point. Only one key would be listed in the descriptor, which would only allow one instance to get the traffic.
Using the same key is good; using the service's key is not great. One possible improvement might be to generate a key for an introduction point based on the identity of the introduction point, plus some other material to make it secure.
Would it make sense to solve this problem using a similar approach to the key blinding described in proposal 224? For example, if the public key is g^x and the introduction point has identity (e.g. fingerprint) y, then the IP blinding factor would be
t_{IP} = Hash(y | g^x)
and the IP-specific public key would be
P_{IP} = g^{x*t_{IP}}
This way the IP doesn't learn what HS it's serving if it doesn't know the descriptor, but any HS server that knows the secret key (x) can compute the IP secret key x*t.
Yes, from the non-mathematical explanation, that seems to fit the requirements fine.
Hi Christopher,
I'm interested in your work because the hidden service protocol doesn't seem to perform very well for hidden services running on mobile devices, which frequently lose network connectivity. I wonder if the situation can be improved by choosing introduction points deterministically.
On 30/04/14 22:06, Christopher Baines wrote:
- multiple connections from one service to an introduction point are allowed (previously, existing ones were closed)
Does this mean that at present, the service builds a new IP circuit (to a new IP?) every time it receives a connection? If so, is it the IP or the service that closes the old circuit?
Thanks, Michael
On Tue, May 06, 2014 at 03:29:03PM +0100, Michael Rogers wrote:
I'm interested in your work because the hidden service protocol doesn't seem to perform very well for hidden services running on mobile devices, which frequently lose network connectivity. I wonder if the situation can be improved by choosing introduction points deterministically.
I think https://trac.torproject.org/projects/tor/ticket/8239 would resolve a lot of this problem.
Somebody should write the patch. :)
--Roger
On 06/05/14 21:19, Roger Dingledine wrote:
On Tue, May 06, 2014 at 03:29:03PM +0100, Michael Rogers wrote:
I'm interested in your work because the hidden service protocol doesn't seem to perform very well for hidden services running on mobile devices, which frequently lose network connectivity. I wonder if the situation can be improved by choosing introduction points deterministically.
I think https://trac.torproject.org/projects/tor/ticket/8239 would resolve a lot of this problem.
Somebody should write the patch. :)
I have implemented this (or something similar). I will try to extract it as a patch, which can be applied independently of anything else which I have changed. This might take a few weeks, as I have exams looming.
On 06/05/14 15:29, Michael Rogers wrote:
I'm interested in your work because the hidden service protocol doesn't seem to perform very well for hidden services running on mobile devices, which frequently lose network connectivity. I wonder if the situation can be improved by choosing introduction points deterministically.
Unfortunately, I don't really see how anything I have done could have helped with this. Assuming that the mobile device has maintained connectivity during the connection phase, and you now have the 6-hop circuit through the RP, the behaviour from then on is unchanged, and this is where I assume the problems with losing connectivity occur?
On 30/04/14 22:06, Christopher Baines wrote:
- multiple connections from one service to an introduction point are allowed (previously, existing ones were closed)
Does this mean that at present, the service builds a new IP circuit (to a new IP?) every time it receives a connection? If so, is it the IP or the service that closes the old circuit?
Not quite. When the service (instance, or instances) selects an introduction point, a circuit to that introduction point is built. This is a long-term circuit, through which RELAY_COMMAND_INTRODUCE2 cells can be sent. This circuit enables the IP to contact the service when a client asks it to do so.
Currently, any IPs will close any existing circuits which are for a common purpose and service.
The modification I attempted to describe above is the disabling of this behaviour, so a hidden service instance (or multiple instances of the same hidden service) can connect to the same introduction point through multiple circuits. There are also some additional modifications needed to make the RELAY_COMMAND_INTRODUCE2 handling work with multiple circuits.
On 06/05/14 22:07, Christopher Baines wrote:
On 06/05/14 15:29, Michael Rogers wrote:
I'm interested in your work because the hidden service protocol doesn't seem to perform very well for hidden services running on mobile devices, which frequently lose network connectivity. I wonder if the situation can be improved by choosing introduction points deterministically.
Unfortunately, I don't really see how anything I have done could have helped with this. Assuming that the mobile device has maintained connectivity during the connection phase, and you now have the 6-hop circuit through the RP, the behaviour from then on is unchanged, and this is where I assume the problems with losing connectivity occur?
Right, attempt two, I think I may have misinterpreted what you said. The above response relates to client behaviour for hidden services. Am I correct in saying that you actually mean hosting the hidden service from a mobile device?
If so, then yes. When I implemented the deterministic selection of introduction points, I had to implement a reconnection mechanism to ensure that the introduction point would only be changed if it had failed, and not in the case of intermittent network issues (the degree to which I have actually done this might vary).
On 06/05/14 22:17, Christopher Baines wrote:
On 06/05/14 22:07, Christopher Baines wrote:
On 06/05/14 15:29, Michael Rogers wrote:
I'm interested in your work because the hidden service protocol doesn't seem to perform very well for hidden services running on mobile devices, which frequently lose network connectivity. I wonder if the situation can be improved by choosing introduction points deterministically.
Unfortunately, I don't really see how anything I have done could have helped with this. Assuming that the mobile device has maintained connectivity during the connection phase, and you now have the 6-hop circuit through the RP, the behaviour from then on is unchanged, and this is where I assume the problems with losing connectivity occur?
Right, attempt two, I think I may have misinterpreted what you said. The above response relates to client behaviour for hidden services. Am I correct in saying that you actually mean hosting the hidden service from a mobile device?
That's right.
If so, then yes. When I implemented the deterministic selection of introduction points, I had to implement a reconnection mechanism to ensure that the introduction point would only be changed if it had failed, and not in the case of intermittent network issues (the degree to which I have actually done this might vary).
Is it necessary to know why the circuit broke, or is it sufficient to try rebuilding the circuit, and pick a new IP if the old one isn't reachable?
What about the attack suggested by waldo, where a malicious IP repeatedly breaks the circuit until it's rebuilt through a malicious middle node? Are entry guards enough to protect the service's anonymity in that case?
Cheers, Michael
On 07/05/14 13:51, Michael Rogers wrote:
On 06/05/14 22:17, Christopher Baines wrote:
If so, then yes. When I implemented the deterministic selection of introduction points, I had to implement a reconnection mechanism to ensure that the introduction point would only be changed if it had failed, and not in the case of intermittent network issues (the degree to which I have actually done this might vary).
Is it necessary to know why the circuit broke, or is it sufficient to try rebuilding the circuit, and pick a new IP if the old one isn't reachable?
I imagine that the service will still have to try connecting via an alternate route, as even if it was told that the introduction point is no longer available, it should still check anyway (to avoid being tricked).
What about the attack suggested by waldo, where a malicious IP repeatedly breaks the circuit until it's rebuilt through a malicious middle node? Are entry guards enough to protect the service's anonymity in that case?
I think it is a valid concern. Assuming the attacker has identified their node as an IP, and has the corresponding public key, they can then get the service to create new circuits to their node by just causing the existing ones to fail.
Using guard nodes for those circuits would seem to be helpful, as this would greatly reduce the chance that the attacker's nodes are used in the first hop.
If guard nodes were used (assuming that they are currently not), you would have to be careful to act correctly when the guard node fails, in terms of using a different guard, or selecting a new guard to use instead (in an attempt to still connect to the introduction point).
On 07/05/14 17:32, Christopher Baines wrote:
What about the attack suggested by waldo, where a malicious IP repeatedly breaks the circuit until it's rebuilt through a malicious middle node? Are entry guards enough to protect the service's anonymity in that case?
I think it is a valid concern. Assuming the attacker has identified their node as an IP, and has the corresponding public key, they can then get the service to create new circuits to their node by just causing the existing ones to fail.
Using guard nodes for those circuits would seem to be helpful, as this would greatly reduce the chance that the attacker's nodes are used in the first hop.
If guard nodes were used (assuming that they are currently not), you would have to be careful to act correctly when the guard node fails, in terms of using a different guard, or selecting a new guard to use instead (in an attempt to still connect to the introduction point).
Perhaps it would make sense to pick one or more IPs per guard, and change those IPs when the guard is changed? Then waldo's attack by a malicious IP would only ever discover one guard.
Cheers, Michael
On 07/05/14 18:30, Michael Rogers wrote:
On 07/05/14 17:32, Christopher Baines wrote:
What about the attack suggested by waldo, where a malicious IP repeatedly breaks the circuit until it's rebuilt through a malicious middle node? Are entry guards enough to protect the service's anonymity in that case?
I think it is a valid concern. Assuming the attacker has identified their node as an IP, and has the corresponding public key, they can then get the service to create new circuits to their node by just causing the existing ones to fail.
Using guard nodes for those circuits would seem to be helpful, as this would greatly reduce the chance that the attacker's nodes are used in the first hop.
If guard nodes were used (assuming that they are currently not), you would have to be careful to act correctly when the guard node fails, in terms of using a different guard, or selecting a new guard to use instead (in an attempt to still connect to the introduction point).
Perhaps it would make sense to pick one or more IPs per guard, and change those IPs when the guard is changed? Then waldo's attack by a malicious IP would only ever discover one guard.
If you change the IPs when the guard is changed, this could break the consistency between different instances of the same service (assuming that the different instances are using different guards).
On 08/05/14 14:40, Christopher Baines wrote:
Perhaps it would make sense to pick one or more IPs per guard, and change those IPs when the guard is changed? Then waldo's attack by a malicious IP would only ever discover one guard.
If you change the IPs when the guard is changed, this could break the consistency between different instances of the same service (assuming that the different instances are using different guards).
It should be possible to avoid breaking consistency by having an overlap period: when a guard is scheduled to be replaced, each instance connects to a new guard and IPs, the new descriptor is published, then each instance disconnects from the old guard and IPs.
This should work whether or not the instances use the same guards. If the instances use the same guards, waldo's attack can discover one guard shared by all instances; otherwise it can discover one guard per instance. I'm not sure which is worse for anonymity - any thoughts?
Cheers, Michael
On 09/05/14 10:14, Michael Rogers wrote:
On 08/05/14 14:40, Christopher Baines wrote:
Perhaps it would make sense to pick one or more IPs per guard, and change those IPs when the guard is changed? Then waldo's attack by a malicious IP would only ever discover one guard.
If you change the IP's when the guard is changed, this could break the consistency between different instances of the same service (assuming that the different instances are using different guards).
It should be possible to avoid breaking consistency by having an overlap period: when a guard is scheduled to be replaced, each instance connects to a new guard and IPs, the new descriptor is published, then each instance disconnects from the old guard and IPs.
This should work whether or not the instances use the same guards. If the instances use the same guards, waldo's attack can discover one guard shared by all instances; otherwise it can discover one guard per instance. I'm not sure which is worse for anonymity - any thoughts?
How do you see the guards being "scheduled" for replacement?
Another issue is how do you get each instance to connect through the same guard node?
I think that it would be fine having per-instance guard nodes (one or more). I don't see much significance in the guard being shared; it also seems quite problematic to accomplish.
On 09/05/14 14:31, Christopher Baines wrote:
How do you see the guards being "scheduled" for replacement?
Two possibilities (there are probably others):
1. Periodically select a new guard by hashing a secret key and the date, similar to the way HS directories are selected. The HS instances use the same secret key and therefore pick the same guard (see the sketch after this list).
2. The HS instances communicate with each other to pick a guard to use in the next period.
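A minimal sketch of option 1, assuming a shared secret and a per-period date string; the derivation and candidate set are illustrative only:

    import hmac, hashlib

    def guard_for_period(secret, period, candidates):
        # every instance holding `secret` derives the same guard for the same period
        def score(fingerprint):
            return hmac.new(secret, period + b"|" + fingerprint, hashlib.sha256).digest()
        return min(candidates, key=score)

    # e.g. guard_for_period(b"shared-hs-secret", b"2014-05", [b"guardA", b"guardB", b"guardC"])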
Another issue is how do you get each instance to connect through the same guard node?
If they agree on which guard to use, what's to stop them connecting to it?
I think that it would be fine having per-instance guard nodes (one or more). I don't see much significance in the guard being shared; it also seems quite problematic to accomplish.
OK cool - but the instances will still have to coordinate in some way to pick IPs, no?
Cheers, Michael
On 06/05/14 22:07, Christopher Baines wrote:
On 06/05/14 15:29, Michael Rogers wrote:
Does this mean that at present, the service builds a new IP circuit (to a new IP?) every time it receives a connection? If so, is it the IP or the service that closes the old circuit?
Not quite. When the service (instance, or instances) selects an introduction point, a circuit to that introduction point is built. This is a long-term circuit, through which RELAY_COMMAND_INTRODUCE2 cells can be sent. This circuit enables the IP to contact the service when a client asks it to do so.
Currently, any IPs will close any existing circuits which are for a common purpose and service.
Thanks for the explanation!
Cheers, Michael