info@tvdw.eu wrote:
Hi Alec,
Hi Tom! I love your proposal, BTW. :-)
Most of what you said sounds right, and I agree that caching needs TTLs (not just here, all caches need to have them, always).
Thank you!
However, you mention that one DC going down could cause a bad experience for users. In most HA/DR setups I've seen there should be enough capacity if something fails, is that not the case for you? Can a single data center not serve all Tor traffic?
It's not the datacentre which worries me - we already know how to deal with those - it's the failure-based resource contention for the limited introduction-point space that is afforded by a maximum (?) of six descriptors each of which cites 10 introduction points.
A cap of 60 IPs is a clear protocol bottleneck which - even with your excellent idea - could break a service deployment.
Yes, in the meantime the proper solution is to split the service three ways, or even four, but that's administrative burden which less well-resourced organisations might struggle with.
Many (most?) will have a primary site and a single failover site, and it seems perverse that they could bounce just ONE of those sites and automatically lose 50% of their Onion capacity for up to 24 hours UNLESS they also take down the OTHER site for long enough to invalidate the OnionBalance descriptors.
Such is not the description of a high-availability (HA) service, and it might put people off.
If that is a problem, I would suggest adding more data centers to the pool. That way if one fails, you don't lose half of the capacity, but a third (if N=3) or even a tenth (if N=10).
...but you lose it for 1..24 hours, even if you simply reboot the Tor daemon.
Anyway, such a thing is probably off-topic. To get back to the point about TTLs, I just want to note that retrying failed nodes until all fail is scary:
I find that worrying, also. I'm not sure what I think about it yet, though.
what will happen if all ten nodes get a 'rolling restart' throughout the day? Wouldn't you eventually end up with all the traffic on a single node, as it's the only one that hadn't been restarted yet?
Precisely.
As far as I can see the only thing that can avoid holes like that is a TTL, either hard coded to something like an hour, or just specified in the descriptor. Then, if you do a rolling restart, make sure you don't do it all within one TTL length, but at least two or three depending on capacity.
Concur.
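A minimal sketch of that scheduling rule, assuming a hypothetical 1-hour TTL and ten backend nodes (none of these numbers come from the thread):

    # Sketch only: space out a rolling restart so the whole roll spans
    # at least two or three descriptor-TTL periods, per the rule above.
    def restart_interval(ttl_seconds, num_nodes, ttl_multiple=3):
        # Seconds to wait between restarting successive nodes.
        return (ttl_seconds * ttl_multiple) / num_nodes

    # Example: 1-hour TTL, 10 nodes, spread over 3 TTLs
    # -> restart one node roughly every 18 minutes.
    print(restart_interval(3600, 10))  # 1080.0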
desnacked@riseup.net wrote:
Please see rend_client_get_random_intro_impl(). Clients will pick a random intro point from the descriptor which seems to be the proper behavior here.
That looks great!
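For readers skimming the thread, a simplified Python sketch of that client behavior (the real logic is the C function rend_client_get_random_intro_impl(); the descriptor layout below is invented for illustration):

    import random

    # Pick a random intro point from the cached descriptor, skipping any
    # we have already marked as failed. Returning None signals "give up
    # on this descriptor" to the caller.
    def pick_intro_point(descriptor, failed_fingerprints):
        candidates = [ip for ip in descriptor["intro_points"]
                      if ip["fingerprint"] not in failed_fingerprints]
        return random.choice(candidates) if candidates else None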
I can see how a TTL might be useful in high availability scenarios like the one you described. However, it does seem like something with potential security implications (like, set TTL to 1 second for all your descriptors, and now you have your clients keep on making directory circuits to fetch your descs).
Okay, so, how about:
IDEA: if ANY descriptor introduction point connection fails AND the descriptor's ttl has been exceeded THEN refetch the descriptor before trying again?
It strikes me (though I may be wrong?) that the degenerate case for this would be someone with an onion killing their IP in order to force the user to refetch a descriptor - which is what I think would happen anyway?
At very least this proposal would add a work factor.
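Spelled out as a sketch (the cache-entry fields here are hypothetical; nothing like this exists in tor today):

    import time

    # The IDEA above: refetch the descriptor only when an intro-point
    # connection has failed AND the cached copy's TTL has been exceeded;
    # otherwise keep retrying the remaining cached intro points.
    def on_intro_failure(cache_entry, now=None):
        now = time.time() if now is None else now
        expired = now > cache_entry["fetched_at"] + cache_entry["ttl"]
        return "refetch_descriptor" if expired else "retry_other_intro_point"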
For this reason I'd be interested to see this specified in a formal Tor proposal (or even as a patch to prop224). It shouldn't be too big! :)
I would hesitate to add it to Prop 224 which strikes me as rather large and distant. I'd love to see this by Christmas :-P
teor2345@gmail.com wrote:
Do we connect to introduction points in the order they are listed in the descriptor? If so, that's not ideal; there are surely benefits to a random choice (such as load balancing).
Apparently not (re: George) :-)
That said, we believe that rendezvous points are the bottleneck in the rendezvous protocol, not introduction points.
Currently, and in most current deployments, yes.
However, if you were to use proposal #255 to split the introduction and rendezvous to separate tor instances, you would then be limited to:
- 6*10*N tor introduction points, where there are 6 HSDirs, each receiving 10 different introduction points from different tor instances, and N failover instances of this infrastructure competing to post descriptors. (Where N = 1, 2, 3.)
- a virtually unlimited number of tor servers doing the rendezvous and exchanging data (say 1 server per M clients, where M is perhaps 100 or so, but ideally dynamically determined based on load/response time).
In this scenario, you could potentially overload the introduction points.
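For concreteness, the ceiling that formula implies (simple arithmetic on the figures quoted above, nothing more):

    # 6 descriptors (HSDir replicas) x 10 intro points each, times N
    # competing failover instances, as described above.
    for n in (1, 2, 3):
        print(n, 6 * 10 * n)  # 60, 120, 180 introduction points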
Exactly my concern, especially when combined with overlong lifetimes of mostly-zombie descriptors.
- alec
On 22 Oct (16:30:55), Alec Muffett wrote:
I can see how a TTL might be useful in high availability scenarios like the one you described. However, it does seem like something with potential security implications (like, set TTL to 1 second for all your descriptors, and now you have your clients keep on making directory circuits to fetch your descs).
Okay, so, how about:
IDEA: if ANY descriptor introduction point connection fails AND the descriptor's ttl has been exceeded THEN refetch the descriptor before trying again?
It strikes me (though I may be wrong?) that the degenerate case for this would be someone with an onion killing their IP in order to force the user to refetch a descriptor - which is what I think would happen anyway?
At very least this proposal would add a work factor.
Something I also mentioned on IRC about the TTL is that it changes circuit-creation behavior quite a bit.
For instance, if FB's descriptor has a TTL of 2 hours, there will be an HSDir fetch every two hours followed by an IP+RP dance. That seems to make me, a malicious client guard, better able to identify every client going to Facebook, considering that you are the only ones using a TTL of 2 hours.
Let's use your idea of "if one IP fails and the TTL has expired, then re-fetch". This could also make it "easier" to identify people connecting to Facebook. As your client guard, I see you do the fetch + IP/RP dance (3 circuits in a short period of time, two of which are killed). I wait 2 hours and then kill all circuits passing through me from you. If I see that distinctive HS pattern (3 circuits) again, I get closer to knowing that you are accessing FB. (I can do this several more times to confirm.)
All in all, a TTL in the descriptor changes things enough, IMO, to let me learn *which* descriptor a client is using, since as your guard I can induce your client to behave according to its TTL and make you reveal patterns.
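To make that observable pattern concrete, a toy guard-side sketch (circuit events are abstracted into (start_time, lifetime) pairs; the thresholds are invented):

    # After killing a client's circuits, does its next burst of activity
    # look like the distinctive HS pattern described above: 3 circuits in
    # a short window, at least two of them short-lived?
    def looks_like_hs_refetch(events, window=60, short=30):
        if len(events) < 3:
            return False
        start = events[0][0]
        in_window = [e for e in events if e[0] - start <= window]
        return len(in_window) >= 3 and sum(1 for e in in_window if e[1] < short) >= 2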
It seems we need a common behavior for all HS clients here, which would be "if _any_ IP/RP fails, re-fetch", but I think that's going to be quite heavy on the network.
But let's keep thinking about crazy ideas here, like: "the client keeps a circuit to the HSDir until rotation; if an IP/RP dies, it asks whether the descriptor has changed by sending a hash of its current descriptor; if so, it fetches the new one, else it keeps going with its current set of IPs." (With netflow padding, this would be much harder for a malicious guard to recognize.)
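A sketch of that check-by-hash exchange, with made-up message names just to pin the idea down:

    import hashlib

    # Client: ask the HSDir whether the descriptor changed, by sending a
    # digest of the cached copy over the long-lived circuit.
    def client_check(cached_desc_bytes):
        return {"type": "DESC_CHECK",
                "digest": hashlib.sha256(cached_desc_bytes).hexdigest()}

    # HSDir: reply with a fresh descriptor only if the digest no longer
    # matches; otherwise the client keeps its current set of IPs.
    def hsdir_reply(check_msg, current_desc_bytes):
        current = hashlib.sha256(current_desc_bytes).hexdigest()
        if check_msg["digest"] == current:
            return {"type": "DESC_UNCHANGED"}
        return {"type": "DESC_NEW", "descriptor": current_desc_bytes}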
(IMO, this is definitely a problem that we need to solve for load balancing and performance so let's keep throwing ideas until we get to something useful we could use to draft a proposal.)
Cheers! David
Let's use your idea of "if one IP fails and the TTL has expired, then re-fetch". This could also make it "easier" to identify people connecting to Facebook. As your client guard, I see you do the fetch + IP/RP dance (3 circuits in a short period of time, two of which are killed). I wait 2 hours and then kill all circuits passing through me from you. If I see that distinctive HS pattern (3 circuits) again, I get closer to knowing that you are accessing FB.
Would that not happen if and only if, in the meantime, the service had suffered an outage impacting the first IP that the client tries reconnecting to?
Odds on, the client's entry guard will see no measurable change?
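As a toy estimate of those odds (assuming the client retries a uniformly random intro point out of the 10 it has cached, and a refetch only happens when that pick is down):

    # With k of the 10 cached intro points actually down, a random retry
    # hits a dead one -- and so produces the observable refetch pattern --
    # with probability k/10.
    for k in (0, 1, 5):
        print(k, k / 10)  # 0.0, 0.1, 0.5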
-a
On 23 Oct 2015, at 03:30, Alec Muffett alecm@fb.com wrote:
However, you mention that one DC going down could cause a bad experience for users. In most HA/DR setups I've seen there should be enough capacity if something fails, is that not the case for you? Can a single data center not serve all Tor traffic?
It's not the datacentre which worries me - we already know how to deal with those - it's the failure-based resource contention for the limited introduction-point space that is afforded by a maximum (?) of six descriptors each of which cites 10 introduction points.
A cap of 60 IPs is a clear protocol bottleneck which - even with your excellent idea - could break a service deployment.
Let's try a crazier and quite possibly terrible idea: (Consider it a thought experiment rather than a serious technical proposal. Based on my limited understanding, I know I will make mistakes with the details.)
What if a high-volume onion service tries to post descriptors to all of the HSDirs that a client might try, not just the typical 6?
Here's how it might work:
At any point in time, a client may be using any one of the three valid consensuses. (Technically, clients can be using any one of the last 24 to bootstrap, but they update to the latest during bootstrap.)
(Clients which are running constantly will download a new consensus near the end of their current consensus validity period. This might mean that fewer clients are using the latest consensus, for example.)
Therefore, depending on HSDir hashring churn, clients might be trying HSDirs outside the typical 6 (that is, 3 hashring positions, with 2 HSDirs selected side-by-side in each position, specifically to mitigate this very issue).
Also, when the hashring is close to rotating (every 24 hours), Tor will post to both the old and new HSDirs.
What if:
* an onion service posts a different descriptor to each HSDir a client might be querying, based on any valid consensus and any nearby hashring rotation; and
* different introduction points are included in each descriptor.
I can see this generating up to 3 (valid consensuses) x 2 (old and new hashrings near rotation) x 3 (hashring positions) x 2 (HSDirs per position) x 10 (introduction points per descriptor) = 360 introduction points per service.
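Written out, that upper bound is just the product of the factors above:

    # 3 valid consensuses x 2 (old and new hashrings near rotation)
    # x 3 hashring positions x 2 HSDirs per position x 10 intro points.
    print(3 * 2 * 3 * 2 * 10)  # 360 introduction points per service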
Unfortunately, the potential increase in introduction points varies based on the consensus HSDir list churn, and the time of day. These are a poor basis for load-balancing.
Also, if HSDir churn and client clock skew are so bad that clients could be accessing any one of 36 HSDirs, we should have noticed clients which couldn't find any of their HSDirs, and already increased the side-by-side replica count.
So I think it's a terrible idea, but I wonder if we could squeeze another 60 introduction points out of this scheme, or a scheme like it.
Tim
On 23 Oct 2015, at 03:30, Alec Muffett alecm@fb.com wrote:
However, if you were to use proposal #255 to split the introduction and rendezvous to separate tor instances, you would then be limited to:
- 6*10*N tor introduction points, where there are 6 HSDirs, each receiving 10 different introduction points from different tor instances, and N failover instances of this infrastructure competing to post descriptors. (Where N = 1, 2, 3.)
- a virtually unlimited number of tor servers doing the rendezvous and exchanging data (say 1 server per M clients, where M is perhaps 100 or so, but ideally dynamically determined based on load/response time).
In this scenario, you could potentially overload the introduction points.
Exactly my concern, especially when combined with overlong lifetimes of mostly-zombie descriptors.
Hopefully, at this point the onion service operator would inform the directory authority operators. They would then decide on higher values for the HSDir hashring consensus parameters, thus increasing the number of HSDir replicas per onion service.
Of course, this assumes a lot - including that the directory authorities will change, and that no-one has hard-coded the 6 replicas as a constant anywhere in their code. We might want to check this for OnionBalance.
Better to fix the issues at the source, if we can.
Tim