So I’ve just had a conversation with dgoulet on IRC, which I’ll reformat and subedit here; it concerns OnionBalance and issues in Tor 0.2.6 and 0.2.7 when a recently rebooted hidden service publishes a fresh descriptor:
[…]
alecm: consider OnionBalance which - being a bunch of daemons on a bunch of servers - will be a lot more prone to intermittent failures of 1+ daemons yielding a lot of republishing
alecm: we tend to move services around, and daemons will be killed in one place and resurrected elsewhere, and then we'll have to bundle up a new descriptor and ship it out
dgoulet: hrm so with that new 027 cache behavior, as long as the IPs [introduction points] are usable the descriptor will be kept; if they all become unusable, a new descriptor fetch is triggered and then those IPs will be tried
alecm: There's a mandatory refresh [of the descriptor] after N minutes?
dgoulet: we'll retry 3 times and after that all HSDirs are in timeout for 15 minutes (I think, I'll have to validate) before retrying any HSDirs
alecm: I wonder if descriptors should publish a recommended TTL - [number of seconds to live before refresh]
dgoulet: yeah we have an idea for a "revision-counter" in the descriptor being incremented at each new version within the 24-hour period
dgoulet: a TTL could be useful for load balancing though!
alecm: so, here's a scenario: imagine that we run 10 daemons,
alecm: call these daemons: A B C D E F G H I J - they all have random onion addresses
alecm: we steal one IP from each daemon, and bundle the 10 stolen IPs together to make an onionbalance site descriptor and publish it
alecm: people pull that descriptor, it's quite popular
alecm: we then lose power in a datacentre, which takes out half of our onions - say, A through E
alecm: we reboot the datacentre and restart A-E merely 10 minutes later
alecm: everyone who has already loaded our onionbalance site descriptor tests A B C D E and finds them all dead, because the old IPs for A-E are invalid
alecm: so they all move to F G H I J - which get overloaded even though (new) A B C D E are back up
alecm: and this persists for up to 244, even though the outage was only 10 minutes
alecm: net result: large chunks of the world (anyone with an old descriptor + anyone randomly choosing F-J) have a shitty experience, which is not what high-availability is all about :-)
dgoulet: that will be what's going to happen - having a TTL in the desc. would help here indeed, I see the issue
dgoulet: a TTL would be one thing to add; here we could also add a mechanism for a client to retry IPs that failed while some of the other IPs are still working, or making clients balance themselves randomly could also be an idea
dgoulet: definitely there is some content here for tor-dev - I don't have a good answer but it should definitely be addressed
alecm: proper random selection of IP would be beneficial for load-balancing; not perfect, but in the long run, helpful
— Alec Muffett, Security Infrastructure, Facebook Engineering, London
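(For illustration, a minimal Python sketch of the bundling alecm describes above: take one introduction point from each backend's descriptor and publish them as a single combined descriptor. The names BackendInstance and build_master_descriptor are made up for the example and are not the real OnionBalance API.)

    import random

    class BackendInstance:
        def __init__(self, onion_address, intro_points):
            self.onion_address = onion_address   # e.g. the random .onion for daemon "A"
            self.intro_points = intro_points     # intro points from its current descriptor

    def build_master_descriptor(backends):
        """Take ("steal") one intro point from each backend and bundle them together."""
        chosen = [random.choice(b.intro_points) for b in backends if b.intro_points]
        return {"intro-points": chosen}          # would be signed and published in practice

    # daemons A..J, each contributing one intro point to the public-facing descriptor
    backends = [BackendInstance(chr(c), ["intro-%s-%d" % (chr(c), i) for i in range(3)])
                for c in range(ord("A"), ord("K"))]
    master = build_master_descriptor(backends)
    print(len(master["intro-points"]))           # 10, one per backend

(Whenever a backend restarts and gets new intro points, this combined descriptor has to be rebuilt and republished, which is the "bundle up a new descriptor and ship it out" step above.)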
typo:
alecm: and this persists for up to 24h, even though the outage was only 10 minutes
Also, I neglected to observe that linear polling of A-E seeking a descriptor suggests A will be hammered whilst J is nearly idle.
Some entropy in IP selection would be a good thing.
-a
— Alec Muffett, Security Infrastructure, Facebook Engineering, London
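(A quick sketch of the point about entropy: if every client works through the intro point list in order, the first entry absorbs nearly all of the introductions; choosing uniformly at random spreads them out. This is a toy client model in Python, not Tor's actual selection code.)

    import random
    from collections import Counter

    INTRO_POINTS = list("ABCDEFGHIJ")   # the ten intro points in the published descriptor

    def pick_linear(usable):
        return usable[0]                # always start at the top of the list

    def pick_random(usable):
        return random.choice(usable)    # spread introductions roughly evenly

    def simulate(picker, clients=10000):
        load = Counter()
        for _ in range(clients):
            load[picker(INTRO_POINTS)] += 1
        return load

    print(simulate(pick_linear))        # everything lands on "A"; "J" stays idle
    print(simulate(pick_random))        # roughly 1,000 introductions per intro point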
On 21 Oct 2015, at 10:22, Alec Muffett alecm@fb.com wrote:
typo:
alecm: and this persists for up to 24h, even though the outage was only 10 minutes
Also, I neglected to observe that linear polling of A-E seeking a descriptor suggests A will be hammered whilst J is nearly idle.
Do you mean "seeking an introduction"?
Do we connect to introduction points in the order they are listed in the descriptor? If so, that's not ideal; there are surely benefits to a random choice (such as load balancing).
That said, we believe that rendezvous points are the bottleneck in the rendezvous protocol, not introduction points.
However, if you were to use proposal #255 to split the introduction and rendezvous to separate tor instances, you would then be limited to:
- 6*10*N tor introduction points, where there are 6 HSDirs, each receiving 10 different introduction points from different tor instances, and N failover instances of this infrastructure competing to post descriptors (where N = 1, 2, 3);
- a virtually unlimited number of tor servers doing the rendezvous and exchanging data (say 1 server per M clients, where M is perhaps 100 or so, but ideally dynamically determined based on load/response time).
In this scenario, you could potentially overload the introduction points.
Some entropy in IP selection would be a good thing.
I agree!
Tim
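(To make those limits concrete, a back-of-the-envelope calculation in Python using the numbers above; M = 100 clients per rendezvous instance is the illustrative figure from the message, and the total client count is an assumption for the example.)

    # Rough capacity arithmetic for the proposal #255 split described above.
    HSDIRS = 6             # HSDirs receiving descriptors
    INTROS_PER_DESC = 10   # introduction points per posted descriptor
    N = 3                  # failover instances competing to post descriptors
    M = 100                # assumed clients served per rendezvous tor instance

    max_intro_points = HSDIRS * INTROS_PER_DESC * N   # 6 * 10 * 3 = 180
    clients = 50000                                   # assumed concurrent clients
    rendezvous_servers = -(-clients // M)             # ceiling division: 500 servers

    print(max_intro_points, rendezvous_servers)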
Alec Muffett alecm@fb.com writes:
typo:
alecm: and this persists for up to 24h, even though the outage was only 10 minutes
Also, I neglected to observe that linear polling of A-E seeking a descriptor suggests A will be hammered whilst J is nearly idle.
Some entropy in IP selection would be a good thing.
Please see rend_client_get_random_intro_impl(). Clients will pick a random intro point from the descriptor, which seems to be the proper behavior here.
I can see how a TTL might be useful in high availability scenarios like the one you described. However, it does seem like something with potential security implications (like, set TTL to 1 second for all your descriptors, and now you have your clients keep on making directory circuits to fetch your descs).
For this reason I'd be interested to see this specified in a formal Tor proposal (or even as a patch to prop224). It shouldn't be too big! :)
Cheers!
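(One way to keep a descriptor TTL useful for the reboot scenario while blunting the "set TTL to 1 second" abuse described above is for clients to clamp the advertised value. This is only a sketch of the idea in Python; the ttl field and the bounds below are hypothetical, not anything in the rend-spec or prop224.)

    import time

    # Hypothetical bounds a client might enforce on an advertised descriptor TTL.
    MIN_TTL = 15 * 60        # never refetch more often than every 15 minutes
    MAX_TTL = 24 * 60 * 60   # never trust a cached descriptor for more than 24 hours

    def descriptor_expiry(fetched_at, advertised_ttl):
        """Clamp the service's requested TTL so a value of 1 second cannot make
        clients hammer the HSDirs with directory circuits."""
        ttl = max(MIN_TTL, min(advertised_ttl or MAX_TTL, MAX_TTL))
        return fetched_at + ttl

    def needs_refetch(fetched_at, advertised_ttl, now=None):
        now = time.time() if now is None else now
        return now >= descriptor_expiry(fetched_at, advertised_ttl)

    # A descriptor advertising ttl=1 still lives for 15 minutes in the cache:
    print(needs_refetch(0, 1, now=60))        # False
    print(needs_refetch(0, 1, now=16 * 60))   # True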
On 21 Oct 2015, at 00:18, Alec Muffett alecm@fb.com wrote:
[…]
Hi Alec,
Most of what you said sounds right, and I agree that caching needs TTLs (not just here, all caches need to have them, always).
However, you mention that one DC going down could cause a bad experience for users. In most HA/DR setups I've seen, there should be enough capacity when something fails; is that not the case for you? Can a single data center not serve all of your Tor traffic?
If that is a problem, I would suggest adding more data centers to the pool. That way if one fails, you don't lose half of the capacity, but a third (if N=3) or even a tenth (if N=10).
Anyway, such a thing is probably off-topic. To get back to the point about TTLs, I just want to note that retrying failed nodes until all fail is scary: what will happen if all ten nodes get a 'rolling restart' throughout the day? Wouldn't you eventually end up with all the traffic on a single node, as it's the only one that hasn't been restarted yet?
As far as I can see, the only thing that can avoid holes like that is a TTL, either hard-coded to something like an hour or specified in the descriptor. Then, if you do a rolling restart, make sure you spread it over at least two or three TTL lengths rather than doing it all within one, depending on capacity.
Tom
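(A small sanity check of that rule of thumb, as a Python sketch: spread a rolling restart over several TTL lengths so that a client holding a just-fetched descriptor still reaches most of the backends it lists. The one-hour TTL and ten nodes are assumptions for the example.)

    # Worst case: a client fetched the descriptor just before the restart began,
    # so every node restarted within one TTL of that fetch has new intro points
    # the stale descriptor does not know about.
    TTL = 60 * 60     # assumed descriptor TTL: one hour
    NODES = 10        # backends A..J

    def live_fraction(ttl, nodes, gap):
        restarted_within_ttl = min(nodes, int(ttl // gap) + 1)
        return (nodes - restarted_within_ttl) / nodes

    spread_over_three_ttls = 3 * TTL / NODES
    within_one_ttl = TTL / NODES

    print(live_fraction(TTL, NODES, spread_over_three_ttls))   # 0.6 still reachable
    print(live_fraction(TTL, NODES, within_one_ttl))           # 0.0 -- every entry stale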