On 23 Oct 2015, at 03:30, Alec Muffett <alecm@fb.com> wrote:
>> However, you mention that one DC going down could cause a bad experience for users. In most HA/DR setups I've seen there should be enough capacity if something fails, is that not the case for you? Can a single data center not serve all Tor traffic?
>
> It's not the datacentre which worries me - we already know how to deal with those - it's the failure-based resource contention for the limited introduction-point space that is afforded by a maximum (?) of six descriptors each of which cites 10 introduction points.
>
> A cap of 60 IPs is a clear protocol bottleneck which - even with your excellent idea - could break a service deployment.
Let's try a crazier and quite possibly terrible idea: (Consider it a thought experiment rather than a serious technical proposal. Based on my limited understanding, I know I will make mistakes with the details.)
What if a high-volume onion service tries to post descriptors to all of the HSDirs that a client might try, not just the typical 6?
Here's how it might work:
At any point in time, a client may be using any one of the three valid consensuses. (Technically, clients can be using any one of the last 24 to bootstrap, but they update to the latest during bootstrap.)
(Clients which are running constantly will download a new consensus near the end of their current consensus validity period. This might mean that fewer clients are using the latest consensus, for example.)
Therefore, depending on HSDir hashring churn, clients might be trying HSDirs outside the typical 6 (that is, 3 hashring positions, with 2 HSDirs selected side-by-side in each position, specifically to mitigate this very issue).
Also, when the hashring is close to rotating (every 24 hours), Tor will post to both the old and new HSDirs.
What if:
* an onion service posts a different descriptor to each HSDir a client might be querying, based on any valid consensus and any nearby hashring rotation; and
* different introduction points are included in each descriptor.
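A minimal sketch of the "different introduction points in each descriptor" part, assuming the service already has a list of candidate HSDirs and a large enough pool of introduction points. The function name and the string identifiers are hypothetical and are not taken from the Tor source:

```python
def assign_intro_points(hsdirs, intro_points, per_descriptor=10):
    """Map each HSDir to its own disjoint slice of introduction points.

    per_descriptor=10 matches the per-descriptor limit quoted earlier in
    this thread; everything else here is illustrative.
    """
    needed = len(hsdirs) * per_descriptor
    if len(intro_points) < needed:
        raise ValueError(f"need {needed} introduction points, got {len(intro_points)}")
    return {
        hsdir: intro_points[i * per_descriptor:(i + 1) * per_descriptor]
        for i, hsdir in enumerate(hsdirs)
    }


# The typical case: 6 HSDirs, 60 distinct introduction points.
hsdirs = [f"hsdir{i}" for i in range(6)]
intro_points = [f"ip{i}" for i in range(60)]
descriptors = assign_intro_points(hsdirs, intro_points)
```

Each client would only ever see the 10 introduction points in the descriptor it happened to fetch, so the spread of load depends entirely on which HSDir each client queries.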
I can see this generating up to 3 (consensuses) x 2 (hashring rotations, old and new) x 3 (hashring positions) x 2 (hashring replicas) x 10 (introduction points per descriptor) = 360 introduction points per service.
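To double-check that upper bound, here is the arithmetic as a short script. The factor names are my own labels for the terms in the estimate, not Tor constants, and the second factor is read as the old and new hashring around the 24-hour rotation:

```python
from math import prod

# Labels are illustrative; the values come from the estimate above.
hsdir_factors = {
    "valid_consensuses": 3,
    "hashring_rotations": 2,  # assumption: old and new hashring near rotation
    "hashring_positions": 3,
    "hashring_replicas": 2,
}
intro_points_per_descriptor = 10

hsdir_slots = prod(hsdir_factors.values())
max_intro_points = hsdir_slots * intro_points_per_descriptor

print(hsdir_slots)       # 36
print(max_intro_points)  # 360
```

This is a best case: it only holds if every combination of consensus and rotation maps to a different HSDir, which is rarely true in practice.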
Unfortunately, the potential increase in introduction points varies based on the consensus HSDir list churn, and the time of day. These are a poor basis for load-balancing.
Also, if HSDir churn and client clock skew are so bad that clients could be accessing any one of 36 HSDirs, we should have noticed clients which couldn't find any of their HSDirs, and already increased the side-by-side replica count.
So I think it's a terrible idea, but I wonder if we could squeeze another 60 introduction points out of this scheme, or a scheme like it.
Tim