Re: [tor-dev] Proposal 210: Faster Headless Consensus Bootstrapping

15 Oct 2012


      Thus spake Nick Mathewson (nickm@alum.mit.edu):
...
On Thu, Oct 11, 2012 at 5:32 AM, Mike Perry mikeperry@torproject.org wrote:
...
Title: Faster Headless Consensus Bootstrapping
Author: Mike Perry
Design: Bootstrap Process Changes
The core idea is to attempt to establish bootstrap connections in
 parallel during the bootstrap process, and download the consensus from
 the first connection that completes.
Connection attempts will be done in batches of three. Only one
 connection will be performed to one of the canonical directory
 authorities. Two connections will be performed to randomly chosen hard
 coded directory mirrors.
I misread this paragraph at first.  I thought you were suggesting 3
parallel directory downloads when in fact you were discussing 3
parallel TLS connections, with only the first one that finishes
actually getting a download.
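
To make the reading above concrete, here is a minimal sketch of one batch: three connection attempts start together, and only the first to complete is used for the download. The host names are hypothetical, and connect() is a stand-in that sleeps instead of doing a real TLS handshake; none of this is Tor's actual code.

```python
import asyncio
import random

# Hypothetical host lists, for illustration only.
AUTHORITIES = ["auth-a", "auth-b", "auth-c"]
MIRRORS = ["mirror-1", "mirror-2", "mirror-3", "mirror-4"]


def pick_batch(rng):
    """One authority plus two distinct random mirrors (a batch of three)."""
    return [rng.choice(AUTHORITIES)] + rng.sample(MIRRORS, 2)


async def connect(host, latency):
    """Stand-in for a TLS handshake: sleep for `latency` seconds, return host."""
    await asyncio.sleep(latency)
    return host


async def first_connected(latencies):
    """Start every attempt at once and return the first host that connects."""
    tasks = [asyncio.create_task(connect(h, d)) for h, d in latencies.items()]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # the slower connections are dropped unused
    await asyncio.gather(*pending, return_exceptions=True)
    return next(iter(done)).result()


if __name__ == "__main__":
    batch = pick_batch(random.Random())
    # Simulated handshake times; the authority is made the slowest here.
    latencies = dict(zip(batch, [0.3, 0.01, 0.2]))
    print(asyncio.run(first_connected(latencies)))
```

Only the winning connection would go on to request the consensus, so the authority serves nothing whenever a mirror finishes first.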
...
Design: Fallback Dir Mirror Selection
Out of scope for this proposal; relevant for proposal 206.
Ok. Consider it a vote for your "third option" in proposal 206 then.
Also consider that I wrote this proposal in such a way that it both
depends on 206, and is meant to make it possible to relax our
mirror selection requirements for 206.
I think the parallel connection idea makes us have to worry much less
about vetting the fallback dir mirrors quite so rigorously for
uptime+longevity, in addition to improving bootstrap delay in the event
of dirauth downtime.
...
...
Performance: Additional Load with Current Parameter Choices
This design and the connection count parameters were chosen such that
 no additional bandwidth load would be placed on the directory
 authorities. In fact, the directory authorities should experience less
 load, because they will not need to serve the consensus document for a
 connection in the event that one of the directory mirrors completes its
 connection before the directory authority does.
To be clear, it's the part of this proposal that's shared with
proposal 206 (directory sources) that would lower load on the
authorities.
Yes, this proposal depends upon 206. It doesn't make as much sense to
implement it by itself, I don't think.
...
...
However, the scheme does place additional TLS connection load on the
 fallback dir mirrors. Because bootstrapping is rare and all but one of
 the TLS connections will be very short-lived and unused, this should not
 be a substantial issue.
How do we know that bootstrapping is rare?
I guess it depends on the definition of rare. I meant compared to normal
directory activity.
The lack of a TBB update mechanism probably does make bootstrap more
prevalent than we'd like, I guess.
Also, if idle clients have to bootstrap once they've been idle for more
than 24 hours, then it's probably quite prevalent. I assumed they at
least attempted to keep their consensus fresh, even if they were not
being used. Am I wrong?
...
...
The dangerous case is in the event of a prolonged consensus failure
 that induces all clients to enter into the bootstrap process. In this
 case, the number of initial TLS connections to the fallback dir mirrors
 would be 2*C/100, or 10,000 for C=500,000 users. If no connections
 complete before the five retries, this could reach as high as 50,000
 connection attempts, but this is extremely unlikely to happen in full
 aggregate.
However, in the no-consensus scenario today, the directory authorities
 would already experience C/9 or 55,555 connection attempts. The
 5-retry scheme increases their total maximum load to about 275,000
 connection attempts, but again this is unlikely to be reached
 in aggregate. Additionally, with this scheme, even if the dirauths
 are taken down by this load, the dir mirrors should be able to survive
 it.
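
The figures in the two paragraphs above can be rechecked with a few lines of arithmetic. This is only a sketch: the divisors 100 and 9 are copied from the quoted formulas 2*C/100 and C/9, and the function names are made up for illustration.

```python
def mirror_connections(clients, attempts=1):
    """Connections per the 2*C/100 formula, multiplied by the number of attempts."""
    return 2 * clients // 100 * attempts


def authority_connections(clients, attempts=1):
    """Connections per the C/9 formula, multiplied by the number of attempts."""
    return clients // 9 * attempts


C = 500_000
print(mirror_connections(C))         # 10000
print(mirror_connections(C, 5))      # 50000
print(authority_connections(C))      # 55555
print(authority_connections(C, 5))   # 277775, i.e. "about 275,000"
```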
This looks like an argument of the form "The outcome would be
horrible, but the current outcome is also horrible, so we wouldn't
break stuff any worse."  Right?
Well, more like "the outcome would be slightly less horrible, but also
more resilient to unavailability, and more performant."
I analyzed the extreme case specifically because it allows us to more
easily see the load consequences of the scheme than if we were to get
bogged down by, say, trying to estimate bootstrap frequency in normal
operations. I think that is a distraction.
...
I wonder if in this case the answer isn't to actually back off from
fetching after N minutes or M servers, like a sane system.  Or to
treat "hey, that's not a good consensus!" as different from "couldn't
connect to directory server" in terms of what it means for how we back
off.
I have limits on the number of retries and total concurrent connection
counts in the proposals. We can tweak them.
I thought about putting in a back-off in terms of retry frequency, but
it didn't seem like a clear win over just limiting things in the first
place, because there's already an implicit backoff by virtue of simply
waiting for the TLS connection timeouts to expire once we hit the total
pending connection limit.
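
The implicit backoff can be illustrated with a small simulation of the worst case, where every attempt times out. The pending limit and timeout values below are hypothetical parameters chosen for illustration, not numbers from the proposal.

```python
import heapq


def attempt_start_times(n_attempts, max_pending, timeout):
    """Start times of n_attempts connections that all time out,
    with at most max_pending attempts in flight at any moment."""
    expiries = []  # min-heap of times at which pending attempts expire
    starts = []
    now = 0.0
    for _ in range(n_attempts):
        if len(expiries) >= max_pending:
            # At the limit: wait until the oldest pending attempt times out.
            now = max(now, heapq.heappop(expiries))
        starts.append(now)
        heapq.heappush(expiries, now + timeout)
    return starts


# Five attempts, two pending at most, 30-second timeout.
print(attempt_start_times(5, 2, 30.0))  # [0.0, 0.0, 30.0, 30.0, 60.0]
```

Once the limit is hit, new attempts are spaced by the timeout, which caps the attempt rate without an explicit retry schedule.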
-- 
Mike Perry
