Hi All,
I have revised proposal #210 - Faster Headless Consensus Bootstrapping today, after a number of discussions with Peter Palfrader, Nick Mathewson, Mike Perry, and others.
This proposal aims to improve tor’s consensus download behaviour when the authorities (or directory mirrors) are down. It has tor initiate multiple concurrent consensus connections, then download the consensus through the first TLS connection that completes.
This proposal is a solution to bug #4483 - If k of n authorities are down, k/n bootstrapping clients are delayed for minutes. It is also needed to implement #15775 - Add IPv4 Fallback Directory List to tor... and #8374 - Ship list of fallback directory mirrors on long-term fixed IPv6 addresses.
The key changes are:
* modify the scheme to perform exponential backoff on connections, rather than connections in batches
* modify the scheme to enable IPv6 bootstrap on IPv6-only clients (see also #17217 - Change clients to automatically use IPv6 if they can bootstrap over it)
* specify a way that clients can still benefit from clock verification via TLS connections with the authorities, without downloading the entire consensus from the authorities if it is available sooner from a mirror
* analyse the expected failure rate and additional connection load imposed by this proposal
The full text is included below, and a branch with these changes is available as bootstrap-exponential-backoff in https://github.com/teor2345/torspec.git
Please feel free to respond here, or on the #4483 ticket at https://trac.torproject.org/projects/tor/ticket/4483
Thanks
Tim
-----
Filename: 210-faster-headless-consensus-bootstrap.txt
Title: Faster Headless Consensus Bootstrapping
Author: Mike Perry, Tim Wilson-Brown, Peter Palfrader
Created: 01-10-2012
Last Modified: 02-10-2015
Status: Open
Target: 0.2.8.x+
Overview and Motivation
This proposal describes a way for clients to fetch the initial consensus more quickly in situations where some or all of the directory authorities are unreachable. This proposal is meant to describe a solution for bug #4483.
Design: Bootstrap Process Changes
The core idea is to attempt to establish bootstrap connections in parallel during the bootstrap process, and download the consensus from the first connection that completes.
Connection attempts will be performed on an exponential backoff basis. Initially, connections will be performed to a randomly chosen hard coded directory mirror and a randomly chosen canonical directory authority. If neither of these connections complete, additional mirror and authority connections are tried. Mirror connections are tried at a faster rate than authority connections.
We specify that mirror connections retry after one second, and then double the retry time with every connection: 0, 1, 2, 4, 8, 16, 32, ...
We specify that directory authority connections retry after 10 seconds, and then double the retry time with every connection: 0, 10, 20, ...
If the client has both an IPv4 and IPv6 address, we try IPv4 and IPv6 mirrors and authorities on the following schedule: IPv4, IPv6, IPv4, IPv6, ...
We try IPv4 first to avoid overloading IPv6-enabled authorities and mirrors. Mirrors and auths get a separate IPv4/IPv6 schedule. This ensures that we try an IPv6 authority within the first 10 seconds. This helps implement #8374 and related tickets.
The maximum retry time for both timers is 3 days + 1 hour. This places a small load on the mirrors and authorities, while allowing a client that regains a network connection to eventually download a consensus.
The retry timers must reset on HUP and any network reachability events, [ TODO: do we have network reachability events? ] so that clients that have unreliable networks can recover from network failures.
The first connection to complete will be used to download the consensus document and the others will be closed, after which bootstrapping will proceed as normal.
A benefit of connecting to directory authorities is that clients are warned if their clock is wrong. Therefore, when closing a directory authority connection, we check to see if we have successfully connected to an authority during this run of the Tor client. If not, we allow the authority TLS connection to complete, then close the connection.
We expect the vast majority of clients to succeed within 4 seconds, after making up to 4 connection attempts to mirrors and 1 connection attempt to an authority. Clients which can't connect in the first 10 seconds will try 1 more mirror, then try to contact another directory authority. We expect almost all clients to succeed within 10 seconds. This is a much better success rate than the current Tor implementation, which fails k/n of clients if k of the n directory authorities are down. (Or, if the connection fails in certain ways, (k/n)^2.)
If at any time the total number of outstanding bootstrap connection attempts exceeds 10, no new connection attempts are to be launched until an existing connection attempt experiences a full timeout. The retry time is not doubled when a connection is skipped.
Design: Fallback Dir Mirror Selection
The set of hard coded directory mirrors from #572 shall be chosen using the 100 Guard nodes with the longest uptime.
The fallback weights will be set using each mirror's fraction of consensus bandwidth out of the total of all 100 mirrors, adjusted to ensure no fallback directory sees more than 10% of clients. We will also exclude fallback directories that are less than 1/1000 of the consensus weight, as they are not large enough to make it worthwhile including them.
This list of fallback dir mirrors should be updated with every major Tor release. In future releases, the number of dir mirrors should be set at 20% of the current Guard nodes (approximately 200 as of October 2015), rather than fixed at 100.
Performance: Additional Load with Current Parameter Choices
This design and the connection count parameters were chosen such that no additional bandwidth load would be placed on the directory authorities. In fact, the directory authorities should experience less load, because they will not need to serve the consensus document for a connection in the event that one of the directory mirrors completes its connection before the directory authority does.
However, the scheme does place additional TLS connection load on the fallback dir mirrors. Because bootstrapping is rare and all but one of the TLS connections will be very short-lived and unused, this should not be a substantial issue.
The dangerous case is in the event of a prolonged consensus failure that induces all clients to enter into the bootstrap process. In this case, the number of TLS connections to the fallback dir mirrors within the first second would be 2*C/100, or 40,000 for C=2,000,000 users. If no connections complete before the 10 retries, 7 of which go to mirrors, this could reach as high as 140,000 connection attempts, but this is extremely unlikely to happen in full aggregate.
However, in the no-consensus scenario today, the directory authorities would already experience 2*C/9 or 444,444 connection attempts. (Tor currently tries 2 authorities, before delaying the next attempt.) The 10-retry scheme, 3 of which go to authorities, increases their total maximum load to about 666,666 connection attempts, but again this is unlikely to be reached in aggregate. Additionally, with this scheme, even if the dirauths are taken down by this load, the dir mirrors should be able to survive it.
Implementation Notes: Code Modifications
The implementation of the bootstrap process is unfortunately mixed in with many types of directory activity.
The process starts in update_consensus_networkstatus_downloads(), which initiates a single directory connection through directory_get_from_dirserver(). Depending on bootstrap state, a single directory server is selected and a connection is eventually made through directory_initiate_command_rend().
There appear to be a few options for altering this code to retry multiple simultaneous connections. Without refactoring, one approach would be to set a connection retry helper function timer in directory_initiate_command_routerstatus() from directory_get_from_dirserver() if the purpose is DIR_PURPOSE_FETCH_CONSENSUS and the only directory servers available are the authorities and the fallback dir mirrors. (That is, there is no valid consensus.) The retry helper function would check the list of pending connections and, if it is 10 or greater, skip the connection attempt, and leave the retry time constant.
The code in directory_initiate_command_rend() would then need to be altered to maintain a list of the dircons created for this purpose as well as avoid immediately queuing the directory_send_command() request for the DIR_PURPOSE_FETCH_CONSENSUS purpose. A flag would need to be set on the dircon to be checked in connection_dir_finished_connecting().
The function connection_dir_finished_connecting() would need to be altered to examine the list of pending dircons, determine if this one is the first to complete, and if so, then call directory_send_command() to download the consensus and close the other pending dircons. connection_dir_finished_connecting() would also cancel the timer.
Reliability Analysis
We make the pessimistic assumptions that 50% of connections to directory mirrors fail, and that 20% of connections to authorities fail. (Actual figures depend on relay churn, age of the fallback list, and authority uptime.)
We expect the first 10 connection retry times to be:
Mirror:  0s  1s  2s  4s  8s  16s  32s
Auth:    0s  10s  20s
Success: 90%  95%  97%  98.7%  99.4%  99.89%  99.94%  99.988%  99.994%
97% of clients succeed in the first 2 seconds. 99.4% of clients succeed without trying a second authority. 99.89% of clients succeed in the first 10 seconds. 0.11% of clients remain, but in this scenario, 2 authorities are down, so the client is most likely blocked from the Tor network.
The current implementation makes 1 or 2 authority connections within the first second, depending on exactly how the first connection fails. Under the 20% authority failure assumption, these clients would have a success rate of either 80% or 96% within a few seconds. The scheme above has a greater success rate in the first few seconds, while spreading the load among a larger number of directory mirrors. In addition, if all the authorities are blocked, current clients will inevitably fail, as they do not have a list of directory mirrors.
Hi All,
I have continued to revise and implement proposal #210: Faster Headless Consensus Bootstrap.
This proposal aims to improve tor’s consensus download behaviour when the authorities (or directory mirrors) are down. It has tor initiate multiple concurrent consensus connections, then download the consensus through the first TLS connection that completes.
The proposal is available inline and attached below. It has been merged to the master branch of torspec as proposals/210-faster-headless-consensus-bootstrap.txt
I have a working implementation for IPv4 that requires:
* the feature15775-fallback-v8 branch from #15775 at https://github.com/teor2345/tor.git
* a list of fallback directories to be placed at src/or/fallback_dirs.inc (a sample list is available attached to #15775)
* the feature4483-v7 branch from #4483 at https://github.com/teor2345/tor.git
The implementation is missing:
* some unit tests,
* fine-tuning of the connection schedules,
* thorough testing.
Please also see my comments on my original revision:
On 3 Oct 2015, at 01:57, Tim Wilson-Brown - teor <teor2345@gmail.com> wrote:
...
This proposal is a solution to bug #4483 - If k of n authorities are down, k/n bootstrapping clients are delayed for minutes. It is also needed to implement #15775 - Add IPv4 Fallback Directory List to tor... and #8374 - Ship list of fallback directory mirrors on long-term fixed IPv6 addresses.
The key changes are:
- modify the scheme to perform exponential backoff on connections, rather than connections in batches
Done using schedules, so the backoff rate can be changed.
- modify the scheme to enable IPv6 bootstrap on IPv6-only clients (see also #17217 - Change clients to automatically use IPv6 if they can bootstrap over it)
IPv6 is out of scope for the initial implementation, see #17281.
- specify a way that clients can still benefit from clock verification via TLS connections with the authorities, without downloading the entire consensus from the authorities if it is available sooner from a mirror
This isn’t possible with the way the code is currently written, as the entire document is downloaded, then decompressed. Instead, try an authority and a fallback, and use whichever connects first. Clients with a reasonable connection to an authority will end up checking their clock whenever the (randomly chosen) authority is faster than the (randomly chosen) fallback.
- analyse the expected failure rate and additional connection load imposed by this proposal
Done!
-----
Filename: 210-faster-headless-consensus-bootstrap.txt
Title: Faster Headless Consensus Bootstrapping
Author: Mike Perry, Tim Wilson-Brown, Peter Palfrader
Created: 01-10-2012
Last Modified: 02-10-2015
Status: Open
Target: 0.2.8.x+
Overview and Motivation
This proposal describes a way for clients to fetch the initial consensus more quickly in situations where some or all of the directory authorities are unreachable. This proposal is meant to describe a solution for bug #4483.
Design: Bootstrap Process Changes
The core idea is to attempt to establish bootstrap connections in parallel during the bootstrap process, and download the consensus from the first connection that completes.
Connection attempts will be performed on an exponential backoff basis. Initially, connections will be performed to a randomly chosen hard coded directory mirror and a randomly chosen canonical directory authority. If neither of these connections complete, additional mirror and authority connections are tried. Mirror connections are tried at a faster rate than authority connections.
Clients represent the majority of the load on the network. They can use directory mirrors to download their documents, as the mirrors download their documents from the authorities early in the consensus validity period.
We specify that client mirror connections retry after one second, and then double the retry time with every connection attempt: 0, 1, 2, 4, 8, 16, 32, ... (The timers currently implemented in Tor increment with every connection failure.)
We specify that client directory authority connections retry after 10 seconds, and then double the retry time with every connection: 0, 10, 20, ...
If a client has both an IPv4 and IPv6 address, it will try IPv4 and IPv6 mirrors and authorities on the following schedule: IPv4, IPv6, IPv4, IPv6, ...
[ TODO: should we add random noise to these scheduled times? - teor
Tor doesn’t add random noise to the current failure-based timers, but as failures are a network event, they are somewhat random/arbitrary already. These attempt-based timers will go off every few seconds, exactly on the second. ]
(Relays can’t use directory mirrors to download their documents, as they *are* the directory mirrors.)
The maximum retry time for all these timers is 3 days + 1 hour. This places a small load on the mirrors and authorities, while allowing a client that regains a network connection to eventually download a consensus.
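For illustration, here is a small self-contained C sketch (not Tor source code) that prints the attempt times produced by these doubling schedules, capped at 3 days + 1 hour; the function and variable names are purely illustrative:

  /* Illustrative sketch only: prints the bootstrap attempt times produced
   * by the doubling schedules above, capped at 3 days + 1 hour. */
  #include <stdio.h>

  #define MAX_RETRY_SECS (3*24*60*60 + 60*60)   /* 3 days + 1 hour */

  static void
  print_schedule(const char *name, long first_retry, int attempts)
  {
    long t = 0;   /* seconds after bootstrapping starts */
    int i;
    printf("%s:", name);
    for (i = 0; i < attempts; i++) {
      printf(" %lds", t);
      /* First retry after `first_retry` seconds, doubling afterwards. */
      t = (t == 0) ? first_retry : t * 2;
      if (t > MAX_RETRY_SECS)
        t = MAX_RETRY_SECS;
    }
    printf("\n");
  }

  int
  main(void)
  {
    print_schedule("Mirror", 1, 8);    /* 0s 1s 2s 4s 8s 16s 32s 64s ... */
    print_schedule("Auth", 10, 4);     /* 0s 10s 20s 40s ... */
    return 0;
  }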
We try IPv4 first to avoid overloading IPv6-enabled authorities and mirrors. Each timing schedule uses a separate IPv4/IPv6 schedule. This ensures that clients try an IPv6 authority within the first 10 seconds. This helps implement #8374 and related tickets.
We don't want to keep on trying an IP version that always fails. Therefore, once sufficient IPv4 and IPv6 connections have been attempted, we select an IP version for new connections based on the ratio of their failure rates, up to a maximum of 1:5. This may not make a substantial difference to consensus downloads, as we only need one successful consensus download to bootstrap. However, it is important for future features like #17217, where clients try to automatically determine if they can use IPv4 or IPv6 to contact the Tor network.
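One possible way to implement this selection rule is sketched below. The 1:5 cap and the idea of alternating until enough attempts have been made come from the text above; the "sufficient attempts" threshold, the helper name, and the bookkeeping are assumptions, not part of the proposal:

  /* Hypothetical sketch: pick an IP version for the next attempt. */
  #include <stdlib.h>

  #define MIN_ATTEMPTS_PER_VERSION 4  /* assumed "sufficient" threshold */
  #define MAX_RATIO 5.0               /* "up to a maximum of 1:5" */

  typedef struct {
    unsigned attempts;
    unsigned failures;
  } ip_version_stats_t;

  /* Returns 4 or 6: the IP version to use for the next connection. */
  static int
  choose_ip_version(const ip_version_stats_t *v4,
                    const ip_version_stats_t *v6,
                    unsigned attempt_index)
  {
    if (v4->attempts < MIN_ATTEMPTS_PER_VERSION ||
        v6->attempts < MIN_ATTEMPTS_PER_VERSION) {
      /* Not enough data yet: alternate, trying IPv4 first. */
      return (attempt_index % 2 == 0) ? 4 : 6;
    }
    double fail4 = (double)v4->failures / v4->attempts;
    double fail6 = (double)v6->failures / v6->attempts;
    /* ratio > 1 means IPv6 fails more often, so prefer IPv4. */
    double ratio;
    if (fail4 <= 0.0 && fail6 <= 0.0)
      ratio = 1.0;                    /* neither fails: no preference */
    else if (fail4 <= 0.0)
      ratio = MAX_RATIO;              /* IPv4 never fails: prefer it */
    else if (fail6 <= 0.0)
      ratio = 1.0 / MAX_RATIO;        /* IPv6 never fails: prefer it */
    else
      ratio = fail6 / fail4;
    if (ratio > MAX_RATIO) ratio = MAX_RATIO;
    if (ratio < 1.0 / MAX_RATIO) ratio = 1.0 / MAX_RATIO;
    /* Choose IPv4 with probability ratio/(ratio+1), so one version is
     * tried at most 5 times as often as the other. */
    double p_ipv4 = ratio / (ratio + 1.0);
    return ((double)rand() / RAND_MAX) < p_ipv4 ? 4 : 6;
  }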
The retry timers and IP version schedules must reset on HUP and any network reachability events, so that clients that have unreliable networks can recover from network failures. [ TODO: Do we do this for any other timers? I think this needs another proposal, it’s out of scope here. - teor ]
The first connection to complete will be used to download the consensus document and the others will be closed, after which bootstrapping will proceed as normal.
We expect the vast majority of clients to succeed within 4 seconds, after making up to 4 connection attempts to mirrors and 1 connection attempt to an authority. Clients which can't connect in the first 10 seconds will try 1 more mirror, then try to contact another directory authority. We expect almost all clients to succeed within 10 seconds. This is a much better success rate than the current Tor implementation, which fails k/n of clients if k of the n directory authorities are down. (Or, if the connection fails in certain ways, it will retry once, failing (k/n)^2.)
If at any time the total number of outstanding bootstrap connection attempts exceeds 10, no new connection attempts are to be launched until an existing connection attempt experiences a full timeout. The retry time is not doubled when a connection is skipped.
A benefit of connecting to directory authorities is that clients are warned if their clock is wrong. Starting the authority and fallback schedules at the same time should ensure that some clients check their clock with an authority at each bootstrap.
Design: Fallback Dir Mirror Selection
The set of hard coded directory mirrors from #572 shall be chosen using the 100 Guard nodes with the longest uptime.
The fallback weights will be set using each mirror's fraction of consensus bandwidth out of the total of all 100 mirrors, adjusted to ensure no fallback directory sees more than 10% of clients. We will also exclude fallback directories that are less than 1/1000 of the consensus weight, as they are not large enough to make it worthwhile including them.
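As a rough illustration only (an assumption about how the adjustment could be done, not the actual selection script), the weights could be computed by dropping tiny mirrors, then repeatedly capping at 10% and renormalising:

  /* Rough illustration: derive fallback weights from consensus bandwidths,
   * excluding relays below 1/1000 of the total and capping any single
   * mirror at 10% of clients. The real script may differ. */
  #define N_FALLBACKS 100
  #define MIN_FRACTION (1.0 / 1000.0)  /* exclude < 1/1000 of total weight */
  #define MAX_FRACTION 0.10            /* no mirror sees more than 10% */

  static void
  compute_fallback_weights(const double bw[N_FALLBACKS],
                           double weight[N_FALLBACKS])
  {
    double total = 0.0;
    int i, pass;
    for (i = 0; i < N_FALLBACKS; i++)
      total += bw[i];

    /* Start with each mirror's fraction of the total consensus bandwidth,
     * dropping mirrors that are too small to be worth including. */
    for (i = 0; i < N_FALLBACKS; i++)
      weight[i] = (bw[i] / total >= MIN_FRACTION) ? bw[i] / total : 0.0;

    /* Cap and renormalise a few times so no mirror exceeds 10%; a handful
     * of passes is enough for this illustration. */
    for (pass = 0; pass < 10; pass++) {
      double sum = 0.0;
      for (i = 0; i < N_FALLBACKS; i++) {
        if (weight[i] > MAX_FRACTION)
          weight[i] = MAX_FRACTION;
        sum += weight[i];
      }
      if (sum <= 0.0)
        return;
      for (i = 0; i < N_FALLBACKS; i++)
        weight[i] /= sum;
    }
  }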
This list of fallback dir mirrors should be updated with every major Tor release. In future releases, the number of dir mirrors should be set at 20% of the current Guard nodes (approximately 200 as of October 2015), rather than fixed at 100.
[TODO: change the script to dynamically calculate an upper limit.]
Performance: Additional Load with Current Parameter Choices
This design and the connection count parameters were chosen such that no additional bandwidth load would be placed on the directory authorities. In fact, the directory authorities should experience less load, because they will not need to serve the entire consensus document for a connection in the event that one of the directory mirrors completes its connection before the directory authority does.
However, the scheme does place additional TLS connection load on the fallback dir mirrors. Because bootstrapping is rare, and all but one of the TLS connections will be very short-lived and unused, this should not be a substantial issue.
The dangerous case is in the event of a prolonged consensus failure that induces all clients to enter into the bootstrap process. In this case, the number of TLS connections to the fallback dir mirrors within the first second would be 2*C/100, or 40,000 for C=2,000,000 users. If no connections complete before the 10 retries, 7 of which go to mirrors, this could reach as high as 140,000 connection attempts, but this is extremely unlikely to happen in full aggregate.
However, in the no-consensus scenario today, the directory authorities would already experience 2*C/9 or 444,444 connection attempts. (Tor currently tries 2 authorities, before delaying the next attempt.) The 10-retry scheme, 3 of which go to authorities, increases their total maximum load to about 666,666 connection attempts, but again this is unlikely to be reached in aggregate. Additionally, with this scheme, even if the dirauths are taken down by this load, the dir mirrors should be able to survive it.
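These figures can be checked with a few lines of C, reading them as per-mirror and per-authority loads and counting the 0s and 1s mirror attempts as falling within the first second (that reading is an interpretation of the text, not stated explicitly above):

  /* Reproduces the per-mirror / per-authority load figures above for
   * C = 2,000,000 bootstrapping clients. */
  #include <stdio.h>

  int
  main(void)
  {
    const double c = 2000000.0;    /* clients bootstrapping at once */
    const double mirrors = 100.0;  /* hard coded fallback dir mirrors */
    const double auths = 9.0;      /* directory authorities */

    /* Two mirror attempts per client in the first second (0s and 1s). */
    printf("per-mirror, first second:  %.0f\n", 2.0 * c / mirrors);
    /* Worst case: 7 of the 10 capped attempts go to mirrors. */
    printf("per-mirror, worst case:    %.0f\n", 7.0 * c / mirrors);
    /* Today: 2 authority attempts per client, over 9 authorities. */
    printf("per-authority, today:      %.0f\n", 2.0 * c / auths);
    /* Proposed worst case: 3 of the 10 attempts go to authorities. */
    printf("per-authority, worst case: %.0f\n", 3.0 * c / auths);
    return 0;
  }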
Implementation Notes: Code Modifications
The implementation of the bootstrap process is unfortunately mixed in with many types of directory activity.
The process starts in update_consensus_networkstatus_downloads(), which initiates a single directory connection through directory_get_from_dirserver(). Depending on bootstrap state, a single directory server is selected and a connection is eventually made through directory_initiate_command_rend().
There appear to be a few options for altering this code to retry multiple simultaneous connections. It looks like we can modify update_consensus_networkstatus_downloads() to make connections more often if the purpose is DIR_PURPOSE_FETCH_CONSENSUS and there is no valid (reasonably live) consensus. We can make multiple connections from update_consensus_networkstatus_downloads(), as the sockets are non-blocking. (These sockets appear to be non-blocking on Unixes (SOCK_NONBLOCK & O_NONBLOCK) and Windows (FIONBIO).) As long as we can tolerate a timer resolution of ~1 second (due to the use of second_elapsed_callback and time_t), this requires no additional timers or callbacks. We can make 1 connection for each schedule per second, for a maximum of 2 per second.
The schedules can be specified in:
TestingClientBootstrapConsensusAuthorityDownloadSchedule
TestingClientBootstrapConsensusFallbackDownloadSchedule
(Similar to the existing TestingClientConsensusDownloadSchedule.)
TestingServerIPVersionPreferenceSchedule (Consisting of a CSV like “4,6,4,6”, or perhaps “0,1,0,1”.)
update_consensus_networkstatus_downloads() checks the list of pending connections and, if it is 10 or greater, skips the connection attempt and leaves the retry time constant.
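A rough sketch of how this per-second check could look is below; the helper names are hypothetical placeholders, not existing Tor functions:

  /* Hypothetical sketch of the per-second bootstrap check described above. */
  #include <time.h>

  #define MAX_PENDING_BOOTSTRAP_CONNS 10

  /* Hypothetical helpers, assumed to exist only for this sketch. */
  int count_pending_consensus_dirconns(void);
  int fallback_schedule_wants_attempt(time_t now);
  int authority_schedule_wants_attempt(time_t now);
  void launch_consensus_attempt_to_random_fallback(void);
  void launch_consensus_attempt_to_random_authority(void);

  /* Called roughly once per second (e.g. from second_elapsed_callback())
   * while we have no reasonably live consensus. */
  static void
  bootstrap_consensus_download_check(time_t now)
  {
    /* With 10 or more attempts outstanding, skip this round; skipping
     * must not advance (double) the retry schedules. */
    if (count_pending_consensus_dirconns() >= MAX_PENDING_BOOTSTRAP_CONNS)
      return;

    /* At most one new connection per schedule per second. */
    if (fallback_schedule_wants_attempt(now))
      launch_consensus_attempt_to_random_fallback();
    if (authority_schedule_wants_attempt(now))
      launch_consensus_attempt_to_random_authority();
  }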
The code in directory_send_command() and connection_finished_connecting() would need to be altered to check that we are not already downloading the consensus. If we’re not, then download the consensus on this connection, and close any other pending consensus dircons.
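Similarly, a hypothetical sketch of the "first connection wins" handling; the type and helpers here are placeholders rather than real Tor symbols:

  /* Hypothetical sketch of the check described above. */
  typedef struct dir_connection_t dir_connection_t;  /* placeholder */

  int already_downloading_consensus(void);
  void send_consensus_request(dir_connection_t *conn);
  void close_other_pending_consensus_dirconns(dir_connection_t *winner);
  void close_dirconn(dir_connection_t *conn);

  /* Called when a bootstrap consensus connection finishes connecting. */
  static void
  bootstrap_dirconn_finished_connecting(dir_connection_t *conn)
  {
    if (already_downloading_consensus()) {
      close_dirconn(conn);  /* another connection won the race */
      return;
    }
    /* This connection completed first: fetch the consensus here and
     * drop the other pending bootstrap connections. */
    send_consensus_request(conn);
    close_other_pending_consensus_dirconns(conn);
  }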
We might also need to make similar changes in authority_certs_fetch_missing(), as we can’t use a consensus until we have enough authority certificates. However, Tor already makes multiple requests (one per certificate), and only needs a majority of certificates to validate a consensus. Therefore, we will only need to modify authority_certs_fetch_missing() if clients download a consensus, then end up getting stuck downloading certificates. (Current tests show bootstrapping working well without any changes to authority certificate fetches.)
Reliability Analysis
We make the pessimistic assumptions that 50% of connections to directory mirrors fail, and that 20% of connections to authorities fail. (Actual figures depend on relay churn, age of the fallback list, and authority uptime.)
We expect the first 10 connection retry times to be:
(Research shows users tend to lose interest after 40 seconds.)
Mirror:  0s  1s  2s  4s  8s  16s  32s
Auth:    0s  10s  20s
Success: 90%  95%  97%  98.7%  99.4%  99.89%  99.94%  99.988%  99.994%
97% of clients succeed in the first 2 seconds. 99.4% of clients succeed without trying a second authority. 99.89% of clients succeed in the first 10 seconds. 0.11% of clients remain, but in this scenario, 2 authorities are unreachable, so the client is most likely blocked from the Tor network. Alternately, they will likely succeed on relaunch.
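Under the assumption that attempts fail independently at the stated rates (50% per mirror attempt, 20% per authority attempt), a short calculation gives figures very close to the ones above; small differences are due to rounding:

  /* Cumulative success over the merged attempt schedule, assuming
   * independent failures: 50% per mirror, 20% per authority attempt. */
  #include <stdio.h>

  int
  main(void)
  {
    /* Merged attempt times: 'M' = mirror, 'A' = authority. */
    const struct { int t; char who; } attempts[] = {
      {0,'M'},{0,'A'},{1,'M'},{2,'M'},{4,'M'},{8,'M'},
      {10,'A'},{16,'M'},{20,'A'},{32,'M'},
    };
    double all_failed = 1.0;  /* probability every attempt so far failed */
    size_t i;

    for (i = 0; i < sizeof(attempts)/sizeof(attempts[0]); i++) {
      all_failed *= (attempts[i].who == 'M') ? 0.5 : 0.2;
      printf("by %2ds (%c): %.4f%% of clients have a consensus\n",
             attempts[i].t, attempts[i].who, 100.0 * (1.0 - all_failed));
    }
    return 0;
  }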
The current implementation makes 1 or 2 authority connections within the first second, depending on exactly how the first connection fails. Under the 20% authority failure assumption, these clients would have a success rate of either 80% or 96% within a few seconds. The scheme above has a greater success rate in the first few seconds, while spreading the load among a larger number of directory mirrors. In addition, if all the authorities are blocked, current clients will inevitably fail, as they do not have a list of directory mirrors.