Can you suggest a retry amount and time interval? I think 10 times
once every 20 minutes for the Guards we selected but never connected
to and double or even triple that for the Guards we remember we were
once able to connect to is reasonable.
These values need to be low enough that buggy clients / buggy networks don't DoS guards. Consider scenarios where the client never receives any replies, or never succeeds in the handshake, or never remembers any replies, but is still sending connection requests.
I think 10 connections in 20 minutes is in the right range here.
Also, if we're redesigning the guard code, do we want to take the opportunity to implement exponential random backoff for guard connections?
(Exponential random backoff is a common strategy for avoiding network overload. It is based on the intuition that each successive retry is less likely to succeed, so we should retry after increasing intervals, and randomise the retry time so that clients don't all retry at once.)
We've talked about using exponential backoff for client bootstrap connections to the directory authorities, or perhaps for failed tor connections in general. We've had issues in the past with buggy or obsolete clients retrying connections at a rapid pace, placing significant load on the authorities.
I'm not sure if exponential random backoff would be useful for failed guard connections, but I wanted to raise the idea during the redesign.
What this would look like in practice (a straw-man example):
If we want to connect a maximum of 10 times in 20 minutes, using exponential random backoff, we'd retry after approximately:
1, 2, 4, 8, 16, 32, 64, 128, 256, and 512 seconds.
Then we pick a random time in each interval 0-1, 1-2, 2-4, 4-8, … to actually do the reconnections.
This is a total of 10 connections over 511-1023 seconds, or about 8.5 - 17 minutes, with an average of about 12.8 minutes.
We could tweak the average to 20 minutes by using intervals of:
2, 3, 6, 12, 24, 48, 96, 192, 384, and 768 seconds (average about 19.2 minutes)
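
To make the straw-man concrete, here is a rough Python sketch of how the two schedules above might be computed. It is only an illustration, not a proposal for the actual guard code: the names backoff_delays, total_time_range, SCHEDULE_A and SCHEDULE_B are made up for this example, and it assumes each delay is drawn uniformly between the previous bound and the current one (0-1, 1-2, 2-4, and so on).

```python
import random

# Upper bounds (seconds) of each retry interval; both lists are the
# straw-man schedules from this thread, not values used by tor itself.
SCHEDULE_A = [2 ** k for k in range(10)]            # 1, 2, 4, ..., 512
SCHEDULE_B = [2] + [3 * 2 ** k for k in range(9)]   # 2, 3, 6, ..., 768


def backoff_delays(upper_bounds, rng=random):
    """Draw one delay per retry, uniform between consecutive bounds.

    The first delay is drawn from [0, upper_bounds[0]], the second from
    [upper_bounds[0], upper_bounds[1]], and so on.
    """
    delays = []
    lower = 0
    for upper in upper_bounds:
        delays.append(rng.uniform(lower, upper))
        lower = upper
    return delays


def total_time_range(upper_bounds):
    """Return (minimum, maximum, mean) total seconds over all retries."""
    lowers = [0] + list(upper_bounds[:-1])
    lo = sum(lowers)
    hi = sum(upper_bounds)
    # Each delay is uniform, so the mean total is the midpoint.
    return lo, hi, (lo + hi) / 2


if __name__ == "__main__":
    for name, bounds in (("A", SCHEDULE_A), ("B", SCHEDULE_B)):
        lo, hi, mean = total_time_range(bounds)
        print(f"schedule {name}: {lo}-{hi} s, mean {mean / 60:.2f} min")
        print("  sample delays:", [round(d, 1) for d in backoff_delays(bounds)])
```

Running it prints the minimum, maximum, and mean total time for each schedule, plus one random sample of delays, which makes it easy to try other bounds if we want to tune the average.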
Tim