On Thu, Jan 28, 2021 at 07:00:45PM +0100, lists@for-privacy.net wrote:
> Metrics showed my relay offline. But my Tor daemon is running normally. Then I saw _many_ relays suddenly have flag: staledesc ?
> https://metrics.torproject.org/rs.html#search/flag:staledesc
Yep. That happens because the directory authorities are receiving too many dirport connections from exit relays, and the exit relays also use a dirport connection to post their own descriptor.
So if we don't handle all of the dirport attempts, then we end up not receiving some of the descriptor publish attempts.
I'm thinking that this part will still work out though, for two reasons.
One is that if *any* of the dir auths receive the descriptor, then they will mention it in their next vote, and the other dir auths will learn about it from that vote and ask for a copy.
And two is that relays watch to see if they are still listed in the consensus, and if they're not then they try more often to upload a new descriptor.
So yes, we are making an effort to make sure there is at least one dir auth that will be good at receiving descriptor publishes.
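To illustrate the first reason, here is a minimal Python sketch of how one successful upload can spread to every dir auth through the votes. It is a toy model, not Tor's code: the function name sync_after_votes, the auth names, and the idea that every fetch succeeds are all assumptions made for the example.

```python
def sync_after_votes(held):
    """Toy model: 'held' maps a (hypothetical) dir auth name to the set of
    relay descriptors it currently has. Returns what each auth holds after
    reading the other votes and fetching whatever it was missing."""
    # Each vote mentions the descriptors its author currently holds.
    votes = {auth: set(descs) for auth, descs in held.items()}
    mentioned = set().union(*votes.values())
    result = {}
    for auth, descs in held.items():
        missing = mentioned - descs
        # Assumption: every fetch of a missing descriptor succeeds.
        result[auth] = descs | missing
    return result


# Only auth1 managed to receive relayA's upload.
held = {"auth1": {"relayA"}, "auth2": set(), "auth3": set()}
after = sync_after_votes(held)
```

In this model every auth ends up with relayA's descriptor even though only one of them received the upload directly.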
Some small fraction of relays are expected to get the StaleDesc flag in normal network operation, because of an unfortunate interaction: relays publish a new descriptor "every 18 hours or when something important changes", but dir auths ignore new descriptors that are too close in time or other characteristics to one that they already have.
So for example there is a known bad interaction where you restart your relay, and the relay publishes a new descriptor because it doesn't know that it just published one earlier, but then the dir auths discard that new descriptor because they already have the old one, and then your relay waits 18 hours to create a new one.
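The restart case can be walked through with a small Python sketch. This is a toy timeline, not Tor's implementation: the 18-hour publish interval comes from the text above, while STALE_AFTER_H, COSMETIC_WINDOW_H, and the DirAuth class are assumptions picked only to make the example concrete.

```python
from dataclasses import dataclass
from typing import Optional

PUBLISH_INTERVAL_H = 18.0  # from the text: relays republish "every 18 hours"
STALE_AFTER_H = 18.0       # assumption: descriptor age at which StaleDesc appears
COSMETIC_WINDOW_H = 2.0    # hypothetical "too close in time" threshold


@dataclass
class Descriptor:
    published: float  # hours since the start of the timeline
    content: str      # stand-in for the "other characteristics"


class DirAuth:
    """Toy dir auth that keeps one descriptor for one relay."""

    def __init__(self) -> None:
        self.stored: Optional[Descriptor] = None

    def receive(self, desc: Descriptor) -> bool:
        """Return True if the descriptor is kept, False if it is discarded."""
        old = self.stored
        if (old is not None
                and desc.content == old.content
                and desc.published - old.published < COSMETIC_WINDOW_H):
            return False  # too similar to the one already held
        self.stored = desc
        return True

    def is_stale(self, now: float) -> bool:
        return (self.stored is not None
                and now - self.stored.published > STALE_AFTER_H)


auth = DirAuth()
accepted_first = auth.receive(Descriptor(0.0, "same"))          # initial publish
accepted_after_restart = auth.receive(Descriptor(1.0, "same"))  # restart at hour 1
next_publish = 1.0 + PUBLISH_INTERVAL_H   # relay counts from its own last publish
stale_before_republish = auth.is_stale(18.5)
accepted_scheduled = auth.receive(Descriptor(next_publish, "same"))
stale_after_republish = auth.is_stale(next_publish + 0.1)
```

In this model the dir auth still holds the hour-0 descriptor when the relay restarts, so between hour 18 and hour 19 the relay is running fine but looks stale, until its next scheduled publish is accepted.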
For much more backstory, see
https://gitlab.torproject.org/tpo/core/tor/-/issues/1810
https://gitlab.torproject.org/tpo/core/tor/-/issues/2479
https://gitlab.torproject.org/tpo/core/tor/-/issues/3327
https://gitweb.torproject.org/torspec.git/tree/proposals/293-know-when-to-pu...
But I guess the other way to look at it is: the StaleDesc flag is a *feature*, to let your relay know that it has fallen into this edge case so it can take steps to recover.
> https://metrics.torproject.org/rs.html#details/5D84900DBE6D6365684A9675B81A6...
This relay looks genuinely down.
--Roger