Hey Tim,
just wanted to ask a clarifying question wrt #21969.
First of all, there are various forms of #21969 (aka the "missing
descriptors for some of our primary entry guards" issue). Sometimes it
occurs for 10 mins and then goes away, whereas for other people it
disables their service permanently (until restart). I call this the
hardcore case of #21969. It has happened to me and disabled my service
for days, and I've also seen it happen to other people (e.g. dgoulet).
So, we have found various md-related bugs and filed them as children of
#21969. Do you think we have found the bugs that can cause the hardcore
case of #21969? That is, is any of these bugs (or a bug combo) capable
of permanently disabling an onion service?
It seems to me that all the bugs identified so far can only cause #21969
to occur for a few hours before it self-heals. IIUC, even the
most fundamental bugs like #23862 and #23863 are only temporary, since
eventually one of the dirguards will fetch the missing mds and give them
to the client. Do you think that's the case?
I'm asking you because I plan to spend some serious time next week on
#21969-related issues, and I'd like to prioritize between bug hunting
and bug fixing. That is, if the root cause of the hardcore case of
#21969 is still out there, I'd like to continue bug hunting until I find
it.
Let me know what you think! Perhaps you have other ideas about how we
should approach this issue.
Cheers!! :)
PS: Sending this as an email since our timezones are making it kind of hard
to sync up on IRC.