Hi Damian and Tom,
Roger discovered that dannenberg did not include any exit flags in certain votes anymore [1].
It would be great if we would detect and notify about such events in the future.
I see two places where this could be added:
DocTor: a new check that alerts when a dir auth either no longer includes certain flags (guard, hsdir, exit, ..) at all or, better, when the number of relays with a certain flag drops significantly (by xx %) from one vote to the next.
consensus-health graphs: we have nice graphs per dirauth and bwauth. If we also had per-dirauth-vote, per-flag graphs (mainly guard, exit, hsdir; we already have running) we could spot such events (and even trends) better. (btw: what caused the recent flat-line in the graphs on 2018-02-03 - 2018-02-05?)
What do you think?
thanks for considering it, nusenu
[1] https://lists.torproject.org/pipermail/tor-relays/2018-February/014480.html
Thanks nusenu! Nice idea, added it to DocTor...
https://gitweb.torproject.org/doctor.git/commit/?id=8945013
It gives a notice if the flag counts issued by an authority differ by more than 50% from the consensus. Presently there's only one instance of that...
[consensus-health] NOTICE: moria1 had 756 HSDir flags in its vote but the consensus had 2583
On Sun, Feb 11, 2018 at 1:21 AM, nusenu nusenu-lists@riseup.net wrote:
Hi Damian and Tom,
Roger discovered that dannenberg did not include any exit flags in certain votes anymore [1].
It would be great if we would detect and notify about such events in the future.
I see two places where this could be added:
DocTor: a new check that alerts when a dir auth either no longer includes certain flags (guard, hsdir, exit, ..) at all or, better, when the number of relays with a certain flag drops significantly (by xx %) from one vote to the next.
consensus-health graphs: we have nice graphs per dirauth and bwauth. If we also had per-dirauth-vote, per-flag graphs (mainly guard, exit, hsdir; we already have running) we could spot such events (and even trends) better. (btw: what caused the recent flat-line in the graphs on 2018-02-03 - 2018-02-05?)
What do you think?
thanks for considering it, nusenu
[1] https://lists.torproject.org/pipermail/tor-relays/2018-February/014480.html
-- https://mastodon.social/@nusenu twitter: @nusenu_
Thanks nusenu! Nice idea, added it to DocTor...
thanks for implementing the new check so fast.
https://gitweb.torproject.org/doctor.git/commit/?id=8945013
It gives a notice if the flag counts issued by an authority differ by more than 50% from the consensus. Presently there's only one instance of that...
This is also very useful but slightly different from what I had in mind, because it would not trigger if dirauths upgrade from A to B in the same hour and most exits, guards or hsdirs are gone due to a bug in version B.
NOTICE: moria1 had 756 HSDir flags in its vote but the consensus had 2583
I tried to find something related to this in the 0.3.3.x changelogs because moria1 (the affected dirauth) is the only one running tor alpha but I didn't find anything related to a change in what is required to earn the HSDir flag. Has there been any change related to how HSDir is assigned that would explain that significant difference?
thanks for implementing the new check so fast.
No problem! Thanks for suggesting it.
This is also very useful but slightly different from what I had in mind, because it would not trigger if dirauths upgrade from A to B in the same hour and most exits, guards or hsdirs are gone due to a bug in version B.
This should catch a bug with B unless every authority upgrades to B in the same hour. Otherwise we'd get an alert - either because the majority is B and the remaining A votes are out of band, or the consensus is made with A and authorities that upgraded to B are different.
Is there another check in particular that you'd like? One gotcha is that checks that require state (such as comparing with the last hour's consensus) are a bit more work.
I tried to find something related to this in the 0.3.3.x changelogs because moria1 (the affected dirauth) is the only one running tor alpha but I didn't find anything related to a change in what is required to earn the HSDir flag. Has there been any change related to how HSDir is assigned that would explain that significant difference?
For what it's worth I started with alarming when authorities differed more than 20% from the consensus but it was a bit noisier...
[consensus-health] NOTICE: longclaw had 3100 HSDir flags in its vote but the consensus had 2583
[consensus-health] NOTICE: moria1 had 756 HSDir flags in its vote but the consensus had 2583
[consensus-health] NOTICE: moria1 had 1397 Guard flags in its vote but the consensus had 1761
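For anyone following along, here is a minimal sketch of how the 20% and 50% thresholds play out on the three notices quoted above. It is not the code from the DocTor commit. The function name `deviates` and the reading of "differs by N%" as relative deviation from the consensus count are assumptions made for illustration.

```python
def deviates(vote_count, consensus_count, threshold=0.5):
    """True if the vote's flag count is off from the consensus count by more
    than `threshold`, expressed as a fraction of the consensus count."""
    if consensus_count == 0:
        # Nothing to scale against; treat any nonzero vote count as a deviation.
        return vote_count != 0
    return abs(vote_count - consensus_count) / consensus_count > threshold


# (authority, flag, count in vote, count in consensus), taken from the notices above.
observed = [
    ("longclaw", "HSDir", 3100, 2583),
    ("moria1", "HSDir", 756, 2583),
    ("moria1", "Guard", 1397, 1761),
]

for authority, flag, vote_count, consensus_count in observed:
    percent = 100.0 * (vote_count - consensus_count) / consensus_count
    print("%s %s: %+.1f%% (20%%: %s, 50%%: %s)" % (
        authority, flag, percent,
        deviates(vote_count, consensus_count, 0.2),
        deviates(vote_count, consensus_count, 0.5)))
```

Under that reading all three lines exceed 20%, but only the moria1 HSDir count exceeds 50%. An authority that stops voting for a flag entirely (the dannenberg case) has a deviation of 100% and would trip either threshold.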
Damian Johnson:
thanks for implementing the new check so fast.
No problem! Thanks for suggesting it.
This is also very useful but slightly different from what I had in mind, because it would not trigger if dirauths upgrade from A to B in the same hour and most exits, guards or hsdirs are gone due to a bug in version B.
This should catch a bug with B unless every authority upgrades to B in the same hour. Otherwise we'd get an alert - either because the majority is B and the remaining A votes are out of band, or the consensus is made with A and authorities that upgraded to B are different.
Is there another check in particular that you'd like?
Yes, but not directly related to this thread. I will file it via trac.tpo.
One gotcha is that checks that require state (such as comparing with the last hour's consensus) are a bit more work.
Yes, that is what I was wondering if DocTor keeps any state at all already.
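To make the "state" point concrete, here is a rough sketch of a vote-to-vote drop check that remembers the previous hour's counts in a JSON file. Nothing here reflects how DocTor is actually structured. The function names, the state file layout, the `max_drop` parameter, and the example counts are all hypothetical.

```python
import json
import os
import tempfile


def load_previous(path):
    """Read the counts saved by the previous run, or {} if there are none."""
    try:
        with open(path) as state_file:
            return json.load(state_file)
    except (OSError, ValueError):
        return {}


def check_drop(path, authority, counts, max_drop=0.5):
    """Compare this vote's flag counts with the last saved ones for the same
    authority, then save the new counts for the next run."""
    state = load_previous(path)
    notices = []
    for flag, old in sorted(state.get(authority, {}).items()):
        new = counts.get(flag, 0)  # a flag missing from the vote counts as zero
        if old > 0 and (old - new) / old > max_drop:
            notices.append("%s went from %i to %i %s flags between votes"
                           % (authority, old, new, flag))
    state[authority] = counts
    with open(path, "w") as state_file:
        json.dump(state, state_file)
    return notices


# Made-up counts, only to show the flow across two consecutive runs.
state_path = os.path.join(tempfile.mkdtemp(), "flag_counts.json")
print(check_drop(state_path, "dannenberg", {"Exit": 900, "Guard": 1700}))
print(check_drop(state_path, "dannenberg", {"Exit": 0, "Guard": 1690}))
```

The first call has no history and reports nothing. The second reports the Exit drop. Unlike the comparison against the consensus, this would also fire if every authority changed behaviour in the same hour.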
I tried to find something related to this in the 0.3.3.x changelogs because moria1 (the affected dirauth) is the only one running tor alpha but I didn't find anything related to a change in what is required to earn the HSDir flag. Has there been any change related to how HSDir is assigned that would explain that significant difference?
For what it's worth I started with alarming when authorities differed more than 20% from the consensus but it was a bit noisier...
[consensus-health] NOTICE: longclaw had 3100 HSDir flags in its vote but the consensus had 2583
[consensus-health] NOTICE: moria1 had 756 HSDir flags in its vote but the consensus had 2583
[consensus-health] NOTICE: moria1 had 1397 Guard flags in its vote but the consensus had 1761
I assume this has not been deployed - 50% or maybe 40% are fine I guess. To come up with good threshold values one would need to look at historic data for the past few months.
I assume this has not been deployed - 50% or maybe 40% are fine I guess. To come up with good threshold values one would need to look at historic data for the past few months.
Nope, it is deployed (if by 'deployed' you mean DocTor is presently performing this check). From David's reply about moria1 it sounds like any check of this sort may be a red herring since they experiment with moria1, but I'll leave that up to you guys. Just let me know what kind of check you want.
On Mon, Feb 12, 2018 at 08:31:13AM -0800, Damian Johnson wrote:
Nope, it is deployed (if by 'deployed' you mean DocTor is presently performing this check). From David's reply about moria1 it sounds like any check of this sort may be a red herring since they experiment with moria1, but I'll leave that up to you guys. Just let me know what kind of check you want.
It might be smartest to just put in an exception for moria1's HSDir votes, since we know it's being different.
--Roger
Roger Dingledine:
It might be smartest to just put in an exception for moria1's HSDir votes, since we know it's being different.
yes, please :)
and it would also be nice to have: https://trac.torproject.org/projects/tor/ticket/25222
so we can filter for those emails that we care about most (or filter those that we do not care about)
It might be smartest to just put in an exception for moria1's HSDir votes, since we know it's being different.
Suppressed any notices for HSDir flags. Also fixed the time based suppression for the check (it should have sent one notice a day rather than one an hour).
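As an illustration only, the two changes described above (skipping HSDir and sending at most one notice a day) could look roughly like the sketch below. This is not the DocTor change itself. `SUPPRESSED_FLAGS`, `should_notify`, and the in-memory `last_sent` dict are hypothetical names chosen for the example.

```python
import time

SUPPRESSED_FLAGS = {"HSDir"}          # flags we never send notices about
SUPPRESSION_SECONDS = 24 * 60 * 60    # at most one notice per day


def should_notify(authority, flag, last_sent, now=None):
    """Decide whether to send a notice for (authority, flag).

    `last_sent` maps (authority, flag) to the time the last notice went out
    and is updated in place when a notice is allowed."""
    now = time.time() if now is None else now
    if flag in SUPPRESSED_FLAGS:
        return False
    key = (authority, flag)
    sent_at = last_sent.get(key)
    if sent_at is not None and now - sent_at < SUPPRESSION_SECONDS:
        return False
    last_sent[key] = now
    return True


last_sent = {}
print(should_notify("moria1", "HSDir", last_sent, now=0))      # suppressed flag
print(should_notify("moria1", "Guard", last_sent, now=0))      # first notice
print(should_notify("moria1", "Guard", last_sent, now=3600))   # an hour later
print(should_notify("moria1", "Guard", last_sent, now=86400))  # a day later
```

Keying the timer on (authority, flag) means one noisy authority does not hide a new problem on another.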
On 11 Feb (21:21:00), nusenu wrote:
Thanks nusenu! Nice idea, added it to DocTor...
thanks for implementing the new check so fast.
https://gitweb.torproject.org/doctor.git/commit/?id=8945013
It gives a notice if the flag counts issued by an authority differ by more than 50% from the consensus. Presently there's only one instance of that...
This is also very useful but slightly different from what I had in mind, because it would not trigger if dirauths upgrade from A to B in the same hour and most exits, guards or hsdirs are gone due to a bug in version B.
NOTICE: moria1 had 756 HSDir flags in its vote but the consensus had 2583
I tried to find something related to this in the 0.3.3.x changelogs because moria1 (the affected dirauth) is the only one running tor alpha but I didn't find anything related to a change in what is required to earn the HSDir flag. Has there been any change related to how HSDir is assigned that would explain that significant difference?
This is because moria1 is running an experimental patch that drastically cuts down the number of relays it votes HSDir for. It has been doing that for a while now. I can't recall the ticket, but this was an attempt a while back to see how bad it would be to vote the HSDir flag only for the most stable relays on the network.
Roger can probably explain it better but all in all there is nothing to worry about, as it is expected.
Actually, you should expect moria1 to behave strangely from time to time since it is often running alpha code :).
Cheers! David
NOTICE: moria1 had 756 HSDir flags in its vote but the consensus had 2583
I tried to find something related to this in the 0.3.3.x changelogs because moria1 (the affected dirauth) is the only one running tor alpha but I didn't find anything related to a change in what is required to earn the HSDir flag. Has there been any change related to how HSDir is assigned that would explain that significant difference?
This is because moria1 is running an experimental patch that drastically cuts down the number of relays it votes HSDir for. It has been doing that for a while now. I can't recall the ticket, but this was an attempt a while back to see how bad it would be to vote the HSDir flag only for the most stable relays on the network.
Roger can probably explain it better but all in all there is nothing to worry about, as it is expected.
Thanks for the explanation!
I tried to find it on trac, I guess this is: https://trac.torproject.org/projects/tor/ticket/19162
On Mon, Feb 12, 2018 at 03:09:00PM +0000, nusenu wrote:
NOTICE: moria1 had 756 HSDir flags in its vote but the consensus had 2583
I tried to find it on trac, I guess this is: https://trac.torproject.org/projects/tor/ticket/19162
Yes, correct. moria1 runs all sorts of experimental patches.
One of them is choosing the HSDir flag for relays based on:
+ hsdir_tk = find_nth_long(tks, n_active, n_active*3/4);
+ hsdir_bandwidth = find_nth_uint32(bandwidths_kb, n_active, n_active/4);
That is, the relay needs to be in the top quarter of the relays by time-known, and in the top three-quarters of the relays by bandwidth weights (as decided by moria1's bwauth).
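A small Python sketch of those two cutoffs may help. It assumes `find_nth_*` returns the element at index `nth` of the array in ascending order, and that a relay qualifies when it is at or above both cutoffs. The patch excerpt does not show the comparison itself, so that part is an assumption, and the eight relays below are made up.

```python
def find_nth(values, nth):
    """Element at index `nth` once the values are sorted in ascending order."""
    return sorted(values)[nth]


def hsdir_thresholds(time_known, bandwidths_kb):
    """Mirror the two lines of the patch: time-known cutoff at the 3/4 mark,
    bandwidth cutoff at the 1/4 mark."""
    n_active = len(time_known)
    hsdir_tk = find_nth(time_known, n_active * 3 // 4)
    hsdir_bandwidth = find_nth(bandwidths_kb, n_active // 4)
    return hsdir_tk, hsdir_bandwidth


# Eight hypothetical relays; index i in both lists is the same relay.
time_known = [1, 2, 3, 4, 5, 6, 7, 8]
bandwidths_kb = [10, 20, 30, 40, 50, 60, 70, 80]

hsdir_tk, hsdir_bandwidth = hsdir_thresholds(time_known, bandwidths_kb)
eligible = [i for i in range(len(time_known))
            if time_known[i] >= hsdir_tk and bandwidths_kb[i] >= hsdir_bandwidth]
print(hsdir_tk, hsdir_bandwidth, len(eligible))
```

With these numbers the time-known cutoff is the binding one, so only about a quarter of the relays qualify, which fits moria1 voting far fewer HSDir flags than the consensus.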
I think the time-known idea is a potentially really smart one, since if we do it right we force attacking hsdir relays to be in the network for a long time before they are allowed to become hsdirs.
--Roger
I think the doctor notification is the best mechanism.
I'm not opposed to adding more graphs to consensus-health, but I think I'd want to coordinate with the metrics team. There was talk about them absorbing consensus-health in some capacity, so I'd prefer to avoid doing a lot of work on graphs if it's going to be redone or thrown away.
The host running depictor was down for several days, which explains the gap in data.
Thanks for the thoughts!
-tom