Hi All,
I'm working on #28322 to improve the monitoring of Tor Metrics services, but this also has the side effect of monitoring network health. For example, we'd like to know when Onionoo messes up and starts reporting zero relays, but we also get to learn for free in the same check how many relays we have and alert if that number does something weird.
What would be the most useful checks to add here?
* Range of expected total relays
* Range of expected relays with Guard flag
* Range of expected relays with Exit flag
* Range of expected consensus weight in each position
Each one of these is basically an if statement in the script so I'm happy to add these. I can do this by trial and error but if someone has already thought about it then please reply or comment on the ticket.
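As a rough illustration of the "one if statement per check" idea, here is a minimal Python sketch. It assumes the relays arrive as a list of dicts that each carry a "flags" list, which is how I understand Onionoo details documents to look. The function names and every number in PLACEHOLDER_RANGES are made up for the example and are not suggested thresholds.

```python
# PLACEHOLDER values only: these ranges are invented for illustration
# and would need to be replaced with figures derived from real data.
PLACEHOLDER_RANGES = {
    "total": (5000, 8000),
    "Guard": (1500, 4000),
    "Exit": (600, 1500),
}


def count_relays(relays):
    """Count all relays, plus those carrying the Guard or Exit flag.

    `relays` is assumed to be a list of dicts with an optional "flags" list.
    """
    counts = {"total": len(relays), "Guard": 0, "Exit": 0}
    for relay in relays:
        for flag in ("Guard", "Exit"):
            if flag in relay.get("flags", []):
                counts[flag] += 1
    return counts


def check_ranges(counts, ranges=PLACEHOLDER_RANGES):
    """Return one warning string per count that falls outside its range."""
    warnings = []
    for name, (low, high) in ranges.items():
        value = counts[name]
        if not low <= value <= high:
            warnings.append(
                f"{name}: {value} outside expected range [{low}, {high}]"
            )
    return warnings
```

With any lower bound above zero, the "Onionoo reports zero relays" failure shows up as a warning from the same check that watches the relay counts.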
Thanks, Iain.
On 08 May (13:27:31), Iain Learmonth wrote:
- Range of expected total relays
- Range of expected relays with Guard flag
- Range of expected relays with Exit flag
- Range of expected consensus weight in each position
For all of them, what could be reported is if a large fraction disappears all of a sudden.
Losing, for instance, 500 relays at once is something worth our attention imo. Same goes for Exit relays... if we drop from 900 to 500, it is scary.
For the consensus weight, I would report the outliers. Maybe someone is gaming us, so a HUGE value compared to our usual top 10 means something is up.
As for what the good values are, I don't know, but I think you can probably figure out the average number of relays we lose/gain every day and scale that by something like 3 times for a warning?
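The "average daily change, scaled by 3" idea could be sketched roughly like this in Python. It assumes a list of daily relay counts is available from somewhere. The function names are hypothetical, and the factor of 3 is just the starting point suggested above, not a tuned value.

```python
def mean_daily_change(history):
    """Average absolute day-to-day change in a list of daily counts."""
    changes = [abs(b - a) for a, b in zip(history, history[1:])]
    return sum(changes) / len(changes)


def sudden_change(history, latest, factor=3.0):
    """Return True if the step from the last historical count to `latest`
    is larger than `factor` times the average daily change.

    `factor=3.0` is an assumption taken from the suggestion above.
    """
    if len(history) < 2:
        raise ValueError("need at least two historical counts")
    return abs(latest - history[-1]) > factor * mean_daily_change(history)


# Example: Exit relay counts hovering around 900, then a drop to 500.
exit_history = [900, 905, 895, 900]
print(sudden_change(exit_history, 500))  # a drop of 400 relays
print(sudden_change(exit_history, 897))  # an ordinary fluctuation
```

The same function works for total relays or Guard relays. Only the history that is fed in changes.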
Cheers! David
Hi,
On 15 May 2019, at 22:40, David Goulet dgoulet@torproject.org wrote:
As for what the good values are, I don't know, but I think you can probably figure out the average number of relays we lose/gain every day and scale that by something like 3 times for a warning?
Maybe it's also worth checking how many times each rule would trigger in the past year?
If the statistics are normally distributed, you could use 4 standard deviations, so that each rule (falsely) triggers about once a year.
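Both suggestions can be tried out with a few lines of Python. This is only a sketch under stated assumptions: the values are treated as independent and normally distributed, the rule is two-sided, and the function names are invented for the example. How often a rule falsely triggers also depends on how often the check runs, so that is a parameter here.

```python
import math
import statistics


def expected_false_alarms(k_sigma, checks_per_year):
    """Expected false alarms per year for a two-sided k-sigma rule,
    assuming independent, normally distributed values."""
    tail = math.erfc(k_sigma / math.sqrt(2))  # P(|Z| > k_sigma)
    return tail * checks_per_year


def backtest(series, k_sigma=4.0):
    """Count the points in `series` that lie more than `k_sigma` sample
    standard deviations away from the mean of the same series."""
    mean = statistics.mean(series)
    stdev = statistics.stdev(series)
    return sum(1 for x in series if abs(x - mean) > k_sigma * stdev)


# One check per hourly consensus would be 24 * 365 checks a year.
print(expected_false_alarms(4, 24 * 365))
```

Under these assumptions, a 4 standard deviation rule checked hourly falsely triggers on the order of once every couple of years, and much less often if checked daily. Running backtest() over the past year of real counts would show how far the actual data is from that idealised figure.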
T