Vasilis:
Hi,
Reopening thread after IRC discussion.
Bottom-posted, instead of more sensibly posting inline.
DaKnOb:
It depends on what you consider “professional” monitoring. Do you mean the information collected, or how it is collected?
By professional monitoring I mean a way to find out, in a short time-span, why a relay has suddenly disconnected from the Tor network, is running an outdated version of tor, is performing badly on the Tor network, is running an outdated OS version, or is missing security updates or other crucial software fixes that may compromise the relay and subsequently the Tor network.
Some important properties of this monitoring system:
- Hardware issues: RAID/HD/hardware failures, kernel panic/OOM states
- Software issues: OS updates, tor updates, security updates
- Network issues: RBLs, IP blocking, upstream network issues
- Abuse issues: monitoring of abuse emails per relay/network; a sort of ticketing system for operators who are unwilling, don't know how, or lack the capacity to track and respond to abuse emails (which most of the time are automated and just need a 'foo' response back)
- Legal issues: initiating a canary-like mechanism for relay operators who would like someone to reach out when they stop providing updates. I suspect this will have many false positives, but better safe than sorry (quite often you are not allowed to speak openly about a legal issue until it is settled; here, potential organizations could reach out to help operators). A rough sketch of such a check follows this list.
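To make the canary idea concrete, here is a minimal sketch of what a checker could look like, assuming the operator publishes a clearsigned, dated statement at a known URL. The URL, the 'Date:' line format, and the signing key are all hypothetical, and date -d is GNU-specific:

    #!/bin/sh
    # Hypothetical canary check: alert if the operator's signed statement
    # is missing, fails verification, or has gone stale.
    URL=https://relay.example.org/canary.txt   # placeholder URL
    MAX_AGE_DAYS=35
    TMP=$(mktemp)
    curl -sf "$URL" -o "$TMP" || { echo "canary fetch failed"; exit 1; }
    gpg --verify "$TMP" 2>/dev/null || echo "canary signature invalid"
    # assumes a line like "Date: 2017-01-01" inside the statement
    stamp=$(grep -m1 '^Date:' "$TMP" | cut -d' ' -f2)
    age=$(( ( $(date +%s) - $(date -d "$stamp" +%s) ) / 86400 ))
    [ "$age" -gt "$MAX_AGE_DAYS" ] && echo "canary stale: $age days old"
    rm -f "$TMP"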
Is measuring something from the tor process using bash scripts and cron professional? Is measuring network traffic using Prometheus and plotting it in Grafana professional?
My "professional point of view" will be a system -preferably agent-less- that could ping operators via email and provide alert notifications on an IRC channel.
For a few nodes I control / controlled I measured lots of network info such as:
- Network Traffic in / out (b/s)
- Network Packets in / out (p/s)
- Network Flows in / out (f/s)
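For what it's worth, on Linux the traffic and packet counters can be sampled straight from /proc/net/dev with a couple of lines of shell. The interface name is a placeholder; flows/s would need conntrack or a flow exporter such as softflowd, which this doesn't cover:

    #!/bin/sh
    # Rough in/out rates from a one-second sample of /proc/net/dev.
    IFACE=eth0   # placeholder interface
    # fields after the colon: rx bytes, rx packets, ..., tx bytes, tx packets
    sample() { awk -v i="$IFACE" 'sub(":"," ") && $1==i {print $2,$3,$10,$11}' /proc/net/dev; }
    set -- $(sample); rxb1=$1 rxp1=$2 txb1=$3 txp1=$4
    sleep 1
    set -- $(sample); rxb2=$1 rxp2=$2 txb2=$3 txp2=$4
    echo "in:  $(( (rxb2 - rxb1) * 8 )) b/s, $(( rxp2 - rxp1 )) p/s"
    echo "out: $(( (txb2 - txb1) * 8 )) b/s, $(( txp2 - txp1 )) p/s"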
And I always run a local resolver, so DNS info too:
- Query Responses / Second
- Query Latency
- SERVFAILs / Second
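If the local resolver is unbound, those three can be read from its counters. A minimal sketch, assuming unbound-control is already configured (other resolvers would need different plumbing):

    #!/bin/sh
    # Pull cumulative DNS counters from a local unbound resolver.
    STATS=$(unbound-control stats_noreset)
    q=$(echo "$STATS" | awk -F= '$1 == "total.num.queries" {print $2}')
    sf=$(echo "$STATS" | awk -F= '$1 == "num.answer.rcode.SERVFAIL" {print $2}')
    lat=$(echo "$STATS" | awk -F= '$1 == "total.recursion.time.avg" {print $2}')
    echo "queries=$q servfail=$sf avg_recursion_time=${lat}s"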
The DNS info was gathered on only one node, as an experiment, and only for a limited amount of time, since I wasn't sure whether it could leak information.
I share the same concerns, so I'm not really interested in measuring DNS responses or collecting long-term stats that may leak sensitive information or potentially be used to de-anonymize or otherwise compromise the Tor network (including in ways we don't know yet).
From a quick read-through, it seems there is a case for different tools instead of a "one size fits all" solution.
I would divide the functions into three categories and therefore three tools:
1. remote checks of server responsiveness
Almost any tool could work here. I'm a long-time fan of sysmon (https://puck.nether.net/sysmon/). It's light, configurable, and very modular, with clean syntax, and it provides a good array of checks with email alerts. Most importantly for the stated purposes, there's no need for a local agent running on the target systems.
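And if sysmon feels like too much for a handful of boxes, even a crude cron job gets you email alerts for free, since cron mails any output to the owner. Host and port below are placeholders:

    #!/bin/sh
    # Agentless reachability check: ICMP plus a TCP probe of the ORPort.
    HOST=relay.example.org   # placeholder relay
    ORPORT=9001              # placeholder ORPort
    ping -c 3 -q "$HOST" >/dev/null 2>&1 || echo "$HOST: no ICMP reply"
    nc -z -w 5 "$HOST" "$ORPORT" 2>/dev/null \
        || echo "$HOST: ORPort $ORPORT unreachable"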
2. local system checks
The BSDs do daily/weekly/monthly emails by default, with RAID health checks and other tasks available or easily added with a little shell scripting.
Or, opting for some shell scripts to check the RAID health, etc., would be fine.
This doesn't scale well, obviously, when you're talking about a daily email per system. But in that case, the shell-script option or configuration management should work. Think about what you want, i.e. "is the raid array dead", and go for it.
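A sketch of what that could look like on a Linux box with md RAID, under the assumption of a Debian-style package manager (the BSDs cover the same ground in their daily scripts):

    #!/bin/sh
    # Daily "is the raid array dead" check, plus a pending-updates count.
    # A failed md device shows up as an underscore in the [UU] status.
    grep -q '\[.*_.*\]' /proc/mdstat 2>/dev/null \
        && echo "md: degraded array detected"
    # simulated upgrade run; counts packages that would be installed
    n=$(apt-get -s upgrade 2>/dev/null | grep -c '^Inst')
    [ "$n" -gt 0 ] && echo "$n packages awaiting upgrade"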
3. remote checks of Tor-related statistics
Off the top of my head, checking how Tor is operating can be done in a couple of ways.
If you want periodic checks of consensus weight, or anything else available through Onionoo (https://metrics.torproject.org/onionoo.html), pulling the JSON and working it into some email output might make sense.
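A minimal sketch of that, using curl and jq against the Onionoo details endpoint (the fingerprint is a placeholder; run it from cron and the output becomes the email):

    #!/bin/sh
    # Poll Onionoo for one relay and complain about anything off-nominal.
    FP=FINGERPRINT   # placeholder: your relay's fingerprint
    JSON=$(curl -s "https://onionoo.torproject.org/details?lookup=$FP")
    running=$(echo "$JSON" | jq -r '.relays[0].running')
    vstat=$(echo "$JSON" | jq -r '.relays[0].version_status')
    cw=$(echo "$JSON" | jq -r '.relays[0].consensus_weight')
    [ "$running" = "true" ]      || echo "$FP not running per consensus"
    [ "$vstat" = "recommended" ] || echo "tor version status: $vstat"
    echo "consensus weight: $cw"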
g