Hi Karsten!
(Metrics and health team CCed)
The network team has been working on a new "MetricsPort" [1] in tor which can expose counters for different metrics within tor. It currently uses the Prometheus data model [2], which allows us to create proper monitoring graphs using tools like Grafana (see some example screenshots in #40063).
The short-term goal here is also to provide Grafana templates for monitoring a relay or onion service, so people can just download them from Grafana's marketplace and be ready to go.
Then that made us think: what if we could have something similar on metrics.torproject.org? A page that we could query, like "/prometheus", that would just give us a set of counters for the current state of the network. A bit like a REST API but less "API-ish".
I do recall having seen a REST API item on the metrics roadmap at one point, but I'm not entirely sure of my memory, which is why I'm asking you about this.
At first, such a page would likely expose nothing different from what metrics has at the moment, _but_ it would allow anyone (most importantly us) to aggregate visualizations in one dashboard using the latest visualization tech (Grafana, for instance).
This kind of page can usually handle thousands of requests a second without blinking, so the load impact should be minimal, since it exposes an already existing state to the world rather than querying a state (as I assume Onionoo does?).
Maybe the solution here could instead be to write an "exporter" that queries Onionoo and formats the result nicely for a Prometheus server, but I do fear the load that it could put on Onionoo if, let's say, A LOT of metrics are queried every 5 seconds or so?
The other thing is that the exporter idea may be better (unclear) if we want to be more agile at integrating other types of metrics, like, let's say, monitoring the consensus as consensus-health does or extracting different data from extra-info.
Thoughts?
Cheers! David
[1] https://gitlab.torproject.org/tpo/core/tor/-/issues/40063
[2] https://prometheus.io/docs/concepts/data_model/
On 2020-10-19 15:09, David Goulet wrote:
Hi Karsten!
Hi David!
(Metrics and health team CCed)
The network team has been working on a new "MetricsPort" [1] in tor which can expose counters for different metrics within tor. It currently uses the Prometheus data model [2], which allows us to create proper monitoring graphs using tools like Grafana (see some example screenshots in #40063).
The short-term goal here is also to provide Grafana templates for monitoring a relay or onion service, so people can just download them from Grafana's marketplace and be ready to go.
Then that made us think: what if we could have something similar on metrics.torproject.org? A page that we could query, like "/prometheus", that would just give us a set of counters for the current state of the network. A bit like a REST API but less "API-ish".
Just to be clear, you're interested in stuff you'd typically learn from Onionoo, not from the Metrics website, right? That is, you don't care about Tor Browser update counts, but rather about relay flags assigned in the latest consensus?
I do recall having seen a REST API item on the metrics roadmap at one point, but I'm not entirely sure of my memory, which is why I'm asking you about this.
At first, such a page would likely expose nothing different from what metrics has at the moment, _but_ it would allow anyone (most importantly us) to aggregate visualizations in one dashboard using the latest visualization tech (Grafana, for instance).
This kind of page can usually handle thousands of requests a second without blinking, so the load impact should be minimal, since it exposes an already existing state to the world rather than querying a state (as I assume Onionoo does?).
Onionoo does pretty well at caching responses by using a set of Varnish caches.
Also, "Clients should make use of the "Last-Modified" header of responses and include that timestamp in a "If-Modified-Since" header of subsequent requests." (https://metrics.torproject.org/onionoo.html#protocol)
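The header handling described in that quote can be sketched roughly as follows. This is only a minimal illustration using Python's standard library: the URL, the `limit` parameter value, and the helper names `build_request` and `fetch_if_modified` are assumptions made for the example, not part of any existing exporter.

```python
import json
import urllib.error
import urllib.request

# Assumed endpoint for illustration; the Onionoo protocol page lists the real ones.
ONIONOO_URL = "https://onionoo.torproject.org/summary?limit=5"


def build_request(url, last_modified=None):
    """Build a GET request, adding If-Modified-Since when a timestamp is known."""
    request = urllib.request.Request(url)
    if last_modified:
        request.add_header("If-Modified-Since", last_modified)
    return request


def fetch_if_modified(url, last_modified=None):
    """Return (document, last_modified); document is None if nothing changed."""
    try:
        request = build_request(url, last_modified)
        with urllib.request.urlopen(request, timeout=30) as response:
            # Remember the server's Last-Modified value for the next request.
            return json.load(response), response.headers.get("Last-Modified")
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: keep using the cached copy
            return None, last_modified
        raise
```

A caller would keep the returned timestamp and pass it back in on the next poll, e.g. `doc, stamp = fetch_if_modified(ONIONOO_URL, stamp)`, and only rebuild its metrics when `doc` is not `None`.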
Maybe the solution here could instead be to write an "exporter" that queries Onionoo and formats the result nicely for a Prometheus server, but I do fear the load that it could put on Onionoo if, let's say, A LOT of metrics are queried every 5 seconds or so?
The other thing is that the exporter idea may be better (unclear) if we want to be more agile at integrating other types of metrics, like, let's say, monitoring the consensus as consensus-health does or extracting different data from extra-info.
Thoughts?
The exporter idea sounds reasonable to me. That would give you the flexibility to change things as you need them, without blocking on a metrics person.
Note that this exporter wouldn't have to query Onionoo every 5 seconds, because Onionoo only gets an update once per hour. Querying every 5 minutes, with an "If-Modified-Since" header, would be better. But even querying every 5 seconds wouldn't kill Onionoo, because it has lots of Varnish friends.
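To make the exporter idea a bit more concrete, here is a rough sketch of the formatting half: it takes an already-fetched Onionoo details document and renders relay counts per flag in the Prometheus text exposition format. The metric name `tor_network_relays_by_flag` and the function `render_flag_gauges` are hypothetical, and the sample document is hand-made rather than real network data.

```python
from collections import Counter


def render_flag_gauges(details_doc):
    """Render relay counts per consensus flag in the Prometheus text format."""
    counts = Counter(
        flag
        for relay in details_doc.get("relays", [])
        for flag in relay.get("flags", [])
    )
    lines = [
        # Hypothetical metric name, chosen only for this example.
        "# HELP tor_network_relays_by_flag Relays holding each consensus flag.",
        "# TYPE tor_network_relays_by_flag gauge",
    ]
    for flag in sorted(counts):
        lines.append(f'tor_network_relays_by_flag{{flag="{flag}"}} {counts[flag]}')
    return "\n".join(lines) + "\n"


# Hand-made sample shaped like an Onionoo details document (not real data).
sample = {
    "relays": [
        {"nickname": "relayA", "flags": ["Fast", "Running"]},
        {"nickname": "relayB", "flags": ["Running"]},
    ]
}
print(render_flag_gauges(sample), end="")
```

Since the underlying data only changes about once per hour, an exporter along these lines could serve the rendered text from memory on every Prometheus scrape and refresh it from Onionoo far less often.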
Hope this helps!
Cheers! David
All the best, Karsten
[1] https://gitlab.torproject.org/tpo/core/tor/-/issues/40063
[2] https://prometheus.io/docs/concepts/data_model/
network-health@lists.torproject.org