Roger Dingledine:
Here is an area list for a hypothetical future "network health" team. I say hypothetical because we have no near-term plans to form such a team -- but imo the task areas make sense together. Some of them are being handled by other teams right now, but some of them aren't getting as much attention as they need.
(1) track community standards about what makes a good relay
- publish up-to-date expectations for relay operators
- set best practices for how to set relay families
- detect and resolve bad relays
- exitmap, sybil detection, hsdir traps
(2) anomaly analysis / network health engineer [with network team]
- establish baselines of expected network behavior
- look for and resolve denial of service issues
- track connectivity issues between relays
- look for relays hitting resource limits
This area, in particular, could use a lot of attention.
irl@ and I have some relevant tickets, which I can dig up if there's interest.
On the whole, exploring how 'exception reports' would apply to the node network (not to mention other parts such as bw authorities, etc) is a very worthy cause.
We have to get beyond rear-view-mirror assessments: establish baselines for each metric, define normal as a range of standard deviations around them, and correlate significant changes as they happen.
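To make the "baselines within standard deviations" idea concrete, here is a minimal sketch, assuming the metric is something like a daily count of running relays. The function name, the window size, the threshold, and the sample numbers are all made up for illustration; none of this reflects an existing Tor tool or real network data.

```python
from statistics import mean, stdev


def flag_exceptions(samples, window=7, threshold=3.0):
    """Return (index, value, z_score) for each sample that deviates from
    its trailing baseline by more than `threshold` standard deviations.

    The baseline for sample i is the `window` samples immediately before it.
    """
    exceptions = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)  # sample standard deviation of the baseline
        if sigma == 0:
            # A perfectly flat baseline gives no scale to measure against; skip.
            continue
        z = (samples[i] - mu) / sigma
        if abs(z) > threshold:
            exceptions.append((i, samples[i], z))
    return exceptions


# Hypothetical daily relay counts with one sudden drop (illustrative only).
daily_relays = [6500, 6510, 6490, 6505, 6495, 6500, 6508, 6502, 5900, 6498]

for index, value, z in flag_exceptions(daily_relays):
    print(f"day {index}: value {value} is {z:.1f} standard deviations from baseline")
```

Note that one outlier inflates the baseline for the days that follow it, so a real exception report would probably want a more robust baseline (median-based, for instance) and some way to correlate flags across several metrics at once.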
(3) make sure usage/growth stats are collected and accurate
- track network performance, relay diversity by various metrics
- count users [with network team and metrics team]
- monitor bridge growth and usage [with censorship team]
(4) relay advocacy [with community team]
- maintain docs for setting up and running relays and bridges
- grow a cohesive community of relay operators so they have peers
- keep relays on the right tor versions
- relaunch a gamification / badge system for lauding good relay progress
- strengthen relationships with non-profit orgs that run relays
- help companies that want to offset their tor network load
(5) maintain the components of the network
- maintain directory authority relationships
- keep bandwidth authorities working (including setting the right balance between speed and location diversity)
- have enough tor browser default bridges, and keep them running smoothly [with censorship team]
- update the fallbackdirs list
And related to (1) and (5), it might be useful to expand on 'best practices' and documentation for critical services (bw auths, etc).
Certainly OONI's work and wide-angle view could be incorporated in several facets of the above.
The ability to run internet-facing infrastructure isn't innate, and even for those who do have the experience, there's no finish line.
Good stuff, Roger.
g