On 25 Nov (16:53:45), Karsten Loesing wrote:
Hello devs,
the Tor Metrics website [0] claims to be "the primary place to learn interesting facts about the Tor network" and invites its visitors who "come across something that is missing" to contact the website authors about it. That's a bold statement I put there! :)
Yet, there's a considerable product backlog of possible enhancements [1] that never seems to get shorter. Even worse, the backlog can be expected to refill quickly once the community notices that feature requests are suddenly being considered. The main reason for this unfortunate situation is that Tor Metrics contains many moving parts, including some heavy database lifting that takes place below the surface, all of which need to be maintained. Adding more parts just makes the whole thing even more likely to break. At the same time, knowing that Tor Metrics has become almost closed to contributions is painful.
This posting shall discuss possible solutions. The goal is to let Tor Metrics grow in a healthy fashion that encourages contributions from the community. These solutions are not mutually exclusive, and the best solution may use parts of more than one solution sketched out here.
1 Make Tor Metrics better and bigger, internally
The obvious solution is that the maintainers of Tor Metrics could just work harder to overcome the problems stated above. Let's think this through.
1.1 Add more development resources
If only the current Tor Metrics maintainers had more time to devote to cleaning up existing parts and adding new parts, that would solve our problem. They could refactor parts that are hard to maintain, and they could work off the serious backlog that has piled up. Of course, this means dropping or handing over responsibilities for other products, and it may mean finding (and paying) new developers to help maintain Tor Metrics. It's unclear whether anything like this would fit into Tor's budget, and whether these changed priorities would upset users of the tools that had to be dropped or handed over.
1.2 Rewrite internal parts of Tor Metrics to encourage external contributions
Most of Tor Metrics would have run 10 or 15 years ago with only minor modifications. It's not necessarily a bad thing to use established technologies. But maybe, if we rewrite it using modern data-processing, web, and visualization frameworks, it becomes more attractive to other developers to contribute code and help maintain existing (well, then rewritten) code. The result would be a larger Tor Metrics website that is easier to maintain and hopefully maintained by more people. It's unclear how realistic this plan is, though, and it would require attention from the Tor Metrics maintainers to bring the code into good enough shape for external contributors to get involved.
I'm not 100% familiar with the whole process of adding a graph to metrics, but I know a bit about the needed Java code and data source setup. For the graphs I work with (see http://ygzf7uqcusp4ayjs.onion), I decided to go with Munin for two reasons. First, the data sources for those graphs are on different machines (3 for now), and Munin offers a _super_ easy way to have remote nodes: the server just learns what has been deployed on each node, gets the data out of it, and graphs it automatically without any added configuration. Second, I can use whatever language I want to generate those data points. In my case, I use stem extensively with Python.
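For readers who haven't written one, a Munin plugin is just an executable. When called with the argument `config`, it prints `graph_*` directives and a `<field>.label` line per data series. When called without arguments, it prints `<field>.value` lines. The Python sketch below illustrates that protocol under stated assumptions. The field name `circuits` and the `fetch_sample()` data source are hypothetical placeholders; a real plugin might query tor through stem instead. It is not the code behind the graphs mentioned above.

```python
#!/usr/bin/env python3
"""Minimal Munin plugin sketch (hypothetical field and data source)."""
import sys


def fetch_sample():
    # Hypothetical data source: a real plugin might ask a local tor process
    # via stem; a constant keeps this sketch self-contained.
    return {"circuits": 0}


def render(args, sample):
    """Return the text a Munin plugin prints for the given arguments."""
    lines = []
    if args and args[0] == "config":
        # "config" mode: describe how the graph should be drawn.
        lines.append("graph_title Tor circuits (example)")
        lines.append("graph_vlabel circuits")
        lines.append("graph_category tor")
        for field in sample:
            lines.append(f"{field}.label {field}")
    else:
        # Default mode: report the current value of each field.
        for field, value in sample.items():
            lines.append(f"{field}.value {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sys.stdout.write(render(sys.argv[1:], fetch_sample()))
```

Once a script like this is placed in a node's plugin directory (typically `/etc/munin/plugins/`) and munin-node is restarted, the Munin server learns about the new fields from the `config` output. That is what makes the "no added configuration" workflow possible.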
So two things to consider here:
1) An _easy_ way to add and deploy new graphs. By that I mean one that doesn't require half a day from a metrics.tpo maintainer.
2) A way to decouple data source collection from the graphing mechanism. I think metrics is already quite good at that: it pulls CSV from collector.tpo (?), and then some Java/R programs graph it and generate an HTML page. I think Onionoo is a good tool in that direction (as a data source).
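The decoupling in point 2 can be reduced to a file contract. The data-source side writes a CSV with an agreed header, and the graphing side reads only that file. Here is a minimal Python sketch, assuming a hypothetical two-column `date,relays` format. It stops short of plotting. Whatever produced the rows (stem, Onionoo, CollecTor) is invisible to `load_series()`.

```python
import csv

# Hypothetical agreed header shared by data sources and graphs.
FIELDNAMES = ["date", "relays"]


def write_data_source(path, rows):
    """Data-source side: emit a CSV with the agreed header and nothing else."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)


def load_series(path, column):
    """Graphing side: read one numeric column as (date, value) pairs."""
    with open(path, newline="") as f:
        return [(row["date"], float(row[column])) for row in csv.DictReader(f)]
```

With this split, adding a new data source never touches graphing code, and vice versa, as long as both sides keep to the header.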
If we could turn that "Java/R" step into something auto-discovered, like Munin does, or into a very simple one-liner in a config file, or a new script dropped into a directory, it would be amazing. Furthermore, if a super epic graph developer wants to contribute, having a way to run the metrics.tpo framework locally on a dev machine so it's easy to test would be even more epic.
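The "new script in a directory" variant could look roughly like the sketch below. Every `*.py` file in a graphs directory is loaded and registered if it defines a few module-level settings. The directory layout and the `TITLE`/`SOURCE_CSV`/`COLUMN` convention are hypothetical, invented for this example only.

```python
import importlib.util
from pathlib import Path

# Hypothetical settings every graph definition file must provide.
REQUIRED = ("TITLE", "SOURCE_CSV", "COLUMN")


def discover_graphs(directory):
    """Load every *.py file in `directory` as a graph definition.

    Files that don't define all REQUIRED names are skipped instead of
    breaking the whole site.
    """
    graphs = {}
    for path in sorted(Path(directory).glob("*.py")):
        spec = importlib.util.spec_from_file_location(f"graph_{path.stem}", path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # note: this executes the file
        if all(hasattr(module, name) for name in REQUIRED):
            graphs[path.stem] = {name: getattr(module, name) for name in REQUIRED}
    return graphs
```

Loading a file executes it, so this only makes sense for definitions that have been reviewed and merged. Running the same function against a local checkout would also give contributors a way to test before submitting.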
There are plenty of tools nowadays that can help us do that without reinventing all the things. Food for thought :).
2 Add more ways to contribute to Tor Metrics externally
It may be possible to further grow Tor Metrics without adding more code to it, hence without making it any harder to maintain. However, if code to generate visualizations is run elsewhere, there's a certain risk that the results will be perceived as less trustworthy than if that code were run as part of Metrics. This is primarily a problem of setting user expectations right. We could add different ways of contributing to Tor Metrics, depending on the level of commitment that contributors are willing to make. Possible new ways (in addition to filing a Trac ticket, which is already possible, though not very effective) are:
I would always have graphs generated on the metrics.tpo side. The data source for a graph could be a remote machine, but then you end up in the "security/authentication/trust" nightmare :S ...
If the entry bar for new graphs is super low, that is, if it's technically very easy to add a new one (both data source and graph), then someone could submit a new visualization (as a Trac ticket), and the metrics team would review and merge it.
Adding a graph as a "patch" would greatly help avoid more work for the metrics team, but it needs to be easy, documented, and not a complicated framework to run (or at least to test that the graph works for metrics.tpo).
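Here is one rough idea of what "test that the graph works" could mean before a contributed graph is merged. The hypothetical smoke test below checks that the contributed CSV has the expected column and that every value in that column is numeric. The function name and the checks are assumptions for illustration, not an existing Tor Metrics tool.

```python
import csv


def check_contribution(csv_path, column):
    """Hypothetical pre-merge smoke test for a contributed data file.

    Returns a list of problems; an empty list means the file has the
    expected column and every value in it parses as a number.
    """
    problems = []
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames is None or column not in reader.fieldnames:
            return [f"missing column {column!r}"]
        # Line 1 is the header, so data rows start at line 2.
        for lineno, row in enumerate(reader, start=2):
            try:
                float(row[column])
            except (TypeError, ValueError):  # short row (None) or bad text
                problems.append(f"line {lineno}: non-numeric {column}={row[column]!r}")
    return problems
```

A check like this is cheap enough for a contributor to run locally and for a reviewer to rerun on the patch.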
2.1 Accept contribution of static data or static graphs
Somebody might contribute data (in a tarball, download link, etc.) or a static graph (static as in "doesn't break, ever", not "static HTML with a tiny amount of JavaScript that will surely never break"). The Tor Metrics team reviews that and puts it on the Tor Metrics website, together with a short description, author information, license, etc. There are plenty of visualizations on Trac and on the mailing lists, so we'll have to define criteria for what we add and what we don't, and we'll need a good process for making that happen.
+1.
2.2 Link to external websites
Somebody might write a website that visualizes Tor network data. The Tor Metrics team reviews the idea behind it, but does not necessarily look at its code, and adds an external link to Tor Metrics. It's obvious that the authors remain responsible for their visualization, so there's no risk involved for Tor Metrics, but users may not trust it as much, because it doesn't carry the Tor Metrics label. Note that we're already taking this approach by linking to the visualizations showing "Tor users as percentage of larger Internet population" [2] and "Data flow in the Tor network" [3]. Also note that we could just as well have hosted the former directly on Tor Metrics with appropriate attribution, because it's a static image. This is not the case with the latter.
It comes down to trust here, I would say. Like George said in his previous email, we always have the luxury of removing the link if some crazy shit appears after a while, but it could also be a sneaky way to deliver malware to users :).
So I would argue for putting our effort into making metrics contributions so easy that we only need to link to external websites for insane stuff like https://torflow.uncharted.software (which we helped them with).
2.3 Run an externally developed website as if it were part of Tor Metrics
Let's imagine that somebody produces a visualization of Tor network data and would like to make it part of Tor Metrics but without limiting themselves to the technology used by Tor Metrics. We could let them write their visualization as a website and integrate it into Tor Metrics after reviewing its code.
Technically, part of this integration would be to "redress" the website by applying the Tor Metrics design (which has lots of room for improvement, but let's just say the result will look as seamlessly integrated into Tor Metrics as the "Network bubble graphs" [4]). Another part would probably be to rewrite web requests, so that users still think they're talking to https://metrics.torproject.org/, but really they're talking to another webserver behind that.
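The request-rewriting part is what a reverse proxy does. In practice it would most likely be web server configuration (for example Apache's `ProxyPass`/`ProxyPassReverse` directives), but the core mapping fits in a few lines of Python. The path prefixes and internal backend addresses below are hypothetical. The sketch only computes the upstream URL; it does not forward any traffic.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical public path prefixes on metrics.torproject.org, each mapped
# to an internal backend serving an externally developed visualization.
UPSTREAMS = {
    "/viz-a/": "http://127.0.0.1:8081/",
    "/viz-b/": "http://10.0.0.5:8000/app/",
}


def rewrite(public_url):
    """Return the internal URL a reverse proxy would forward to, or None."""
    parts = urlsplit(public_url)
    # Try longer prefixes first so a nested prefix wins over its parent.
    for prefix in sorted(UPSTREAMS, key=len, reverse=True):
        if parts.path.startswith(prefix):
            base = urlsplit(UPSTREAMS[prefix])
            path = base.path + parts.path[len(prefix):]
            # Keep the query string; drop the fragment (browsers never send it).
            return urlunsplit((base.scheme, base.netloc, path, parts.query, ""))
    return None  # not proxied: served by Tor Metrics itself
```

Because users only ever see metrics.torproject.org URLs, whoever controls this mapping effectively controls what appears under the Tor Metrics name. That is the trust question raised in the next paragraph.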
Regarding hosting and maintenance, in theory, the website could be hosted by the original creators, but that effectively means that the Tor Metrics team gives up part of the control over what's on the Tor Metrics website. The creators of the external website could change parts or add new parts that wouldn't be reviewed by Tor Metrics developers but would still be perceived as part of Metrics, which seems bad. Alternatively, the Tor Metrics team could run the externally developed website on a separate host or on the same host as Tor Metrics. We could imagine variants where the original creator stays around to fix any issues as they come up, or where they donate their visualization to the Tor Metrics people, who then maintain it. We could even imagine that the Tor Metrics maintainers some day decide to integrate the originally external website into Tor Metrics proper, but that would not be required for this model to work.
This goes back a bit to the third-party discussion above.
All these ideas require writing down guidelines, criteria, and processes. In particular, they require more thought and input from people who are not currently involved in Tor Metrics maintenance and who can be expected to be more objective. And once these ideas are implemented, we'll need more than just one Tor Metrics maintainer.
I would be very interested in hearing from people who actually use or develop visualization tools nowadays, and in how we could transition to something much better suited to external contributions.
What about also writing a blog post on all of this?
Cheers! David
What are your thoughts?
All the best, Karsten
[0] https://metrics.torproject.org/
[1] https://trac.torproject.org/projects/tor/query?status=!closed&component=...
[2] https://metrics.torproject.org/oxford-anonymous-internet.html
[3] https://metrics.torproject.org/uncharted-data-flow.html
[4] https://metrics.torproject.org/bubbles.html
tor-dev mailing list tor-dev@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev