Re: [tor-dev] Scaling Tor Metrics

26 Nov 2015


On 25 Nov (16:53:45), Karsten Loesing wrote:
...
Hello devs,
the Tor Metrics website [0] claims to be "the primary place to learn
interesting facts about the Tor network" and invites its visitors who
"come across something that is missing" to contact the website authors
about it.  That's a bold statement I put there! :)
Yet, there's considerable product backlog with possible enhancements
[1] that doesn't seem to ever become shorter.  Even worse, it can be
expected that the backlog will refill quickly once the community
notices that feature requests are suddenly considered.  The main
reason for this unfortunate situation is that Tor Metrics contains
many moving parts, including some heavy database lifting that takes
place below the surface, all of which need to be maintained.  Adding more
parts just makes the whole thing even more likely to break.  At the
same time, knowing that Tor Metrics has become almost closed to
contributions is painful.
This posting shall discuss possible solutions.  The goal is to let Tor
Metrics grow in a healthy fashion that encourages contributions from
the community.  These solutions are not mutually exclusive, and the
best solution may use parts of more than one solution sketched out here.
1 Make Tor Metrics better and bigger, internally
The obvious solution is that the maintainers of Tor Metrics could just
work harder to overcome the problems stated above.  Let's think this
through.
1.1 Add more development resources
If only the current Tor Metrics maintainers had more time to devote to
cleaning up existing parts and adding new parts, that would solve our
problem.  They could refactor parts that are hard to maintain, and
they could work off the serious backlog that has piled up.  Of course,
this means dropping or handing over responsibilities for other
products, and it may mean finding (and paying) new developers to help
maintain Tor Metrics.  It's unclear whether anything like this would
fit into Tor's budget, and whether these changed priorities would make
users of tools that had to be dropped or handed over unhappy.
1.2 Rewrite internal parts of Tor Metrics to encourage external contributions
Most of Tor Metrics would have run 10 or 15 years ago with only minor
modifications.  It's not necessarily a bad thing to use established
technologies.  But maybe, if we rewrite it using modern
data-processing, web, and visualization frameworks, it becomes more
attractive to other developers to contribute code and help maintain
existing (well, then rewritten) code.  The result would be a larger
Tor Metrics website that is easier to maintain and hopefully
maintained by more people.  It's unclear how realistic this plan is,
though, and it requires attention from Tor Metrics maintainers to bring
it into good enough shape for external contributors to get involved.
I'm not 100% familiar with the whole process of adding a graph to metrics, but
I know a bit about the needed Java code and data source setup. For the graphs
I work with (see http://ygzf7uqcusp4ayjs.onion), I decided to go with Munin
for two reasons. First, the data sources for those graphs are on different
machines (3 for now), and Munin offers a _super_ easy way to have remote nodes:
the server just learns what has been deployed, gets the data out of them, and
graphs it automatically without any added configuration. Second, I can use
whatever language I want to generate those data points. In my case, I use stem
extensively with Python.
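For anyone who hasn't written one: a Munin plugin is just an executable that prints its graph description when run with the `config` argument and prints current values when run with no argument, which is why any language works. A minimal sketch follows; the graph name and the `count_circuits()` helper are hypothetical placeholders (a real plugin might get the number from a local tor via stem), not the plugins behind the graphs linked above.

```python
#!/usr/bin/env python3
"""Minimal Munin plugin sketch (hypothetical, for illustration only)."""
import sys


def count_circuits():
    # Hypothetical placeholder: a real plugin might ask a local tor
    # process for this number via stem. Here it is a fixed value.
    return 42


def config_lines():
    # Printed when Munin runs "<plugin> config": describes the graph.
    return [
        "graph_title Tor circuits (example)",
        "graph_vlabel circuits",
        "graph_category tor",
        "circuits.label circuits",
    ]


def fetch_lines():
    # Printed when Munin runs "<plugin>" with no argument: current values.
    return ["circuits.value %d" % count_circuits()]


def main(argv):
    lines = config_lines() if argv[1:2] == ["config"] else fetch_lines()
    print("\n".join(lines))


if __name__ == "__main__":
    main(sys.argv)
```

Once a script like this is enabled on a munin-node, the Munin server discovers it and draws the graph without any server-side configuration, which is the property described above.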
So, two things to consider here:
1) An _easy_ way to add and deploy new graphs. By that I mean not requiring half
a day from a metrics.tpo maintainer.
2) A way to decouple data source collection from the graphing mechanism. I
think metrics is already quite good at that: it pulls CSV from collector.tpo (?)
and then some Java/R programs graph it and generate an HTML page. I think
Onionoo is a good tool in that direction (data source).
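To make point 2 concrete: if the only contract between the two halves is a CSV file, either side can be replaced without touching the other. The sketch below is a hypothetical, stdlib-only illustration (the `date`/`relays` columns and the bare SVG output are assumptions), not how the existing Java/R pipeline works.

```python
import csv
import io


def write_csv(rows):
    """Data-source side: emit (date, value) rows as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["date", "relays"])  # hypothetical column names
    writer.writerows(rows)
    return buf.getvalue()


def read_series(csv_text, column):
    """Graphing side: parse CSV text and return (dates, values) for a column."""
    dates, values = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        dates.append(row["date"])
        values.append(float(row[column]))
    return dates, values


def render_svg(values, width=300, height=100):
    """Draw values as a bare SVG polyline (no axes; illustration only)."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero on a flat series
    step = width / max(len(values) - 1, 1)
    points = " ".join(
        "%.1f,%.1f" % (i * step, height - (v - lo) / span * height)
        for i, v in enumerate(values)
    )
    return ('<svg xmlns="http://www.w3.org/2000/svg" width="%d" height="%d">'
            '<polyline fill="none" stroke="black" points="%s"/></svg>'
            % (width, height, points))
```

Usage: `render_svg(read_series(write_csv(rows), "relays")[1])`, where the first call could just as well run on a different machine than the last two.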
If we can turn that "Java/R" step into something auto-discovered like Munin
does, or a very simple one-liner in a config file, or a new script in a
directory, it would be amazing. Furthermore, if a super epic graph developer
wants to contribute, having a way to run the metrics.tpo framework locally on a
dev machine so it's easy to test would be even more epic.
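Both discovery styles are cheap to implement. A hypothetical sketch: the one-liner format `<id> <source> <column> <title...>` and the "every executable file is a graph" rule are made up for illustration and are not an existing metrics.tpo convention.

```python
import os
from dataclasses import dataclass


@dataclass
class GraphSpec:
    graph_id: str
    source: str   # path or URL of the CSV data source
    column: str   # CSV column to plot
    title: str


def parse_graph_config(text):
    """Parse hypothetical one-liners: '<id> <source> <column> <title...>'.

    Raises ValueError on a line with fewer than four fields.
    """
    specs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        graph_id, source, column, title = line.split(maxsplit=3)
        specs.append(GraphSpec(graph_id, source, column, title))
    return specs


def discover_scripts(directory):
    """Munin-style discovery: every executable file in the directory is a graph."""
    found = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and os.access(path, os.X_OK):
            found.append(path)
    return found
```

Adding a graph then means adding one line (or dropping one script in place), which is the "not half a day from a maintainer" bar from point 1.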
There are plenty of tools nowadays that can help us do that without
reinventing all the things. Food for thought :).
...
2 Add more ways to contribute to Tor Metrics externally
It may be possible to further grow Tor Metrics without adding more
code to it, hence not making it any harder to maintain.  However, if
code to generate visualizations is run elsewhere, there's a certain
risk that results are not perceived as trustworthy as if that code
were run as part of Metrics.  This is primarily a problem of setting
user expectations right.  We could add different ways for contributing
to Tor Metrics, depending on the level of commitment that contributors
are willing to make.  Possible new ways (in addition to filing a Trac
ticket, which is already possible, though not very effective) are:
I would always have graphs generated on the metrics.tpo side. The data source
for a graph could be a remote machine, though, but then you end up in the
"security/authentication/trust" nightmare :S ...
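One narrow piece of that nightmare, checking that data pushed by a remote machine comes from a known source and wasn't altered in transit, could be handled with a shared-key MAC. A hypothetical sketch using HMAC-SHA256 from Python's standard library; key distribution, and whether the remote data is actually *correct*, are left out and remain the hard part.

```python
import hashlib
import hmac


def sign_payload(key, payload):
    """Remote data-source side: compute an HMAC-SHA256 tag over the data."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_payload(key, payload, tag):
    """metrics.tpo side: accept data only if the tag matches (constant-time compare)."""
    expected = sign_payload(key, payload)
    return hmac.compare_digest(expected, tag)
```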
If the entry bar for new graphs is super low, that is, it's technically very easy
to add a new one (both data source and graph), then someone could submit a new
visualization (Trac ticket), and the metrics team reviews and merges it.
Adding a graph as a "patch" would greatly help avoid more work for the metrics
team, but it needs to be easy, documented, and not require a complicated
framework to run (or at least to test that the graph works for metrics.tpo).
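As an example of the kind of local check a contributor could run before submitting such a patch, here is a hypothetical validator for a graph's CSV data. The expected format (a `date` column in YYYY-MM-DD plus numeric value columns) is an assumption for illustration, not a documented metrics.tpo requirement.

```python
import csv
import io
from datetime import date


def validate_graph_csv(csv_text, value_columns):
    """Return a list of problems; an empty list means the CSV looks usable.

    Assumes (hypothetically) a 'date' column in YYYY-MM-DD format plus one
    or more numeric value columns.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [c for c in ["date"] + list(value_columns) if c not in header]
    if missing:
        return ["missing columns: " + ", ".join(missing)]
    problems = []
    for n, row in enumerate(reader, start=1):
        try:
            date.fromisoformat(row["date"])
        except (TypeError, ValueError):  # TypeError: short row, field is None
            problems.append("data row %d: bad date %r" % (n, row["date"]))
        for col in value_columns:
            try:
                float(row[col])
            except (TypeError, ValueError):
                problems.append("data row %d: non-numeric %s %r" % (n, col, row[col]))
    return problems
```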
...
2.1 Accept contribution of static data or static graphs
Somebody might contribute data (in a tarball, download link, etc.) or
a static graph (static as in "doesn't break, ever", not "static HTML
with a tiny amount of JavaScript that will surely never break").  The
Tor Metrics team reviews that and puts it on the Tor Metrics website,
together with a short description, author information, license, etc.
There are plenty of visualizations on Trac and on the mailing lists,
so we'll have to define criteria for what we add and what we don't, and
we'll need a good process for making that happen.
+1.
...
2.2 Link to external websites
Somebody might write a website that visualizes Tor network data.  The
Tor Metrics team reviews the idea behind it, but doesn't necessarily look
at its code, and adds an external link to Tor Metrics.  It is made
obvious that the authors remain responsible for their visualization,
so there's no risk involved for Tor Metrics, but users may not trust
it as much, because it doesn't have the Tor Metrics label.  Note that
we're already taking this approach by linking to the visualizations
showing "Tor users as percentage of larger Internet population" [2]
and "Data flow in the Tor network" [3].  Also note that we could just as
well have hosted the former directly on Tor Metrics with appropriate
attribution, because it's a static image.  This is not the case with
the latter.
It comes down to trust here, I would say. Like George said in his previous
email, we always have the luxury of removing the link if some crazy shit
appears after a while, but it could also be a sneaky way to deliver malware to
users :).
So I would argue for putting our effort into making metrics contributions so
easy that we only link to external websites for insane stuff like
https://torflow.uncharted.software (which we helped them with).
...
2.3 Run an externally developed website as if it were part of Tor Metrics
Let's imagine that somebody produces a visualization of Tor network
data and would like to make it part of Tor Metrics but without
limiting themselves to the technology used by Tor Metrics.  We could
let them write their visualization as a website and integrate it into
Tor Metrics after reviewing its code.
Technically, part of this integration would be to "redress" the
website by applying the Tor Metrics design (which has lots of room for
improvement, but let's just say the result will look as seamlessly
integrated into Tor Metrics as the "Network bubble graphs" [4]).
Another part would probably be to rewrite web requests, so that users
still think they're talking to https://metrics.torproject.org/, but
really they're talking to another webserver behind that.
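In practice that rewriting would most likely be a reverse-proxy rule in the front webserver, but the routing idea itself is small. A hypothetical sketch: the `/external-viz/` prefix and the backend address are made up, and only the URL mapping is shown, not the proxying of the actual request.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical routing table: public path prefix -> internal backend base URL.
# Prefixes should not overlap, since the first matching prefix wins.
ROUTES = {
    "/external-viz/": "http://127.0.0.1:8081/",
}


def rewrite(public_url, routes=ROUTES):
    """Map a public metrics URL to the internal backend URL, or None if unrouted."""
    parts = urlsplit(public_url)
    for prefix, backend in routes.items():
        if parts.path.startswith(prefix):
            b = urlsplit(backend)
            new_path = b.path + parts.path[len(prefix):]
            # Keep the user's query string; drop any fragment (never sent to servers).
            return urlunsplit((b.scheme, b.netloc, new_path, parts.query, ""))
    return None
```

The user's browser only ever sees `https://metrics.torproject.org/...`; everything not matching a prefix is served by Tor Metrics itself as before.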
Regarding hosting and maintenance, in theory, the website could be
hosted by the original creators, but that effectively means that the
Tor Metrics team gives up part of the control over what's on the Tor
Metrics website.  The creators of the external website could change
parts or add new parts that wouldn't be reviewed by Tor Metrics
developers, but they would be perceived as part of Metrics, which
seems bad.  The Tor Metrics team could run the externally developed
website on a separate host or on the same host as Tor Metrics.  We
could imagine variants where the original creator stays around to fix
any issues as they come up, or we could imagine that they donate their
visualization to the Tor Metrics people, who will then maintain it.  We
could even imagine that the Tor Metrics maintainers some day decide to
integrate the originally external website into Tor Metrics proper, but
that would not be required for this model to work.
This goes back a bit to the discussion of the third part above.
...
All these ideas require writing down guidelines, criteria, and
processes.  In particular, they require more thoughts and input from
other people who are not currently involved in Tor Metrics maintenance
and who can be expected to be more objective.  And once these ideas are
implemented, we'll need more than just one Tor Metrics maintainer.
I would be very interested in hearing from people actually using/developing
visualization tools nowadays about how we could make a transition to something
much better suited for external contributions.
What about also a blog post on all of this?
Cheers!
David
...
What are your thoughts?
All the best,
Karsten
[0] https://metrics.torproject.org/
[1]
https://trac.torproject.org/projects/tor/query?status=!closed&component=...
[2] https://metrics.torproject.org/oxford-anonymous-internet.html
[3] https://metrics.torproject.org/uncharted-data-flow.html
[4] https://metrics.torproject.org/bubbles.html

tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
