Hello Sebastian and list,
For the latest measurement team 1-1-1 task exchange round [0], Sebastian again suggested a fun task that I picked up. He asked the following question:
> When a new Tor version is released, how long does it take for 25, 50, 75% of all relays to update? How often do relays update in general? Is there an effect when, for example, Debian updates its packages? It would help me to get an idea what I need to answer these questions :-)
> "update" means you're running, then you go down, then within a few minutes you're back up again with a changed version
Here's what I came up with:
I suggest we use two data sets as input for this task. The first data set consists of archived consensuses, available at:
https://collector.torproject.org/archive/relay-descriptors/consensuses/
We'll need "valid-after" and "server-versions" from the header and "v" and "w" lines from all contained entries. Example:
valid-after 2015-09-18 08:00:00
server-versions 0.2.4.23,0.2.4.24,0.2.4.25,0.2.4.26,0.2.4.27,0.2.5.8-rc,0.2.5.9-rc,0.2.5.10,0.2.5.11,0.2.5.12,0.2.6.5-rc,0.2.6.6,0.2.6.7,0.2.6.8,0.2.6.9,0.2.6.10,0.2.7.1-alpha,0.2.7.2-alpha
v Tor 0.2.7.2-alpha-dev
w Bandwidth=450
v Tor 0.2.6.10
w Bandwidth=23
[...]
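If it helps, here's a minimal, untested sketch of how to pull those fields out of a single consensus file with the stem library (the filename below is just a placeholder for one extracted file from the CollecTor archive):

  from stem.descriptor import DocumentHandler, parse_file

  # Placeholder path to one extracted consensus file from the archive.
  path = '2015-09-18-08-00-00-consensus'

  # Archived consensuses don't carry @type annotations, so we pass the
  # descriptor type explicitly; DocumentHandler.DOCUMENT yields the whole
  # consensus with header fields and all router status entries.
  consensus = next(parse_file(path,
      descriptor_type='network-status-consensus-3 1.0',
      document_handler=DocumentHandler.DOCUMENT))

  print(consensus.valid_after)      # the valid-after header line
  print(consensus.server_versions)  # the server-versions header line

  for entry in consensus.routers.values():
      print(entry.version, entry.bandwidth)  # the "v" and "w" lines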
We can use those lines to *count* the number of running relays with a given version string, and we can sum up the *fraction* of consensus weights of relays running a given version string. Both can be useful, though I suspect the latter to be more meaningful.
We should aggregate these consensus parts into daily statistics to make them easier to handle. We can do that by keeping counters per (date, version) tuple and per date, each for a) number of relays and b) consensus weight, and then dividing the (date, version) numbers by the (date) numbers to obtain daily averages. We'd also determine whether a version is recommended, whether it's the latest recommended version in its series, and whether it's the latest recommended stable version. The result would be a .csv file like the following (example data!):
date,version,recommended,series,stable,count,fraction
2015-09-18,0.2.6.10,TRUE,TRUE,TRUE,0.387,0.451
2015-09-18,0.2.6.9,TRUE,FALSE,FALSE,0.063,0.0102
2015-09-18,0.2.6.8,TRUE,FALSE,FALSE,0.021,0.001
2015-09-18,0.2.6.7,TRUE,FALSE,FALSE,0.016,0.009
Read this as: "On 2015-09-18, version 0.2.6.10, which was recommended, the latest in its series, and the latest stable version, was run by 38.7% of relays by count or 45.1% by consensus weight." (again, example data!)
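Just to make that aggregation step concrete, here's a rough, untested sketch (leaving out the recommended/series/stable flags; the sample `entries` tuples are made up and would in practice come from parsing all consensuses as sketched above):

  from collections import defaultdict
  from datetime import datetime

  # Made-up sample input: (valid_after, version, bandwidth) per relay.
  entries = [
      (datetime(2015, 9, 18, 8, 0), '0.2.7.2-alpha-dev', 450),
      (datetime(2015, 9, 18, 8, 0), '0.2.6.10', 23),
  ]

  counts = defaultdict(int)        # (date, version) -> relay count
  weights = defaultdict(int)       # (date, version) -> summed weight
  total_counts = defaultdict(int)  # date -> relay count
  total_weights = defaultdict(int) # date -> summed weight

  for valid_after, version, bandwidth in entries:
      date = valid_after.date()
      counts[(date, version)] += 1
      weights[(date, version)] += bandwidth
      total_counts[date] += 1
      total_weights[date] += bandwidth

  # Divide (date, version) numbers by (date) numbers to get averages.
  print('date,version,count,fraction')
  for (date, version), count in sorted(counts.items()):
      print('%s,%s,%.3f,%.3f' % (date, version,
                                 count / float(total_counts[date]),
                                 weights[(date, version)] /
                                     float(total_weights[date])))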
You could then draw graphs similar to https://metrics.torproject.org/versions.html, but with a percent scale on y. Though I'm not sure how readable those graphs would be with a few dozen lines in them. Still, they might be useful to explore the data set.
The second data set that would be useful here is events related to versions. The following events come to mind:
- tagged as version in Git (use a command similar to [1]),
- tagged as alpha, beta, rc, or stable (look at version string),
- first/last recommended by directory authorities for relays (parse from consensuses),
- last recommended as newest version in a series or newest stable version (also parse from consensuses), or
- first released in Debian stable (unclear how to obtain those dates).
A possible data format would be:
date,version,event
2015-07-12,0.2.6.10,git_tag
2015-06-10,0.2.6.9,git_tag
2015-05-19,0.2.6.8,git_tag
2015-04-06,0.2.6.7,git_tag
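For the git_tag events, something like this untested snippet could turn the output of the command in [1] (minus the less) into that format, when run inside a tor.git checkout:

  import re
  import subprocess

  # Same git log invocation as in [1].
  out = subprocess.check_output(['git', 'log', '--tags',
      '--simplify-by-decoration', '--pretty=format:%ci %d'])

  print('date,version,event')
  for line in out.decode().splitlines():
      # Lines look roughly like: "2015-07-12 ... (tag: tor-0.2.6.10)"
      match = re.search(r'^(\d{4}-\d{2}-\d{2}).*tag: tor-([0-9][^,)]*)',
                        line)
      if match:
          print('%s,%s,git_tag' % (match.group(1), match.group(2)))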
For exploratory purposes, you could add these as colored vertical lines to the graphs above. Of course, adding more elements to a probably already overloaded graph doesn't exactly make it easier to understand. But if you limit the x axis to just a month or two, it might be useful.
So, regarding your original question: "How long does it take for 25/50/75% of relays to update to a new Tor version?" I'm not sure whether that's really the best question to ask/answer here. Some versions will never be deployed on 75%, 50%, or even 25% of relays, because a new version was released shortly after. But that doesn't mean that the previous version was bad.
This question also weights all versions the same, though in reality some releases are more important than others. If you get one data point for each version and then aggregate them by taking the average, what exactly does that tell you?
I think that question could be rephrased to: "How long does it take for 25/50/75% of relays to update to a new Tor version *series*?" And I think that's something you could answer with the first data set above.
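As a very rough, untested sketch of how to answer that from the first data set: sum up daily fractions per series and find the first date on which a series crosses each threshold ('versions.csv' is a placeholder name for the first data set; to get "time to update" you'd then subtract the series' first release date from the second data set):

  import csv
  from collections import defaultdict

  by_series = defaultdict(dict)  # series -> date -> summed fraction
  with open('versions.csv') as csv_file:
      for row in csv.DictReader(csv_file):
          # The series is the first three version components,
          # e.g. "0.2.6" for "0.2.6.10".
          series = '.'.join(row['version'].split('.')[:3])
          daily = by_series[series]
          daily[row['date']] = daily.get(row['date'], 0.0) + \
              float(row['fraction'])

  # First date on which each series' combined fraction crossed
  # 25/50/75 percent, if it ever did.
  for series, daily in sorted(by_series.items()):
      for threshold in (0.25, 0.50, 0.75):
          crossed = [d for d, frac in daily.items() if frac >= threshold]
          print(series, threshold, min(crossed) if crossed else 'never')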
But there are other interesting questions you could answer with the two data sets. For example,
- What fraction of relays (by number or consensus weight) is running (un-)recommended versions?
- What fraction of relays (by number or consensus weight) is (not) running the latest version in a series or latest stable version?
Those could again be answered by a graph that uses the first data set, dates on x and fractions on y, and possibly events from the second data set as vertical lines. Explore, explore.
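In case it helps with the exploring, drawing those event lines is a one-liner per event in matplotlib; here's a self-contained sketch with entirely made-up data (real fractions would come from the first data set, real event dates from the second):

  from datetime import date

  import matplotlib.pyplot as plt

  # Made-up sample data.
  dates = [date(2015, 9, day) for day in range(10, 20)]
  fractions = [0.05 * i for i in range(10)]
  event_dates = [date(2015, 9, 12)]  # e.g., a git_tag event

  fig, ax = plt.subplots()
  ax.plot(dates, fractions, label='some version or series')
  for event_date in event_dates:
      ax.axvline(event_date, color='red', linestyle='--')
  ax.set_ylabel('fraction of relays')
  ax.legend()
  fig.savefig('versions-example.png')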
There, the hour is over ("It took an hour to write, I thought it would take an hour to read.", Fry, Futurama). Hope this is useful in any way. And maybe others have good/better ideas for you as well? If you come up with some interesting answers, please post them here.
All the best,
Karsten
[0] 1-1-1 task exchange: you get 1 minute to describe a task that would take somebody else roughly 1 hour and that they will do for you within 1 week (review a document, write some analysis code, fix a small bug, etc.; better come prepared to get the most out of this; give 1, take 1)
[1] `git log --tags --simplify-by-decoration --pretty="format:%ci %d" | grep "tag: tor" | less`