I spent some time improving the existing relay uptime visualisation [0]. Inspired by a research paper [1], the new algorithm uses single-linkage clustering with Pearson's correlation coefficient as the distance function. The idea is that relays are grouped next to each other if their uptime (basically a binary sequence) is highly correlated. Check out the following gallery. It contains monthly relay uptime images, dating back to 2007: https://nymity.ch/sybilhunting/uptime-visualisation/
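For readers following along, here is a minimal sketch of the idea described above, not the actual implementation: relays are ordered by naive single-linkage clustering, with 1 minus Pearson's correlation coefficient used as the distance. The function names, the toy uptime data and the handling of constant sequences are all assumptions made for illustration.

```python
from math import sqrt


def pearson(a, b):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # Assumption: a relay that was always up (or always down) has an
        # undefined correlation; treat it as uncorrelated.
        return 0.0
    return cov / sqrt(var_a * var_b)


def single_linkage_order(uptimes):
    """Return relay names ordered so that correlated relays end up adjacent.

    `uptimes` maps a relay name to its binary uptime sequence (1 = online).
    """
    names = list(uptimes)
    # Distance is 1 - r, so perfectly correlated relays have distance 0.
    dist = {
        (a, b): 1.0 - pearson(uptimes[a], uptimes[b])
        for a in names for b in names if a != b
    }
    clusters = [[name] for name in names]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: cluster distance is the minimum
                # distance between any two members.
                d = min(dist[(a, b)] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters[0]


# Hypothetical toy data: A and C share one uptime pattern, B and D are similar.
toy_uptimes = {
    "A": [1, 1, 0, 0, 1, 1],
    "B": [0, 1, 0, 1, 0, 1],
    "C": [1, 1, 0, 0, 1, 1],
    "D": [0, 1, 0, 1, 0, 0],
}
order = single_linkage_order(toy_uptimes)
print(order)
```

The naive loop is cubic or worse in the number of relays, so it only suits small examples; a real month of consensus data would need an optimised clustering routine.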
nice graphs!
You can also spot:
- all the daily hibernating relays, which start at roughly the same time but finish their allocated amount of traffic at different times
- the 130 relays of the 11BX.. group (2015-10-30 - 2015-11-02)
and every column is a relay. White pixels mean that a relay was offline and black pixels mean that a relay was online. Red pixels are used to highlight suspiciously similar clusters.
I assume they are highlighted only if they exceed a certain group size? What is the threshold?
Until I looked at the Heartbleed example, I assumed that grouping requires "perfect matches" across the entire month. Now I'm not sure whether that is actually the case, or whether two distinct groups simply ended up next to each other without a "separator" between them.
Another practical problem is that it's cumbersome to learn the relay fingerprint of a given column.
That would indeed make these graphs a lot more useful, but having a raw time and group size alone would already be enough to look them up indirectly - if that event is unique.
I would also find it useful if it accepted fingerprints as input and graphed their uptime, so that one can look at a specific set of relays in certain cases.
Example input could be the fingerprints from [1]+[2] after these relays have been around for some time.
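To make the request more concrete, here is a rough sketch of what I have in mind, assuming the uptime data is available as a mapping from fingerprint to a binary sequence with one entry per time slot. The function name, the data layout and the plain-text PPM output are my assumptions and not part of the existing tool; the fingerprints below are placeholders, not real relays.

```python
def render_uptime_ppm(uptimes, fingerprints):
    """Return a plain-text PPM (P3) image for the selected relays.

    One column per relay and one row per time slot; black means online,
    white means offline (same colours as in the gallery).
    """
    selected = [fp for fp in fingerprints if fp in uptimes]
    if not selected:
        raise ValueError("none of the given fingerprints have uptime data")
    # Assumption: all sequences cover the same time slots.
    height = len(uptimes[selected[0]])
    lines = ["P3", f"{len(selected)} {height}", "255"]
    for slot in range(height):
        row = []
        for fp in selected:
            row.append("0 0 0" if uptimes[fp][slot] else "255 255 255")
        lines.append(" ".join(row))
    return "\n".join(lines) + "\n"


# Placeholder fingerprints and uptime data, for illustration only.
example_uptimes = {
    "AAAA": [1, 0],
    "BBBB": [0, 0],
    "CCCC": [1, 1],
}
image = render_uptime_ppm(example_uptimes, ["CCCC", "AAAA"])
print(image, end="")
```

Writing the returned string to a .ppm file gives an image that most viewers can open; fingerprints without uptime data are silently skipped.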
Are you planning to generate these graphs on an ongoing basis?
[1] http://article.gmane.org/gmane.network.onion-routing.ornetradar/613
[2] http://article.gmane.org/gmane.network.onion-routing.ornetradar/617