On Mon, Dec 07, 2015 at 01:44:47PM -0800, David Fifield wrote:
On Mon, Dec 07, 2015 at 02:51:23PM -0500, Philipp Winter wrote:
I spent some time improving the existing relay uptime visualisation [0]. Inspired by a research paper [1], the new algorithm uses single-linkage clustering with Pearson's correlation coefficient as the distance function. The idea is that relays are grouped next to each other if their uptime sequences (basically binary sequences) are highly correlated. Check out the following gallery. It contains monthly relay uptime images, dating back to 2007: https://nymity.ch/sybilhunting/uptime-visualisation/
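For readers who want to experiment with the idea, here is a minimal sketch of single-linkage grouping with 1 - Pearson(s1,s2) as the distance. It is not the code behind the gallery: the function names, the handling of constant sequences, and the naive merge loop are all assumptions made for illustration, and the loop is far too slow for a month of real relay data.

```python
from math import sqrt

def pearson(s1, s2):
    """Pearson correlation coefficient of two equal-length 0/1 sequences."""
    n = len(s1)
    m1, m2 = sum(s1) / n, sum(s2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(s1, s2))
    v1 = sum((a - m1) ** 2 for a in s1)
    v2 = sum((b - m2) ** 2 for b in s2)
    if v1 == 0 or v2 == 0:
        # Assumption: a relay that was always up (or always down) has no
        # variance, so we treat it as uncorrelated with everything.
        return 0.0
    return cov / sqrt(v1 * v2)

def pearson_distance(s1, s2):
    """0 for perfectly correlated sequences, 2 for perfectly anticorrelated."""
    return 1.0 - pearson(s1, s2)

def single_linkage_order(seqs, dist):
    """Repeatedly merge the two clusters whose closest members are nearest.

    Returns the sequence indices in the order they end up after merging,
    so that similar sequences sit next to each other.
    """
    clusters = [[i] for i in range(len(seqs))]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: cluster distance is the minimum pairwise distance.
                d = min(dist(seqs[a], seqs[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters[0]

# Hypothetical uptime data: one row per relay, one column per hour.
uptimes = [
    [1, 1, 0, 0, 1, 1],
    [0, 0, 1, 1, 0, 0],
    [1, 1, 0, 0, 1, 1],
    [0, 0, 1, 1, 0, 1],
]
order = single_linkage_order(uptimes, pearson_distance)
```

Rows 0 and 2 are identical and rows 1 and 3 differ in one hour, so each pair ends up adjacent in `order`.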
How about just taking the XOR of two sequences as the distance?
Here's Nov 2015, with XOR as distance: https://nymity.ch/sybilhunting/uptime-visualisation/xor-distance.png
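As a rough illustration of the XOR suggestion: for two binary uptime sequences, summing the bitwise XOR counts the hours in which exactly one of the two relays was up, which is the Hamming distance. The function name and the sample data below are assumptions, not taken from the code that produced the image.

```python
def xor_distance(s1, s2):
    """Count positions where two equal-length 0/1 sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must have equal length")
    return sum(a ^ b for a, b in zip(s1, s2))

# Hypothetical hourly uptime of two relays.
relay_a = [1, 1, 0, 0, 1, 1]
relay_b = [1, 0, 0, 1, 1, 1]
d = xor_distance(relay_a, relay_b)
```

Unlike Pearson's coefficient, this measure is not normalised by each sequence's variance, so two relays that are both almost always up are close together regardless of whether their few downtimes coincide.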
It would be interesting to know if there are any near-perfect anticorrelations; i.e., one relay starts when another stops.
It looks like there are many of them. So far, I calculated the distance as 1 - Pearson(s1,s2) because I'm only interested in positively correlated sequences. Here's an uptime image with Pearson(s1,s2) as the distance function, so positive correlation is considered just as much as negative correlation. Have a look at the leftmost part: https://nymity.ch/sybilhunting/uptime-visualisation/anticorrelation.png
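One simple way to look for such pairs directly, rather than through the clustered image, is to compute Pearson's coefficient for every pair of relays and keep the pairs close to -1. This is only a sketch under assumptions: the function names, the -0.9 threshold, and the sample data are made up for illustration, and the quadratic pairwise loop would need to be replaced by something faster for thousands of relays.

```python
from itertools import combinations
from math import sqrt

def pearson(s1, s2):
    """Pearson correlation coefficient of two equal-length 0/1 sequences."""
    n = len(s1)
    m1, m2 = sum(s1) / n, sum(s2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(s1, s2))
    v1 = sum((a - m1) ** 2 for a in s1)
    v2 = sum((b - m2) ** 2 for b in s2)
    if v1 == 0 or v2 == 0:
        return 0.0  # assumption: constant sequences count as uncorrelated
    return cov / sqrt(v1 * v2)

def anticorrelated_pairs(seqs, threshold=-0.9):
    """Return index pairs whose correlation is at or below the threshold.

    The default threshold of -0.9 is an arbitrary choice for "near-perfect".
    """
    return [
        (i, j)
        for (i, s1), (j, s2) in combinations(enumerate(seqs), 2)
        if pearson(s1, s2) <= threshold
    ]

# Hypothetical data: relay 1 comes up exactly when relay 0 goes down.
uptimes = [
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [1, 0, 1, 0, 1, 0],
]
pairs = anticorrelated_pairs(uptimes)
```

Here only relays 0 and 1 are reported, since one starts exactly when the other stops.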
Cheers, Philipp