Visualising the similarity between Tor relay descriptors helps with finding Sybil attacks. I added code to sybilhunter [0] that takes relay descriptors as input, determines all n(n-1)/2 pairwise similarities, and outputs DOT code (the graph description language used by Graphviz) that illustrates relay clusters and what makes them similar. For now, this functionality is just a set of hard-coded rules that determine, e.g.:
- Do the relays have the same, non-default exit policy?
- Do the relays have a similar uptime?
- Do the relays run on the same platform?
To give you an idea of what this looks like, I took all relay descriptors archived by CollecTor [1] for 2015-05-30 and calculated similarities by running:
$ sybilhunter -data 2015-05-30/ -cumulative -matrix -threshold 6 -visualise > sim.dot
$ dot -o sim.svg -Tsvg sim.dot
The resulting graph is online [2]. Vertices are relays (nickname in the first line, followed by the first eight hex digits of the fingerprint), and the edge labels show the similarities.
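To illustrate the idea, here is a minimal Python sketch of the pairwise comparison and the DOT output. It is not sybilhunter's actual code: the relay records, the field names, the `DEFAULT_EXIT_POLICY` placeholder, the one-hour uptime tolerance, and the `min_shared` cut-off are all assumptions made for this example, and `min_shared` is not meant to mirror the semantics of `-threshold`.

```python
from itertools import combinations

# Placeholder only; this is NOT Tor's real default exit policy.
DEFAULT_EXIT_POLICY = "default"

# Made-up relays with a few simplified descriptor fields (hypothetical data).
relays = [
    {"nickname": "example1", "fingerprint": "A1B2C3D4E5F60718",
     "platform": "Tor 0.2.6.7 on Linux", "uptime": 90000,
     "exit_policy": "accept *:80"},
    {"nickname": "example2", "fingerprint": "A1FF00112233AABB",
     "platform": "Tor 0.2.6.7 on Linux", "uptime": 90600,
     "exit_policy": "accept *:80"},
    {"nickname": "unrelated", "fingerprint": "0F0E0D0C0B0A0908",
     "platform": "Tor 0.2.5.12 on FreeBSD", "uptime": 500,
     "exit_policy": DEFAULT_EXIT_POLICY},
]


def similarities(a, b, uptime_tolerance=3600):
    """Return the names of the hard-coded rules that relays a and b satisfy."""
    shared = []
    same_policy = a["exit_policy"] == b["exit_policy"]
    if same_policy and a["exit_policy"] != DEFAULT_EXIT_POLICY:
        shared.append("exit policy")
    if abs(a["uptime"] - b["uptime"]) <= uptime_tolerance:
        shared.append("uptime")
    if a["platform"] == b["platform"]:
        shared.append("platform")
    return shared


def to_dot(relays, min_shared=2):
    """Emit an undirected DOT graph with one edge per sufficiently similar pair."""
    nodes, edges = {}, []
    for a, b in combinations(relays, 2):  # each unordered pair exactly once
        shared = similarities(a, b)
        if len(shared) < min_shared:
            continue
        for r in (a, b):
            fpr = r["fingerprint"]
            # "\\n" writes the two characters \n, which DOT renders as a line break.
            nodes[fpr] = '  "%s" [label="%s\\n%s"];' % (fpr, r["nickname"], fpr[:8])
        edges.append('  "%s" -- "%s" [label="%s"];'
                     % (a["fingerprint"], b["fingerprint"], ", ".join(shared)))
    return "\n".join(["graph similarities {", *nodes.values(), *edges, "}"])


print(to_dot(relays))
```

The printed text can be saved to a file and rendered with the same dot invocation as above. Only relays that take part in at least one edge are emitted, so unrelated relays do not clutter the graph.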
Unsurprisingly, there are several relay clusters that probably should be in a family, but aren't. Examples are the "startor*", "torpids*", "manningsnowden*", and "Montharkan*" relays.
There are, however, also several relay clusters, often named "default", whose members share the first two hex digits of their fingerprint. This is unlikely to be a coincidence, so their operators might have wanted to position the relays in the DHT.
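As a rough illustration of why a shared prefix stands out, the following Python sketch groups fingerprints by their leading hex digits and estimates how likely it is that a cluster of k relays shares a prefix by chance. It assumes fingerprints are uniformly distributed, which is an idealisation; the function names and the sample fingerprints are made up for this example and are not part of sybilhunter.

```python
from collections import defaultdict


def group_by_prefix(fingerprints, prefix_len=2, min_size=2):
    """Map each hex prefix to the fingerprints sharing it, dropping small groups."""
    groups = defaultdict(list)
    for fpr in fingerprints:
        groups[fpr[:prefix_len].upper()].append(fpr)
    return {p: fprs for p, fprs in groups.items() if len(fprs) >= min_size}


def chance_all_share_prefix(k, prefix_len=2):
    """Probability that k uniformly random fingerprints share one hex prefix.

    The first relay fixes the prefix; each of the other k-1 relays then
    matches it with probability 1 / 16**prefix_len.
    """
    return (1 / 16 ** prefix_len) ** (k - 1)


# Hypothetical fingerprints, truncated for readability.
sample = ["AB12CD34", "ab99ff00", "CD56EF78"]
print(group_by_prefix(sample))
print(chance_all_share_prefix(3))
```

Under that uniformity assumption, three otherwise similar relays agreeing on their first two hex digits happens by chance with probability 1/65536, which is why such clusters deserve a closer look.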
Please let me know if you have any suggestions on how to improve the tool or its visualisation.
[0] https://gitweb.torproject.org/user/phw/sybilhunter.git/
[1] https://collector.torproject.org/recent/relay-descriptors/server-descriptors/
[2] https://www.nymity.ch/sybilhunting/svg/2015-05-30_similarities.svg
Cheers,
Philipp