On Mon, Jan 12, 2015 at 06:57:01PM +0100, Tom van der Woerdt wrote:
23% is a lot though - so high that I really doubt it's true. The ratio between handshakes and deduplicated handshakes is also rather strange. Is there anything we can do to the dataset to find out why the amount is so high?
When looking at the ratio, consider that the majority of relays run newer versions of Tor [0]. Over these three days, my relay has established hundreds of connections to other relays over and over again. When deduplicating relays' addresses, all these connections get reduced to one, which explains why the per-host fraction of versions 3 and 4 is much smaller than the per-connection fraction.
Apart from that, I agree that the number of old clients is unexpected. At first, I suspected the Sefnit botnet (which might still account for ~50% of Tor "users"), but apparently the malware uses Tor v0.2.3.25.
I think the same experiment could be repeated by adding the following to your tor config:
Log [or]info file /path/to/logfile
And then, the negotiated protocol versions can be counted by running, for example:
grep -c 'Negotiated version 2' /path/to/logfile
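To tally all negotiated versions in one pass instead of grepping for each one separately, something like the following sketch might work. It assumes that the info-level log lines contain the literal text "Negotiated version N", as the grep above does. The function name count_negotiated_versions is made up for illustration.

```shell
# Hypothetical helper: print "<version> <count>" for every link protocol
# version found in the given log file, sorted by version number.
# Assumption: each handshake is logged as "Negotiated version N ...".
count_negotiated_versions() {
    grep -o 'Negotiated version [0-9][0-9]*' "$1" |
        awk '{ count[$3]++ } END { for (v in count) print v, count[v] }' |
        sort -n
}
```

It would be called as `count_negotiated_versions /path/to/logfile`. Note that this counts connections, not hosts, so it corresponds to the per-connection numbers.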
[0] https://metrics.torproject.org/versions.html
Cheers, Philipp