Hello people,
for the past few months we've been working on getting better statistics for hidden services [0].
The questions we are trying to answer are "Approximately how many hidden services are there?" and "Approximately how much traffic of the Tor network is going to hidden services?".
We can answer these questions by collecting statistics from Tor relays, specifically from hidden service directories (HSDirs) and rendezvous points. In our design, these relays obfuscate the statistics before publishing them, so the published numbers are deliberately imprecise [1]. We specify exactly how these statistics are collected in proposal 238-hs-relay-stats.txt [2].
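To make "obfuscate" more concrete, here is a minimal Python sketch of one common approach. It assumes a relay first rounds each true count up to a multiple of a bin size and then adds Laplace noise with scale delta_f / epsilon. The function names and the parameter values in the usage line are hypothetical placeholders. The actual scheme and parameters are the ones specified in proposal 238 [2] and discussed in [1], and may differ from these.

```python
import random


def round_up_to_bin(count: int, bin_size: int) -> int:
    """Round a non-negative count up to the next multiple of bin_size."""
    if count < 0:
        raise ValueError("counts must be non-negative")
    return ((count + bin_size - 1) // bin_size) * bin_size


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two Exp(1/scale) draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def obfuscate(count: int, bin_size: int, delta_f: float, epsilon: float,
              rng: random.Random) -> int:
    """Bin the true count, then add Laplace noise with scale delta_f / epsilon.

    The published value can fall below the true count (or even below zero),
    because the noise is symmetric around the binned value.
    """
    binned = round_up_to_bin(count, bin_size)
    return round(binned + laplace_noise(delta_f / epsilon, rng))


if __name__ == "__main__":
    # Hypothetical parameters, chosen only for illustration.
    rng = random.Random(2015)
    print(obfuscate(1234, bin_size=8, delta_f=8, epsilon=0.3, rng=rng))
```

Because the noise has mean zero, averaging many independently obfuscated reports still gives a usable estimate of the binned total. Any single report, however, reveals little about one relay's exact count.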
We have also developed a Tor branch [3] that implements the proposal and that relay operators can run to start collecting hidden service statistics. The corresponding trac ticket is #13192 if you want to follow the developer discussion [4].
Our plan is to ask volunteers to run the branch in about a week. Then, about a month from now, we will use those stats to write a blog post about the approximate size of the Tor hidden services network and the approximate amount of traffic it carries.
Until then, please review our design and code and send us your feedback :)
Thanks!
George Kadianakis, Karsten Loesing, Aaron Johnson, David Goulet
[0]: https://trac.torproject.org/projects/tor/wiki/org/sponsors/SponsorR https://lists.torproject.org/pipermail/tor-dev/2014-October/007642.html
[1]: see threads for discussion: https://lists.torproject.org/pipermail/tor-dev/2014-November/007816.html https://lists.torproject.org/pipermail/tor-dev/2014-December/007928.html
[2]: https://gitweb.torproject.org/user/asn/torspec.git/tree/proposals/238-hs-rel...
[3]: https://gitweb.torproject.org/karsten/tor.git/log/?h=task-13192-5
George Kadianakis desnacked@riseup.net writes:
Hello,
we have now finished writing a tech report with the results of these statistics. You can find it in PDF form here: https://research.torproject.org/techreports/extrapolating-hidserv-stats-2015...
We are currently working on a more casual-reader-friendly blog post, which will contain additional information that the Tor community might be interested in. You should find it on blog.torproject.org in two weeks or so.
On 04/02/15 13:12, George Kadianakis wrote:
Below is the "additional information" that George was referring to. Even though I am the one sending this mail, most of these words are not mine but Aaron Johnson's. George and I only reviewed Aaron's math and questioned the assumptions he made.
In the tech report, we estimated how much hidden-service traffic there is in the network. But we also wondered what *fraction* of all traffic that is.
We can take two different approaches to answer this question. First, we can calculate the hidden-service fraction of "external" traffic, that is, traffic relayed into Tor from non-Tor sources, which can be subdivided into exit traffic and hidden-service traffic. In the 2015-01-19 00:00 consensus, the weights for relays with the Exit flag were 0 for every position except the exit position. It is therefore reasonable to assume that all traffic "relayed" (that is, read and then written) on that day by relays with the Exit flag was exiting to non-Tor hosts. There are edge cases where this assumption breaks: a small number of relays lack the Exit flag but still permit a few outgoing ports, and a non-exit guard could conceivably have accumulated clients and then changed its exit policy. Neither case should affect our calculation substantially. The amount of exit traffic is at least 193322616 + 1686627828 = 1879950444 bytes per second, where the summands are the minimum of the read and written traffic figures for relays with only the Exit flag and for relays with both the Exit and Guard flags, respectively. We estimate total hidden-service traffic in the network on January 19 as 526 Mbit/s, or 65750000 bytes per second. Thus hidden-service traffic constitutes 65750000 / (65750000 + 1879950444) = 0.034 of external traffic.
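The external-traffic arithmetic above can be re-checked in a few lines of Python. This sketch only re-runs the figures quoted in the paragraph. It assumes, as the ratio itself requires, that the exit figures and the hidden-service estimate share one unit (bytes per second), and it uses decimal megabits (1 Mbit = 10^6 bits), which is what turns 526 Mbit/s into 65750000 bytes.

```python
# Figures quoted above for 2015-01-19; treated as bytes per second (assumption).
EXIT_ONLY_BYTES = 193_322_616      # min(read, written) for relays with only the Exit flag
EXIT_GUARD_BYTES = 1_686_627_828   # min(read, written) for relays with Exit and Guard flags
HS_MBIT_PER_S = 526                # estimated hidden-service traffic


def mbit_to_bytes(mbit_per_s: float) -> float:
    """Convert decimal megabits per second to bytes per second."""
    return mbit_per_s * 1_000_000 / 8


exit_bytes = EXIT_ONLY_BYTES + EXIT_GUARD_BYTES
hs_bytes = mbit_to_bytes(HS_MBIT_PER_S)
external_fraction = hs_bytes / (hs_bytes + exit_bytes)

print(f"exit traffic: {exit_bytes} bytes/s")
print(f"HS traffic:   {hs_bytes:.0f} bytes/s")
print(f"HS share of external traffic: {external_fraction:.3f}")
```

Because the exit figure is a lower bound, the resulting 0.034 is, if anything, a slight overestimate of the hidden-service share.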
Second, we can calculate the hidden-service fraction of "all" traffic, that is, the total number of bytes relayed by all relays. Each rendezvous circuit involves six relays, so the total amount of hidden-service traffic relayed is 6 * 65750000 = 394500000 bytes per second. The total number of bytes relayed by all relays is 6462858486 bytes per second. The hidden-service fraction of total traffic is thus 394500000 / 6462858486 = 0.061. This is roughly twice the "external" traffic fraction because rendezvous circuits are twice as long as exit circuits. It is a bit lower than twice the external fraction (2 * 0.034 = 0.068) because the total traffic number also includes traffic that is neither exit nor hidden-service traffic; consensus fetches are likely a big component of this.
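The "all traffic" calculation can be sketched the same way. This sketch reuses the 65750000 bytes per second estimate and the six relays per rendezvous circuit from the paragraph above. It treats all figures as bytes per second (an assumption, as before) and also prints the naive doubling of the external fraction for comparison.

```python
HS_BYTES_PER_S = 65_750_000                # estimated hidden-service traffic (526 Mbit/s)
RELAYS_PER_REND_CIRCUIT = 6                # each of these relays relays every HS byte
TOTAL_RELAYED_BYTES_PER_S = 6_462_858_486  # bytes relayed by all relays
EXTERNAL_FRACTION = 0.034                  # HS share of external traffic, from above

hs_relayed = RELAYS_PER_REND_CIRCUIT * HS_BYTES_PER_S
total_fraction = hs_relayed / TOTAL_RELAYED_BYTES_PER_S

print(f"HS bytes relayed:         {hs_relayed}")
print(f"HS share of all relayed:  {total_fraction:.3f}")
print(f"naive 2x external share:  {2 * EXTERNAL_FRACTION:.3f}")
```

The gap between the last two lines is the effect described above: relayed traffic that is neither exit nor hidden-service traffic inflates the denominator.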
tl;dr: 3.4% of external (client) traffic is hidden-service traffic, and 6.1% of all traffic relayed by relays is hidden-service traffic.
All the best, Karsten