Hello everyone!
I would like to share this graph I've been working on along side with Aaron Johnson (thanks to him for the algorithm I'll be describing).
This shows both the maximum additional and total traffic the network can sustain for both HS and Exit. The "additional" traffic here means that we consider the current load on the network thus the additional value is what can be added more on top of the current traffic.
http://ygzf7uqcusp4ayjs.onion/tor-health/tor-health/bandwidth_capacity.html
(Please ignore the initial downward spike, technical issue)
Here is the algorithm we are using for this. It's in no ways an accurate value but rather an estimation as you will see with the following algorithm to compute the traffic capacity you see.
1. Using the latest consensus, build a list of relays that are Running, Valid and Fast (weight selection only picks relays with Fast flag). Then compute their positionnal consensus weights.
2. For each of them, set their total capacity to min(avg BW, observed BW).
3. For each of them, compute the unused capacity that is use the extrainfo descriptor read/write bytes history for the amount of traffic used and substract that to the total capacity.
max(mean(write_history_values) / write_interval, mean(read_history_values) / read_interval)
4. For HS traffic, that is a 6-hops circuit, we go over all relays and find the relay that has the lowest unused capacity divided by it's probability of being picked in the Guard/Middle position:
picked_prob = (guard_weight * 2) + (middle_weight * 4) unused = relay.unused_capacity / picked_prob
We use time 2 here since there are two guards in the 6 hops circuit and 4 middle nodes. We get the minimum value between each relays.
5. The minimum "unused" value times the picked_prob is the amount of bytes that we'll give to all relays thus reducing their unused capacity:
relay.unused_capacity -= (min_unused * picked_prob)
At each iteration of the algorithm, this will make at least one relay go to 0 unused capacity so when this happens, remove that relay from the set of relays to be picked. Once we can't pick a relay anymore for the 6 hops circuit (that is no more guards or middle), we stop.
6. Add min_unused value to the total number of bytes. Goto step 4.
Ok... this can maybe be not that trivial to understand at first, feel free to ask questions but the basic idea is that at each iteration of the algorithm you spread a specific amount of bytes to each relay depending on the probability of being picked until you have no more relays capacity for such a circuit.
For the "Max Total" traffic, the relay.unused_capacity is equal to the relay capacity. For Exit traffic, we do the same logic but with a 3 hops circuit where each position is multiplied by 1 since there is one guard, middle and exit (in normal circumstances).
I know that we have sometimes 1, 4, 5, 6 or 7 hops circuit but the general case is considered here and we have no stats on how frequent those unusual circuits are.
Anyway, if you think this algorithm could be improved, please respond. If you think this algorithm is wrong, please respond. If you can reproduce the result on your own with this algo, omg please respond! :) The above could be totally wrong but hopefully we did a fairly good job. Please remember, this is an estimate. :)
Thanks! David
On 21 Nov 2015, at 05:38, David Goulet dgoulet@ev0ke.net wrote:
I know that we have sometimes 1, 4, 5, 6 or 7 hops circuit but the general case is considered here and we have no stats on how frequent those unusual circuits are.
When I tested hidden service path lengths:
Most clients cannibalized a 3-hop path for their directory, introduction point, and rendezvous circuits. So 4-hop paths may be quite frequent on the client side of hidden services.
On the server side, it depends on how busy the hidden service is - whether it has any preemptively built paths to cannibalize or not. If so, it's side is typically 4 hops, if not, it is 3.
It would be great to have some stats for typical path lengths, is there an open ticket for this, or should I create one?
Tim
Tim Wilson-Brown (teor)
teor2345 at gmail dot com PGP 968F094B
teor at blah dot im OTR CAD08081 9755866D 89E2A06F E3558B7F B5A9D14F
On 21 Nov (16:26:31), Tim Wilson-Brown - teor wrote:
On 21 Nov 2015, at 05:38, David Goulet dgoulet@ev0ke.net wrote:
I know that we have sometimes 1, 4, 5, 6 or 7 hops circuit but the general case is considered here and we have no stats on how frequent those unusual circuits are.
When I tested hidden service path lengths:
Most clients cannibalized a 3-hop path for their directory, introduction point, and rendezvous circuits. So 4-hop paths may be quite frequent on the client side of hidden services.
On the server side, it depends on how busy the hidden service is - whether it has any preemptively built paths to cannibalize or not. If so, it's side is typically 4 hops, if not, it is 3.
Indeed, I bet cannibalization happens more often then we think thus ending up with a 4 hops circuit to either your IP or RP.
It would be great to have some stats for typical path lengths, is there an open ticket for this, or should I create one?
That would help us have a better estimate of network capacity for sure but I wonder if it worth the efforts versus having a real privacy oriented statistics gathering system that could give us a much more accurate number of the used and unused capacity of relays.
In other words, question comes down to should we put effort in a bigger larger system or continue cherry-picking small stats here and there? (huge work once vs small/medium-ish effort multiple time :)
Thanks! David
Tim
Tim Wilson-Brown (teor)
teor2345 at gmail dot com PGP 968F094B
teor at blah dot im OTR CAD08081 9755866D 89E2A06F E3558B7F B5A9D14F
tor-dev mailing list tor-dev@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
I'd mentioned before idea of posing question how much HS-to-HS bandwidth (unused relay mod hopcount) is freely available within tor. For what this work goes to ansering that, thanks.
On 22 Nov 2015, at 02:55, David Goulet dgoulet@ev0ke.net wrote:
On 21 Nov (16:26:31), Tim Wilson-Brown - teor wrote: ...
It would be great to have some stats for typical path lengths, is there an open ticket for this, or should I create one?
That would help us have a better estimate of network capacity for sure but I wonder if it worth the efforts versus having a real privacy oriented statistics gathering system that could give us a much more accurate number of the used and unused capacity of relays.
In other words, question comes down to should we put effort in a bigger larger system or continue cherry-picking small stats here and there? (huge work once vs small/medium-ish effort multiple time :)
I've created ticket #17768 for this. https://trac.torproject.org/projects/tor/ticket/17768 https://trac.torproject.org/projects/tor/ticket/17768
Tim
Tim Wilson-Brown (teor)
teor2345 at gmail dot com PGP 968F094B
teor at blah dot im OTR CAD08081 9755866D 89E2A06F E3558B7F B5A9D14F
Hi David,
this sound really interesting! Im working on an evaluation of hidden service performance, trying to find some improvements. An algorithm to estimate the network capacities is a great tool, to evaluate what will happen if we would change some circuit parameters. This is why I would like to ask you for your code basis. Is there a git?
Thank u in advanced. Moritz (Institut für Informatik, HU Berlin)
Am 20.11.2015 um 19:38 schrieb David Goulet:
Hello everyone!
I would like to share this graph I've been working on along side with Aaron Johnson (thanks to him for the algorithm I'll be describing).
This shows both the maximum additional and total traffic the network can sustain for both HS and Exit. The "additional" traffic here means that we consider the current load on the network thus the additional value is what can be added more on top of the current traffic.
http://ygzf7uqcusp4ayjs.onion/tor-health/tor-health/bandwidth_capacity.html
(Please ignore the initial downward spike, technical issue)
Here is the algorithm we are using for this. It's in no ways an accurate value but rather an estimation as you will see with the following algorithm to compute the traffic capacity you see.
- Using the latest consensus, build a list of relays that are Running, Valid
and Fast (weight selection only picks relays with Fast flag). Then compute their positionnal consensus weights.
For each of them, set their total capacity to min(avg BW, observed BW).
For each of them, compute the unused capacity that is use the extrainfo
descriptor read/write bytes history for the amount of traffic used and substract that to the total capacity.
max(mean(write_history_values) / write_interval, mean(read_history_values) / read_interval)
- For HS traffic, that is a 6-hops circuit, we go over all relays and find
the relay that has the lowest unused capacity divided by it's probability of being picked in the Guard/Middle position:
picked_prob = (guard_weight * 2) + (middle_weight * 4) unused = relay.unused_capacity / picked_prob
We use time 2 here since there are two guards in the 6 hops circuit and 4 middle nodes. We get the minimum value between each relays.
- The minimum "unused" value times the picked_prob is the amount of bytes
that we'll give to all relays thus reducing their unused capacity:
relay.unused_capacity -= (min_unused * picked_prob)
At each iteration of the algorithm, this will make at least one relay go to 0 unused capacity so when this happens, remove that relay from the set of relays to be picked. Once we can't pick a relay anymore for the 6 hops circuit (that is no more guards or middle), we stop.
- Add min_unused value to the total number of bytes. Goto step 4.
Ok... this can maybe be not that trivial to understand at first, feel free to ask questions but the basic idea is that at each iteration of the algorithm you spread a specific amount of bytes to each relay depending on the probability of being picked until you have no more relays capacity for such a circuit.
For the "Max Total" traffic, the relay.unused_capacity is equal to the relay capacity. For Exit traffic, we do the same logic but with a 3 hops circuit where each position is multiplied by 1 since there is one guard, middle and exit (in normal circumstances).
I know that we have sometimes 1, 4, 5, 6 or 7 hops circuit but the general case is considered here and we have no stats on how frequent those unusual circuits are.
Anyway, if you think this algorithm could be improved, please respond. If you think this algorithm is wrong, please respond. If you can reproduce the result on your own with this algo, omg please respond! :) The above could be totally wrong but hopefully we did a fairly good job. Please remember, this is an estimate. :)
Thanks! David
tor-dev mailing list tor-dev@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
On 24 Nov (10:10:44), Moritz Wedel wrote:
Hi David,
this sound really interesting! Im working on an evaluation of hidden service performance, trying to find some improvements. An algorithm to estimate the network capacities is a great tool, to evaluate what will happen if we would change some circuit parameters. This is why I would like to ask you for your code basis. Is there a git?
I do plan to make a git out of all scripts I use for those graphs... time factor here but I'll reply back to this thread hopefully very soon with more info.
(It's also a great exercice to use the algorithm below and script it yourself to see if you get to the same results :)
Thanks! David
Thank u in advanced. Moritz (Institut für Informatik, HU Berlin)
Am 20.11.2015 um 19:38 schrieb David Goulet:
Hello everyone!
I would like to share this graph I've been working on along side with Aaron Johnson (thanks to him for the algorithm I'll be describing).
This shows both the maximum additional and total traffic the network can sustain for both HS and Exit. The "additional" traffic here means that we consider the current load on the network thus the additional value is what can be added more on top of the current traffic.
http://ygzf7uqcusp4ayjs.onion/tor-health/tor-health/bandwidth_capacity.html
(Please ignore the initial downward spike, technical issue)
Here is the algorithm we are using for this. It's in no ways an accurate value but rather an estimation as you will see with the following algorithm to compute the traffic capacity you see.
- Using the latest consensus, build a list of relays that are Running, Valid
and Fast (weight selection only picks relays with Fast flag). Then compute their positionnal consensus weights.
For each of them, set their total capacity to min(avg BW, observed BW).
For each of them, compute the unused capacity that is use the extrainfo
descriptor read/write bytes history for the amount of traffic used and substract that to the total capacity.
max(mean(write_history_values) / write_interval, mean(read_history_values) / read_interval)
- For HS traffic, that is a 6-hops circuit, we go over all relays and find
the relay that has the lowest unused capacity divided by it's probability of being picked in the Guard/Middle position:
picked_prob = (guard_weight * 2) + (middle_weight * 4) unused = relay.unused_capacity / picked_prob
We use time 2 here since there are two guards in the 6 hops circuit and 4 middle nodes. We get the minimum value between each relays.
- The minimum "unused" value times the picked_prob is the amount of bytes
that we'll give to all relays thus reducing their unused capacity:
relay.unused_capacity -= (min_unused * picked_prob)
At each iteration of the algorithm, this will make at least one relay go to 0 unused capacity so when this happens, remove that relay from the set of relays to be picked. Once we can't pick a relay anymore for the 6 hops circuit (that is no more guards or middle), we stop.
- Add min_unused value to the total number of bytes. Goto step 4.
Ok... this can maybe be not that trivial to understand at first, feel free to ask questions but the basic idea is that at each iteration of the algorithm you spread a specific amount of bytes to each relay depending on the probability of being picked until you have no more relays capacity for such a circuit.
For the "Max Total" traffic, the relay.unused_capacity is equal to the relay capacity. For Exit traffic, we do the same logic but with a 3 hops circuit where each position is multiplied by 1 since there is one guard, middle and exit (in normal circumstances).
I know that we have sometimes 1, 4, 5, 6 or 7 hops circuit but the general case is considered here and we have no stats on how frequent those unusual circuits are.
Anyway, if you think this algorithm could be improved, please respond. If you think this algorithm is wrong, please respond. If you can reproduce the result on your own with this algo, omg please respond! :) The above could be totally wrong but hopefully we did a fairly good job. Please remember, this is an estimate. :)
Thanks! David
tor-dev mailing list tor-dev@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
tor-dev mailing list tor-dev@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
On Fri, Nov 20, 2015 at 01:38:56PM -0500, David Goulet wrote:
Anyway, if you think this algorithm could be improved, please respond. If you think this algorithm is wrong, please respond. If you can reproduce the result on your own with this algo, omg please respond! :) The above could be totally wrong but hopefully we did a fairly good job. Please remember, this is an estimate. :)
I would be interested to see the capacity by pluggable transport: how much headroom each PT has and which ones need more bridges.
I tried it a few months ago but I didn't have the idea of looking at the read/write bytes history. I tried using measured/advertised bandwidth and using the speed of consensus downloads from the dirreq-v3-tunneled-dl line.
[tor-dev] Per-transport bridge bandwidth and bridge counts https://lists.torproject.org/pipermail/tor-dev/2015-July/009136.html Using measured bandwidth: https://people.torproject.org/~dcf/graphs/pt-bandwidth-2015-07-20/pt-bandwid... Using median dirreq-v3-tunneled-dl rate: https://people.torproject.org/~dcf/graphs/pt-bandwidth-2015-07-20/pt-dl.md.p... Source code: https://people.torproject.org/~dcf/graphs/pt-bandwidth-2015-07-20.tar.gz