-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On 06/12/14 00:26, Anna Kornfeld Simpson wrote:
Thanks all for the responses!
On Fri, Nov 21, 2014 at 4:53 PM, Sebastian Hahn sebastian@torproject.org wrote:
Hi there,
On 21 Nov 2014, at 23:44, Damian Johnson atagar@torproject.org wrote:
In other words, if I sorted the descriptors by "measured" value, what would that order mean?
I *think* that would be the ordering of 'relays who receive the most tor client traffic due to having a more highly weighted heuristic for relay selection'.
That would be accurate, as far as I understand.
Is there documentation of why this "heuristic for relay selection" does not correlate that well with "bandwidth" in the descriptor? I've attached a couple of scatter plots pulled from moria1's "measured" and "bandwidth" values for each descriptor a couple hours ago (and the plots look similar from the other bwauths). One shows all values, the other shows the bottom 75% of values (sorted by measurements), and neither shows as much of a correlation as I would expect. Are there factors other than bandwidth that contribute to this "heuristic for relay selection"?
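One way to put a number on "not as much of a correlation as I would expect" is a rank correlation, which is less sensitive to a few very large relays than a linear fit. The following is only a sketch using the Python standard library: the function names are hypothetical, and the sample values are made up rather than taken from moria1's votes.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Made-up (bandwidth, measured) values for illustration only.
bandwidth = [512, 1024, 1024, 730, 10000, 2048]
measured = [90, 2500, 300, 410, 40000, 1800]
print(round(spearman(bandwidth, measured), 3))
```

A value near 1 would mean the two orderings largely agree, while a value near 0 would mean sorting by "measured" says little about the order by "bandwidth".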
Hi Anna,
I don't have answers, but maybe ideas for further investigations:
- Not sure if this was mentioned before, but did you take a look at the spec? https://gitweb.torproject.org/torflow.git/tree/NetworkScanners/BwAuthority/R...
- Maybe try removing bandwidth values close to 10000, or just values exactly at 10000. IIRC, values are capped at that value. (Removing just those values may be more accurate than removing the top 25%.)
- Very small bandwidth values might result from newly started or restarted relays. (Advertised) bandwidth values are "the volume of traffic, both incoming and outgoing, that a relay is willing to sustain, as configured by the operator and claimed to be observed from recent data transfers." If a relay hasn't observed larger data transfers, the reported bandwidth value will be small, even though the (past) measurements might still be large. Maybe compare the two values for single relays over time.
- There's an interesting pattern at 1024 (?) kB/s. Maybe there are more at 512 kB/s and other values. Can you reduce the amount of overplotting in the graph? In R/ggplot2, you'd set the "alpha" value to something smaller than 1, so that dots become somewhat transparent. It could be that these patterns are normal, because operators tend to pick certain bandwidth rates more often than others.
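Two of the ideas above (dropping values at the cap, and checking for spikes at popular rates) can be sketched in a few lines of Python. This is only an illustration: the cap of 10000 is taken from the "IIRC" remark above and is not verified, the helper names are hypothetical, and the sample pairs are made up.

```python
from collections import Counter

CAP = 10000  # assumed cap from the remark above; not verified against the spec


def drop_capped(pairs, cap=CAP):
    """Keep (bandwidth, measured) pairs whose bandwidth is below the cap."""
    return [(bw, m) for bw, m in pairs if bw < cap]


def most_common_bandwidths(pairs, top=3):
    """Return the most frequent bandwidth values with their counts."""
    return Counter(bw for bw, _ in pairs).most_common(top)


# Made-up (bandwidth, measured) pairs for illustration only.
sample = [
    (1024, 300), (1024, 2500), (512, 90), (10000, 40000),
    (1024, 800), (730, 410), (10000, 9000), (512, 1200),
]

kept = drop_capped(sample)
print(len(kept), "of", len(sample), "pairs kept")
print(most_common_bandwidths(kept, top=2))
```

If the plot is drawn with matplotlib rather than ggplot2, its scatter function also accepts an alpha argument (for example alpha=0.2), which serves the same purpose of making dense regions visible.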
All the best, Karsten