Hi Anna. Glad you're interested in digging into the directory authorities! This is certainly a space that could use some love.
I'm working off the "Votes by Bandwidth Authorities" example on the Stem webpage (https://stem.torproject.org/tutorials/examples/votes_by_bandwidth_authoritie...).
Oh dear! Just noticed that example sucks for figuring out who is and isn't a bandwidth authority. Made a little tweak so the world sucks a little less...
https://gitweb.torproject.org/stem.git/commitdiff/e130863
You mentioned on IRC that "measured" might be returning the bwauth weight rather than a bandwidth, but what is the meaning of that weight? Does a higher "measured" value mean a higher bandwidth, or a higher bandwidth relative to what it advertises?
My understanding is that a higher 'measured' simply means 'the bandwidth authorities think you should use this relay more', which in turn is based on how much traffic the bandwidth authorities think it can/should handle.
In other words, if I sorted the descriptors by "measured" value, what would that order mean?
I *think* that would be the ordering of 'relays who receive the most tor client traffic due to having a more highly weighted heuristic for relay selection'.
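For what it's worth, here's a rough sketch of that sort. It uses a made-up 'Entry' namedtuple as a stand-in for the router status entries in a vote (in Stem those have 'nickname' and 'measured' attributes, the latter being None for relays the authority didn't measure). The 'rank_by_measured' helper and the sample relays are hypothetical, and the 'share' it computes is only a crude proxy for how much client traffic a relay attracts, since actual path selection also takes flags and position weights into account.

```python
from collections import namedtuple

# Hypothetical stand-in for a router status entry from a vote. Stem's
# RouterStatusEntryV3 has 'nickname' and 'measured' attributes, with
# 'measured' being None when the authority has no measurement.
Entry = namedtuple('Entry', ['nickname', 'measured'])


def rank_by_measured(entries):
  """
  Sorts entries by 'measured' (highest first) and pairs each with its
  fraction of the total measured weight. Unmeasured entries are skipped.
  """

  measured = [e for e in entries if e.measured is not None]
  total = sum(e.measured for e in measured)
  ranked = sorted(measured, key = lambda e: e.measured, reverse = True)

  return [(e.nickname, e.measured, e.measured / total if total else 0.0) for e in ranked]


if __name__ == '__main__':
  # Made-up sample data, not real relays.
  sample = [Entry('relayA', 500), Entry('relayB', None), Entry('relayC', 1500)]

  for nickname, weight, share in rank_by_measured(sample):
    print('%s: measured=%i (%.0f%% of measured weight)' % (nickname, weight, share * 100))
```

To try this on real data you'd feed it the router status entries from a bandwidth authority's vote rather than the sample list.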
That said, this is an area I'm honestly not that familiar with. I'm looping in Sebastian, Karsten, and Roger. As mentioned on IRC, Sebastian has touched the Bandwidth Authorities most recently, so he's likely the most knowledgeable at present about this space.
Karsten is the maintainer of our metrics space (http://metrics.torproject.org) and a descriptor guru, while Roger... well, knows all the things. But that said, he has a special affinity for research, so since you're a PhD student he'll probably be especially interested to hear your plans.
Separately, is there a way (using Stem or some other tool) to see the raw bwauth measurements rather than the weights?
I don't believe this is exposed anywhere, so only the bandwidth authority operators have this. And by 'have' I mean 'maybe in their logs, or possibly not even surfaced at all'.
Is that a calculation I can reverse?
Maybe run a bandwidth authority of your own? This could be a terrible idea. Sebastian would know.
I haven't looked into the historical data on CollecTor yet, but ideally, I would like to use the historical data to figure out how effective the bwauth measurements seem to be in different situations (for example, the relay that was misconfigured with a very high bandwidth this past February seems to have confused the bwauths https://lists.torproject.org/pipermail/tor-talk/2014-February/032094.html).
Agreed, it would be nice for CollecTor to have bandwidth authority information. However, with a few small exceptions (like rdns and geoip lookups) CollecTor is simply a distilled version of what's in the consensus. That is to say, by collecting descriptor information directly, as you are, you already have a superset of what CollecTor provides.
Cheers! -Damian