Hi Rob,
On 8 Aug 2019, at 22:15, Rob Jansen rob.g.jansen@nrl.navy.mil wrote:
On Aug 6, 2019, at 5:48 PM, Roger Dingledine arma@torproject.org wrote:
On Tue, Aug 06, 2019 at 05:31:39PM -0400, Rob Jansen wrote:
Today, I started running the speedtest on all relays in the network. So far, I have finished about 100 relays (and counting). I expect that the advertised bandwidths reported by metrics will increase over the next few days. For this to happen, the bandwidth histories observed by a relay during my speedtest are first committed to the bandwidth history table (within 24 hours), and then reported in the server descriptors (within 18-36 hours, depending on when the bandwidth history commit happens).
Great.
There will be another confusing (confounding) factor, which is that the weights in the consensus are chosen by the bandwidth authorities, so even if the relay's self-reported bandwidth goes up (because it now sees that it can handle more traffic), that doesn't mean that the consensus weight will necessarily go up. In theory it ought to, but with a day or so delay, as the bwauths catch on to the larger value in the descriptor; but in practice, I am not willing to make bets on whether it will behave as intended. :) So, call it another thing to keep an eye out for during the experiment.
Another wrinkle to keep in mind is that my script measures one relay at a time. If there are multiple relays running on the same NIC, after my measurement each of them will think they have the full capacity of the NIC. So if we just add up all of the advertised bandwidths after my measurement without considering that some of them share a NIC, that will result in an over-estimate of the available capacity of the network.
To avoid over-estimating network capacity, we could use IP-based heuristics to guess which relays share a machine (e.g., if they share an IP address, or have a nearby IP address). In the long term, it would be nice if Tor would collect and report some sort of machine ID the same way it reports the platform.
More precisely, we're trying to answer the question: "Which small sets of machines are limited by a common network link or shared CPU?"
A machine ID is an incomplete answer to this question: it doesn't deal with VMs, or multiple machines that share a router.
Here are some other potential heuristics:
* clock skew / precise time: machine/VM?
* nearby IP addresses and common ASN: machine?/VM?/router?
* platform: machine
* tor version: operator? (a proxy for machine/VM/router)
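As a rough illustration of the IP-based heuristic, here is a minimal sketch in Python. It assumes we already have a list of relays with an IPv4 address and an advertised bandwidth; the field names (`ip`, `advertised_bw`), the /24 prefix length, and the helper names are all hypothetical choices for this example, not anything Tor or metrics provides. It also assumes that relays in the same prefix share a link, and that the largest advertised bandwidth in a group approximates that link's capacity, which will be wrong for some operators.

```python
import ipaddress
from collections import defaultdict


def group_by_prefix(relays, prefix_len=24):
    """Group relays whose IPv4 addresses fall in the same /prefix_len network.

    prefix_len=32 groups only relays with an identical IP address.
    """
    groups = defaultdict(list)
    for relay in relays:
        # strict=False lets us pass a host address and get its enclosing network.
        net = ipaddress.ip_network(f"{relay['ip']}/{prefix_len}", strict=False)
        groups[net].append(relay)
    return groups


def estimate_capacity(relays, prefix_len=24):
    """Sum one bandwidth value per group instead of one per relay.

    Assumption: after a one-relay-at-a-time measurement, every relay on a
    shared link advertises roughly the full link capacity, so we count
    only the maximum in each group.
    """
    groups = group_by_prefix(relays, prefix_len)
    return sum(
        max(r["advertised_bw"] for r in members) for members in groups.values()
    )


# Made-up example data (documentation address ranges, arbitrary units).
relays = [
    {"nickname": "a", "ip": "192.0.2.10", "advertised_bw": 100},
    {"nickname": "b", "ip": "192.0.2.11", "advertised_bw": 90},
    {"nickname": "c", "ip": "198.51.100.5", "advertised_bw": 50},
]

naive = sum(r["advertised_bw"] for r in relays)  # counts every relay
grouped = estimate_capacity(relays)              # counts one value per /24
```

With this toy data, the naive sum counts relays "a" and "b" separately, while the grouped estimate treats them as one shared link. Adding the ASN or any of the other heuristics would mean extending the grouping key.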
Is there a cross-platform API for machine IDs? Or similar APIs for our most common relay platforms? (Linux, BSDs, Windows)
T