On Oct 8, 2020, at 2:50 PM, Mike Perry <mikeperry@torproject.org> wrote:
I do not yet have confidence that these issues are solved simply because they did not appear in Shadow. Shadow does not simulate multi-instance relays, CPU bound relays, or structural load imbalances in the network.
Hi Mike and others!
I'd like to better understand your criticisms here so that we can work to make Shadow more useful (work that fits squarely under sponsor 38).
multi-instance relays
Nothing prevents Shadow from running multiple tor relay processes on the same virtual host. We could add this to the Tor models that are created by our model generation tool[0].
One issue is that we don't have ground truth about:
- which relays are co-resident with one another; and
- the capacity of the machine hosting the co-resident relays.
A short-term fix could be to look at relays in the same family and randomly choose some of them to run on the same machine (setting the capacity of the machine to the sum of the max observed bandwidths of the co-resident relays). A longer-term solution would be to add a new parameter similar to MyFamily and ask operators to identify which relays are co-resident, or to add to tor a self-measurement of co-residency; either would provide the ground truth we would need for accurate modeling.
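To make the short-term fix concrete, here is a rough Python sketch. It is not code from our model generation tool: the record keys (`fingerprint`, `family`, `max_observed_bw`), the function names, and the `colocate_prob` parameter are all made up for illustration, and the probability of co-locating a relay is a free parameter we would have to guess.

```python
import random
from collections import defaultdict


def make_host(relays):
    """Build one virtual host; capacity is the sum of max observed bandwidths."""
    return {
        "relays": [r["fingerprint"] for r in relays],
        "capacity": sum(r["max_observed_bw"] for r in relays),
    }


def assign_hosts(relays, colocate_prob=0.5, seed=0):
    """Randomly co-locate some relays of the same family on a shared host.

    `relays` is a list of dicts with the hypothetical keys 'fingerprint',
    'family' (None if the relay declares no family), and 'max_observed_bw'.
    """
    rng = random.Random(seed)  # seeded so that a generated model is reproducible

    by_family = defaultdict(list)
    for r in relays:
        by_family[r["family"]].append(r)

    hosts = []
    for family, members in by_family.items():
        shared = []
        for r in members:
            # Relays without a declared family always get their own host.
            if family is not None and rng.random() < colocate_prob:
                shared.append(r)
            else:
                hosts.append(make_host([r]))
        if shared:
            hosts.append(make_host(shared))
    return hosts
```

Note that this puts all co-located relays of a family on a single machine, which is itself a guess; a large family is probably spread over several machines.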
Thoughts? Any other ideas?
CPU-bound relays
There are two issues here:
- we need to improve/rewrite our virtual CPU module in Shadow that accounts for CPU load; and
- we need ground truth about the number of CPUs and CPU speeds for each relay.
The first one is relatively straightforward to resolve; the second one again requires some form of self-reporting or automated self-measurement in tor.
structural load imbalances
Could you please explain this one in a couple more sentences?
By 'structural' I think you might mean imbalances across relay positions (i.e., more guard bandwidth and less exit bandwidth). If so, then Shadow does already properly account for this by statically assigning flags using the TestingDirAuthVoteExit and TestingDirAuthVoteGuard torrc options.
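As an illustration of the static flag assignment, here is a small Python sketch that builds the two torrc lines for the directory authorities. The helper name and the fingerprints are placeholders, not taken from our tooling, and I am assuming the options take a comma-separated list of relay identity fingerprints; please check the exact list syntax against the tor manual.

```python
def flag_torrc_lines(exit_fps, guard_fps):
    """Return torrc lines that make a dirauth vote Exit/Guard for fixed relays.

    `exit_fps` and `guard_fps` are iterables of relay identity fingerprints
    (hypothetical placeholders in the example below).
    """
    lines = []
    if exit_fps:
        lines.append("TestingDirAuthVoteExit " + ",".join(sorted(exit_fps)))
    if guard_fps:
        lines.append("TestingDirAuthVoteGuard " + ",".join(sorted(guard_fps)))
    return lines


# Placeholder fingerprints, for illustration only.
EXITS = ["B" * 40, "A" * 40]
GUARDS = ["C" * 40]

for line in flag_torrc_lines(EXITS, GUARDS):
    print(line)
```

Sorting the fingerprints just keeps the generated config stable from run to run.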
Here are some bonus ones for you:
capacity of relays
We currently use the maximum observed bandwidth that we've seen for a relay and set that value as the network link capacity of the (virtual) host machine that runs each relay. Again, we don't have any ground truth of how much capacity is available to each relay, though maybe someday FlashFlow will collect it for us.
diversity of Tor versions
We should make sure our modeling tool includes relays across different versions of Tor, since not all relays in the public network run the same version. This one is pretty simple to fix (it just requires us to build Tor plugins from multiple different Tor source versions), but researchers testing how a new idea performs across the network by modifying Tor source will obviously need to use their custom research version of Tor.
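One way the modeling tool could do this is to sample a version for each modeled relay in proportion to how common that version is in the public network. A minimal Python sketch follows; the function name, the relay names, and the weights are all hypothetical and are not measured from a real consensus.

```python
import random


def assign_versions(relay_ids, version_weights, seed=0):
    """Pick a Tor version for each relay, weighted by observed popularity.

    `version_weights` maps a version string to a non-negative weight, e.g.
    the fraction of public relays running that version (assumed input).
    """
    rng = random.Random(seed)  # seeded so that a generated model is reproducible
    versions = list(version_weights)
    weights = [version_weights[v] for v in versions]
    chosen = rng.choices(versions, weights=weights, k=len(relay_ids))
    return dict(zip(relay_ids, chosen))


# Illustrative weights only.
WEIGHTS = {"0.3.5.11": 0.2, "0.4.3.6": 0.3, "0.4.4.5": 0.5}
model = assign_versions(["relay%d" % i for i in range(10)], WEIGHTS)
```

Each distinct version in the result would then need its own Tor plugin build in the simulation.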
Peace, love, and positivity,
Rob