On Mon, Oct 14, 2019 at 07:56:29AM +0000, Florentin Rochet wrote:
> In short, Prop 271 causes guards to be selected with probabilities different than their weights due to the way it samples many guards and then chooses primary guards from that sample.
Agreed. As you say, Paul identified this one a few years ago too. As far as I understand the prop#271 design, I agree that it's an issue.
> We are suggesting a straightforward fix to the problem, which is, roughly speaking, to choose primary guards in the order in which they were sampled.
This looks like a good solution to the issue -- the ordering of the guards as we select them is proportional to their weight, so let's just use them in the order that we selected them.
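To make the bias concrete, here is a toy simulation. It is only a sketch, not Tor's guard code: the relay names, weights, and sample size are made up, and the function names are hypothetical. It draws a weighted sample without replacement, then picks the first guard either uniformly from the sample or in sampled order.

```python
import random
from collections import Counter


def weighted_sample_without_replacement(weights, k, rng):
    """Draw up to k distinct relays; each draw is proportional to weight."""
    remaining = dict(weights)
    order = []
    for _ in range(min(k, len(remaining))):
        names = list(remaining)
        pick = rng.choices(names, weights=[remaining[n] for n in names], k=1)[0]
        order.append(pick)
        del remaining[pick]
    return order


def first_guard_frequencies(weights, sample_size, trials, in_sample_order, seed=0):
    """Estimate how often each relay ends up as a client's first guard."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        sample = weighted_sample_without_replacement(weights, sample_size, rng)
        # in_sample_order=True: use the first relay drawn (the proposed fix).
        # in_sample_order=False: pick uniformly from the sample (toy model of the issue).
        primary = sample[0] if in_sample_order else rng.choice(sample)
        counts[primary] += 1
    return {name: counts[name] / trials for name in weights}


if __name__ == "__main__":
    # Hypothetical network: one relay with 90% of the weight, ten with 1% each.
    toy = {"big": 90.0, **{f"small{i}": 1.0 for i in range(10)}}
    for in_order in (True, False):
        freqs = first_guard_frequencies(toy, 5, 20000, in_order)
        print("in order" if in_order else "uniform ", round(freqs["big"], 3))
```

In this toy network the big relay should be first guard roughly 90% of the time when using sampled order, matching its weight, but only about 20% of the time with the uniform pick, with the small relays absorbing the difference.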
One of the tricky features of the prop#271 guard selection design is that it won't just keep on choosing guards if many are unreachable, but rather it will stop after a while, so a bad ISP can't totally control what guard you pick. I think that feature is left untouched by your design change, since we're choosing from among only the same set as before, just in a different order. But please think about whether that is true.
> We have created a patch implementing this fix for the case affecting our experiments, which would improve the current situation. We are further suggesting that Tor apply the technique throughout the guard-selection logic.
Can you help us make sure we think of all the places you've already thought of? :)
> We believe that this issue has only a limited effect on Tor currently due to the relatively large number of guards.
That makes sense to me too -- most of the guards that end up in the sample of 20 will be fast, so choosing uniformly from them will often give you a fast one.
> The design also reduces Tor's security by increasing the number of clients that an adversary running small relays can observe. In addition, an adversary has to wait less time than it should after it starts a malicious guard to be chosen by a client. This weakness occurs because the malicious guard only needs to enter the sampled list to have a chance to be chosen as primary, rather than having to wait until all previously-sampled guards have already expired.
This part makes me wonder about another angle to this problem: proper load balancing when we choose our guards on one date but then make decisions about them on a different date.
For example, if we sample all these guards on day 0, and then use the first guard for a week, and then move to the second guard... but the weights have changed in that time... what will that do to our load balancing? One extreme case would be a relay that has a really high weight for a while, and then later turns out to have much lower bandwidth. It gets into a bunch of guard lists at first (but mostly not #1 since that's how the probabilities work), and then slowly clients shift load to it as their #1 guard goes away.
In an ideal world we would want to take current guard weights into account when we're shifting from one guard to the next, rather than making that decision much earlier, before we actually turn out to need the guards. Maybe that argues for delaying more of the decisions?
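A small worked example of the gap, as a sketch only: the relay names and weights are invented, and `next_guard_probability` is a hypothetical helper, not anything in Tor. It compares the chance that a formerly heavy relay becomes a client's second guard when the order was fixed on day 0 against the chance when the draw is delayed until the first guard goes away.

```python
def next_guard_probability(weights, target, excluded):
    """P(the next weighted draw is `target`) once `excluded` relays are removed."""
    if target in excluded:
        return 0.0
    total = sum(w for name, w in weights.items() if name not in excluded)
    return weights[target] / total


# Hypothetical day-0 consensus: "flaky" claims 20% of the guard weight.
day0 = {"flaky": 20.0, **{f"r{i}": 1.0 for i in range(80)}}
# Hypothetical later consensus: "flaky" turns out to be an ordinary relay.
later = dict(day0, flaky=1.0)

# The client's first guard was r0, and it has now gone away.
# Early binding: the second guard was already drawn under day-0 weights.
early = next_guard_probability(day0, "flaky", excluded={"r0"})
# Late binding: the second guard is drawn only now, under current weights.
late = next_guard_probability(later, "flaky", excluded={"r0"})

print(f"early binding: {early:.4f}, late binding: {late:.4f}")
```

With these made-up numbers the early-bound order sends about 20% of such clients to the relay (20/99), while a draw against current weights would send about 1% (1/80).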
Note that this question is about yet another improvement that could be made to the guard part of path selection, and I think it's orthogonal to the improvement you are proposing.
Thanks,
--Roger