Hey,
This algorithm keeps track of the unreachability status for guards in state private to the algorithm - this is re-initialized every time START is called.
Hmm, didn't we decide to persist the unreachability status across runs? Or not?
Yeah, I think we did decide to persist it between runs, but not more permanently. I've changed it now.
SAMPLED_UTOPIC_GUARDS This is a set that contains all guards that should be considered for connection under utopic conditions. This set should be persisted between runs. It will be filled in by the algorithm if it's empty, or if it contains fewer than SAMPLE_SET_THRESHOLD guards after winnowing out older guards. It should be filled by using NEXT_BY_BANDWIDTH with UTOPIC_GUARDS as an argument.
Should we use UTOPIC_GUARDS or REMAINING_UTOPIC_GUARDS as the argument?
It should be UTOPIC_GUARDS, since REMAINING_UTOPIC_GUARDS will always be a subset of SAMPLED_UTOPIC_GUARDS.
I guess you mean SAMPLED_DYSTOPIC_GUARDS.
Yep, thanks. Fixed.
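A rough sketch of the fill step for SAMPLED_UTOPIC_GUARDS, to check we agree on the semantics. It assumes NEXT_BY_BANDWIDTH means "draw one guard at random, weighted by bandwidth"; `next_by_bandwidth` and `fill_sampled_utopic` are hypothetical names, guards are plain strings, and the winnowing of older guards is left out.

```python
import random
from typing import Dict, Set


def next_by_bandwidth(candidates: Dict[str, int], rng: random.Random) -> str:
    """Hypothetical stand-in for NEXT_BY_BANDWIDTH: one bandwidth-weighted draw."""
    names = list(candidates)
    weights = [candidates[n] for n in names]
    return rng.choices(names, weights=weights, k=1)[0]


def fill_sampled_utopic(sampled: Set[str], utopic_guards: Dict[str, int],
                        threshold: int, rng: random.Random) -> Set[str]:
    """Top up SAMPLED_UTOPIC_GUARDS from UTOPIC_GUARDS until it holds `threshold` guards."""
    result = set(sampled)
    # Draw only from UTOPIC_GUARDS that are not sampled yet, so no duplicates are added.
    remaining = {g: bw for g, bw in utopic_guards.items() if g not in result}
    while len(result) < threshold and remaining:
        guard = next_by_bandwidth(remaining, rng)
        result.add(guard)
        del remaining[guard]
    return result
```

Drawing from UTOPIC_GUARDS (not REMAINING_UTOPIC_GUARDS) matters here: the remaining set is derived from the sampled set, so it could never contribute new guards.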
REMAINING_UTOPIC_GUARDS This is a running set of the utopic guards we have not yet tried to connect to. It should be initialized to be SAMPLED_UTOPIC_GUARDS without USED_GUARDS.
Maybe here we should also mention that we will reinsert guards that we have not tried in a long time (GUARDS_RETRY_TIME), as specified in §2.2.2?
Yep, good clarification. I've added that.
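For reference, a minimal sketch of how REMAINING_UTOPIC_GUARDS could be initialized and topped up with guards whose unreachable mark is older than GUARDS_RETRY_TIME. It assumes the unreachability status is kept as a hypothetical `unreachable_since` map (guard to timestamp in seconds, tracking only sampled utopic guards) that persists between runs, and that GUARDS_RETRY_TIME is in minutes.

```python
from typing import Dict, Set


def init_remaining(sampled: Set[str], used: Set[str]) -> Set[str]:
    """REMAINING_UTOPIC_GUARDS starts as SAMPLED_UTOPIC_GUARDS without USED_GUARDS."""
    return sampled - used


def reinsert_stale(remaining: Set[str], unreachable_since: Dict[str, float],
                   now: float, retry_time_minutes: float) -> Set[str]:
    """Add back guards marked unreachable more than GUARDS_RETRY_TIME minutes ago."""
    cutoff = retry_time_minutes * 60  # timestamps are assumed to be in seconds
    stale = {g for g, t in unreachable_since.items() if now - t > cutoff}
    return remaining | stale
```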
[XXX Defining "was not possible to connect" as "entry is not live", according to the current definition of "live entry guard" in the tor source code, seems to improve the success rate in the flaky network scenario. See: https://github.com/twstrike/tor_guardsim/issues/1#issuecomment-187374942]
Hmm, I'm not sure what this XXX means exactly. I believe we should actually try to _connect_ to those primary guards and not just check if we think they are live.
Yeah, I don't know where it comes from either - @rjunior, care to expand on it?
§2.2.2. The STATE_TRY_UTOPIC state
In order to give guards that have been marked as unreachable a chance to come back, add all entries in TRIED_GUARDS that were marked as unreachable more than GUARDS_RETRY_TIME minutes ago back to REMAINING_UTOPIC_GUARDS.
I'm a bit puzzled by this mechanism. Maybe its benefits can be explained a bit more clearly?
When we add guards back to REMAINING_UTOPIC_GUARDS, do we also remove them from TRIED_GUARDS?
Well, TRIED_GUARDS doesn't really do much at the moment. In fact, it might be easier to just remove it. I've done that and it simplifies things as well.
Now that we have persistent SAMPLED_UTOPIC_GUARDS, is this still useful? Won't we have fully populated our SAMPLED_*_GUARDS structures by the time this rule triggers?
Agree, I've removed it. Much nicer and neater now! =D
§2.2.5. ON_NEW_CONSENSUS
First, ensure that all guard profiles are updated with information about whether they were in the newest consensus or not. If not, the guard is considered bad.
Maybe instead of "If not" we could say "If a guard is not included in the newest consensus" to make it a bit clearer.
Good clarification, done.
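A small sketch of the consensus check as now worded. `GuardProfile` and `on_new_consensus` are hypothetical names, and it assumes a guard that shows up again in a later consensus goes back to being non-bad (which the reinsertion discussion below seems to rely on).

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class GuardProfile:
    fingerprint: str
    bad: bool = False


def on_new_consensus(profiles: Dict[str, GuardProfile], consensus: Iterable[str]) -> None:
    """Mark every guard not included in the newest consensus as bad."""
    listed = set(consensus)
    for fp, profile in profiles.items():
        # Missing from the consensus -> bad; listed again -> non-bad (assumed).
        profile.bad = fp not in listed
```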
[XXX Does "add it back in the place it should have been in PRIMARY_GUARDS if it had been non-bad" imply keeping the original order?]
If I understand correctly, I think the answer to this XXX is "Ideally, yes."
Yes, that is definitely the answer.
I'm curious to see how this mechanism will be implemented because it's important and it would be nice if it's done cleanly.
I can see a few different ways to do it easily. One of them would be to just rerun the original primary guard selection algorithm until we find the guard we want to insert.
Also, we should be careful about when we count 'bad' guards. After a few weeks of operation, the USED_GUARDS list can accumulate multiple bad guards, and we should make sure we don't count them when we do our threshold checks.
Absolutely.
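To make the counting point concrete, a tiny sketch (hypothetical helper names) of a threshold check that skips bad guards in USED_GUARDS. With a naive `len(used_guards)`, the example below would look like it has 4 guards and never trigger.

```python
from typing import Iterable, Set


def count_usable(used_guards: Iterable[str], bad: Set[str]) -> int:
    """Count only the guards in USED_GUARDS that are not currently bad."""
    return sum(1 for g in used_guards if g not in bad)


def below_threshold(used_guards: Iterable[str], bad: Set[str], threshold: int) -> bool:
    return count_usable(used_guards, bad) < threshold
```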
Just a reminder that we also discussed adding the "Retry primary guards if we have looped over the whole guardlist" heuristic somewhere here, because in many cases the network can go down and then come back up in less than a minute.
Actually, that retry heuristic is there. Or maybe I misunderstand the point.
IIUC, if the guard is not in USED_GUARDS it should be added *last* (that is, with lowest priority).
Yep, added that.
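One possible clean implementation of the reinsertion rule, as a sketch only. It assumes PRIMARY_GUARDS is ordered consistently with USED_GUARDS, so a guard's "original place" is its rank in USED_GUARDS. This is a different approach from rerunning the primary guard selection. `reinsert_primary` is a hypothetical name.

```python
from typing import List


def reinsert_primary(primary: List[str], guard: str, used: List[str]) -> List[str]:
    """Return PRIMARY_GUARDS with `guard` restored to its original position.

    A guard found in USED_GUARDS goes back in front of the first primary guard
    ranked after it; a guard not in USED_GUARDS is appended with lowest priority.
    """
    result = list(primary)
    if guard in result:
        return result
    if guard not in used:
        result.append(guard)
        return result
    rank = {g: i for i, g in enumerate(used)}
    target = rank[guard]
    for i, g in enumerate(result):
        # Primary guards that are not in USED_GUARDS rank after every used guard.
        if rank.get(g, len(used)) > target:
            result.insert(i, guard)
            return result
    result.append(guard)
    return result
```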
We should decide if we want to actually use a dynamic percentage here, or just set the threshold to a constant value.
A dynamic percentage might give us better security and reachability as the network evolves, but might also cause unpredictable behaviors if we suddenly get too many guards or too many of them disappear.
I don't have a strong opinion here.
Me neither. I think a percentage is a good starting point - it feels easier to tweak in different ways.
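To illustrate the trade-off, a sketch of a percentage-based threshold with an optional floor. The floor is one hypothetical way to limit the "too many of them disappear" case and is not in the proposal. The 2% used in the checks is an arbitrary example value, not a proposed one.

```python
import math


def sample_threshold(num_guards: int, pct: float, floor: int = 0) -> int:
    """pct percent of the current guard count, rounded up, never below `floor`."""
    return max(floor, math.ceil(num_guards * pct / 100))
```

A constant threshold stays fixed however the network changes. A percentage scales with it (2% of 1000 guards gives 20, of 2000 gives 40), which is exactly where the unpredictability comes from.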
It seems to me that the value 20 here could be reduced to something like 5 or even less. Of course, 5 is also an arbitrary value, and to actually find the "best" number here we should test the algorithm ourselves in various network types.
Arbitrarily changed to 5. =)
Cheers