On Thu, Apr 21, 2016 at 4:32 AM, George Kadianakis <desnacked@riseup.net> wrote:

Fan Jiang <fanjiang@thoughtworks.com> writes:

> [ text/plain ]

> Hi,
>
>> Hello Fan and team,
>>
>> I think I'm not a big fan of the pending_guard and pending_dir_guard
>> concept. To me it seems like a quick hack that tries to address fundamental
>> issues with our algorithm that appeared when we tried to adapt the
>> proposal to
>> the tor codebase.
>>
>>
> Yeah agree, this pending_guard hack was trying to avoid some implementation
> problem, we need to redesign.
> I haven't got any good idea about this, that will be nice if you already
> got some thoughts.
>
>
>> I think one of the main issues with the current algorithm structure is that
>> _one run of the algorithm_ can be asked to _setup multiple circuits_, and
>> each
>> of those circuits has different requirements for guards. That is, since we
>> do
>> filtering on START based on the requirements of circuit #1, this means
>> that any
>> other circuits that appear before END is called, also have to adapt to the
>> requirements of circuit #1. This is obvious in the code since we use
>> guard_selection->for_directory throughout the whole algorithm run, even
>> though
>> for_directory was just the restriction of circuit #1.
>>
>> Specifically about the pending_guard trick, I feel that it interacts in
>> unpredictable ways with other features of the algorithm. For example,
>> consider
>> how it interacts with the primary guards heuristic. It could be time for
>> the
>> algorithm to reenter the primary guards state and retry the top guards in
>> the
>> list, but because of the pending_guard thing we actually return the 15th
>> guard
>> to the circuit.
>>
>> IMO we should revisit the algorithm so that one run of the algorithm can
>> accomodate multiple circuits by design and without the need for hacks.
>> Here is
>> an idea towards that direction:
>>
>> I think one very important change that we can do to simplify things is to
>> remove the need to filter guards based on whether they are dirguards,
>> fast,
>> or stable. My suggestion here would be to *only* consider guards that are
>> dirguards _and_ fast _and_ stable. This way, any circuit that appears
>> will be
>> happy with all the guards in our list and there is no need to do the
>> pending_dir_guard trick. See [0] on why I think that's safe to do.
>>
>> This is easy to do in the current codebase. You just need to call
>> entry_is_live() with need_uptime, need_capacity and for_directory all
>> enabled (instead of flags being 0).
>>
>> If you do the above, your sampled guard set will be able to accomodate
>> any
>> circuit that comes its way and that should simplify logic considerably.
>>
>>
> Sounds great, that can simplify the logic a lot, I've done the change, no
> more pending_dir_guard.
>

Hm. Can't you also remove pending_guard? What's the point of it now?

Pending guard was actually for sampled_guard, say when we are in SAMPLED_GUARDS_STATE

we don't wan't to make the algo return a new random guard picked_by_banwith, we want to keep the same one

until the callback of first hop comes. We can make it specific for sampled_guards after checking the state.

Does this make sense?

BTW, looking at your e54551adbfd5be4bee795df10f925b53fc9e730d I suggest you
also use entry_is_live() with the ENTRY_NEED_UPTIME and ENTRY_NEED_CAPACITY
flags always enabled. So that it always returns Stable && Fast guards.

Yes, I have done this change.

We should also look at how ENTRY_ASSUME_REACHABLE and ENTRY_NEED_DESCRIPTOR are
used in the rest of the code, to see if we should enable them or not ourselves.

>> Never saw this before, will look into it.
>
> - There is a memleak on 'extended' in filter_set().
>>
>> In general, I feel that logic in that function is a bit weird. The
>> function
>> is called filter_set() but it can actually end up adding guards to the
>> list.
>> Maybe it can be renamed?
>>
>>
> Split it up to filter_set & expand_set probably can make this clear.
>
> - What's up with the each_remaining_by_bandwidth() function name?
>>
>>
> I guess it should be iterate_remaining_guards_by_bandwidth.
>

Better. Or sample_guard_by_bandwidth() ? Or get_guard_by_bandwidth() ?

IIUC that function is probalistically sampling from the 'guards' set, so it's
not really iterating it.

We are actually pick and removing guards from remaining_set in this function,

And I saw the filter_set used this function wrong which has been fixed,

so maybe `pop` is better than `get`.

Another thing:

Do we still need decide_num_guards to define the n_primary_guards? and it's the only remaining one is using for_directory.

Cheers!

____

Fan Jiang 蒋帆

Amateur Code Chef
Thoughtworks, Inc.
mobile +86-150-9189-3714

skype fan@torchz.net