Re: [tor-dev] Latest state of the guard algorithm proposal (prop259) (April 2016)

21 Apr 2016

      On Thu, Apr 21, 2016 at 4:32 AM, George Kadianakis desnacked@riseup.net
wrote:
...
Fan Jiang fanjiang@thoughtworks.com writes:
...
[ text/plain ]
Hi,
...
Hello Fan and team,
I think I'm not a big fan of the pending_guard and pending_dir_guard
concept. To me it seems like a quick hack that tries to address
fundamental
...
...
issues with our algorithm that appeared when we tried to adapt the
proposal to
the tor codebase.
Yeah agree, this pending_guard hack was trying to avoid some
implementation
...
problem, we need to redesign.
I haven't got any good idea about this, that will be nice if you already
got some thoughts.
...
I think one of the main issues with the current algorithm structure is
that
...
...
_one run of the algorithm_ can be asked to _setup multiple circuits_,
and
...
...
each
of those circuits has different requirements for guards. That is, since
we
...
...
do
filtering on START based on the requirements of circuit #1, this means
that any
other circuits that appear before END is called, also have to adapt to
the
...
...
requirements of circuit #1. This is obvious in the code since we use
guard_selection->for_directory throughout the whole algorithm run, even
though
for_directory was just the restriction of circuit #1.
Specifically about the pending_guard trick, I feel that it interacts in
unpredictable ways with other features of the algorithm. For example,
consider
how it interacts with the primary guards heuristic. It could be time for
the
algorithm to reenter the primary guards state and retry the top guards
in
...
...
the
list, but because of the pending_guard thing we actually return the 15th
guard
to the circuit.
IMO we should revisit the algorithm so that one run of the algorithm can
accomodate multiple circuits by design and without the need for hacks.
Here is
an idea towards that direction:
I think one very important change that we can do to simplify things
is to
...
...
remove the need to filter guards based on whether they are dirguards,
fast,
  or stable. My suggestion here would be to *only* consider guards that
are
...
...
dirguards _and_ fast _and_ stable. This way, any circuit that appears
will be
  happy with all the guards in our list and there is no need to do the
  pending_dir_guard trick. See [0] on why I think that's safe to do.
This is easy to do in the current codebase. You just need to call
  entry_is_live() with need_uptime, need_capacity and for_directory all
  enabled (instead of flags being 0).
If you do the above, your sampled guard set will be able to accomodate
any
  circuit that comes its way and that should simplify logic
considerably.
...
...
Sounds great, that can simplify the logic a lot, I've done the change, no
more pending_dir_guard.
Hm. Can't you also remove pending_guard? What's the point of it now?
Pending guard was actually for sampled_guard, say when we are in
SAMPLED_GUARDS_STATE
we don't wan't to make the algo return a new random guard
picked_by_banwith, we want to keep the same one
until the callback of first hop comes. We can make it specific for
sampled_guards after checking the state.
Does this make sense?
BTW, looking at your e54551adbfd5be4bee795df10f925b53fc9e730d I suggest you
...
also use entry_is_live() with the ENTRY_NEED_UPTIME and ENTRY_NEED_CAPACITY
flags always enabled. So that it always returns Stable && Fast guards.
Yes, I have done this change.
We should also look at how ENTRY_ASSUME_REACHABLE and ENTRY_NEED_DESCRIPTOR
...
are
used in the rest of the code, to see if we should enable them or not
ourselves.
...
...
Never saw this before, will look into it.

There is a memleak on 'extended' in filter_set().

...
In general, I feel that logic in that function is a bit weird. The
function
  is called filter_set() but it can actually end up adding guards to the
list.
  Maybe it can be renamed?
Split it up to filter_set & expand_set probably can make this clear.

What's up with the each_remaining_by_bandwidth() function name?

...
I guess it should be iterate_remaining_guards_by_bandwidth.
Better. Or sample_guard_by_bandwidth() ? Or get_guard_by_bandwidth() ?
IIUC that function is probalistically sampling from the 'guards' set, so
it's
not really iterating it.
We are actually pick and removing guards from remaining_set in this
function,
And I saw the filter_set used this function wrong which has been fixed,
so maybe `pop` is better than `get`.
Another thing:
Do we still need decide_num_guards to define the n_primary_guards? and it's
the only remaining one is using for_directory.
...
Cheers!
-- 
____

Fan Jiang 蒋帆
Amateur Code Chef
Thoughtworks, Inc.
mobile +86-150-9189-3714
skype fan@torchz.net

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

Re: [tor-dev] Latest state of the guard algorithm proposal (prop259) (April 2016)