Re: [tor-dev] Next version of the algorithm

10 Feb 2016

      Ola Bini obini@thoughtworks.com writes:
...
Hi,
Here is the next version of the algorithm - put it in a gist to make
it easier to look at:
  https://gist.github.com/olabini/343da01de8e01491bf5c
Cheers
Thanks for this.
Here we go with another review!
...
The full algorithm is referred to as ALGO_CHOOSE_ENTRY_GUARD. It is divided
into three components, such that the full algorithm is implemented by first
invocing ALGO_CHOOSE_ENTRY_GUARD_START, then repeatedly calling
ALGO_CHOOSE_ENTRY_GUARD_NEXT and finally calling ALGO_CHOOSE_ENTRY_GUARD_END.
That's helpful. Maybe we can also put the pseudocode that you posted in a
previous mail in the appendix to provide more context?
...
ALGO_CHOOSE_ENTRY_GUARD keeps track of unreachable status for guards in state
private to the algorithm - this is initialized every time
ALGO_CHOOSE_ENTRY_GUARD_START is called.
Interesting. That seems like both a bug and a feature in some ways.
It's a security feature because we will try our guard list from the beginning more frequently.
It's a performance "bug" because we have to cycle through all the unreachable
nodes everytime we restart the algorithm, because we forgot they were
unreachable.  If the first multiple guards in your USED_GUARDS are actually
unreachable, then this will delay bootstrap by some time. Consider the case
where you need to make three circuits to connect to a hidden service as a
client (HSDir/IP/RP), so you have to call the algorithm three times in a row.
Of course, if a guard is really unreachable it _should_ be marked as bad within
an hour because it won't be listed in the next consensus. While this makes
sense, I wonder why my laptop guard list (in the state file) has a total of 24
guards, where 18 of them are marked as unreachable and only 6 of them are
marked as bad. Maybe they were all marked unreachable when the internet was
down. I wonder if this influences the performance of the algorithm.
It would be nice to know if the security/performance tradeoff here is
acceptable. Simulations might help, or we will have to learn it the hard way
when we implement the algorithm and try it out in various types of networks.
...
ALGO_CHOOSE_ENTRY_GUARD_START: (arguments: USED_GUARDS, EXCLUDE_NODES, N_PRIMARY_GUARDS)
    If selecting directory guards, GUARDS is all guards from the consensus with the V2Flag, otherwise GUARDS is all guards from the consensus
    Set UTOPIC_GUARDS to be all guards to use under utopic conditions from GUARDS
    Set DYSTOPIC_GUARDS to be all guards to use under dystopic conditions from GUARDS
    Set REMAINING_UTOPIC_GUARDS to be UTOPIC_GUARDS without EXCLUDE_NODES and USED_GUARDS
    Set REMAINING_DYSTOPIC_GUARDS to be DYSTOPIC_GUARDS without EXCLUDE_NODES and USED_GUARDS
    Create a list of PRIMARY_GUARDS that contain N_PRIMARY_GUARDS that are not bad by:
        Taking the next entry from USED_GUARDS
        If USED_GUARDS is empty:
            randomly select an entry from UTOPIC_GUARDS, weighted by bandwidth
    Set TRIED_GUARDS to be an empty set
    Set TRIED_DYSTOPIC_GUARDS to be an empty set
    Set state = STATE_PRIMARY_GUARDS

ALGO_CHOOSE_ENTRY_GUARD_NEXT:

    If a new consensus has arrived:
        Update all guard profiles with new bad/non-bad information
        If any PRIMARY_GUARDS have become bad:
            re-add to the list of PRIMARY_GUARDS using the same procedure

And also remove the primary guard that has become bad, right?
...
        If any USED_GUARDS have become non-bad:
            add it back to PRIMARY_GUARDS at the place it would have been if it was non-bad when running ALGO_CHOOSE_ENTRY_GUARD_START. If this results in PRIMARY_GUARDS being larger than N_PRIMARY_GUARDS, remove from the end of the list until the list is N_PRIMARY_GUARDS long

How do we know that a guard in USED_GUARDS that just became non-bad deserves to
be a primary guard? Even if it's the first guard in the USED_GUARDS list, how
do we know whether it has priority over the last guard of the PRIMARY_GUARDS
list?  Do we need to keep state of previous PRIMARY_GUARDS sets?
...
        Ensure that UTOPIC_GUARDS and DYSTOPIC_GUARDS are updated with the changes from the consensus

Isn't this implicit in "Update all guard profiles with new bad/non-bad information"?
Maybe these can be merged into a single step that explicitly mentions the guardlists that need to be updated.
...
    If it was at least 3 minutes since we tried the primary guards and we are not in STATE_PRIMARY_GUARDS:
        save previous state
        set state = STATE_PRIMARY_GUARDS

    STATE_PRIMARY_GUARDS:
        return each entry in PRIMARY_GUARDS in turn
            for each entry, if algorithm doesn't terminate
                mark entry as "unreachable"
                add entry to TRIED_GUARDS
                if the number of entries in TRIED_GUARDS that were tried within GUARDS_TRY_THRESHOLD_TIME is larger than GUARDS_TRY_THRESHOLD, return failure from ALGO_CHOOSE_ENTRY_GUARD.
        restore previous state (or STATE_TRY_UTOPIC if no previous)

    STATE_TRY_UTOPIC:
        for each entry in TRIED_GUARDS that was marked as unreachable more than 20 minutes ago
            add it back to REMAINING_UTOPIC_GUARDS
        return each remaining entry from USED_GUARDS in turn

What does "remaining" means here? Is it the next USED_GUARDS entry that has not
been marked as unreachable?
...
            for each entry, if algorithm doesn't terminate
                mark entry as "unreachable"
                add entry to TRIED_GUARDS
                if the number of entries in TRIED_GUARDS that were tried within GUARDS_TRY_THRESHOLD_TIME is larger than GUARDS_TRY_THRESHOLD, return failure from ALGO_CHOOSE_ENTRY_GUARD.
                if the number of entries in TRIED_GUARDS is larger than a GUARDS_FAILOVER_THRESHOLD proportion of UTOPIC_GUARDS, set state = STATE_TRY_DYSTOPIC

I wonder if returning failure and exiting the algorithm is the right thing to do in this case.
Basically, we just connected to too many different guards and we want to stop
exposing ourselves to more of the network. This is something that can happen
during an attack, but also something that happens naturally all the time when
the network is down. So we can't just have Tor exit().
Instead, we probably want to keep our guardlist and continue connecting to the
guards we already have till the network comes up again.
If we return failure here and exit the algorithm, then when we restart the
algorithm we will have forgetten the TRIED_GUARDS list and hence all the guards
that we tried in our previous run. Then we will attempt to connect to
GUARDS_TRY_THRESHOLD new guards again. Which is probably exactly what the
adversary wants.
So maybe failure is not an option here, and instead we should massage our
current guardlist and start from the beginning without adding more guards?
What do you think?
...
        return each entry from REMAINING_UTOPIC_GUARDS randomly selected, weighted by bandwidth
            remove the returned entry from REMAINING_UTOPIC_GUARDS
            for each entry, if algorithm doesn't terminate
                mark entry as "unreachable"
                add entry to TRIED_GUARDS
                if the number of entries in TRIED_GUARDS that were tried within GUARDS_TRY_THRESHOLD_TIME is larger than GUARDS_TRY_THRESHOLD, return failure from ALGO_CHOOSE_ENTRY_GUARD.
                if the number of entries in TRIED_GUARDS is larger than a GUARDS_FAILOVER_THRESHOLD proportion of UTOPIC_GUARDS, set state = STATE_TRY_DYSTOPIC

Also, I'm not sure I understand the precise security properties that the above two
checks provide us with. Are both checks needed? Could you explain them to me please?
...
    STATE_TRY_DYSTOPIC:
        for each entry in TRIED_DYSTOPIC_GUARDS that was marked as unreachable more than 20 minutes ago
            add it back to REMAINING_DYSTOPIC_GUARDS
        return each remaining DYSTOPIC entry from USED_GUARDS in turn
            for each entry, if algorithm doesn't terminate
                mark entry as "unreachable"
                add entry to TRIED_DYSTOPIC_GUARDS
                if the number of entries in TRIED_GUARDS+TRIED_DYSTOPIC_GUARDS that were tried within GUARDS_TRY_THRESHOLD_TIME is larger than GUARDS_TRY_THRESHOLD, return failure from ALGO_CHOOSE_ENTRY_GUARD.
                if the number of entries in TRIED_DYSTOPIC_GUARDS is larger than a GUARDS_FAILOVER_THRESHOLD proportion of DYSTOPIC_GUARDS:
                    mark all guards in PRIMARY_GUARDS, TRIED_GUARDS and TRIED_DYSTOPIC_GUARDS as not "unreachable"
                    return failure from ALGO_CHOOSE_ENTRY_GUARD.
        return each entry from REMAINING_DYSTOPIC_GUARDS randomly selected, weighted by bandwidth
            remove the returned entry from REMAINING_DYSTOPIC_GUARDS
            for each entry, if algorithm doesn't terminate
                mark entry as "unreachable"
                add entry to TRIED_DYSTOPIC_GUARDS
                if the number of entries in TRIED_GUARDS+TRIED_DYSTOPIC_GUARDS that were tried within GUARDS_TRY_THRESHOLD_TIME is larger than GUARDS_TRY_THRESHOLD, return failure from ALGO_CHOOSE_ENTRY_GUARD.
                if the number of entries in TRIED_DYSTOPIC_GUARDS is larger than a GUARDS_FAILOVER_THRESHOLD proportion of DYSTOPIC_GUARDS:
                    return failure from ALGO_CHOOSE_ENTRY_GUARD.

ALGO_CHOOSE_ENTRY_GUARD_END:
    If circuit is set up correctly, let algorithm know
        Algorithm marks the guard chosen as used and makes sure it is in USED_GUARDS

Maybe instead of "makes sure it is in" we should say "adds it in"?
It has not been added in a previous step from what I see.

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

Re: [tor-dev] Next version of the algorithm