Hello,
For your reviewing pleasure, may I present you with a (draft) proposal for a new entry guard selection algorithm and associated data structures (with reference implementations!). [0] The bulk of it is based upon George Kadianakis' recent email and pseudocode. [1] [2] There are still a few XXXs which need elucidation and/or comment.
Relevant tickets include:

  https://bugs.torproject.org/17261
  https://bugs.torproject.org/17262
  https://bugs.torproject.org/16120
  https://bugs.torproject.org/12466
  https://bugs.torproject.org/12450
  https://bugs.torproject.org/12595
Assistance with the completion of #17262 (due 31 Oct) would be greatly appreciated, since my plan for tomorrow is to work with some of the cryptographers here to review my changes to the crypto in rBridge for #7520.
[0]: https://gitweb.torproject.org/user/isis/torspec.git/commit/?h=bug17261-best-...
[1]: https://lists.torproject.org/pipermail/tor-dev/2015-August/009328.html
[2]: https://gitweb.torproject.org/user/asn/tor.git/tree/src/or/guardlist.c?h=bug...
Best regards,
On Thu, Oct 29, 2015 at 5:01 PM, isis isis@torproject.org wrote:
Woo, this is now proposal 259.
Hello Isis,
thanks for the proposal. Looks good!
I think adding the logic for greater reachability of people behind FascistFirewalls makes sense and will be very helpful.
Some comments on the proposal inlined below:
----
Filename: xxx-guard-selection.txt
Title: New Guard Selection Behaviour
Author: Isis Lovecruft, George Kadianakis
Thanks for adding me as a co-author! Would also be great if you could link to my original post for reference:
https://lists.torproject.org/pipermail/tor-dev/2015-August/009328.html
Created: 2015-10-28
Status: Draft
Extends: 241-suspicious-guard-turnover.txt
<snip>
§2. Design
Alice, an OP attempting to connect to the Tor network, should undertake the following steps to determine information about the local network and to select (some) appropriate entry guards. In the following scenario, it is assumed that Alice has already obtained a recent, valid, and verifiable consensus document.
Before attempting the guard selection procedure, Alice initialises the guard data structures and prepopulates the guardlist structures, including the UTOPIC_GUARDLIST and DYSTOPIC_GUARDLIST (cf. §XXX). Additionally, the structures have been designed to make updates efficient in terms of both memory and time, so that these and other portions of the code which require an up-to-date guard structure can obtain one.
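For concreteness, here is a minimal Python sketch of such a guardlist structure, with the prepopulation, update, insert, and remove operations that §3.2 later calls for (the class and method names are hypothetical, not part of the proposal):

    class GuardList(object):
        """Hypothetical guardlist wrapper; real state would live in the OP's
        guard state file and be backed by the hashring described in §3.2."""

        def __init__(self):
            self.guards = {}   # fingerprint -> router entry from the consensus

        def prepopulate(self, consensus_guards):
            """Build the initial structure from the Guard-flagged routers in a
            fresh consensus (an iterable of (fingerprint, router) pairs)."""
            for fingerprint, router in consensus_guards:
                self.insert(fingerprint, router)

        def insert(self, fingerprint, router):
            self.guards[fingerprint] = router

        def remove(self, fingerprint):
            self.guards.pop(fingerprint, None)

        def update(self, consensus_guards):
            """Reconcile with a newer consensus: drop routers which fell out
            of it, insert those which newly appeared, and keep everything
            else as-is, so that updates stay cheap in memory and time."""
            newest = dict(consensus_guards)
            for fingerprint in list(self.guards):
                if fingerprint not in newest:
                    self.remove(fingerprint)
            for fingerprint, router in newest.items():
                if fingerprint not in self.guards:
                    self.insert(fingerprint, router)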
0. Determine if the local network is potentially accessible.

   Alice should attempt to discover if the local network is up or down, based
   upon information such as the availability of network interfaces and
   configured routing tables. See #16120. [0]

   [XXX: This section needs to be fleshed out more. I'm ignoring it for now,
    but since others have expressed interest in doing this, I've added this
    preliminary step. —isis]

1. Check that we have not already attempted to add too many guards
   (cf. proposal #241).
I'd suggest you extract all the functionality and variables you need from prop#241 and incorporate them to your proposal. Otherwise it's not immediately obvious how the two proposals interact, and which parts of #prop241 are still active.
Personally, I think I only used CANDIDATE_THRESHOLD from #prop241 which I renamed to GUARDS_ATTEMPTED_THRESHOLD.
2. If the PRIMARY_GUARDS on our list are marked offline, the algorithm
   attempts to retry them, to ensure that they were not flagged offline
   erroneously when the network was down. This retry attempt happens only
   once every 20 mins to avoid infinite loops.

   [Should we do an exponential decay on the retry as s7r suggested? —isis]

3. Take the list of all available and fitting entry guards and return the
   top one in the list.

4. If there were no available entry guards, the algorithm adds a new entry
   guard and returns it. [XXX detail what "adding" means]

5. Go through steps 1-4 above, using the UTOPIC_GUARDLIST.

   5.a. When the GUARDLIST_FAILOVER_THRESHOLD of the UTOPIC_GUARDLIST has
        been tried (without success), Alice should begin trying steps 1-4
        with entry guards from the DYSTOPIC_GUARDLIST as well. Further, if
        no nodes from the UTOPIC_GUARDLIST work, and it appears that the
        DYSTOPIC_GUARDLIST nodes are accessible, Alice should make a note to
        herself that she is possibly behind a fascist firewall.

   5.b. If no nodes from either the UTOPIC_GUARDLIST or the
        DYSTOPIC_GUARDLIST are working, Alice should make a note to herself
        that the network has potentially gone down. Alice should then
        schedule, at exponentially decaying times, to rerun steps 0-5.

        [XXX Should we do step 0? Or just 1-4? Should we retain any previous
         assumptions about FascistFirewall? —isis]

6. [XXX Insert potential other fallback mechanisms, e.g. switching to using
    bridges? —isis]
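As a rough illustration of steps 1-5 (not the reference implementation: the constants, the dict fields, and the `state` bookkeeping below are all assumptions made for the sketch):

    import time

    GUARDS_ATTEMPTED_THRESHOLD = 80     # step 1 (cf. proposal #241)
    PRIMARY_RETRY_INTERVAL = 20 * 60    # step 2: retry offline primaries every 20 mins

    def choose_entry_guard(utopic_guardlist, dystopic_guardlist, state):
        """Each guard is a dict with 'id', 'offline', and 'primary' fields."""
        for name, guardlist in (("utopic", utopic_guardlist),
                                ("dystopic", dystopic_guardlist)):
            # Step 1: give up if we have already attempted too many guards.
            if len(state["attempted"]) >= GUARDS_ATTEMPTED_THRESHOLD:
                break
            # Step 2: once every 20 minutes, retry primaries marked offline.
            now = time.time()
            if now - state.get("last_primary_retry", 0) > PRIMARY_RETRY_INTERVAL:
                state["last_primary_retry"] = now
                for guard in guardlist:
                    if guard["primary"]:
                        guard["offline"] = False
            # Steps 3-4: return the top guard not marked offline, if any
            # (adding a brand-new guard when the list is empty is not shown).
            for guard in guardlist:
                if not guard["offline"]:
                    state["attempted"].add(guard["id"])
                    return guard
            # Step 5.a: the whole utopic list failed; note a possible fascist
            # firewall and fall through to the dystopic list.
            if name == "utopic":
                state["maybe_fascist_firewall"] = True
        # Step 5.b: neither list worked; note that the network may be down and
        # schedule a rerun at exponentially decaying times (not shown).
        state["maybe_network_down"] = True
        return None

    # Minimal state needed by the sketch:  state = {"attempted": set()}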
§3. New Data Structures, Consensus Parameters, & Configurable Variables
§3.1. Consensus Parameters & Configurable Variables
Variables marked with an asterisk (*) SHOULD be consensus parameters.

DYSTOPIC_GUARDS ¹
    All nodes listed in the most recent consensus which are marked with the
    Guard flag and which advertise their ORPort(s) on 80, 443, or any other
    addresses and/or ports controllable via the FirewallPorts and
    ReachableAddresses configuration options.

UTOPIC_GUARDS
    All nodes listed in the most recent consensus which are marked with the
    Guard flag and which do NOT advertise their ORPort(s) on 80, 443, or any
    other addresses and/or ports controllable via the FirewallPorts and
    ReachableAddresses configuration options.
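A small sketch of that partition, assuming `consensus_guards` is an iterable of (fingerprint, ORPort) pairs for Guard-flagged relays and that FirewallPorts defaults to 80 and 443 (function and variable names are illustrative only):

    def partition_guards(consensus_guards, firewall_ports=(80, 443)):
        """Split Guard-flagged relays into the two disjoint sets above."""
        utopic, dystopic = [], []
        for fingerprint, orport in consensus_guards:
            if orport in firewall_ports:
                dystopic.append(fingerprint)   # reachable through a FascistFirewall
            else:
                utopic.append(fingerprint)
        return utopic, dystopic

    # Example: a guard on 443 lands in DYSTOPIC_GUARDS, one on 9001 in UTOPIC_GUARDS.
    utopic_guards, dystopic_guards = partition_guards([("AAAA", 9001), ("BBBB", 443)])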
It's interesting that these two sets DYSTOPIC_GUARDS and UTOPIC_GUARDS are disjoint. This means that there will be no 80/443 relays on the UTOPIC guardlist. This means that 80/443 guards will only be used by people under FascistFirewall; which makes them a bit like bridges in a way, and also has significant effects on our load balancing.
Are we sure we want these two sets to be disjoint?
I could imagine an alternative design where we still have the two guard lists, but we also allow 80/443 relays to be in UTOPIC_GUARDS. If you were lucky and you had 80/443 guards in your UTOPIC guard list you don't need to go down to the DYSTOPIC guard list. Or something like this.
PRIMARY_GUARDS *
    The number of the first, active guards on either the UTOPIC_GUARDLIST or
    DYSTOPIC_GUARDLIST which we treat as "primary". We will go to extra
    lengths to ensure that we connect to one of our primary guards before we
    fall back to a lower priority guard. By "active" we mean that we only
    consider guards that are present in the latest consensus as primary.

UTOPIC_GUARDS_ATTEMPTED_THRESHOLD *
DYSTOPIC_GUARDS_ATTEMPTED_THRESHOLD *
    These thresholds limit the number of guards from UTOPIC_GUARDS and
    DYSTOPIC_GUARDS which should be partitioned into a single
    UTOPIC_GUARDLIST or DYSTOPIC_GUARDLIST respectively. Thus, this
    represents the maximum percentage of each of UTOPIC_GUARDS and
    DYSTOPIC_GUARDS respectively which we will attempt to connect to. If
    this threshold is hit, we assume that we are offline, filtered, or under
    a path bias attack by a LAN adversary.

    There are currently 1600 guards in the network. We allow the user to
    attempt 80 of them before failing (5% of the guards). With regards to
    filternet reachability, there are 450 guards on ports 80 or 443, so the
    probability of picking such a guard guard here should be high.
oops "guard guard"
This logic is not based on bandwidth, but rather on the number of relays which possess the Guard flag. This is for three reasons: First, because each possible *_GUARDLIST is roughly equivalent to others of the same category in terms of bandwidth, it should be unlikely [XXX How unlikely? —isis] for an OP to select a guardset which contains fewer high-bandwidth nodes (or vice versa). Second, the path-bias attacks detailed in proposal #241 are best mitigated through limiting the number of possible entry guards which an OP might attempt to use, and varying the level of security an OP can expect based solely upon the fact that the OP picked a higher number of low-bandwidth entry guards rather than a lower number of high-bandwidth entry guards seems like a rather cruel and unusual punishment in addition to the misfortune of already having slower entry guards. Third, we favour simplicity in the redesign of the guard selection algorithm, and introducing bandwidth weight fraction computations seems like an excellent way to overcomplicate the design and implementation.
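A tiny worked example of the count-based threshold, using the figures quoted above (1600 Guard-flagged relays and a 5% threshold; the variable names are illustrative):

    n_guards = 1600                     # Guard-flagged relays in the consensus
    attempted_threshold = 0.05          # *_GUARDS_ATTEMPTED_THRESHOLD as a fraction
    max_attempts = int(n_guards * attempted_threshold)   # = 80 guards before failing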
§3.2. Data Structures
UTOPIC_GUARDLIST
DYSTOPIC_GUARDLIST
    These lists consist of a subset of UTOPIC_GUARDS and DYSTOPIC_GUARDS
    respectively. The guards in these guardlists are the only guards to
    which we will attempt connecting.

    When an OP is attempting to connect to the network, she will construct a
    hashring structure containing all potential guard nodes from both
    UTOPIC_GUARDS and DYSTOPIC_GUARDS. The nodes SHOULD BE inserted into the
    structure some number of times proportional to their consensus bandwidth
    weight. From this, the client will hash some information about
    themselves [XXX what info should we use? —isis] and, from that, choose
    #P points on the ring, where #P is the
    {UTOPIC,DYSTOPIC}_GUARDS_ATTEMPTED_THRESHOLD proportion of the total
    number of unique relays inserted (if a duplicate is selected, it is
    discarded). These selected nodes comprise the {UTOPIC,DYSTOPIC}_GUARDLIST
    of (first) entry guards. (We say "first" in order to distinguish between
    entry guards and the vanguards proposed for hidden services in proposal
    #247.)
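A hypothetical sketch of that selection step, assuming the ring is a sorted list of (position, relay) pairs (as in the construction sketched later in this mail), that the client-specific input is an opaque byte string, and that the helper names are made up for illustration:

    import bisect
    import hashlib

    def sample_guardlist(ring, client_secret, n_points):
        """Derive n_points positions on the ring from a client-specific value
        and keep the distinct relays owning them (duplicates are discarded)."""
        positions = [position for position, _ in ring]
        guardlist, seen = [], set()
        for i in range(n_points):
            digest = hashlib.sha256(client_secret + str(i).encode()).digest()
            point = int.from_bytes(digest, "big")
            idx = bisect.bisect_left(positions, point) % len(ring)
            relay = ring[idx][1]
            if relay not in seen:
                seen.add(relay)
                guardlist.append(relay)
        return guardlist

    # #P would be the ATTEMPTED_THRESHOLD fraction of the unique relays, e.g.:
    #   n_points = int(0.05 * number_of_unique_relays)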
I don't entirely understand why we prefer a hash ring over a simple list here for sampling guards. I was imagining that when we need a new guard, we would just put all guards in a list, and sample a random guard weighted by bandwidth. I think this is what the current code is doing in smartlist_choose_node_by_bandwidth_weights() and it seems easy!
[Perhaps we want some better terminology for this. Suggestions welcome. —isis]

Each GUARDLIST SHOULD have the property that the total sum of bandwidth weights for the nodes contained within it is roughly equal to each other guardlist of the same type (i.e. one UTOPIC_GUARDLIST is roughly equivalent in terms of bandwidth to another UTOPIC_GUARDLIST, but not necessarily equivalent to a DYSTOPIC_GUARDLIST).
Why is it important for guard lists to have about the same total bandwidth?
Also, will this be enforced somehow, or is it a result of how the hash ring is designed?
For space and time efficiency reasons, implementations of the GUARDLISTs SHOULD support prepopulation(), update(), insert(), and remove() functions. A second data structure design consideration is
These operations sound good!
that the amount of "shifting" — that is, the differential between constructed hashrings as nodes are inserted or removed (read: ORs falling in and out of the network consensus) — SHOULD be minimised in order to reduce the resources required for hashring update upon receiving a newer consensus.
Why is shifting-resistance important here? I'm not sure what you mean "reduce the resources required for hashring update". This is something that happens about once every hour right?
The implementation we propose is to use a Consistent Hashring, modified to dynamically allocate replications in proportion to fraction of total bandwidth weight. As with a normal Consistent Hashring, the number of replications determines the number of times the relay is inserted into the hashring. The algorithm goes like this:

    router ← ⊥
    key ← 0
    replications ← 0
    bw_weight_total ← 0

    for router ∈ GUARDLIST:
    |   bw_weight_total ← bw_weight_total + BW(router)

    for router ∈ GUARDLIST:
    |   replications ← FLOOR(CONSENSUS_WEIGHT_FRACTION(BW(router), bw_weight_total) * T)
    |   factor ← (S / replications)
    |   while replications != 0:
    |   |   key ← (TOINT(HMAC(ID)[:X]) * replications * factor) mod S
    |   |   INSERT(key, router)
    |   |   replications ← replications - 1

where:

  - BW is a function for extracting the value of an OR's `w bandwidth=`
    weight line from the consensus,
  - GUARDLIST is either UTOPIC_GUARDLIST or DYSTOPIC_GUARDLIST,
  - CONSENSUS_WEIGHT_FRACTION is a function for computing a router's
    consensus weight in relation to the summation of consensus weights
    (bw_weight_total),
  - T is some arbitrary number for translating a router's consensus weight
    fraction into the number of replications,
  - H is some collision-resistant hash digest,
  - S is the total possible hash space of H (e.g. for SHA-1, with digest
    sizes of 160 bits, this would be 2^160),
  - HMAC is a keyed message authentication code which utilises H,
  - ID is a hexadecimal string containing the hash of the router's public
    identity key,
  - X is some (arbitrary) number of bytes to (optionally) truncate the output
    of the HMAC to,
  - A[:X] signifies truncation of A, some array of bytes, to a sub-array
    containing X bytes, starting from the first byte and continuing up to and
    including the Xth byte, such that the returned sub-array is X bytes in
    length,
  - INSERT is an algorithm for inserting items into the hashring,
  - TOINT converts hexadecimal to decimal integers.

For routers A and B, where B has a little bit more bandwidth than A, this gets you a hashring which looks like this:

         B-´¯¯`-BA
        A,`      `.
        /          \
       B|          |B
        \          /
         `.      ,´A
          AB--__--´B

When B disappears, A remains in the same positions:

         _-´¯¯`-_A
        A,`      `.
        /          \
        |          |
        \          /
         `.      ,´A
          A`--__--´

And similarly if B disappears:
Hm, maybe you mean "if A disappears".
         B-´¯¯`-B
         ,`      `.
        /          \
       B|          |B
        \          /
         `.      ,´
          B--__--´B

Thus, no "shifting" problems, and recalculation of the hashring when a new consensus arrives via the update() function is much more time efficient.

Alternatively, for a faster and simpler algorithm, but non-uniform distribution of the keys, one could remove the "factor" and replace the derivation of "key" in the algorithm above with:

    key ← HMAC(ID || replications)[:X]

A reference implementation in Python is available². [1]
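To make the construction concrete, here is a small Python sketch of such a hashring, with SHA-256 standing in for H (so S = 2^256) and T = 100. It follows one possible reading of the key derivation, placing each router's replicas at evenly spaced offsets from its HMAC base point, which is what produces the no-shifting behaviour illustrated by the diagrams; the names and constants are assumptions, not the reference implementation:

    import hashlib
    import hmac

    S = 2 ** 256          # hash space of SHA-256, standing in for H
    T = 100               # scales a weight fraction into a replication count

    def build_hashring(guardlist, key):
        """`guardlist` maps a router's identity digest (hex) to its consensus
        bandwidth weight; returns a sorted list of (position, router_id)."""
        bw_weight_total = sum(guardlist.values())
        ring = []
        for router_id, bw in guardlist.items():
            # FLOOR(CONSENSUS_WEIGHT_FRACTION(BW(router), bw_weight_total) * T),
            # clamped to at least one replica.
            replications = max(1, int((bw / bw_weight_total) * T))
            factor = S // replications
            base = int.from_bytes(
                hmac.new(key, router_id.encode(), hashlib.sha256).digest(), "big")
            for r in range(replications, 0, -1):
                ring.append(((base + r * factor) % S, router_id))
        ring.sort()
        return ring

    # Toy example: B has a little more bandwidth than A, so it gets more
    # replicas.  When B later falls out of the consensus, update() would just
    # delete B's (position, router) pairs; A's replicas stay where they are.
    ring = build_hashring({"A" * 40: 300, "B" * 40: 600}, b"SuperSecureKey")
    ring = [(pos, rid) for pos, rid in ring if rid != "B" * 40]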
§4. Footnotes
¹ "Dystopic" was chosen because those are the guards you should choose from if you're behind a FascistFirewall.
² One tiny caveat being that the ConsistentHashring class doesn't dynamically
  assign replication count by bandwidth weight; it gets initialised with the
  number of replications. However, nothing in the current implementation
  prevents you from doing:

      >>> h = ConsistentHashring('SuperSecureKey', replications=6)
      >>> h.insert(A)
      >>> h.replications = 23
      >>> h.insert(B)
      >>> h.replications = 42
      >>> h.insert(C)
§5. References
Hi isis,
I am also not sure if we should keep the DYSTOPIC_GUARDS and UTOPIC_GUARDS sets disjoint. It hurts the already fragile load balancing for Guards and will cause lighter load on FascistFirewall Guards (ports 80/443). I think users behind such firewalls usually know their condition and act accordingly (torrc option, bridges, etc). I agree with you that we should automate this somehow for the users who don't know, and make sure they try to connect to FascistFirewall Guards (80/443) before Tor gives up.
I suggest having a single guard list, created like this:
1. GUARDS_ATTEMPTED_THRESHOLD - consensus parameter, containing the maximum number of guards we will attempt to connect to. Currently ~5% of the total number of Guards in the consensus: 80.
2. GUARD_LIST - the list of guards we will attempt to connect to. It will contain exactly GUARDS_ATTEMPTED_THRESHOLD guards.
When we build this list, we do it like this: we choose based on weighted bandwidth instead of the number of routers, for better load balancing. All numbers are dynamic and calculated from the consensus. Round to whole numbers if the result contains decimals.
a) DYSTOPIC_GUARDLIST_FRACTION - calculate what percent of the Guard bandwidth (consensus weight) belongs to FascistFirewall Guards (ports 80/443). For a simple example, let's assume the total Guard bandwidth in the last consensus is 10 GB/s and the FascistFirewall Guard bandwidth is 2.8 GB/s = 28%.
b) UTOPIC_GUARDLIST_FRACTION - trivially determine the percent of the non-FascistFirewall Guard bandwidth: 100 - DYSTOPIC_GUARDLIST_FRACTION (28) = 72%.
c) Build the final GUARD_LIST with a maximum length of GUARDS_ATTEMPTED_THRESHOLD:
- 25% totally random (20 routers). Tor will choose these guard candidates randomly, without considering FascistFirewall or non-FascistFirewall Guards.
- the remaining 75% (60 routers):
   -> 28% DYSTOPIC_GUARDLIST (16 routers)
   -> 72% UTOPIC_GUARDLIST (44 routers)
The list cannot contain duplicates.
So, we have a single guard list, and we try the guards in any order (hash ring, weighted by bandwidth).
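For illustration, a sketch of this construction in Python. The fingerprint -> (bandwidth, is_dystopic) input shape and the weighted-sampling helper are assumptions for the sketch; whether the 25% "totally random" slice should itself be bandwidth-weighted is left open in the mail, and the sketch weights it:

    import random

    def weighted_sample(items, weights, k):
        """Pick k distinct items, each draw weighted by bandwidth."""
        pool = list(zip(items, weights))
        chosen = []
        for _ in range(min(k, len(pool))):
            r = random.uniform(0, sum(w for _, w in pool))
            acc = 0.0
            for i, (item, w) in enumerate(pool):
                acc += w
                if r <= acc:
                    chosen.append(item)
                    del pool[i]
                    break
        return chosen

    def build_guard_list(guards, threshold=80, random_share=0.25):
        """`guards` maps fingerprint -> (bandwidth, is_dystopic), is_dystopic
        being True for guards reachable on 80/443.  Quantities follow the
        example above: 80 slots, 20 picked freely, the rest split 28%/72%."""
        fps = list(guards)
        bw = {fp: guards[fp][0] for fp in fps}
        dystopic_fraction = sum(bw[fp] for fp in fps if guards[fp][1]) / sum(bw.values())

        n_random = int(threshold * random_share)                      # 20
        n_dystopic = int((threshold - n_random) * dystopic_fraction)  # 16 when 28%
        n_utopic = threshold - n_random - n_dystopic                  # 44

        guard_list = set(weighted_sample(fps, [bw[fp] for fp in fps], n_random))
        dyst = [fp for fp in fps if guards[fp][1] and fp not in guard_list]
        utop = [fp for fp in fps if not guards[fp][1] and fp not in guard_list]
        guard_list.update(weighted_sample(dyst, [bw[fp] for fp in dyst], n_dystopic))
        guard_list.update(weighted_sample(utop, [bw[fp] for fp in utop], n_utopic))
        return guard_list   # a single list, no duplicates, at most `threshold` entries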
For step 2, we also need a maximum retry amount. Something like:

- try once every 20 minutes, maximum 15 retries.
- after that, try once every 1 hour, maximum 7 retries.
- after that, try once every 6 hours, maximum 3 retries.
- try one last time after 24 hours. Remove the guard permanently from
  PRIMARY_GUARDS if still unavailable.
* Counters should reset after each successful connection and start from 0. If Tor was shut down and the timestamp of the last retry is more than 48 hours old, reset the counters to 0.
This will give us about 2 days worth of retries. Increase the maximum retries if you think we should insist more.
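A compact way to express that schedule (a sketch; the stage boundaries are exactly the ones listed above, roughly 54 hours of retries in total):

    RETRY_SCHEDULE = [
        (20 * 60, 15),     # every 20 minutes, at most 15 retries (~5 hours)
        (60 * 60, 7),      # then once every hour, at most 7 retries
        (6 * 3600, 3),     # then once every 6 hours, at most 3 retries
        (24 * 3600, 1),    # one last try after 24 hours
    ]

    def next_retry_delay(failed_attempts):
        """Seconds to wait before the next retry, or None once the schedule is
        exhausted and the guard should be removed from PRIMARY_GUARDS."""
        for interval, count in RETRY_SCHEDULE:
            if failed_attempts < count:
                return interval
            failed_attempts -= count
        return None

    # A successful connection (or a last retry older than 48 hours at startup)
    # resets failed_attempts to 0, as described above.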
If a Guard has been offline for > 24 hours, it probably won't have the Guard flag when it comes back, so we need to make an exception here and still use it if it was our guard before. Should we get rid of it if the Guard flag is not regained after a reasonable uptime?