During our meeting in Iceland, we talked a lot about guard nodes. Some of that discussion eventually turned into proposal 236 [0].
During our discussions, we looked into Roger's state file, and we noticed that it contained 50 or so guard nodes. That made us wonder: "Why does Roger have so many guards?"
Roger is not the problem in this case; my state file also has many guards. Most people who don't use bridges or hardcoded EntryNodes have shitloads of guards. This post tries to explain why.
So, in memory, Tor keeps an ordered list of entry guards (the global `entry_guards` smartlist in `src/or/entrynodes.c`). This list can be lengthy: it usually contains more than $NumEntryGuards entry guards. You can see this beautiful list just on your right, below that beautiful stalagmite: https://gitweb.torproject.org/tor.git/blob/d064773595f1d0bf1b76dd6f7439bff65...
This happens because, on its first startup, Tor adds $NumEntryGuards nodes to that list. However, if one of them is not Stable and Tor needs to build a Stable circuit, Tor has to append a Stable guard to the list. Similarly, if one of the guards is down, Tor has to compensate for that and append [1] one more guard to the list. The same happens when Tor needs to fetch directory documents but none of its guards is a directory mirror.
So, if Tor walks to the end of the guard node list and it still hasn't found enough guard nodes with the needed property to make a pick, it picks a random entry guard from the consensus and adds it to the list. It's amazing and yet real, look straight ahead (and don't look directly into the light): https://gitweb.torproject.org/tor.git/blob/d064773595f1d0bf1b76dd6f7439bff65...
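To make that walk concrete, here is a toy Python sketch of the behavior. Everything here is hypothetical (the field names, the function name, the data structures); the real logic lives in C in `src/or/entrynodes.c` and is considerably more involved:

```python
import random

def pick_entry_guard(entry_guards, consensus, need, num_needed=1):
    """Toy model: walk the ordered guard list from the top and collect
    guards that have every needed property (e.g. 'up', 'stable').
    If not enough qualify, pick a random suitable node from the
    consensus and *append* it to the end of the list."""
    usable = [g for g in entry_guards if all(g.get(p) for p in need)]
    if len(usable) < num_needed:
        # Not enough suitable guards in the list: grab a random suitable
        # node from the consensus and append it (end of list = lowest
        # priority, see footnote [1]).
        candidates = [n for n in consensus
                      if all(n.get(p) for p in need) and n not in entry_guards]
        if candidates:
            new_guard = random.choice(candidates)
            entry_guards.append(new_guard)
            usable.append(new_guard)
    return usable[0] if usable else None
```

For example, if the list holds two non-Stable guards and we need a Stable circuit, the sketch appends a Stable node from the consensus and the list grows from two to three entries.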
But this still does not explain why Roger has so many guards. Usually a list of 5 or 6 nice guards is sufficient to satisfy the needs of any circuit (alive, Stable, Fast, directory mirror).
The reason for Roger's surplus of guards is the following, very interesting piece of Tor functionality. Consider this scenario: you start Tor while your network is down. Tor starts picking nodes from your list and attempts to connect to them. All the connections fail, since your network is down. So now Tor needs to add a new guard node to the list. There are two cases:
If Tor fails to connect to this new guard node (your network is still down), Tor removes the new guard node from the entry guard list (that's good; otherwise the list would fill up with nodes added while the network is down). Look on your left: you can see this beautiful phenomenon happening here: https://gitweb.torproject.org/tor.git/blob/d064773595f1d0bf1b76dd6f7439bff65...
However, let's say that your network is back up and Tor manages to connect to this new guard node. That's great! But should Tor keep the connection to this guard? Probably not: Tor should recognize what happened and attempt to reconnect to the primary guards at the top of the list.
And that's exactly what Tor does. Nature is truly amazing! Just relax and witness this behavior happening right in front of your eyes: https://gitweb.torproject.org/tor.git/blob/d064773595f1d0bf1b76dd6f7439bff65...
So, when Tor manages to connect to this newly added entry guard, it assumes that the network is back up, walks through the list of entry guards, and marks them all as "needs to be retried". It also marks the connection to the new entry guard as rotten and kills it. This to me is very interesting, because it ensures that the primary guards (the ones at the top of the list) will be tried again once the network is back up; otherwise we would leak connections to new guards all the time!
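The two cases can be sketched roughly like this. The function name below mirrors the real `entry_guard_register_connect_status()` in `src/or/entrynodes.c`, but the body is a drastic simplification with made-up fields, just to illustrate the behavior described above:

```python
def entry_guard_register_connect_status(entry_guards, guard, succeeded,
                                        first_contact):
    """Toy model of what happens when a connection attempt to a guard
    finishes. 'first_contact' means the guard was just appended and this
    was our very first attempt to reach it."""
    if not succeeded:
        if first_contact:
            # Network probably still down: drop the freshly added guard,
            # so the list doesn't fill up with junk.
            entry_guards.remove(guard)
        else:
            guard['unreachable_since'] = True
        return None
    if first_contact:
        # We reached a brand-new guard: assume the network just came back,
        # mark every guard above it as worth retrying...
        for g in entry_guards:
            if g is guard:
                break
            g['can_retry'] = True
        # ...and kill the connection so we go back to our primary guards.
        return 'close_connection'
    return 'keep_connection'
```

Note that in the success case the new guard is *not* removed from the list; that detail is exactly what makes the list grow, as explained next.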
And all that fluff is related to this post because this new guard (the one that made us realise that the network is back up) actually stays in our guard list. So, basically, every time the network goes down and Tor does this little dance, a new entry guard is appended to our list and our state file. And that's why Roger has so many guards! Or at least, that's why *I* have so many guards [2].
Apart from this being wonderful on its own, there are two interesting points here:
a) There is always a bug:
The more often this happens, the bigger our guard list gets and the longer it takes to walk it.
Dig this race condition:
Tor starts up while the network is down, so the connections to our primary guards fail; but the network comes back while we are still walking our entry guard list and trying to connect to the rest of our guards. If we manage to connect to one of the guards already in our list (the lucky guard), the code at https://gitweb.torproject.org/tor.git/blob/d064773595f1d0bf1b76dd6f7439bff65... doesn't get triggered, because `first_contact` is not true (that node was already in the guard node list). So we stick with that lucky guard even though it's not our primary guard, and, since the network is back up, a connection to our primary guards would have worked too.
What stinks here is that all the guards above that lucky guard remain marked as unreachable, so the next time Tor starts up it will ignore them and jump directly to the lucky guard.
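A tiny toy simulation of this race (all names hypothetical; the real walk happens across asynchronous connection attempts, not a simple loop):

```python
def walk_guards_during_outage(entry_guards, network_up_after):
    """Toy illustration of the race in (a): connections to the first
    'network_up_after' guards fail, then the network comes back and we
    succeed on the next guard in the list. Since that guard was already
    listed, first_contact is False, so nothing un-marks the guards above
    it, and we just stick with the lucky guard."""
    for i, guard in enumerate(entry_guards):
        if i < network_up_after:
            guard['unreachable'] = True   # connection attempt failed
        else:
            # Connection succeeds, but this guard was already in the
            # list, so the "mark earlier guards up" code never runs.
            return guard
    return None
```

In this model, if the network comes back after three failed attempts, we end up pinned to the fourth guard while the three primary guards above it stay marked unreachable.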
This probably needs to be fixed somehow. I opened trac ticket #12450 for this issue [3].
b) While writing proposal 236, we thought about how new guard nodes should be picked. Should we pick new guard nodes at the point they are needed? Or should we pick a surplus of guard nodes in the beginning and, when the primary ones expire, fall back to the extra ones? You can read more about this behavior here: https://gitweb.torproject.org/torspec.git/blob/2ecd06fcfd883e8c760f0694f3591...
The insight here is that we are apparently already doing the latter, because all the guard nodes that get added when our network comes back up remain in our guard list. And when our primary guards expire, the ones at the bottom rise to the top (until they expire themselves).
So if you are wondering "when does Tor add new entry guards?", the answer is "when you move your laptop to a new location; just before you connect to the wifi" ;)
Greetings from the core, have a good day!
[0]: https://gitweb.torproject.org/torspec.git/blob/HEAD:/proposals/236-single-gu...
[1]: Note that the word "append" is vital here. The extra guards are appended to the end of the list, and when Tor wants to pick a guard node it walks the list from the top. So these newly added guards have lower priority, so to speak (most of them will not even be considered if the guards above them are sufficient for building a circuit).
[2]: Here is a grep of my logs. Look at how the guard counter increments by one every time we hit https://gitweb.torproject.org/tor.git/blob/d064773595f1d0bf1b76dd6f7439bff65...
$ zgrep "Marking earlier" /var/log/tor/notices.log.3.gz
[warn] Connected to new entry guard XXX. Marking earlier entry guards up. 0/2 entry guards usable/new.
[warn] Connected to new entry guard XXX. Marking earlier entry guards up. 0/3 entry guards usable/new.
[warn] Connected to new entry guard XXX. Marking earlier entry guards up. 3/4 entry guards usable/new.
[warn] Connected to new entry guard XXX. Marking earlier entry guards up. 4/5 entry guards usable/new.
[warn] Connected to new entry guard XXX. Marking earlier entry guards up. 5/6 entry guards usable/new.
[warn] Connected to new entry guard XXX. Marking earlier entry guards up. 8/9 entry guards usable/new.
[warn] Connected to new entry guard XXX. Marking earlier entry guards up. 6/8 entry guards usable/new.
[warn] Connected to new entry guard XXX. Marking earlier entry guards up. 7/9 entry guards usable/new.
[warn] Connected to new entry guard XXX. Marking earlier entry guards up. 8/10 entry guards usable/new.
[warn] Connected to new entry guard XXX. Marking earlier entry guards up. 9/11 entry guards usable/new.
[warn] Connected to new entry guard XXX. Marking earlier entry guards up. 10/12 entry guards usable/new.
[warn] Connected to new entry guard XXX. Marking earlier entry guards up. 11/13 entry guards usable/new.
[warn] Connected to new entry guard XXX. Marking earlier entry guards up. 12/14 entry guards usable/new.
[warn] Connected to new entry guard XXX. Marking earlier entry guards up. 13/15 entry guards usable/new.
[3]: https://trac.torproject.org/projects/tor/ticket/12450#ticket