==Guardiness: Yet another external dirauth script==
====Introduction====
One well-known problem with Tor relays is that a relay suffers a big loss of traffic as soon as it gets the Guard flag. This happens because clients pick their guards every 2-3 months, so young guards will not get picked by old clients and will mainly attract new clients. This is documented in 'phase three' of Roger's blog post: https://blog.torproject.org/blog/lifecycle-of-a-new-relay
The problem gets even worse if we extend the guard lifetime to 8-9 months.
The plan to solve this problem is to make client load balancing a bit smarter by prioritizing as middle relays the guards that suffer this traffic loss.
The reason I'm sending this email is because this feature is by far the trickiest part of prop236 (guard node security) and I wanted to inform all dirauths of our plan and ask for feedback on the deployment procedure.
====How guardiness works====
Authorities calculate, for each relay, the fraction of consensuses over the past 2-3 months in which it has had the Guard flag, and then they note that fraction down in the consensus.
Then clients parse the consensus, and if they see a relay that has been a guard for 55% of the past consensuses, they will consider that relay as 55% guard and 45% non-guard (that's 100% - 55%).
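To make the fraction arithmetic above concrete, here is a minimal sketch of how a client could split a relay's weight; the helper name is hypothetical, and the real weighting lives in Tor's path-selection code:

```python
def split_guard_weight(bandwidth, guard_fraction_pct):
    """Split a relay's bandwidth weight into a guard part and a
    non-guard part, based on its GuardFraction percentage.

    A relay with GuardFraction=55 is treated as 55% guard and
    45% non-guard (hypothetical helper, for illustration only).
    """
    guard_part = bandwidth * guard_fraction_pct / 100.0
    return guard_part, bandwidth - guard_part

# A guard seen in 55% of the past consensuses:
guard_w, middle_w = split_guard_weight(1000, 55)
# guard_w == 550.0, middle_w == 450.0
```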
You can find more information at: https://gitweb.torproject.org/torspec.git/blob/HEAD:/proposals/236-single-gu...
The idea was that the guardiness script will be an external script that is run by Tor in a similar fashion to the bandwidth auth scripts. We chose that because we could write the script in a high-level language and because it could be modular and we could change the algorithm in the future if we wanted. Unfortunately, it seems that external scripts in dirauths are a PITA to maintain as can be seen by the lack of bwauth operators.
====How the guardiness script works====
The guardiness script is supposed to parse 2-3 months' worth of consensuses (but should also be able to do the same for 9 months' worth), calculate the guard fraction of each guard, save it to a file, and have the dirauth read that file to update its routerstatuses.
One problem I encountered early on is that stem takes about 30 minutes to parse 3 months of consensuses (~2000 consensuses). Since this script should ideally run every hour before each authority votes, such a long parsing time is unacceptable.
I mentioned this problem at https://trac.torproject.org/projects/tor/ticket/9321#comment:19 and stated a few possible solutions.
I received some feedback from Nick, and the solution I decided to take in the end is to have another script that is called first and summarizes consensuses to summary files. Summary files are then saved to disk, and parsed by the guardiness script to produce an output file that is read by dirauths.
Summary files are designed to be quick to parse (even with Python) and contain all the necessary information for guardiness. For example, parsing 2000 summary files on my laptop takes about 10 seconds.
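As a rough illustration of why summary files parse so fast, here is a sketch using an assumed one-line-per-relay format; the actual summary format in the guardiness branch may well differ:

```python
# Hypothetical summary format: one line per relay,
#   "<fingerprint> <1 if the relay had the Guard flag, else 0>"

def summarize_consensus(router_statuses):
    """Reduce a parsed consensus (fingerprint, flags) pairs to the
    minimal lines guardiness needs (illustrative format)."""
    lines = []
    for fpr, flags in router_statuses:
        lines.append("%s %d" % (fpr, 1 if "Guard" in flags else 0))
    return "\n".join(lines)

def guard_fraction(summaries, fingerprint):
    """Fraction of summaries in which a relay had the Guard flag."""
    seen = guard = 0
    for summary in summaries:
        for line in summary.splitlines():
            fpr, is_guard = line.split()
            if fpr == fingerprint:
                seen += 1
                guard += int(is_guard)
    return guard / seen if seen else 0.0
```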
FWIW, the guardiness scripts are ready for review and can be found here: https://gitweb.torproject.org/user/asn/hax.git/shortlog/refs/heads/guardines...
====How the guardiness script will be deployed====
The idea is that dirauths will add another script to their crontab that is called every hour (before or after the bwauth scripts).
The script first calls the summarizer script, which goes to the consensus/ directory and summarizes all consensuses it finds and puts them in the summary/ directory. The summarizer script then deletes all the consensuses that got summarized.
Then the script calls the guardiness script, which goes to the summary/ directory, parses all summary files it finds, and outputs a guardiness output file that gets parsed by the dirauth prior to voting.
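In other words, the hourly cron job boils down to two commands run back to back. The script names and flags below are assumptions for illustration, not the real interface:

```python
def build_pipeline_commands(consensus_dir, summary_dir, output_file,
                            max_months=3):
    """Return the two commands the hourly cron job would run:
    first the summarizer, then the guardiness script.
    (Script names and flags here are hypothetical.)"""
    summarize = ["summarizer.py", consensus_dir, summary_dir,
                 "--max-age-months", str(max_months),
                 "--delete-summarized"]
    guardiness = ["guardiness.py", summary_dir,
                  "--output", output_file,
                  "--max-age-months", str(max_months)]
    return [summarize, guardiness]

cmds = build_pipeline_commands("consensus/", "summary/", "guardiness.out")
```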
That should be all. Easy, eh? :)
Now I will start a FAQ section where I state my doubts and fears.
====FAQ====
- Q: Where do dirauths find all those old consensuses?
There are various ways for dirauths to populate their consensus/ directory. They could fetch consensuses from metrics, or they could add a cron job that copies cached-consensus to a directory every hour.
However, I think the cleanest solution is to use Daniel Martí's upcoming consensus diff changes. Daniel will add a torrc option that allows Tor to save consensuses to a directory. My idea was to get dirauths to use Daniel's code to populate their consensus/ directory for two or three months, and then, after that period, enable the guardiness scripts.
To make sure that this is indeed the best approach, I need to learn from Nick when he plans to merge Daniel's code to Tor.
- Q: What does guardiness look like in the consensus?
Here is how a guard with guardiness (GuardFraction) of 10% looks in the consensus:
r test006r HyS1DRHzEojbQVPZ1B3zAHc/HY0 9St4yWfV4huz5V86mt24HL3Yi2I 2014-09-06 13:44:28 127.0.0.1 5006 7006
s Exit Fast Guard HSDir Running Stable V2Dir Valid
v Tor 0.2.6.0-alpha-dev
w Bandwidth=111 Unmeasured=1 GuardFraction=10
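A client that wants the GuardFraction value only has to scan the "w" line for the keyword. A minimal sketch (the helper name is mine, not Tor's):

```python
def parse_guard_fraction(w_line):
    """Extract the GuardFraction percentage from a consensus 'w' line,
    or return None if the keyword is absent (illustrative helper)."""
    for item in w_line.split()[1:]:       # skip the leading "w"
        key, _, value = item.partition("=")
        if key == "GuardFraction":
            return int(value)
    return None

parse_guard_fraction("w Bandwidth=111 Unmeasured=1 GuardFraction=10")
# -> 10
```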
- Q: What are you afraid of?
I'm mainly afraid of misconfiguration problems. This guardiness system is a bit complex and I'm not expecting dirauths to learn how to use it and debug it, so it should work easily and well...
Here are some specific issues:
-- File management
For example, I'm afraid of the file management mess that summary files cause. We need to make sure that we don't leave old consensus/summary files rotting in the filesystem, and that we don't summarize the same consensuses over and over again. To that end, I added some optional cleanup switches to both scripts:
Specifically, the summarizer script can delete consensus files that already got summarized and can also delete consensus files older than 3 months (or N months). Similarly, the guardiness.py script can delete summary files older than 3 months (or N months).
The idea is that every time the cron job triggers, both scripts will auto-delete the oldest summary/consensus file, keeping in disk only the useful files.
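The cleanup logic amounts to an age-based sweep over a directory. A sketch of the idea, assuming the file's mtime as the age criterion (the real scripts may key off the consensus valid-after time instead):

```python
import os
import time

def delete_older_than(directory, max_age_secs):
    """Delete regular files in `directory` whose mtime is older than
    max_age_secs, and return the deleted filenames (illustrative
    version of the optional cleanup switches)."""
    now = time.time()
    removed = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and now - os.path.getmtime(path) > max_age_secs:
            os.remove(path)
            removed.append(name)
    return removed
```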
-- Incomplete consensus data set
I'm afraid that a directory authority might not have a properly populated consensus directory and hence advertise wrong guard fractions. For example, maybe it only has 10 consensuses in its consensus directory instead of 1900. Since the authorities only state the guardiness percentage in the consensus, it's not possible to learn how many consensuses were in their dataset. Maybe we need to add a "guardiness-consensus-parsed" line to their votes, to make such issues easier to debug?
Also, 3 months worth of consensuses is 2160 consensuses. Because dirauths sometimes misbehave, it's certain that not all 2160 consensuses will have been issued and that's normal. But how do we understand if dirauths have a sufficiently good consensus data set? Is 2000 out of 2160 consensuses an OK data set? What about 1000 out of 2160 consensuses?
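Whatever threshold we settle on, the health check itself is simple arithmetic: 3 months of hourly consensuses is roughly 3 * 30 * 24 = 2160. A sketch, where the 90% cutoff is only an assumption for illustration, not a tested value:

```python
def dataset_health(parsed_consensuses, months=3):
    """Compare the number of parsed consensuses against the number
    expected for the window (one per hour). Returns the coverage
    fraction and whether it clears a 90% cutoff (assumed value)."""
    expected = months * 30 * 24          # e.g. 3 * 30 * 24 = 2160
    fraction = parsed_consensuses / expected
    return fraction, fraction >= 0.9

dataset_health(2000)   # ~0.93 coverage -> probably OK
dataset_health(1000)   # ~0.46 coverage -> probably not
```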
Furthermore, we need to make sure that dirauths don't consider old consensuses in their GuardFraction calculations. To achieve this, both scripts have a mandatory switch that allows operators to specify the maximum consensus age that is acceptable. So for example, if you call the summarizer script with 3 months of consensus age, it will not parse consensuses older than 3 months. Furthermore, there is a CLI switch that allows the scripts to delete expired consensuses.
- Q: Why do you use slow stem instead of parsing consensuses with Python on your own?
This is another part where I might have taken the wrong design decision, but I decided to not get into the consensus parsing business and just rely on stem.
This is also because I was hoping to use stem to verify consensus signatures. However, now that we might use Daniel's patch to populate our consensus database, maybe we don't need to treat consensuses as untrusted anymore.
If you think that I should try to parse the consensuses on my own, please tell me and I will give it a try. Maybe it will be fast. Definitely not as fast as summary files, but maybe we can parse 3 months' worth of consensuses in 15 to 40 seconds.
- Q: Why do you mess with multiple summary files instead of just having *one* summary file?
Because of the rolling nature of guardiness (we always want to consider the past 3 months), every hour we need to _discard_ the oldest observations (the consensus from 3 months ago) and start considering the newest consensus.
Because we need to discard that oldest consensus, it's hard to keep information about each consensus in a single summary file. And that's why I chose to have a summary file for each consensus. Maybe it's the wrong decision though...
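With one summary file per consensus, the rolling window reduces to evicting the oldest file and appending the newest one. A toy sketch of that bookkeeping (the helper and the filenames are hypothetical):

```python
from collections import deque

def roll_window(window, new_summary, max_size):
    """Append the newest summary filename to the window; if the window
    is full, evict and return the oldest one, else return None.
    (Illustrative sketch of the per-consensus-file design.)"""
    evicted = None
    if len(window) >= max_size:
        evicted = window.popleft()
    window.append(new_summary)
    return evicted
```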
- Q: What's up with the name "guardiness"?
It's quite terrible, I know, but it's the name we've used from quite early on in this project.
I think before finalizing this task I'm going to rename everything to 'GuardFraction' since it's more self-explanatory. I'm also considering names like "GuardTrafficLoadBalancer" etc.