On Sat, Jan 16, 2021 at 01:56:02AM +0300, James wrote:
> In any case, it seems to me that if there was some high-level description of logic for official tor client, it would be very useful.
Hi James! Thanks for starting this discussion.
While I was looking at moria1's directory activity during the overload, I did say to myself "wow that's a lot of microdescriptor downloads".
So hearing that torpy isn't caching microdescriptors yet makes me think that it's a good bet for explaining our overload last weekend.
I agree that we should have clearer docs for "how to be nice to the Tor network." We actually have an open ticket for that goal but nobody has worked on it in a while: https://gitlab.torproject.org/tpo/core/tor/-/issues/7106
Quoting from that ticket:
"""Second, it's easy to make client-side decisions that harm the Tor network. For examples, you can hold your TLS connections open too long, or do too many TLS connections, or make circuits too often, or ask the directory authorities for everything. We need to write up a spec to clarify how well-behaving Tor clients should do things. Maybe that means we write up some principles along the way, or maybe we just identify every design point that matters and say what to do for each of them."""
And in fact, since Nick has been working a lot on Arti lately (https://gitlab.torproject.org/tpo/core/arti/), it might be a perfect time for him to help document the current Tor behavior and the current Arti behavior, and we can think about where there is room for improvement.
> If you have some sort of statistic about increasing traffic we can compare that
Here's the most interesting graph so far: https://metrics.torproject.org/dirbytes.html
So from that graph, the number of bytes handled by the directory authorities doesn't go up a lot, because they were already rate limited (instead, they just failed more often).
But the number of bytes handled by directory mirrors (including fallbackdirs) shot up a huge amount. For context, if we imagine that the normal Tor network handles between 2M and 8M daily users, then that added dir mirror load would imply an extra 4M to 16M daily users if they follow Tor's directory update habits. I'm guessing that the torpy users weren't following Tor's directory update habits, and so a much smaller set of users accounted for a much larger fraction of the load.
> > The logic that if a network_status document was already downloaded that is used rather than trying to download a new one does not work.
> It works. But probably not in optimal way. It caches network_status only.
Here's my first start at three principles we should all follow when writing Tor clients:
(1) Reduce redundant interactions. For example:
- Cache as much as possible of the directory information you fetch (consensus documents, microdescriptors, certs).
- If a directory fetch failed, don't just relaunch a duplicate request right after (because it will probably fail too).
- If your setup involves running multiple Tors locally, consider using a shared directory cache, so only one of them needs to fetch new directory info and then all of them can use it.
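To make the caching point concrete, here is a minimal Python sketch of a microdescriptor cache that only fetches the digests it doesn't already have. Everything named here (MicrodescCache, update_microdescs, the fetch callback, the JSON file layout) is made up for illustration; it is not how Tor or torpy store things, just the general shape of the idea.

```python
import json
from pathlib import Path


class MicrodescCache:
    """Hypothetical on-disk cache, keyed by microdescriptor digest."""

    def __init__(self, path):
        self.path = Path(path)
        # Reload whatever an earlier run already downloaded.
        if self.path.exists():
            self.entries = json.loads(self.path.read_text())
        else:
            self.entries = {}

    def missing(self, wanted_digests):
        """Return only the digests we have never fetched."""
        return [d for d in wanted_digests if d not in self.entries]

    def store(self, fetched):
        """Merge newly fetched descriptors and persist them."""
        self.entries.update(fetched)
        self.path.write_text(json.dumps(self.entries))


def update_microdescs(cache, wanted_digests, fetch):
    """Fetch only what the cache lacks.

    `fetch` is an assumed callback: it takes a list of digests and
    returns a dict mapping digest -> descriptor text.
    """
    need = cache.missing(wanted_digests)
    if need:
        cache.store(fetch(need))
    return {d: cache.entries[d] for d in wanted_digests if d in cache.entries}
```

With something like this, a client that restarts with a mostly unchanged consensus asks the directory mirrors for only the handful of microdescriptors that are new, rather than all of them.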
(2) Reduce impact of interactions. For example:
- Always use the "If-Modified-Since" header on consensus updates, so they don't send you a consensus that you already have.
- Try to use the consensus diff system, so if you have an existing consensus you aren't fetching an entire new consensus.
- Ask for compression, to save overall bandwidth in the network.
- Move load off of directory authorities, and then off of fallback directories, as soon as possible. That is, if you have a list of fallbackdirs, ask them instead of directory authorities. And once you have a consensus and you've chosen your directory guards, ask them instead of the fallbackdirs.
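Here is a rough Python sketch of how a client might put those four points together when it prepares a consensus request. The function names and arguments are hypothetical. If-Modified-Since and Accept-Encoding are standard HTTP headers; the X-Or-Diff-From-Consensus header name and the choice of timestamp are my assumptions about what dir-spec expects, so please check them against the spec before relying on this.

```python
from email.utils import formatdate


def pick_dir_sources(dir_guards, fallbackdirs, authorities):
    """Return the least costly non-empty tier of directory servers.

    Preference order: directory guards, then fallbackdirs, and the
    directory authorities only as a last resort.
    """
    for tier in (dir_guards, fallbackdirs, authorities):
        if tier:
            return list(tier)
    return []


def consensus_request_headers(cached_time=None, cached_digest=None):
    """Build headers for a consensus fetch (illustrative only).

    cached_time:   Unix timestamp of the consensus we already hold
                   (assumption: which timestamp to send is up to the spec).
    cached_digest: digest of that consensus, used to ask for a diff.
    """
    # Always ask for compression.
    headers = {"Accept-Encoding": "deflate, identity"}
    if cached_time is not None:
        # Lets the server answer "not modified" instead of resending.
        headers["If-Modified-Since"] = formatdate(cached_time, usegmt=True)
    if cached_digest is not None:
        # Assumed header name for requesting a consensus diff; verify in dir-spec.
        headers["X-Or-Diff-From-Consensus"] = cached_digest
    return headers
```

A fresh client with no guards yet would get the fallbackdir list from pick_dir_sources, and only a client with no fallbackdirs at all would end up talking to the authorities.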
(3) Plan ahead for what your current code will do in a few years when the world is different.
- To start here, check out the "slow zombies and fast zombies" discussion in Proposal 266: https://gitweb.torproject.org/torspec.git/tree/proposals/266-removing-curren...
- Specifically, think about how your code handles failures. Design your interactions with the Tor network so that if many people are running your code in the future and it's failing (for example, because it is asking directory questions in an old format, or because the directory servers have started rate limiting differently), it will back off rather than become more aggressive.
- When possible, look for ways to recognize when your code is asking old questions, so it can warn the user and stop interacting with the network.
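For the "back off rather than become more aggressive" part, a generic exponential backoff with a give-up threshold is one way to get that behavior. This Python sketch is not Tor's actual download schedule; the class name, the 60-second base, the one-day cap, and the ten-failure limit are all placeholder assumptions.

```python
import random


class DirFetchBackoff:
    """Hypothetical retry policy: wait longer after each failure,
    and stop entirely after too many failures in a row."""

    def __init__(self, base=60.0, cap=86400.0, max_failures=10):
        self.base = base                  # seconds before the first retry
        self.cap = cap                    # never wait longer than this
        self.max_failures = max_failures  # then warn the user and stop
        self.failures = 0

    def record_failure(self):
        self.failures += 1

    def record_success(self):
        self.failures = 0

    def should_give_up(self):
        """True once we should stop talking to the network."""
        return self.failures >= self.max_failures

    def next_delay(self, rng=random):
        """Seconds to wait before the next attempt.

        The ceiling doubles with each consecutive failure, up to the cap;
        the random jitter keeps many clients from retrying in lockstep.
        """
        ceiling = min(self.cap, self.base * (2 ** self.failures))
        return rng.uniform(ceiling / 2, ceiling)
```

The important property is that a persistent failure makes each client quieter over time, so a large population of old clients turns into slow zombies rather than fast ones.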
...What else should be on the list?
Thanks! --Roger