Hi James,
thanks for already working on patches for these issues! I will reply inline some more.
On 15. Jan 2021, at 23:56, James jbrown299@yandex.com wrote:
First of all, sorry if torpy hurt the Tor network in some way. It was unintentional.
I believe you :)
In any case, it seems to me that a high-level description of the official tor client's logic would be very useful.
Indeed. The more people work on alternative clients etc, the more we can learn here. Perhaps you can help point out places where documentation could help or something was not easy to understand.
First, I found this string in the code: "Hardcoded into each Tor client is the information about 10 beefy Tor nodes run by trusted volunteers". The word beefy is definitely wrong here. The nodes are not particularly powerful, which is why we have the fallback dir design for bootstrapping.
At first glance, it seemed that the AuthDirs were the most trusted and reliable place for obtaining the consensus. Now I understand more.
The consensus is signed, so all the places to get it from are equally trusted. That's the beauty of the consensus system :) The dirauths are just trusted to create it; it doesn't matter who spreads it.
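To make the acceptance rule concrete, here is a very rough sketch (if I remember dir-spec correctly, a consensus is usable once more than half of the authorities the client knows have validly signed it). All names below (trusted_authorities, verify_signature, ...) are made up for illustration; this is not torpy's or tor's actual API:

    def consensus_is_acceptable(consensus, trusted_authorities, verify_signature):
        # Count signatures we can actually verify against a known authority.
        valid = 0
        for sig in consensus.signatures:
            authority = trusted_authorities.get(sig.identity_fingerprint)
            if authority and verify_signature(authority.signing_key, consensus, sig):
                valid += 1
        # Require agreement from a majority of the authorities we recognize.
        return valid > len(trusted_authorities) // 2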
Once this happens, torpy goes into a deathly loop of "consensus invalid, trying again". There are no timeouts, backoffs, or failures noted.
Not really, because torpy only has 3 retries for getting the consensus. But you are probably right that user code can call torpy in a retry loop, so it would keep trying to download the network_status... If you have some sort of statistics about the increased traffic, we could compare that with the time when the consensus was signed by only 4 signers, which is enough for tor but not enough for torpy.
Interesting, I ran torpy and on the console it seemed to retry more often. Perhaps it made some progress and then failed on a different thing, which it then retried.
To your second point, something like this can probably be done using https://metrics.torproject.org. But I am not doing the analysis here at the moment for personal reasons, sorry. Maybe someone else wants to look at it.
The code frequently throws exceptions, but when an exception occurs it just continues doing what it was doing before. It has absolutely no regard for constraining its resources when using the Tor network.
What kind of constraints would you advise?
I think instead of throwing an exception and continuing, you should give clear error messages and consider whether you need to stop execution. For example, if you downloaded a consensus and it is invalid, you're likely not going to get a valid one by trying again immediately. Instead, it would be better to declare who gave you the invalid one and log a sensible error.
In addition, properly using already downloaded directory information would be a much more considerate use of resources.
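Something along these lines is what I mean (just a sketch, not your actual code; verify_consensus and the directory-source object are invented names for illustration):

    import logging

    logger = logging.getLogger(__name__)

    class ConsensusInvalid(Exception):
        """Raised when a downloaded consensus fails verification."""

    def fetch_consensus_once(source, verify_consensus):
        raw = source.download_consensus()   # exactly one network request
        if not verify_consensus(raw):
            # Say clearly who handed us the bad document and stop, instead of
            # immediately asking the same (or another) directory again.
            logger.error("Consensus from %s failed verification, giving up",
                         source.nickname)
            raise ConsensusInvalid(source.nickname)
        return raw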
The logic that an already-downloaded network_status document is used rather than trying to download a new one does not work.
It works, but probably not in an optimal way. Only the network_status is cached.
I may have confused it with asking for the diff. But that should not be necessary at all if you already have the latest one, so don't ask for a diff in this case.
I have a network_status document, but the dirauths are contacted anyway. Perhaps descriptors are not cached to disk and downloaded on every new start of the application?
Exactly. The descriptors and the hourly network_status diff were always requested from the AuthDirs.
Please cache descriptors.
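Something simple like the following would already help a lot. This is only an illustration under my own assumptions; the cache path, age limit, and helper names are invented, not a suggestion for your actual layout:

    import json, os, time

    CACHE_FILE = os.path.expanduser("~/.cache/torpy/descriptors.json")

    def load_cached_descriptors(max_age=3 * 3600):
        try:
            with open(CACHE_FILE) as f:
                cached = json.load(f)
        except (OSError, ValueError):
            return {}
        # Drop entries that are too old to still be useful.
        return {fp: d for fp, d in cached.items()
                if time.time() - d["fetched_at"] < max_age}

    def store_descriptor(fingerprint, descriptor_text):
        cached = load_cached_descriptors()
        cached[fingerprint] = {"raw": descriptor_text, "fetched_at": time.time()}
        os.makedirs(os.path.dirname(CACHE_FILE), exist_ok=True)
        with open(CACHE_FILE, "w") as f:
            json.dump(cached, f)

That way a freshly started torpy_cli can reuse what the previous run already fetched instead of asking the directories again.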
New consensuses never seem to be downloaded from guards, only from dirauths.
Thanks for pointing that out. I looked more deeply into the tor client sources. So basically, if we have a network_status we can ask guard nodes for the network_status and descriptors; otherwise we use the fallback dirs to download the network_status. I've implemented this logic in the last commit.
Cool!
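For other readers of this thread, the source-selection logic James describes would look roughly like this (helper names such as pick_guards_from are invented, this is not his actual commit):

    def choose_directory_sources(cached_consensus, fallback_dirs):
        if cached_consensus is not None:
            # We already know the network: ask our guards for the newer
            # consensus (or a diff) and for any missing descriptors.
            return pick_guards_from(cached_consensus)
        # First bootstrap, no consensus yet: use the hard-coded fallback
        # directory mirrors instead of going straight to the authorities.
        return fallback_dirs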
- Stop automatically retrying on failure, without backoff
I've added delays and backoff between retries.
- Cache failures to disk to ensure a newly started torpy_cli does not
request the same resources again that the previous instance failed to get.
That will be on the list. But even if there is a retry loop at the level above and without this feature, with backoff the delays would be something like: 3 sec, 5, 7, 9; then 3, 5, 7, 9 again. Does that seem ok?
Well, the problem is if I run torpy_cli in parallel 100 times, we will still send many requests per second. From the dirauth access patterns, we can see that some people indeed run it like that. So I think the backoff is a great start (tor client uses exponential backoff I think) but it definitely is not enough. If you couldn't get something this hour and you tried a few times, you need to stop trying again for this hour.
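To make that concrete, something along these lines is what I have in mind: exponential backoff within one run, plus a persisted failure marker so a freshly started process (or a caller retrying in a loop) does not hammer the directories again within the same hour. Just a sketch; the file path, helper names, and exact numbers are all made up:

    import json, os, random, time

    FAILURE_FILE = os.path.expanduser("~/.cache/torpy/consensus_failure.json")

    def consensus_failed_recently(period=3600):
        try:
            with open(FAILURE_FILE) as f:
                last_failure = json.load(f)["timestamp"]
        except (OSError, ValueError, KeyError):
            return False
        return time.time() - last_failure < period

    def record_failure():
        os.makedirs(os.path.dirname(FAILURE_FILE), exist_ok=True)
        with open(FAILURE_FILE, "w") as f:
            json.dump({"timestamp": time.time()}, f)

    def download_consensus_with_backoff(fetch, attempts=3):
        if consensus_failed_recently():
            raise RuntimeError("consensus fetch failed recently, not retrying "
                               "until the next consensus period")
        delay = 3.0
        for attempt in range(attempts):
            try:
                return fetch()
            except Exception:
                if attempt == attempts - 1:
                    break
                time.sleep(delay + random.uniform(0, 1))  # jitter avoids bursts
                delay *= 2                                # exponential backoff
        record_failure()
        raise RuntimeError("could not download a consensus after %d attempts"
                           % attempts)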
Defenses are probably necessary to implement even if torpy can be fixed very quickly, because the older versions of torpy are out there and I assume will continue to be used. Hopefully that point is wrong?
I believe that the old versions don't work any more because they cannot connect to the auth dirs. Users are getting 503 many times, so they will update the client. I hope.
Would be nice. We'll see!
Thanks Sebastian