On 11. Jan 2021, at 23:20, James jbrown299@yandex.com wrote:
Good day.
Is there any chance that torpy (https://github.com/torpyorg/torpy) triggered this issue: https://gitlab.torproject.org/tpo/core/tor/-/issues/33018 ?
Some worrying facts:
- Torpy uses the old-fashioned full consensus (not microdescriptors).
- When the consensus is not present in the cache (first-time usage), it downloads the consensus from random directory authorities only.
- Before August 2020 it used plain HTTP requests to the DirAuths. Now it creates "CREATE_FAST" circuits to the DirAuths (is that the right way, by the way?)
On the other hand:
- Torpy stores the consensus on disk (so when the client restarts it does not have to download the full consensus again).
- It only tries to download a new consensus after the time set by the valid-until field of the consensus, which is more than one hour away (so it's not that often).
- Torpy tries to get the consensus via the "diff" feature (to minimize traffic).
Still, maybe some of these features are not working well under some conditions, which could have caused a lot of consensus downloads in Jan 2020... Or maybe you know more about this situation?
Hi there,
thanks for the message. I think it is very likely that torpy is responsible for at least part of the increased load we're seeing on the dirauths. I have taken a (very!) quick look at the source, and it appears that there are some problems. Please excuse any inaccuracies; I am not that strong in Python, nor have I done much Tor development recently:
First, I found this string in the code: "Hardcoded into each Tor client is the information about 10 beefy Tor nodes run by trusted volunteers". The word beefy is definitely wrong here. The nodes are not particularly powerful, which is why we have the fallback dir design for bootstrapping.
The code counts Serge as a directory authority that signs the consensus, and checks that over half of the dirauths signed it. But Serge is only the bridge authority and never signs the consensus, so torpy will reject some consensuses that are in fact valid. Once this happens, torpy goes into a deadly loop of "consensus invalid, trying again". There are no timeouts, no backoff, and no failures are recorded.
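To illustrate the fix, a correct majority check would count only the nine v3 directory authorities that actually vote on and sign the consensus, never the bridge authority. This is only a sketch with a hypothetical helper, not torpy's actual code:

```python
# Sketch: validate consensus signatures against voting dirauths only.
# Serge is the bridge authority and never signs the network consensus,
# so it must not appear in this set (9 voting authorities as of early 2021).
VOTING_DIRAUTHS = {
    "moria1", "tor26", "dizum", "gabelmoo", "dannenberg",
    "maatuska", "longclaw", "bastet", "Faravahar",
}

def consensus_has_majority(signing_authorities):
    """Return True if more than half of the voting dirauths signed."""
    signers = set(signing_authorities) & VOTING_DIRAUTHS
    return len(signers) > len(VOTING_DIRAUTHS) // 2

# With 9 voting authorities, at least 5 valid signatures are required.
# Signatures from non-voting parties (e.g. Serge) are simply ignored.
```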
The code frequently throws exceptions, but when an exception occurs it just continues doing what it was doing before. It shows absolutely no regard for constraining its resource usage on the Tor network.
The logic that reuses an already-downloaded network_status document instead of fetching a new one does not work: I have a network_status document, but the dirauths are contacted anyway. Perhaps descriptors are not cached to disk and are downloaded on every start of the application?
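The expected caching behaviour could look roughly like this: check the on-disk copy and its validity period before ever touching a dirauth. A minimal sketch, with an illustrative cache path and invented function names (not torpy's actual API):

```python
# Sketch: reuse a consensus from disk while it is still valid,
# instead of contacting the directory authorities on every start.
import json
import os
import time

CACHE_PATH = "/tmp/consensus_cache.json"  # illustrative path only

def save_consensus(consensus):
    """Persist the consensus so a restarted client can reuse it."""
    with open(CACHE_PATH, "w") as f:
        json.dump(consensus, f)

def load_cached_consensus():
    """Return the cached consensus dict, or None if absent or expired."""
    if not os.path.exists(CACHE_PATH):
        return None
    with open(CACHE_PATH) as f:
        consensus = json.load(f)
    if consensus["valid_until"] <= time.time():
        return None  # expired: only now is a fresh download justified
    return consensus
```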
New consensuses never seem to be downloaded from guards, only from dirauths.
If my analysis above is at least mostly correct, then even a few people running a scraper with torpy and calling the binary in a loop will quickly overload the dirauths, causing exactly the trouble we're seeing. The effects compound, because torpy is relentless in trying again. A scraper that calls torpy in a loop would just conclude that a single file failed to download and move on to the next, once again creating load on all the dirauths.
There are probably more suboptimal things that I missed here. Generally, I think torpy needs to implement the following quickly if it wants to stop hurting the network. This is in order of priority, but I think _ALL_ of these (maybe more) are needed before torpy stops being an abuser of the network:
- Stop automatically retrying on failure without backoff
- Cache failures to disk, so a newly started torpy_cli does not re-request the same resources that the previous instance failed to get
- Fix the consensus validation logic to work the same way as the tor client's (maybe as easy as removing Serge)
- Use microdescriptors and the microdesc consensus, and cache descriptors
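The first point could look roughly like this: exponential backoff with jitter and a hard cap on attempts, so a persistent failure surfaces as an error instead of an endless retry loop. A sketch only, with a placeholder fetch function, not torpy's actual code:

```python
# Sketch: retry with capped exponential backoff instead of hammering
# the dirauths forever. fetch is any callable that may raise on failure.
import random
import time

def fetch_with_backoff(fetch, max_attempts=4, base_delay=1.0):
    """Try fetch(); on failure wait exponentially longer, then give up."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # surface (and ideally record) the failure
            # Exponential backoff with jitter: ~1s, ~2s, ~4s, ...
            time.sleep(base_delay * (2 ** attempt) * (0.5 + random.random()))
```

A recorded failure could then be written to the same on-disk cache, so that the next torpy invocation skips the known-bad resource instead of starting the whole dance over.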
I wonder if we can actively defend against network abuse like this in a sensible way. Perhaps you have some ideas, too? I think torpy in its current implementation could also quickly overwhelm the fallback dirs, so simply switching it from dirauths to fallbacks is not a solution here. Defenses are probably necessary even if torpy can be fixed very quickly, because older versions of torpy are out there and I assume will continue to be used. Hopefully that assumption is wrong?
Thanks Sebastian