On Thu, Jul 27, 2017 at 08:48:35PM +0300, Vort wrote:
This sort of thing has been going on for many years. I used to refer
to it as "mobbing". As nearly as I was ever able to determine, the behavior is an unintended consequence of hidden services.
The same thing started happening today, and I have noticed that 100% CPU usage spikes happen every hour and last for several minutes. During these spikes, all CPU cores are used and the stack trace points somewhere in the worker_thread_main() function. Also, today the relay has more connections than usual (5500 vs. 2000-3000). Does this pattern match the characteristics of hidden service activity?
Yes. Your relay is getting way more circuit create attempts than it usually does, and all your signs point to the "HSDir hotspot" phenomenon.
Each onion service has six relays each day that serve as the place for fetching its onion descriptor, and some onion services are super popular (for example, back in the day, the August 2013 botnet used an onion service for coordinating its millions of bots).
Tor isn't great at parallelizing all of its crypto, but it is good at parallelizing the circuit handshakes, which would be why all of your cores are being used.
And clients building paths to your relay will use a variety of middle hops, meaning more relays than usual will have made a direct connection to your relay lately.
If there's one super popular onion service, then 6/3719 = ~0.16% of the relays will feel the pain on any given day. More than one super popular onion service would make that number go up.
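To put rough numbers on "go up", here is a back-of-the-envelope sketch in Python. It assumes each popular onion service lands on six HSDirs picked uniformly and independently out of 3719, which is only an approximation of how the directories are really chosen; the function name and parameters are made up for illustration.

```python
def hsdir_fraction(popular_services: int, hsdirs: int = 3719, replicas: int = 6) -> float:
    """Approximate fraction of HSDir relays serving at least one popular service.

    Assumption (not Tor's actual selection rule): every service picks its
    `replicas` directories uniformly at random, independently of the others.
    """
    if popular_services < 0 or not 0 < replicas <= hsdirs:
        raise ValueError("need popular_services >= 0 and 0 < replicas <= hsdirs")
    # Chance that a given relay is NOT one of the directories for one service.
    p_miss_one = 1 - replicas / hsdirs
    # The relay feels the pain unless it is missed by every popular service.
    return 1 - p_miss_one ** popular_services


if __name__ == "__main__":
    for k in (1, 5, 20):
        print(f"{k:>2} popular services -> {hsdir_fraction(k):.2%} of HSDirs")
```

With one popular service this reproduces the 6/3719 figure above; with more, the fraction grows almost linearly as long as the number of popular services stays small compared to the number of HSDirs.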
One fix would be to raise the bar for getting the HSDir flag, so relays with the flag are better able to handle the phenomenon. But shrinking the total number of HSDir relays can make other attacks easier.
Another fix would be to spread the load for each onion service, or maybe for popular onion services, over more than six relays. But that would open up more surface area for attacks to e.g. measure popularity of an onion service.
I think there is not an easy fix.
--Roger