Replying to myself:
s7r wrote: [SNIP]
Metrics port says:
tor_relay_load_tcp_exhaustion_total 0
tor_relay_load_onionskins_total{type="tap",action="processed"} 52073 tor_relay_load_onionskins_total{type="tap",action="dropped"} 0 tor_relay_load_onionskins_total{type="fast",action="processed"} 0 tor_relay_load_onionskins_total{type="fast",action="dropped"} 0 tor_relay_load_onionskins_total{type="ntor",action="processed"} 8069522 tor_relay_load_onionskins_total{type="ntor",action="dropped"} 273275
So if we compare the dropped ntor circuits against the processed ones, the ratio is reasonable: fewer than 300k dropped out of more than 8 million processed, roughly 3.4%.
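For anyone who wants to reproduce this from their own MetricsPort output, here is a minimal sketch in plain Python (no external libraries). The metric names are exactly those printed above; only the two ntor lines matter for the ratio, everything else in the snippet is illustrative:

```python
import re

# Paste of the relevant MetricsPort lines quoted above (first report).
metrics = """
tor_relay_load_onionskins_total{type="ntor",action="processed"} 8069522
tor_relay_load_onionskins_total{type="ntor",action="dropped"} 273275
"""

counters = {}
for line in metrics.strip().splitlines():
    # Lines look like: name{label="...",...} value
    match = re.match(r'(\w+)\{([^}]*)\}\s+(\d+)', line)
    if match:
        name, labels, value = match.groups()
        counters[(name, labels)] = int(value)

processed = counters[('tor_relay_load_onionskins_total',
                      'type="ntor",action="processed"')]
dropped = counters[('tor_relay_load_onionskins_total',
                    'type="ntor",action="dropped"')]
print(f"ntor dropped/processed: {dropped / processed:.1%}")  # ~3.4%
```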
So the question here is: does the computed consensus weight of a relay change if that relay keeps reporting to the directory authorities that it is overloaded? If yes, could an attacker trigger this in order to arbitrarily decrease a relay's consensus weight even when it is not really overloaded (and perhaps thereby increase the consensus weights of other, malicious relays we don't know about)?
Also, as a side note, I think a relay should not consider itself overloaded unless the dropped/processed ratio is over 15% or 20%. Would this be a good idea?
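To make that side note concrete, here is a minimal sketch of the rule I have in mind. This is not Tor's actual overload logic (which is implemented in C inside the daemon); the function name and the 20% default are just illustrative:

```python
def should_report_overload(dropped: int, processed: int,
                           threshold: float = 0.20) -> bool:
    """Proposed rule: only count the relay as overloaded when the
    dropped/processed ntor ratio exceeds the threshold (15-20%)."""
    if processed == 0:
        # Nothing was processed at all; any drop is suspicious.
        return dropped > 0
    return dropped / processed > threshold

# With the numbers from the first report: 273275 / 8069522 is ~3.4%,
# so this relay would not have reported itself as overloaded.
print(should_report_overload(273275, 8069522))  # False
```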
Sending to tor-relays@ for now; if some of you think this is worth discussing further, we can open a thread about it on tor-dev@ - please let me know if I should do that.
I am now positive that this particular relay is actively being probed: it is overloaded for just a few minutes every 2-4 days, and the rest of the time it performs just fine, with under 70% CPU usage and under 50% usage for RAM, SSD and bandwidth.
I can also confirm that after this overload report my consensus weight and advertised bandwidth decreased. So my concern stands: being able to trigger this arbitrarily has a network-wide effect on path selection probability and might well suit someone's purpose.
I don't know what the gain is here or who is triggering it, nor whether other Guard relays are experiencing the same thing (maybe we can analyze Onionoo datasets and find out), but until then I am switching to OverloadStatistics 0.
Here are today's Metrics Port results:
tor_relay_load_tcp_exhaustion_total 0
tor_relay_load_onionskins_total{type="tap",action="processed"} 62857 tor_relay_load_onionskins_total{type="tap",action="dropped"} 0 tor_relay_load_onionskins_total{type="fast",action="processed"} 0 tor_relay_load_onionskins_total{type="fast",action="dropped"} 0 tor_relay_load_onionskins_total{type="ntor",action="processed"} 10923543 tor_relay_load_onionskins_total{type="ntor",action="dropped"} 819524
As you can see, just like in the first message of this thread, the calculated percentage of dropped vs processed ntor cells is not a concern: over 10 million processed, under 900,000 dropped, roughly 7.5%.
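Applying the 15-20% rule of thumb from earlier to both reports gives the same conclusion. A quick check (numbers copied from the two dumps above; the 20% threshold is just the example value):

```python
# Neither report crosses a 20% dropped/processed threshold.
THRESHOLD = 0.20
for label, processed, dropped in [("first report", 8069522, 273275),
                                  ("today's report", 10923543, 819524)]:
    ratio = dropped / processed
    print(f"{label}: {ratio:.1%} dropped -> overloaded? {ratio > THRESHOLD}")
# first report: 3.4% dropped -> overloaded? False
# today's report: 7.5% dropped -> overloaded? False
```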
Other relevant log messages that support my suspicion - the following appeared while the relay was being hammered intentionally. As you can see, this overload lasted only 7 minutes; the previous one lasted 5 minutes and the one before that 6 minutes.
I think the attacker is saving resources, since overloading the relay for 5 minutes gets the same result as overloading it 24x7.
Jan 03 07:14:42.000 [warn] Your computer is too slow to handle this many circuit creation requests! Please consider using the MaxAdvertisedBandwidth config option or choosing a more restricted exit policy. [2004 similar message(s) suppressed in last 213900 seconds]
Jan 03 07:15:42.000 [warn] Your computer is too slow to handle this many circuit creation requests! Please consider using the MaxAdvertisedBandwidth config option or choosing a more restricted exit policy. [52050 similar message(s) suppressed in last 60 seconds]
Jan 03 07:16:42.000 [warn] Your computer is too slow to handle this many circuit creation requests! Please consider using the MaxAdvertisedBandwidth config option or choosing a more restricted exit policy. [92831 similar message(s) suppressed in last 60 seconds]
Jan 03 07:17:42.000 [warn] Your computer is too slow to handle this many circuit creation requests! Please consider using the MaxAdvertisedBandwidth config option or choosing a more restricted exit policy. [89226 similar message(s) suppressed in last 60 seconds]
Jan 03 07:18:42.000 [warn] Your computer is too slow to handle this many circuit creation requests! Please consider using the MaxAdvertisedBandwidth config option or choosing a more restricted exit policy. [74832 similar message(s) suppressed in last 60 seconds]
Jan 03 07:19:42.000 [warn] Your computer is too slow to handle this many circuit creation requests! Please consider using the MaxAdvertisedBandwidth config option or choosing a more restricted exit policy. [79933 similar message(s) suppressed in last 60 seconds]
Jan 03 07:20:42.000 [warn] Your computer is too slow to handle this many circuit creation requests! Please consider using the MaxAdvertisedBandwidth config option or choosing a more restricted exit policy. [68678 similar message(s) suppressed in last 60 seconds]
Jan 03 07:21:42.000 [warn] Your computer is too slow to handle this many circuit creation requests! Please consider using the MaxAdvertisedBandwidth config option or choosing a more restricted exit policy. [76461 similar message(s) suppressed in last 60 seconds]
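A rough way to gauge the intensity of the burst is to sum the "similar message(s) suppressed" counters from the 60-second windows above (I leave out the first line, which covers the long quiet period before the burst). Assuming each suppressed warning corresponds to one circuit creation request the relay could not handle in time, this gives a lower bound on the request rate during those 7 minutes:

```python
# "similar message(s) suppressed in last 60 seconds" counts, 07:15-07:21
suppressed_per_minute = [52050, 92831, 89226, 74832, 79933, 68678, 76461]

total = sum(suppressed_per_minute)
minutes = len(suppressed_per_minute)
print(f"~{total} warnings in {minutes} minutes "
      f"(~{total // minutes} per minute, ~{total // (minutes * 60)} per second)")
# ~534011 warnings in 7 minutes (~76287 per minute, ~1271 per second)
```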
Other stats from log file:
14154 circuits open ; I've received 358682 connections on IPv4 and 14198 on IPv6. I've made 185294 connections with IPv4 and 51900 with IPv6.
[notice] Heartbeat: DoS mitigation since startup: 1 circuits killed with too many cells, 27881 circuits rejected, 2 marked addresses, 0 same address concurrent connections rejected, 0 connections rejected, 0 single hop clients refused, 0 INTRODUCE2 rejected.
[notice] Since our last heartbeat, 2878 circuits were closed because of unrecognized cells while we were the last hop. On average, each one was alive for 653.767547 seconds, and had 1.000000 unrecognized cells.
I have only started seeing this last message recently, but I see it quite heavily (at every heartbeat); does anyone else see it?
My gut feeling, which has never let me down, tells me there is something here worth looking into. I want to analyze Onionoo datasets to see whether the percentage of Guard relays reporting overload increased in the last month, and to open an issue on GitLab to patch Tor so that it only reports overload when the dropped/processed ratio is over 20%.
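For the Onionoo part, something along these lines could serve as a starting point. It assumes the Onionoo details documents expose an overload_general_timestamp field for relays that reported overload in their recent descriptors and that the flag/running/fields query parameters behave as documented in the Onionoo protocol; if that field is not available, the raw server descriptors on CollecTor would be the place to look instead. It only gives a snapshot, so answering the "did it increase over the last month" question would still need the historical archives:

```python
import json
import urllib.request

# Count running Guard relays whose Onionoo details document carries an
# overload_general_timestamp (i.e. relays currently reporting overload).
URL = ("https://onionoo.torproject.org/details"
       "?flag=Guard&running=true"
       "&fields=nickname,fingerprint,overload_general_timestamp")

with urllib.request.urlopen(URL) as response:
    data = json.load(response)

relays = data.get("relays", [])
overloaded = [r for r in relays if "overload_general_timestamp" in r]
print(f"{len(overloaded)} of {len(relays)} running Guard relays "
      f"currently report overload")
```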