-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512
I've just seen the most amazing headshot of my Tor relay by a sudden massive SYN flood yet. I was online and started noticing problems with DNS on my local router. I checked my so-called monitoring setup, a window with a permanent ping to my router, and noticed a lot of timeouts. Obviously, that means trouble.
I checked my Raspberry Pi Tor relay setup, and there was an incredible SYN flood just starting. I have attached an image where the vertical scale reaches up to 5 megabits per second and where each column is two seconds. This is absolutely not established Tor connection behavior. I don't know what *all* of it is, since once the Tor daemon dies, the SYN traffic seems to be steady at about 50KB/sec (of *just* SYNs inbound, and 100+KB/sec of outbound ICMP port unreachable packets). But that huge tsunami marks when the flood / circuit creation storm really got going.
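For a sense of scale, the steady 50KB/sec of inbound SYNs can be turned into a rough packet rate. This is only a back-of-the-envelope sketch: the per-packet sizes are assumptions (40 bytes for a bare IPv4 + TCP header, 60 bytes for a SYN carrying typical options), 1 KB is taken as 1000 bytes, and the monitoring tool may or may not count link-layer framing.

```python
def syn_packets_per_second(bytes_per_second, bytes_per_syn):
    """Estimate packet rate from a byte rate and an assumed SYN size."""
    return bytes_per_second / bytes_per_syn

# 50 KB/sec as reported; both packet sizes are assumptions, not measurements.
OBSERVED_BYTES_PER_SECOND = 50_000

for assumed_size in (40, 60):
    rate = syn_packets_per_second(OBSERVED_BYTES_PER_SECOND, assumed_size)
    print(f"{assumed_size}-byte SYNs: about {round(rate)} packets/sec")
```

Under those assumptions the steady state works out to somewhere around a thousand SYNs per second, which is a lot of per-packet work for a small single-board computer.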
My relay crashed faster than I've ever seen it crash before, even with my newer protections in place. In under 5 minutes the out of memory killer reaped Tor. In previous situations, I've observed during floods that Tor's share of physical memory doesn't seem to increase. I could be wrong about that, but I think the thing eating all the RAM is TCP open/half-open sockets and/or associated tables in the Linux kernel - once RAM pressure becomes too intense, Tor is just the biggest thing around, so the oom-killer picks it and bam.
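One way to test the guess that kernel TCP state rather than Tor is eating the RAM would be to sample /proc/net/sockstat during a flood. The sketch below parses that file's text and converts the TCP "mem" counter (reported in pages) to bytes. The sample text is made up for illustration, and the 4096-byte page size is an assumption. As far as I know this counter covers socket buffer memory only; half-open request entries and conntrack tables are accounted for elsewhere, so a low figure here would not by itself clear the kernel.

```python
def parse_sockstat(text):
    """Parse /proc/net/sockstat text into {protocol: {counter: value}}."""
    stats = {}
    for line in text.splitlines():
        proto, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not a "NAME: key value ..." line
        fields = rest.split()
        stats[proto.strip()] = {
            name: int(value) for name, value in zip(fields[0::2], fields[1::2])
        }
    return stats


def tcp_buffer_bytes(stats, page_size=4096):
    """TCP 'mem' is counted in pages; page_size is an assumed default."""
    return stats.get("TCP", {}).get("mem", 0) * page_size


# Made-up sample in the layout of /proc/net/sockstat, not real data.
SAMPLE = """sockets: used 412
TCP: inuse 37 orphan 2 tw 11 alloc 290 mem 1500
UDP: inuse 3 mem 1
"""

stats = parse_sockstat(SAMPLE)
print("allocated TCP sockets:", stats["TCP"]["alloc"])
print("TCP buffer bytes:", tcp_buffer_bytes(stats))
```

On the relay itself the text would come from reading /proc/net/sockstat, and os.sysconf("SC_PAGE_SIZE") would give the real page size. Logging these numbers next to Tor's resident memory every few seconds should show which one grows before the oom-killer fires.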
The truly amazing and disturbing thing is that it's an hour and a half later now, and my router is still under extreme load from the incoming SYN packets. It hasn't yet crashed.
In the meantime I added an iptables rule right under the "ESTABLISHED" rule suggested by David Serrano:
Chain INPUT (policy ACCEPT)
target        prot opt source    destination
ACCEPT        all  --  anywhere  anywhere     state ESTABLISHED
DROP          tcp  --  anywhere  anywhere     tcpflags: FIN,SYN,RST,ACK/SYN #conn src/0 > 75
SYN_THROTTLE  tcp  --  anywhere  anywhere     multiport dports 31923,31924 state NEW
(Those weird params are from a connlimit suggestion I found for limiting the total number of TCP connections that a chain will handle.) I started off at 50, and am now up to 100. This is obviously a stopgap solution for an ongoing event, but it suggests some further ways that slower single-board computers can be made to weather such storms, possibly without (see earlier discussion on this thread) using fail2ban at all, which is very inefficient.
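For anyone wanting to reproduce the DROP rule, the listing above appears to correspond to a connlimit match with a /0 mask, which counts all sources together and so acts as a global cap. The sketch below only builds the iptables command line so the limit can be changed in one place; the helper name, the rule position (2, directly under the ESTABLISHED rule) and the choice between insert and replace are assumptions, and nothing is executed.

```python
import shlex


def connlimit_rule(limit, position=2, chain="INPUT", replace=False):
    """Build (not run) an iptables command that drops new SYNs once more
    than `limit` TCP connections exist in total (mask 0 = all sources)."""
    action = "-R" if replace else "-I"  # -R replaces rule N, -I inserts at N
    return [
        "iptables", action, chain, str(position),
        "-p", "tcp", "--syn",
        "-m", "connlimit",
        "--connlimit-above", str(limit),
        "--connlimit-mask", "0",
        "-j", "DROP",
    ]


# First insertion, then raising the limit in place later on.
print(shlex.join(connlimit_rule(75)))
print(shlex.join(connlimit_rule(100, replace=True)))
```

The resulting list could be handed to subprocess.run as root. Replacing the rule in place rather than deleting and re-adding it avoids a window with no cap at all while the flood is still running.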
What's quite alarming is that when I raise the limit a bit, to get the restarted Tor relay better connected, the SYN flood logs go crazy for a minute or two before instantaneously stopping when, I presume, the connection limit has been reached. Since the dropped packets above the global inbound connection limit are not logged, the sudden start/stop of the SYN flood logging (in the SYN_THROTTLE chain, they're logged) tells me I am still under intense SYN flood.
After adding connection limits on the Tor box, my router recovered and is seeing ping times of roughly 2x normal (0.8-1.2 ms is normal; now it's more like 1.0-2.0 ms), but things are working handily. I have also been able to connect to other services through the Tor relay again, with considerable difficulty.
I notice that Tor is consuming all available CPU, even though it is, for the moment, relaying a pretty consistent 50-80KB/sec. I suspect that this is mostly circuit creation requests coming in over established Tor connections, which Tor is ... processing, I don't know how, but unless there's been some turnover and they get lucky, once another peer attempts a TCP/TLS handshake, its packets are likely to be dropped. This is probably not ideal.
As long as the Raspberry Pi manages to stay up and keep logging I'll have all the data to go through later. (I already captured a lot.) I also have the logs from the other incident that I caught and watched in real time. I'm planning to do an analysis on the number of IPs involved, whether they are Tor relays or not, and other interesting things that can be gleaned from the logs. I promise some graphs and charts, punch and pie not so much. Unfortunately my life is quite busy right now, so the report may take a while, although it's kind of a high priority thing for me at the moment. This is pretty crazy. I can't verify it, but my suspicion is this is happening when I get my Stable flag (I have no idea if I'd gotten it back this morning or not) or shortly thereafter.
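The per-IP part of that analysis could start from something like the sketch below. It assumes the SYN_THROTTLE chain logs through the netfilter LOG target, whose kernel log lines carry a SRC= field. The sample lines, the log prefix and the relay address set are all invented for illustration (the addresses come from documentation ranges); a real run would read the captured logs and a list of relay addresses taken from the Tor consensus.

```python
import re
from collections import Counter

# Netfilter LOG lines contain "SRC=<ipv4>"; IPv6 is ignored in this sketch.
SRC_RE = re.compile(r"\bSRC=(\d{1,3}(?:\.\d{1,3}){3})\b")


def count_sources(lines):
    """Count logged packets per source IP."""
    counts = Counter()
    for line in lines:
        match = SRC_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def split_by_relay(counts, relay_ips):
    """Separate sources that are known relays from everything else."""
    relays = {ip: n for ip, n in counts.items() if ip in relay_ips}
    others = {ip: n for ip, n in counts.items() if ip not in relay_ips}
    return relays, others


# Invented sample lines, not taken from the actual incident logs.
SAMPLE_LINES = [
    "kernel: SYN_THROTTLE: IN=eth0 OUT= SRC=192.0.2.10 DST=198.51.100.5 PROTO=TCP DPT=31923 SYN",
    "kernel: SYN_THROTTLE: IN=eth0 OUT= SRC=192.0.2.10 DST=198.51.100.5 PROTO=TCP DPT=31923 SYN",
    "kernel: SYN_THROTTLE: IN=eth0 OUT= SRC=203.0.113.7 DST=198.51.100.5 PROTO=TCP DPT=31924 SYN",
]
KNOWN_RELAYS = {"203.0.113.7"}  # hypothetical relay list

counts = count_sources(SAMPLE_LINES)
relays, others = split_by_relay(counts, KNOWN_RELAYS)
print("distinct sources:", len(counts))
print("from relays:", relays)
print("from non-relays:", others)
```

Since SYN floods frequently use spoofed source addresses, the per-IP counts are probably best read as a lower bound on what can be concluded rather than a list of culprits.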
Node is VastCatbox, flood started around 8am Pacific.
Best, - -Gordon M.