Re: [tor-relays] max TCP interruption before Tor circuit teardown?

31 Oct 2013


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512
I've just seen the most amazing headshot of my Tor relay by a sudden
massive SYN flood yet. I was online and started noticing problems with
DNS on my local router. I checked my so-called monitoring setup, a
window with a permanent ping to my router, and noticed a lot of
timeouts. Obviously, that means trouble.
I checked my Raspberry Pi Tor relay and saw an incredible SYN
flood just starting. I have attached an image where the vertical scale
reaches up to 5 megabits per second and where each column is two seconds.
This is absolutely not established Tor connection behavior. I don't
know what *all* of it is, since once the Tor daemon dies, the SYN
traffic seems to be steady at about 50KB/sec (of *just* SYNs inbound,
and 100+KB/sec of outbound ICMP port unreachable packets).  But that
huge tsunami marks when the flood / circuit creation storm really got
going.
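For a sense of scale, the byte rates above can be turned into rough packet rates. This is only a back-of-the-envelope sketch: it assumes each SYN is a 60-byte IPv4 packet (20-byte IP header, 20-byte TCP header, 20 bytes of TCP options), that "KB" means 1000 bytes, and that link-layer framing can be ignored. The function names are illustrative, not from any tool.

```python
# Back-of-the-envelope packet-rate estimate for a SYN flood.
# ASSUMPTIONS (not measured): a SYN is a 60-byte IPv4 packet
# (20-byte IP header + 20-byte TCP header + 20 bytes of TCP options),
# "KB" means 1000 bytes, and link-layer framing is ignored.

SYN_PACKET_BYTES = 60


def syns_per_second_from_bytes(bytes_per_second: float,
                               packet_bytes: int = SYN_PACKET_BYTES) -> float:
    """Packets per second implied by a byte rate of same-sized packets."""
    return bytes_per_second / packet_bytes


def syns_per_second_from_bits(bits_per_second: float,
                              packet_bytes: int = SYN_PACKET_BYTES) -> float:
    """Same estimate, starting from a bit rate (e.g. a graph's Mbit/s axis)."""
    return syns_per_second_from_bytes(bits_per_second / 8, packet_bytes)


if __name__ == "__main__":
    # Steady state after the Tor daemon died: about 50 KB/s of inbound SYNs.
    print(f"steady:   ~{syns_per_second_from_bytes(50_000):,.0f} SYNs/s")
    # Top of the graph's vertical scale: 5 Mbit/s, if it were all SYNs.
    print(f"5 Mbit/s: ~{syns_per_second_from_bits(5_000_000):,.0f} SYNs/s")
```

Under these assumptions the steady background is roughly 830 SYNs per second, and traffic filling the 5 Mbit/s scale would be on the order of 10,000 SYNs per second if it were all SYNs.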
My relay crashed faster than I've ever seen it crash before, even with
my newer protections in place. In under 5 minutes the out-of-memory
killer reaped Tor.  In previous situations, I've observed during
floods that Tor's share of physical memory doesn't seem to increase.
I could be wrong about that, but I think the thing eating all the RAM
is TCP open/half-open sockets and/or associated tables in the Linux
kernel - once RAM pressure becomes too intense, Tor is just the
biggest thing around, so the oom-killer picks it and bam.
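One way to test the hypothesis that half-open sockets are what is eating the RAM is to count kernel TCP sockets by state while a flood is in progress. A minimal sketch, assuming a Linux host with the usual /proc/net/tcp and /proc/net/tcp6 layout (the fourth column is the socket state in hex, and 03 is SYN_RECV). It reports counts only, not memory use, and the helper names are hypothetical.

```python
import os
from collections import Counter

# Hex state codes as printed in /proc/net/tcp (from the kernel's tcp_states.h).
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_states(table_text: str) -> Counter:
    """Count sockets per state in the text of a /proc/net/tcp-style table."""
    counts = Counter()
    for line in table_text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) > 3:
            # Unknown codes are kept as raw hex rather than dropped.
            counts[TCP_STATES.get(fields[3], fields[3])] += 1
    return counts


def snapshot(paths=("/proc/net/tcp", "/proc/net/tcp6")) -> Counter:
    """Sum state counts over the IPv4 and IPv6 tables that exist on this host."""
    total = Counter()
    for path in paths:
        if os.path.exists(path):
            with open(path) as f:
                total += count_states(f.read())
    return total


if __name__ == "__main__":
    for state, n in snapshot().most_common():
        print(f"{state:12} {n}")
```

Run repeatedly during a flood, a steadily growing SYN_RECV count would support the half-open-socket theory. On systems with iproute2, `ss -s` gives a similar summary without any scripting.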
The truly amazing and disturbing thing is that it's an hour and a half
later now, and my router is still under extreme load from the incoming
SYN packets. It hasn't yet crashed.
In the meantime I added an iptables rule right under the "ESTABLISHED"
rule suggested by David Serrano:

    Chain INPUT (policy ACCEPT)
    target        prot opt source     destination
    ACCEPT        all  --  anywhere   anywhere   state ESTABLISHED
    DROP          tcp  --  anywhere   anywhere   tcpflags: FIN,SYN,RST,ACK/SYN #conn src/0 > 75
    SYN_THROTTLE  tcp  --  anywhere   anywhere   multiport dports 31923,31924 state NEW

(Those odd-looking parameters come from a connlimit suggestion I found for
limiting the total number of TCP connections that may be handled by a
chain.)  I started off at 50, and am now up to 100.  This is
obviously a stopgap solution for an ongoing event, but it suggests
some further ways that slower single-board computers can be made to
weather such storms, possibly without resorting to fail2ban, which is
very inefficient (see earlier discussion on this thread).
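The `#conn src/0 > 75` entry is how `iptables -L` displays a connlimit match, and it likely corresponds to something like `iptables -I INPUT 2 -p tcp --syn -m connlimit --connlimit-above 75 --connlimit-mask 0 -j DROP` (check the iptables-extensions man page before relying on that exact form). Because a 0-bit mask maps every source address into one group, the rule caps the total number of connections it counts rather than connections per client. The Python below is a simplified, hypothetical model of that grouping, not the kernel implementation: it handles IPv4 only and ignores conntrack timeouts and the other match conditions.

```python
import ipaddress
from collections import Counter


class ConnLimitModel:
    """Simplified model of the iptables connlimit match (IPv4 only).

    New connections are grouped by source address masked to `mask` bits.
    A SYN is dropped when counting it would push its group above `above`.
    Conntrack timeouts and other details are deliberately ignored.
    """

    def __init__(self, above: int, mask: int = 32):
        self.above = above
        self.mask = mask
        self.active = Counter()

    def _group(self, src: str) -> ipaddress.IPv4Network:
        # mask=0 turns every address into 0.0.0.0/0, i.e. one global group.
        return ipaddress.IPv4Network(f"{src}/{self.mask}", strict=False)

    def on_syn(self, src: str) -> bool:
        """Return True if the new connection is accepted, False if dropped."""
        group = self._group(src)
        if self.active[group] + 1 > self.above:
            return False
        self.active[group] += 1
        return True

    def on_close(self, src: str) -> None:
        """Release one connection slot for the source's group."""
        group = self._group(src)
        if self.active[group] > 0:
            self.active[group] -= 1


if __name__ == "__main__":
    sources = [f"198.51.100.{i}" for i in range(1, 101)]  # 100 distinct peers
    global_cap = ConnLimitModel(above=75, mask=0)   # like "src/0 > 75"
    per_host = ConnLimitModel(above=75, mask=32)    # one group per address
    print("mask 0 :", sum(global_cap.on_syn(s) for s in sources), "accepted")
    print("mask 32:", sum(per_host.on_syn(s) for s in sources), "accepted")
```

With a mask of 0, the 76th connection from any source is refused, so the cap protects the box but cannot tell a flooding source from a legitimate relay; raising the limit from 50 to 100 just admits more of whatever arrives first.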
What's quite alarming is that when I raise the limit a bit, to get the
restarted Tor relay better connected, the SYN flood logs go crazy for
a minute or two and then stop instantaneously, presumably when the
connection limit has been reached.  Packets dropped by the global
inbound connection limit are not logged, whereas those that reach the
SYN_THROTTLE chain are, so the sudden start/stop of the SYN flood
logging tells me I am still under an intense SYN flood.
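This inference rests on first-match rule ordering: a SYN that hits the unlogged connlimit DROP never reaches SYN_THROTTLE, so log lines appear only while the cap still has headroom. The toy simulation below sketches that, under hypothetical simplifications: one global connection count, every SYN that passes connlimit becomes a long-lived connection, and SYN_THROTTLE is assumed to log every packet it receives (its actual contents aren't shown in the listing).

```python
from dataclasses import dataclass


@dataclass
class Counters:
    accepted_established: int = 0
    dropped_silently: int = 0   # connlimit DROP: no log line
    logged: int = 0             # reached SYN_THROTTLE: logged (assumed)


def run_chain(packets, limit: int) -> Counters:
    """Evaluate INPUT-chain rules in order; the first matching rule wins.

    packets: iterable of "syn" or "established" strings.
    """
    c = Counters()
    open_conns = 0
    for kind in packets:
        if kind == "established":        # rule 1: state ESTABLISHED -> ACCEPT
            c.accepted_established += 1
        elif open_conns + 1 > limit:      # rule 2: connlimit above limit -> DROP
            c.dropped_silently += 1
        else:                             # rule 3: jump to SYN_THROTTLE (logs)
            c.logged += 1
            open_conns += 1
    return c


if __name__ == "__main__":
    flood = ["syn"] * 1000 + ["established"] * 200
    print(run_chain(flood, limit=100))
```

The logged count plateaus at exactly the limit while silent drops keep climbing, which is consistent with logging that stops abruptly once the cap fills. Traffic on established connections is unaffected because the ESTABLISHED rule matches first.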
After I added connection limits on the Tor box, my router recovered and
is seeing ping times of ballpark 2x normal (0.8-1.2 ms is normal; now
it's more like 1.0-2.0 ms), but things are working handily.  I have
also been able to connect to other services through the Tor relay
again, albeit with considerable difficulty.
I notice that Tor is consuming all available CPU, even though it is,
for the moment, relaying a fairly consistent 50-80 KB/sec.  I suspect
that this is mostly circuit creation requests coming in over
established Tor connections, which Tor is ... processing, I don't know
how.  But once another peer attempts a TCP/TLS handshake, its packets
are likely to be dropped, unless some existing connections have closed
and it gets lucky.  This is probably not ideal.
As long as the Raspberry Pi manages to stay up and keep logging I'll
have all the data to go through later. (I already captured a lot.) I
also have the logs from the other incident that I caught and watched
in real time. I'm planning to do an analysis on the number of IPs
involved, whether they are Tor relays or not, and other interesting
things that can be gleaned from the logs. I promise some graphs and
charts, punch and pie not so much. Unfortunately my life is quite busy
right now, so the report may take a while, although it's kind of a high
priority thing for me at the moment.  This is pretty crazy.  I can't
verify it, but I suspect this happens when I get my Stable flag (I
have no idea whether I'd gotten it back this morning) or shortly
thereafter.
The node is VastCatbox; the flood started around 8am Pacific.
Best,
- -Gordon M.
-----BEGIN PGP SIGNATURE-----

iQEcBAEBCgAGBQJSco2BAAoJED/jpRoe7/ujPS0H/0Hc8IGiQfpVL7gfB1PPAaSc
v2vocpj74czmQJSev/mYkKJbvRT/YdNm9bCE/CH6suFFvNgBqHmx4WWEFiBbzudH
DmBXO8OTKj5oEvb3IJLxuoiPyljzTQzrk7FoUkqnieDl1et4uB8RhSGe5GpDkoYZ
0+jVB8WqEJJg2EJUghHAXVPzvgOMa9yaW4jfBWHZWM5CZ9FsDMxFC5wFMMJ8A0mL
vtY5YMxNdaQPmz6btfsAT+HM/hnTZ1kOT+WnryKlShKtEOhyHXTrWw3QC3Xd60n3
D1hYNSpZ2TWo5tRhzkZobqZnBeDarOvFOeXlQjjP1YTBwYUFrV5RZiBsZD7zNos=
=U4MT
-----END PGP SIGNATURE-----
