-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512
Hello,
I'm working on building support scaffolding[1] for Tor on Raspberry Pi and other small ARM single-board computers (SBCs).
With the slower computers, sometimes too many attempts to connect to the ORPort (I am almost positive as part of TAP circuit building, but not *really* sure) can eventually cause Tor to consume more physmem than available and cause the oom-killer to kill Tor. Also, depending on the crappiness of the user's router, it's effectively a SYN flood, and can crash or impair consumer routers.
My solution, so far, is to define (through trial and error on a per-machine basis, since [1] is only officially supporting 3 SBCs right now) limits on how many SYNs may be sent to the ORPort and the DirPort per second. This is done with iptables. I experimented, tuned the parameters and watched traffic for weeks and came up with a pretty good set of limits for a 950MHz Raspberry Pi: 4 SYNs/sec burst 10. (For those about to say the Pi is thus too slow to be used as a relay, it's quite capable of relaying *at least* 2.5Mbps, but *not* when it's getting SYN flooded.)
So, sometimes hosts exceed this limit. Once the limit is exceeded, my current strategy is to use iptables REJECT to send an ICMP port-unreachable message (I believe that's what it's called, sorry, no coffee yet) back to the hosts that triggered the filter. This is on a per-SYN basis.
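For readers following along, the limit described above could be sketched roughly as follows. This is a minimal sketch under assumptions, not the project's actual ruleset: the port 9001 is a placeholder for the relay's real ORPort, and the script only prints the commands unless IPT is pointed at the real iptables binary.

```shell
#!/bin/sh
# Dry run by default: prints the commands instead of applying them.
# Set IPT=/sbin/iptables (as root) to apply them for real.
IPT="${IPT:-echo iptables}"
ORPORT="${ORPORT:-9001}"   # placeholder; use the relay's real ORPort

syn_limit_rules() {
    # Accept new connection attempts up to an average of 4/sec, burst 10.
    $IPT -A INPUT -p tcp --dport "$ORPORT" --syn \
        -m limit --limit 4/s --limit-burst 10 -j ACCEPT
    # Every SYN over the limit gets an explicit ICMP rejection.
    $IPT -A INPUT -p tcp --dport "$ORPORT" --syn \
        -j REJECT --reject-with icmp-port-unreachable
}

syn_limit_rules
```

Note that the limit match counts all matching SYNs together, not per source address, which is why well-behaved peers can be rejected during a flood.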
After watching the data, I noticed that some hosts just try to connect once or twice, or try to connect (during overload conditions) at reasonable intervals of tens of seconds to a few minutes. Other hosts will quadruple-tap the ORPort with SYNs, four in a row, and otherwise be much more aggressive with sending SYNs.
I'm currently testing fail2ban[2] as a way to ban aggressive peers by changing that iptables REJECT to a DROP for a short period, in order to accomplish two things:
1. Encourage them to knock off their bad behavior (i.e., go away for a little while).
2. Free up CPU time, RAM and bandwidth because we don't have to construct and send ICMP packets to banned peers.
Currently, if a peer violates the 4/sec burst 10 SYN limit more than 5 times in 60 seconds, that peer will be banned for 90 seconds. I'm trying to trim this down to the minimum that will protect the relay, and 90 seconds is a guess given some of my fears, read on...
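The ban policy above maps onto three fail2ban jail settings (maxretry, findtime, bantime). A minimal sketch that prints such a jail stanza is shown here; the jail name, filter name, and log path are illustrative assumptions, the ban action is left at whatever the local default is, and fail2ban's maxretry triggers on reaching the count, so the exact threshold may need adjusting by one.

```shell
#!/bin/sh
# Prints a fail2ban jail stanza; redirect it into a jail.local file to use it.
# The filter name and log path below are assumptions for illustration only.
print_jail() {
    name="$1"; maxretry="$2"; findtime="$3"; bantime="$4"
    cat <<EOF
[$name]
enabled  = true
filter   = $name
logpath  = /var/log/kern.log
maxretry = $maxretry
findtime = $findtime
bantime  = $bantime
EOF
}

# 5 logged rejects within 60 seconds lead to a 90 second ban.
print_jail tor-syn-flood 5 60 90
```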
During an overload condition, my primary priority is to protect the relay, but of course I wish to do so with as little disruption to the Tor network as possible. So, here is a potential problem with my approach that I can think of, which could degrade service (mildly, for a few end users) on the Tor network.
First, during a SYN flood type overload, some peers which have *existing* circuits built through the relay and are sending SYNs as normal traffic, will stochastically get "caught" in the filter and banned for a short time. If these hosts already have circuits open through the relay which is overloaded, I would prefer to preserve those circuits rather than break them. My defensive strategy versus overload here is to throttle new circuit creation requests, *not* to break existing circuits.
So here's the $64,000 question:
If a tor relay has a circuit built through a peer, and the peer starts dropping 100% of packets, how long will it take before the relay with the circuit "gives up" on the circuit and tears it down? I want to set my temp ban time *below* this timeout. Thus, unlucky peers that were caught in the filter and already have circuits built through the relay will experience a brief performance degradation, but they won't lose their active circuits through the overloaded relay, and in the meantime hopefully the overload condition is becoming resolved.
Is there such a timeout? There must be. Can someone tell me what it is?
Or, is there a better way to protect low-resource machines (slow CPU, 512MB RAM) against the SYN flood "circuit creation storm" conditions which occasionally arise on the Tor network? Again, I must reiterate, machines with specs like these can be very good relays for home broadband users. The true goal of my project[1] is to build a set of software which enables a "plug and forget" relay for home broadband users that costs well under $100.
[1] https://github.com/gordon-morehouse/cipollini
[2] http://www.fail2ban.org/wiki/index.php/Main_Page
Best,
-Gordon M.
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256
On 10/20/2013 12:42 PM, Gordon Morehouse wrote:
If a tor relay has a circuit built through a peer, and the peer starts dropping 100% of packets, how long will it take before the relay with the circuit "gives up" on the circuit and tears it down? I want to set my temp ban time *below* this timeout. Thus, unlucky peers that were caught in the filter and have circuits already built through the relay they will experience a brief performance degradation, but they won't lose their active circuits through the overloaded relay, and in the meantime hopefully the overload condition is becoming resolved.
Might it be better to actually cause the connecting client to tear down the circuit instead of degrading performance? If your relay is already being swamped by circuit-creation requests, it might be better to cause clients to build new circuits, hopefully not using your relay, no?
Dan
--
http://disman.tl
OpenPGP key: http://disman.tl/pgp.asc
Fingerprint: 2480 095D 4B16 436F 35AB 7305 F670 74ED BD86 43A9
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512
Dan Staples:
On 10/20/2013 12:42 PM, Gordon Morehouse wrote:
If a tor relay has a circuit built through a peer, and the peer starts dropping 100% of packets, how long will it take before the relay with the circuit "gives up" on the circuit and tears it down? I want to set my temp ban time *below* this timeout. Thus, unlucky peers that were caught in the filter and have circuits already built through the relay they will experience a brief performance degradation, but they won't lose their active circuits through the overloaded relay, and in the meantime hopefully the overload condition is becoming resolved.
Might it be better to actually cause the connecting client to tear down the circuit instead of degrading performance? If your relay is already being swamped by circuit-creation requests, it might be better to cause clients to build new circuits, hopefully not using your relay, no?
My reasoning here is that the Pi can push at least 2.5 Mbps of traffic comfortably. If a Pi-based relay gets the Stable flag, and peers start building long-lived circuits through it (correct me if my understanding is weak please, BTW), the traffic flowing through those existing circuits isn't doing the most loading of the relay; it's the SYNs/circuit creation requests, and thus, those are what I want to shed.
The issue is that a peer with circuits which already exist may send some SYNs at the wrong time and get banned - I'd prefer to temporarily degrade service than to force that peer to tear down the circuit, because the circuit itself isn't causing much load. The ban of a peer with pre-existing circuits is collateral damage, essentially, and I'd like to limit that.
Let's pretend (I have no idea) that Tor will give up after 90 sec if a circuit's peer starts dropping all packets. If I choose only to drop packets for anyone caught in the short-term ban filter for 75 seconds, that's probably a pretty strong signal to peers looking to build *new* circuits to try elsewhere, but the peers with existing circuits will be degraded for 75 seconds and then get to keep their active circuits. If the storm abates or even slows, they may not see this degradation much at all.
I'm still waiting for another "storm" to test the 60 sec findtime / 90 sec bantime guesses that I made (and just pushed to my repo, BTW). Every time my relay crashes due to a storm, it takes me that much longer to get Stable back, and the storms are almost nonexistent until you have the Stable flag in my observation.
Best,
-Gordon M.
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512
Gordon Morehouse:
I'm still waiting for another "storm" to test the 60 sec findtime / 90 sec bantime guesses that I made (and just pushed to my repo, BTW). Every time my relay crashes due to a storm, it takes me that much longer to get Stable back, and the storms are almost nonexistent until you have the Stable flag in my observation.
Another circuit-creation storm (detectable as SYN flood on ORPort) happened last night soon after reattaining my Stable flag (argh!!!) and the following limits on SYNs to the ORPort were not enough to save Tor from the oom-killer:
1. Absolute limit avg 4 SYN per second with burst of 10 to ORPort, with an iptables REJECT (as opposed to DROP) for hosts that send SYNs when this limit has been reached.
2. 90-second iptables DROP ban for hosts which exceed the above (and are thus logged) in any 60-second period.
Sigh. More trial and error and another (figurative) century before I get my Stable flag back.
Best,
-Gordon M.
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512
Gordon Morehouse:
Gordon Morehouse:
I'm still waiting for another "storm" to test the 60 sec findtime / 90 sec bantime guesses that I made (and just pushed to my repo, BTW). Every time my relay crashes due to a storm, it takes me that much longer to get Stable back, and the storms are almost nonexistent until you have the Stable flag in my observation.
Another circuit-creation storm (detectable as SYN flood on ORPort) happened last night soon after reattaining my Stable flag (argh!!!) and the following limits on SYNs to the ORPort were not enough to save Tor from the oom-killer:
- Absolute limit avg 4 SYN per second with burst of 10 to ORPort, with an iptables REJECT (as opposed to DROP) for hosts that send SYNs when this limit has been reached.
- 90-second iptables DROP ban for hosts which exceed the above (and are thus logged) in any 60-second period.
I should have said "exceed the above 5 times" here.
Sigh. More trial and error and another (figurative) century before I get my Stable flag back.
I'm going to try dropping the total SYN limit to 3/sec burst 8, extend the watch time from 60 to 75 seconds, and decrease the max # of exceeds from 5 to 4 and see how that does.
This is fairly Pi-specific.
Best,
-Gordon M.
On 13-10-20 12:42 PM, Gordon Morehouse wrote:
First, during a SYN flood type overload, some peers which have *existing* circuits built through the relay and are sending SYNs as normal traffic, will stochastically get "caught" in the filter and banned for a short time. If these hosts already have circuits open through the relay which is overloaded, I would prefer to preserve those circuits rather than break them. My defensive strategy versus overload here is to throttle new circuit creation requests, *not* to break existing circuits. ... If a tor relay has a circuit built through a peer, and the peer starts dropping 100% of packets, how long will it take before the relay with the circuit "gives up" on the circuit and tears it down? I want to set my temp ban time *below* this timeout. Thus, unlucky peers that were caught in the filter and have circuits already built through the relay they will experience a brief performance degradation, but they won't lose their active circuits through the overloaded relay, and in the meantime hopefully the overload condition is becoming resolved.
Is there such a timeout? There must be. Can someone tell me what it is?
Would something like conntrack-tools help? Maybe it provides more direct connection control than trying to game the timings. http://conntrack-tools.netfilter.org/
Also, to what extent would/could the Tor network (or a small group of nodes) count as a "high availability cluster" for entry firewalling purposes? Would clustering help protect against timing attacks on relays or hidden services?
(I lack expertise or resources to answer any of the above, but reading Gordon Morehouse's project got me searching and curious.)
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512
krishna e bera:
On 13-10-20 12:42 PM, Gordon Morehouse wrote:
First, during a SYN flood type overload, some peers which have *existing* circuits built through the relay and are sending SYNs as normal traffic, will stochastically get "caught" in the filter and banned for a short time. If these hosts already have circuits open through the relay which is overloaded, I would prefer to preserve those circuits rather than break them. My defensive strategy versus overload here is to throttle new circuit creation requests, *not* to break existing circuits. ... If a tor relay has a circuit built through a peer, and the peer starts dropping 100% of packets, how long will it take before the relay with the circuit "gives up" on the circuit and tears it down? I want to set my temp ban time *below* this timeout. Thus, unlucky peers that were caught in the filter and have circuits already built through the relay they will experience a brief performance degradation, but they won't lose their active circuits through the overloaded relay, and in the meantime hopefully the overload condition is becoming resolved.
Is there such a timeout? There must be. Can someone tell me what it is?
Would something like conntrack-tools help? Maybe it provides more direct connection control than trying to game the timings. http://conntrack-tools.netfilter.org/
Probably would, though it might be faster to slink over to tor-dev and ask, get a dev to notice in here (which is what I'm trying to do ;)), or dig through the source code myself - I'm not a C programmer but I can read it okay.
Also, to what extent would/could the Tor network (or a small group of nodes) count as a "high availability cluster" for entry firewalling purposes? Would clustering help protect against timing attacks on relays or hidden services?
You mean, if you have a circuit, sending some bytes of I/O over entry node A, some over entry node B, etc? Not quite sure what you're asking.
(I lack expertise or resources to answer any of the above, but reading Gordon Morehouse's project got me searching and curious.)
I'm glad it's doing somebody some good, or taking up time that could've been otherwise wasted on Buzzfeed or something ;) Not that you'd do that. ;)
Best,
-Gordon M.
On 13-10-27 05:32 PM, Gordon Morehouse wrote:
Also, to what extent would/could the Tor network (or a small group of nodes) count as a "high availability cluster" for entry firewalling purposes? Would clustering help protect against timing attacks on relays or hidden services?
You mean, if you have a circuit, sending some bytes of I/O over entry node A, some over entry node B, etc? Not quite sure what you're asking.
Yes, essentially load balancing. I noticed someone was working on bonding tcp connections at the back end, so why not the opposite as well.
(I lack expertise or resources to answer any of the above, but reading Gordon Morehouse's project got me searching and curious.)
I'm glad it's doing somebody some good, or taking up time that could've been otherwise wasted on Buzzfeed or something ;) Not that you'd do that. ;)
Never heard of Buzzfeed but i will check it out, thanks!
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512
krishna e bera:
On 13-10-27 05:32 PM, Gordon Morehouse wrote:
Also, to what extent would/could the Tor network (or a small group of nodes) count as a "high availability cluster" for entry firewalling purposes? Would clustering help protect against timing attacks on relays or hidden services?
You mean, if you have a circuit, sending some bytes of I/O over entry node A, some over entry node B, etc? Not quite sure what you're asking.
Yes, essentially load balancing. I noticed someone was working on bonding tcp connections at the back end, so why not the opposite as well.
That is one for the folks who can develop proofs or theories about anonymity to answer - and I hope they might at some point if they haven't already in a paper they'll point out to us. :)
(I lack expertise or resources to answer any of the above, but reading Gordon Morehouse's project got me searching and curious.)
I'm glad it's doing somebody some good, or taking up time that could've been otherwise wasted on Buzzfeed or something ;) Not that you'd do that. ;)
Never heard of Buzzfeed but i will check it out, thanks!
Oh, no, please don't. :(
;)
-Gordon M.
First post to this mailing list. I joined the network 3 days ago with a VIA Nehemiah system, 1 GHz, 256 MB RAM, RelayBandwidthRate 500 KB.
On 2013-10-20 09:42:01 (-0700), Gordon Morehouse wrote:
First, during a SYN flood type overload, some peers which have *existing* circuits built through the relay and are sending SYNs as normal traffic, will stochastically get "caught" in the filter and banned for a short time. If these hosts already have circuits open through the relay which is overloaded, I would prefer to preserve those circuits rather than break them. My defensive strategy versus overload here is to throttle new circuit creation requests, *not* to break existing circuits.
So here's the $64,000 question:
If a tor relay has a circuit built through a peer, and the peer starts dropping 100% of packets, how long will it take before the relay with the circuit "gives up" on the circuit and tears it down? I want to set my temp ban time *below* this timeout. Thus, unlucky peers that were caught in the filter and have circuits already built through the relay they will experience a brief performance degradation, but they won't lose their active circuits through the overloaded relay, and in the meantime hopefully the overload condition is becoming resolved.
I can think of two approaches to your problem:
- You can 'iptables -A INPUT -m state --state ESTABLISHED -j ACCEPT' early in your ruleset, so all existing circuits will be allowed. I understand this is pretty standard practice and I'm somewhat surprised that you're not already doing it. Your SYN throttling would appear later in the ruleset. You could be aggressive at this point since you know that you won't break any circuit.
- Besides this, you can 'iptables -A INPUT -p tcp --syn -j SYN_THROTTLE' and populate a new SYN_THROTTLE chain with your desired rules to tell peers to calm down. Only SYN packets will enter this chain; packets on established connections won't match this rule and will traverse the rest of the ruleset unaffected.
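Put together, the two suggestions above might look like the sketch below. It is only a sketch under assumptions: the ports are placeholders, the 4/sec burst 10 figures are borrowed from earlier in the thread, and the script prints the commands rather than applying them unless IPT is set to the real binary. The point it illustrates is ordering: the ESTABLISHED rule comes before the jump into the throttle chain.

```shell
#!/bin/sh
# Dry run by default: prints the commands instead of applying them.
IPT="${IPT:-echo iptables}"
PORTS="${PORTS:-9001,9030}"   # placeholders for ORPort,DirPort

ruleset() {
    $IPT -N SYN_THROTTLE
    # 1. Traffic on already-established connections is accepted first,
    #    so it never reaches the throttle.
    $IPT -A INPUT -m state --state ESTABLISHED -j ACCEPT
    # 2. Only SYNs to the relay ports enter the throttle chain.
    $IPT -A INPUT -p tcp --syn -m multiport --dports "$PORTS" -j SYN_THROTTLE
    # 3. Inside the chain: accept up to the limit, reject the rest.
    $IPT -A SYN_THROTTLE -m limit --limit 4/s --limit-burst 10 -j ACCEPT
    $IPT -A SYN_THROTTLE -j REJECT
}

ruleset
```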
Since I run a new node and am still discovering this new world, I'm somewhat concerned that once I gain the Stable flag I'll be SYN flooded too, so I'll be paying attention to this as well.
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512
David Serrano:
First post to this mailing list. I joined the network 3 days ago with a VIA Nehemiah system, 1 GHz, 256 MB RAM, RelayBandwidthRate 500 KB.
I suspect that'll have the CPU to handle things, but RAM... guess you'll find out! Unsure.
On 2013-10-20 09:42:01 (-0700), Gordon Morehouse wrote:
First, during a SYN flood type overload, some peers which have *existing* circuits built through the relay and are sending SYNs as normal traffic, will stochastically get "caught" in the filter and banned for a short time. If these hosts already have circuits open through the relay which is overloaded, I would prefer to preserve those circuits rather than break them. My defensive strategy versus overload here is to throttle new circuit creation requests, *not* to break existing circuits.
So here's the $64,000 question:
If a tor relay has a circuit built through a peer, and the peer starts dropping 100% of packets, how long will it take before the relay with the circuit "gives up" on the circuit and tears it down? I want to set my temp ban time *below* this timeout. Thus, unlucky peers that were caught in the filter and have circuits already built through the relay they will experience a brief performance degradation, but they won't lose their active circuits through the overloaded relay, and in the meantime hopefully the overload condition is becoming resolved.
I can think of two approaches to your problem:
- You can 'iptables -A INPUT -m state --state ESTABLISHED -j ACCEPT' early in your ruleset, so all existing circuits will be allowed. I understand this is pretty standard practice and I'm somewhat surprised that you're not already doing it. Your SYN throttling would appear later in the ruleset. You could be aggressive at this point since you know that you won't break any circuit.
- Besides this, you can 'iptables -A INPUT -p tcp --syn -j SYN_THROTTLE' and populate a new SYN_THROTTLE chain with your desired rules to tell peers to calm down. Only SYN packets will enter this chain; packets on established connections won't match this rule and will traverse the rest of the ruleset unaffected.
Since I run a new node and am still discovering this new world, I'm somewhat concerned that once I gain the Stable flag I'll be SYN flooded too, so I'll be paying attention to this as well.
This is greatly helpful, thanks. The reason I've overlooked some obvious things is because I'm an iptables noob. :)
If you like, have a look at The Cipollini Project[1], which is essentially a collection of tidbits aiming to eventually be a set of packages that can be distributed or otherwise used to turn very inexpensive and/or low-end boxes into "plug and forget" relays. It'll soon have its own mailing list.
1. https://github.com/gordon-morehouse/cipollini
Best,
-Gordon M.
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512
David Serrano:
- You can 'iptables -A INPUT -m state --state ESTABLISHED -j ACCEPT' early in your [snip]
- Besides this, you can 'iptables -A INPUT -p tcp --syn -j SYN_THROTTLE' and populate a
I think we're using different versions of iptables! A lot of these options don't work with Raspbian's version - 1.4.14. I am puzzling out the equivalents this morning. :)
Best,
-Gordon M.
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512
David Serrano: [snip]
On 2013-10-20 09:42:01 (-0700), Gordon Morehouse wrote:
First, during a SYN flood type overload, some peers which have *existing* circuits built through the relay and are sending SYNs as normal traffic, will stochastically get "caught" in the filter and banned for a short time. If these hosts already have circuits open through the relay which is overloaded, I would prefer to preserve those circuits rather than break them. My defensive strategy versus overload here is to throttle new circuit creation requests, *not* to break existing circuits.
So here's the $64,000 question:
If a tor relay has a circuit built through a peer, and the peer starts dropping 100% of packets, how long will it take before the relay with the circuit "gives up" on the circuit and tears it down? I want to set my temp ban time *below* this timeout. Thus, unlucky peers that were caught in the filter and have circuits already built through the relay they will experience a brief performance degradation, but they won't lose their active circuits through the overloaded relay, and in the meantime hopefully the overload condition is becoming resolved.
I can think of two approaches to your problem:
I've implemented these and I'd really love for anyone who's great at iptables to sanity-check my rules[1] because I am an iptables relative noob.
I'm also quite happy to report that my Raspberry Pi node weathered a pretty intense SYN flood (20-30 SYNs per sec, I'm going to post a log deconstruction of the event with graphs if possible) with the old rules. It didn't weather it *well*: specifically, fail2ban got bogged down and stopped working after a while, chewing up half the available CPU cycles in the process, but the node survived without crashing.
There are stories on the Pivotal project tracker[2] for The Cipollini Project[3] regarding these problems - I luckily happened to catch the SYN flood ("circuit creation storm") event just as it really got started and was able to observe it in real time.
1. http://v.gd/1TV9mz (link to file on github)
2. https://www.pivotaltracker.com/s/projects/917796
3. https://github.com/gordon-morehouse/cipollini
Best,
-Gordon M.
On 2013-10-27 12:29:33 (-0700), Gordon Morehouse wrote:
I've implemented these and I'd really love for anyone who's great at iptables to sanity-check my rules[1] because I am an iptables relative noob.
5: # TODO: don't know if fail2ban will override this if a host with established
6: # connections gets temp banned. We don't want it to. Need to find out.
It depends on the spot fail2ban inserts the new firewall rules. If it's before the '--state ESTABLISHED' rule, then the ban will be enforced. Otherwise, the kernel will let the packets through when they reach that rule.
12: iptables -A INPUT -p tcp -m multiport --dports 31923,31924 -m state --state NEW -j SYN_THROTTLE
[...]
17: /sbin/iptables -A SYN_THROTTLE -m state --state NEW -j LOG
18: /sbin/iptables -A SYN_THROTTLE -m state --state NEW -j REJECT
You don't need '-m state --state NEW' in lines 17 and 18 because all packets in that chain are already known to be new.
I recommend always using --log-prefix for easy future grepping.
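As a rough illustration of both suggestions, the sketch below prints the tail of the SYN_THROTTLE chain without the redundant state match and with a log prefix, then uses that prefix to count logged offenders per source address. The prefix string and the helper name count_offenders are assumptions for illustration; the rules are only printed unless IPT is set to the real binary.

```shell
#!/bin/sh
# Dry run by default: prints the commands instead of applying them.
IPT="${IPT:-echo iptables}"
LOG_PREFIX="tor-syn-throttle: "   # assumed prefix, pick any string you like

throttle_tail_rules() {
    # Packets reaching this point are already known to be new connections.
    $IPT -A SYN_THROTTLE -j LOG --log-prefix "$LOG_PREFIX"
    $IPT -A SYN_THROTTLE -j REJECT
}

# Reads kernel log lines on stdin and prints "count source-address",
# busiest source first. Lines without the prefix are ignored.
count_offenders() {
    grep -F "$LOG_PREFIX" \
        | sed -n 's/.* SRC=\([^ ]*\) .*/\1/p' \
        | sort | uniq -c | sort -rn \
        | awk '{ print $1, $2 }'
}

throttle_tail_rules
```

A typical invocation would be something like `count_offenders < /var/log/kern.log`, where the log path depends on the local syslog setup.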
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512
David Serrano:
On 2013-10-27 12:29:33 (-0700), Gordon Morehouse wrote:
I've implemented these and I'd really love for anyone who's great at iptables to sanity-check my rules[1] because I am an iptables relative noob.
5: # TODO: don't know if fail2ban will override this if a host with established
6: # connections gets temp banned. We don't want it to. Need to find out.
It depends on the spot fail2ban inserts the new firewall rules. If it's before the '--state ESTABLISHED' rule, then the ban will be enforced. Otherwise, the kernel will let the packets through when they reach that rule.
Here's my 'iptables -L' output, on pastebin because it's a mess when formatted for email: http://pastebin.com/f1VZNeTF
That's not a fresh boot, though, I did:
'iptables -F' 'service fail2ban reload'
and then ran the iptables commands by hand, in order.
12: iptables -A INPUT -p tcp -m multiport --dports 31923,31924 -m state --state NEW -j SYN_THROTTLE
[...]
17: /sbin/iptables -A SYN_THROTTLE -m state --state NEW -j LOG
18: /sbin/iptables -A SYN_THROTTLE -m state --state NEW -j REJECT
You don't need '-m state --state NEW' in lines 17 and 18 because all packets in that chain are already known to be new.
Ah, right - thanks! That might save a few cycles, assuming iptables wouldn't optimize it out. Important for the Raspberry Pi!
I recommend always using --log-prefix for easy future grepping.
Another good idea, thanks again. I've committed these changes to the repo.
Best,
-Gordon M.
On 2013-10-27 15:00:10 (-0700), Gordon Morehouse wrote:
Here's my 'iptables -L' output, on pastebin because it's a mess when formatted for email: http://pastebin.com/f1VZNeTF
That's not a fresh boot, though, I did:
'iptables -F' 'service fail2ban reload'
and then ran the iptables commands by hand, in order.
Things may be different after a reboot, so I'd recommend rebooting now and seeing how the firewall ends up. Right now it seems that fail2ban would ban and break existing circuits. It all depends on what rules it inserts into its chain.
However, do you need fail2ban now that you are throttling SYNs without affecting circuits?
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512
David Serrano:
On 2013-10-27 15:00:10 (-0700), Gordon Morehouse wrote:
Here's my 'iptables -L' output, on pastebin because it's a mess when formatted for email: http://pastebin.com/f1VZNeTF
That's not a fresh boot, though, I did:
'iptables -F' 'service fail2ban reload'
and then ran the iptables commands by hand, in order.
Things may be different after a reboot, so I'd recommend rebooting now and seeing how the firewall ends up. Right now it seems that fail2ban would ban and break existing circuits. It all depends on what rules it inserts into its chain.
Here's the output of 'iptables -L' after a fresh boot: http://pastebin.com/b0PUbJJX
And, after the boot, I've simulated an aggressive host from another machine using hping, and here's the output of 'iptables -L' after fail2ban banned the host (LAN IP partly redacted to settle my paranoia): http://pastebin.com/1L62z23b
Incidentally, this experiment confirmed that once fail2ban has banned a host, further packets are not logged such that fail2ban must parse them, which was an open question and is now answered, and answered the way I wanted.
However, do you need fail2ban now that you are throttling SYNs without affecting circuits?
Uncertain. I'd added it as an adjunct to the throttling, hoping a temporary placement into the DROP chain would save cycles and memory as REJECT ICMP packets would no longer be sent; in the only major Tor SYN flood I've experienced since adding fail2ban to the mix (and reducing the SYN limits from 4/sec burst 10 to 3/sec burst 6), fail2ban eventually fell far enough behind in parsing logs of those SYNs exceeding the limits that it could not catch up and stopped banning hosts. The node survived the flood for the first time without crashing, but fail2ban was working for the first 20-30 min or so IIRC, so that may have helped, or it may have just been the reduction in the SYN throttle limits.
I have an open bug in the project tracker[1] regarding figuring out what to do with fail2ban, and one of the options is to get rid of it, but I don't know enough yet.
1. https://www.pivotaltracker.com/s/projects/917796
Thanks a ton for your help!
Best,
-Gordon M.
On 2013-10-27 16:35:43 (-0700), Gordon Morehouse wrote:
And, after the boot, I've simulated an aggressive host from another machine using hping, and here's the output of 'iptables -L' after fail2ban banned the host (LAN IP partly redacted to settle my paranoia): http://pastebin.com/1L62z23b
That resulting ruleset will break circuits. Packets from flooding hosts won't have a chance to reach the '--state ESTABLISHED' rule since they are dropped before that, from within the fail2ban-tor-syn-flood chain.
However, do you need fail2ban now that you are throttling SYNs without affecting circuits?
Uncertain. I'd added it as an adjunct to the throttling, hoping a temporary placement into the DROP chain would save cycles and memory as REJECT ICMP packets would no longer be sent
But you can drop packets in the SYN_THROTTLE chain instead of rejecting them, without fail2ban. Or you can accept them until a threshold is reached, then log/reject them up to a second threshold, then silently drop them.
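The tiered idea above (accept, then log and reject, then silently drop) might be sketched as follows. The first threshold uses the 3/sec burst 6 figures from this thread; the second threshold, the LOG_REJECT chain name, and the log prefix are assumptions chosen only to illustrate the shape. The script prints the commands unless IPT is set to the real binary.

```shell
#!/bin/sh
# Dry run by default: prints the commands instead of applying them.
IPT="${IPT:-echo iptables}"

tiered_rules() {
    $IPT -N SYN_THROTTLE
    $IPT -N LOG_REJECT
    # Helper chain: LOG does not end traversal, so REJECT follows it.
    $IPT -A LOG_REJECT -j LOG --log-prefix "tor-syn-throttle: "
    $IPT -A LOG_REJECT -j REJECT
    # Tier 1: accept up to the normal limit.
    $IPT -A SYN_THROTTLE -m limit --limit 3/s --limit-burst 6 -j ACCEPT
    # Tier 2: log and reject a bounded number of the excess SYNs
    # (2/sec burst 4 is an arbitrary illustrative value).
    $IPT -A SYN_THROTTLE -m limit --limit 2/s --limit-burst 4 -j LOG_REJECT
    # Tier 3: anything beyond that is dropped without a reply or a log line.
    $IPT -A SYN_THROTTLE -j DROP
}

tiered_rules
```

Bounding the logged tier also bounds how much log output a flood can generate.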
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512
David Serrano:
On 2013-10-27 16:35:43 (-0700), Gordon Morehouse wrote:
And, after the boot, I've simulated an aggressive host from another machine using hping, and here's the output of 'iptables -L' after fail2ban banned the host (LAN IP partly redacted to settle my paranoia): http://pastebin.com/1L62z23b
That resulting ruleset will break circuits. Packets from flooding hosts won't have a chance to reach the '--state ESTABLISHED' rule since they are dropped before that, from within the fail2ban-tor-syn-flood chain.
Thanks - I really don't understand yet with iptables how to tell in what order the chains are processed.
However, do you need fail2ban now that you are throttling SYNs without affecting circuits?
Uncertain. I'd added it as an adjunct to the throttling, hoping a temporary placement into the DROP chain would save cycles and memory as REJECT ICMP packets would no longer be sent
But you can drop packets in the SYN_THROTTLE chain instead of rejecting them, without fail2ban. Or you can accept them until a threshold is reached, then log/reject them up to a second threshold, then silently drop them.
Currently this is how it works:
1. accept to the 3/sec burst 6, then reject (iptables)
2. 4 logs of iptables reject in 75 sec = 90 sec ban (fail2ban)
I'd love to do all of the above purely in iptables and eliminate fail2ban, but is it capable of maintaining state like that (e.g. the 75 second 'watch time' and 90 sec 'ban time')?
This is very new to me, I've always used off-the-shelf iptables-based packages. If there are docs I should read which cover this use case without me having to read for 2 hours before I get there, I'd really appreciate a link. And I say that not to be a jerk, but because my time is stretched really really thin.
Thanks for all your iptables help. You'll definitely be credited.
Best, -Gordon M.
On 2013-10-29 08:00:53 (-0700), Gordon Morehouse wrote:
Currently this is how it works:
- accept to the 3/sec burst 6, then reject (iptables)
- 4 logs of iptables reject in 75 sec = 90 sec ban (fail2ban)
I'd love to do all of the above purely in iptables and eliminate fail2ban, but is it capable of maintaining state like that (e.g. the 75 second 'watch time' and 90 sec 'ban time')?
I don't think so.
*glances over the iptables man page*
Ah, check the 'recent' module ;). Seems to be what you need to get rid of fail2ban. After playing with it for a while I got this:
iptables -N RECENT
iptables -I INPUT 1 -p icmp -s somehost -j RECENT
iptables -A RECENT -m recent --name pa --rcheck --seconds 10 -j DROP
#iptables -A RECENT -j LOG --log-prefix "not recent dropped: "
iptables -A RECENT -m limit --limit 2/s --limit-burst 2 -j ACCEPT
#iptables -A RECENT -j LOG --log-prefix "rate limited, set recent: "
iptables -A RECENT -m recent --name pa --set -j DROP
Then:
user@somehost $ ping -i 0.3 172.26.8.85
PING 172.26.8.85 (172.26.8.85) 56(84) bytes of data.
64 bytes from 172.26.8.85: icmp_req=1 ttl=63 time=0.286 ms
64 bytes from 172.26.8.85: icmp_req=2 ttl=63 time=0.264 ms
64 bytes from 172.26.8.85: icmp_req=3 ttl=63 time=0.255 ms
64 bytes from 172.26.8.85: icmp_req=37 ttl=63 time=0.263 ms
64 bytes from 172.26.8.85: icmp_req=38 ttl=63 time=0.282 ms
64 bytes from 172.26.8.85: icmp_req=39 ttl=63 time=0.292 ms
64 bytes from 172.26.8.85: icmp_req=73 ttl=63 time=0.278 ms
64 bytes from 172.26.8.85: icmp_req=74 ttl=63 time=0.287 ms
64 bytes from 172.26.8.85: icmp_req=75 ttl=63 time=0.283 ms
^C
--- 172.26.8.85 ping statistics ---
89 packets transmitted, 9 received, 89% packet loss, time 27031ms
rtt min/avg/max/mdev = 0.255/0.276/0.292/0.022 ms
You want the '-j RECENT' rule in chain INPUT not to be in position 1, but after the '--state ESTABLISHED' one, otherwise it will affect your established circuits. I suggest you play with this in another box, then edit the relay firewall accordingly when you're comfortable (consider that I limited my test to ICMP packets from a specific host while you're going to check everything). I'm afraid further questions on this (or even this very message) could be considered off topic, though.
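As a rough illustration of how the 'recent' module could stand in for the fail2ban step discussed above (4 over-limit SYNs in 75 seconds leading to a 90 second ban), here is a minimal sketch. It is not from the thread and has not been tested on a relay. The chain names (SYN_BAN, SYN_BANNED) and list names (strikes, banned) are made up for illustration, and the 3/sec burst 6 limit is simply the figure quoted earlier. The function only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Print (do not apply) iptables commands for a "strike, then ban" setup
# built on the 'recent' match. All chain/list names are hypothetical.
syn_ban_rules() {
    ports="$1"   # comma-separated ORPort[,DirPort]
    cat <<EOF
iptables -N SYN_BANNED
iptables -A SYN_BANNED -m recent --name banned --set -j DROP
iptables -N SYN_BAN
iptables -A SYN_BAN -m recent --name banned --rcheck --seconds 90 -j DROP
iptables -A SYN_BAN -m limit --limit 3/sec --limit-burst 6 -j ACCEPT
iptables -A SYN_BAN -m recent --name strikes --set
iptables -A SYN_BAN -m recent --name strikes --rcheck --seconds 75 --hitcount 4 -j SYN_BANNED
iptables -A SYN_BAN -j REJECT --reject-with icmp-port-unreachable
iptables -A INPUT -p tcp -m multiport --dports $ports -m state --state NEW -j SYN_BAN
EOF
}
```

To try it on a test box, something like `syn_ban_rules 31923,31924 | sudo sh` would apply the printed rules. Because the last rule is appended to INPUT and only matches state NEW, it lands after an existing '--state ESTABLISHED' rule and should leave established circuits alone. Note that the limit match is one shared bucket rather than per-source, so a strike goes to whichever host happens to send a SYN while the bucket is empty.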
Lastly, this may give additional ideas:
http://thiemonagel.de/2006/02/preventing-brute-force-attacks-using-iptables-...
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512
I've just seen the most amazing headshot of my Tor relay by a sudden massive SYN flood yet. I was online and started noticing problems with DNS on my local router. I checked my so-called monitoring setup, a window with a permanent ping to my router, and noticed a lot of timeouts. Obviously, that means trouble.
Checked my Raspberry Pi tor relay setup, and there was an incredible SYN flood just starting. I have attached an image where the vertical scale reaches up to 5 megabits per second and where each column is two seconds. This is absolutely not established tor connection behavior. I don't know what *all* of it is, since once the Tor daemon dies, the SYN traffic seems to be steady at about 50KB/sec (of *just* SYNs inbound, and 100+KB/sec of outbound ICMP port unreachable packets). But that huge tsunami marks when the flood / circuit creation storm really got going.
My relay crashed faster than I've ever seen it crash before, even with my newer protections in place. In under 5 minutes the out of memory killer reaped Tor. In previous situations, I've observed during floods that Tor's share of physical memory doesn't seem to increase. I could be wrong about that, but I think the thing eating all the RAM is TCP open/half-open sockets and/or associated tables in the Linux kernel - once RAM pressure becomes too intense, Tor is just the biggest thing around, so the oom-killer picks it and bam.
The truly amazing and disturbing thing is that it's an hour and a half later now, and my router is still under extreme load from the incoming SYN packets. It hasn't yet crashed.
In the meantime I added an iptables rule right under the "ESTABLISHED" rule suggested by David Serrano:
Chain INPUT (policy ACCEPT)
target        prot opt source    destination
ACCEPT        all  --  anywhere  anywhere     state ESTABLISHED
DROP          tcp  --  anywhere  anywhere     tcpflags: FIN,SYN,RST,ACK/SYN #conn src/0 > 75
SYN_THROTTLE  tcp  --  anywhere  anywhere     multiport dports 31923,31924 state NEW
(those weird params are from a connlimit suggestion I found for limiting the total number of TCP connections which may be handled over a chain.) I started off at 50, and am now up to 100. This is obviously a stopgap solution for an ongoing event, but it suggests some further ways that slower single-board computers can be made to weather such storms, possibly without (see earlier discussion on this thread) using fail2ban at all, which is very inefficient.
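For readers wondering which command yields a listing line like "tcpflags: FIN,SYN,RST,ACK/SYN #conn src/0 > 75", the sketch below prints one plausible form. This is a reconstruction, not the exact command used on the relay: `--syn` is what shows up as the tcpflags text, and `--connlimit-mask 0` lumps every source address into a single group, which is why the limit acts on the total connection count. The function name and the default rule position 2 (directly under an ESTABLISHED rule at position 1) are assumptions.

```shell
#!/bin/sh
# Print (do not apply) a global inbound-connection cap using connlimit.
# $1 = maximum number of connections, $2 = position in INPUT (default 2).
connlimit_rule() {
    max="$1"
    pos="${2:-2}"
    echo "iptables -I INPUT $pos -p tcp --syn -m connlimit --connlimit-above $max --connlimit-mask 0 -j DROP"
}
```

For example, `connlimit_rule 100` prints the rule with the limit raised to 100. Before applying it, the old rule would need to be deleted, or the printed command changed from `-I` to `-R` so that it replaces the rule at that position.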
What's quite alarming is that when I raise the limit a bit, to get the restarted Tor relay better connected, the SYN flood logs go crazy for a minute or two before instantaneously stopping when, I presume, the connection limit has been reached. Since the dropped packets above the global inbound connection limit are not logged, the sudden start/stop of the SYN flood logging (in the SYN_THROTTLE chain, they're logged) tells me I am still under intense SYN flood.
After adding connection limits on the Tor box, my router recovered and is seeing ping times, ballpark 2x normal (0.8-1.2 ms is normal, now it's more like 1.0-2.0 ms), but things are working handily. I have also been able to connect to other services through the Tor relay again, with considerable difficulty.
I notice that Tor is consuming all available CPU, even though it is, for the moment, relaying a pretty consistent 50-80KB/sec. I suspect that this is mostly circuit creation requests coming in over established Tor connections, which Tor is ... processing, I don't know how, but unless there's been some turnover and they get lucky, once another peer attempts a TCP/TLS handshake, its packets are likely to be dropped. This is probably not ideal.
As long as the Raspberry Pi manages to stay up and keep logging I'll have all the data to go through later. (I already captured a lot.) I also have the logs from the other incident that I caught and watched in real time. I'm planning to do an analysis on the number of IPs involved, whether they are Tor relays or not, and other interesting things that can be gleaned from the logs. I promise some graphs and charts, punch and pie not so much. Unfortunately my life is quite busy right now so the report may take a while although it's kind of a high priority thing for me at the moment. This is pretty crazy. I can't verify it, but my suspicion is this is happening when I get my Stable flag (I have no idea if I'd gotten it back this morning or not) or shortly thereafter.
Node is VastCatbox, flood started around 8am Pacific.
Best, -Gordon M.
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512
Tor CPU usage has dropped way back to 15-40% in the last few minutes, even as I was increasing my total inbound connection limit to 150 connections.
This is a hell of a log line on a Raspberry Pi, which also just popped out:
Oct 31 10:13:49.000 [notice] Circuit handshake stats since last time: 61533/218956 TAP, 30/30 NTor.
Wow.
Best, -Gordon M.
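To put the log line above in proportion, here is a small sketch that pulls the two pairs of numbers out of a "Circuit handshake stats" notice and prints each as a percentage. It assumes the first number in each pair is the handshakes the relay actually processed and the second is the number requested. That reading is an assumption here and is not stated in the thread. The function name is made up.

```shell
#!/bin/sh
# Read Tor log lines on stdin; for each "Circuit handshake stats" notice,
# print first/second for TAP and NTor along with the ratio as a percentage.
handshake_ratio() {
    sed -n 's#.*Circuit handshake stats since last time: \([0-9]*\)/\([0-9]*\) TAP, \([0-9]*\)/\([0-9]*\) NTor.*#\1 \2 \3 \4#p' |
    awk '
        # Guard against a zero denominator when nothing was requested.
        function pct(a, b) { return b ? 100 * a / b : 0 }
        { printf "TAP %d/%d (%.1f%%) NTor %d/%d (%.1f%%)\n",
                 $1, $2, pct($1, $2), $3, $4, pct($3, $4) }
    '
}
```

Fed the line quoted above, it reports 28.1% for TAP and 100.0% for NTor. Under the stated assumption, that would mean roughly seven in ten TAP requests in that interval went unserved.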
_______________________________________________ tor-relays mailing list tor-relays@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-relays
-- Sent from my thing that sends email.
On 2013-10-31 10:04:02 (-0700), Gordon Morehouse wrote:
I can't verify it, but my suspicion is this is happening when I get my Stable flag (I have no idea if I'd gotten it back this morning or not) or shortly thereafter.
You can use https://metrics.torproject.org/relay-search.html and enter your IP address to figure that out.
huh, well, near as I can tell, I didn't get Stable for any time represented yesterday (2013-10-31) for the node VastCatbox.
So maybe that theory is incorrect. In that case I don't know what would trigger the SYN flood behavior other than Roger's idea about becoming an introducer for a popular HS, but... eh... seems like a stretch, a node offering 2.5Mbps that isn't flagged Stable?
-Gordon
On Fri, 1 Nov 2013 13:10:17 +0100, David Serrano tor@dserrano5.es wrote:
On 2013-10-31 10:04:02 (-0700), Gordon Morehouse wrote:
I can't verify it, but my suspicion is this is happening when I get my Stable flag (I have no idea if I'd gotten it back this morning or not) or shortly thereafter.
You can use https://metrics.torproject.org/relay-search.html and enter your IP address to figure that out.
-- David Serrano GnuPG id: 280A01F9 _______________________________________________ tor-relays mailing list tor-relays@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-relays
This morning I got my first Tor traffic flood since upgrading to 2.4.x. Logs didn't say anything about not being able to handle the amount of circuit creation requests, but it showed a 200x increase in active TAP circuits (~400k/hour) and the traffic pattern is the same: Advertising 100kb bandwidth, but slammed with ~2Mb traffic.
When I saw it, I checked my relay's flags, and it has the stable flag, and has been tagged stable for at least 3 days. It's been up for 7 days.
I would love to contribute data to help correlate w/ your findings Gordon. Any metrics or logs that would be particularly helpful? I currently use NTop to measure traffic, but it's not very granular.
I also currently don't use any iptables rules to throttle, but am happy to experiment with that if you want me to try out any particular configurations.
Dan
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512
Dan Staples:
This morning I got my first Tor traffic flood since upgrading to 2.4.x. Logs didn't say anything about not being able to handle the amount of circuit creation requests, but it showed a 200x increase in active TAP circuits (~400k/hour) and the traffic pattern is the same: Advertising 100kb bandwidth, but slammed with ~2Mb traffic.
When I saw it, I checked my relay's flags, and it has the stable flag, and has been tagged stable for at least 3 days. It's been up for 7 days.
I would love to contribute data to help correlate w/ your findings Gordon. Any metrics or logs that would be particularly helpful? I currently use NTop to measure traffic, but it's not very granular.
I'm still trying to scratch together enough time to analyze the logs from the two floods I caught as they began in the past 10 days or so. One thing I am logging, which you're definitely not, is hosts that send SYNs above the limit on my Raspberry Pi. Are you running on a slow machine or a VPS or what? That might not apply to you if you're not running on a slow machine - you may have no need to limit SYNs or anything else, and that's probably the case if your relay did not crash as a result of the flood.
During my last two floods, the relay survived the first (poorly, with fail2ban becoming useless and chewing up half the CPU), and was headshotted by the second - crash in less than 5 minutes.
I'm looking forward to getting the data together and providing a report for the community, but time ... my kingdom for the time to do anything beyond work, sleep, eat, sh*t.
I also currently don't use any iptables rules to throttle, but am happy to experiment with that if you want me to try out any particular configurations.
Depends on the capacity of your hardware. All my experimentation has to do with low-end ARM boards, so the logs most useful to the report *I* am planning to prepare on these events are logs of SYN exceeds, and fail2ban logs.
Thanks very much for staying up to date and offering to contribute - there is a real problem someplace, but it seems to be mostly a Problem with a capital P for low-end hardware with 512MB physical RAM, since those are the relays likely to actually crash as a result of the floods.
Best, -Gordon M.
I am also running on a Pi Model B, 512MB RAM. How are you logging SYNs?
-- http://disman.tl OpenPGP key: http://disman.tl/pgp.asc Fingerprint: 2480 095D 4B16 436F 35AB 7305 F670 74ED BD86 43A9
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512
Dan Staples:
I am also running on a Pi Model B, 512MB RAM. How are you logging SYNs?
Ah yes, that's right.
You will find all the magic (very pre-alpha at the moment - it's iptables commands in /etc/rc.local) in contrib/90_slowboards as part of Cipollini:
https://github.com/gordon-morehouse/cipollini/tree/master/contrib/90_slowboa...
I wouldn't bother with fail2ban right now, I've turned it off pending some other experiments with total connection limits on the Pi. I have an open story to investigate making it work, right now it's just too slow on the Pi:
https://www.pivotaltracker.com/story/show/59590860
So, try the iptables rules, change the ports to your ORPort (and DirPort if any). You'll note that there's a LOG target in there - for me it appears in kern.log.
Best, -Gordon M.
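Once the LOG target is writing to kern.log, a first pass at the "few addresses or many?" question can be done with standard tools. The sketch below is not part of Cipollini. It assumes the usual netfilter LOG line layout with a `SRC=` field, and the log prefix is passed in as an argument because the prefix actually used in the rules is not shown in this thread.

```shell
#!/bin/sh
# Read kernel log lines on stdin, keep those containing the given LOG
# prefix, and print "count source-ip" pairs, busiest source first.
count_syn_sources() {
    prefix="$1"
    grep -F -- "$prefix" |
        sed -n 's/.* SRC=\([0-9.]*\) .*/\1/p' |
        sort | uniq -c | sort -rn |
        awk '{ print $1, $2 }'
}
```

A hypothetical run would be `count_syn_sources 'SYN_EXCEED: ' < /var/log/kern.log | head -20`, where `SYN_EXCEED: ` stands for whatever `--log-prefix` the rules use. Appending `| wc -l` instead of `head` gives the number of distinct source addresses.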
On Sun, Oct 20, 2013 at 09:42:01AM -0700, Gordon Morehouse wrote:
With the slower computers, sometimes too many attempts to connect to the ORPort (I am almost positive as part of TAP circuit building, but not *really* sure) can eventually cause Tor to consume more physmem than available and cause the oom-killer to kill Tor. Also, depending on the crappiness of the user's router, it's effectively a SYN flood, and can crash or impair consumer routers.
This doesn't sound like circuit building. It sounds like TLS handshakes.
You see, a new circuit handshake (TAP or NTor) is simply a 512-byte cell sent along an already established TCP connection. So if you're getting flooded by circuit handshakes, it will be traffic (which causes cpu load) but it won't be any new TCP connections.
If you're seeing a bunch of new TCP connections, that sounds like clients trying to establish a new OR connection with you. (And those TLS handshakes are done in the core Tor thread, so having a weak CPU while handling a lot of TLS handshakes will cause your other Tor operations to hiccup.)
My solution, so far, is to define (through trial and error on a per-machine basis, since [1] is only officially supporting 3 SBCs right now) limits on how many SYNs may be sent to the ORPort and the DirPort per second. This is done with iptables. I experimented, tuned the parameters and watched traffic for weeks and came up with a pretty good set of limits for a 950MHz Raspberry Pi: 4 SYNs/sec burst 10. (For those about to say the Pi is thus too slow to be used as a relay, it's quite capable of relaying *at least* 2.5Mbps, but *not* when it's getting SYN flooded.)
My first question is to wonder if this flood of client connections is coming from a few IP addresses or many IP addresses. And to wonder if it's coming from Tor relays or not.
After watching the data, I noticed that some hosts just try to connect once or twice, or try to connect (during overload conditions) at reasonable intervals of tens of seconds to a few minutes. Other hosts will quadruple-tap the ORPort with SYNs, four in a row, and otherwise be much more aggressive with sending SYNs.
Sounds like you are seeing variations in TCP implementations.
Currently, if a peer violates the 4/sec burst 10 SYN limit more than 5 times in 60 seconds, that peer will be banned for 90 seconds. I'm trying to trim this down to the minimum that will protect the relay, and 90 seconds is a guess given some of my fears, read on...
That brings up a second question: if you *do* let them establish a TLS connection with you, do they stop hammering you? Or do they always want more? How long until they hang up on a connection that you allow to establish?
First, during a SYN flood type overload, some peers which have *existing* circuits built through the relay and are sending SYNs as normal traffic, will stochastically get "caught" in the filter and banned for a short time.
Wait, what? SYN packets are not part of normal traffic for an established connection.
So here's the $64,000 question:
If a tor relay has a circuit built through a peer, and the peer starts dropping 100% of packets, how long will it take before the relay with the circuit "gives up" on the circuit and tears it down?
That depends on the TCP implementation on both sides. I imagine the answer varies widely. Which probably isn't what you wanted to hear.
--Roger
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512
Hi Roger, I was hoping you'd get to this eventually. :)
Roger Dingledine:
On Sun, Oct 20, 2013 at 09:42:01AM -0700, Gordon Morehouse wrote:
With the slower computers, sometimes too many attempts to connect to the ORPort (I am almost positive as part of TAP circuit building, but not *really* sure) can eventually cause Tor to consume more physmem than available and cause the oom-killer to kill Tor. Also, depending on the crappiness of the user's router, it's effectively a SYN flood, and can crash or impair consumer routers.
This doesn't sound like circuit building. It sounds like TLS handshakes.
Very good to know.
You see, a new circuit handshake (TAP or NTor) is simply a 512-byte cell sent along an already established TCP connection. So if you're getting flooded by circuit handshakes, it will be traffic (which causes cpu load) but it won't be any new TCP connections.
If you're seeing a bunch of new TCP connections, that sounds like clients trying to establish a new OR connection with you. (And those TLS handshakes are done in the core Tor thread, so having a weak CPU while handling a lot of TLS handshakes will cause your other Tor operations to hiccup.)
This is what's going on, and it's often relatively soon after I get my Stable flag.
My solution, so far, is to define (through trial and error on a per-machine basis, since [1] is only officially supporting 3 SBCs right now) limits on how many SYNs may be sent to the ORPort and the DirPort per second. This is done with iptables. I experimented, tuned the parameters and watched traffic for weeks and came up with a pretty good set of limits for a 950MHz Raspberry Pi: 4 SYNs/sec burst 10. (For those about to say the Pi is thus too slow to be used as a relay, it's quite capable of relaying *at least* 2.5Mbps, but *not* when it's getting SYN flooded.)
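For anyone who hasn't worked with this kind of limit, here is a minimal Python sketch of how "4 SYNs/sec burst 10" behaves. It models a single global token bucket, which is roughly how the iptables limit match works. The class and method names are made up for illustration, and this is not the actual rule set used on the relay.

```python
class TokenBucket:
    """Toy model of a 'rate per second, burst N' packet limit (hypothetical names)."""

    def __init__(self, rate=4.0, burst=10):
        self.rate = rate            # tokens added per second
        self.burst = burst          # maximum tokens the bucket can hold
        self.tokens = float(burst)  # bucket starts full
        self.last = None            # timestamp of the previous packet

    def allow(self, now):
        """Return True if a SYN arriving at time `now` (seconds) is within the limit."""
        if self.last is not None:
            # Refill according to elapsed time, capped at the burst size.
            elapsed = now - self.last
            self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # over the limit: this is where REJECT would apply
```

With these defaults, a burst of 12 SYNs arriving at the same instant gets 10 through and 2 rejected, and one more token becomes available a quarter of a second later.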
My first question is to wonder if this flood of client connections is coming from a few IP addresses or many IP addresses. And to wonder if it's coming from Tor relays or not.
I was lucky enough to catch a "storm" just starting a couple mornings ago, and am going to try to dissect the logs and my realtime observations and provide a report - I expect it'd be useful to more than just me and my single-board computer project.
After watching the data, I noticed that some hosts just try to connect once or twice, or try to connect (during overload conditions) at reasonable intervals of tens of seconds to a few minutes. Other hosts will quadruple-tap the ORPort with SYNs, four in a row, and otherwise be much more aggressive with sending SYNs.
Sounds like you are seeing variations in TCP implementations.
Yep, that's what I figured.
Currently, if a peer violates the 4/sec burst 10 SYN limit more than 5 times in 60 seconds, that peer will be banned for 90 seconds. I'm trying to trim this down to the minimum that will protect the relay, and 90 seconds is a guess given some of my fears, read on...
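The ban policy just described can be sketched in Python as a per-peer sliding window. This is only a model of the stated numbers (more than 5 violations in 60 seconds, 90 second ban); the class and method names are hypothetical, and it is not fail2ban's own code or configuration.

```python
from collections import defaultdict, deque


class BanTracker:
    """Toy model: ban a peer that exceeds the SYN limit too often (hypothetical names)."""

    def __init__(self, max_violations=5, window=60.0, ban_time=90.0):
        self.max_violations = max_violations  # violations tolerated inside the window
        self.window = window                  # seconds to look back
        self.ban_time = ban_time              # seconds a ban lasts
        self.violations = defaultdict(deque)  # peer -> timestamps of recent violations
        self.banned_until = {}                # peer -> time at which the ban expires

    def is_banned(self, peer, now):
        return self.banned_until.get(peer, 0.0) > now

    def record_violation(self, peer, now):
        """Log one limit violation; return True if it triggers a ban."""
        recent = self.violations[peer]
        recent.append(now)
        # Forget violations that have fallen out of the look-back window.
        while recent and recent[0] <= now - self.window:
            recent.popleft()
        if len(recent) > self.max_violations:
            self.banned_until[peer] = now + self.ban_time
            recent.clear()
            return True
        return False
```

In this model the sixth violation inside 60 seconds triggers the ban, and a peer that trips the limit only once every 20 seconds is never banned.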
That brings up a second question: if you *do* let them establish a TLS connection with you, do they stop hammering you? Or do they always want more? How long until they hang up on a connection that you allow to establish?
I'm not entirely sure yet, and I need to do some log-data crunching. Do you know offhand how long it will take Tor to give up on connecting to a peer if it seems down for a while?
First, during a SYN flood type overload, some peers which have *existing* circuits built through the relay and are sending SYNs as normal traffic, will stochastically get "caught" in the filter and banned for a short time.
Wait, what? SYN packets are not part of normal traffic for an established connection.
I incorrectly assumed that new circuit requests began with a TCP handshake. However, *if* the relay were being flooded, and a peer that was already connected to it happened to send 4 SYN packets which arrived after other hosts had exceeded the limit for that given second, the unlucky peer would still get banned. David Serrano suggested an amendment to my iptables rules, which I've implemented and which *may* immunize ESTABLISHED connections from the fail2ban ban; he's helping me work out whether that actually does the job or not.
What would be good to know from you is how often already-connected peers would be TCP handshaking to a relay's ORPort or DirPort.
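To make the idea behind that amendment concrete, here is one possible ordering of the checks, written as a small Python function. This is a guess at the general shape only: the function name, its arguments, and the order of the checks are assumptions for illustration, not the actual iptables rules.

```python
def verdict(is_established, src_banned, is_syn, under_limit):
    """Decide what happens to one inbound packet (hypothetical model).

    is_established: packet belongs to an already established connection
    src_banned:     source address is currently under a temporary ban
    is_syn:         packet is a new-connection SYN
    under_limit:    the SYN rate limit still has room for this packet
    """
    if is_established:
        return "ACCEPT"  # checked first, so a ban cannot cut existing connections
    if src_banned:
        return "DROP"    # banned peers get silence instead of an ICMP reply
    if is_syn and not under_limit:
        return "REJECT"  # over the limit: answer with an ICMP error
    return "ACCEPT"
```

The point of putting the established-connection check first is that a peer with a working connection keeps it even if its address is banned for new SYNs.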
So here's the $64,000 question:
If a Tor relay has a circuit built through a peer, and the peer starts dropping 100% of packets, how long will it take before the relay with the circuit "gives up" on the circuit and tears it down?
That depends on the TCP implementation on both sides. I imagine the answer varies widely. Which probably isn't what you wanted to hear.
Is there not a piece in Tor's connect-to-peer code which says "try for N seconds, or P retries, then give up"?
Thanks much for your input.
-Gordon M.
tor-relays@lists.torproject.org