big spike in cpu usage

Owen Gunden

5 Apr 2013

5:50 p.m.

I have been running a non-exit tor relay for a few months now. It's on a metered VPS, so after some experimenting I found that I can afford about this much bandwidth:

RelayBandwidthRate 250 KB
RelayBandwidthBurst 500 KB

This worked fine until recently. However, just today I noticed the machine was sluggish, and sure enough tor was using 99% cpu and dumping these messages to syslog:

Your computer is too slow to handle this many circuit creation requests! Please consider using the MaxAdvertisedBandwidth config option or choosing a more restricted exit policy. [2022 similar message(s) suppressed in last 60 seconds]

Digging deeper, I've seen these messages pop up from time to time over the last 30 days or so, but not consistently, and I've never noticed the cpu pegged (but it could have happened, I don't use the machine that often).

I tried including:

MaxAdvertisedBandwidth 50 KB

and

# https://lists.torproject.org/pipermail/tor-relays/2012-May/001352.html
MaxOnionsPending 250

to no avail. I have had to dial all the way down to:

RelayBandwidthRate 40 KB
RelayBandwidthBurst 80 KB

and that seems to make the messages go away, but the cpu is still pegged and the machine is sluggish.

It's running on an up-to-date debian squeeze (6.0.7), with tor 0.2.3.25-1~~squeeze+1 and openssl 0.9.8o-4squeeze14.

I run updates fairly often. I upgraded to this version of tor a couple of weeks ago, which could be related, but it doesn't correlate perfectly with when the "too slow" messages first appear in my logs. The only other recent updates that seem plausibly related are libbind/libisc/libisccc/etc.

Moritz Bartl

6 Apr

1:17 p.m.

On 05.04.2013 19:50, Owen Gunden wrote:

...

Digging deeper, I've seen these messages pop up from time to time over the last 30 days or so, but not consistently, and I've never noticed the cpu pegged (but it could have happened, I don't use the machine that often).

In most cases, Tor spits out this message when it hits the CPU (core) limit. You might want to use munin or something similar to track CPU/memory usage over time.

Depending on the actual VPS limits, 250kb/s and/or 500kb/s burst might be too much for the slice. There's not much you can do about it then (as far as I know). If the VPS is OpenVZ based, check /proc/user_beancounters.

-- Moritz Bartl https://www.torservers.net/
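The CPU tracking suggested above can also be approximated without munin. What follows is a minimal sketch, assuming a Linux guest where /proc/stat is readable; the function names are illustrative and not part of any tool mentioned in this thread. It separates out the "steal" column, which on a virtualised host can hint that the hypervisor is withholding CPU time.

```python
# Column order of the aggregate "cpu" line in /proc/stat (Linux).
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a dict of tick counts."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    # Older kernels report fewer columns; treat the missing ones as zero.
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))


def usage_between(before, after):
    """Return (busy_fraction, steal_fraction) between two samples.

    'busy' here means everything except idle and iowait, so it includes steal.
    """
    delta = {k: after[k] - before[k] for k in FIELDS}
    total = sum(delta.values())
    if total <= 0:
        return 0.0, 0.0
    idle = delta["idle"] + delta["iowait"]
    return (total - idle) / total, delta["steal"] / total


def read_sample(path="/proc/stat"):
    """Read one sample from /proc/stat (the first line is the aggregate)."""
    with open(path) as f:
        return parse_cpu_line(f.readline())
```

To use it, call read_sample() twice a few seconds apart and pass both results to usage_between(). A persistently high steal fraction would suggest the limit sits on the host side rather than in tor.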

mick

3:24 p.m.

On Fri, 5 Apr 2013 13:50:29 -0400 Owen Gunden ogunden@phauna.org allegedly wrote:

...

I have been running a non-exit tor relay for a few months now. It's on a metered VPS, so after some experimenting I found that I can afford about this much bandwidth:

RelayBandwidthRate 250 KB
RelayBandwidthBurst 500 KB

Owen

You don't give details of your VPS, so comparisons may be difficult. But I have the following config options on my main (non-exit) relay:

--------------
NumCPU 1
MaxOnionsPending 300

# rate limit - anything above about 2500 KB seems to cause tor
# to invoke oom-killer

BandwidthRate 2100 KB
BandwidthBurst 2200 KB
---------------

That relay is on a VM with 512 MB RAM, one CPU slice and 1Gig network connectivity (with unlimited traffic allowance). Stats can be seen at:

https://atlas.torproject.org/#details/C332113DF99E367E4190424CE825057D91337A...

I had the same problems you are seeing until I set the rate limits above and increased MaxOnionsPending to 300. My CPU usage now hovers around 65-85% for about 2000 established tor connections.

Mick

---------------------------------------------------------------------

blog: baldric.net gpg fingerprint: FC23 3338 F664 5E66 876B 72C0 0A1F E60B 5BAD D312

---------------------------------------------------------------------

Roman Mamedov

3:33 p.m.

On Fri, 5 Apr 2013 13:50:29 -0400 Owen Gunden ogunden@phauna.org wrote:

...

This worked fine until recently. However, just today I noticed the machine was sluggish, and sure enough tor was using 99% cpu and dumping these messages to syslog:

You can check if your host has throttled/limited your total CPU usage.

I like this completely unscientific quick test that doesn't require installing any extra software:

dd if=/dev/zero bs=1M count=1024 | md5sum

Should return 250-400 MB/sec on a modern CPU.

Check if you get significantly less. E.g. on one host I had about 80-100 MB only, despite /proc/cpuinfo and the like all showing normal CPU frequencies, so couldn't tell there was any throttling other than from testing.

-- With respect, Roman
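For hosts where dd or md5sum are not to hand, the same rough measurement can be made from Python. This is a minimal sketch in the spirit of the quick test above, assuming the standard hashlib module is available; the function name is made up for illustration, and the result is only loosely comparable with the figures dd prints, since interpreter overhead and unit conventions differ.

```python
import hashlib
import time


def md5_throughput(total_mib=1024, chunk_mib=1):
    """Hash total_mib MiB of zero bytes; return (MiB per second, hex digest)."""
    chunk = bytes(chunk_mib * 1024 * 1024)  # a buffer of zero bytes
    digest = hashlib.md5()
    start = time.perf_counter()
    for _ in range(total_mib // chunk_mib):
        digest.update(chunk)
    # Guard against a zero interval on very coarse clocks.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return total_mib / elapsed, digest.hexdigest()
```

Calling md5_throughput() with the defaults hashes 1 GiB, matching the dd invocation. A result far below what the advertised CPU model should manage would point the same way as the dd numbers discussed here.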

N Owen Gunden

7 Apr

6:12 a.m.

On Sat, Apr 06, 2013 at 09:33:11PM +0600, Roman Mamedov wrote:

...

You can check if your host has throttled/limited your total CPU usage.

I like this completely unscientific quick test that doesn't require installing any extra software:

dd if=/dev/zero bs=1M count=1024 | md5sum

Should return 250-400 MB/sec on a modern CPU.

Check if you get significantly less. E.g. on one host I had about 80-100 MB only, despite /proc/cpuinfo and the like all showing normal CPU frequencies, so couldn't tell there was any throttling other than from testing.

40 MB/sec :(.

Granted, the VPS was marketed as a storage server. I simply thought to run tor on there because it mostly sits idle. I believe it is a xen-based VM.

Some more specs: 1G of RAM, cpu is described as 6/6 (SPECint_rate2006/SPECfp_rate2006).

# cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 44
model name : Intel(R) Xeon(R) CPU E5606 @ 2.13GHz
stepping : 2
cpu MHz : 2133.476
cache size : 8192 KB
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu de tsc msr pae cx8 sep cmov pat clflush mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc up rep_good aperfmperf pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 popcnt aes hypervisor lahf_lm arat
bogomips : 4266.95
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:

My issue seems to come and go. For example, right now I'm running with:

RelayBandwidthRate 120 KB
RelayBandwidthBurst 240 KB

and using all of 1% CPU.

(as a reminder, the other day I was running at 40KB/80KB and pegging the cpu.)

Does tor traffic generally fluctuate a lot with time of day?

N Owen Gunden

6:20 a.m.

On Sun, Apr 07, 2013 at 02:12:22AM -0400, N Owen Gunden wrote:

...

[...]

Also:

# egrep -v "^#|^$" /etc/tor/torrc
SocksPort 0 # what port to open for local application connections
SocksListenAddress 127.0.0.1 # accept connections only from localhost
Log notice syslog
ORPort 9001
Address XXXXXXXXX
Nickname XXXXXXXXX
RelayBandwidthRate 250 KB
RelayBandwidthBurst 500 KB
MaxAdvertisedBandwidth 50 KB
MaxOnionsPending 250
ContactInfo Owen Gunden <ogunden AT phauna dot org>
DirPort 9030 # what port to advertise for directory connections
DirPortFrontPage /etc/tor/tor-exit-notice.html
ExitPolicy reject *:* # no exits allowed

Miłosz Gaczkowski

10:47 a.m.

On 07/04/2013 08:12, N Owen Gunden wrote:

...

Does tor traffic generally fluctuate a lot with time of day?

I'm pretty new to tor, but in my experience it does fluctuate quite a lot. With my settings of:

RelayBandwidthRate 1536 KB
RelayBandwidthBurst 2048 KB

it tends to switch between two "modes": hogging all of 1.5MiB/s of bandwidth very consistently (and using a lot of CPU power), and dancing somewhere between 0.5MiB/s-1MiB/s (and using hardly any CPU). It seems pretty regular to me, usually hogging between 14:00 and 24:00 UTC and getting more relaxed outside of these times.

Andreas Krey

5:08 p.m.

On Sun, 07 Apr 2013 12:47:05 +0000, Miłosz Gaczkowski wrote:

...

On 07/04/2013 08:12, N Owen Gunden wrote:

...
Does tor traffic generally fluctuate a lot with time of day?

I'm pretty new to tor, but in my experience it does fluctuate quite a lot. With my settings of:

RelayBandwidthRate 1536 KB
RelayBandwidthBurst 2048 KB

I use 'RelayBandwidthBurst 1000 MB', as the point of my RelayBandwidthRate is to keep below the traffic limit of my VPS. I have no problem with accumulating lots of unused bandwidth for later bursts (which then can go over the BandwidthRate for some time).

...

it tends to switch between two "modes": hogging all of 1.5MiB/s of bandwidth very consistently (and using a lot of CPU power), and dancing somewhere between 0.5MiB/s-1MiB/s (and using hardly any CPU).

Hmm. The part that strikes me as strange is the 'a lot of CPU power'. The CPU usage when in limit mode ('flatline') shouldn't be much higher than when slightly below the limit; there may be an actual problem with the code there. (Otherwise, the pegging to the BandwidthRate for hours at a time is normal - don't know if it happens similarly for nodes that have no explicit BandwidthRate and are only limited by their physical capability.)

...

It seems pretty regular to me, usually hogging between 14:00 and 24:00 UTC and getting more relaxed outside of these times.

That is more or less a daily pattern, but the correlation between the two (non-exit) relays I operate isn't very obvious; see https://twitter.com/akrey/status/320943399752564736/photo/1 for a four-day period dump.

Also, on the 'he' relay I use a lower advertized bandwidth than the actual RelayBandwidthRate so the node needs to be somewhat overbooked before it actually 'flatlines'. Currently this seems to be successful, but it isn't always. (When in 'flatline' a relay IMHO imparts a substantially higher roundtrip time on the circuits through it, so, for usability, I'd like to avoid that.)

The 'hz' plot also shows the actual bursts that go over the RelayBandwidthRate.

Andreas

-- "Totally trivial. Famous last words." From: Linus Torvalds <torvalds@*.org> Date: Fri, 22 Jan 2010 07:29:21 -0800

Moritz Bartl

5:42 p.m.

On 07.04.2013 19:08, Andreas Krey wrote:

...

I use 'RelayBandwidthBurst 1000 MB', as the point of my RelayBandwidthRate is to keep below the traffic limit of my VPS. I have no problem with accumulating lots of unused bandwidth for later bursts (which then can go over the BandwidthRate for some time).

1000 MB (per second!) is not a useful setting. (Relay)BandwidthBurst should ideally reflect the maximum actual line speed.

-- Moritz Bartl https://www.torservers.net/

Andreas Krey

6:25 p.m.

On Sun, 07 Apr 2013 19:42:25 +0000, Moritz Bartl wrote: ...

...

1000 MB (per second!) is not a useful setting.

No, it's not 'per second'. It is the amount of allowed traffic that can be saved up while not hitting the BandwidthRate, to be used up when the BandwidthRate is exceeded. Using up those savings may happen much faster or much slower than a second depending on settings and use; and it doesn't make sense to label the Burst in 'per second' just like it doesn't make sense to label your credit limit in 'dollars per month'.

In my case, I only care that my average bandwidth usage doesn't exceed, say, 1 TB/month; the resulting BandwidthRate is 385 KB/s. But I don't mind it transferring much more as long as this is compensated by earlier unused BandwidthRate. So I don't see a reason why I shouldn't set the Burst to 1 GB or even 100 GB. (As long as the authorities don't take the higher traffic as a hint to advertise my relay with more than the set BandwidthRate.)

...

(Relay)BandwidthBurst should ideally reflect the maximum actual line speed.

That is only useful when you want to save up some bandwidth on a DSL link for your own use; then a big burst would clog your line. (And I guess Burst=0 would be the proper thing in that case, unless the implementation is weird about that.)

Andreas

-- "Totally trivial. Famous last words." From: Linus Torvalds <torvalds@*.org> Date: Fri, 22 Jan 2010 07:29:21 -0800
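The 385 KB/s figure above can be reproduced with a line of arithmetic. The sketch below assumes a decimal terabyte (10^12 bytes) and a 30-day month, which is what the quoted number appears to correspond to; the function name is illustrative. It also prints the value in units of 1024 bytes, since which of the two a given "KB" means depends on the tool reading it.

```python
SECONDS_PER_DAY = 86_400


def sustained_rate_bytes_per_s(quota_bytes, days=30):
    """Average byte rate that uses up quota_bytes evenly over `days` days."""
    return quota_bytes / (days * SECONDS_PER_DAY)


rate = sustained_rate_bytes_per_s(10**12)  # 1 TB per 30-day month
print(round(rate / 1000, 1))  # 385.8 (units of 1000 bytes per second)
print(round(rate / 1024, 1))  # 376.8 (units of 1024 bytes per second)
```

If the quota covers upload and download together, the budget for each direction is correspondingly smaller.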

Miłosz Gaczkowski

7:35 p.m.

On 07/04/2013 20:25, Andreas Krey wrote:

...

No, it's not 'per second'. [...]

Oh, wow, looks like I completely misunderstood what RelayBandwidthBurst does. I assumed it's a burst rate that would be occasionally allowed in peak times, not a "credit limit". If you're sure your description is correct, I may need to reconfigure my node.

mick

8:34 p.m.

On Sun, 07 Apr 2013 21:35:36 +0200 Miłosz Gaczkowski milosz@omgomg.eu allegedly wrote:

...

On 07/04/2013 20:25, Andreas Krey wrote:

...
No, it's not 'per second'. [...]

Oh, wow, looks like I completely misunderstood what RelayBandwidthBurst does. I assumed it's a burst rate that would be occasionally allowed in peak times, not a "credit limit". If you're sure your description is correct, I may need to reconfigure my node.

Errr. Me too.

My RelayBandwidthBurst limit is set on the assumption that that is the max I will ever see (and allow).

Confused.

Mick

---------------------------------------------------------------------

blog: baldric.net gpg fingerprint: FC23 3338 F664 5E66 876B 72C0 0A1F E60B 5BAD D312

---------------------------------------------------------------------

Moritz Bartl

8 Apr

5:52 a.m.

On 07.04.2013 20:25, Andreas Krey wrote:

...

No, it's not 'per second'. It is the amount of allowed traffic that can be saved up while not hitting the BandwidthRate, to be used up when the BandwidthRate is exceeded.

Wow. Thanks. All these years I completely misunderstood Tor's Burst settings. Sad too that nobody bothered to check our torrc and tell us. :(

-- Moritz Bartl https://www.torservers.net/

Sebastian Hahn

6:47 a.m.

On Apr 8, 2013, at 7:52 AM, Moritz Bartl moritz@torservers.net wrote:

...

On 07.04.2013 20:25, Andreas Krey wrote:

...
No, it's not 'per second'. It is the amount of allowed traffic that can be saved up while not hitting the BandwidthRate, to be used up when the BandwidthRate is exceeded.

Wow. Thanks. All these years I completely misunderstood Tor's Burst settings. Sad too that nobody bothered to check our torrc and tell us. :(

Please don't make such assertions. People did check the torrc, and found nothing to be wrong with it. For all I know and can tell from the source, Tor's Burst settings indeed limit the amount of traffic you can send in a single second. Setting it to higher than your line speed doesn't help anything, and the bucket gets refilled to the burst anyway.

This is a comment from src/or/or.h:

...

uint64_t BandwidthRate; /**< How much bandwidth, on average, are we willing
                         * to use in a second? */
uint64_t BandwidthBurst; /**< How much bandwidth, at maximum, are we willing
                          * to use in a second? */

Now, it's entirely possible I'm missing something big here; or that the code changed and now does something different; or that it used to do something different, etc. Andreas, can you please explain more?

Thanks
Sebastian

Andreas Krey

5:41 p.m.

On Mon, 08 Apr 2013 08:47:56 +0000, Sebastian Hahn wrote:

...

Now, it's entirely possible I'm missing something big here; or that the code changed and now does something different; or that it used to do something different, etc. Andreas, can you please explain more?

At least the original change explains it differently:

+--- ReleaseNotes -----
| Changes in version 0.0.2pre20 - 2004-01-30
| ...
|
| - I've split the TotalBandwidth option into BandwidthRate (how many
|   bytes per second you want to allow, long-term) and
|   BandwidthBurst (how many bytes you will allow at once before the cap
|   kicks in). This better token bucket approach lets you, say, set
|   BandwidthRate to 10KB/s and BandwidthBurst to 10MB, allowing good
|   performance while not exceeding your monthly bandwidth quota.
+---------------------

..which is pretty much my usage scenario, just with smaller numbers.

And the code looks likewise. We have the global_*_buckets that are initialized from *BandwidthBurst, and get incremented regularly by *BandwidthRate (divide by increment frequency; TokenBucketRefillInterval) and then capped to the *BandwidthBurst.

Thus *BandwidthBurst is the total amount of unused traffic we can save up to later fire with more than *BandwidthRate. No 'per second'.

(The interesting part is that the global_*_bucket are ints; much more than the 1 GB default could behave strangely.)

Andreas

-- "Totally trivial. Famous last words." From: Linus Torvalds <torvalds@*.org> Date: Fri, 22 Jan 2010 07:29:21 -0800
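The bucket behaviour described above (start full at the burst value, refill by the rate, cap at the burst) can be sketched as a toy simulation. This is not Tor's code: it assumes one refill per one-second tick rather than the finer TokenBucketRefillInterval, tracks a single direction, and uses made-up names throughout.

```python
def simulate_bucket(rate, burst, demand):
    """Simulate a token bucket over one-second ticks.

    rate:   bytes added to the bucket per tick
    burst:  maximum number of bytes the bucket can hold
    demand: bytes the relay would like to send in each tick
    Returns the bytes actually sent in each tick.
    """
    bucket = burst  # the bucket starts out full
    sent = []
    for wanted in demand:
        allowed = min(wanted, bucket)  # can only spend what is saved up
        bucket -= allowed
        sent.append(allowed)
        bucket = min(burst, bucket + rate)  # refill, then cap at burst
    return sent


demand = [0, 0, 0, 60, 60, 60, 60]
print(simulate_bucket(10, 100, demand))  # [0, 0, 0, 60, 50, 10, 10]
print(simulate_bucket(10, 10, demand))   # [0, 0, 0, 10, 10, 10, 10]
```

With a large burst, the idle ticks leave a full bucket that is then spent faster than the rate until it drains; with burst equal to rate, nothing is carried over. In this model no single tick can send more than the burst value, and the long-run average cannot exceed the rate.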

Matt Joyce

12 Apr

12:58 p.m.

On 08/04/13 18:41, Andreas Krey wrote:

...

On Mon, 08 Apr 2013 08:47:56 +0000, Sebastian Hahn wrote: ...

...
Now, it's entirely possible I'm missing something big here; or that the code changed and now does something different; or that it used to do something different, etc. Andreas, can you please explain more?

At least the original change explains it differently:

+--- ReleaseNotes -----
| Changes in version 0.0.2pre20 - 2004-01-30
| ...
|
| - I've split the TotalBandwidth option into BandwidthRate (how many
|   bytes per second you want to allow, long-term) and
|   BandwidthBurst (how many bytes you will allow at once before the cap
|   kicks in). This better token bucket approach lets you, say, set
|   BandwidthRate to 10KB/s and BandwidthBurst to 10MB, allowing good
|   performance while not exceeding your monthly bandwidth quota.
+---------------------

..which is pretty much my usage scenario, just with smaller numbers.

And the code looks likewise. We have the global_*_buckets that are initialized from *BandwidthBurst, and get incremented regularly by *BandwidthRate (divide by increment frequency; TokenBucketRefillInterval) and then capped to the *BandwidthBurst.

Thus *BandwidthBurst is the total amount of unused traffic we can save up to later fire with more than *BandwidthRate. No 'per second'.

(The interesting part is that the global_*_bucket are ints; much more than the 1 GB default could behave strangely.)

Andreas

Wouldn't the AccountingMax and AccountingStart configuration options be more suitable if the objective is to enforce a bandwidth limit over a day, week, or month? Doing it through the rate and burst settings limits burst durations, whereas the accounting options seem to me more flexible to the needs of the network, since they were designed for this purpose.

In particular I note the following (source: tor manpage; emphasis is mine):

AccountingMax N bytes|KBytes|MBytes|GBytes|TBytes

Never send more than the specified number of bytes in a given accounting period, or receive more than that number in the period. For example, with AccountingMax set to 1 GByte, a server could send 900 MBytes and receive 800 MBytes and continue running. It will only hibernate once one of the two reaches 1 GByte. When the number of bytes gets low, Tor will stop accepting new connections and circuits. When the number of bytes is exhausted, Tor will hibernate until some time in the next accounting period. To prevent all servers from waking at the same time, Tor will also wait until a random point in each period before waking up. If you have bandwidth cost issues, ***enabling hibernation is preferable to setting a low bandwidth, since it provides users with a collection of fast servers that are up some of the time, which is more useful than a set of slow servers that are always "available".***

AccountingStart day|week|month [day] HH:MM

Specify how long accounting periods last. If month is given, each accounting period runs from the time HH:MM on the dayth day of one month to the same day and time of the next. (The day must be between 1 and 28.) If week is given, each accounting period runs from the time HH:MM of the dayth day of one week to the same day and time of the next week, with Monday as day 1 and Sunday as day 7. If day is given, each accounting period runs from the time HH:MM each day to the same time on the next day. All times are local, and given in 24-hour time. (Default: "month 1 0:00")

Essentially, this way you can give tor a "budget" for the month (for example) that it must not exceed, with no constraints on when or how long it can burst. While capacity remains in your package, subject to the limits you have set, the network can use that capacity whenever it needs it. Before using this, however, do check whether your provider counts upload, download, or both. If they count both, you would need to set AccountingMax to half the stated value to ensure you remain below the limit.
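The halving rule can be written down as a small helper that prints the two torrc lines. This is a sketch under stated assumptions: the helper names are hypothetical, the quota is given in plain bytes to sidestep any ambiguity about unit prefixes, and the accounting period is assumed to start on the first of the month. Check the dates and quota against the provider's actual billing terms.

```python
def accounting_max_bytes(provider_quota_bytes, counts_both_directions):
    """Per-direction limit to hand to AccountingMax.

    AccountingMax applies to sent and received bytes separately, so when
    the provider bills the sum of both, each direction gets half the quota.
    """
    if counts_both_directions:
        return provider_quota_bytes // 2
    return provider_quota_bytes


def torrc_accounting_lines(provider_quota_bytes, counts_both_directions,
                           start="month 1 00:00"):
    """Build the two torrc lines for a given provider quota (illustrative)."""
    limit = accounting_max_bytes(provider_quota_bytes, counts_both_directions)
    return [
        f"AccountingStart {start}",
        f"AccountingMax {limit} bytes",
    ]


for line in torrc_accounting_lines(10**12, counts_both_directions=True):
    print(line)
# AccountingStart month 1 00:00
# AccountingMax 500000000000 bytes
```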

tor-relays@lists.torproject.org

15 comments

9 participants
