Greetings,
I am a bit of a noob here so please bear with me. I ran a relay using Ubuntu with very few issues; however, I decided to add an additional relay and chose FreeBSD for it. They will only run for around 18 hours and then they shut down. I have adjusted the torrc file every way I know how and increased the max vnodes, thinking this may have been my issue. I can post the sysrc and torrc if needed. I would be grateful to anyone who might help me figure this out; otherwise I am going to reluctantly move them both back to Ubuntu.
Thanks, Dan
Sent from ProtonMail for iOS
xplato xplato@protonmail.com wrote on 2021-03-30:
I am a bit of a noob here so please bear with me. I ran a relay using Ubuntu with very few issues; however, I decided to add an additional relay and chose FreeBSD for it. They will only run for around 18 hours and then they shut down. I have adjusted the torrc file every way I know how and increased the max vnodes, thinking this may have been my issue. I can post the sysrc and torrc if needed. I would be grateful to anyone who might help me figure this out; otherwise I am going to reluctantly move them both back to Ubuntu.
What do you mean by "they" and "shutting down"?
Seeing the torrc and /etc/sysctl.conf indeed could be useful.
Please also check the system and tor logs to see if there are any relevant log messages.
If you don't already, you may want to start monitoring tor's and the system's resource usage.
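For a first look the base-system tools are enough; the invocations below are only one way to do it (top in batch mode, ordered by resident memory, plus vmstat sampling every five seconds), not the only one:

top -b -o res | head -n 20
vmstat 5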
Fabian
Fabian Keil freebsd-listen@fabiankeil.de wrote:
xplato xplato@protonmail.com wrote on 2021-03-30:
I am a bit of a noob here so please bear with me. I ran a relay using Ubuntu with very few issues; however, I decided to add an additional relay and chose FreeBSD for it. They will only run for around 18 hours and then they shut down. I have adjusted the torrc file every way I know how and increased the max vnodes, thinking this may have been my issue. I can post the sysrc and torrc if needed. I would be grateful to anyone who might help me figure this out; otherwise I am going to reluctantly move them both back to Ubuntu.
What do you mean by "they" and "shutting down"?
It exits on signal 6. See the tail end of the log excerpt I included in my posting about 0.4.6.1-alpha being very noisy. N.B. 0.4.5.7 also has the same bug. I have browsed a bit in the consensus documents, looking for relays running those versions of tor. What I found was that the ones running those versions on Linux systems often have much longer uptimes, but I didn't find any FreeBSD systems running them with uptimes much over one day, which is consistent with them failing in short order on FreeBSD systems. I did not look for those versions on the other BSDs, but such systems are generally much rarer than those on FreeBSD, except perhaps for macOS.
Seeing the torrc and /etc/sysctl.conf indeed could be useful.
Possibly, but probably not. Take a look at the crash messages that tor usually writes to its notice-level log, which look like they have something to do with authority code, but we, of course, do not run authority relays. I think I did have one crash that did not write that batch of messages to syslog, but usually it does.
Please also check the system and tor logs to see if there are any relevant log messages.
I posted such in my previous two postings.
If you don't already, you may want to start monitoring tor's and the system's resource usage.
xplato's relay may differ from mine, but mine isn't using much these days because the authorities have a) stopped giving mine an HSDir flag for the last two months or so after giving it seemingly randomly for months before that, and b) been giving or withholding a Fast flag seemingly also at random. This is on a relay that often runs at 300 KB/s to 400+ KB/s, both in and out. The whole mess makes me wonder whether it's worth my bothering to maintain and run a relay anymore. Can you tell us why the consensus documents on Sunday showed *all* authorities with Bandwidth=20 Unmeasured=1 for some hours? What caused such an alarming situation?
Scott
Scott Bennett bennett@sdf.org wrote on 2021-03-30:
Fabian Keil freebsd-listen@fabiankeil.de wrote:
xplato xplato@protonmail.com wrote on 2021-03-30:
I am a bit of a noob here so please bear with me. I ran a relay using Ubuntu with very few issues; however, I decided to add an additional relay and chose FreeBSD for it. They will only run for around 18 hours and then they shut down. I have adjusted the torrc file every way I know how and increased the max vnodes, thinking this may have been my issue. I can post the sysrc and torrc if needed. I would be grateful to anyone who might help me figure this out; otherwise I am going to reluctantly move them both back to Ubuntu.
What do you mean by "they" and "shutting down"?
It exits on signal 6. See the tail end of the log excerpt I included in my posting about 0.4.6.1-alpha being very noisy. N.B. 0.4.5.7 also has the same bug.
It's not obvious to me that you are experiencing the same issue.
I have several non-exit relays running 0.4.5.7 on ElectroBSD systems based on FreeBSD 11.4-STABLE and they appear to be stable.
One example:
Mar 30 08:50:48.691 [notice] {HEARTBEAT} Heartbeat: Tor's uptime is 13 days 0:00 hours, with 14270 circuits open. I've sent 11283.88 GB and received 11200.63 GB. I've received 3494368 connections on IPv4 and 0 on IPv6. I've made 653389 connections with IPv4 and 0 with IPv6.
While I'm also seeing BUG messages, they are different from yours and are probably related to the MaxMemInQueues settings:
Mar 30 07:10:01.071 [notice] {CONTROL} New control connection opened from 127.0.1.5.
Mar 30 07:10:09.457 [notice] {GENERAL} We're low on memory (cell queues total alloc: 952512 buffer total alloc: 2557952, tor compress total alloc: 292862049 (zlib: 0, zstd: 0, lzma: 292862001), rendezvous cache total alloc: 23736810). Killing circuits with over-long queues. (This behavior is controlled by MaxMemInQueues.)
Mar 30 07:10:09.531 [notice] {GENERAL} Removed 1147072 bytes by killing 17401 circuits; 0 circuits remain alive. Also killed 2 non-linked directory connections.
Mar 30 07:10:09.532 [warn] {BUG} channel_flush_from_first_active_circuit: Bug: Found a supposedly active circuit with no cells to [...]
Mar 30 07:10:09.532 [warn] {BUG} channel_flush_from_first_active_circuit: Bug: Found a supposedly active circuit with no cells to send. Trying to recover. (on Tor 0.4.5.7)
I intentionally reduced the MaxMemInQueues value before the upgrade as the server became unresponsive due to a lack of mbuf clusters.
I wrote about it here: https://www.fabiankeil.de/blog-surrogat/2021/03/14/website-ausfall-durch-mbu... (The text is in German but the Munin graphs and the log messages aren't).
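For anyone who wants to check whether their own relay host is getting close to the same situation, the base system's netstat reports mbuf cluster usage (no extra tooling assumed):

netstat -m | grep "mbuf clusters"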
I'm using Tor 0.4.6.1 alpha as a client and on a couple of servers to provide onion services and haven't seen any crashes there either.
Please also check the system and tor logs to see if there are any relevant log messages.
I posted such in my previous two postings.
Again, I think it's premature to conclude that you are experiencing the same issue.
If you don't already, you may want to start monitoring tor's and the system's resource usage.
xplato's relay may differ from mine, but mine isn't using much these days because the authorities have a) stopped giving mine an HSDir flag for the last two months or so after giving it seemingly randomly for months before that, and b) been giving or withholding a Fast flag seemingly also at random. This is on a relay that often runs at 300 KB/s to 400+ KB/s, both in and out. The whole mess makes me wonder whether it's worth my bothering to maintain and run a relay anymore. Can you tell us why the consensus documents on Sunday showed *all* authorities with Bandwidth=20 Unmeasured=1 for some hours? What caused such an alarming situation?
I haven't looked at the consensus documents from Sunday and don't run bandwidth scanners so I have no information on this.
Fabian
xplato xplato@protonmail.com wrote:
I am a bit of a noob here so please bear with me. I ran a relay using Ubuntu with very few issues; however, I decided to add an additional relay and chose FreeBSD for it. They will only run for around 18 hours and then they shut down. I have adjusted the torrc file every way I know how and increased the max vnodes, thinking this may have been my issue. I can post the sysrc and torrc if needed. I would be grateful to anyone who might help me figure this out; otherwise I am going to reluctantly move them both back to Ubuntu.
Thanks for attempting to run a relay in a diversifying environment. Unfortunately, your timing has left you high and dry. What is happening is not your fault. Both 0.4.5.7 and 0.4.6.1-alpha have a fatal bug. I don't know which version of tor introduced it because I had been running 0.4.5.2-alpha until a few days ago, when I "upgraded" to 0.4.6.1-alpha. That crashed in under 24 hours twice, so I switched to the allegedly "stable" version, 0.4.5.7, only to have it crash the same way in about 12 or 13 hours. Also, 0.4.6.1-alpha was filling its notice-level log file with many bug messages the whole time it was running. I posted a message on this list about it, including the last few hours of the log up through the crash messages. At the crash, the console log (also in /var/log/messages) had a message like this:
Mar 27 18:29:26 hellas kernel: pid 17047 (tor), jid 0, uid 256: exited on signal 6
After the second crash I made the switch mentioned above. It has been crashing ever since, and I am getting very tired of discovering that it has been down for several hours. There have been no responses to my earlier posting, but your query is on the same topic, albeit under a different Subject: line, so I am replying. This sort of thing has not hit me in tor in many years, and I hope the tor project developers will fix it ASAP. My suggestion is that you hold off a bit and try again once the bug has been squashed. Alternatively, if you have a different BSD system (i.e., DragonflyBSD, NetBSD, or OpenBSD), running your relay on that system would help diversify the systems running relays as much as, or more than, running it on FreeBSD. IOW, it depends on how urgently you want to get a relay up and running vs. your level of patience to wait for a bug fix. Like I wrote above, this situation hardly ever happens, so if you can be patient, it may be fixed within a few days.
Scott Bennett, Comm. ASMELG, CFIAG
**********************************************************************
* Internet: bennett at sdf.org *xor* bennett at freeshell.org        *
*--------------------------------------------------------------------*
* "A well regulated and disciplined militia, is at all times a good  *
* objection to the introduction of that bane of all free governments *
* -- a standing army."                                               *
*    -- Gov. John Hancock, New York Journal, 28 January 1790         *
**********************************************************************
I am running two relays and the error message is the same for both:
Mar 30 08:13:01 freebsd kernel: pid 1745 (tor), jid 0, uid 256, was killed: out of swap space

If I run:

# dd if=/dev/zero of=/usr/swap0 bs=1m count=512
# chmod 0600 /usr/swap0
# swapon -aL
Will that fix the error above?
Thanks!
Sent from ProtonMail for iOS
On Tue, Mar 30, 2021 at 1:31 AM, Scott Bennett bennett@sdf.org wrote:
xplato xplato@protonmail.com wrote:
I am a bit of a noob here so please bear with me. I ran a relay using Ubuntu with very few issues; however, I decided to add an additional relay and chose FreeBSD for it. They will only run for around 18 hours and then they shut down. I have adjusted the torrc file every way I know how and increased the max vnodes, thinking this may have been my issue. I can post the sysrc and torrc if needed. I would be grateful to anyone who might help me figure this out; otherwise I am going to reluctantly move them both back to Ubuntu.
Thanks for attempting to run a relay in a diversifying environment. Unfortunately, your timing has left you high and dry. What is happening is not your fault. Both 0.4.5.7 and 0.4.6.1-alpha have a fatal bug. I don't know which version of tor introduced it because I had been running 0.4.5.2-alpha until a few days ago, when I "upgraded" to 0.4.6.1-alpha. That crashed in under 24 hours twice, so I switched to the allegedly "stable" version, 0.4.5.7, only to have it crash the same way in about 12 or 13 hours. Also, 0.4.6.1-alpha was filling its notice-level log file with many bug messages the whole time it was running. I posted a message on this list about it, including the last few hours of the log up through the crash messages. At the crash, the console log (also in /var/log/messages) had a message like this:
Mar 27 18:29:26 hellas kernel: pid 17047 (tor), jid 0, uid 256: exited on signal 6
After the second crash I made the switch mentioned above. It has been crashing ever since, and I am getting very tired of discovering that it has been down for several hours. There have been no responses to my earlier posting, but your query is on the same topic, albeit under a different Subject: line, so I am replying. This sort of thing has not hit me in tor in many years, and I hope the tor project developers will fix it ASAP. My suggestion is that you hold off a bit and try again once the bug has been squashed. Alternatively, if you have a different BSD system (i.e., DragonflyBSD, NetBSD, or OpenBSD), running your relay on that system would help diversify the systems running relays as much as, or more than, running it on FreeBSD. IOW, it depends on how urgently you want to get a relay up and running vs. your level of patience to wait for a bug fix. Like I wrote above, this situation hardly ever happens, so if you can be patient, it may be fixed within a few days.
Scott Bennett, Comm. ASMELG, CFIAG
**********************************************************************
* Internet: bennett at sdf.org *xor* bennett at freeshell.org        *
*--------------------------------------------------------------------*
* "A well regulated and disciplined militia, is at all times a good  *
* objection to the introduction of that bane of all free governments *
* -- a standing army."                                               *
*    -- Gov. John Hancock, New York Journal, 28 January 1790         *
**********************************************************************
xplato xplato@protonmail.com wrote:
I am running two relays and the error message is the same for both:
Mar 30 08:13:01 freebsd kernel: pid 1745 (tor), jid 0, uid 256, was killed: out of swap space
Oh. Then Fabian may be right. Assuming that you already have what should be an adequate amount of swap space available, this could be due to one of the cluster of memory management bugs introduced into the FreeBSD kernel in 11.2-RELEASE, which remain in 12.x and very likely will be in the upcoming 13.0.
If I run:

# dd if=/dev/zero of=/usr/swap0 bs=1m count=512
Note that, while once upon a time 512 MB was a large amount of swap space, in modern times it is almost trivial and inconsequential.
# chmod 0600 /usr/swap0
# swapon -aL
Will that fix the error above?
It might alleviate it for a short time, but if the problem is due to those bugs, it likely will make little or no difference. The reason is that the message is partly erroneous; i.e., it is correct that the OOM killer has killed the process, but it is incorrect that it was out of swap space. Having watched those bugs in action for several years now, what I can tell you is that a lot of page fixing goes on, but very little page freeing happens later. Processes being killed with that error message are just one symptom, and it can be a real problem, for example, if xorg is running and gets killed, leaving the console inaccessible. Another symptom is that, one by one, processes stop doing anything because they get swapped out due to the shortage of page frames on the free list. The kernel will not begin to page processes back in unless there is at least ~410 MB on the free list, so the system ends up with nothing running, not even shells, because everything is marked as swapped out. If that is what is happening to tor on your system, then increasing swap space likely will have no effect, because swap space is not really where the shortage exists. The shortage is on the free list.

There are some things you can do in 11.4 that will minimize the situations where the memory management problems take over the system. A few loader and sysctl tunables may help a bit. Unfortunately, vm.max_wired no longer does anything and is a red herring. You can try to limit kernel memory by setting vm.kmem_size_max to some value considerably less than the size of real memory on your system. Although the system does not honor this limit either, it may still have a minor influence on how much the kernel uses; I think I set mine to 4 GB on an 8 GB machine. This should be set in /boot/loader.conf.

In /etc/sysctl.conf there are several variables that should each help a little more. If you use ZFS, you can try limiting the size of the ARC by setting vfs.zfs.arc_max. After setting that, you may see the ARC grow to as much as ~200 MB more than the limit you set, but it doesn't really go beyond that, so it does work after a fashion; just allow for that extra couple of hundred megabytes or so. Next is vm.v_free_min, which on my system defaults to 65536 and which I have increased to 98304. Then there is a very important one: vm.pageout_wakeup_thresh=112640. Its default value is only 14124, a far cry from the ~410 MB needed on the free list for the kernel to begin paging a swapped process back into memory. (112640 pages are 440 MB, so it gives the pagedaemon a tiny bit of leeway to get to work before the free list gets too low.) Lastly, set vm.pageout_oom_seq=102400000 to prevent the OOM killer from killing your processes. This value is the number of complete passes through memory the pagedaemon must make, in its attempt to free enough memory to satisfy the current demand for free page frames, before it calls the OOM killer. Setting the value that high means the pagedaemon never gets through that many passes, so the OOM killer never gets called. After setting this one you may occasionally see the pagedaemon using all of one core's CPU time for a while, possibly a *long* while, but it should protect your processes from being killed due to the collection of memory management bugs.
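Collected in one place, the tunables described above would look roughly like this; the numbers are the ones quoted above for an 8 GB machine, and the ARC limit is only a placeholder to be sized for your own system:

/boot/loader.conf:
vm.kmem_size_max="4G"

/etc/sysctl.conf:
vfs.zfs.arc_max=2147483648        # only if you use ZFS; ~2 GB placeholder
vm.v_free_min=98304               # default here: 65536
vm.pageout_wakeup_thresh=112640   # ~440 MB worth of pages; default 14124
vm.pageout_oom_seq=102400000      # effectively keeps the OOM killer from firing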
With all of the variables mentioned above set to better values, you may still see the system slowly grind down to an idle state. This can happen because the kernel prioritizes keeping unused file system buffers in memory over swapping processes back in to get actual work done. In that case, manual intervention is required to free up page frames. For example, if you confine your ccache directory trees to a UFS file system, the system will quickly accumulate a lot of buffers it doesn't want to let go of. The same holds true for portmaster's $WRKDIRPREFIX, where a "portmaster -a" to update your ports will tie up a large number of buffers; buildworld and buildkernel are also culprits. The file system buffers can be forcibly freed, thereby freeing the page frames they occupy, by unmounting the UFS file system. (It can be remounted after waiting a few seconds to make sure that the free list has been updated, which you can watch for with top(1).)

Another trick can be used if you have a long-running process that uses at least several hundred megabytes of memory. If you can temporarily shut such a process down, or otherwise get it to free up its memory, the system will begin paging in swapped processes, after which you can restart the one you halted. Often this shakes up memory management enough that things will go on recovering, or at least begin doing some work again. For example, I run mprime with four worker threads, two of which typically use 300 MB to 400 MB each. I can temporarily stop those two workers to free up plenty of page frames to get swapped stuff brought back in. Then I can restart those worker threads, and usually things will gradually return to normal. However, this is a special case that is not available if you don't have such a process to manipulate in this manner.
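As a concrete sketch of the unmount-and-remount trick described above, assuming the ccache tree lives on its own UFS file system mounted at /ccache (that mount point is only an example, and umount will refuse if anything still has files open there):

# umount /ccache
# sleep 10
# mount /ccache

Watching the free list with top(1) between the umount and the mount shows whether the freed buffers actually made it back onto it.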
Scott
xplato xplato@protonmail.com wrote on 2021-03-30:
I am running two relays and the error message is the same for both:
Mar 30 08:13:01 freebsd kernel: pid 1745 (tor), jid 0, uid 256, was killed: out of swap space

If I run:

# dd if=/dev/zero of=/usr/swap0 bs=1m count=512
# chmod 0600 /usr/swap0
# swapon -aL
Will that fix the error above?
Is /usr/swap0 already referenced in /etc/fstab?
How much RAM and swap space is currently available? Are both relays running on the same system?
Before adjusting the swap space I'd try experimenting with MaxMemInQueues. The auto-tuning is based on the system's memory and may result in a value that is too high if the system's memory is also needed elsewhere.
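In torrc that is a single line; the 300MB below is only the value Fabian mentions for one of his own relays further down, not a general recommendation:

MaxMemInQueues 300MB

Tor re-reads its configuration on SIGHUP, so the new limit can be applied without restarting the relay.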
If you then still need to increase the swap space you'll probably have to increase it by more than 512 MB.
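If it does come to adding swap, a file-backed setup along these lines should work on FreeBSD; the 4 GB size, the /usr/swap0 path and the md99 unit are all placeholders:

# dd if=/dev/zero of=/usr/swap0 bs=1m count=4096
# chmod 0600 /usr/swap0
# echo "md99 none swap sw,file=/usr/swap0,late 0 0" >> /etc/fstab
# swapon -aL

The fstab entry is what makes swapon -aL (and the next reboot) actually pick the file up.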
On one of my relays where I've set "MaxMemInQueues 300MB" the tor process currently consumes 870 MB of "real memory" and the memory usage is still slowly increasing.
Fabian
On Tue, Mar 30, 2021 at 02:36:36AM +0000, xplato wrote:
Greetings,
I am a bit of a noob here so please bear with me. I ran a relay using Ubuntu with very few issues; however, I decided to add an additional relay and chose FreeBSD for it. They will only run for around 18 hours and then they shut down. I have adjusted the torrc file every way I know how and increased the max vnodes, thinking this may have been my issue. I can post the sysrc and torrc if needed. I would be grateful to anyone who might help me figure this out; otherwise I am going to reluctantly move them both back to Ubuntu.
Emerald Onion runs over twenty Tor exit nodes on HardenedBSD 12 and 13. Given Tor's need for security, you might want to consider using HardenedBSD, a derivative of FreeBSD that implements exploit mitigations and security hardening technologies. FreeBSD's state of security leaves much to be desired. Tor's relay operators and users really should at least have exploit mitigations like ASLR and W^X applied.
Thanks,
On 30-03-2021 15:47, Shawn Webb wrote:
On Tue, Mar 30, 2021 at 02:36:36AM +0000, xplato wrote:
Greetings,
I am a bit of a noob here so please bear with me. I ran a relay using Ubuntu with very few issues; however, I decided to add an additional relay and chose FreeBSD for it. They will only run for around 18 hours and then they shut down. I have adjusted the torrc file every way I know how and increased the max vnodes, thinking this may have been my issue. I can post the sysrc and torrc if needed. I would be grateful to anyone who might help me figure this out; otherwise I am going to reluctantly move them both back to Ubuntu.
Emerald Onion runs over twenty Tor exit nodes on HardenedBSD 12 and 13. Given Tor's need for security, you might want to consider using HardenedBSD, a derivative of FreeBSD that implements exploit mitigations and security hardening technologies. FreeBSD's state of security leaves much to be desired. Tor's relay operators and users really should at least have exploit mitigations like ASLR and W^X applied.
But it won't fix the problem at hand, unless memory management in HardenedBSD is different than in FreeBSD.
René
On Wed, Mar 31, 2021 at 01:09:45PM +0200, René Ladan wrote:
On 30-03-2021 15:47, Shawn Webb wrote:
On Tue, Mar 30, 2021 at 02:36:36AM +0000, xplato wrote:
Greetings,
I am a bit of a noob here so please bear with me. I ran a relay using Ubuntu with very few issues; however, I decided to add an additional relay and chose FreeBSD for it. They will only run for around 18 hours and then they shut down. I have adjusted the torrc file every way I know how and increased the max vnodes, thinking this may have been my issue. I can post the sysrc and torrc if needed. I would be grateful to anyone who might help me figure this out; otherwise I am going to reluctantly move them both back to Ubuntu.
Emerald Onion runs over twenty Tor exit nodes on HardenedBSD 12 and 13. Given Tor's need for security, you might want to consider using HardenedBSD, a derivative of FreeBSD that implements exploit mitigations and security hardening technologies. FreeBSD's state of security leaves much to be desired. Tor's relay operators and users really should at least have exploit mitigations like ASLR and W^X applied.
But it won't fix the problem at hand, unless memory management in HardenedBSD is different than in FreeBSD.
Memory management is indeed different in HardenedBSD than in FreeBSD. HardenedBSD implemented a clean-room version of grsecurity's PaX ASLR. FreeBSD's version of ASLR, more appropriately called ASR, has known issues. HardenedBSD's does not.
Thanks,
Hi Shawn,
I looked at HardenedBSD and have actually moved to a different VPS so that I can use HBSD. FreeBSD was the only option I had at the time, but both instances crashed repeatedly and it got so frustrating that I gave up on FreeBSD. I will give HardenedBSD a go.
Cheers, Dan
Sent from ProtonMail for iOS
On Wed, Mar 31, 2021 at 10:12 AM, Shawn Webb shawn.webb@hardenedbsd.org wrote:
On Wed, Mar 31, 2021 at 01:09:45PM +0200, René Ladan wrote:
On 30-03-2021 15:47, Shawn Webb wrote:
On Tue, Mar 30, 2021 at 02:36:36AM +0000, xplato wrote:
Greetings,
I am a bit of a noob here so please bear with me. I ran a relay using Ubuntu with very few issues; however, I decided to add an additional relay and chose FreeBSD for it. They will only run for around 18 hours and then they shut down. I have adjusted the torrc file every way I know how and increased the max vnodes, thinking this may have been my issue. I can post the sysrc and torrc if needed. I would be grateful to anyone who might help me figure this out; otherwise I am going to reluctantly move them both back to Ubuntu.
Emerald Onion runs over twenty Tor exit nodes on HardenedBSD 12 and 13. Given Tor's need for security, you might want to consider using HardenedBSD, a derivative of FreeBSD that implements exploit mitigations and security hardening technologies. FreeBSD's state of security leaves much to be desired. Tor's relay operators and users really should at least have exploit mitigations like ASLR and W^X applied.
But it won't fix the problem at hand, unless memory management in HardenedBSD is different than in FreeBSD.
Memory management is indeed different in HardenedBSD than in FreeBSD. HardenedBSD implemented a clean-room version of grsecurity's PaX ASLR. FreeBSD's version of ASLR, more appropriately called ASR, has known issues. HardenedBSD's does not.
Thanks,
-- Shawn Webb Cofounder / Security Engineer HardenedBSD
https://git.hardenedbsd.org/hardenedbsd/pubkeys/-/raw/master/Shawn_Webb/03A4...
Sounds good. If you still can't get your relay to have an uptime longer than eighteen hours, feel free to reach out directly to me and I can help address that. The non-exit relay I'm running from home has an uptime greater than eighteen hours, so we at least have a "reference implementation" to work with.
Thanks,
xplato xplato@protonmail.com wrote on 30 March 2021 at 04:36:36 CEST:
Greetings,
I am a bit of a noob here so please bear with me. I ran a relay using Ubuntu with very few issues; however, I decided to add an additional relay and chose FreeBSD for it. They will only run for around 18 hours and then they shut down. I have adjusted the torrc file every way I know how and increased the max vnodes, thinking this may have been my issue. I can post the sysrc and torrc if needed. I would be grateful to anyone who might help me figure this out; otherwise I am going to reluctantly move them both back to Ubuntu.
Thanks, Dan
Sent from ProtonMail for iOS
How many circuits does your relay typically have?
Perhaps memory consumption depends on the number of circuits; my tiny relay typically has 500 to 3000 circuits and uses 800 to 1000 MB of RAM.
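If tor logs to a file, the heartbeat lines it writes every six hours include the number of open circuits, so something like this shows the most recent count (the log path depends on the Log line in your torrc):

grep Heartbeat /var/log/tor/notices.log | tail -n 1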
René