I've gotten a couple of these sets of messages in the last few hours.
Sep 22 05:13:17.366 [warn] tor_bug_occurred_: Bug: src/common/compress.c:576: tor_compress_process: Non-fatal assertion !((rv == TOR_COMPRESS_OK) && *in_len == in_len_orig && *out_len == out_len_orig) failed. (on Tor 0.3.1.6-rc efc306c59aa9ee1a)
Sep 22 05:13:17.366 [warn] Bug: Non-fatal assertion !((rv == TOR_COMPRESS_OK) && *in_len == in_len_orig && *out_len == out_len_orig) failed in tor_compress_process at src/common/compress.c:576. Stack trace: (on Tor 0.3.1.6-rc efc306c59aa9ee1a)
Sep 22 05:13:17.366 [warn] Bug: 0x11a63fc <tor_bug_occurred_+268> at /usr/local/bin/tor (on Tor 0.3.1.6-rc efc306c59aa9ee1a)
Sep 22 05:13:17.366 [warn] Bug: 0x11abbd4 <tor_compress_process+228> at /usr/local/bin/tor (on Tor 0.3.1.6-rc efc306c59aa9ee1a)
Sep 22 05:13:17.366 [warn] Bug: 0x11ab0fa <tor_compress+554> at /usr/local/bin/tor (on Tor 0.3.1.6-rc efc306c59aa9ee1a)
Sep 22 05:13:17.366 [warn] Bug: 0x11ab540 <tor_uncompress+64> at /usr/local/bin/tor (on Tor 0.3.1.6-rc efc306c59aa9ee1a)
Sep 22 05:13:17.366 [warn] Bug: 0x114ecfd <connection_dir_reached_eof+1773> at /usr/local/bin/tor (on Tor 0.3.1.6-rc efc306c59aa9ee1a)
Sep 22 05:13:17.366 [warn] Bug: 0x1126c43 <connection_handle_read+3027> at /usr/local/bin/tor (on Tor 0.3.1.6-rc efc306c59aa9ee1a)
Sep 22 05:13:17.366 [warn] Bug: 0x106db94 <connection_add_impl+532> at /usr/local/bin/tor (on Tor 0.3.1.6-rc efc306c59aa9ee1a)
Sep 22 05:13:17.366 [warn] Bug: 0x1c1e8564e <event_base_assert_ok_nolock_+3102> at /usr/local/lib/libevent-2.1.so.6 (on Tor 0.3.1.6-rc efc306c59aa9ee1a)
Sep 22 05:13:17.366 [warn] Bug: 0x1c1e80a1e <event_base_loop+1310> at /usr/local/lib/libevent-2.1.so.6 (on Tor 0.3.1.6-rc efc306c59aa9ee1a)
Sep 22 05:13:17.366 [warn] Bug: 0x106fa55 <do_main_loop+1413> at /usr/local/bin/tor (on Tor 0.3.1.6-rc efc306c59aa9ee1a)
Sep 22 05:13:17.366 [warn] Bug: 0x1071e89 <tor_main+233> at /usr/local/bin/tor (on Tor 0.3.1.6-rc efc306c59aa9ee1a)
Sep 22 05:13:17.366 [warn] Bug: 0x106d899 <main+25> at /usr/local/bin/tor (on Tor 0.3.1.6-rc efc306c59aa9ee1a)
Sep 22 05:13:17.366 [warn] Bug: 0x106d791 <_start+417> at /usr/local/bin/tor (on Tor 0.3.1.6-rc efc306c59aa9ee1a)
My outdated system is running FreeBSD 10.3-STABLE. I've never seen these messages before that I can remember. What causes them? What, if anything, should I do about them? And what fallout do they cause for users? Thanks in advance for any relevant information!
Scott Bennett, Comm. ASMELG, CFIAG ********************************************************************** * Internet: bennett at sdf.org *xor* bennett at freeshell.org * *--------------------------------------------------------------------* * "A well regulated and disciplined militia, is at all times a good * * objection to the introduction of that bane of all free governments * * -- a standing army." * * -- Gov. John Hancock, New York Journal, 28 January 1790 * **********************************************************************
nusenu nusenu-lists@riseup.net wrote:
Thanks for the pointer. I'm glad it has been reported, but I still have no sense of what in tor is malfunctioning because the compression has failed. Are user cells lost? Do user connections get broken when this happens? Has the error been reported to the project(s) that develop the compression libraries?
On 22 Sep 2017, at 21:24, Scott Bennett bennett@sdf.org wrote:
nusenu nusenu-lists@riseup.net wrote:
Thanks for the pointer. I'm glad it has been reported, but I still have no sense of what in tor is malfunctioning because the compression has failed.
It's ok, we're also somewhat clueless. But we're good at tracking down weird bugs. Just give us some time.
Are user cells lost? Do user connections get broken when this happens?
This error occurs during directory document decompression using zstd.
There's a loop that calls out to a function that decompresses some data. The loop is meant to terminate when decompression finishes. But if decompression doesn't make any progress, it terminates with an error. (Otherwise, we'd just keep spinning endlessly, wondering when the decompressor was going to cough up something useful.)
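The loop described above can be sketched roughly as follows. This is a minimal Python illustration, not Tor's actual C code: `Status`, `run_until_done`, `copy_step`, and `stalled_step` are hypothetical names made up for the example. The no-progress check mirrors the condition in the logged assertion (status OK, no input consumed, no output produced).

```python
from enum import Enum, auto


class Status(Enum):
    OK = auto()     # the step succeeded, but the stream is not finished
    DONE = auto()   # the stream is finished
    ERROR = auto()  # the decompressor reported a failure


def run_until_done(step, data, out_capacity=64):
    """Call step() until it reports DONE; fail if a call makes no progress.

    step(chunk, capacity) is a hypothetical callable that returns
    (status, bytes_consumed, bytes_produced).
    """
    output = bytearray()
    pos = 0
    while True:
        status, consumed, produced = step(data[pos:], out_capacity)
        if status is Status.ERROR:
            raise ValueError("decompressor reported an error")
        # "OK" with nothing consumed and nothing produced means we are stuck;
        # without this check the loop would spin forever.
        if status is Status.OK and consumed == 0 and not produced:
            raise RuntimeError("decompressor made no progress")
        pos += consumed
        output += produced
        if status is Status.DONE:
            return bytes(output)


def copy_step(chunk, capacity):
    """Stand-in 'decompressor' that copies at most `capacity` bytes per call."""
    piece = chunk[:capacity]
    status = Status.DONE if len(piece) == len(chunk) else Status.OK
    return status, len(piece), piece


def stalled_step(chunk, capacity):
    """Stand-in that claims success but never consumes or produces anything."""
    return Status.OK, 0, b""
```

With these stand-ins, `run_until_done(copy_step, data)` returns the data unchanged, while `run_until_done(stalled_step, data)` raises instead of looping forever.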
The relay discards the received document, and tries another mirror. At the moment, most mirrors don't support zstd, so the relay will likely get a zlib-compressed document, decompress it successfully, and continue merrily on its way. And the users don't even notice.
Although, I wonder if we should do more to guard against network-wide breakage like this. It would be awkward if all relays ran into an error like this at the same time.
Fortunately, zstd isn't available for all platforms and architectures. So as long as we steer clear of a platform monoculture, we'll be fine. (That said, this particular bug is on BSD. But people should still run relays on BSD, so we get a good OS mix.)
Has the error been reported to the project(s) that develop the compression libraries?
Well, it's kinda rude to go straight to blaming zstd. First we have to work out what the issue is, and whose code it's in.
Although, I have to say that it wouldn't surprise me if we have a zstd version incompatibility here. Or maybe truncated input data?
Truncated input data could definitely explain why zstd can't make any progress, but isn't returning an error code. In that case, it's our fault for not giving zstd enough data to work with. (And not failing the loop quietly?)
Because if it were corrupt data, it really should return an error: http://facebook.github.io/zstd/zstd_manual.html#Chapter9
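To illustrate the difference between truncated and corrupt input, here is a small Python sketch. It uses the standard-library `zlib` module as a stand-in, on the assumption that a streaming decompressor behaves in a broadly similar way; it says nothing definite about what libzstd does in this case. The helper `try_decompress` and the sample payload are made up for the example.

```python
import zlib

payload = b"directory document " * 200
compressed = zlib.compress(payload)


def try_decompress(blob):
    """Feed blob to a streaming decompressor and classify the outcome."""
    d = zlib.decompressobj()
    try:
        out = d.decompress(blob)
    except zlib.error:
        # Corrupt data: the decompressor reports an error.
        return "error", b""
    # d.eof only becomes True once the end of the stream has been seen.
    return ("complete" if d.eof else "incomplete"), out


full = try_decompress(compressed)
# Truncated stream: no error, just an unfinished stream.
truncated = try_decompress(compressed[: len(compressed) // 2])
# Damaged header: the decompressor rejects it outright.
corrupt = try_decompress(b"\xff\xff" + compressed[2:])
```

In this stand-in, the truncated stream comes back as "incomplete" with no error raised, whereas the damaged header is rejected with an error. A caller that only checks for errors would not notice the truncated case.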
-- Tim Wilson-Brown (teor)
teor2345 at gmail dot com
PGP C855 6CED 5D90 A0C5 29F6 4D43 450C BA7F 968F 094B
ricochet:ekmygaiu4rzgsk6n
xmpp: teor at torproject dot org
teor teor2345@gmail.com wrote:
On 22 Sep 2017, at 21:24, Scott Bennett bennett@sdf.org wrote:
nusenu nusenu-lists@riseup.net wrote:
Thanks for the pointer. I'm glad it has been reported, but I still have no sense of what in tor is malfunctioning because the compression has failed.
It's ok, we're also somewhat clueless. But we're good at tracking down weird bugs. Just give us some time.
Are user cells lost? Do user connections get broken when this happens?
This error occurs during directory document decompression using zstd.
There's a loop that calls out to a function that decompresses some data. The loop is meant to terminate when decompression finishes. But if decompression doesn't make any progress, it terminates with an error. (Otherwise, we'd just keep spinning endlessly, wondering when the decompressor was going to cough up something useful.)
The relay discards the received document, and tries another mirror. At the moment, most mirrors don't support zstd, so the relay will likely get a zlib-compressed document, decompress it successfully, and continue merrily on its way. And the users don't even notice.
Oh. Okay.
Although, I wonder if we should do more to guard against network-wide breakage like this. It would be awkward if all relays ran into an error like this at the same time.
Fortunately, zstd isn't available for all platforms and architectures. So as long as we steer clear of a platform monoculture, we'll be fine. (That said, this particular bug is on BSD. But people should still run relays on BSD, so we get a good OS mix.)
Are the BSDs the only systems that have zstd available?
Has the error been reported to the project(s) that develop the compression libraries?
Well, it's kinda rude to go straight to blaming zstd. First we have to work out what the issue is, and whose code it's in.
I guess I misunderstood the traceback. Apologies.
Although, I have to say that it wouldn't surprise me if we have a zstd version incompatibility here. Or maybe truncated input data?
Here's what I have.
Sep 23 00:36:33.957 [notice] Tor 0.3.1.6-rc (git-efc306c59aa9ee1a) running on FreeBSD with Libevent 2.1.8-stable, OpenSSL 1.0.2l, Zlib 1.2.8, Liblzma 5.2.2, and Libzstd 1.3.1.
Truncated input data could definitely explain why zstd can't make any progress, but isn't returning an error code. In that case, it's our fault for not giving zstd enough data to work with. (And not failing the loop quietly?)
Now, that's interesting because last night my Comcast connection was down and back up a few times beginning around 11:30 p.m. until it went down just before 1:30 a.m. and stayed down for about an hour and a half. During that time, of course, I called Comcast to get whatever information (as usual, not much) I could about the outage. They said their crew was out working on it and that it affected 327(?) customers in the area, so I surmised something was gradually failing (such nighttime down and back up sequences have been becoming more frequent recently) and after my complaints a couple of days earlier, they figured out what it was and were replacing it. The error messages started last night after they were done and the line was working again. If there's some leftover file that's dirty from having been cut off during transmission that continues to trigger the error messages from time to time, I'll get rid of it if someone can tell me what to look for.
Because if it were corrupt data, it really should return an error: http://facebook.github.io/zstd/zstd_manual.html#Chapter9
Scott Bennett:
Fortunately, zstd isn't available for all platforms and architectures. So as long as we steer clear of a platform monoculture, we'll be fine. (That said, this particular bug is on BSD. But people should still run relays on BSD, so we get a good OS mix.)
Are the BSDs the only systems that have zstd available?
No, Debian has libzstd as well and produces similar stack traces.
https://packages.debian.org/stretch/libzstd1
https://lists.torproject.org/pipermail/tor-relays/2017-September/013057.html
tor-relays@lists.torproject.org