Dear Tor developers,
[Please CC me in replies, I am not currently subscribed to tor-dev.]
Context: At the Institute of Networks and Security at Johannes Kepler University Linz, we have been hosting Austria's fastest exit node for the last ca. 9 months. It used to be listed as https://atlas.torproject.org/#details/01A9258A46E97FF8B2CAC7910577862C14F2C5... until very recently, and we tried to find out what went wrong when we saw traffic drop sharply a bit over a week ago. Unfortunately, two out of three people responsible for running this node were on holidays, so we could only start investigating today.
Setup: Please note that our setup is a bit particular for reasons that we will explain in more detail in a later message (including a proposed patch to the current source which has been pending also because of the holiday situation...). Briefly summarizing, we use a different network interface for "incoming" (Tor encrypted traffic) than for "outgoing" (mostly clearnet traffic from the exit node, but currently still includes outgoing Tor relay traffic to other nodes). The outgoing interface has the default route associated, while the incoming interface will only originate traffic in response to those incoming connections. Consequently, we let our Tor node only bind to the IP address assigned to the incoming interface 193.171.202.146, while it will initiate new outgoing connections with IP 193.171.202.150.
Problem: This worked nicely with Tor 0.2.5.12-1 on Debian Jessie. We upgraded about two weeks ago to 0.2.8.7-1 from the Tor apt repositories (mostly in response to https://blog.torproject.org/blog/tor-0287-released-important-fixes as a wakeup call that we were using old versions from Debian main). At first, it seemed to work well enough, but then the holidays came and we didn't actively watch it for the next week.... Now with 0.2.8.7-1, the traffic sent to our node started declining until it vanished completely. After a bit of debugging and rolling back to 0.2.5.12-1 (which is now active on our node as of a few hours ago, slowly approaching the 200MBit/s again), it seems that we discovered a regression concerning the handling of sockets. I can best summarize it with the relevant torrc config options and startup log lines from both versions:
root@tor2 ~ # grep 193.171.202 /etc/tor/torrc ORPort 193.171.202.146:9001 ORPort 193.171.202.146:443 OutboundBindAddress 193.171.202.150 DirPort 193.171.202.146:9030
Sep 19 11:37:41.000 [notice] Tor 0.2.8.7 (git-cc2f02ef17899f86) opening log file. Sep 19 11:37:41.194 [notice] Tor v0.2.8.7 (git-cc2f02ef17899f86) running on Linux with Libevent 2.0.21-stable, OpenSSL 1.0.1t and Zlib 1.2.8. Sep 19 11:37:41.194 [notice] Tor can't help you if you use it wrong! Learn how to be safe at https://www.torproject.org/download/download#warning Sep 19 11:37:41.194 [notice] Read configuration file "/usr/share/tor/tor-service-defaults-torrc". Sep 19 11:37:41.194 [notice] Read configuration file "/etc/tor/torrc". Sep 19 11:37:41.197 [warn] You specified a public address '0.0.0.0:9050' for SocksPort. Other people on the Internet might find your computer and use it as an open proxy. Please don't allow this unless you have a good reason. Sep 19 11:37:41.198 [notice] Based on detected system memory, MaxMemInQueues is set to 2961 MB. You can override this by setting MaxMemInQueues by hand. Sep 19 11:37:41.198 [warn] Tor is running as an exit relay. If you did not want this behavior, please set the ExitRelay option to 0. If you do want to run an exit Relay, please set the ExitRelay option to 1 to disable this warning, and for forward compatibility. Sep 19 11:37:41.198 [warn] You specified a public address '0.0.0.0:9050' for SocksPort. Other people on the Internet might find your computer and use it as an open proxy. Please don't allow this unless you have a good reason. Sep 19 11:37:41.199 [notice] Opening Socks listener on 0.0.0.0:9050 Sep 19 11:37:41.199 [notice] Opening Control listener on 127.0.0.1:9051 Sep 19 11:37:41.199 [notice] Opening OR listener on 193.171.202.146:9001 Sep 19 11:37:41.199 [notice] Opening OR listener on 193.171.202.146:443 Sep 19 11:37:41.199 [notice] Opening Directory listener on 193.171.202.146:9030 Sep 19 11:37:41.000 [notice] Parsing GEOIP IPv4 file /usr/share/tor/geoip. Sep 19 11:37:41.000 [notice] Parsing GEOIP IPv6 file /usr/share/tor/geoip6. Sep 19 11:37:41.000 [notice] Configured to measure statistics. Look for the *-stats files that will first be written to the data directory in 24 hours from now. Sep 19 11:37:41.000 [warn] I have no descriptor for the router named "ins1" in my declared family; I'll use the nickname as is, but this may confuse clients. Sep 19 11:37:41.000 [warn] I have no descriptor for the router named "ins2" in my declared family; I'll use the nickname as is, but this may confuse clients. Sep 19 11:37:41.000 [notice] Your Tor server's identity key fingerprint is 'ins0 01A9258A46E97FF8B2CAC7910577862C14F2C524' Sep 19 11:37:41.000 [notice] Bootstrapped 0%: Starting Sep 19 11:37:49.000 [notice] Bootstrapped 80%: Connecting to the Tor network Sep 19 11:37:49.000 [notice] Signaled readiness to systemd Sep 19 11:37:50.000 [notice] Opening Control listener on /var/run/tor/control Sep 19 11:37:51.000 [notice] Bootstrapped 85%: Finishing handshake with first hop Sep 19 11:37:51.000 [notice] Bootstrapped 90%: Establishing a Tor circuit Sep 19 11:37:51.000 [notice] Tor has successfully opened a circuit. Looks like client functionality is working. Sep 19 11:37:51.000 [notice] Bootstrapped 100%: Done Sep 19 11:37:51.000 [notice] Now checking whether ORPort 193.171.202.150:9001 and DirPort 193.171.202.150:9030 are reachable... (this may take up to 20 minutes -- look for log messages indicating success) Sep 19 11:38:30.000 [notice] Self-testing indicates your ORPort is reachable from the outside. Excellent. Sep 19 11:38:34.000 [warn] You specified a server "ins1" by name, but the directory authorities do not have any key registered for this nickname -- so it could be used by any server, not just the one you meant. To make sure you get the same server in the future, refer to it by key, as "$CD9FD887A4572D46938640BA65F258851F1E418B". Sep 19 11:38:34.000 [warn] You specified a server "ins2" by name, but the directory authorities do not have any key registered for this nickname -- so it could be used by any server, not just the one you meant. To make sure you get the same server in the future, refer to it by key, as "$7C3AF46F77445A0B1E903A5AF5B730A05F127BFC". Sep 19 11:40:18.000 [notice] Performing bandwidth self-test...done. Sep 19 11:57:50.000 [warn] Your server (193.171.202.150:9030) has not managed to confirm that its DirPort is reachable. Relays do not publish descriptors until their ORPort and DirPort are reachable. Please check your firewalls, ports, address, /etc/hosts file, etc. Sep 19 12:00:27.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now. Sep 19 12:00:57.000 [notice] Clean shutdown finished. Exiting. Sep 19 12:01:48.000 [notice] Tor 0.2.5.12 (git-3731dd5c3071dcba) opening log file. Sep 19 12:01:48.000 [notice] Configured to measure directory request statistics, but no GeoIP database found. Please specify a GeoIP database using the GeoIPFile option. Sep 19 12:01:48.000 [notice] Caching new entry debian-tor for debian-tor Sep 19 12:01:48.000 [notice] Caching new entry debian-tor for debian-tor Sep 19 12:01:48.000 [warn] I have no descriptor for the router named "ins1" in my declared family; I'll use the nickname as is, but this may confuse clients. Sep 19 12:01:48.000 [warn] I have no descriptor for the router named "ins2" in my declared family; I'll use the nickname as is, but this may confuse clients. Sep 19 12:01:48.000 [notice] Your Tor server's identity key fingerprint is 'ins0 01A9258A46E97FF8B2CAC7910577862C14F2C524' Sep 19 12:01:48.000 [notice] Bootstrapped 0%: Starting Sep 19 12:01:48.000 [notice] Bootstrapped 5%: Connecting to directory server Sep 19 12:01:51.000 [notice] We now have enough directory information to build circuits. Sep 19 12:01:51.000 [notice] Bootstrapped 80%: Connecting to the Tor network Sep 19 12:01:51.000 [notice] Bootstrapped 85%: Finishing handshake with first hop Sep 19 12:01:52.000 [notice] Bootstrapped 90%: Establishing a Tor circuit Sep 19 12:01:52.000 [notice] Tor has successfully opened a circuit. Looks like client functionality is working. Sep 19 12:01:52.000 [notice] Bootstrapped 100%: Done Sep 19 12:01:52.000 [notice] Now checking whether ORPort 193.171.202.146:9001 and DirPort 193.171.202.146:9030 are reachable... (this may take up to 20 minutes -- look for log messages indicating success) Sep 19 12:01:53.000 [notice] Self-testing indicates your DirPort is reachable from the outside. Excellent. Sep 19 12:01:53.000 [notice] Self-testing indicates your ORPort is reachable from the outside. Excellent. Publishing server descriptor. Sep 19 12:01:55.000 [notice] Performing bandwidth self-test...done.
Please note the difference (0.2.8.7): Sep 19 11:37:51.000 [notice] Now checking whether ORPort 193.171.202.150:9001 and DirPort 193.171.202.150:9030 are reachable... (this may take up to 20 minutes -- look for log messages indicating success) vs. (0.2.5.12): Sep 19 12:01:52.000 [notice] Now checking whether ORPort 193.171.202.146:9001 and DirPort 193.171.202.146:9030 are reachable... (this may take up to 20 minutes -- look for log messages indicating success)
I.e. 0.2.8.7 does not seem to honor the address the socket is bound to when starting the reachability checks from the outside (it seems to use the address that either the default route is associated with or the OutboundBindAddress) - although the socket binding itself is done correctly (i.e. the netstat output is exactly the same for both versions, with tor binding to the specific IP address only for the Dir and both OR ports). Consequently, the node is declared as non-reachable and drops off the globe/atlas...
Has this change been intentional? I have to admit we have not yet checked the source code for further debugging, as we wanted to get the node back up as quickly as possible (after our unfortunately timed holidays, sorry for that).
with best regards, Rene