A couple of months ago I hard-configured two entry nodes that I trust for my own browsing. Both are very fast, very reliable nodes on the opposite side of the Atlantic.
For about 20 minutes this evening the 'tor' daemon appears to have lost connectivity to both nodes and I could not browse via Tor. Quite a few instances of this message were logged:
Jan 30 22:59:06 Tor[]: Failed to find node for hop 0 of our path. Discarding this circuit.
Neither of the configured guards was restarted during the outage, and they are located in diverse networks. The BGP looking-glass routes are different except for the first hop, which is a massive AS aggregation, presumably for a trans-Atlantic cable operated by AS1299 (Telianet). I presume that this "cable" is actually a fully redundant, load-balanced set of cables and that any outage would be measured in milliseconds, or at most two or three seconds.
Looking now, I see roughly 200 occurrences of the above message on Jan 14, 26 and 30.
The reason I hard-coded the guard nodes was that, even after extending the guard lifetime to 18 months, the guards were still changing fairly often. This bothered me.
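For reference, the relevant torrc lines look roughly like this (the fingerprints shown are placeholders, not my actual guards):

# torrc excerpt -- placeholder fingerprints
EntryNodes $0123456789ABCDEF0123456789ABCDEF01234567,$76543210FEDCBA9876543210FEDCBA9876543210
GuardLifetime 18 months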
So the slightly paranoid side of my personality wonders whether this is the result of a determined effort by one of the various spooky organizations out there to purposefully disrupt Tor connections and force guard reassignment, the idea naturally being that Tor users might then end up selecting guard nodes controlled by those same spooky organizations.
Thoughts, comments and sharing of similar experiences are all welcome.
I should add that three ping-plotters pointed at various diverse networks ran clean during the outage, and that trusty Google was accessible at all times via normal HTTP. So the local carrier appears to have had no disruptions.
I didn't think to ping the guards or other proximate systems at the time, but certainly will do so if this happens again. Will run
traceroute -A
as well.
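Roughly along these lines, with placeholder addresses standing in for the real guard IPs:

# sketch of what I'd run during a future outage (placeholder guard IPs)
for ip in 192.0.2.10 198.51.100.20; do
    date -u
    ping -c 5 "$ip"
    traceroute -A "$ip"
done >> guard-check.log 2>&1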
More important detail I should include:
After spending a few minutes trying to figure out what was wrong, I restarted the 'tor' relay daemon. It came up like this:
Bootstrapped 5%: Connecting to directory server.
We now have enough directory information to build circuits.
Bootstrapped 80%: Connecting to the Tor network.
Bootstrapped 85%: Finishing handshake with first hop.
2 entries in guards
Failed to find node for hop 0 of our path. Discarding this circuit.
2 entries in guards
Failed to find node for hop 0 of our path. Discarding this circuit.
Self-testing indicates your ORPort is reachable from the outside. Excellent. Publishing server descriptor.
Failed to find node for hop 0 of our path. Discarding this circuit.
Failed to find node for hop 0 of our path. Discarding this circuit.
. . .
This says to me that much or all of the Tor network was contactable at the time of the restart, _but_ the two entry nodes were not reachable.
My next action was to stop the relay, comment out the 'EntryNodes' line, and start it again. It came all the way up immediately. Interestingly, the two favorite entry guards (still present in the 'state' file) were now accessible again and were retained as the chosen guards. I put back the hard-coded EntryNodes config and restarted, and again it came up fine.
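For what it's worth, the toggle amounted to roughly this, assuming a systemd-managed tor and a torrc at /etc/tor/torrc (adjust for your init system and paths):

# comment out the hard-coded entry guards and restart
sudo sed -i 's/^EntryNodes/#EntryNodes/' /etc/tor/torrc
sudo systemctl restart tor
# after confirming it bootstraps fully, put the line back and restart again
sudo sed -i 's/^#EntryNodes/EntryNodes/' /etc/tor/torrc
sudo systemctl restart tor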
The restart where the entry guards were not contactable is what really made me wonder what has happened here.
I did notice in the 'state' file a line indicating each guard had been "down since xxxx". I take this as simply informational; it should not have prevented the relay from attempting to reach them when it was restarted.
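For anyone wanting to look at the same thing, the guard entries live in the 'state' file under tor's DataDirectory; something like this shows them (path assumes a Debian-style layout):

grep -i guard /var/lib/tor/state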
You obviously can't disclose your nodes for others to test, or be telling us what you're going to do over clearnet and when.
Yet during the time of an outage, you might try to leave the old tor running and:
- copy the old tor's .tor (DataDirectory) and torrc over to a new tor instance, start it, and see if it works over the guards. If it does, that points to a state jam inside the old tor and a good clearnet path.
- run a new tor with no .tor and a default torrc, and try to telnet to the old guards' OR ports over that; if that works, they seem to be up.
- put a TransPort in that new default torrc, put a packet filter in front of the old tor, HUP the new one, and see if the old tor works over this tor... if it does, that points to your clearnet path to the guards being specifically bad for tor.
- turn on the old tor's debug log and see where it is repeatedly stalling.
A script could determine this in a couple minutes.
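A minimal sketch of the OR-port check, assuming OpenBSD-style netcat and a second, default-config tor with its SocksPort on 127.0.0.1:9050; the guard address:port pairs are placeholders:

#!/bin/sh
# placeholder guard address:ORPort pairs -- substitute your real guards
GUARDS="192.0.2.10:9001 198.51.100.20:443"
for g in $GUARDS; do
    ip=${g%:*}; port=${g#*:}
    # direct clearnet TCP reachability of the guard's OR port
    printf '%s direct  %s ' "$(date -u +%FT%TZ)" "$g"
    nc -z -w 5 "$ip" "$port" && echo ok || echo FAIL
    # same check through the second tor instance's SOCKS port
    printf '%s via-tor %s ' "$(date -u +%FT%TZ)" "$g"
    nc -z -w 5 -X 5 -x 127.0.0.1:9050 "$ip" "$port" && echo ok || echo FAIL
done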
Last bit of info. . .
I realized that the two other events mentioned earlier, where the hard-configured guards were unreachable, were the result of local Internet connectivity outages; that had slipped my mind when writing that message.
So the 30 minutes (almost exactly) of guard unreachability was a unique event following the hard-coding of the guard nodes. As mentioned, I had seen the guards change out entirely before adding the EntryNodes configuration, despite having "GuardLifetime 18 months" set.
Finally I checked the consensus votes for the three hours surrounding the event and both guards were in the consensus at all times and without interruption.