On Tue, 11 Sep 2012 16:24:36 -0400 Nick Mathewson <nickm@freehaven.net> wrote:
On Tue, Sep 11, 2012 at 1:12 PM, Jacob Appelbaum <jacob@appelbaum.net> wrote:
Hi Scott,
It is nice to see you posting again; I had wondered where you had gone.
Scott Bennett:
I know this really belongs on tor-talk, but I haven't been subscribed
to it for a long time now. Sorry if posting this here bothers anyone.
Seems like a fine place to discuss relay problems, which is what it sounds like, no?
Maybe! The very best place would be the bugtracker, of course. (I do seem to recall that you have some issues with trac -- I'm just mentioning the bugtracker so that other people don't get the idea that the mailing lists are the best place for bug reports. But a bug report on the mailing list is much, much better than no bug report at all.)
You switched trackers a year or two ago. I don't recall whether I've tried the new one. Either way, I hesitate to submit a bug report until I'm pretty sure I'm looking at a bug, which is why I asked on this list whether anyone else could suggest anything.
Back in early July, I upgraded from 0.2.3.13-alpha to 0.2.3.18-rc.
I immediately ran into problems with a python script that honors the http_proxy environment variable, which I normally have set to the localhost port for privoxy, which, in turn, connects to tor's SOCKS port. I couldn't really see what was going wrong, but using arm to ask for a new identity seemed to help sometimes to get a circuit that worked. Sending tor a SIGHUP instead also seemed to work about as often.
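For reference, the mechanism described above can be sketched in Python: urllib honors the http_proxy environment variable automatically, so a script like the one in question routes every HTTP request through privoxy without any proxy-specific code. The localhost:8118 address below is privoxy's default listen port, used here as an assumption about the setup.

```python
import os
import urllib.request

# Hypothetical setup: privoxy listening on localhost:8118, chaining to
# tor's SOCKS port. A script only needs http_proxy set in its
# environment; urllib picks it up via getproxies().
os.environ["http_proxy"] = "http://127.0.0.1:8118"

proxies = urllib.request.getproxies()
print(proxies["http"])  # the proxy urllib will route http requests through

# An opener built this way sends every request via that proxy:
opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
```

When the environment variable is unset, getproxies() falls back to platform proxy settings, which may explain behavior differences between shells.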
If you use 0.2.2.x - what happens?
I'm not sure what the bug described here is, fwiw. What is the behavior for the circuits that don't work, and to what extent is 0.2.2.x better?
The problem is that the python script in question often has no trouble, but also often does. When it does, it usually gets a good connection twice and then fails on the third. (Or at least I *think* that's what it does.) Then I either ask tor nicely with arm for a new set of circuits or pound on tor with a SIGHUP to the same effect, and try the python script again. The process is slow, so it's very irritating to have to babysit it repeatedly until it finally gets a good connection. If I've left the script a small list of work to do, the failure may come at any point in the list, which then terminates the script. :-( I think it may have an option to ignore errors, but I don't see that option as being very helpful in this situation, because it would just fail its way through the rest of the list from that point, and that would take a long time. Quitting outright is much faster.
A bit over a week ago, I switched to 0.2.3.20-rc, and the problem
still occurs. However, 0.2.3.20-rc now also emits a new message from time to time, the most recent occurrence of which is
Sep 06 06:02:45.934 [notice] Low circuit success rate 7/21 for guard TORy0=753E0B5922E34BF98F0D21CC08EA7D1ADEEE2F6B.
That is an interesting message - I wonder if the author of that message might chime in?
Looks like bug #6475.
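For intuition, the shape of the check behind that notice can be sketched like this (this is not tor's actual implementation, and the cutoff values are illustrative assumptions): once enough circuit attempts have gone through a guard, flag it when the success fraction falls below a threshold.

```python
# Illustrative sketch of a low-circuit-success-rate check; the
# MIN_ATTEMPTS and THRESHOLD values are assumptions, not tor's.
MIN_ATTEMPTS = 20
THRESHOLD = 0.5

def low_circuit_success(successes, attempts,
                        min_attempts=MIN_ATTEMPTS, threshold=THRESHOLD):
    if attempts < min_attempts:
        return False  # too few samples to judge the guard yet
    return successes / attempts < threshold

print(low_circuit_success(7, 21))  # the 7/21 case from the notice -> True
```

With 7 successes in 21 attempts the rate is about 33%, well under any plausible threshold, which is why the guard gets flagged.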
Wondering whether such circuit-building failures might be related to the other problem, I began a little experiment: each time I saw a "Low circuit success rate" message, I added the key fingerprint of the node in question to my ExcludeNodes list in torrc and sent tor a SIGHUP. The problem is still occurring, though, and when I look at the circuits involved, they all seem to have at least one of the excluded nodes in them, usually in the entry position. So my question is, what changed between 0.2.3.13-alpha and 0.2.3.18-rc (or possibly 0.2.3.20-rc) in the handling of nodes listed in the ExcludeNodes line in torrc? And is there anything I can do to get the ExcludeNodes list to work again the way it used to work? Thanks in advance for any relevant information.
It seems that there are two issues: one is that a guard is failing to build circuits; the other is that you can't seem to exclude it. I have to admit, I'm more interested in the former... Is there a pattern to the failures? That is, for the 7 successes through that node, did you see anything interesting? Were, say, the nodes that worked somehow in the same country as that guard? Or were the failed circuits all seemingly unrelated to the guard?
As far as the ExcludeNodes - did you set StrictNodes at the same time? Are you also a relay?
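The StrictNodes question matters because, if I understand the torrc semantics correctly, without it tor treats ExcludeNodes as a preference that it may override when it believes it has no other choice (e.g. for directory fetches). A fragment pinning down the intended exclusion might look like this, using the fingerprint from the notice above:

```
# Exclude the flagged guard by fingerprint and make the exclusion
# strict. Without StrictNodes 1, tor may still use an excluded relay
# when it thinks it must.
ExcludeNodes 753E0B5922E34BF98F0D21CC08EA7D1ADEEE2F6B
StrictNodes 1
```

Note that StrictNodes 1 can cause hard failures where tor would otherwise have fallen back to an excluded relay, so it changes behavior beyond the exclusion itself.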
Any other configuration info would be helpful here too.
Okay. Skipping over local port and IP address usage, here's some client-side stuff.
TunnelDirConns 1
PreferTunneledDirConns 1
UseEntryGuards 1
NumEntryGuards [if you really need the number here, let me know--SB]
AllowDotExit 0
LongLivedPorts 20-23,47,115,119,143,144,152,178,194,563,706,989,990,992-994,1863,5050,5190-5193,5222,5223,6000-6063,6523,6667,6697,8021,8300
There are also several NodeFamily statements and extensive ExcludeNodes and ExcludeExitNodes statements. If there is other stuff you want, let me know, but depending upon what you ask for, I may choose to send it to you directly rather than post it here.
(To answer your question: looking through the changelogs, and the commit logs for src/or/circuitbuild.c and src/or/routerlist.c, I can't find anything that stands out to me as something that might cause an ExcludeNodes regression. So more investigation will be needed!)
The only thing remotely related that I saw was the following item in the Changelog for 0.2.3.15-rc.
o Minor bugfixes (on 0.2.2.x and earlier):
  . . .
  - After we pick a directory mirror, we would refuse to use it if
    it's in our ExcludeExitNodes list, resulting in mysterious
    failures to bootstrap for people who just wanted to avoid exiting
    from certain locations. Fixes bug 5623; bugfix on 0.2.2.25-alpha.
It probably has nothing to do with what has been causing me trouble, but it was the only item I found about either of the Exclude{,Exit}Nodes statements that mentioned a change between 0.2.3.13-alpha and 0.2.3.18-rc.
Scott