I'm working on an exitmap module that wants to feed order of 5000 short-lived streams through each exit relay. I think this is running foul of some sort of upper limit (in STEM, or in Tor itself, not sure) on the number of streams a circuit can be used for, or how long, or something. What I see in the logs (note: I've modified eventhandler.py for more detailed debug logs) looks like
2016-04-22 16:07:53,306 [DEBUG]: Circuit status change: CIRC 6 LAUNCHED PURPOSE=GENERAL TIME_CREATED=2016-04-22T20:07:53.305851 2016-04-22 16:07:53,325 [DEBUG]: Circuit status change: CIRC 6 EXTENDED [fp] PURPOSE=GENERAL TIME_CREATED=2016-04-22T20:07:53.305851 2016-04-22 16:07:54,114 [DEBUG]: Circuit status change: CIRC 6 EXTENDED [fp],[fp] PURPOSE=GENERAL TIME_CREATED=2016-04-22T20:07:53.305851 2016-04-22 16:07:54,115 [DEBUG]: Circuit status change: CIRC 6 BUILT [fp],[fp] PURPOSE=GENERAL TIME_CREATED=2016-04-22T20:07:53.305851 2016-04-22 16:07:54,136 [DEBUG]: Port 47488 preparing to attach to circuit 6 2016-04-22 16:07:54,136 [DEBUG]: Port 47488 circuit 6 waiting for stream. 2016-04-22 16:07:54,155 [DEBUG]: Attempting to attach stream 65 to circuit 6. 2016-04-22 16:07:54,387 [DEBUG]: Port 47492 preparing to attach to circuit 6 2016-04-22 16:07:54,387 [DEBUG]: Port 47492 circuit 6 waiting for stream. 2016-04-22 16:07:54,388 [DEBUG]: Attempting to attach stream 67 to circuit 6. 2016-04-22 16:07:54,809 [DEBUG]: Port 47496 preparing to attach to circuit 6 2016-04-22 16:07:54,810 [DEBUG]: Port 47496 circuit 6 waiting for stream. 2016-04-22 16:07:54,810 [DEBUG]: Attempting to attach stream 69 to circuit 6. 2016-04-22 16:07:55,060 [DEBUG]: Port 47502 preparing to attach to circuit 6 2016-04-22 16:07:55,060 [DEBUG]: Port 47502 circuit 6 waiting for stream. 2016-04-22 16:07:55,061 [DEBUG]: Attempting to attach stream 72 to circuit 6. 2016-04-22 16:07:55,468 [DEBUG]: Port 47506 preparing to attach to circuit 6 2016-04-22 16:07:55,468 [DEBUG]: Port 47506 circuit 6 waiting for stream. 2016-04-22 16:07:55,469 [DEBUG]: Attempting to attach stream 74 to circuit 6. 2016-04-22 16:07:55,720 [DEBUG]: Port 47508 preparing to attach to circuit 6 2016-04-22 16:07:55,720 [DEBUG]: Port 47508 circuit 6 waiting for stream. 2016-04-22 16:07:55,990 [DEBUG]: Port 47512 preparing to attach to circuit 6 2016-04-22 16:07:55,990 [DEBUG]: Port 47512 circuit 6 waiting for stream. 2016-04-22 16:07:55,990 [DEBUG]: Attempting to attach stream 77 to circuit 6. 2016-04-22 16:07:56,241 [DEBUG]: Port 47518 preparing to attach to circuit 6 2016-04-22 16:07:56,241 [DEBUG]: Port 47518 circuit 6 waiting for stream. 2016-04-22 16:07:56,242 [DEBUG]: Attempting to attach stream 80 to circuit 6. 2016-04-22 16:07:56,492 [DEBUG]: Port 47528 preparing to attach to circuit 6 2016-04-22 16:07:56,492 [DEBUG]: Port 47528 circuit 6 waiting for stream. 2016-04-22 16:07:56,495 [DEBUG]: Attempting to attach stream 85 to circuit 6. 2016-04-22 16:07:56,836 [DEBUG]: Port 47536 preparing to attach to circuit 6 2016-04-22 16:07:56,836 [DEBUG]: Port 47536 circuit 6 waiting for stream. 2016-04-22 16:07:56,836 [DEBUG]: Attempting to attach stream 89 to circuit 6. 2016-04-22 16:07:57,100 [DEBUG]: Port 47540 preparing to attach to circuit 6 2016-04-22 16:07:57,100 [DEBUG]: Attempting to attach stream 91 to circuit 6. 2016-04-22 16:07:57,351 [DEBUG]: Port 47544 preparing to attach to circuit 6 2016-04-22 16:07:57,351 [DEBUG]: Port 47544 circuit 6 waiting for stream. 2016-04-22 16:07:57,352 [DEBUG]: Attempting to attach stream 93 to circuit 6. 2016-04-22 16:07:57,769 [DEBUG]: Port 47550 preparing to attach to circuit 6 2016-04-22 16:07:57,769 [DEBUG]: Port 47550 circuit 6 waiting for stream. 2016-04-22 16:07:57,769 [DEBUG]: Attempting to attach stream 96 to circuit 6. 2016-04-22 16:07:58,118 [DEBUG]: Port 47554 preparing to attach to circuit 6 2016-04-22 16:07:58,118 [DEBUG]: Port 47554 circuit 6 waiting for stream. 2016-04-22 16:07:58,118 [DEBUG]: Attempting to attach stream 98 to circuit 6. 2016-04-22 16:08:04,697 [DEBUG]: Circuit status change: CIRC 6 CLOSED [fp],[fp] PURPOSE=PATH_BIAS_TESTING TIME_CREATED=2016-04-22T20:07:53.305851 REASON=FINISHED 2016-04-22 16:08:05,878 [DEBUG]: Port 47690 preparing to attach to circuit 6 2016-04-22 16:08:05,878 [DEBUG]: Port 47690 circuit 6 waiting for stream. 2016-04-22 16:08:05,879 [DEBUG]: Attempting to attach stream 166 to circuit 6. 2016-04-22 16:08:05,879 [WARNING]: Failed to attach stream because: Unknown circuit "6"
You can see that circuit 6 is no longer available, but the module is still trying to use it. It looks like it lasted for almost exactly ten seconds, which smells like a time limit, but I can't find any relevant configuration parameters in the documentation.
Rather than trying to make a circuit survive as long as necessary, which might not even be possible, it'd probably be better if exitmap could notice that it's lost a circuit that's still in use and create a new one. However, I'm not sure how to do that in the current architecture. The existing code has "one circuit per module invocation = one circuit per exit node" as a pretty deeply embedded design constraint.
Anyone have any ideas? Full logfile available on request, but it's huge.
zw
On Fri, Apr 22, 2016 at 04:22:48PM -0400, Zack Weinberg wrote:
I'm working on an exitmap module that wants to feed order of 5000 short-lived streams through each exit relay. I think this is running foul of some sort of upper limit (in STEM, or in Tor itself, not sure) on the number of streams a circuit can be used for, or how long, or something.
Tor has a built-in "if a stream has been waiting for a connected cell for 10 seconds, something has gone wrong" feature, where it disconnects from that circuit, and marks the circuit as not worth using for future streams. (Assuming the stream hasn't timed out, Tor would then say "oh hey, I need to find a circuit for this stream", and start the process over again.)
It looks from your logs like you're attaching a bunch of streams to a circuit, and one or more of the begin cells (heck, maybe even all of them) aren't getting any answer in the first 10 seconds.
2016-04-22 16:07:54,115 [DEBUG]: Circuit status change: CIRC 6 BUILT [fp],[fp] PURPOSE=GENERAL TIME_CREATED=2016-04-22T20:07:53.305851
[...]
2016-04-22 16:07:58,118 [DEBUG]: Attempting to attach stream 98 to circuit 6. 2016-04-22 16:08:04,697 [DEBUG]: Circuit status change: CIRC 6 CLOSED [fp],[fp] PURPOSE=PATH_BIAS_TESTING TIME_CREATED=2016-04-22T20:07:53.305851 REASON=FINISHED
This is interesting. If the later streams were still attached to the circuit, the circuit shouldn't be closing here. Are you sure the streams actually get attached (and stay attached)?
It looks like you're only giving me exitmap logs, not Tor logs, so it's hard to tell for sure what's going on underneath.
For why it says PURPOSE=PATH_BIAS_TESTING there, check out the pathbias_send_usable_probe() function which calls circuit_change_purpose(). If this function is happening, your Tor log should see
log_info(LD_CIRC, "Sending pathbias testing cell to %s:25 on stream %d for circ %d.", probe_nonce, ocirc->pathbias_probe_id, ocirc->global_identifier);
(The pathbias stuff has never been tested well I believe, so it's totally possible there are easy bugs in it. For example, maybe we're queueing that pathbias testing cell for sending, and at the same time we're marking the circuit as worth closing now because it's empty.)
You can see that circuit 6 is no longer available, but the module is still trying to use it.
That does indeed sound like an exitmap bug.
--Roger
On 05/08/2016 03:49 AM, Roger Dingledine wrote:
On Fri, Apr 22, 2016 at 04:22:48PM -0400, Zack Weinberg wrote:
I'm working on an exitmap module that wants to feed order of 5000 short-lived streams through each exit relay. I think this is running foul of some sort of upper limit (in STEM, or in Tor itself, not sure) on the number of streams a circuit can be used for, or how long, or something.
Tor has a built-in "if a stream has been waiting for a connected cell for 10 seconds, something has gone wrong" feature, where it disconnects from that circuit, and marks the circuit as not worth using for future streams. (Assuming the stream hasn't timed out, Tor would then say "oh hey, I need to find a circuit for this stream", and start the process over again.)
It looks from your logs like you're attaching a bunch of streams to a circuit, and one or more of the begin cells (heck, maybe even all of them) aren't getting any answer in the first 10 seconds.
Yeah, that's expected; the policy you describe is very, very wrong for this experiment. Is there any way to turn it off?
What's going on is that I have ~500 hosts in known locations worldwide, and I'm measuring the round-trip latency to each. Unfortunately, I can't count on all of them to be up and responding to SYNs at any given time. The module asks each exit to make 10 connections to each host, and it deliberately targets a *closed* port whenever possible, so I'm expecting to get back mostly RELAY_END/REASON_CONNECTREFUSED replies, and timeouts.
The module internally times out each *stream* after ten seconds, but it wants to use the same circuit forever - this is both because of design limitations in exitmap (see below), and because changing circuits in the middle of this test will spoil at least one measurement with circuit-creation overhead. Changing the route to the exit is also bad, as it will change the intra-Tor latency, which ideally would be the same for all measurements. (This last is mitigated by exitmap using two-hop circuits and being able to pin the entry node, but one would like to be certain that it won't happen.)
Frankly, assuming that something is wrong with a *circuit* when one of its *streams* suffers a connection timeout seems inappropriate in general. That can happen for any number of reasons beyond Tor's control, not least that, simply, the exit node's SYN to the requested destination got dropped by a firewall. What was the original rationale for this policy?
2016-04-22 16:07:54,115 [DEBUG]: Circuit status change: CIRC 6 BUILT [fp],[fp] PURPOSE=GENERAL TIME_CREATED=2016-04-22T20:07:53.305851
[...]
2016-04-22 16:07:58,118 [DEBUG]: Attempting to attach stream 98 to circuit 6. 2016-04-22 16:08:04,697 [DEBUG]: Circuit status change: CIRC 6 CLOSED [fp],[fp] PURPOSE=PATH_BIAS_TESTING TIME_CREATED=2016-04-22T20:07:53.305851 REASON=FINISHED
This is interesting. If the later streams were still attached to the circuit, the circuit shouldn't be closing here. Are you sure the streams actually get attached (and stay attached)?
It looks like you're only giving me exitmap logs, not Tor logs, so it's hard to tell for sure what's going on underneath.
Yeah, I'll try to get some Tor logs (that'll mean more changes to exitmap core, yay). I may not get to it till June, though. The later streams might be a red herring; the module closes each stream immediately after the connection resolves and that may not be visible in these logs.
I am not sure why the circuit is labeled PATH_BIAS_TESTING in the first place; it's possible that exitmap has gotten mixed up about which circuit it is supposed to be using.
You can see that circuit 6 is no longer available, but the module is still trying to use it.
That does indeed sound like an exitmap bug.
exitmap core assumes that one circuit per exit is all you will ever need, and has no way of notifying an experiment module that its circuit has gotten destroyed. It seems pretty clear that this is a design error, but it's going to take a lot of work to change ...
zw