On Fri, Apr 22, 2016 at 04:22:48PM -0400, Zack Weinberg wrote:
I'm working on an exitmap module that wants to feed order of 5000 short-lived streams through each exit relay. I think this is running foul of some sort of upper limit (in STEM, or in Tor itself, not sure) on the number of streams a circuit can be used for, or how long, or something.
Tor has a built-in "if a stream has been waiting for a connected cell for 10 seconds, something has gone wrong" feature, where it disconnects from that circuit, and marks the circuit as not worth using for future streams. (Assuming the stream hasn't timed out, Tor would then say "oh hey, I need to find a circuit for this stream", and start the process over again.)
It looks from your logs like you're attaching a bunch of streams to a circuit, and one or more of the begin cells (heck, maybe even all of them) aren't getting any answer in the first 10 seconds.
2016-04-22 16:07:54,115 [DEBUG]: Circuit status change: CIRC 6 BUILT [fp],[fp] PURPOSE=GENERAL TIME_CREATED=2016-04-22T20:07:53.305851
[...]
2016-04-22 16:07:58,118 [DEBUG]: Attempting to attach stream 98 to circuit 6. 2016-04-22 16:08:04,697 [DEBUG]: Circuit status change: CIRC 6 CLOSED [fp],[fp] PURPOSE=PATH_BIAS_TESTING TIME_CREATED=2016-04-22T20:07:53.305851 REASON=FINISHED
This is interesting. If the later streams were still attached to the circuit, the circuit shouldn't be closing here. Are you sure the streams actually get attached (and stay attached)?
It looks like you're only giving me exitmap logs, not Tor logs, so it's hard to tell for sure what's going on underneath.
For why it says PURPOSE=PATH_BIAS_TESTING there, check out the pathbias_send_usable_probe() function which calls circuit_change_purpose(). If this function is happening, your Tor log should see
log_info(LD_CIRC, "Sending pathbias testing cell to %s:25 on stream %d for circ %d.", probe_nonce, ocirc->pathbias_probe_id, ocirc->global_identifier);
(The pathbias stuff has never been tested well I believe, so it's totally possible there are easy bugs in it. For example, maybe we're queueing that pathbias testing cell for sending, and at the same time we're marking the circuit as worth closing now because it's empty.)
You can see that circuit 6 is no longer available, but the module is still trying to use it.
That does indeed sound like an exitmap bug.
--Roger