I asked on ticket https://bugs.torproject.org/33598 how to reliably reproduce the SOCKS errors. A reliable reproduction would make it easier to tell when the underlying issue is properly fixed, rather than leaving it hidden behind a race condition (the current workaround depends on "waiting long enough" for connections to succeed most of the time).
In #tor-dev I believe it was arma who pointed me in the right direction, but that still leaves me with unresolved questions. It was suggested that I attempt SOCKS connections to an invalid host/port as well as a valid one, in order to have a reliable failure case for testing. But in the context of Chutney I believe we only want to attempt local connections, correct? So either attempting a connection to 127.0.0.0/8 on a known-closed port, or perhaps more simply 0.0.0.0 on any port, would be a reliable case to use, correct?
Also from my comment on #33598:
Assuming workaround is at Traffic.py:441? I see the timeout was adjusted in 95ce144c which has more changes than just that line.
and
will decreasing the timeout back to 0.2 be enough to encourage failure?
Last question: I looked for a bit, but where is Chutney actually initiating SOCKS connections to Tor during tests? I still find it hard to follow especially when I am going in blind for much of this.
Caitlin
On Mon, Jun 15, 2020 at 8:53 AM c <c@chroniko.jp> wrote:
I asked on ticket https://bugs.torproject.org/33598 how to reliably reproduce the SOCKS errors. A reliable reproduction would make it easier to tell when the underlying issue is properly fixed, rather than leaving it hidden behind a race condition (the current workaround depends on "waiting long enough" for connections to succeed most of the time).
In #tor-dev I believe it was arma who pointed me in the right direction, but that still leaves me with unresolved questions. It was suggested that I attempt SOCKS connections to an invalid host/port as well as a valid one, in order to have a reliable failure case for testing. But in the context of Chutney I believe we only want to attempt local connections, correct? So either attempting a connection to 127.0.0.0/8 on a known-closed port, or perhaps more simply 0.0.0.0 on any port, would be a reliable case to use, correct?
This seems plausible, though 0.0.0.0:x will just be the local computer as well.
Maybe you could try routing to a known-unassigned address, or one that Tor simply won't support, like the discard-only prefix from RFC 6666 (100::/64)? For IPv4 I think a multicast or known-unassigned prefix might have similar results.
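As a rough illustration of picking such a destination, here is a minimal sketch using Python's standard ipaddress module. The helper name is_unreachable_candidate is hypothetical and not part of Chutney, and whether Tor actually refuses these addresses quickly is an assumption that would need to be checked against a running network.

```python
import ipaddress

# RFC 6666 discard-only prefix for IPv6.
DISCARD_V6 = ipaddress.ip_network("100::/64")


def is_unreachable_candidate(addr):
    """Return True if addr is in a range that should never accept a TCP connection.

    Hypothetical helper: IPv6 addresses must fall in the RFC 6666 discard
    prefix, IPv4 addresses must be multicast. Loopback and 0.0.0.0 are
    deliberately rejected, since they resolve to the local machine.
    """
    ip = ipaddress.ip_address(addr)
    if ip.version == 6:
        return ip in DISCARD_V6
    return ip.is_multicast
```

For example, is_unreachable_candidate("100::1") and is_unreachable_candidate("224.0.0.1") are both true, while "127.0.0.1" and "0.0.0.0" are rejected.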
Also from my comment on #33598:
Assuming workaround is at Traffic.py:441? I see the timeout was adjusted in 95ce144c which has more changes than just that line.
Hm, that might be where the "5.0" is coming from, yeah.
and
will decreasing the timeout back to 0.2 be enough to encourage failure?
So the way that timeout works, I think, is that it controls how long asyncore will go without any events before it exits. The while loop around asyncore.loop() is what makes it retry over and over.
I think the problem here might be that the while loop keeps going until the time reaches "end", or until "self.tests.all_done()" is true. This might mean that we need to adjust the SOCKS handling code instead, so that it detects SOCKS refusals and treats them as the test being "done", but failed.
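To make the loop shape concrete, here is a minimal sketch of a retry loop that also stops on failure. It is not Chutney's code: run_until_done, poll_once, all_done and any_failed are hypothetical names, and poll_once stands in for a single asyncore.loop() pass (asyncore itself was removed from the standard library in Python 3.12, so the sketch avoids importing it).

```python
import time


def run_until_done(poll_once, all_done, any_failed, deadline, timeout=0.2):
    """Poll until all tests finish, any test fails, or the deadline passes.

    poll_once(timeout) is a stand-in for one event-loop pass; timeout is
    how long that pass may wait without events before returning.
    """
    while time.monotonic() < deadline:
        if any_failed():
            # A refusal counts as "done, but failed": stop retrying.
            return False
        if all_done():
            return True
        poll_once(timeout)
    # Deadline reached: succeed only if everything finished cleanly.
    return all_done() and not any_failed()
```

With this shape, lowering the timeout only changes how often the loop wakes up. It is the any_failed() check that makes a refused connection end the run early instead of being retried until the deadline.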
In theory, the code handles this in Source.collect_incoming_data(), where it looks for a SOCKS response and compares it to an expected value. But I guess Chutney is restarting the connection again, or not treating the mismatch as a test failure?
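As an illustration of what "detecting a refusal" could look like, here is a minimal sketch that classifies a SOCKS4 reply. It assumes the SOCKS4 reply layout (8 bytes: a null version byte, then a status byte where 0x5A means granted). The function name classify_socks4_reply is hypothetical, and whether Chutney's comparison works this way has not been checked here.

```python
import struct

SOCKS4_REPLY_LEN = 8    # VN, CD, DSTPORT (2 bytes), DSTIP (4 bytes)
SOCKS4_GRANTED = 0x5A   # any other status code means the request failed


def classify_socks4_reply(buf):
    """Return 'pending', 'granted', or 'refused' for the bytes received so far."""
    if len(buf) < SOCKS4_REPLY_LEN:
        # Not enough data yet to decide either way.
        return "pending"
    version, status = struct.unpack("!BB", buf[:2])
    if version == 0x00 and status == SOCKS4_GRANTED:
        return "granted"
    return "refused"
```

A "refused" result is the case that would need to mark the test as finished and failed, rather than leaving it to be retried.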
Last question: I looked for a bit, but where is Chutney actually initiating SOCKS connections to Tor during tests? I still find it hard to follow especially when I am going in blind for much of this.
It's in Traffic.py, Source.handle_connect(), in this line:
self.push(socks_cmd(self.server))
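For orientation, here is a minimal sketch of the kind of bytes such a call would push, assuming a SOCKS4a CONNECT request. The function build_socks4a_connect is a hypothetical stand-in and has not been compared against Chutney's actual socks_cmd().

```python
import struct


def build_socks4a_connect(host, port, user=b""):
    """Build a SOCKS4a CONNECT request for host:port (illustrative only)."""
    # VN=4 (SOCKS version), CD=1 (CONNECT), then the destination port.
    header = struct.pack("!BBH", 0x04, 0x01, port)
    # In SOCKS4a, a destination IP of 0.0.0.x (x nonzero) tells the proxy
    # that a hostname follows the user ID.
    placeholder_ip = bytes([0, 0, 0, 1])
    return header + placeholder_ip + user + b"\x00" + host.encode("ascii") + b"\x00"
```

Pointing host and port at a destination that is expected to fail gives the proxy something to refuse, which is the case the test needs to observe.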
best wishes,