Hi all!
During the last weeks I have been very busy working on my GSoC project which is about reducing the RTT of preemptively built circuits.
There is now a single script called "rttprober"[0] that depends on a patched[1] Tor client running a certain configuration[2]. The goal is to measure RTTs of Tor circuits. It takes a few parameters as input: an authenticated Stem Tor controller for communication with the Tor client, the number of circuits to probe, the number of probes to be taken for each circuit and the number of circuits that should be probed concurrently. It outputs a tar file containing lzo-compressed serialized data with detailed node information, all circuit- and stream-events involved and the circuit build time for further analysis. Since the RTT-measurements are run in parallel with very short locks it is important not to overload Tor nodes. Therefore a single node is not probed more than once at a time.
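The rule that a node is never probed by two circuits at once can be sketched with a shared set of busy fingerprints guarded by a lock. This is only an illustration under assumptions: `NodeReservations`, `try_reserve` and `release` are hypothetical names that do not come from rttprober, and the script's real bookkeeping may differ.

```python
import threading


class NodeReservations:
    """Tracks which relay fingerprints are part of a circuit being probed.

    Hypothetical helper for illustration, not taken from rttprober.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._busy = set()

    def try_reserve(self, path):
        """Reserve every node in path, or none of them if any is busy."""
        with self._lock:  # short critical section: only set operations
            if any(fp in self._busy for fp in path):
                return False
            self._busy.update(path)
            return True

    def release(self, path):
        """Mark the nodes in path as free again once probing is done."""
        with self._lock:
            self._busy.difference_update(path)


reservations = NodeReservations()
print(reservations.try_reserve(["A", "B", "C"]))  # True
print(reservations.try_reserve(["C", "D", "E"]))  # False, "C" is busy
reservations.release(["A", "B", "C"])
print(reservations.try_reserve(["C", "D", "E"]))  # True
```

A worker that fails to reserve its path would pick a different path or retry later, so the lock is only held for the set lookup and never for the duration of a probe.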
A first analysis of some measurements supports the original assumption that a Fréchet distribution fits both the circuit build times[3] and the round trip times[4].
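For reference, the three-parameter Fréchet distribution has the CDF F(x) = exp(-((x - m)/s)^(-alpha)) for x > m. A minimal sketch of that CDF and its inverse follows. The function names and the parameter values are made up for illustration and are not fitted to the measurements mentioned above.

```python
import math


def frechet_cdf(x, alpha, scale=1.0, loc=0.0):
    """P(X <= x) for a Frechet distribution with shape alpha > 0."""
    if x <= loc:
        return 0.0
    return math.exp(-(((x - loc) / scale) ** -alpha))


def frechet_quantile(p, alpha, scale=1.0, loc=0.0):
    """Inverse of frechet_cdf for 0 < p < 1."""
    return loc + scale * (-math.log(p)) ** (-1.0 / alpha)


# Illustrative parameters only (not fitted values).
median = frechet_quantile(0.5, alpha=3.0, scale=200.0)
print(round(frechet_cdf(median, alpha=3.0, scale=200.0), 6))  # 0.5
```

The quantile function is handy for picking a timeout such as "the RTT below which 80% of circuits fall" once parameters have been estimated.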
I will continue gathering and analyzing measurement data and will hopefully be able to draw some conclusions from that.
Best, Robert
[0] https://bitbucket.org/ra_/tor-rtt/src/1127f6936086664981fc55b4dbc82b1570714140/rttprober.py?at=master
[1] https://bitbucket.org/ra_/tor-rtt/src/1127f6936086664981fc55b4dbc82b1570714140/patches?at=master
[2] https://bitbucket.org/ra_/tor-rtt/src/1127f6936086664981fc55b4dbc82b1570714140/torrc?at=master
[3] http://postimg.org/image/je8k5yydt/
[4] http://postimg.org/image/ktk90vxm7/
There is now a single script called "rttprober"[0] that depends on a patched[1] Tor client running a certain configuration[2]. The goal is to measure RTTs of Tor circuits. It takes a few parameters as input: an authenticated Stem Tor controller for communication with the Tor client..
Hi ra, glad to see that you're using stem! If you have any questions, suggestions, feature requests, or would like a code review then let me know. Few things I spotted...
# Stem does not do that yet.
# See https://trac.torproject.org/projects/tor/ticket/7953
if is_valid_fingerprint(fingerprint):
    query = "ns/id/%s" % fingerprint
else:
    raise ValueError("Invalid fingerprint: %s." % fingerprint)
desc = self._controller.get_info(query)
return RouterStatusEntryV3(desc)
As of just four weeks ago the Controller started providing v3 responses, so you can replace this with "self._controller.get_network_status(fingerprint)"...
https://gitweb.torproject.org/stem.git/commitdiff/003fa8e
circ.build_flags.count('IS_INTERNAL') == 0
This would more commonly be done as...
'IS_INTERNAL' not in circ.build_flags
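Both forms give the same answer; the membership test simply states the intent directly and can stop at the first match, while count() always scans the whole sequence. A small sketch, using a plain list as a stand-in for circ.build_flags (the flag values are examples only):

```python
# Stand-in for circ.build_flags; the flag values are examples only.
build_flags = ["NEED_CAPACITY", "IS_INTERNAL"]

by_count = build_flags.count("IS_INTERNAL") == 0
by_membership = "IS_INTERNAL" not in build_flags

print(by_count, by_membership)  # False False
```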
try:
    controller.reset_conf("__DisablePredictedCircuits")
    controller.reset_conf("__LeaveStreamsUnattached")
    controller.close()
except NameError:
    pass
What raises a NameError?
# close circuit, but ignore if it does not exist anymore
try:
    self._controller.get_circuit(self._cid)
    self._controller.close_circuit(self._cid)
except (ValueError, InvalidArguments):
    pass
What is the purpose of the get_circuit() call? If it's not superfluous then you can provide a default argument to prevent it from raising an exception (just about every getter allows for one). Also, you can omit exception types to catch everything (if you'd like to ignore all errors). For instance, in this case...
self._controller.get_circuit(self._cid, None)
try:
    self._controller.close_circuit(self._cid)
except:
    pass
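The difference between a getter that raises and one that accepts a default can be shown with a small stand-in class. `StubController` below is hypothetical and only mimics the calling convention described above; it is not stem's implementation, and the error message is invented.

```python
_UNDEFINED = object()  # sentinel meaning "no default supplied"


class StubController:
    """Hypothetical stand-in that mimics a getter with an optional default."""

    def __init__(self, circuits):
        self._circuits = dict(circuits)

    def get_circuit(self, circuit_id, default=_UNDEFINED):
        try:
            return self._circuits[circuit_id]
        except KeyError:
            if default is _UNDEFINED:
                raise ValueError("no circuit with id %s" % circuit_id)
            return default


controller = StubController({"7": "circuit 7"})
print(controller.get_circuit("7"))        # circuit 7
print(controller.get_circuit("8", None))  # None
```

A sentinel object is used instead of None as the "not supplied" marker so that callers can still pass None as an explicit default.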
try:
    controller = Controller.from_port()
except SocketError:
    sys.stderr.write("ERROR: Couldn't connect to Tor.\n")
    sys.exit(1)
controller.authenticate()
This is certainly a fine way of doing it, but you might want to also look at connection.connect_port()...
https://stem.torproject.org/api/connection.html#stem.connection.connect_port
It is intended to be a quick and easy method of getting a Controller for command-line applications. For instance, it will present a password prompt if tor is configured to use password authentication. Just realized I should have included it in a tutorial somewhere...
Your code looks great! If you wouldn't mind I'd love to reference it on stem's examples page...
https://stem.torproject.org/tutorials/double_double_toil_and_trouble.html
Shall I reference 'https://bitbucket.org/ra_/tor-rtt/' or do you anticipate your project having a more permanent home? (this might be a question for Mike as much as you)
Cheers! -Damian
On Saturday 27 July 2013 04:41:45 Damian Johnson wrote:
Hi ra, glad to see that you're using stem!
Sure, stem works really great!
If you have any questions, suggestions, feature requests,
These parts were a bit tricky to figure out for me:
-) When I wanted to check if a certain node is an exit node it took me some time to figure out that looking for an exit flag is not sufficient because some nodes are in fact exit nodes but don't have an exit flag. One has to look at the node's exit policy which is inaccessible by default because of microdescriptors. Maybe returning some meaningful message when one uses get_server_descriptor() and microdescriptors are enabled would help..?
-) It is not safe to use extend_circuit in parallel for creating new circuits. I think this is not mentioned anywhere.
-) Router status V2/V3 also took me some time but this has already been fixed.
or would like a code review then let me know.
That would be awesome!
As of just four weeks ago the Controller started providing v3 responses
I missed that obviously. Fixed in [0].
circ.build_flags.count('IS_INTERNAL') == 0
This would more commonly be done as... 'IS_INTERNAL' not in circ.build_flags
Fixed in [0].
try:
    controller.reset_conf("__DisablePredictedCircuits")
    controller.reset_conf("__LeaveStreamsUnattached")
    controller.close()
except NameError:
    pass
What raises a NameError?
This was a leftover from when it was possible that "controller" did not exist yet at that point. Fixed in [0].
# close circuit, but ignore if it does not exist anymore
try:
    self._controller.get_circuit(self._cid)
    self._controller.close_circuit(self._cid)
except (ValueError, InvalidArguments):
    pass
What is the purpose of the get_circuit() call? If it's not superfluous
It doesn't do any harm but is definitely superfluous. Fixed in [0].
try:
    controller = Controller.from_port()
except SocketError:
    sys.stderr.write("ERROR: Couldn't connect to Tor.\n")
    sys.exit(1)
controller.authenticate()
This is certainly a fine way of doing it, but you might want to also look at connection.connect_port()...
https://stem.torproject.org/api/connection.html#stem.connection.connect_port
It is intended to be a quick and easy method of getting a Controller for command-line applications. For instance, it will present a password prompt if tor is configured to use password authentication. Just realized I should have included it in a tutorial somewhere...
I didn't know that. Since the script now depends on stem version > 1.0.1 anyway, I integrated it.
Thank you for your feedback so far!
Your code looks great! If you wouldn't mind I'd love to reference it on stem's examples page...
Sure, go ahead.
Shall I reference 'https://bitbucket.org/ra_/tor-rtt/' or do you anticipate your project having a more permanent home? (this might be a question for Mike as much as you)
I would not mind but I don't have any plans for that. Mike only asked me to make the code accessible online.
Best, Robert
[0] https://bitbucket.org/ra_/tor-rtt/commits/666e0b173871ba3f699c8bc07bfb156f653adf7a
Hi Robert, sorry about the delay. I couldn't sink the time a reply to this thread deserved until now.
-) When I wanted to check if a certain node is an exit node it took me some time to figure out that looking for an exit flag is not sufficient because some nodes are in fact exit nodes but don't have an exit flag.
Yup. It's unfortunate that tor decided to include an 'Exit' flag with such an unintuitive meaning. You're not the first person to be confused by it.
One has to look at the node's exit policy which is inaccessible by default because of microdescriptors. Maybe returning some meaningful message when one uses get_server_descriptor() and microdescriptors are enabled would help..?
Good idea! Done...
https://gitweb.torproject.org/stem.git/commitdiff/e78f1b7
-) It is not safe to use extend_circuit in parallel for creating new circuits. I think this is not mentioned anywhere.
What kind of issue does that encounter? Is it a problem with stem's thread safety or an issue on tor's side?
or would like a code review then let me know.
That would be awesome!
Few things I'm spotting offhand...
self._lock.acquire()
Manual lock handling is risky. If anything within this block raises an exception (and there's several points throughout your script where you use Controller methods that can potentially raise errors) then the lock won't be released.
The safer way of doing this is to use the 'with' keyword...
with self._lock:
    # do stuff
This is the same as...
try:
    self._lock.acquire()
    # do stuff
finally:
    self._lock.release()
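The point about exceptions can be checked with the standard library alone. In this minimal sketch the RuntimeError stands in for any Controller method that fails inside the locked block; the function name is made up for the example.

```python
import threading

lock = threading.Lock()


def risky_update():
    with lock:  # released on normal exit and when an exception propagates
        raise RuntimeError("controller call failed")


try:
    risky_update()
except RuntimeError:
    pass

print(lock.locked())  # False
```

With a bare lock.acquire() and no try/finally, the same failure would leave the lock held and every later caller would block forever.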
def read(self):
    ...
    return None
Not necessary. Methods return None by default.
# pylint: disable-msg=R0902
You might want to look into pyflakes and pep8. I've found them to be better static analysis tools.
try:
    controller = connect_port()
except SocketError:
    sys.stderr.write("ERROR: Couldn't connect to Tor.\n")
    sys.exit(1)
controller.authenticate()
Not quite. The connect_port() function never raises an exception. Rather, if it fails to establish a control connection then it prints the issue to stdout and returns None. Also, the connection it provides is already authenticated.
This should instead be...
controller = connect_port()
if not controller:
    sys.exit(1)  # failed to get a control connection
Cheers! -Damian
On Monday 05 August 2013 07:25:20 Damian Johnson wrote:
Yup. It's unfortunate that tor decided to include an 'Exit' flag with such an unintuitive meaning. You're not the first person to be confused by it.
Is this meaning at least documented somewhere and I have just overlooked it?
-) It is not safe to use extend_circuit in parallel for creating new circuits. I think this is not mentioned anywhere.
What kind of issue does that encounter? Is it a problem with stem's thread safety or an issue on tor's side?
If requests are sent to Tor to create more than a single circuit at once, the mapping between circuit events and create-request is unknown because the circuit ID is not known until the LAUNCHED-event has been received. This is clearly an issue on Tor's side but one could argue that Stem should stop me from using it that way.
Manual lock handling is risky. If anything within this block raises an exception (and there's several points throughout your script where you use Controller methods that can potentially raise errors) then the lock won't be released.
The safer way of doing this is to use the 'with' keyword...
I could get rid of all manual locking except in one case.
Not necessary. Methods return None by default.
Removed.
You might want to look into pyflakes and pep8. I've found them to be better static analysis tools.
pyflakes didn't say anything but I committed lots of cosmetic pep8 changes.
try:
    controller = connect_port()
except SocketError:
    sys.stderr.write("ERROR: Couldn't connect to Tor.\n")
    sys.exit(1)
controller.authenticate()
Not quite. The connect_port() function never raises an exception. Rather, if it fails to establish a control connection then it prints the issue to stdout and returns None. Also, the connection it provides is already authenticated.
If Tor has ControlPort enabled without having HashedControlPassword set, authenticate() has to be called to authenticate the connection. Though this is not recommended I don't know which other default setting would be more appropriate.
This should instead be...
controller = connect_port()
if not controller:
    sys.exit(1)  # failed to get a control connection
Fixed.
Best, Robert
Yup. It's unfortunate that tor decided to include an 'Exit' flag with such an unintuitive meaning. You're not the first person to be confused by it.
Is this meaning at least documented somewhere and I have just overlooked it?
Hi Robert. Here's the relevant part of the spec...
https://gitweb.torproject.org/torspec.git/blob/HEAD:/dir-spec.txt#l1738
What kind of issue does that encounter? Is it a problem with stem's thread safety or an issue on tor's side?
If requests are sent to Tor to create more than a single circuit at once, the mapping between circuit events and create-request is unknown because the circuit ID is not known until the LAUNCHED-event has been received. This is clearly an issue on Tor's side but one could argue that Stem should stop me from using it that way.
Not sure that I follow. The extend_circuit() returns the circuit id (it's provided by the EXTENDCIRCUIT call). Are you saying that tor's EXTENDCIRCUIT response is wrong when done in parallel?
Not quite. The connect_port() function never raises an exception. Rather, if it fails to establish a control connection then it prints the issue to stdout and returns None. Also, the connection it provides is already authenticated.
If Tor has ControlPort enabled without having HashedControlPassword set, authenticate() has to be called to authenticate the connection. Though this is not recommended I don't know which other default setting would be more appropriate.
I think there's some misunderstanding. Yes, when you establish a new controller connection you need to call authenticate(), even if Tor doesn't require any credentials.
connect_port() is a convenience function that does everything (including authentication) for you. If tor requires a password then it gives the user a password prompt. If it runs into an error then it prints an explanation of the failure and returns None. Sounds like I need some more documentation here...
Cheers! -Damian
On Saturday 10 August 2013 02:37:48 Damian Johnson wrote:
Hi Robert. Here's the relevant part of the spec...
https://gitweb.torproject.org/torspec.git/blob/HEAD:/dir-spec.txt#l1738
Thanks. I will try to make that part more clear and open a ticket.
If requests are sent to Tor to create more than a single circuit at once, the mapping between circuit events and create-request is unknown because the circuit ID is not known until the LAUNCHED-event has been received. This is clearly an issue on Tor's side but one could argue that Stem should stop me from using it that way.
Not sure that I follow. The extend_circuit() returns the circuit id (it's provided by the EXTENDCIRCUIT call). Are you saying that tor's EXTENDCIRCUIT response is wrong when done in parallel?
As far as I understand it it's not necessarily wrong but it might be the case that a response that does not belong to the call is received first: Assume a single program making two extend_circuit() calls within a short time. If the first EXTENDED response is delayed for some reason, both calls receive the EXTENDED response belonging to the second call -> both calls use the same circuit ID.
Another case, again a single program making two extend_circuit() calls within a short time: if the second call has been made before the first EXTENDED response is received, the second call will use the EXTENDED response from the first call when it arrives -> both calls use the same circuit ID.
Therefore the await_build parameter should be True by default IMHO. Anyway, it should be made clear that the await_build parameter doesn't work when extend_circuit() is used by two separate programs/threads that run concurrently. The user then has to do the locking of (at least) the LAUNCHED event herself.
Besides, I could not find any filtering of Tor-internal circuit events. If a Tor-internal circuit EXTENDED event occurs during an extend_circuit() call, the wrong circuit ID will be used.
I hope this is not too confusing.
Best, Robert
As far as I understand it it's not necessarily wrong but it might be the case that a response that does not belong to the call is received first: Assume a single program making two extend_circuit() calls within a short time. If the first EXTENDED response is delayed for some reason, both calls receive the EXTENDED response belonging to the second call -> both calls use the same circuit ID.
If I understand this correctly you're thinking that multiple calls to extend_circuit() cause parallel EXTENDCIRCUIT requests, and the first response would be used for both callers. Is that right?
If so then I would be very interested if you actually see that behaviour. Stem provides thread safe controller communication. See the msg() method of the BaseController - though the Controller's methods are called in parallel the actual socket requests are done in serial to prevent that exact issue that you describe.
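The serialization idea can be illustrated without Tor or stem. The sketch below is not stem's code: `FakeTor` and `SerializedController` are hypothetical stand-ins, and the fake socket simply answers each request with the next circuit id. The only point is that holding one lock across both the send and the receive pairs every reply with the caller that asked for it.

```python
import itertools
import threading


class FakeTor:
    """Hypothetical stand-in for a control socket (not a real Tor)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._pending = []

    def send(self, request):
        # Queue one reply per request, each with a fresh circuit id.
        self._pending.append(next(self._ids))

    def recv(self):
        return "250 EXTENDED %d" % self._pending.pop(0)


class SerializedController:
    """Hypothetical controller whose msg() serializes socket access."""

    def __init__(self, sock):
        self._sock = sock
        self._msg_lock = threading.Lock()

    def msg(self, request):
        # Hold the lock across send *and* recv so a reply can only be
        # read by the caller whose request produced it.
        with self._msg_lock:
            self._sock.send(request)
            return self._sock.recv()

    def extend_circuit(self):
        return self.msg("EXTENDCIRCUIT 0").split()[-1]


def build_circuits(count):
    """Call extend_circuit() from count threads and collect the ids."""
    controller = SerializedController(FakeTor())
    ids = []
    ids_lock = threading.Lock()

    def worker():
        circuit_id = controller.extend_circuit()
        with ids_lock:
            ids.append(circuit_id)

    threads = [threading.Thread(target=worker) for _ in range(count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return ids


print(len(set(build_circuits(20))))  # 20
```

If the lock only covered the send, two threads could interleave and read each other's replies, which is the failure mode discussed in this thread.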
Apologies if I'm misunderstanding what you're describing. -Damian
On Saturday 10 August 2013 23:52:44 Damian Johnson wrote:
If I understand this correctly you're thinking that multiple calls to extend_circuit() cause parallel EXTENDCIRCUIT requests, and the first response would be used for both callers. Is that right?
Yes.
If so then I would be very interested if you actually see that behaviour. Stem provides thread safe controller communication. See the msg() method of the BaseController - though the Controller's methods are called in parallel the actual socket requests are done in serial to prevent that exact issue that you describe.
That looks fine to me. I obviously drew the wrong conclusion from the issues I have encountered. My fault, sorry.
Best, Robert
On Saturday 10 August 2013 02:37:48 Damian Johnson wrote:
Yup. It's unfortunate that tor decided to include an 'Exit' flag with such an unintuitive meaning. You're not the first person to be confused by it.
Is this meaning at least documented somewhere and I have just overlooked it?
Here's the relevant part of the spec...
https://gitweb.torproject.org/torspec.git/blob/HEAD:/dir-spec.txt#l1738
Patch submitted in ticket 9932[0].
Best, Robert