Hello!
I have been reading through the various tor specifications trying to understand how this all works, so please forgive any ignorance of the protocol on my part. There seem to be a fair number of gaps about specifically how various communications take place; for instance, if we consider the very beginning of the communication chain, Directory Authorities, we have the dir-spec.txt file, which outlines rather well what type of information can be retrieved from the directories, but not what communication protocol is actually being used.
It appears that the usage of HTTP is fairly inherent, but this seems like it is only a partial answer as only some of the trusted authorities seem to speak HTTP on the address & port combination compiled into tor; moreover, at least some of the authorities do not appear to implement the specification entirely. For instance, the trusted authority at MIT, 'moria1', is listening on port 9131; however, attempting to retrieve the network status document from it via a GET request to /tor/status/all.z results in no output at all. I would assume that if this were a matter of needing to connect via SSL, I would receive an error due to a bad handshake, but I get nothing back. This holds true for at least one of the other trusted authorities listening on a non-HTTP related port (turtles). So for those servers, exactly what protocol is being used, and is it documented anywhere other than the source code?
Then, for instance, if we connect to 'tor26', which does respond to HTTP requests, and attempt to retrieve a v2 network status document via the /tor/status/all.z URI, we receive a 404, although it appears the document that should exist there exists at other URIs. It's not entirely clear whether this is just outdated code, something specific to particular versions of the protocol (tor26 does have the no-v2 flag set, which might be the issue?), or something else.
So the question is: are there accurate specifications anywhere that focus not only on the semantics of cryptography and the rationale behind certain choices but also on the specifics of how exactly the protocol works, or am I 'stuck' with reading the source code? I suspect that it is the latter, so my follow-up question is: is there anyplace where the control flow is somewhat documented? (The flow is somewhat disjointed, at least in part due to the way libevent works and other such aspects that make it difficult to parse if you're not already familiar with how everything is interconnected.)
I have other questions about aspects of the protocol, but I will mostly save those until I understand the basic blocks of it better. To give one example, though: it does seem that the introduction of guard nodes would have the inverse of the desired effect. There appear to be about 1000-1100 guard nodes versus several thousand relays, and about 800-900 exit nodes, so mitigating the attack where an attacker controls C nodes seems essentially pointless, as one would only need to control a set number of guard and exit nodes and could more or less ignore the relays in between. So whereas you needed, say, C/N nodes previously, one would now only need Cg/Ng (Cg controlled guards / Ng number of guards). If we then factor in that it seems possible for a guard or relay to indirectly control the route a circuit takes through the network by continually causing cell extensions to fail for all relays and exit nodes that it does not control, then the value for Cg would seem not to need to be overly large, or at least not to approach the values of C or Cg (C/N or Cg/Ng).
This seems entirely feasible for attackers that have state-level resources (What is 1,000 computers to China? Iran? ...the United States?), especially because the algorithm for selecting guards appears to be based entirely on stability and bandwidth, metrics we can expect a government to have plenty of on hand. I understand that rotation is supposed to ease this somewhat, but at least according to the academic paper out of Waterloo that Roger co-authored, it would appear that rotation actually facilitates the compromise of more clients rather than easing the problem, with the most secure (in the lab) situation being a single guard. (I understand that in practice this will not be realistic, as it creates a bottleneck in a variety of ways, e.g. firewalls and DDoS.)
I suspect many of the questions I have will be answered as I better familiarize myself with the protocol, so the few I've enumerated can be thought of as examples that I expect will sort themselves out as I progress through the source and specifications.
Thanks a lot!
Jon
but not what communication protocol is actually being used.
http, https://gitweb.torproject.org/torspec.git/blob/HEAD:/dir-spec.txt
section 1 line 184 "All directory information is uploaded and downloaded with HTTP."
If you search through the document for "http", you will find most of the URIs and related info you're looking for.
tl;dr/no idea for the rest of your post.
On Sun, May 19, 2013 at 02:40:13PM -0400, Jon Smithe wrote:
I have been reading through the various tor specifications trying to understand how this all works, so please forgive any ignorance of the protocol on my part. There seem to be a fair number of gaps about specifically how various communications take place; for instance, if we consider the very beginning of the communication chain, Directory Authorities, we have the dir-spec.txt file, which outlines rather well what type of information can be retrieved from the directories, but not what communication protocol is actually being used.
Yeah -- I'm afraid there's a lot to learn at this point. Perhaps when you figure out the bootstrapping process to your satisfaction, you'll write up a summary and then we can correct it as needed and have a better answer for the next person?
It appears that the usage of HTTP is fairly inherent, but this seems like it is only a partial answer as only some of the trusted authorities seem to speak HTTP on the address & port combination compiled into tor
Each directory authority actually has two ports baked into Tor. E.g.,
    "moria1 orport=9101 no-v2 "
      "v3ident=D586D18309DED4CD6D57C18FDB97EFA96D330566 "
      "128.31.0.39:9131 9695 DFC3 5FFE B861 329B 9F1A B04C 4639 7020 CE31",
has 9131 as its DirPort (answering naked http requests), and 9101 as its ORPort (doing the Tor TLS handshake). Note that clients by default connect to the ORPort and then tunnel their http request through it (see the 'begindir' relay cell command), both for authentication and to prevent simple DPI-based blocking.
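To make the two-port layout concrete, here is a minimal Python sketch that splits an authority entry like the one quoted above into its parts. The function name `parse_authority` and the returned dictionary keys are made up for illustration; this is not how Tor itself parses the entry, and it assumes the layout seen in this one example (nickname, then options and flags, then address:DirPort, then the fingerprint).

```python
def parse_authority(entry):
    """Split one authority entry (hypothetical helper, not Tor's own parser)."""
    tokens = entry.split()
    nickname = tokens[0]
    options, flags = {}, []
    idx = 1
    # Everything before the "address:port" token is either key=value or a bare flag.
    while idx < len(tokens) and ":" not in tokens[idx]:
        key, sep, value = tokens[idx].partition("=")
        if sep:
            options[key] = value
        else:
            flags.append(key)
        idx += 1
    address, _, dir_port = tokens[idx].partition(":")
    # The remaining tokens are the fingerprint, written in groups of four hex digits.
    fingerprint = "".join(tokens[idx + 1:])
    return {
        "nickname": nickname,
        "address": address,
        "dir_port": int(dir_port),              # plain HTTP directory requests
        "or_port": int(options["orport"]),      # Tor TLS handshake
        "v3ident": options.get("v3ident"),
        "flags": flags,
        "fingerprint": fingerprint,
    }


MORIA1 = (
    "moria1 orport=9101 no-v2 "
    "v3ident=D586D18309DED4CD6D57C18FDB97EFA96D330566 "
    "128.31.0.39:9131 9695 DFC3 5FFE B861 329B 9F1A B04C 4639 7020 CE31"
)
info = parse_authority(MORIA1)
print(info["nickname"], "DirPort:", info["dir_port"], "ORPort:", info["or_port"])
```

The point is only that a naked HTTP GET belongs on the DirPort (9131 here), while the ORPort (9101) expects the Tor TLS handshake first.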
, moreover at least some of the authorities do not appear to implement the specification entirely. For instance, the trusted authority at MIT, 'moria1', is listening on port 9131; however, attempting to retrieve the network status document from it via a GET request to /tor/status/all.z results in no output at all.
That's for asking v2 directory questions, not v3.
We disabled it because of https://trac.torproject.org/projects/tor/ticket/6783
Building what basically amounts to a botnet of old Tors, all clamoring for obsolete directory information, has certainly proved a learning experience. :)
I would assume if this was a matter of needing to connect via SSL that I would receive an error due to a bad handshake, but I get nothing back. This holds true for at least one of the other trusted authorities listening on a non-HTTP related port (turtles). So for those servers, exactly what protocol is being used and is it documented anywhere other than the source code?
Try the v3 protocol, not the v2 protocol.
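As a rough illustration of a v3 request, the Python sketch below builds the URL for the current consensus, using the /tor/status-vote/current/consensus path that dir-spec.txt lists, and fetches it over plain HTTP from a DirPort. The helper names are made up for this example, and it assumes the ".z" variant is a zlib stream as dir-spec.txt describes; whether a given authority still answers at the address quoted in this thread is not guaranteed.

```python
import urllib.request
import zlib

# Path for the current v3 consensus, as listed in dir-spec.txt.
CONSENSUS_PATH = "/tor/status-vote/current/consensus"


def consensus_url(host, dir_port, compressed=True):
    """Build the DirPort URL for the v3 consensus (hypothetical helper)."""
    suffix = ".z" if compressed else ""
    return f"http://{host}:{dir_port}{CONSENSUS_PATH}{suffix}"


def fetch_consensus(host, dir_port, timeout=30):
    """Fetch and decompress the consensus; needs network access to a live DirPort."""
    url = consensus_url(host, dir_port, compressed=True)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read()
    # Assumption: the ".z" document is zlib-compressed, per dir-spec.txt.
    return zlib.decompress(body).decode("utf-8", errors="replace")


# Address and DirPort taken from the moria1 entry quoted in this thread.
print(consensus_url("128.31.0.39", 9131))
```

Calling `fetch_consensus("128.31.0.39", 9131)` would perform the actual request; it is left out here so the snippet runs without a network connection.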
I just made https://trac.torproject.org/projects/tor/ticket/8913
(tor26 does have the no-v2 flag set which might be the issue?)
(Hm? No it doesn't.)
So the question is, are there accurate specifications anywhere that focus not only on the semantics of cryptography and rationale behind certain choices but also the specifics of how exactly the protocol works
I think the specs do a pretty good job of explaining what is supported, but as you say they don't specify how you should use the protocol.
Some of that is in path-spec.txt.
But see also https://trac.torproject.org/projects/tor/ticket/7106
I have other questions about aspects of the protocol, but I will mostly save those until I understand the basic blocks of it better. To give one example, though: it does seem that the introduction of guard nodes would have the inverse of the desired effect. There appear to be about 1000-1100 guard nodes versus several thousand relays, and about 800-900 exit nodes, so mitigating the attack where an attacker controls C nodes seems essentially pointless, as one would only need to control a set number of guard and exit nodes and could more or less ignore the relays in between. So whereas you needed, say, C/N nodes previously, one would now only need Cg/Ng (Cg controlled guards / Ng number of guards).
It seems you're leaving out Ce/Ne, and/or assuming that the adversary controls/observes the destination also.
I agree that the guard notion is counterintuitive, and also it's not perfect, but I think it's way better than not doing it.
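To see why the Ce/Ne term matters, here is a back-of-the-envelope Python sketch. It assumes relays are picked uniformly at random, which is a simplification (Tor weights selection by bandwidth), and that the adversary needs both ends of a circuit. The totals of 1000 guards and 800 exits come from the rough figures in the original post; the attacker counts of 100 and 80 are invented for illustration.

```python
def end_to_end_compromise(cg, ng, ce, ne):
    """Chance that one circuit uses a hostile guard AND a hostile exit.

    Assumes uniform, independent selection of guard and exit, which is a
    simplification of what Tor actually does.
    """
    return (cg / ng) * (ce / ne)


# Hypothetical attacker: 100 of ~1000 guards and 80 of ~800 exits.
guard_only = 100 / 1000
both_ends = end_to_end_compromise(cg=100, ng=1000, ce=80, ne=800)
print(f"hostile guard alone: {guard_only:.3f}")
print(f"hostile guard and hostile exit: {both_ends:.3f}")
```

Under these assumptions the attacker's share of guards alone overstates the risk; the per-circuit figure is the product of the two fractions.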
If we then factor in that it seems possible that a guard or relay can essentially indirectly control the route a circuit takes through the network by continually causing cell extensions to fail for all relays and exit nodes that they do not control
You might like http://freehaven.net/anonbib/#ccs07-doa (though it doesn't come with analysis of the guard design, and there's still some debate about how the guard design changes things).
You might also like Mike Perry's work on Path Bias detection. See https://trac.torproject.org/projects/tor/ticket/5458 and also look in the changelog for the string "Bias".
This seems entirely feasible for attackers that have state-level resources (What is 1,000 computers to China? Iran? ...the United States?), especially because the algorithm for selecting guards appears to be based entirely on stability and bandwidth, metrics we can expect a government to have plenty of on hand. I understand that rotation is supposed to ease this somewhat, but at least according to the academic paper out of Waterloo that Roger co-authored, it would appear that rotation actually facilitates the compromise of more clients rather than easing the problem, with the most secure (in the lab) situation being a single guard. (I understand that in practice this will not be realistic, as it creates a bottleneck in a variety of ways, e.g. firewalls and DDoS.)
1 guard is better than 3 guards in this respect. But both 1 and 3 are way better than 0.
--Roger
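The comparison of 0, 1, and 3 guards can be illustrated with a small Python sketch. It is a toy model, not the analysis from the papers mentioned in this thread: it assumes a fixed fraction f of entry capacity is hostile, that guards are chosen uniformly and never rotate, and that a client without guards picks a fresh entry relay for every circuit. The function names and the example values (f = 0.1, 100 circuits) are invented for illustration.

```python
def p_any_hostile_guard(f, guards):
    """Chance that at least one of a client's fixed guards is hostile."""
    return 1 - (1 - f) ** guards


def p_hostile_entry_without_guards(f, circuits):
    """Chance that at least one circuit starts at a hostile relay when a
    fresh entry is picked for every circuit (the "0 guards" case)."""
    return 1 - (1 - f) ** circuits


f = 0.1  # hypothetical hostile fraction of entry capacity
print(f"1 guard:  {p_any_hostile_guard(f, 1):.3f}")
print(f"3 guards: {p_any_hostile_guard(f, 3):.3f}")
print(f"0 guards, 100 circuits: {p_hostile_entry_without_guards(f, 100):.3f}")
```

In this model a single guard exposes the fewest clients, three guards expose somewhat more, and picking a new entry for every circuit exposes nearly everyone over time.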