On Thu, Oct 11, 2012 at 5:38 AM, Mike Perry mikeperry@torproject.org wrote:
Also at: https://gitweb.torproject.org/user/mikeperry/torspec.git/blob/mapaddress-che...
Title: Internal Mapaddress for Tor Configuration Testing
Author: Mike Perry
Created: 08-10-2012
Status: Open
Target: 0.2.4.x+
Overview
This proposal describes a method by which we can replace the https://check.torproject.org/ testing service with an internal XML document provided by the Tor client.
Motivation
The Tor Check service is a central point of failure in terms of Tor usability. If it is ever down, or out of sync with the set of exit nodes on the Tor network, user experience is degraded considerably. Moreover, the check itself is very time-consuming: users must wait seconds or more for the result to come back. Worse still, if the user's software *was* in fact misconfigured, the check.torproject.org DNS resolution and request leak out onto the network.
Design Overview
The system will have three parts: an internal hard-coded IP address mapping (127.84.111.114:80), a hard-coded mapaddress to a DNS name (selftest.torproject.org:80), and a DirPortFrontPage-style simple HTTP server that serves an XML document for both addresses.
The use of XML and HTTP here are both reasons for some unhappiness. Both of them pull in a fair amount of complexity that I'd prefer not to need. (Yes, Tor already has a sort of an HTTP implementation, but at least clients aren't currently required to run what amounts to a local HTTP server.)
I seriously wonder whether the benefits of HTTP (easier to access from within a locked-down web browser environment) aren't actually the _defects_ of HTTP here: it's easier to poke it from a web page.
I understand that your design takes some steps to prevent browser-based attacks on this, but I'm not currently sure how to convince myself that it solves them all. Right now, I'm nervous.
Upon receipt of a request to the IP address mapping, the system will create a new 128-bit randomly generated nonce and provide it in the XML document.
Requests to http://selftest.torproject.org/ must include a valid, recent nonce as the GET URL path. Upon receipt of a valid nonce, it is removed from the list of valid nonces. Nonces are only valid for 60 seconds or until SIGNAL NEWNYM, whichever comes first.
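As a rough sketch of the bookkeeping described above (128-bit nonces, a 60-second lifetime, single use, a cap on pending nonces, and a flush on NEWNYM), something like the following could work. All names here are hypothetical; Tor itself is C, and this is purely illustrative:

```python
import os
import time

NONCE_BYTES = 16          # 128-bit nonces
NONCE_LIFETIME = 60       # seconds; configurable in torrc per the proposal
MAX_PENDING = 10          # cap on the pending-nonce list

class NonceStore:
    """Hypothetical bookkeeping for the dns-nonce scheme sketched above."""

    def __init__(self):
        self.pending = {}  # nonce (hex string) -> issue time

    def issue(self, now=None):
        now = time.time() if now is None else now
        self._expire(now)
        if len(self.pending) >= MAX_PENDING:
            # Drop the oldest pending nonce to stay under the cap.
            oldest = min(self.pending, key=self.pending.get)
            del self.pending[oldest]
        nonce = os.urandom(NONCE_BYTES).hex()
        self.pending[nonce] = now
        return nonce

    def redeem(self, nonce, now=None):
        """Single use: a valid nonce is removed as soon as it is presented."""
        now = time.time() if now is None else now
        self._expire(now)
        return self.pending.pop(nonce, None) is not None

    def flush(self):
        """Called on SIGNAL NEWNYM."""
        self.pending.clear()

    def _expire(self, now):
        self.pending = {n: t for n, t in self.pending.items()
                        if now - t < NONCE_LIFETIME}
```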
So, I'm not totally sure what the nonce field is for. The idea as I understand it is that when you connect to the IPv4 address, you get a nonce, and later when you connect to the hostname, you provide that nonce, and Tor tells you "yes" if you gave it the same nonce.
What does that protect against? My first thought is that you're trying to prevent the case where a malicious local DNS server maps "selftest.torproject.org" to some IP address in their control, and then just runs a server at that IP address to say "yes I'm Tor". But that doesn't make sense, since you could just make one of those that said "yes I'm Tor" no matter what you say for the nonce.
Also, how useful is the follow-up DNS check? If it's checking that DNS leaks aren't happening... You're going to need torbrowser or something of equivalent complexity for this to work at all; isn't it easier then for torbrowser to make sure that it set up SOCKS?
The list of pending nonces should not be allowed to grow beyond 10 entries.
This means that any webpage could flush out the list of pending nonces. Does that matter?
The timeout period and nonce limit should be configurable in torrc.
Design: XML document format for http://127.84.111.114
[...]
Security Considerations
XML was chosen over JSON due to the risks of the identifier leaking in a way that could enable websites to track the user[1].
Well, that's a nuclear-powered-flyswatter!
If I read that page right, the problem with using JSON is that it can be parsed and executed as Javascript, and the advantage of XML is that it's unlikely to be syntactically correct javascript, then maybe instead we should
If that's the issue, I'd strongly suggest that instead of going with a more complex data format, we could add a layer of encoding over the json, or use an even simpler format.
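One such encoding layer (an illustration, not something the proposal specifies) is the common trick of prefixing the JSON with a string that is a JavaScript syntax error, so that a cross-site <script> inclusion fails before the payload can execute; cooperating clients strip the prefix before parsing:

```python
import json

# A prefix that makes the payload a JavaScript syntax error if a
# hostile page tries to execute it via a <script src=...> inclusion.
PREFIX = ")]}'\n"

def encode(obj) -> str:
    return PREFIX + json.dumps(obj)

def decode(text: str):
    if not text.startswith(PREFIX):
        raise ValueError("missing anti-hijacking prefix")
    return json.loads(text[len(PREFIX):])
```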
Because there are many exceptions and circumvention techniques to the same-origin policy, we have also opted for strict controls on dns-nonce lifetimes and usage, as well as validation of the Host header and SOCKS4A request hostnames.
Of course, this all comes down to the fact that we're using http. Can we spell out why we need HTTP for this?
Thus spake Nick Mathewson (nickm@alum.mit.edu):
On Thu, Oct 11, 2012 at 5:38 AM, Mike Perry mikeperry@torproject.org wrote:
Design Overview
The system will have three parts: an internal hard-coded IP address mapping (127.84.111.114:80), a hard-coded mapaddress to a DNS name (selftest.torproject.org:80), and a DirPortFrontPage-style simple HTTP server that serves an XML document for both addresses.
The use of XML and HTTP here are both reasons for some unhappiness. Both of them pull in a fair amount of complexity that I'd prefer not to need. (Yes, Tor already has a sort of an HTTP implementation, but at least clients aren't currently required to run what amounts to a local HTTP server.)
I seriously wonder whether the benefits of HTTP (easier to access from within a locked-down web browser environment) aren't actually the _defects_ of HTTP here: it's easier to poke it from a web page.
I understand that your design takes some steps to prevent browser-based attacks on this, but I'm not currently sure how to convince myself that it solves them all. Right now, I'm nervous.
This is a reasonable fear. I think the major risks with the proposal revolve around the need to prevent the nonces from being used as a tracking beacon...
I did my best to protect against this, but we probably could use a few web-heads reviewing it, too.
Upon receipt of a request to the IP address mapping, the system will create a new 128-bit randomly generated nonce and provide it in the XML document.
Requests to http://selftest.torproject.org/ must include a valid, recent nonce as the GET URL path. Upon receipt of a valid nonce, it is removed from the list of valid nonces. Nonces are only valid for 60 seconds or until SIGNAL NEWNYM, whichever comes first.
So, I'm not totally sure what the nonce field is for. The idea as I understand it is that when you connect to the IPv4 address, you get a nonce, and later when you connect to the hostname, you provide that nonce, and Tor tells you "yes" if you gave it the same nonce.
What does that protect against? My first thought is that you're trying to prevent the case where a malicious local DNS server maps "selftest.torproject.org" to some IP address in their control, and then just runs a server at that IP address to say "yes I'm Tor". But that doesn't make sense, since you could just make one of those that said "yes I'm Tor" no matter what you say for the nonce.
*Headdesk*. Doh. Yes, the DNS test needs to be given a transform of the nonce (SHA1? SHA1+salt?), and needs to spit the original back out again in the response for validation by the client.
But yes, that is exactly what we're trying to protect against.
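A minimal sketch of the corrected flow: the client learns the nonce from the IP endpoint, presents only a hash of it to the DNS-name endpoint, and the real Tor (which can look the nonce up by its hash) echoes the original back for the client to verify. An impersonating server that never saw the original nonce cannot produce it from the hash. SHA-256 here is purely illustrative, not the proposal's hash choice:

```python
import hashlib
import os

def transform(data: bytes) -> str:
    # Illustrative one-way transform of the nonce.
    return hashlib.sha256(data).hexdigest()

# Step 1: the IP endpoint hands the client a fresh 128-bit nonce.
nonce = os.urandom(16)

# Step 2: the client sends only the transform to the DNS-name endpoint,
# so a fake "yes I'm Tor" server cannot recover the nonce from the request.
challenge = transform(nonce)

# Step 3: the real Tor remembers the nonce, finds it by its transform,
# and echoes the *original* back.
pending = {transform(nonce): nonce}       # server-side state
response = pending.get(challenge)

# Step 4: client-side validation.
assert response == nonce
```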
Also, how useful is the follow-up DNS check? If it's checking that DNS leaks aren't happening... You're going to need torbrowser or something of equivalent complexity for this to work at all; isn't it easier then for torbrowser to make sure that it set up SOCKS?
Hrmm. I was under the impression most apps have URL fetch capabilities. Pidgin appears to. Thunderbird definitely does. Both have XML deps already (as does any XMPP chat app).
But yes, the plan was for this to be used by custom software we wrote.
The list of pending nonces should not be allowed to grow beyond 10 entries.
This means that any webpage could flush out the list of pending nonces. Does that matter?
Hrmm. Maybe. I was balancing this with other issues:
1. Without any limit, web pages could OOM the Tor client.
2. A website that managed to access this service could track a user for a long period of time by getting a pile of nonces to use, all known to be bound to that user.
We could rely only on a shorter default timeout instead, though.
The timeout period and nonce limit should be configurable in torrc.
Design: XML document format for http://127.84.111.114
[...]
Security Considerations
XML was chosen over JSON due to the risks of the identifier leaking in a way that could enable websites to track the user[1].
Well, that's a nuclear-powered-flyswatter!
If I read that page right, the problem with using JSON is that it can be parsed and executed as Javascript, and the advantage of XML is that it's unlikely to be syntactically correct javascript, then maybe instead we should
Assuming "write our own format." finishes this paragraph.
If that's the issue, I'd strongly suggest that instead of going with a more complex data format, we could add a layer of encoding over the json, or use an even simpler format.
I wanted to avoid requiring our clients write parsers, and everything I could think of already parses XML.
But if you think hand-parsing is less dangerous than relying on an XML lib, we can do line-based key=value instead.
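For reference, a hand parser for such a line-based key=value format can stay very small; a sketch follows (the field names in the test are invented for illustration):

```python
def parse_kv(text: str) -> dict:
    """Parse a line-based key=value format.

    Blank lines are skipped; a line without '=' is rejected rather
    than silently ignored, so a truncated or mangled response cannot
    quietly pass the check.
    """
    result = {}
    for lineno, line in enumerate(text.splitlines(), 1):
        line = line.strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"malformed line {lineno}: {line!r}")
        result[key.strip()] = value.strip()
    return result
```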
Because there are many exceptions and circumvention techniques to the same-origin policy, we have also opted for strict controls on dns-nonce lifetimes and usage, as well as validation of the Host header and SOCKS4A request hostnames.
Of course, this all comes down to the fact that we're using http. Can we spell out why we need HTTP for this?
See https://trac.torproject.org/projects/tor/ticket/6546#comment:18 and the following comment.
Do you want that in the proposal, you mean?
On Mon, Oct 15, 2012 at 4:38 PM, Mike Perry mikeperry@torproject.org wrote: [...]
What does that protect against? My first thought is that you're trying to prevent the case where a malicious local DNS server maps "selftest.torproject.org" to some IP address in their control, and then just runs a server at that IP address to say "yes I'm Tor". But that doesn't make sense, since you could just make one of those that said "yes I'm Tor" no matter what you say for the nonce.
*Headdesk*. Doh. Yes, the DNS test needs to be given a transform of the nonce (SHA1? SHA1+salt?), and needs to spit the original back out again in the response for validation by the client.
But yes, that is exactly what we're trying to protect against.
Okay. So to write up crypto/protocols that work, you need to start by writing up the security features you actually get from your protocol: what the client needs to do, what the attacker might do, and so forth.
Let's say we want the property where a client who has connected via an IP address learns something that the client can use to conclude that it is talking to the same Tor when it connects by hostname. Let's say that the attacker *can* cause the client to make connections to Tor-by-IP or Tor-by-hostname, but can't learn or interfere with the content. Let's say that the attacker can't make his own connections to either Tor-by-IP or Tor-by-hostname. Let's say that the attacker _can_ impersonate Tor-by-hostname.
Is that about right?
Incidentally: No new SHA1 in Tor!
Also, how useful is the follow-up DNS check? If it's checking that DNS leaks aren't happening... You're going to need torbrowser or something of equivalent complexity for this to work at all; isn't it easier then for torbrowser to make sure that it set up SOCKS?
Hrmm. I was under the impression most apps have URL fetch capabilities. Pidgin appears to. Thunderbird definitely does. Both have XML deps already (as does any XMPP chat app).
But yes, the plan was for this to be used by custom software we wrote.
So, why can't this custom software just check the SOCKS settings?
(Sure, there might be a SOCKS bypass. But there might also be a SOCKS bypass anywhere in the application that doesn't use the same path to the network as the URL fetch code.)
The list of pending nonces should not be allowed to grow beyond 10 entries.
This means that any webpage could flush out the list of pending nonces. Does that matter?
Hrmm. Maybe. I was balancing this with other issues:
1. Without any limit, web pages could OOM the Tor client.
2. A website that managed to access this service could track a user for a long period of time by getting a pile of nonces to use, all known to be bound to that user.
We could rely only on a shorter default timeout instead, though.
Or design something that uses less server-side memory.
Like, instead of remembering every N, you could construct each N as "r, HMAC(X,r)" where r is a one-off random value and X is an HMAC key Tor creates at startup. Then you could recognize all of the N that you generated without having to remember more than a single HMAC key.
(This doesn't solve the actual protocol problem, but it does show how you can avoid storage issues.)
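The construction sketched above, in runnable form (key and value sizes are assumptions, not from the thread):

```python
import hashlib
import hmac
import os

X = os.urandom(32)   # HMAC key Tor creates once at startup

def make_nonce() -> bytes:
    r = os.urandom(16)                                 # one-off random value
    return r + hmac.new(X, r, hashlib.sha256).digest() # N = r || HMAC(X, r)

def check_nonce(n: bytes) -> bool:
    # Recognize any N we generated without storing it: recompute the tag.
    r, tag = n[:16], n[16:]
    expected = hmac.new(X, r, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)
```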
[...]
If that's the issue, I'd strongly suggest that instead of going with a more complex data format, we could add a layer of encoding over the json, or use an even simpler format.
I wanted to avoid requiring our clients write parsers, and everything I could think of already parses XML.
But if you think hand-parsing is less dangerous than relying on an XML lib, we can do line-based key=value instead.
Not all hand-parsing is less dangerous/bloaty than XML, but some sure is.
Because there are many exceptions and circumvention techniques to the same-origin policy, we have also opted for strict controls on dns-nonce lifetimes and usage, as well as validation of the Host header and SOCKS4A request hostnames.
Of course, this all comes down to the fact that we're using http. Can we spell out why we need HTTP for this?
See https://trac.torproject.org/projects/tor/ticket/6546#comment:18 and the following comment.
Do you want that in the proposal, you mean?
Yeah, and also we should discuss it.
The argument as I understand it is that your browser's TCP sockets API is not guaranteed to use the same proxies as the browser uses for http URL access.
(Of course, that goes the other way: if we were trying this for something like a chat client, there would be no guarantee that the URL access would use the same proxies as are used for regular chat.)
I don't think, though, that "Are my socks proxies configured right?" is the primary use for this tool. Any application we write had *better* get the socks proxies right, and verify that they're right, and audit to make sure they're not bypassable, etc etc. The "is my Tor running" and "can my Tor build circuits" questions seem much more useful.
Thus spake Nick Mathewson (nickm@alum.mit.edu):
On Mon, Oct 15, 2012 at 4:38 PM, Mike Perry mikeperry@torproject.org wrote: [...]
What does that protect against? My first thought is that you're trying to prevent the case where a malicious local DNS server maps "selftest.torproject.org" to some IP address in their control, and then just runs a server at that IP address to say "yes I'm Tor". But that doesn't make sense, since you could just make one of those that said "yes I'm Tor" no matter what you say for the nonce.
*Headdesk*. Doh. Yes, the DNS test needs to be given a transform of the nonce (SHA1? SHA1+salt?), and needs to spit the original back out again in the response for validation by the client.
But yes, that is exactly what we're trying to protect against.
Okay. So to write up crypto/protocols that work, you need to start by writing up the security features you actually get from your protocol: what the client needs to do, what the attacker might do, and so forth.
Let's say we want the property where a client who has connected via an IP address learns something that the client can use to conclude that it is talking to the same Tor when it connects by hostname. Let's say that the attacker *can* cause the client to make connections to Tor-by-IP or Tor-by-hostname, but can't learn or interfere with the content. Let's say that the attacker can't make his own connections to either Tor-by-IP or Tor-by-hostname. Let's say that the attacker _can_ impersonate Tor-by-hostname.
Is that about right?
Yeah.
However, I am beginning to wonder if the nonce complexity is worth it at all. It sure is hard to get right, and check.tp.o already does *not* tell you if your DNS is configured properly today.
This is making me think we should table this multi-request nonce idea and instead just focus on the simple case: Replacing check with a local IP-only test.
We can then consider bringing the nonce+DNS test back later on, if we decide we do actually want the DNS test.
The list of pending nonces should not be allowed to grow beyond 10 entries.
This means that any webpage could flush out the list of pending nonces. Does that matter?
Hrmm. Maybe. I was balancing this with other issues:
1. Without any limit, web pages could OOM the Tor client.
2. A website that managed to access this service could track a user for a long period of time by getting a pile of nonces to use, all known to be bound to that user.
We could rely only on a shorter default timeout instead, though.
Or design something that uses less server-side memory.
Like, instead of remembering every N, you could construct each N as "r, HMAC(X,r)" where r is a one-off random value and X is an HMAC key Tor creates at startup. Then you could recognize all of the N that you generated without having to remember more than a single HMAC key.
(This doesn't solve the actual protocol problem, but it does show how you can avoid storage issues.)
But then, wouldn't we need to have a way to handle expiry for r, or you could be tracked by simple replay? That's the main reason I opted against making the nonce generated by a function. I suppose an HMAC construction would allow us to encode a timestamp as part of that r, though...
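Folding a timestamp into r, as suggested above, would let stale nonces be rejected without any server-side list. A sketch with illustrative sizes follows; note that replay *within* the lifetime window would still need separate handling:

```python
import hashlib
import hmac
import os
import struct
import time

X = os.urandom(32)   # per-run HMAC key
LIFETIME = 60        # seconds, matching the proposal's nonce timeout

def make_nonce(now=None) -> bytes:
    now = int(time.time() if now is None else now)
    r = struct.pack(">Q", now) + os.urandom(8)          # r = timestamp || random
    return r + hmac.new(X, r, hashlib.sha256).digest()  # N = r || HMAC(X, r)

def check_nonce(n: bytes, now=None) -> bool:
    now = int(time.time() if now is None else now)
    r, tag = n[:16], n[16:]
    if not hmac.compare_digest(tag, hmac.new(X, r, hashlib.sha256).digest()):
        return False                                    # not one of ours
    (issued,) = struct.unpack(">Q", r[:8])
    return 0 <= now - issued < LIFETIME                 # reject expired replays
```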
Because there are many exceptions and circumvention techniques to the same-origin policy, we have also opted for strict controls on dns-nonce lifetimes and usage, as well as validation of the Host header and SOCKS4A request hostnames.
Of course, this all comes down to the fact that we're using http. Can we spell out why we need HTTP for this?
See https://trac.torproject.org/projects/tor/ticket/6546#comment:18 and the following comment.
Do you want that in the proposal, you mean?
Yeah, and also we should discuss it.
The argument as I understand it is that your browser's TCP sockets API is not guaranteed to use the same proxies as the browser uses for http URL access.
(Of course, that goes the other way: if we were trying this for something like a chat client, there would be no guarantee that the URL access would use the same proxies as are used for regular chat.)
I don't think, though, that "Are my socks proxies configured right?" is the primary use for this tool. Any application we write had *better* get the socks proxies right, and verify that they're right, and audit to make sure they're not bypassable, etc etc. The "is my Tor running" and "can my Tor build circuits" questions seem much more useful.
Should we forget the nonce+DNS stuff then, and just scale this back to a simpler local-IP HTTP status port?