Hi Chang,
We've been discussing how to build better pluggable transports for Tor as part of your application to Google Summer of Code. Now that you've been accepted, I thought it would be good to bring this discussion to tor-dev so that others can contribute.
The basic idea behind the project is to build a pluggable transport which is as unlikely as possible to be blocked. In particular, we are most interested in improving the situation in countries where obfs2/obfs3 is blocked or likely to be blocked in the near future.
There are two basic options, which would ideally be combined:
- Better camouflaging Tor traffic
- Scanning resistance
On better camouflaging Tor traffic, the benchmark we have is obfs3, which converts Tor traffic to data which is indistinguishable from random bytes (but timing and packet-size patterns are not disguised). As far as I'm aware, this is not being blocked anywhere but it may be possible to block based on the fact that there's not much truly random data on the Internet. Also, obfs3 will not get through a HTTP proxy, as it is clearly not HTTP.
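To make the "not much truly random data" point concrete, here is a minimal sketch of how byte entropy separates a plaintext HTTP request from random-looking bytes. The function names and the 7.9 threshold are assumptions made for illustration, not a description of any deployed filter, and properly encrypted payloads would score high as well.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of `data` in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_uniformly_random(data: bytes, threshold: float = 7.9) -> bool:
    """Hypothetical check: flag payloads whose byte entropy is close to 8."""
    return shannon_entropy(data) >= threshold


if __name__ == "__main__":
    # Plaintext HTTP has a small alphabet, so its entropy is well below 8.
    http_like = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n" * 100
    # os.urandom stands in for obfs3-style output here (an assumption).
    random_like = os.urandom(65536)
    print(shannon_entropy(http_like), looks_uniformly_random(http_like))
    print(shannon_entropy(random_like), looks_uniformly_random(random_like))
```

A classifier this crude would also flag compressed or encrypted traffic, which is why the concern above is about what a censor could afford to block rather than about detection alone.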
So one option for the project is to impersonate HTTP. This is deceptively difficult because although HTTP is transmitted over TCP, the properties it offers to higher layers are not as strong as TCP (and not as required by Tor). For instance, individual HTTP requests may be re-ordered if they are over different TCP connections. Also, responses may be truncated without an error being reported to higher layers (which is why HTTP includes length fields as an option). HTTP doesn't give the same congestion avoidance as TCP and proxies can both cache and modify data they transmit. The HTTP specification is vague on some topics, and even when it specifies a particular behaviour, proxies frequently violate the specification.
On the up-side of a HTTP proxy, HTTP is probably one of the last protocols a country will block before they turn off the Internet completely, so it has a good chance of getting through. Also, in some scenarios the only way for traffic to get out is via a HTTP proxy. So I think there is significant usefulness in this option. Also, it is incredibly difficult to hide one protocol inside a different one, because just recording the approximate number of bytes sent vs bytes received can give a good recognition of the protocol [4]. Hiding HTTP-over-Tor as BitTorrent traffic will likely be detectable based on such a statistic, but HTTP-over-Tor as HTTP at least has a chance.
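The bytes-sent versus bytes-received statistic can be sketched in a few lines. The profile ranges below are made-up illustrative numbers, not measurements from [4] or anywhere else, and `FlowStats` and `plausible_for_profile` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class FlowStats:
    bytes_up: int    # bytes sent by the client
    bytes_down: int  # bytes received by the client


def upstream_fraction(flow: FlowStats) -> float:
    """Fraction of all bytes in the flow that went upstream."""
    total = flow.bytes_up + flow.bytes_down
    return flow.bytes_up / total if total else 0.0


# Illustrative assumption: web browsing is mostly downstream,
# while a peer-to-peer flow is closer to symmetric.
WEB_LIKE: Tuple[float, float] = (0.0, 0.3)
P2P_LIKE: Tuple[float, float] = (0.3, 0.7)


def plausible_for_profile(flow: FlowStats, profile: Tuple[float, float]) -> bool:
    """True if the flow's upstream fraction falls inside the profile range."""
    low, high = profile
    return low <= upstream_fraction(flow) <= high


if __name__ == "__main__":
    browsing_over_tor = FlowStats(bytes_up=5_000, bytes_down=95_000)
    print(plausible_for_profile(browsing_over_tor, WEB_LIKE))
    print(plausible_for_profile(browsing_over_tor, P2P_LIKE))
```

Under these assumed ranges, a browsing-shaped flow fits the web profile but not the peer-to-peer one, which is the mismatch a cover protocol would have to hide.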
On the side of scanning resistance, we discussed the challenge of implementing scanning resistance with TCP. Here, the problem is that someone sending a SYN packet to a port which is open will receive a SYN-ACK, regardless of what user code does. To resist scanning (with something like BridgeSPA [1]) the pluggable transport would need to be quite tightly integrated with the OS rather than just using the standard socket API. Therefore it will create deployment difficulties, especially on Windows, which has locked down the raw sockets API.
Therefore it might be interesting to send data over UDP rather than TCP, as then it is the responsibility of the user code to send the SYN-ACK-equivalent. Tor needs properties similar to TCP from its pluggable transport, and so any UDP-pluggable transport would need something which replaces TCP: reliable in-order delivery with congestion management. One option here is libutp, as used by BitTorrent [2]. There is a vast amount of libutp traffic on the Internet, but its timing and upstream/downstream characteristics will be different from how Tor would use it. Alternatively, it might not be worth worrying about this type of scanning resistance, and just focus on what it is possible to do with TCP, as done with ScrambleSuit [3].
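As a rough sketch of what "user code sends the SYN-ACK-equivalent" could mean over UDP, the listener below stays silent unless the first datagram carries a valid HMAC over a nonce. The packet layout, the shared secret and the function names are assumptions for illustration; this is not BridgeSPA, libutp or ScrambleSuit, and it has no replay protection.

```python
import hashlib
import hmac
import os
import socket

NONCE_LEN = 16
TAG_LEN = 32  # size of a SHA-256 HMAC tag


def make_knock(secret: bytes) -> bytes:
    """Client side: build a first datagram of nonce || HMAC(secret, nonce)."""
    nonce = os.urandom(NONCE_LEN)
    tag = hmac.new(secret, nonce, hashlib.sha256).digest()
    return nonce + tag


def is_valid_knock(secret: bytes, datagram: bytes) -> bool:
    """Server side: accept only datagrams with the right length and tag."""
    if len(datagram) != NONCE_LEN + TAG_LEN:
        return False
    nonce, tag = datagram[:NONCE_LEN], datagram[NONCE_LEN:]
    expected = hmac.new(secret, nonce, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)


def serve_once(sock: socket.socket, secret: bytes) -> bool:
    """Read one datagram and reply only if the knock is valid."""
    datagram, peer = sock.recvfrom(2048)
    if not is_valid_knock(secret, datagram):
        return False  # stay silent, so a scanner gets no reply from us
    sock.sendto(b"ok", peer)  # the "SYN-ACK-equivalent" in this sketch
    return True
```

A real transport would still need the reliable, in-order, congestion-controlled layer described above on top of this, plus a way to stop a censor from replaying a recorded knock.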
Chang, George, do you have anything to add to this summary? Does anyone else on tor-dev have thoughts on these topics?
Best wishes, Steven
[1] http://www.cypherpunks.ca/~iang/pubs/bridgespa-wpes.pdf
[2] https://github.com/bittorrent/libutp
[3] http://www.cs.kau.se/philwint/pdf/scramblesuit2013.pdf
[4] http://dedis.cs.yale.edu/2010/anon/papers/parrot.pdf
I have another idea. (Not "another" in the sense of "do this instead", but "another" in the sense of "maybe do this additionally").
Can a country block SSH? Surely state-sponsored network operations take place over SSH, so I suspect a country cannot block it quickly, easily, and without internal retaliation from its legitimate users. Bureaucracy.
What if one of the obfuscated proxy protocols was SSH? And not "Let's make Tor look like SSH" but "Let's just put Tor inside of SSH". Have obfsssh-client use libssh to connect to an obfsssh bridge. When you get the Bridge IP, you also get an SSH private key to connect to the SSH daemon.
On the server side, obfsssh-server makes sure SSH is listening on port whatever, and connected to Tor's local port 50000 (using the -W option). When you log in, the client just talks Tor into the SSH stream, and on the server it's passed right into Tor.
This also very neatly resolves the issue of "What if a censor tries probing an obfs-bridge to see if it's a Tor bridge?" The server is a perfectly legitimate SSH daemon showing a perfectly legitimate key-based login prompt. Unless the censor has the private key, they can't log in. The key is distributed next to the bridge IP - it's not intended to be any more secret than the IP.
I think the advantages of this are:
1) While it does require some development effort - it's not nearly as much as other proposals. Accordingly it's lightweight. It's easy to deploy and experiment with and just see if or how it works.
2) It allows us to test a theory: that if we can identify a particular service or protocol that a censored country's government relies on, we can disguise Tor as it, and make it painful or difficult for them to block.
I brainstormed this with Daniel Kahn Gillmor, Philipp Winter, Isis, and a few others in Hong Kong. The notes we took are below:
Client Side
- libssh connection using a private key that is distributed w/ the bridge IP
- connect to the server
obfsproxy normally listens on 1337. Two options:
1) ssh -L 1337 and obfsproxy doesn't listen on anything
2) obfsproxy listens on 1337 and takes the data and passes it to ssh -W
ssh -W keeps the same mechanisms that obfsproxy uses so it's preferable
Server Side
- tor runs on local port 50000
- ssh -W tells ssh on the other side to connect to the tor port
- obfsproxy does not touch data on the server side
obfsproxy does not open a port; it sits there making sure:
- tor is running
- tor is configured right
- ssh is listening on the correct port
- ssh is configured right
  - this includes checking that MaxSessions is appropriately sane
- users can auth to ssh using the private key that is expected
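The two simplest checks in that list (tor is running, ssh is listening) could look roughly like the sketch below. It only tests that something accepts TCP connections on the given ports; the configuration checks (MaxSessions, key auth) are not covered. The function names and the default SSH port of 22 are assumptions, since the notes only say "the correct port".

```python
import socket
from typing import Dict


def port_is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_bridge(tor_port: int = 50000, ssh_port: int = 22,
                 host: str = "127.0.0.1") -> Dict[str, bool]:
    """Hypothetical server-side watchdog: report which services are reachable."""
    return {
        "tor": port_is_listening(host, tor_port),
        "ssh": port_is_listening(host, ssh_port),
    }


if __name__ == "__main__":
    print(check_bridge())
```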
Open questions:
- Should we use ssh -L or ssh -W on the client side? (Probably -W)
- Is the -W option (the control messages) in the clear or in the encrypted transport? If it's in the clear, this can be fingerprintable
Libraries
- paramiko does SSH server and SSH client, could use it
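For option 2 in the notes (the client-side proxy hands its data to ssh -W), a minimal sketch using the stock ssh client rather than libssh or paramiko might look like this. The login name "tor", the example address and the key path are hypothetical; only -i, -p and -W are real ssh options, and the -W target is resolved on the bridge, so 127.0.0.1:50000 refers to Tor's local port there.

```python
import subprocess
from typing import List


def build_ssh_w_command(bridge_host: str, bridge_port: int, key_path: str,
                        user: str = "tor", tor_port: int = 50000) -> List[str]:
    """Build the argv for an ssh client that forwards stdin/stdout to Tor."""
    return [
        "ssh",
        "-i", key_path,                  # private key handed out with the bridge IP
        "-p", str(bridge_port),          # port the bridge's sshd listens on
        "-W", f"127.0.0.1:{tor_port}",   # forward stdio to Tor on the bridge
        f"{user}@{bridge_host}",
    ]


def open_tunnel(argv: List[str]) -> subprocess.Popen:
    """Start ssh; the caller writes Tor traffic to stdin and reads from stdout."""
    return subprocess.Popen(argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE)


if __name__ == "__main__":
    # 192.0.2.10 is a documentation-only address used as a placeholder.
    print(build_ssh_w_command("192.0.2.10", 22, "/path/to/bridge_key"))
```

Shelling out to ssh keeps the sketch short; a library such as paramiko would avoid depending on an installed client, which matters on Windows.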
-tom
How good are SSH connections at hiding what's inside?
Website fingerprinting has demonstrated that SSH connections may hide communication contents, but which website was visited can still be guessed with fairly good results.
Tor isn't a website, but if SSH leaks which website has been visited even when using an SSH tunnel, will it also leak the fact that someone is using Tor through an SSH tunnel?
On 28 May 2013 14:51, adrelanos adrelanos@riseup.net wrote:
How good are SSH connections at hiding what's inside?
Website fingerprinting has demonstrated that SSH connections may hide communication contents, but which website was visited can still be guessed with fairly good results.
Tor isn't a website, but if SSH leaks which website has been visited even when using an SSH tunnel, will it also leak the fact that someone is using Tor through an SSH tunnel?
I think that if we make the adversary upgrade from probing and byte matching (e.g. look for specific ciphersuites) to statistical protocol modeling, especially with a small time investment on our part, we have won a battle. Development effort isn't free.
You probably can detect Tor traffic inside of SSH with some probability X after some amount of traffic Y. But what X, what Y, and how much effort on behalf of the adversary will it take? I don't know, but I do think we should work to move the fight beyond something as simple as byte matching.
-tom
Tom Ritter:
I think that if we make the adversary upgrade from probing and byte matching (e.g. look for specific ciphersuites) to statistical protocol modeling, especially with a small time investment on our part, we have won a battle. Development effort isn't free.
You probably can detect Tor traffic inside of SSH with some probability X after some amount of traffic Y. But what X, what Y, and how much effort on behalf of the adversary will it take? I don't know, but I do think we should work to move the fight beyond something as simple as byte matching.
Yes. Don't let me put you off this idea. It was just a wild guess. Most likely an ssh transport will always work for a few people, and that is already an improvement. The more pluggable transports, the better. Maybe if there are enough transports, the other side just gives up.
On 2013-05-28 4:42 PM, adrelanos wrote:
The more pluggable transports, the better. Maybe if there are enough transports, the other side just gives up.
My interest is piqued by this statement and similar-sounding ones that I hear, and also think myself, when talking about censorship. I suspect that if certain information leakage events (ILEs) are important to the censor, then giving up is not an option for them, even if it means doing the hard(?) task of blocking all the pluggable transports. Of course this is all conjecture---I don't know the censor---which brings me to my main point:
It is important that we have a model of which censor we are wanting to defeat (or at least annoy). I don't mean every censor or every use case, just the one we are currently discussing. Also, we can have different descriptions of the same censor depending on the situation. The same censor can bring to bear very different tools depending on if the users are being annoying enough, or if ultimately the problem can be dealt with by non-Internet means. This also allows us to talk about the same censor and produce different censorship solutions depending on our goals and the censorship conditions that apply. We can through our actions (like becoming popular) shift the censorship context and thus have to reevaluate our solution.
I know that it is fiendishly difficult to get correct but it would help the discussion if we knew exactly what we're up against, at least in qualitative terms.
My attempt from what I think we're talking about, please correct/add to this where I err:
Circumvention strategies:
0. Collateral damage and obfuscation.
Censor Capabilities:
1. View all traffic coming in and out of a network (most likely has visibility of all AS and IX level traffic). We'll call this the visibility bubble.
2. Can manipulate (add, delete, change) said traffic in time and data dimensions.
Motivations:
3. Block *all* information leakage events. This means if even one ILE occurs the circumventor wins.
4. Limit collateral damage but some is acceptable.
Censorship Target:
5. General user population (G) within the visibility of the bubble.
6. Circumventor population (Cr) in visibility bubble.
7. Cr/G << 1; the incidence rate (R).
I think that this censor, while in a seemingly powerful position due to 1 and 2, is in a difficult dilemma due to 3 and 4, especially if 7 is a small number. Of course if we relax the condition of blocking *all* ILEs then the situation becomes more favorable for the censor.
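A small worked example of that dilemma, with made-up numbers: if the censor's detector catches most circumventors but also misfires on a small fraction of everyone else, a low incidence rate means the collateral damage dwarfs the number of circumventors blocked. The detection rates, the population sizes, and the choice to count non-circumventors as G minus Cr are all assumptions for illustration.

```python
from typing import NamedTuple


class Outcome(NamedTuple):
    caught: float      # circumventors blocked
    missed: float      # circumventors who still get through (ILEs)
    collateral: float  # ordinary users blocked by mistake


def censor_outcome(general: int, circumventors: int,
                   true_positive_rate: float,
                   false_positive_rate: float) -> Outcome:
    """Expected result of applying one detector to the whole population."""
    others = general - circumventors
    caught = true_positive_rate * circumventors
    missed = circumventors - caught
    collateral = false_positive_rate * others
    return Outcome(caught, missed, collateral)


if __name__ == "__main__":
    # Hypothetical: G = 1,000,000, Cr = 1,000 (so R = 0.001),
    # a detector with 99% detection and 1% false positives.
    result = censor_outcome(1_000_000, 1_000, 0.99, 0.01)
    print(result)
    print("share of blocked users who were circumventors:",
          result.caught / (result.caught + result.collateral))
```

With these assumed numbers roughly ten circumventors still get through while about ten ordinary users are blocked for every circumventor caught, so motivations 3 and 4 pull in opposite directions.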
I hope that descriptions such as the above help identify the issues at hand and focus on what is pertinent. I suspect that with Tor being useful to a diverse user-base the censorship scenarios are just as varied and the solutions (even within the pluggable transport space) can be useful in ways we did not think of.
Cheers, -mtee
On Tue, May 28, 2013 at 07:55:45PM -0400, Tariq Elahi wrote:
2. Can manipulate (add, delete, change) said traffic in time and data dimensions.
The challenge is to predict what can actually be done with these three simple atoms. Be it terminating non-whitelisted TCP connections after 60 seconds, hijacking TCP connections after authentication or actively probing suspicious traffic.
Motivations: 3. Block *all* information leakage events. This means if even one ILE occurs the circumventor wins.
I suppose, in practice it's absolutely sufficient to block most of it. Plenty of deployed censorship systems are trivial to circumvent by exploiting specific DPI shortcomings (should we call it "spear circumvention"?). But only if you have the knowledge to do that. If only the very small technical elite is able to bypass the filters, you effectively win.
There's also a social component. If you, as a censor, can spread enough FUD about the national filter, people might not even try to circumvent it.
Cheers, Philipp
On 2013-05-29 5:48 AM, Philipp Winter wrote:
On Tue, May 28, 2013 at 07:55:45PM -0400, Tariq Elahi wrote:
2. Can manipulate (add, delete, change) said traffic in time and data dimensions.
The challenge is to predict what can actually be done with these three simple atoms. Be it terminating non-whitelisted TCP connections after 60 seconds, hijacking TCP connections after authentication or actively probing suspicious traffic.
It is challenging to predict, but since the censor is a black box we can only make assumptions and hope that they are overestimates of the censor's capabilities, that is, if the capabilities can be ordered (partially or totally).
Also, Tor tries to satisfy a range of users behind a range of censorship regimes. Circumvention solutions that work with Tor inherit this diverse user base. Balancing diversity of user base (with censorship regime) with efficacy of circumvention is something that needs further looking into.
Motivations: 3. Block *all* information leakage events. This means if even one ILE occurs the circumventor wins.
I suppose, in practice it's absolutely sufficient to block most of it. Plenty of deployed censorship systems are trivial to circumvent by exploiting specific DPI shortcomings (should we call it "spear circumvention"?).
I like that. Like a spear it attacks one weak link, but like a spear it doesn't catch (feed) much.
But only if you have the knowledge to do that. If only the very small technical elite is able to bypass the filters, you effectively win.
Going back to the point above, the tech elite are just some of the Tor user base. If this is who the circumvention system should serve then awesome. If not then we have more thinking to do.
There's also a social component. If you, as a censor, can spread enough FUD about the national filter, people might not even try to circumvent it.
This is true. FUD works. But I don't think that is something we can address through technological means, unless we're talking about keeping people anonymous so that they may test the FUD without repercussions.
mtee
On Tue, May 28, 2013 at 02:33:40PM -0400, Tom Ritter wrote:
Can a country block SSH? Surely state-sponsored network operations take place over SSH, so I suspect a country cannot block it quickly, easily, and without internal retaliation from its legitimate users. Bureaucracy.
There would be rate-limiting. While not touching latency for SSH connections, a censor could rate-limit the throughput. That way, "normal" SSH would still work while bulk file transfers (such as Tor tunneled over SSH) would become a pain.
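A token bucket is one common way such throughput limiting is done, and it shows why interactive use would survive while bulk transfer suffers. This is a generic sketch with arbitrary numbers, not a description of any censor's actual equipment; the clock is passed in explicitly so the behaviour is easy to follow.

```python
class TokenBucket:
    """Allow short bursts, but cap sustained throughput at `rate` bytes/s."""

    def __init__(self, rate_bytes_per_s: float, burst_bytes: float) -> None:
        self.rate = rate_bytes_per_s
        self.capacity = burst_bytes
        self.tokens = burst_bytes
        self.last = 0.0  # time of the previous call, in seconds

    def allow(self, nbytes: int, now: float) -> bool:
        """Return True if a packet of `nbytes` may pass at time `now`."""
        elapsed = now - self.last
        self.last = now
        # Refill in proportion to elapsed time, never beyond the burst size.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate_bytes_per_s=1000, burst_bytes=2000)
    # Small keystroke-sized packets pass; back-to-back large ones do not.
    print(bucket.allow(100, now=0.0))
    print(bucket.allow(1500, now=0.0))
    print(bucket.allow(1500, now=0.1))
```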
Nevertheless, I think this is an interesting idea and worth exploring further.
Cheers, Philipp
On Tue, May 28, 2013 at 02:33:40PM -0400, Tom Ritter wrote:
I have another idea. (Not "another" in the sense of "do this instead", but "another" in the sense of "maybe do this additionally").
Can a country block SSH? Surely state-sponsored network operations take place over SSH, so I suspect a country cannot block it quickly, easily, and without internal retaliation from its legitimate users. Bureaucracy.
I assume they're more likely to simply whitelist IP addrs than allow unimpeded connectivity for all. As Philipp said, SSL/TLS connections are throttled heavily, if not entirely blocked, in similar situations.
What if one of the obfuscated proxy protocols was SSH? And not "Let's make Tor look like SSH" but "Let's just put Tor inside of SSH". Have obfsssh-client use libssh to connect to an obfsssh bridge. When you get the Bridge IP, you also get an SSH private key to connect to the SSH daemon.
We could go slightly further than this and build into the PT a rekey-gen period such that bridge (IP addr, priv key) pairs are only valid for a certain amount of time. We could even increase this to a (addr, privkey, port) triplet, if we're ambitious. (Ideally BridgeDB will support similar limited-time tokens for most/all future PTs, also).
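One way to picture the time-limited (addr, port, credential) idea is below. It swaps in an HMAC-derived token for the SSH private key, purely to show the time-window logic; actual SSH rekeying would mean generating and distributing new key pairs. The names, the one-day default period and the master secret are assumptions.

```python
import hashlib
import hmac


def epoch_index(unix_time: float, period_s: int) -> int:
    """Number of whole validity periods elapsed since the Unix epoch."""
    return int(unix_time // period_s)


def derive_token(master_secret: bytes, bridge_addr: str, port: int,
                 unix_time: float, period_s: int = 86400) -> str:
    """Derive a credential bound to (addr, port) and the current period."""
    message = f"{bridge_addr}|{port}|{epoch_index(unix_time, period_s)}".encode()
    return hmac.new(master_secret, message, hashlib.sha256).hexdigest()


def token_is_current(master_secret: bytes, bridge_addr: str, port: int,
                     token: str, unix_time: float,
                     period_s: int = 86400) -> bool:
    """Bridge-side check: does the token match the current period?"""
    expected = derive_token(master_secret, bridge_addr, port, unix_time, period_s)
    return hmac.compare_digest(token, expected)
```

A token handed out in one period stops validating once the period rolls over, so a censor who harvests bridge credentials has to keep harvesting them.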
(I also enjoy the slight irony of the fact that obfs2 was based on brl's obfuscated openssh [0] and now we're trying to use SSH as a means of obfuscation. :) )
On the server side, obfsssh-server makes sure SSH is listening on port whatever, and connected to Tor's local port 50000 (using the -W option). When you log in, the client just talks Tor into the SSH stream, and on the server it's passed right into Tor.
This also very neatly resolves the issue of "What if a censor tries probing an obfs-bridge to see if it's a Tor bridge?" The server is a perfectly legitimate SSH daemon showing a perfectly legitimate key-based login prompt. Unless the censor has the private key, they can't log in. The key is distributed next to the bridge IP - it's not intended to be any more secret than the IP.
I think the advantages of this are:
1) While it does require some development effort - it's not nearly as much as other proposals. Accordingly it's lightweight. It's easy to deploy and experiment with and just see if or how it works.
2) It allows us to test a theory: that if we can identify a particular service or protocol that a censored country's government relies on, we can disguise Tor as it, and make it painful or difficult for them to block.
I found an interesting comment on a blog post the other day [1]: "On January 23rd, 2011 Anonymous said: First Tor. Next Ultrasurf. Then freegate. Then everything else. What if Iran gets tired of the shell game and whitelists the Internet? Then what?"
This was two years ago. We can mimic existing protocols for a while, but eventually - it somehow seems - whitelisting actually becomes an option. What if this trend grows? In which direction do we go?
One idea I've been thinking about may actually be applicable for this, but it's still in a this-is-probably-a-bad-idea stage. Basically, create a whitelist of sockets we know the censor will not block, maybe including google.com:80 or the-iranian-govt-loves-me.ir:22. Once these are known, set up a system of facilitators similar to (if not the same as) what flashproxy uses, but instead of having the flashproxy connect to the censored user, the client sends out a single UDP packet to one of these hosts that is on the whitelist. She then includes the information about which socket she chose in the introduction to the facilitator. The server-side PT then receives the information from the facilitator (client IP addr:port, unblocked server addr:port, public key, etc) and spoofs a packet from that node which includes the PT's information (real IP address, token, etc) and they can then proceed to handshake over some protocol.
There are at least half a dozen ways this is fingerprintable right now, so it won't work as-is, and the fact that it requires raw sockets, hence root, is not ideal in any scenario.
I brainstormed this with Daniel Kahn Gillmor, Philipp Winter, Isis, and a few others in Hong Kong. The notes we took are below:
Client Side
- libssh connection using a private key that is distributed w/ the bridge IP
- connect to the server
obfsproxy normally listens on 1337. Two options:
1) ssh -L 1337 and obfsproxy doesn't listen on anything
2) obfsproxy listens on 1337 and takes the data and passes it to ssh -W
ssh -W keeps the same mechanisms that obfsproxy uses so it's preferable
For the sake of Windows users I think we'd only be able to consider 2).
Server Side
- tor runs on local port 50000
- ssh -W tells ssh on the other side to connect to the tor port
- obfsproxy does not touch data on the server side
The -W option on my man page says: -W host:port Requests that standard input and output on the client be forwarded to host on port over the secure channel.
Is this the option you're referring to? As far as I can tell, this option only exists in ssh, what am I missing? :) Below you specify that this would be used on the client side...I guess I may just be confused by your wording and definition of "ssh on the other side"? :)
obfsproxy does not open a port; it sits there making sure:
- tor is running
- tor is configured right
- ssh is listening on the correct port
- ssh is configured right
  - this includes checking that MaxSessions is appropriately sane
- users can auth to ssh using the private key that is expected
Open questions:
- Should we use ssh -L or ssh -W on the client side? (Probably -W)
- Is the -W option (the control messages) in the clear or in the encrypted transport? If it's in the clear, this can be fingerprintable
Libraries
- paramiko does SSH server and SSH client, could use it
-tom
I think this is a fine idea to investigate further, I'm glad you all spent some time to think about this and sketch it up.
One thing I fear about implementing these schemes where we use/mimic existing protocols to obfuscate our own traffic is that we are inevitably increasing the censorship in the respective country, because the censor eventually figures out a way to get what she wants while blocking the circumvention tools, accepting any collateral damage. It is one consequence with obfs* when the censor blocks the ip addr:port of the bridge - no harm, no foul, only the loss of a bridge. I worry about the repercussions we'll see with this.
As a side note, I had an interesting find earlier when I stumbled upon an "obfs2ssh" [2] implementation. "obfs2ssh used obfsproxy to obscure SSH tunnel to avoid the DPI detection recently installed in China." I haven't reviewed the code, just thought it was an interesting find.
- Matt
[0] https://github.com/brl/obfuscated-openssh
[1] https://blog.torproject.org/blog/update-internet-censorship-iran
[2] https://code.google.com/p/obfs2ssh/
On Tue, May 28, 2013 at 03:59:15PM +0100, Steven Murdoch wrote:
Therefore it might be interesting to send data over UDP rather than TCP, as then it is the responsibility of the user code to send the SYN-ACK-equivalent. Tor needs properties similar to TCP from its pluggable transport, and so any UDP-pluggable transport would need something which replaces TCP: reliable in-order delivery with congestion management. One option here is libutp, as used by BitTorrent [2]. There is a vast amount of libutp traffic on the Internet, but its timing and upstream/downstream characteristics will be different from how Tor would use it. Alternatively, it might not be worth worrying about this type of scanning resistance, and just focus on what it is possible to do with TCP, as done with ScrambleSuit [3].
FYI: Tao (Cc'd) has a pluggable transport using libutp (somewhat?) working. [Currently, each bridge can only handle one client at a time, but that doesn't seem hard to work around.]
- Ian