We sent out an earlier version of this, well, earlier. Here it is more complete and more revised.
I've tried to incorporate people's comments. There's some worthwhile stuff it doesn't do yet, like provide automatic and easy ways to compose plugins. I think it should work well enough, but more comment is always worthwhile.
yrs, -- Nick
=====
Filename: 180-pluggable-transport.txt
Title: Pluggable transports for circumvention
Author: Jacob Appelbaum, Nick Mathewson
Created: 15-Oct-2010
Status: Open
Overview
This proposal describes a way to decouple protocol-level obfuscation from the core Tor protocol in order to better resist client-bridge censorship. Our approach is to specify a means to add pluggable transport implementations to Tor clients and bridges so that they can negotiate a superencipherment for the Tor protocol.
Scope
This is a document about transport plugins; it does not cover discovery improvements, or bridgedb improvements. While these requirements might be solved by a program that also functions as a transport plugin, this proposal only covers the requirements and operation of transport plugins.
Motivation
Frequently, people want to try a novel circumvention method to help users connect to Tor bridges. Some of these methods are already pretty easy to deploy: if the user knows an unblocked VPN or open SOCKS proxy, they can just use that with the Tor client today.
Less easy to deploy are methods that require participation by both the client and the bridge. In order of increasing sophistication, we might want to support:
1. A protocol obfuscation tool that transforms the output of a TLS connection into something that looks like HTTP as it leaves the client, and back to TLS as it arrives at the bridge.
2. An additional authentication step that a client would need to perform for a given bridge before being allowed to connect.
3. An information passing system that uses a side-channel in some existing protocol to convey traffic between a client and a bridge without the two of them ever communicating directly.
4. A set of clients to tunnel client->bridge traffic over an existing large p2p network, such that the bridge is known by an identifier in that network rather than by an IP address.
We could, in theory, support these reasonably well with Tor as it stands today: every Tor client can take a SOCKS proxy to use for its outgoing traffic, so a suitable client proxy could handle the client's traffic and connections on its behalf, while a corresponding program on the bridge side could handle the bridge's side of the protocol transformation. Nevertheless, there are some reasons to add support for transport plugins to Tor itself:
1. It would be good for bridges to have a standard way to advertise which transports they support, so that clients can have multiple local transport proxies, and automatically use the right one for the right bridge.
2. There are some changes to our architecture that we'll need for a system like this to work. For testing purposes, if a bridge blocks off its regular ORPort and instead has an obfuscated ORPort, the bridge authority has no way to test it. Also, unless the bridge has some way to tell that the bridge-side proxy at 127.0.0.1 is not the origin of all the connections it is relaying, it might decide that there are too many connections from 127.0.0.1, and start paring them down to avoid a DoS.
3. Censorship and anticensorship techniques often evolve faster than the typical Tor release cycle. As such, it's a good idea to provide ways to test out new anticensorship mechanisms on a more rapid basis.
4. Transport obfuscation is a relatively distinct problem from the other privacy problems that Tor tries to solve, and it requires a fairly distinct skill-set from hacking the rest of Tor. By decoupling transport obfuscation from the Tor core, we hope to encourage people working on transport obfuscation who would otherwise not be interested in hacking Tor.
5. Finally, we hope that defining a generic transport obfuscation plugin mechanism will be useful to other anticensorship projects.
Non-Goals
We're not going to talk about automatic verification of plugin correctness and safety via sandboxing, proof-carrying code, or whatever.
We need to do more with discovery and distribution, but that's not what this proposal is about. We're pretty convinced that the problems are sufficiently orthogonal that we should be fine so long as we don't preclude a single program from implementing both transport and discovery extensions.
This proposal is not about what transport plugins are the best ones for people to write. We do, however, make some general recommendations for plugin authors in an appendix.
We've considered issues involved with completely replacing Tor's TLS with another encryption layer, rather than layering it inside the obfuscation layer. We describe how to do this in an appendix to the current proposal, though we are not currently sure whether it's a good idea to implement.
We deliberately reject any design that would involve linking more code into Tor's process space.
Design overview
To write a new transport protocol, an implementer must provide two pieces: a "Client Proxy" to run at the initiator side, and a "Server Proxy" to run at the server side. These two pieces may or may not be implemented by the same program.
Each client may run any number of Client Proxies. Each one acts like a SOCKS proxy that accepts connections on localhost. Each one runs on a different port, and implements one or more transport methods. If the protocol has any parameters, they are passed from Tor inside the regular username/password parts of the SOCKS protocol.
Bridges (and maybe relays) may run any number of Server Proxies: these programs provide an interface like stunnel: they get connections from the network (typically by listening for connections on the network) and relay them to the Bridge's real ORPort.
To configure one of these programs, it should be sufficient simply to list it in your torrc. The program tells Tor which transports it provides. The Tor consensus should carry a new approved version number that is specific to pluggable transports; this will allow Tor to know when a particular transport is known to be unsafe or non-functional.
Bridges (and maybe relays) report in their descriptors which transport protocols they support. This information can be copied into bridge lines. Bridges using a transport protocol may have multiple bridge lines.
Any methods that are wildly successful, we can bake into Tor.
Specifications: Client behavior
We extend the bridge line format to allow you to say which method to use to connect to a bridge.
The new format is: "bridge method address:port [[keyid=]id-fingerprint] [k=v] [k=v] [k=v]"
To connect to such a bridge, the Tor program needs to know which local SOCKS proxy will support the transport called "method". It then connects to this proxy, and asks it to connect to address:port. If [id-fingerprint] is provided, Tor should expect the public identity key on the TLS connection to match the digest provided in [id-fingerprint]. If any [k=v] items are provided, they are configuration parameters for the proxy: Tor should separate them with semicolons and put them in the user and password fields of the request, splitting them across the fields as necessary. If a key or value must contain a semicolon or a backslash, it is escaped with a backslash.
The "id-fingerprint" field is always provided in a field named "keyid", if it was given. Method names must be C identifiers.
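As a rough illustration of the extended bridge line format, here is a minimal Python sketch of a parser. The name parse_bridge_line and the shape of the returned dictionary are assumptions made for this example, not part of the proposal. It also assumes that tokens are separated by whitespace and that a bare token without "=" is the fingerprint.

```python
import re

# Method names must be C identifiers, per the text above.
_C_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def parse_bridge_line(line):
    """Parse 'bridge method address:port [[keyid=]fp] [k=v]...' into a dict.

    Hypothetical helper: the return layout is an assumption for illustration.
    """
    tokens = line.split()
    if len(tokens) < 3 or tokens[0] != "bridge":
        raise ValueError("not an extended bridge line")
    method, addrport, rest = tokens[1], tokens[2], tokens[3:]
    if not _C_IDENT.match(method):
        raise ValueError("method name must be a C identifier")
    host, _, port = addrport.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError("expected address:port")
    params = {}
    for tok in rest:
        if "=" in tok:
            # Covers both 'keyid=FINGERPRINT' and ordinary k=v parameters.
            key, _, value = tok.partition("=")
            params[key] = value
        elif "keyid" not in params:
            # A bare fingerprint is stored under the name "keyid".
            params["keyid"] = tok
        else:
            raise ValueError("unexpected token: " + tok)
    return {"method": method, "host": host, "port": int(port), "params": params}
```

Storing the fingerprint under "keyid" either way means later code does not need to care which spelling the user chose.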
Example: if the bridge line is "bridge trebuchet www.example.com:3333 rocks=20 height=5.6m" AND if the Tor client knows that the 'trebuchet' method is provided by a SOCKS5 proxy on 127.0.0.1:19999, the client should connect to that proxy, ask it to connect to www.example.com:3333, and provide the string "rocks=20;height=5.6m" as the username, the password, or split across the username and password.
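The escaping and splitting rule can be sketched in a few lines of Python. The function names here are hypothetical. The 255-byte limit per field comes from the SOCKS5 username/password sub-negotiation (RFC 1929); filling the username first and spilling into the password is just one possible way to split, chosen for this sketch.

```python
def escape_arg(text):
    """Backslash-escape backslashes and semicolons in a key or value."""
    return text.replace("\\", "\\\\").replace(";", "\\;")


def encode_socks_args(params, field_limit=255):
    """Join k=v pairs with semicolons and split them across two fields.

    Returns (username_bytes, password_bytes). Filling the username first
    is an assumption; the text only says to split "as necessary".
    """
    joined = ";".join(
        escape_arg(key) + "=" + escape_arg(value) for key, value in params.items()
    )
    data = joined.encode("utf-8")
    if len(data) > 2 * field_limit:
        raise ValueError("parameters do not fit in the SOCKS auth fields")
    return data[:field_limit], data[field_limit:]
```

For the trebuchet example above, the whole string fits in the username and the password stays empty.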
There are two ways to tell Tor clients about protocol proxies: external proxies and managed proxies. An external proxy is configured with

  ClientTransportPlugin method socks4 address:port [auth=X]

or

  ClientTransportPlugin method socks5 address:port [username=X] [password=Y]

as in

  "ClientTransportPlugin trebuchet socks5 127.0.0.1:9999".

This example tells Tor that another program is already running to handle 'trebuchet' connections, and Tor doesn't need to worry about it.
A managed proxy is configured with

  ClientTransportPlugin <method> exec <path> [options]

as in

  "ClientTransportPlugin trebuchet exec /usr/libexec/trebuchet --managed"

This example tells Tor to launch an external program to provide a socks proxy for 'trebuchet' connections. The Tor client only launches one instance of each external program with a given set of options, even if the same executable and options are listed for more than one method.
If instead of a transport name, the torrc lists "*" for a managed proxy, tor uses that proxy for all transports that it supports. So "ClientTransportPlugin * exec /usr/libexec/tor/foobar" tells Tor that it should use the foobar plugin for everything that it supports.
If two proxies support the same method, Tor should use whichever one is listed first.
The same program can implement a managed or an external proxy: it just needs to take an argument saying which one to be.
See "Managed proxy behavior" for more information on the managed proxy interface.
Server behavior
Server proxies are configured similarly to client proxies. When launching a proxy, the server must tell it what ORPort it has configured, and what address (if any) it can listen on. The server must tell the proxy which (if any) methods it should provide if it can; the proxy needs to tell the server which methods it is actually providing, and on what ports.
When a client connects to the proxy, the proxy may need a way to tell the server some identifier for the client address. It does this in-band.
As before, the server lists proxies in its torrc. These can be external proxies that run on their own, or managed proxies that Tor launches.
An external server proxy is configured as

  ServerTransportPlugin method proxy address:port param=val..

as in

  ServerTransportPlugin trebuchet proxy 127.0.0.1:999 rocks=heavy

The param=val pairs and the address are used to make the bridge configuration information that we'll tell users.
A managed proxy is configured as

  ServerTransportPlugin method exec /path/to/binary [options]

or

  ServerTransportPlugin * exec /path/to/binary [options]
When possible, Tor should launch only one instance of each binary/option pair configured. So if the torrc contains

  ClientTransportPlugin foo exec /usr/bin/megaproxy --foo
  ClientTransportPlugin bar exec /usr/bin/megaproxy --bar
  ServerTransportPlugin * exec /usr/bin/megaproxy --foo
then Tor will launch the megaproxy binary twice: once with the option --foo and once with the option --bar.
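One way to picture the "one instance per binary/option pair" rule is to group the exec lines by (binary, options). The Python sketch below does that for the megaproxy example; plan_launches and its return layout are hypothetical names for illustration, not anything Tor defines.

```python
import shlex

_DIRECTIVES = {"ClientTransportPlugin": "client", "ServerTransportPlugin": "server"}


def plan_launches(torrc_lines):
    """Group managed-proxy lines by (binary, options).

    Returns a dict mapping (binary, options_tuple) to the list of
    (role, method) entries that one launched instance would serve.
    """
    launches = {}
    for line in torrc_lines:
        parts = shlex.split(line)
        if len(parts) < 4 or parts[0] not in _DIRECTIVES or parts[2] != "exec":
            continue  # external proxies and unrelated lines are not launched
        role, method = _DIRECTIVES[parts[0]], parts[1]
        key = (parts[3], tuple(parts[4:]))
        launches.setdefault(key, []).append((role, method))
    return launches
```

With the three lines above, the result has two keys: one for "--foo" (serving client method foo and all server methods) and one for "--bar".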
Managed proxy interface
When the Tor client launches a client proxy from the command line, it communicates via environment variables. At a minimum, it sets:
{Client and server} HOME, PATH -- as you'd expect.
"STATE_LOCATION" -- a directory where the proxy should store state if it wants to. This directory is not required to exist, but the proxy SHOULD be able to create it if it doesn't. The proxy SHOULD NOT store state elsewhere.
"MANAGED_TRANSPORT_VER=1" -- To tell the proxy which versions of this configuration protocol Tor supports. Future versions will give a comma-separated list. Clients MUST accept comma-separated lists containing any version that they recognize, and MUST work correctly even if some of the versions they don't recognize are non-numeric.
{Client only}
"CLIENT_TRANSPORTS" -- a comma-separated list of which methods this client should enable, or * if all methods should be enabled. The proxy SHOULD ignore methods that it doesn't recognize.
{Server only}
"EXT_SERVER_PORT=addr:portnum" -- A port (probably on localhost) that speaks the extended server protocol.
"ORPORT=addr:portnum" -- Our regular ORPort in a form suitable for local connections.
"BINDADDR=addr" -- An address on which to listen for local connections. This might be the advertised address, or might be a local address that Tor will forward ports to. It MUST be an address that will work with bind().
"SERVER_TRANSPORTS=..." -- A comma-separated list of server methods that the proxy should support, or *
The transport proxy replies by writing NL-terminated lines to stdout. The metaformat is
  Keyword OptArgs NL
  OptArgs = Args |
  Args = SP ArgChar | Args ArgChar
  ArgChar = Any character but NUL or NL
  Keyword = KeywordChar | Keyword KeywordChar
  KeywordChar = All alphanumeric characters, dash, and underscore.
Tor MUST ignore lines with keywords that it doesn't recognize.
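A reader for this metaformat might look like the Python sketch below. The function name and the choice to return None for ignorable lines are assumptions; the set of known keywords is simply the keywords that appear in this proposal.

```python
import re

# Keyword characters: alphanumerics, dash, and underscore.
_KEYWORD = re.compile(r"^[A-Za-z0-9_-]+$")

# Keywords mentioned in this proposal; anything else is ignored.
KNOWN_KEYWORDS = {
    "VERSION", "ERROR",
    "CMETHOD", "CMETHOD-ERROR", "CMETHODS",
    "SMETHOD", "SMETHOD-ERROR", "SMETHODS",
}


def parse_reply_line(line):
    """Split one NL-terminated line into (keyword, args).

    Returns None for lines that should be ignored (malformed or with an
    unrecognized keyword).
    """
    line = line.rstrip("\n")
    keyword, _, args = line.partition(" ")
    if not _KEYWORD.match(keyword) or keyword not in KNOWN_KEYWORDS:
        return None
    return keyword, args
```

Returning None instead of raising keeps the caller's loop simple, since unknown keywords are expected from newer proxies.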
First, the proxy writes "VERSION 1" to say that it supports this protocol. It must either pick a version that Tor told it about, or pick no version at all, and say "ERROR no-version\n" and exit.
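On the proxy side, version selection could be as small as the following Python sketch. negotiate_version is a hypothetical name; a real proxy would pass it the value of the MANAGED_TRANSPORT_VER environment variable and print the result to stdout.

```python
def negotiate_version(offered, supported=("1",)):
    """Return the reply line for a MANAGED_TRANSPORT_VER value.

    `offered` is the comma-separated list from Tor. Entries are compared
    as plain strings, so non-numeric versions are simply skipped.
    """
    for version in offered.split(","):
        if version in supported:
            return "VERSION " + version
    return "ERROR no-version"
```

A proxy that gets "ERROR no-version" back from this function would print it and exit, as described above.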
The proxy should then open its ports. If running as a client proxy, it should not use fixed ports; instead it should autoselect ports to avoid conflicts. A client proxy should by default only listen on localhost for connections.
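For the client side, "autoselect ports" usually means binding to port 0 and asking the operating system which port it picked. A minimal Python sketch, with a hypothetical function name:

```python
import socket


def open_client_listener():
    """Bind a TCP listener on localhost and let the OS pick a free port.

    Returns (listening_socket, port_number).
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("127.0.0.1", 0))  # port 0 asks the kernel for any free port
    sock.listen()
    return sock, sock.getsockname()[1]
```

The port number returned here is what the proxy would then report to Tor in its CMETHOD line.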
A server proxy SHOULD try to listen at a consistent port, though it SHOULD pick a different one if the port it last used is now allocated.
A client or server proxy then should tell which methods it has made available and how. It does this by printing zero or more CMETHOD and SMETHOD lines to its stdout. These lines look like:
  CMETHOD methodname SOCKS4/SOCKS5 address:port [ARGS=arglist] \
          [OPT-ARGS=arglist]
as in
  CMETHOD trebuchet SOCKS5 127.0.0.1:19999 ARGS=rocks,height \
          OPT-ARGS=tensile-strength
The ARGS field lists mandatory parameters that must appear in every bridge line for this method. The OPT-ARGS field lists optional parameters. If no ARGS or OPT-ARGS field is provided, Tor should not check the parameters in bridge lines for this method.
The proxy should print a single "CMETHODS DONE" line after it is finished telling Tor about the client methods it provides. If it tries to supply a client method but can't for some reason, it should say:

  CMETHOD-ERROR methodname "Message"
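To make the ARGS and OPT-ARGS semantics concrete, here is a Python sketch that parses the arguments of a CMETHOD line and checks a bridge line's parameters against them. Both function names are hypothetical. Two behaviors are assumptions of this sketch rather than rules from the text: parameters that are neither mandatory nor optional are rejected, and "keyid" is exempt from the check.

```python
def parse_cmethod(args):
    """Parse the arguments of a CMETHOD line (everything after the keyword)."""
    fields = args.split()
    if len(fields) < 3:
        raise ValueError("CMETHOD needs methodname, protocol and address:port")
    info = {"method": fields[0], "protocol": fields[1], "addrport": fields[2],
            "args": None, "opt_args": None}
    for field in fields[3:]:
        key, _, value = field.partition("=")
        names = set(value.split(",")) if value else set()
        if key == "ARGS":
            info["args"] = names
        elif key == "OPT-ARGS":
            info["opt_args"] = names
    return info


def bridge_params_ok(info, params):
    """Check bridge-line parameters against a parsed CMETHOD line."""
    if info["args"] is None and info["opt_args"] is None:
        return True  # nothing declared, so Tor should not check
    required = info["args"] or set()
    allowed = required | (info["opt_args"] or set())
    given = set(params) - {"keyid"}  # assumption: keyid is never checked
    # Assumption: unknown parameter names are treated as an error.
    return required <= given and given <= allowed
```

With the trebuchet example above, a bridge line that gives only rocks=20 fails the check because height is mandatory.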
A proxy should tell Tor about the server methods it is providing by printing zero or more SMETHOD lines. These lines look like:
SMETHOD methodname address:port [Options]
If there's an error setting up a configured server method, the proxy should say:

  SMETHOD-ERROR methodname "message"
The 'address:port' part of an SMETHOD line is the address to put in the bridge line. The ARGS: part is a list of key-value pairs that the client needs to know. The Options part is a list of space-separated K:V flags that Tor should know about. Recognized options are:
- FORWARD:1
If this option is set, and address:port is not a publicly accessible address, then the bridge needs to forward some other address:port to address:port via upnp-helper.
- ARGS:k=v,k=v,k=v
If this option is set, the K=V arguments are added to the extrainfo document.
- DECLARE:K=V,...
If this option is set, all the K=V options should be added as extension entries to the router descriptor. (See below)
- USE-EXTPORT:1
If this option is set, the server plugin is using the extended server port.
SMETHOD and CMETHOD lines may be interspersed. After the last SMETHOD line, the proxy says "SMETHODS DONE".
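The SMETHOD options listed above can be picked apart with a short Python sketch like this one. parse_smethod and its result layout are hypothetical, and silently skipping unrecognized options is an assumption made here, by analogy with the rule for unknown keywords.

```python
def parse_smethod(args):
    """Parse the arguments of an SMETHOD line (everything after the keyword)."""
    fields = args.split()
    if len(fields) < 2:
        raise ValueError("SMETHOD needs methodname and address:port")
    info = {"method": fields[0], "addrport": fields[1],
            "forward": False, "use_extport": False, "args": {}, "declare": {}}
    for option in fields[2:]:
        key, _, value = option.partition(":")
        if key == "FORWARD":
            info["forward"] = (value == "1")
        elif key == "USE-EXTPORT":
            info["use_extport"] = (value == "1")
        elif key in ("ARGS", "DECLARE"):
            # Both carry a comma-separated K=V list.
            pairs = (item.split("=", 1) for item in value.split(",") if "=" in item)
            info[key.lower()] = dict(pairs)
        # Assumption: options we do not recognize are skipped.
    return info
```

The "args" entry is what would later go into the extra-info document, and "declare" is what would go into the router descriptor.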
The proxy SHOULD NOT tell Tor about a server or client method unless it is actually open and ready to use.
Tor clients SHOULD NOT use any method from a client proxy or advertise any method from a server proxy UNLESS it is listed as a possible method for that proxy in torrc, and it is listed by the proxy as a method it supports.
Proxies should respond to a single INT signal by closing their listener ports and not accepting any new connections, but keeping all connections open, then terminating when connections are all closed. Proxies should respond to a second INT signal by shutting down cleanly.
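The two-stage INT behavior could be tracked with a small state object, as in this Python sketch. The class and function names are hypothetical, and the sketch only records what should happen; draining open connections and exiting are left to the proxy's main loop.

```python
import signal


class ShutdownState:
    """Track how many INT signals have arrived and what to do about them."""

    def __init__(self, listeners):
        self.listeners = list(listeners)  # listening sockets still open
        self.interrupts = 0
        self.stop_now = False  # set once a second INT asks for a full shutdown

    def on_sigint(self, signum=None, frame=None):
        self.interrupts += 1
        if self.interrupts == 1:
            # First INT: stop accepting, keep existing connections alive.
            for listener in self.listeners:
                listener.close()
            self.listeners.clear()
        else:
            # Second INT: the main loop should now shut down cleanly.
            self.stop_now = True


def install(state):
    """Register the handler; must be called from the main thread."""
    signal.signal(signal.SIGINT, state.on_sigint)
```

After the first INT, the main loop would exit on its own once its last connection closes; stop_now only matters if a second INT arrives first.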
The extended ORPort protocol.
Server transports may need to connect to the bridge and pass additional information about client connections that the bridge would ordinarily receive from the kernel's TCP stack. To do this, they connect to the "extended server port" as given in EXT_SERVER_PORT, send a short amount of information, wait for a response, and then send the user traffic on that port.
The extended server port protocol is as follows:
  COMMAND [2 bytes, big-endian]
  BODYLEN [2 bytes, big-endian]
  BODY [Bodylen bytes]
Commands sent from the transport to the server are:
[0x0000] DONE: There is no more information to give. (body ignored)
[0x0001] USERADDR: an address:port string that represents the user's address. If the transport doesn't actually do addresses, this shouldn't be sent.
Replies sent from tor to the proxy are:
[0x1001] OKAY: Send the user's traffic. (body ignored)
[0x1002] DENY: Tor would prefer not to get more traffic from this address for a while. (body ignored)
[We could also use an out-of-band signalling method to tell Tor about client addresses, but that's a historically error-prone way to go about annotating connections.]
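The framing above maps directly onto two big-endian 16-bit fields followed by the body. Here is a Python sketch using the command values given in this draft; the function names are hypothetical, and encoding the USERADDR body as ASCII is an assumption.

```python
import struct

# Command values as listed in this draft.
CMD_DONE = 0x0000
CMD_USERADDR = 0x0001
REPLY_OKAY = 0x1001
REPLY_DENY = 0x1002


def pack_message(command, body=b""):
    """Encode COMMAND, BODYLEN, BODY with big-endian 2-byte header fields."""
    if len(body) > 0xFFFF:
        raise ValueError("body too long for a 2-byte length field")
    return struct.pack(">HH", command, len(body)) + body


def unpack_message(data):
    """Decode one message; return (command, body, rest) or None if incomplete."""
    if len(data) < 4:
        return None
    command, bodylen = struct.unpack(">HH", data[:4])
    if len(data) < 4 + bodylen:
        return None
    return command, data[4:4 + bodylen], data[4 + bodylen:]


def transport_preamble(user_addr):
    """Bytes a transport sends before user traffic: USERADDR, then DONE."""
    return pack_message(CMD_USERADDR, user_addr.encode("ascii")) + pack_message(CMD_DONE)
```

After sending the preamble, the transport would wait for an OKAY reply before relaying any user traffic.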
Advertising bridge methods:
Bridges put the 'method' lines in their extra-info documents.
method SP methodname SP address:port SP arglist NL
The address:port part is as returned from an SMETHOD line. The arglist is a K=V,... list as returned in the ARGS part of the SMETHOD line.
If the SMETHOD line includes a DECLARE: part, the routerinfo gets a new line:
method-info SP methodname SP arglist NL
Bridge authority behavior
We need to specify a way to test different transport methods that bridges claim to support. We should test as many as possible. We should NOT require that we have a way to tra
Bridgedb behavior:
Bridgedb can, given a set of router descriptors and their corresponding extrainfo documents, generate a set of bridge lines for each descriptor. Bridgedb may want to avoid handing out methods that seem to get bridges blocked quickly.
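As a sketch of what that generation step might look like, the Python function below turns one extra-info 'method' line into a bridge line in the format given under "Client behavior". The function name is hypothetical, and so is the choice to emit the fingerprint with an explicit "keyid=" prefix.

```python
def bridge_line_from_method(method_line, fingerprint=None):
    """Turn 'method NAME ADDR:PORT K=V,...' into a bridge line."""
    fields = method_line.split()
    if len(fields) < 3 or fields[0] != "method":
        raise ValueError("not a method line")
    parts = ["bridge", fields[1], fields[2]]
    if fingerprint:
        parts.append("keyid=" + fingerprint)
    if len(fields) > 3:
        # The comma-separated arglist becomes space-separated k=v items.
        parts.extend(item for item in fields[3].split(",") if item)
    return " ".join(parts)
```

Any policy about which methods to hand out, such as avoiding ones that get bridges blocked quickly, would sit on top of this and is not shown.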
Implementation plan
First, we should implement per-bridge socks settings (as described above in "manually configuring a client proxy for a bridge") and the extended-server-port mechanism. This will let bridges run transport proxies such that they can hand-generate bridge lines to give to clients for testing.
Once that's done, we can improve usability a little bit by implementing external proxies. Once that's done, we can see if we need any managed proxies, or if the whole idea there is silly.
If we do, the next most important part seems to be getting the client-side automatic part written. And once that's done, we can evaluate how much of the server side is easy for people to do and how much is hard.
The "obfsproxy" obfuscating proxy is a likely candidate for an initial transport, as is Steven Murdoch's http thing or something similar.
Notes on plugins to write:
We should ship a couple of null plugin implementations in one or two popular, portable languages so that people get an idea of how to write the stuff.
1. We should have one that's just a proof of concept that does nothing but transfer bytes back and forth.
   1. We should not do a rot13 one.
2. We should implement a basic proxy that does not transform the bytes at all
1. We should implement DNS or HTTP using other software (as Goodell did years ago with DNS) as an example of wrapping existing code into our plugin model.
2. The obfuscated-ssh superencipherment is pretty trivial and pretty useful. It makes the protocol stringwise unfingerprintable.
   1. Nick needs to be told firmly not to bikeshed the obfuscated-ssh superencipherment too badly
      1. Go ahead, bikeshed my day
1. If we do a raw-traffic proxy, openssh tunnels would be the logical choice.
Appendix: recommendations for transports
Be free/open-source software. Also, if you think your code might someday do so well at circumvention that it should be implemented inside Tor, it should use the same license as Tor.
Use libraries that Tor already requires. (You can rely on openssl and libevent being present if current Tor is present.)
Be portable: most Tor users are on Windows, and most Tor developers are not, so designing your code for just one of these platforms will make it either get a small userbase, or poor auditing.
Think secure: if your code is in a C-like language, and it's hard to read it and become convinced it's safe, then it's probably not safe.
Think small: we want to minimize the bytes that a Windows user needs to download for a transport client.
Avoid security-through-obscurity if possible. Specify.
Resist trivial fingerprinting: There should be no good string or regex to search for to distinguish your protocol from protocols permitted by censors.
Imitate a real profile: There are many ways to implement most protocols -- and in many cases, most possible variants of a given protocol won't actually exist in the wild.
I just took a very quick look. Generally, I like it. ;-)
On Tue, Mar 15, 2011 at 05:17:23PM -0400, Nick Mathewson wrote:
To configure one of these programs, it should be sufficient simply to list it in your torrc. The program tells Tor which transports it provides. The Tor consensus should carry a new approved version number that is specific for pluggable transport; this will allow Tor to know when a particular transport is known to be unsafe safe or non-functional.
I'm not sure I understand the above.
"MANAGED_TRANSPORT_VER=1" -- To tell the proxy which versions of this configuration protocol Tor supports. Future versions will give a comma-separated list. Clients MUST accept comma-separated lists containing any version that they recognize, and MUST work correctly even if some of the versions they don't recognize are non-numeric.
Then you need to say what the valid characters in any future version numbers can be.
Bridge authority behavior
We need to specify a way to test different transport methods that bridges claim to support. We should test as many as possible. We should NOT require that we have a way to tra
This paragraph is truncated.
- Ian
On Wed, Mar 16, 2011 at 10:05 AM, Ian Goldberg <iang@cs.uwaterloo.ca> wrote:
I just took a very quick look. Generally, I like it. ;-)
On Tue, Mar 15, 2011 at 05:17:23PM -0400, Nick Mathewson wrote:
To configure one of these programs, it should be sufficient simply to list it in your torrc. The program tells Tor which transports it provides. The Tor consensus should carry a new approved version number that is specific for pluggable transport; this will allow Tor to know when a particular transport is known to be unsafe safe or non-functional.
I'm not sure I understand the above.
Ah, this idea didn't seem to go anywhere during the design phase. It might need more attention. The original idea (IIUC) was that the consensus should be able to recommend/disrecommend given transport types and versions, so that if there were a nasty bug discovered in one, we could tell folks not to use it. If/when we finally get good automated updates working, this ought to be less necessary. I don't have a good sense for whether or not it's something we want to design and build.
I fixed up the other issues you mentioned in the git repo as of commit eda9c36984 . Thanks!