Hi folks,
Here are some brainstorming thoughts to get us started improving our
"unclassifiable protocols" idea.
----------------------------------------------------------------------
For context, right now we have essentially one deployed design in this
space, which is obfs4:
https://gitweb.torproject.org/pluggable-transports/obfs4.git/tree/doc/obfs4…
This protocol does a short handshake (one client message, one server
message) to establish session keys, then it just sends bytes back and
forth with an (additional) layer of encryption so no payload bytes
assist an attacker with doing content-based protocol classification.
----------------------------------------------------------------------
One of the more thorough analyses of obfs4 comes from the FTE folks:
https://www.freehaven.net/anonbib/#ccs2015obfsdetection
where they show a classifier that assesses the entropy of the first
server-side packet, plus the length of that initial message, and they
can consistently pick out obfs4 flows with only a 0.25% false
positive rate compared to background traffic they recorded at their
university.
That false positive rate is where it gets interesting: it sounds good
until we realize that it means it flags 1 in every 400 flows as being
obfs4, i.e. it falsely tags about 36000 flows in the 14M flow dataset.
And the false positives aren't randomly distributed either: 1.22% of
their SSL flows are falsely tagged as obfs4, and it's the ones where
the SNI in the SSL handshake has a lot of entropy (because everything
else in the SSL handshake is already random). Also, entire protocols
like RTMP get caught up and tagged as obfs4, because their handshake
starts out with a block of random bytes.
A key takeaway is that if they ran this classifier, with its high true
positive rate and low false positive rate, on a large country's network
traffic, approximately none of the flows that it would tag as obfs4 would
actually be obfs4. Their Section 7 in that paper has a good discussion
of the base rate issues and how they apply here.
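To make the base-rate point concrete, here's a quick back-of-envelope
sketch in Python. The dataset size and false positive rate are from the
paper; the count of true obfs4 flows and the true positive rate are
numbers I invented for illustration:

```python
# Base-rate sketch: dataset size and FPR from the paper; the count of
# true obfs4 flows and the TPR are invented for illustration.
total_flows = 14_000_000   # background flows in their dataset
fpr = 0.0025               # 0.25% false positive rate
tpr = 0.99                 # assume a generously high true positive rate
true_obfs4 = 100           # assumption: obfs4 is rare in the background

false_positives = (total_flows - true_obfs4) * fpr   # ~35,000 flows
true_positives = true_obfs4 * tpr                    # 99 flows

# Of the flows the classifier tags as obfs4, how many really are?
precision = true_positives / (true_positives + false_positives)
print(f"{precision:.4f}")  # 0.0028: under 0.3% of tagged flows are obfs4
```

Even with a near-perfect true positive rate, the tag is wrong more than
99% of the time at this base rate, which is exactly their Section 7 point.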
Are there other papers that design or analyze classifiers for obfs4?
----------------------------------------------------------------------
The good news: It would seem that obfs4 is most effective in the "long
tail of varied traffic" scenario. That is, considering the spectrum
between "small corporate network" and "China backbone", obfs4 needs that
broader background traffic in order to make its false positives too
painful to block.
The bad news: I still worry about an attack that puts together many
building blocks, each of which individually is like the "1% false
positive rate" classifier in the above paper, but that together drive
their false positive rate low enough that blocking is safe to do. One
observation there is that the more complexity there is to a protocol,
the harder it is to "really" look like it, i.e. to match it in all
dimensions at once.
Consider this lovingly handcrafted graph, where the X axis is how
thoroughly we try to look like some expected protocol, and the Y axis
is how close we can get to making the adversary unwilling to block us:
1_|                                             |_1
  |                                             |
  |\                                           /|
  | \                                         / |
  |  \                                       /  |
  |   \                                     /   |
0_|    \___________________________________/    |_0
  |                                             |
There's a huge valley in the middle, where we're trying to look like
something, but we're not doing it well at all, so there is little damage
from blocking us.
The ramp on the right is the game that FTE tries to play, where in theory
if they're not perfectly implementing their target protocol then the
adversary can deploy a distinguisher and separate their traffic from
the "real" protocol, but in practice there are enough broken and weird
implementations of the protocol in the wild that even being "close enough"
might cause the censor to hesitate.
And the ramp on the left is our unclassifiable traffic game, where we
avoid looking like any particular expected protocol, and instead rely
on the long tail of legitimate traffic to include something that we
blend with.
Observation: *both* of these ramps are in the "broken in theory, but works
in practice" situation. I started out thinking it was just the obfs4 one,
and that the FTE one was somehow better grounded in theory. But neither
side is actually trying to build the ideal perfect protocol, whatever
that even is. The game in both cases is about false positives, which
come from messy (i.e. hard to predict) real-world background traffic.
One research question I wrestled with while making the graph: which ramp
is steeper? That is, which of these approaches is more forgiving? Does one
of them have a narrower margin for error than the other, where you really
have to be right up against the asymptote or it doesn't work so well?
For both approaches, their success depends on the variety of expected
background traffic.
The steepness of the right-hand (look-like-something) ramp also varies
greatly based on the specific protocol it's trying to look like. At first
glance we might think that the more complex the protocol, the better
you're going to need to be at imitating it in all dimensions. That is,
the more aspects of the protocol you need to get right, the more likely
you'll slip up on one of them. But competing in the other direction is:
the more complex the protocol, the more broken weird implementations
there could be in practice.
I raise the protocol complexity question here because I think it
has something subtle and critical to do with the look-like-nothing
strategy. Each dimension of the protocol represents another opportunity
to deploy a classifier building block, where each classifier by itself
is too risky to rely on, but the composition of these blocks produces
the killer distinguisher. One of the features of the unclassifiable
protocol that we need to maintain, as we explore variations of it, is
the simplicity: it needs to be the case that the adversary can't string
together enough building-block classifiers to reach high confidence. We
need to force them into building classifiers for *every other protocol*,
rather than letting them build a classifier for our protocol.
(I'll also note that I'm mushing together the notion of protocol
complexity with other notions like implementation popularity and
diversity: a complex proprietary protocol with only one implementation
will be no fun to imitate, but the same level of complexity where every
vendor implements their own version will be much more workable.)
----------------------------------------------------------------------
I've heard two main proposed ways in which we could improve obfs4 in
theory -- and hopefully thus in practice too:
(A) Aaron's idea of using the latest adversarial machine learning
approaches to evolve a traffic transform that resists classification.
That is, play the classifiers off against our transform, in many
different background traffic scenarios, such that we end up with a
transform that resists classification (low true positive and/or high
false positive rate) in many of the scenarios.
(B) Philipp's idea from scramblesuit of having the transform be
parameterized, and for each bridge we choose and stick with a given
set of parameters. That way we're not generating *flows* that each
aim to blend in, but rather we're generating bridges that each aim to
blend differently. This diversity should adapt well to many different
background traffic scenarios because in every scenario some bridges
might be identified but some bridges will stay under the radar.
At first glance these two approaches look orthogonal, i.e. we can do both
of them at once. For example, Aaron's approach tells us the universe of
acceptable parameters, and Philipp's approach gives us diversity within
that universe.
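As a strawman of how (A) and (B) could compose: (A) supplies a universe of
acceptable parameter ranges, and (B) pins each bridge to one point in that
universe, derived deterministically from the bridge's identity. The
parameter names and ranges below are invented for illustration:

```python
import hashlib
import random

# Hypothetical sketch of idea (B): derive a bridge's transform parameters
# deterministically from its identity, so each bridge blends differently
# but consistently. Parameter names and ranges are invented.
ACCEPTABLE = {              # idea (A) could supply these bounds
    "min_padding": (0, 64),
    "max_padding": (64, 1400),
    "mean_delay_ms": (0, 50),
}

def bridge_params(bridge_id: bytes) -> dict:
    seed = hashlib.sha256(b"param-seed" + bridge_id).digest()
    rng = random.Random(seed)
    return {name: rng.randint(lo, hi) for name, (lo, hi) in ACCEPTABLE.items()}

# Same bridge always gets the same parameters; different bridges differ.
assert bridge_params(b"bridge-1") == bridge_params(b"bridge-1")
```

The determinism matters: a returning client sees the same traffic shape
from the same bridge, but two bridges look different from each other.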
Aaron: How do we choose the parameter space for our transforms to
explore? How much of that can be automated, and how much needs to be
handcrafted by subject-matter-experts? I see how deep learning can produce
a magic black-box classifier, but I don't yet see how that approach can
present us with a magic black-box transform.
And as a last note, it would sure be nice to produce transforms that
are robust relative to background traffic, i.e. to not be brittle or
overfit to a particular scenario. Otherwise we're giving up one of our few
advantages in the arms race, which is that right now we force the censor
to characterize the traffic -- including expected future traffic! --
and assess whether it's safe to block us.
There. Hopefully some of these ideas will cause you to have better
ideas. :)
--Roger
Rob Jansen, Tavish Vaidya, and Micah Sherr have a paper about
bandwidth-based DoS against Tor. Section 5 is about default bridges.
https://www.usenix.org/conference/usenixsecurity19/presentation/jansen
https://www.usenix.org/system/files/sec19-jansen.pdf#page=6
While elsewhere in the paper they discuss in-protocol attacks, in the
context of bridges they limit themselves to attacks using third-party
paid "stresser" DoS services, which you can rent for about $1/Gbps/hour
(Section 3.1).
They looked at the default bridges in Tor Browser 8.0.3 (October 2018).
Only 12 of 25 default obfs4 bridges were working. (I think most of the
non-working bridges have since been pruned, e.g. in #29378, #30264.) The
median bandwidth of the 12 working bridges was 368 KB/s, with a large
variance: minimum of 67 KB/s and maximum of 1190 KB/s.
Besides the default bridges, they requested 135 bridges from BridgeDB,
and found that only 70% (95/135) of them worked. (This has also been at
least partially addressed by #30441.) BridgeDB bridges are faster than
default bridges, with a median bandwidth of closer to 600 KB/s
(Figure 1).
They estimate that disabling the 12 default obfs4 bridges would require
30 stresser jobs, at a rate of $22/hour or $17K/month. If all users of
default obfs4 bridges switched to BridgeDB bridges, the median bandwidth
of BridgeDB bridges would slow down to under 100 KB/s (Figure 2). If
even half of default obfs4 users switched to using meek, the cost to
operate meek would at least double (Figure 3).
I've just posted a manifesto on circumvention protocol design at the
net4people BBS:
https://github.com/net4people/bbs/issues/9
I'll include the text here as well.
Code name: Turbo Tunnel
Designing circumvention protocols for speed, flexibility, and robustness
In working on circumvention protocols, I have repeatedly felt the need
for a piece that is missing from our current designs. This document
summarizes the problems I perceive, and how I propose to solve them.
In short, I think that every circumvention transport should incorporate
some kind of session/reliability protocol—even the ones built on
reliable channels that seemingly don't need it. It solves all kinds of
problems related to performance and robustness. By session/reliability
protocol, I mean something that offers a reliable stream abstraction,
with sequence numbers and retransmissions, like
[QUIC](https://quicwg.org/) or
[SCTP](https://tools.ietf.org/html/rfc4960#section-1.5.2). Instead of a
raw unstructured data stream, the obfuscation layer will carry encoded
datagrams that are provided by the session/reliability layer.
When I say that circumvention transports should incorporate something
like QUIC, for example, I don't mean that QUIC UDP packets are what we
should send on the wire. No—I am not proposing a new *outer* layer, but
an additional *inner* layer. We take the datagrams provided by the
session/reliability layer, and encode them as appropriate for whatever
obfuscation layer we happen to be using. So with meek, for example,
instead of sending an unstructured blob of data in each HTTP
request/response, we would send a handful of QUIC packets, encoded into
the HTTP body. The receiving side would decode the packets and feed them
into a local QUIC engine, which would reassemble them and output the
original stream. A way to think about it is that the
sequencing/reliability layer is the "TCP" to the obfuscation layer's
"IP". The obfuscation layer just needs to deliver chunks of data, on a
best-effort basis, without getting blocked by a censor. The
sequencing/reliability layer builds a reliable data stream atop that
foundation.
I believe this design can improve existing transports, as well as enable
new transports that are not possible now, such as those built on
unreliable channels. Here is a list of selected problems with existing
or potential transports, and how a sequencing/reliability layer helps
solve them:
Problem: Censors can disrupt obfs4 by terminating long-lived TCP
connections, as Iran did in 2013, killing connections after 60 seconds.
This problem exists because the obfs4 session is coupled with
the TCP connection. The obfs4 session begins and ends exactly
when the TCP connection does. We need an additional layer of
abstraction, a virtual session that exists independently of any
particular TCP connection. That way, if a TCP connection is
terminated, it doesn't destroy all the state of the obfs4
session—you can open another TCP connection and resume where you
left off, without needing to re-bootstrap Tor or your VPN or
whatever was using the channel.
Problem: The performance of meek is limited because it is half-duplex:
it never sends and receives at the same time. This is because, while the
bytes in a single HTTP request arrive in order, the ordering of multiple
simultaneous requests is not guaranteed. Therefore, the client sends a
request, then waits for the server's response before sending another,
resulting in a delay of an RTT between successive sends.
The session/reliability layer provides sequence numbers and
reordering. Both sides can send data whenever is convenient, or
as needed for traffic shaping, and any unordered data will be
put back in order at the other end. A client could even split
its traffic over two or more CDNs, with different latency
characteristics, and know that the server will buffer and
reorder the encoded packets to correctly recover the data
stream.
Problem: A Snowflake client can only use one proxy at a time, and that
proxy may be slow or unreliable. Finding a working proxy is slow because
each non-working one must time out in succession before trying another
one.
The problem exists because even though each WebRTC DataChannel
is reliable (DataChannel uses SCTP internally), there's no
ordering between multiple simultaneous DataChannels on separate
Snowflake proxies. Furthermore, if and when a proxy goes
offline, we cannot tell whether the last piece of data we sent
was received by the bridge or not—the SCTP ACK information is
not accessible to us higher in the stack—so even if we reconnect
to the bridge through another proxy, we don't know whether we
need to retransmit the last piece of data or not. All we can do
is tear down the entire session and start it up again from
scratch. As in the obfs4 case, this problem is solved by having
an independent virtual session that persists across transient
WebRTC sessions. An added bonus is the opportunity to use more
than one proxy at once, to increase bandwidth or as a hedge
against one of them disappearing.
Problem: [DNS over HTTPS](https://groups.google.com/d/msg/traffic-obf/ZQohlnIEWM4/09N7zsxjBgAJ)
is an unreliable channel: it is reliable TCP up to the DoH server, but
after that, recursive resolutions are plain old unreliable UDP. And as
with meek, the ordering of simultaneous DNS-over-HTTPS requests is not
guaranteed.
Solved by retransmission in the session layer. There's no
[DNS pluggable transport](https://trac.torproject.org/projects/tor/wiki/doc/DnsPluggableTr…
yet, but I think some kind of retransmission layer will be a
requirement for it. Existing DNS tunnel software uses various
ad-hoc sequencing/retransmission protocols. I think that a
proper user-space reliability layer is the "right" way to do it.
Problem: Shadowsocks opens a separate encrypted TCP connection for every
connection made to the proxy. If a web page loads resources from 5 third
parties, then the Shadowsocks client makes 5 parallel connections to the
proxy.
This problem is really about multiplexing, not
session/reliability, but several candidate session/reliability
protocols additionally offer multiplexing, for example streams
in QUIC, streams in SCTP, or smux for KCP. Tor does not have
this problem, because Tor already is a multiplexing protocol,
with multiple virtual circuits and streams in one TCP/TLS
connection. But every system could benefit from adding
multiplexing at some level. Shadowsocks, for example, could open
up one long-lived connection, and each new connection to the
proxy would only open up a new stream inside the long-lived
connection. And if the long-lived connection were killed, all
the stream state would still exist at both endpoints and could
be resumed on a new connection.
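The multiplexing idea itself is small: tag each chunk of data with a
stream ID and a length, so many logical connections share one long-lived
carrier. Here's a hypothetical minimal framing sketch (real protocols
like QUIC streams or smux add stream lifecycle and flow control):

```python
import struct

# Tag each chunk with (stream_id, length) so one carrier connection can
# interleave many logical streams. Frame format is invented for
# illustration: 2-byte stream ID, 2-byte length, then the payload.
def frame(stream_id: int, payload: bytes) -> bytes:
    return struct.pack(">HH", stream_id, len(payload)) + payload

def deframe(buf: bytes):
    """Split a carrier byte stream back into (stream_id, payload) pairs."""
    frames = []
    while buf:
        stream_id, length = struct.unpack(">HH", buf[:4])
        frames.append((stream_id, buf[4:4 + length]))
        buf = buf[4 + length:]
    return frames

carrier = frame(1, b"GET /a") + frame(2, b"GET /b") + frame(1, b"...")
print(deframe(carrier))  # [(1, b'GET /a'), (2, b'GET /b'), (1, b'...')]
```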
As an illustration of what I'm proposing, here's the protocol layering
of meek (which
[sends chunks of the Tor TLS stream](https://trac.torproject.org/projects/tor/wiki/doc/AChildsGardenOfPl…
inside HTTP bodies), and where the new session/reliability layer would
be inserted. Tor can remain oblivious to what's happening: just as
before it didn't "know" that it was being carried over HTTP, it doesn't
now need to know that it is being carried over QUIC-in-HTTP (for
example).
```
[TLS]
[HTTP]
[session/reliability layer] ⇐ 🆕
[Tor]
[application data]
```
I've done a little survey and identified some suitable candidate
protocols that also seem to have good Go packages:
* [QUIC](https://quicwg.org/) with [quic-go](https://github.com/lucas-clemente/quic-go)
* KCP with [kcp-go](https://github.com/xtaci/kcp-go)
* [SCTP](https://tools.ietf.org/html/rfc4960) with [pion/sctp](https://github.com/pion/sctp)
I plan to evaluate at least these three candidates and develop some
small proofs of concept. The overall goal of my proposal is to liberate
the circumvention context from particular network connections and IP
addresses.
### Related work
The need for a session and sequencing layer has been felt—and dealt
with—repeatedly in many different projects. It has not yet, I think,
been treated systematically or recognized as a common need. Systems
typically implement some form of TCP-like SEQ and ACK numbers. The ones
that don't are usually built on the assumption of one long-lived TCP
connection, and therefore are really using the operating system's
sequencing and reliability functions behind the scenes.
Here are a few examples:
* Code Talker Tunnel (a.k.a. SkypeMorph) uses
[SEQ and ACK numbers](https://www.cypherpunks.ca/~iang/pubs/skypemorph-ccs.pdf#page=7)
and mentions selective ACK as a possible extension. I think it uses
the UDP 4-tuple to distinguish sessions, but I'm not sure.
* OSS used [SEQ and ACK numbers](https://www.freehaven.net/anonbib/papers/pets2013/paper_29.pdf#pag…
and a random ID to distinguish sessions.
* I [wasted time](https://www.bamsoftware.com/papers/thesis/#p227) in
the early development of meek grappling with sequencing, before
[punting by strictly serializing requests](https://www.bamsoftware.com/papers/fronting/#sec:deploy-tor),
sacrificing performance for simplicity. meek uses an
[X-Session-Id](https://trac.torproject.org/projects/tor/wiki/doc/AChildsGard…
HTTP header to distinguish sessions.
* [DNS tunnels](https://trac.torproject.org/projects/tor/wiki/doc/DnsPluggableTran…
all tend to do their own idiosyncratic thing. dnscat2, one of the
better-thought-out ones, uses
[explicit SEQ and ACK numbers](https://github.com/iagox86/dnscat2/blob/master/doc/protocol.md#seq….
My position is that SEQ/ACK schemes are subtle enough and independent
enough that they should be treated as a separate layer, not as an
underspecified and undertested component of some specific system.
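For a sense of what each of those systems ends up reimplementing, here's
a minimal sketch of the sender side of a cumulative-ACK scheme (invented
for illustration, not taken from any of the systems above):

```python
# Sketch of the cumulative-ACK bookkeeping every ad-hoc scheme ends up
# reimplementing: the sender keeps unacknowledged datagrams around until
# the receiver's ACK covers them, and can retransmit the rest.
class Sender:
    def __init__(self):
        self.next_seq = 0
        self.unacked = {}       # seq -> payload, kept for retransmission

    def send(self, payload):
        seq = self.next_seq
        self.unacked[seq] = payload
        self.next_seq += 1
        return seq, payload

    def on_ack(self, ack):
        """Cumulative ACK: everything below `ack` was received."""
        self.unacked = {s: p for s, p in self.unacked.items() if s >= ack}

    def retransmit(self):
        """E.g. on timeout: resend everything still outstanding."""
        return sorted(self.unacked.items())

s = Sender()
for p in (b"a", b"b", b"c"):
    s.send(p)
s.on_ack(2)                  # receiver has seq 0 and 1
print(s.retransmit())        # [(2, b'c')]
```

Getting the edge cases right (duplicates, losses, timeouts, wraparound)
is exactly the subtlety that argues for a separate, well-tested layer.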
Psiphon can use
[obfuscated QUIC](https://github.com/Psiphon-Labs/psiphon-tunnel-core/tree/52de11dbabae…
as a transport. It's directly using QUIC UDP on the wire, except that
each UDP datagram is
[additionally obfuscated](https://github.com/Psiphon-Labs/psiphon-tunnel-core/blob/52de11…
before being sent. You can view my proposal as an extension of this
design: instead of always sending QUIC packets as single UDP datagrams,
we allow them to be encoded/encapsulated into a variety of carriers.
[MASQUE](https://davidschinazi.github.io/masque-drafts/draft-schinazi-masque.html#RFC8441)
tunnels over HTTPS and can use QUIC, but is not really an example of the
kind of design I'm talking about. It leverages the multiplexing provided
by HTTP/2 (over TLS/TCP) or HTTP/3 (over QUIC/UDP). In HTTP/2 mode it
does not introduce its own session or reliability layer (instead using
that of the underlying TCP connection); and in HTTP/3 mode it directly
exposes the QUIC packets on the network as UDP datagrams, instead of
encapsulating them as an inner layer. That is, it's using QUIC as a
carrier for HTTP, rather than HTTP as a carrier for QUIC. The main
similarity I spot in the MASQUE draft is the envisioned
[connection migration](https://davidschinazi.github.io/masque-drafts/draft-schinazi-masque.html#rfc.section.7.1)
which frees the circumvention session from specific endpoint IP
addresses.
Mike Perry wrote a
[detailed summary](https://lists.torproject.org/pipermail/tor-dev/2018-March/013026.h…
of considerations for migrating Tor to use end-to-end QUIC between the
client and the exit. What Mike describes is similar to what is proposed
here—especially the subtlety regarding protocol layering. The idea is
not to use QUIC hop-by-hop, replacing the TLS/TCP that is used today,
but to *encapsulate* QUIC packets end-to-end, and use some other
*unreliable* protocol to carry them hop-by-hop between relays. Tor would
not be using QUIC as the network transport, but would use features of
the QUIC protocol.
### Anticipated questions
Q: Why don't VPNs like WireGuard have to worry about all this?
A: Because they are implemented in kernel space, not user space,
they are, in effect, using the operating system's own sequencing
and reliability features. WireGuard just sees IP packets; it's
the kernel's responsibility to notice, for example, when a TCP
segment needs to be retransmitted, and retransmit it. We should
do in user space what kernel-space VPNs have been doing all
along!
Q: You're proposing, in some cases, to run a reliable protocol inside
another reliable protocol (e.g. QUIC-in-obfs4-in-TCP). What about the
reputed inefficiency of TCP-in-TCP?
A: Short answer: don't worry about it. I think it would be
premature optimization to consider at this point. The fact that
the need for a session/reliability layer has been felt for so
long by so many systems indicates that we should start
experimenting, at least. There's contradictory information
online as to whether TCP-in-TCP is as bad as they say, and
anyway there are all kinds of performance settings we may tweak
if it turns out to be a problem. But again: let's avoid
premature optimization and not allow imagined obstacles to
prevent us from trying.
Q: QUIC is not just a reliability protocol; it also has its own
authentication and encryption based on TLS. Do we need that extra
complexity if the underlying protocol (e.g. Tor) is already encrypted
and authenticated independently?
A: The transport and TLS parts of QUIC are specified separately
(https://tools.ietf.org/html/draft-ietf-quic-transport
and https://tools.ietf.org/html/draft-ietf-quic-tls),
so in principle they are separable and we could just use the
transport part without encryption or authentication, as if it
were SCTP or some other plaintext protocol. In practice, quic-go
[assumes](https://godoc.org/github.com/lucas-clemente/quic-go#Dial)
right in the API that you'll be using TLS, so separating them
may be more trouble than it's worth. Let's start with simple
layering that is clearly correct, and only later start breaking
abstraction for better performance if we need to.
Q: What about traffic fingerprinting? If you simply obfuscate and send
each QUIC packet as it is produced, you will leak traffic features
through their size and timing, especially when you
[consider retransmissions and ACKs](https://www-users.cs.umn.edu/~hoppernj/ccs13-cya.pdf#page=5).
A: Don't do that, then. There's no essential reason why the
traffic pattern of the obfuscation layer needs to match that of
the sequencing/reliability layer. Part of the job of the
obfuscation layer is to erase such traffic features, if doing so
is a security requirement. That implies, at least, *not* simply
sending each entire QUIC packet as soon as it is produced, but
padding, splitting, and delaying as appropriate. An ideal
traffic scheduler would be *independent* of the underlying
stream—its implementation would
[not even *know*](https://lists.torproject.org/pipermail/tor-dev/2017-June/012310.html)
how much actual data is queued to send at any time. But it's
likely we don't have to be quite that rigorous in
implementation, at least at this point.
Hi Philipp!
I am finishing up my Defcon slides, and I realized I don't know where
to point people for setting up obfs4 bridges.
Google sends me to https://support.torproject.org/operators/operators-6/
but I bet that's not up-to-date (but maybe it should become up-to-date).
On the tor-relays list I see
https://trac.torproject.org/projects/tor/wiki/doc/PluggableTransports/obfs4…
which is (a) a really long url, and (b) an intimidatingly long page.
And I hear irl is working a 'tor-bridge' meta-deb:
https://bugs.torproject.org/31153
So: assuming I want to tell a bunch of people to do something in a week
and a half, using a url they can hastily scribble down from my slide,
what should I tell them? :)
It is ok if the url doesn't say the right thing today but we're confident
it will say the right thing on the morning of Aug 8.
Thanks,
--Roger