https://kpdyer.com/publications/ccs2015-measurement.pdf
They claim that they are able to detect obfs3, obfs4, FTE, and meek using entropy analysis and machine learning.
I wonder if their dataset allows for such a conclusion. They use an (admittedly large) set of flow traces gathered at a college campus. One of the traces is from 2010. The Internet was a different place back then. I would also expect college traces to be very different from country-level traces. For example, the latter should contain significantly more file sharing and other traffic that is considered inappropriate in a college setting. Many countries also have popular web sites and applications that might be completely missing in their data sets.
Considering the base-rate difference between normal and obfuscated traffic, the false positive rate in the analysis is significant. Trained classifiers also seem to do badly on traces they weren't trained on. The authors suggest active probing to reduce false positives, but don't mention that this doesn't work against obfs4 and meek.
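To make the base-rate concern concrete, here's a quick back-of-the-envelope calculation (all numbers are illustrative assumptions, not the paper's measured rates):

    # Toy base-rate arithmetic; every number here is a made-up assumption.
    flows = 10_000_000    # flows observed by the monitor
    base_rate = 1e-4      # assume 1 in 10,000 flows is obfuscated
    fpr, tpr = 1e-3, 0.9  # assume a 0.1% FP rate and a 90% TP rate

    obfuscated = flows * base_rate          # 1,000 real targets
    true_pos = obfuscated * tpr             # ~900 caught
    false_pos = (flows - obfuscated) * fpr  # ~10,000 false alarms

    precision = true_pos / (true_pos + false_pos)
    print(f"precision: {precision:.1%}")    # ~8%: most flagged flows are wrong

Even a seemingly small false positive rate drowns the true positives when obfuscated traffic is rare.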
Cheers, Philipp
NB: quickly responding before I go to bed.
On Wed, 19 Aug 2015 14:13:03 -0400 Philipp Winter phw@nymity.ch wrote:
> https://kpdyer.com/publications/ccs2015-measurement.pdf
> They claim that they are able to detect obfs3, obfs4, FTE, and meek using entropy analysis and machine learning.
Not surprised for obfs3/4 since they're mounting an entropy attack which is explicitly outside of the stated threat model for both protocols.
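For reference, the core of such an entropy test is tiny. A minimal sketch (my illustration, not the paper's actual classifier, which layers machine learning on top of features like this):

    import math
    import os
    from collections import Counter

    def byte_entropy(payload: bytes) -> float:
        """Shannon entropy in bits per byte (0.0 to 8.0)."""
        counts = Counter(payload)
        n = len(payload)
        return -sum(c / n * math.log2(c / n) for c in counts.values())

    # obfs3/obfs4 streams look like uniformly random bytes, so their
    # entropy hugs the 8.0 ceiling; plaintext protocols sit far lower.
    print(byte_entropy(b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"))
    print(byte_entropy(os.urandom(4096)))  # ~7.95 and up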
The FTE semantic attack they presented isn't the easiest one I know of (the GET request as defined by the regex is pathologically malformed).
Haven't looked at the meek portion of the paper.
> I wonder if their dataset allows for such a conclusion. They use an (admittedly large) set of flow traces gathered at a college campus. One of the traces is from 2010. The Internet was a different place back then. I would also expect college traces to be very different from country-level traces. For example, the latter should contain significantly more file sharing and other traffic that is considered inappropriate in a college setting. Many countries also have popular web sites and applications that might be completely missing in their data sets.
Dunno. Others probably have a better idea on what average internet traffic looks like these days.
> Considering the base-rate difference between normal and obfuscated traffic, the false positive rate in the analysis is significant. Trained classifiers also seem to do badly on traces they weren't trained on. The authors suggest active probing to reduce false positives, but don't mention that this doesn't work against obfs4 and meek.
Coming up with something better than obfs4/meek would be nice. At this point I'm viewing obfs4 as more of a minimum standard than anything else.
It's worth noting that Dust2 (mostly done but not yet deployed) can reduce payload entropy to match a target distribution, but will have issues with protocol-whitelist-based DPI.
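The rough idea, as a toy sketch with a made-up target distribution (a real shaper has to be invertible, e.g. via range coding, so the receiver can recover the ciphertext; this only shows the shaping direction):

    import bisect
    import random

    # Made-up target distribution: bias output toward printable ASCII.
    weights = [10 if 32 <= b < 127 else 1 for b in range(256)]
    total = sum(weights)
    cdf = []
    acc = 0
    for w in weights:
        acc += w
        cdf.append(acc)

    def shape(n: int, rng: random.Random) -> bytes:
        """Emit n bytes whose histogram follows the target distribution."""
        return bytes(bisect.bisect_right(cdf, rng.randrange(total))
                     for _ in range(n))

    sample = shape(4096, random.Random())
    # The shaped stream's entropy now matches the target's, well below 8.0.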
Regards,
On Wed, Aug 19, 2015 at 11:58 AM, Yawning Angel yawning@schwanenlied.me wrote:
> [snip]
> The FTE semantic attack they presented isn't the easiest one I know of (the GET request as defined by the regex is pathologically malformed).
Very interesting! This is news to me. I'm assuming I did something silly. (Even though I tested it against bro, wireshark, etc.)
How is it pathologically malformed?
> [snip]
-Kevin
On Fri, 21 Aug 2015 17:51:20 -0700 Kevin P Dyer kpdyer@gmail.com wrote:
> On Wed, Aug 19, 2015 at 11:58 AM, Yawning Angel yawning@schwanenlied.me wrote:
>> [snip]
>> The FTE semantic attack they presented isn't the easiest one I know of (the GET request as defined by the regex is pathologically malformed).
> Very interesting! This is news to me. I'm assuming I did something silly. (Even though I tested it against bro, wireshark, etc.)
Huh. I brought it up in conversation with a few people and was under the impression it was passed on. I probably should have e-mailed you about it or something.
> How is it pathologically malformed?
"manual-http-request": { "regex": "^GET\ \/([a-zA-Z0-9\.\/]*) HTTP/1\.1\r\n\r\n$" },
No "Host" header. All complaint requests MUST include one per RFC 2616, and all compliant servers MUST respond with a 400 if it is missing.
Since requests of that sort should invoke the error path on RFC compliant servers it's a really good distinguisher since legitimate clients will not do such a thing. Existing realistic adversaries already have "identify 'suspicious behavior', call back to confirm" style filtering in production, so false positive rate can be reduce to 0 if needed.
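A sketch of the resulting check (my illustration, not a production DPI rule, which would also have to handle pipelining, header folding, and so on):

    def looks_like_fte_get(payload: bytes) -> bool:
        """Flag HTTP/1.1 GETs with no Host header (RFC 2616 requires one)."""
        if not payload.startswith(b"GET "):
            return False
        head, sep, _ = payload.partition(b"\r\n\r\n")
        if not sep or b" HTTP/1.1" not in head.split(b"\r\n", 1)[0]:
            return False
        headers = head.split(b"\r\n")[1:]
        return not any(h.lower().startswith(b"host:") for h in headers)

    # The regex above only ever produces "GET /<path> HTTP/1.1\r\n\r\n",
    # which this check flags every time:
    assert looks_like_fte_get(b"GET /a/b.html HTTP/1.1\r\n\r\n")
    assert not looks_like_fte_get(b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n")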
Regards,
On Sat, Aug 22, 2015 at 12:43 AM, Yawning Angel yawning@schwanenlied.me wrote:
> On Fri, 21 Aug 2015 17:51:20 -0700 Kevin P Dyer kpdyer@gmail.com wrote:
>> On Wed, Aug 19, 2015 at 11:58 AM, Yawning Angel yawning@schwanenlied.me wrote:
>>> [snip]
>>> The FTE semantic attack they presented isn't the easiest one I know of (the GET request as defined by the regex is pathologically malformed).
>> Very interesting! This is news to me. I'm assuming I did something silly. (Even though I tested it against bro, wireshark, etc.)
> Huh. I brought it up in conversation with a few people and was under the impression it was passed on. I probably should have e-mailed you about it or something.
>> How is it pathologically malformed?
> "manual-http-request": { "regex": "^GET\ \/([a-zA-Z0-9\.\/]*) HTTP/1\.1\r\n\r\n$" },
> No "Host" header. All compliant requests MUST include one per RFC 2616, and all compliant servers MUST respond with a 400 if it is missing.
Ah, gotcha. It's not RFC compliant. RFC 2616 was published in 1999, and there are tons of HTTP-like implementations since then that, ostensibly, don't need to follow it (e.g., an HTTP-like client/server that only talk to each other). A network monitor must deal with these cases too, and they'll still advertise HTTP/1.1 in their headers.
This [1] paper is a bit dated (2007), but my intuition is that real-world implementations have drifted even further from the RFC over the last 8 years. I swear there's a more recent paper on this topic, but I couldn't find it...
> Since requests of that sort should invoke the error path on RFC-compliant servers, it's a really good distinguisher, since legitimate clients will not do such a thing. Existing realistic adversaries already have "identify 'suspicious behavior', call back to confirm" style filtering in production, so the false positive rate can be reduced to 0 if needed.
Based on our exploration of the data, we found there's a wide range of implementations, most of which have non-RFC-compliant behaviors. See Section 4 of our paper for more details. For that reason I'd be very surprised if a Host-header check could result in a 0 FP rate.
With that being said, I'll add the Host-header check to the list of experiments that we want to do for the full version of our paper. It would be interesting to learn what the data tells us.
-Kevin
[1] https://www.ideals.illinois.edu/bitstream/handle/2142/11424/Non-compliant%20...
On Sat, 22 Aug 2015 14:40:08 -0700 Kevin P Dyer kpdyer@gmail.com wrote:
> Ah, gotcha. It's not RFC compliant. RFC 2616 was published in 1999, and there are tons of HTTP-like implementations since then that, ostensibly, don't need to follow it (e.g., an HTTP-like client/server that only talk to each other). A network monitor must deal with these cases too, and they'll still advertise HTTP/1.1 in their headers.
> This [1] paper is a bit dated (2007), but my intuition is that real-world implementations have drifted even further from the RFC over the last 8 years. I swear there's a more recent paper on this topic, but I couldn't find it...
I'd be surprised if there were lots of clients that advertise HTTP/1.1 but don't include a Host header, since clients that are broken in that manner will not be able to talk to apache/nginx/tomcat/etc[0].
Then again, fteproxy is an example of such a thing, so I may be rather sad at the results of an actual survey.
>> Since requests of that sort should invoke the error path on RFC-compliant servers, it's a really good distinguisher, since legitimate clients will not do such a thing. Existing realistic adversaries already have "identify 'suspicious behavior', call back to confirm" style filtering in production, so the false positive rate can be reduced to 0 if needed.
> Based on our exploration of the data, we found there's a wide range of implementations, most of which have non-RFC-compliant behaviors. See Section 4 of our paper for more details. For that reason I'd be very surprised if a Host-header check could result in a 0 FP rate.
The point isn't to use non-compliance as the sole discriminator (since people do write broken code), but to cut the candidate IP/Port list down to something that's reasonable for whatever active probing infrastructure exists to manage.
From there, delta-T later, separate infrastructure attempts a full FTE + Tor handshake, and blacklists/RST injects/etc. target candidates that succeed.
The second step gets to 0 FP, and precisely this sort of thing is how China currently handles obfs3. The delay (anecdotal) is about 10 mins.
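In pseudocode, the two-stage pipeline looks something like this (every function here is a hypothetical stand-in, not a real API):

    import time

    PROBE_DELAY = 600  # the anecdotal ~10 minute delta-T

    def censor_loop():
        for flow in passive_dpi_candidates():      # stage 1: shortlist via
            if not missing_host_header(flow):      # e.g. the semantic check
                continue                           # sketched earlier
            time.sleep(PROBE_DELAY)                # probe later, from
            if fte_tor_handshake_succeeds(         # separate infrastructure
                    flow.dst_ip, flow.dst_port):   # stage 2: confirm
                block_endpoint(flow.dst_ip, flow.dst_port)  # 0 FP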
Intuitively I think that "missing Host header" will be extremely rare but I don't have a way to get traces to prove/disprove it.
> With that being said, I'll add the Host-header check to the list of experiments that we want to do for the full version of our paper. It would be interesting to learn what the data tells us.
I would be interested in seeing the results.
Hey Philipp!
Thanks for the interest! I'm one of the authors on the paper. My response is inline.
On Wednesday, August 19, 2015, Philipp Winter phw@nymity.ch wrote:
> They claim that they are able to detect obfs3, obfs4, FTE, and meek
> using entropy analysis and machine learning.
> I wonder if their dataset allows for such a conclusion. They use an
> (admittedly large) set of flow traces gathered at a college campus.
> One of the traces is from 2010. The Internet was a different place back
> then.
Correct, we used datasets collected in 2010, 2012, and 2014, which total 1 TB of data and 14M TCP flows.
We could have, say, just used the 2014 dataset. However, we wanted to show that the choice of dataset matters: even with millions of traces, the collection date and network-sensor location can impact results.
> I would also expect college traces to be very different from
> country-level traces. For example, the latter should contain
> significantly more file sharing and other traffic that is considered
> inappropriate in a college setting. Many countries also have popular
> web sites and applications that might be completely missing in their
> data sets.
That's probably accurate. I bet that even across different types of universities (e.g., technical vs. non-technical) one might see very different patterns. Certainly different countries (e.g., Iran vs. China) will see different patterns, too.
For that reason, we're going to release our code [1] prior to CCS. Liang Wang, a grad student at the University of Wisconsin-Madison, led a substantial engineering effort to make this possible. We undersold it in the paper, but it makes it easy to re-run all these experiments on new datasets. We'd *love* it if others could rerun the experiments against new datasets and report their results.
> Considering the base-rate difference between normal and obfuscated
> traffic, the false positive rate in the analysis is significant.
> Trained classifiers also seem to do badly on traces they weren't
> trained on.
We definitely encountered this. If you train on one dataset and test on a different one, accuracy plummets.
I think that raises a really interesting research question: what does it mean for two datasets to be different? For this type of classification problem, what level of granularity/frequency would a network operator train at to achieve optimal accuracy and low false positives? (e.g., do you need a classifier per country? state? city? neighborhood?) Also, how often does one need to retrain? daily? weekly?
I guess all we showed is that datasets collected from sensors at different network locations (and years apart) are different enough to impact classifier accuracy. Probably not surprising...
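For the curious, the shape of that experiment is simple. A minimal sketch, where load_flow_features() is a hypothetical stand-in for per-flow feature extraction (our released code does the real version) and a decision tree stands in for the classifier:

    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X_a, y_a = load_flow_features("campus-2010")  # hypothetical loader
    X_b, y_b = load_flow_features("campus-2014")

    X_tr, X_te, y_tr, y_te = train_test_split(X_a, y_a, test_size=0.2)
    clf = DecisionTreeClassifier().fit(X_tr, y_tr)
    print("same-dataset accuracy: ", clf.score(X_te, y_te))
    print("cross-dataset accuracy:", clf.score(X_b, y_b))  # much lower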
> The authors suggest active probing to reduce false
> positives, but don't mention that this doesn't work against obfs4 and
> meek.
I don't want to get too off track here, but do obfs4 and meek really resist active probing from motivated countries? Don't we still have the unsolved bridge/key distribution problem?
Finally, we'll be working on a full version of this paper with additional results. If anyone is interested in reviewing and providing feedback, we'd love to hear it. (Philipp - do you mind if I reach out to you directly?)
-Kevin
On Fri, 21 Aug 2015 17:46:39 -0700 Kevin P Dyer kpdyer@gmail.com wrote:
>> The authors suggest active probing to reduce false
>> positives, but don't mention that this doesn't work against obfs4 and
>> meek.
> I don't want to get too off track here, but do obfs4 and meek really resist active probing from motivated countries? Don't we still have the unsolved bridge/key distribution problem?
meek does, because the entry point into the Tor network is a well-known, high-traffic CDN platform. An adversary can see that there is a meek instance running on a given CDN (it's not a secret), but it sits alongside content that people want to see, so distinguishing normal traffic from meek traffic requires a TLS break or statistical attacks.
I personally hold distribution to be orthogonal to circumvention protocol design in the context of obfs4 (scramblesuit, FTE, and other bridge-based circumvention protocols), because if someone breaks the bridge distribution mechanism, the protocol is irrelevant: the attackers win by virtue of having the IP address/Port of the obfuscated server[0].
Assuming all the adversary sees is the obfs4/scramblesuit stream, then both are active probing resistant, because it requires compromising a separate system to be able to generate a valid handshake for probing.
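A toy illustration of the idea (emphatically not the real obfs4 handshake, which uses ntor with Elligator-encoded keys; the point is only that the server stays silent unless the client already holds a secret from the out-of-band bridge line):

    import hashlib
    import hmac
    import os

    bridge_secret = os.urandom(32)  # distributed with the bridge line

    def client_hello() -> bytes:
        epk = os.urandom(32)  # stand-in for an ephemeral public key
        return epk + hmac.new(bridge_secret, epk, hashlib.sha256).digest()

    def server_accepts(hello: bytes) -> bool:
        epk, mac = hello[:32], hello[32:64]
        expected = hmac.new(bridge_secret, epk, hashlib.sha256).digest()
        return hmac.compare_digest(mac, expected)

    assert server_accepts(client_hello())      # legitimate client
    assert not server_accepts(os.urandom(64))  # active prober: no response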
Active probing attacks should be defeated in a scenario like:
"I set up an unlisted bridge, firewall off the ORPort, and GPG e-mail/OTR/Pond a series of bridge lines to a collaborator in China."
The adversary gets to see the IP address/Port of the obfuscated server and can send traffic as they see fit, but without the secrets in the bridge line they can't elicit a response.
Note: There are a few other things an adversary can do under the assumption that whatever is speaking obfs4 is probably a Tor client/bridge pair. But those are attacks against either the Tor network or limitations of the tor implementation itself[1].
Regards,