Projects to combat/defeat data correlation


Jim Rucker

16 Jan 2014

3:16 a.m.

There was a story in the news recently of a Harvard student who used Tor to send a bomb threat to Harvard in order to cancel classes so he wouldn't have to take a test. He was apprehended within a day, which puts into question the anonymity of Tor.

...

From my understanding (please correct me if I'm wrong), Tor has a weakness: if someone can monitor data going into the relays and coming out of the exit nodes, they can defeat Tor's anonymity by correlating the size and number of packets sent to the relays with those of the packets leaving the exit nodes.

Are there any projects in Tor being worked on to combat data correlation? For instance, relays that send and receive at constant data rates continuously, capping data rates and padding partial or non-packets with random data to maintain the rates?


Moritz Bartl

16 Jan

4:02 a.m.

On 01/16/2014 04:16 AM, Jim Rucker wrote:

...

There was a story in the news recently of a Harvard student who used Tor to send a bomb threat to Harvard in order to cancel classes so he wouldn't have to take a test. He was apprehended within a day, which puts into question the anonymity of Tor.

As I understand it, they did not exploit a weakness in any system; they just (more or less) performed regular police work.

See https://www.schneier.com/crypto-gram-1401.html#3

...

From my understanding (please correct me if I'm wrong), Tor has a weakness: if someone can monitor data going into the relays and coming out of the exit nodes, they can defeat Tor's anonymity by correlating the size and number of packets sent to the relays with those of the packets leaving the exit nodes.

It is not that simple, but in principle you are correct. A good paper to read about this is http://freehaven.net/anonbib/#ccs2013-usersrouted

See anonbib also for mitigations that were suggested and investigated over time (which are not that easy either).

-- Moritz Bartl https://www.torservers.net/
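To make the correlation idea from the question concrete, here is a minimal sketch on synthetic data. It assumes an observer who can record packet timestamps on both sides of the network, buckets them into one-second windows, and compares the resulting count series with a Pearson correlation. The flow generator, the delay and jitter values, and the window size are all illustrative assumptions; the attacks studied in the paper linked above are considerably more involved.

```python
import math
import random

def window_counts(timestamps, window=1.0, n_windows=60):
    """Count packets per fixed-size time window."""
    counts = [0] * n_windows
    for t in timestamps:
        idx = int(t // window)
        if 0 <= idx < n_windows:
            counts[idx] += 1
    return counts

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def bursty_flow(rng, duration=60.0):
    """Hypothetical flow: eight two-second bursts at random start times."""
    times = []
    for _ in range(8):
        start = rng.uniform(0, duration - 2)
        times += [start + rng.uniform(0, 2) for _ in range(rng.randint(20, 80))]
    return sorted(times)

rng = random.Random(1)
entry = bursty_flow(rng)
# The same flow as seen on the exit side: delayed and slightly jittered (assumed values).
exit_same = [t + 0.3 + rng.uniform(0, 0.1) for t in entry]
# An unrelated flow observed during the same minute.
exit_other = bursty_flow(rng)

same_score = pearson(window_counts(entry), window_counts(exit_same))
other_score = pearson(window_counts(entry), window_counts(exit_other))
print(f"matching flow: {same_score:.2f}, unrelated flow: {other_score:.2f}")
```

The matching pair should score well above the unrelated pair, which is all the observer needs in order to confirm a suspected link between the two ends.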

Andreas Krey

8:40 a.m.

On Wed, 15 Jan 2014 21:16:20 +0000, Jim Rucker wrote:

...

There was a story in the news recently of a Harvard student who used Tor to send a bomb threat to Harvard in order to cancel classes so he wouldn't have to take a test. He was apprehended within a day, which puts into question the anonymity of Tor.

This was because it was known that the threat was delivered via Tor, and that he was the only one in $(organizational unit of Harvard) using Tor at that time, and he confessed when confronted with that. There was nothing that actually proved that he made the threat. (Unless this is a case of parallel construction, of course, which I don't assume.)

...

Are there any projects in Tor being worked on to combat data correlation? For instance, relays that send and receive at constant data rates continuously, capping data rates and padding partial or non-packets with random data to maintain the rates?

At the moment that would be prohibitively expensive. Also, it wouldn't guard against the scenario above: you can't be online and shoveling data all the time, so long-term correlation is still possible.

Andreas

-- "Totally trivial. Famous last words." From: Linus Torvalds <torvalds@*.org> Date: Fri, 22 Jan 2010 07:29:21 -0800
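A rough way to see why constant-rate padding is expensive: the sketch below simulates a link that always sends a fixed number of cells per second, filling unused slots with padding. The traffic trace and the two rates are made-up values chosen only for illustration, not measurements from the Tor network.

```python
def constant_rate_cost(real_cells_per_sec, rate):
    """Simulate a link that sends exactly `rate` cells every second.

    Real cells are queued and sent first; unused slots carry padding.
    Returns (padding_cells, max_queue_length).
    """
    queue = 0
    padding = 0
    max_queue = 0
    for arriving in real_cells_per_sec:
        queue += arriving
        sent_real = min(queue, rate)
        queue -= sent_real
        padding += rate - sent_real
        max_queue = max(max_queue, queue)
    return padding, max_queue

# Hypothetical 100-second trace: mostly idle, with two short bursts.
trace = [0] * 50 + [200] * 5 + [0] * 40 + [100] * 5
real = sum(trace)

# Rate equal to the peak: no queueing delay, but almost everything sent is padding.
padding_fast, backlog_fast = constant_rate_cost(trace, rate=200)
overhead_fast = padding_fast / (padding_fast + real)

# A lower rate wastes less bandwidth but makes real traffic wait in the queue.
padding_slow, backlog_slow = constant_rate_cost(trace, rate=50)

print(padding_fast, backlog_fast, overhead_fast)
print(padding_slow, backlog_slow)
```

With this trace, matching the peak rate means 92.5% of the cells sent are padding, while the lower rate cuts padding but lets a backlog of 750 cells build up. The scheme trades bandwidth against latency, and neither choice hides the fact that the user is online at all.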

Matthew Finkel

12:54 p.m.

On Wed, Jan 15, 2014 at 09:16:20PM -0600, Jim Rucker wrote:

...

Are there any projects in Tor being worked on to combat data correlation? For instance, relays that send and receive at constant data rates continuously, capping data rates and padding partial or non-packets with random data to maintain the rates?

The very quick answer, without providing much detail, is that you may want to look at scramblesuit [0][1]. It doesn't try to provide constant throughput, but (as the website says) "we alter inter-arrival times and the transported protocol's packet length distribution". This isn't a perfect solution, and won't impress a GPA (global passive adversary), but it's a start if you're dealing with a localized passive observer.

- Matt

[0] http://www.cs.kau.se/philwint/scramblesuit/
[1] https://gitweb.torproject.org/user/phw/scramblesuit.git
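As a rough illustration of what altering the packet length distribution can mean, the sketch below re-chunks a stream of fixed-size cells into writes of randomly drawn lengths. This is not ScrambleSuit's actual algorithm: the uniform distribution, the length bounds, and the function name are assumptions made for the example, and a real transport would also need padding, framing, and control over inter-arrival times.

```python
import random

def reshape_lengths(cells, rng, min_len=64, max_len=1448):
    """Re-chunk a stream of fixed-size cells into randomly sized writes.

    Lengths are drawn uniformly from [min_len, max_len] (assumed bounds);
    the final chunk may be shorter because the stream simply runs out.
    """
    stream = b"".join(cells)
    chunks = []
    pos = 0
    while pos < len(stream):
        size = rng.randint(min_len, max_len)
        chunks.append(stream[pos:pos + size])
        pos += size
    return chunks

rng = random.Random(7)
cells = [bytes(512) for _ in range(20)]  # twenty dummy 512-byte cells
chunks = reshape_lengths(cells, rng)

print([len(c) for c in chunks])
```

The bytes on the wire are unchanged, but the write sizes no longer follow the cell size of the underlying protocol.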

David Stainton

6:12 p.m.

In that case would it then look like zero in $(organizational unit of harvard) using tor and one in $(organizational unit of harvard) using scramble suit?

I like the idea of the tor pluggable transport combiner... wherein we could wrap a pseudo-random appearing obfuscation protocol (such as obfs3, scramblesuit etc) in a white listed obfuscation protocol such as http?, sshrproxy, hexchat etc.

I imagine the anonymity set would be much smaller for these combined transports... fewer people using them.


Matthew Finkel

18 Jan

1:40 a.m.

On Thu, Jan 16, 2014 at 06:12:47PM +0000, David Stainton wrote:

...

In that case would it then look like zero in $(organizational unit of harvard) using tor and one in $(organizational unit of harvard) using scramble suit?

I like the idea of the tor pluggable transport combiner... wherein we could wrap a pseudo-random appearing obfuscation protocol (such as obfs3, scramblesuit etc) in a white listed obfuscation protocol such as http?, sshrproxy, hexchat etc.

I imagine the anonymity set would be much smaller for these combined transports... fewer people using them.


Yes? Technically it is true that a user of scramblesuit will be very unique right now. However, the computational resources required to associate a stream of pseudorandom bytes with scramblesuit are likely much larger than what a university devotes to traffic analysis on its network. This can likely also be generalized to larger areas.

obfs3 is supposed to be fairly difficult to detect because entropy estimation is seemingly more difficult than typically assumed, and thus far from what has been seen in practice this seems to be true.

Also, if you are interested in combining PTs and want to contribute, check out ticket #10061 [2]. Successfully implementing a PT that looks like a "white listed protocol" is also a fairly difficult task, but if you can help then that would really be great.

[2] https://trac.torproject.org/projects/tor/ticket/10061
[3] https://trac.torproject.org/projects/tor/wiki/doc/PluggableTransports#Combin...

- Matt

Ian Goldberg

20 Jan

1:30 p.m.

On Sat, Jan 18, 2014 at 01:40:43AM +0000, Matthew Finkel wrote:

...

obfs3 is supposed to be fairly difficult to detect because entropy estimation is seemingly more difficult than typically assumed, and thus far from what has been seen in practice this seems to be true.

Wouldn't the way to detect obfs3 be to look at packet sizes, not contents? obfs3 doesn't hide those at all, right?

- Ian

Philipp Winter

4:21 p.m.

On Mon, Jan 20, 2014 at 08:30:12AM -0500, Ian Goldberg wrote:

...

On Sat, Jan 18, 2014 at 01:40:43AM +0000, Matthew Finkel wrote:

...
obfs3 is supposed to be fairly difficult to detect because entropy estimation is seemingly more difficult than typically assumed, and thus far from what has been seen in practice this seems to be true.

Wouldn't the way to detect obfs3 be to look at packet sizes, not contents? obfs3 doesn't hide those at all, right?

Yes, obfs3 doesn't hide packet sizes. As a result, Tor over obfs3 results in packets which are multiples of Tor's 512-byte cells (excluding TLS headers).

Cheers, Philipp
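The packet-size observation above can be sketched as a simple check: what fraction of observed payload sizes, after subtracting some fixed per-record overhead, is a multiple of 512 bytes? The sample size lists and the overhead value of 29 bytes are hypothetical and chosen only to exercise the function; the thread does not state the actual TLS overhead.

```python
CELL_BYTES = 512  # Tor cell size as cited in the thread

def cell_multiple_fraction(payload_sizes, overhead=0):
    """Fraction of payloads whose size, minus a fixed per-record overhead,
    is a positive multiple of CELL_BYTES."""
    if not payload_sizes:
        return 0.0
    hits = 0
    for size in payload_sizes:
        body = size - overhead
        if body > 0 and body % CELL_BYTES == 0:
            hits += 1
    return hits / len(payload_sizes)

# Hypothetical observations, not captured traffic.
tor_like = [512, 1024, 512, 1536, 512]
web_like = [310, 1448, 1448, 977, 64]

print(cell_multiple_fraction(tor_like))                  # every size is a cell multiple
print(cell_multiple_fraction(web_like))                  # none is
print(cell_multiple_fraction([541, 1053], overhead=29))  # assumed fixed overhead removed
```

A check like this never looks at packet contents, which is why obfuscating the bytes alone does not defeat it.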

Matthew Finkel

22 Jan

2:17 a.m.

On Mon, Jan 20, 2014 at 05:21:26PM +0100, Philipp Winter wrote:

...

On Mon, Jan 20, 2014 at 08:30:12AM -0500, Ian Goldberg wrote:

...
On Sat, Jan 18, 2014 at 01:40:43AM +0000, Matthew Finkel wrote:

...
obfs3 is supposed to be fairly difficult to detect because entropy estimation is seemingly more difficult than typically assumed, and thus far from what has been seen in practice this seems to be true.

Wouldn't the way to detect obfs3 be to look at packet sizes, not contents? obfs3 doesn't hide those at all, right?

Yes, obfs3 doesn't hide packet sizes. As a result, Tor over obfs3 results in packets which are multiples of Tor's 512-byte cells (excluding TLS headers).

True. I also assume that the complete absence of a plaintext header is a potential fingerprint, as well. In no way did I intend to suggest that obfs3 is completely undetectable by DPI, but based on what I know, it is the most successful PT that Tor provides. There is always room for improvement, such as what scramblesuit accomplishes, but the main point I wanted to make was that look-like-nothing transports seem to work.

Matthew Finkel

2:28 a.m.

On Wed, Jan 22, 2014 at 02:17:34AM +0000, Matthew Finkel wrote:

...

On Mon, Jan 20, 2014 at 05:21:26PM +0100, Philipp Winter wrote:

...
On Mon, Jan 20, 2014 at 08:30:12AM -0500, Ian Goldberg wrote:

...
On Sat, Jan 18, 2014 at 01:40:43AM +0000, Matthew Finkel wrote:

...
obfs3 is supposed to be fairly difficult to detect because entropy estimation is seemingly more difficult than typically assumed, and thus far from what has been seen in practice this seems to be true.

Wouldn't the way to detect obfs3 be to look at packet sizes, not contents? obfs3 doesn't hide those at all, right?

Yes, obfs3 doesn't hide packet sizes. As a result, Tor over obfs3 results in packets which are multiples of Tor's 512-byte cells (excluding TLS headers).

True. I also assume that the complete absence of a plaintext header is a potential fingerprint, as well.

Sorry, that should have said handshake instead of header.

Philipp Winter

20 Jan

4:30 p.m.

On Sat, Jan 18, 2014 at 01:40:43AM +0000, Matthew Finkel wrote:

...

obfs3 is supposed to be fairly difficult to detect because entropy estimation is seemingly more difficult than typically assumed, and thus far from what has been seen in practice this seems to be true.

There's a recent paper which covers that topic [1]. While entropy estimation is certainly more expensive than, say, counting packet sizes, it's probably not out of reach for well-equipped boxes.

[1] http://cs.unc.edu/~amw/resources/opaque.pdf

Cheers, Philipp

Roger Dingledine

5:28 p.m.

On Mon, Jan 20, 2014 at 05:30:27PM +0100, Philipp Winter wrote:

...

On Sat, Jan 18, 2014 at 01:40:43AM +0000, Matthew Finkel wrote:

...
obfs3 is supposed to be fairly difficult to detect because entropy estimation is seemingly more difficult than typically assumed, and thus far from what has been seen in practice this seems to be true.

There's a recent paper which covers that topic [1]. While entropy estimation is certainly more expensive than, say, counting packet sizes, it's probably not out of reach for well-equipped boxes.

I think (we should expect that) entropy detection is one of the standard tools in the DPI toolkit.

obfs3 isn't meant to be secure "because nobody can tell there's a lot of entropy". It's meant to drive up the risk of false positives -- if you cut all flows that have a lot of entropy, what else do you cut besides obfs2 and obfs3? And even if you're convinced it's a worthwhile risk now, are you convinced the background traffic won't change in the future?

That's why pairing obfs3 with something that modifies packet volume (and maybe timing) is important -- otherwise more complex DPI rulesets can look not just for entropy, but also for underlying hints that it's the Tor protocol underneath, and then reduce their false positive rates.

It also explains why the most effective attacks against obfs2 and obfs3 involve detecting that it "might" be obfs traffic, and then doing some follow-up checking to get more confidence.

--Roger
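For readers unfamiliar with the entropy test being discussed, here is a minimal sketch of a byte-level Shannon entropy estimate and a threshold check. The 7.5 bits-per-byte threshold and the sample payloads are arbitrary assumptions; the random bytes merely stand in for any look-like-nothing stream. As noted above, the same test also flags other high-entropy traffic such as encrypted or compressed flows, which is where the false positives come from.

```python
import math
import os
from collections import Counter

def shannon_entropy(data):
    """Empirical Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())

def looks_random(data, threshold=7.5):
    """Flag payloads whose byte entropy is at or above an assumed threshold."""
    return shannon_entropy(data) >= threshold

# Plaintext protocol data uses a small alphabet, so its entropy is low.
plaintext = b"GET /index.html HTTP/1.1\r\nHost: example.org\r\n\r\n" * 40
# Random bytes stand in for an obfuscated stream (or any other encrypted flow).
random_like = os.urandom(4096)

print(round(shannon_entropy(plaintext), 2), looks_random(plaintext))
print(round(shannon_entropy(random_like), 2), looks_random(random_like))
```

The estimate is only meaningful on reasonably large samples; a few dozen bytes cannot reach the threshold even if they are perfectly random.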

Kevin P Dyer

16 Jan

7:29 p.m.

On Wed, Jan 15, 2014 at 7:16 PM, Jim Rucker mrjimorg@gmail.com wrote:

...

[snip]

From my understanding (please correct me if I'm wrong), Tor has a weakness: if someone can monitor data going into the relays and coming out of the exit nodes, they can defeat Tor's anonymity by correlating the size and number of packets sent to the relays with those of the packets leaving the exit nodes.

Are there any projects in Tor being worked on to combat data correlation? For instance, relays that send and receive at constant data rates continuously, capping data rates and padding partial or non-packets with random data to maintain the rates?

What you are referring to is a traffic confirmation attack. It's a deceptively hard problem: even if the naive strategy of sending data at a constant rate "worked" (for some definition), it would be prohibitively expensive in practice. It is also worth reiterating that even if such a countermeasure were in place, it wouldn't conceal the fact that a specific user is connecting to the Tor network.

If you are interested in recent academic works on traffic analysis, you should have a look at [1] and [2]. They explore the related setting of website fingerprinting attacks and defenses (including the one you suggest.)

-Kevin

[1] https://kpdyer.com/publications/oakland2012-peekaboo.pdf
[2] http://cacr.uwaterloo.ca/techreports/2013/cacr2013-30.pdf


tor-dev@lists.torproject.org
