Can a developer please explain to me why something like the following obfuscation of 'torified traffic' is exploitable?
Suppose a scenario where a collective of authorities is able to observe large parts of the web. Observing traffic correlation can then reveal a connection through the network.
But why can't we just alter the pattern inside the network, such that there is no correlation between 'incoming' and 'outgoing' data anymore?
Suppose I'm connected to a server and there is a lot of traffic from the server to me, through the Tor network of course. The data is encrypted, and the exploit measures the raw data stream's pattern.
But why not change the data stream inside the network? Suppose the server A is outside of the Tor network, i.e. it is not a hidden service. The data stream into our network is then out of our control; encrypted or not, we can't change it. Now it flows to node B (the exit node) and into Tor. Node B streams the data to node C, then node C streams it to node D, and node D exits the stream to me. (A simplification.)
OK, now node B got the data from the 'outside world'. B and C first make a handshake to establish a shared key for a private encryption protocol valid only for some time.
Now node B does not stream the data to node C directly, but obfuscates it. That means if there are n packets, it transforms them into m packets in some unpredictable way, and each new packet gets a small amount of additional random data. (The point is that the new stream will not look at all like the old one.)
Only node B knows the way to de-obfuscate this. But B and C did a handshake, and using this encryption B shares with C how to de-obfuscate the data.
Now C recovers the real data and then does another secret handshake with D (separate from the shared secret with B, of course). Then C obfuscates the data again, and only D will know how to recover the original data.
This repeats until I receive an obfuscated stream and my client can recover the original data.
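The n-to-m repacketization step described above could be sketched like this (all names, sizes and the 2-byte length header are hypothetical choices for illustration; the B-to-C key handshake that would protect the headers is not shown):

```python
import os
import random

def obfuscate(packets):
    """Re-split n packets into m packets of unpredictable sizes,
    appending a random amount of padding to each new packet."""
    data = b"".join(packets)  # original packet boundaries are discarded
    out = []
    i = 0
    while i < len(data):
        size = random.randint(1, 512)            # new, unrelated packet size
        chunk = data[i:i + size]
        i += len(chunk)
        pad = os.urandom(random.randint(0, 64))  # random filler bytes
        # 2-byte length header tells the peer where the padding starts
        out.append(len(chunk).to_bytes(2, "big") + chunk + pad)
    return out

def deobfuscate(packets):
    """Strip headers and padding, recovering the original byte stream."""
    data = b""
    for p in packets:
        n = int.from_bytes(p[:2], "big")
        data += p[2:2 + n]
    return data

original = [os.urandom(300) for _ in range(5)]
wire = obfuscate(original)
assert deobfuscate(wire) == b"".join(original)
```

Note that the total number of bytes still grows roughly in proportion to the input, which matters for the objection below.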
======
The point is that the patterns of the in-stream from server A aren't correlated to what streams from D to me anymore. Hence an observer isn't able to see correlations anymore: the number, size and pattern of the stream's packets are all different.
On top of this, one could add random zero-information streams between the network and its clients. That way an observer can't even be sure whether a client receives information at all...
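The zero-information cover-traffic idea can be sketched with fixed-size cells, where dummy and real cells are the same length on the wire (names and the 512-byte cell size are illustrative assumptions; a real design would also encrypt the cells and send them at a constant rate, which is the hard part):

```python
import os

CELL = 512  # fixed cell size, so every cell looks identical on the wire

def make_cell(payload=b""):
    """Pad every cell to CELL bytes; an empty payload is a dummy cell.
    A 2-byte header carries the real payload length (0 for dummies)."""
    assert len(payload) <= CELL - 2
    body = len(payload).to_bytes(2, "big") + payload
    return body + os.urandom(CELL - len(body))

def read_cell(cell):
    """Recover the payload; b"" means the cell carried no information."""
    n = int.from_bytes(cell[:2], "big")
    return cell[2:2 + n]

real = make_cell(b"hello")
dummy = make_cell()
assert len(real) == len(dummy) == CELL   # indistinguishable by size
assert read_cell(real) == b"hello" and read_cell(dummy) == b""
```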
======
Now can someone please talk me out of this approach?
best /jo
On 4 September 2013 20:09, josef.winger@email.de wrote:
Now node B does not stream the data to node C directly, but obfuscates it. That means if there are n packets, it transforms them into m packets in some unpredictable way, and each new packet gets a small amount of additional random data. (The point is that the new stream will not look at all like the old one.)
Only node B knows the way to de-obfuscate this. But B and C did a handshake, and using this encryption B shares with C how to de-obfuscate the data.
Node A sends 40 KB of data to Node B, in some particular distribution. Node B sends 60 KB of data (a 50% increase!) in a new distribution to Node C. Node C sends 40 KB of traffic to wherever.
An adversary watching Node B knows that it is passing the data from A to C. It's obvious. Now, it's _less_ obvious when Node B is receiving two streams of data, 40 KB from Node A and 50 KB from Node X, and sending two streams of 60 KB to Nodes Y and Z (which stream went where?) - but that only holds up for really small streams. For longer-lived streams in a low-latency network, where the packet sizes and frequency of the A->B and X->B streams diverge, the B->Y and B->Z streams will likewise diverge, and it's then easy to correlate them again.
-tom
On 2013-09-04, at 8:09 PM, josef.winger@email.de wrote:
Can a developer please explain to me why something like the following obfuscation of 'torified traffic' is exploitable?
Suppose a scenario where a collective of authorities is able to observe large parts of the web. Observing traffic correlation can then reveal a connection through the network.
But why can't we just alter the pattern inside the network, such that there is no correlation between 'incoming' and 'outgoing' data anymore?
Regardless of what goes on inside the network, the traffic must be in-order at the points of entrance and exit to the network (a property of TCP). Those are the points of interest to an observer doing traffic correlation.
Compounding that problem is the low latency of the network: the relative timing within any given stream is preserved.
The first problem might be mitigated with packet padding; the second problem might be mitigated with random packet delays. My understanding is that these two approaches are being studied at the moment.
Modifying the behaviour of traffic within the network does not help.
It has also been suggested that cover traffic is a solution, based on a Bayesian argument with (IMHO) incorrect assumptions. I think it will be proven wrong as attacks get better.