Tobias Pulls:
On 29/07/18 15:42, George Kadianakis wrote:
- They also told me of research by Tobias Pulls which eliminates the need for histograms in WTF-PAD and instead samples from the probability distribution directly. They think that this can simplify things somewhat. Any thoughts on this?
Yes this is actually exactly what I want to do with the next iteration of WTF-PAD! The question is what form/model to use for these probability distributions. Right now we're encoding inter-burst and inter-packet timings with some weird geometric distribution determining how long these bursts should go on for, when it might be more natural to encode and sample from length-based distributions/histograms.
(Histograms vs. distributions is not the problem -- it's what they encode and how they encode it that matters).
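For what it's worth, the difference between the two approaches fits in a few lines. The following is only a rough Python sketch, not code from WTF-PAD or from Tobias's work: the bin edges, the weights, and the choice of a log-normal distribution are all made-up assumptions for illustration.

```python
import random

# Hypothetical histogram of inter-packet delays in milliseconds.
# Each entry is (low_edge_ms, high_edge_ms, weight); the numbers are invented.
HISTOGRAM = [
    (0.0, 1.0, 50),
    (1.0, 10.0, 30),
    (10.0, 100.0, 15),
    (100.0, 1000.0, 5),
]


def sample_from_histogram(rng, histogram):
    """Pick a bin in proportion to its weight, then a delay uniformly inside it."""
    weights = [weight for _, _, weight in histogram]
    low, high, _ = rng.choices(histogram, weights=weights, k=1)[0]
    return rng.uniform(low, high)


def sample_from_distribution(rng, mu, sigma):
    """Draw a delay directly from a log-normal distribution.

    The log-normal family is an arbitrary stand-in here; the thread leaves
    open which form/model the distribution should take.
    """
    return rng.lognormvariate(mu, sigma)


if __name__ == "__main__":
    rng = random.Random(2018)  # fixed seed only so that runs are repeatable
    print("histogram sample (ms):   ", sample_from_histogram(rng, HISTOGRAM))
    print("distribution sample (ms):", sample_from_distribution(rng, 1.0, 1.5))
```

Either way the sampler returns one delay per call, which is why the interesting question is what the numbers encode rather than which container holds them.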
I don't see this paper on Tobias's website. Is it up anywhere yet?
Hmm. Looking at the README of wtfpad (see the APE section), I think this blog post is the best resource we have on this: https://www.cs.kau.se/pulls/hot/thebasketcase-ape/
Hi George and Mike,
You found the main writeup of the hasty work I did in this direction a while back; there are also some comments in the source [0]. Unfortunately my funding took me in other directions and I didn't want to publish a paper without spending more time on it. As written in the blog post, it looks like a promising direction, but please also note that the Wa-kNN attack implementation used has some rough edges, for example when it comes to time-based features (so the robustness of the naive distributions when moving the PT server around is far from a given). If someone wants to collaborate on this I'd be more than happy to contribute; I have funding to work on Tor-related things again starting in August.
This is great! Sorry it took me so long to reply. I've been deep in it thinking about related traffic analysis issues with onion services.
I'm very much interested in this direction. This is the post, right: https://www.cs.kau.se/pulls/hot/thebasketcase-ape/
Did you handle depleting the distributions when normal traffic is transmitted? Counting traffic that fits the target distribution as "already sent padding" (and thus sending less padding overall in that case) is a key piece of WTF-PAD that allows it to have better goodput. This is in fact why the original e2e defense was called "Adaptive Padding": its padding distributions adapt to observed traffic.
If we could alter the distribution in this same way, it may be a good way to go. However, histograms tend to be easier to do this with, and they also encode distributions (just perhaps more tediously and verbosely).
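To illustrate why histograms make this kind of adaptation easy, here is a minimal Python sketch of token removal. It is not the WTF-PAD code: the class name `TokenHistogram`, the bin edges, and the removal policy (use the bin containing the observed delay, else the next higher non-empty bin, and refill once every bin is empty) are assumptions chosen for illustration.

```python
import bisect
import random


class TokenHistogram:
    """Histogram of delay bins, each holding a count of padding tokens."""

    def __init__(self, edges_ms, tokens):
        # Bin i covers delays in [edges_ms[i], edges_ms[i + 1]).
        if len(edges_ms) != len(tokens) + 1 or sum(tokens) <= 0:
            raise ValueError("need len(tokens) + 1 edges and at least one token")
        self.edges = list(edges_ms)
        self.initial = list(tokens)
        self.tokens = list(tokens)

    def _bin_for(self, delay_ms):
        """Index of the bin containing delay_ms, clamped to the valid range."""
        i = bisect.bisect_right(self.edges, delay_ms) - 1
        return min(max(i, 0), len(self.tokens) - 1)

    def _take(self, i):
        """Remove one token from bin i; refill everything once all bins are empty."""
        self.tokens[i] -= 1
        if sum(self.tokens) == 0:
            self.tokens = list(self.initial)

    def observe_real_packet(self, delay_ms):
        """Count a real packet as 'already sent padding'.

        Takes a token from the matching bin, or from the next higher
        non-empty bin (assumed policy). Returns the bin used, or None.
        """
        for i in range(self._bin_for(delay_ms), len(self.tokens)):
            if self.tokens[i] > 0:
                self._take(i)
                return i
        return None

    def sample_padding_delay(self, rng):
        """Pick a bin in proportion to its remaining tokens and spend one token."""
        i = rng.choices(range(len(self.tokens)), weights=self.tokens, k=1)[0]
        self._take(i)
        return rng.uniform(self.edges[i], self.edges[i + 1])
```

Because real traffic spends the same tokens that padding would, a busy connection leaves fewer tokens for padding, which is where the goodput benefit comes from. Doing the same with a parameterized distribution would need some rule for updating its parameters, which this sketch does not attempt.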
One of the other things I want to try, which may overlap, is changing the type of information the distribution/histogram encodes. Inter-packet and inter-burst delay (encoded as two separate states in the state machines) is perhaps not as optimal, useful, or easy to specify and optimize as something that more naturally resembles web traffic: a distribution of request sizes and object sizes, some way to simulate concurrent fetches (selection of overlap) of these objects, and subtraction of these object-size instances from the distribution when we see them.
What do you think about that? Does that make sense?
Do you think we should try to do this as a parameterized distribution, or as a histogram?
Are you interested in attempting to implement both/either?
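As a strawman for the object-size idea, here is a small Python sketch. Everything in it is hypothetical: the size table, the 498-byte cell payload constant, the "closest size" subtraction rule, and the round-robin model of concurrent fetches are assumptions made up to show the shape of the idea, not a proposal for the actual design.

```python
import random

CELL_PAYLOAD = 498  # assumed payload bytes per cell; adjust to taste


def draw_objects(rng, size_weights, count):
    """Sample `count` fake object sizes (bytes) from a weighted size table."""
    sizes = list(size_weights)
    weights = [size_weights[size] for size in sizes]
    return rng.choices(sizes, weights=weights, k=count)


def subtract_observed(pending, observed_size):
    """Drop the pending fake object closest in size to a real object we saw.

    Returns the size that was removed, or None if nothing was pending.
    """
    if not pending:
        return None
    closest = min(pending, key=lambda size: abs(size - observed_size))
    pending.remove(closest)
    return closest


def schedule_cells(sizes, concurrency):
    """Interleave padding cells for objects fetched `concurrency` at a time.

    Returns a list of object indexes, one entry per padding cell, in send order.
    """
    # Ceiling division gives cells per object; every object costs at least one.
    remaining = [max(1, -(-size // CELL_PAYLOAD)) for size in sizes]
    queue = list(range(len(sizes)))
    active, schedule = [], []
    while queue or active:
        # Start new "fetches" until the concurrency limit is reached.
        while queue and len(active) < concurrency:
            active.append(queue.pop(0))
        # Send one cell for each active object, round-robin.
        for obj in list(active):
            schedule.append(obj)
            remaining[obj] -= 1
            if remaining[obj] == 0:
                active.remove(obj)
    return schedule


if __name__ == "__main__":
    rng = random.Random(7)
    table = {500: 40, 3000: 35, 20000: 20, 150000: 5}  # invented size weights
    pending = draw_objects(rng, table, 4)
    subtract_observed(pending, 2800)  # a real ~2.8 KB object was seen
    print(schedule_cells(pending, concurrency=2))
```

The subtraction step plays the same role as token removal in the delay histograms: real objects that resemble the target distribution reduce the amount of padding still owed.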
Ooh nice! This is done as a PT implementation.
You might like: https://github.com/mikeperry-tor/vanguards/blob/master/README_SECURITY.md
In it, I recommend obfs4 with iat-mode=2 because it does some limited packet size and timing obfuscation. Should we consider recommending basket2 also? Is anyone running bridges with it? Probably not, I guess :/.