Check out these pictures of relays in the consensus during November and December. (Warning: the linked images are huge; just look at the downscaled attachments if you prefer.)
November https://people.torproject.org/~dcf/graphs/microdescs/microdescs-2014-11.png (720×13242 pixels)
December (spot the anomaly) https://people.torproject.org/~dcf/graphs/microdescs/microdescs-2014-12.png (744×18136 pixels)
Each row is one descriptor, and each column is one hour. The rows are sorted so that descriptors with similar uptimes are nearby. The conspicuous narrow vertical stripe in the lower right of the December graph is a few thousand descriptors entering the consensus all at once and then leaving it again.
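A minimal sketch of this descriptor-by-hour layout, assuming each hourly consensus has already been parsed into a set of microdescriptor digests. The input format and the `presence_matrix` function are hypothetical illustrations, not code from the actual `microdescs` tool.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Set, Tuple


def presence_matrix(consensuses: Dict[datetime, Set[str]], start: datetime,
                    hours: int) -> Tuple[List[str], List[List[bool]]]:
    """Return (digests, rows): rows[i][h] is True if digests[i] was listed
    in the consensus for hour start + h. Hours absent from `consensuses`
    (hypothetical input: hour -> set of digests) give an all-False column."""
    digests = sorted(set().union(*consensuses.values()))
    rows = []
    for d in digests:
        # One row per descriptor, one column per hour.
        rows.append([d in consensuses.get(start + timedelta(hours=h), set())
                     for h in range(hours)])
    return digests, rows


start = datetime(2014, 12, 1)
consensuses = {start: {"a", "b"}, start + timedelta(hours=1): {"b"}}
digests, rows = presence_matrix(consensuses, start, 3)
print(digests)  # ['a', 'b']
print(rows)     # [[True, False, False], [True, True, False]]
```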
If you want to dig into any patterns you see, the lines of these text files correspond 1:1 to the rows of the images. https://people.torproject.org/~dcf/graphs/microdescs/microdescs-2014-11.txt https://people.torproject.org/~dcf/graphs/microdescs/microdescs-2014-12.txt
Source code is in https://people.torproject.org/~dcf/graphs/microdescs/.
    wget https://collector.torproject.org/archive/relay-descriptors/microdescs/microd...
    tar xJf microdescs-2014-11.tar.xz
    tar xJf microdescs-2014-12.tar.xz
    ./microdescs --output microdescs-2014-11 microdescs-2014-11
    ./microdescs --output microdescs-2014-12 microdescs-2014-12
David Fifield
Hello Guys,
it would be great if I could get a few opinions regarding my upcoming master thesis topic.
My supervisor is Andriy Panchenko (you may know some of his work from Mike Perry's critique of website fingerprinting attacks). As a defense, we'd like to experiment with traffic splitting (like Conflux: split traffic over multiple entry guards, but merge it already at the middle relay) and padding.
I know that the number of entry guards was decreased from three to one. Is this worth researching, or is the approach heading in a not-so-great direction given the Tor Project's "only one entry node" decision? Or, actually, what do you think in general?
Thanks,
Daniel
Daniel Forster:
Hello Guys,
it would be great if I could get a few opinions regarding my upcoming master thesis topic.
My supervisor is Andriy Panchenko (you may know some of his work from Mike Perry's critique of website fingerprinting attacks). As a defense, we'd like to experiment with traffic splitting (like Conflux: split traffic over multiple entry guards, but merge it already at the middle relay) and padding.
I know that the number of entry guards was decreased from three to one. Is this worth researching, or is the approach heading in a not-so-great direction given the Tor Project's "only one entry node" decision? Or, actually, what do you think in general?
I think regardless of our current entry guard choice (which is governed by the consensus and subject to relatively easy change, btw), having a datapoint on how traffic splitting affects Website Traffic Fingerprinting accuracy would be a very useful research contribution.
I am in general very concerned that basically one research paper caused us to make a decision to switch to a single, long-lived guard, and that a core assumption underlying this research paper is that traffic correlation is always perfect. Research that can at least shine some light on the actual tradeoff we made here seems generally useful and important.
One thing I would ask is that for this particular piece of research, you also investigate the accuracy of an adversary that gets to see both links, but externally (i.e., a censoring/surveilling national firewall). I suspect that for such adversaries, there won't be much benefit from just splitting by itself, but we may end up surprised by how splitting and padding interact together with conflux-style load balancing. There may be emergent effects there that further complicate the attack, even for a local observer such as this.
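A toy model of why splitting alone may not help against an observer who sees every link, assuming round-robin assignment of cells to guards, no padding, and timestamps the observer sees without distortion. `split_round_robin` and `merge_by_time` are hypothetical names; a real Conflux-style scheduler plus network jitter would make the merge less trivial than this.

```python
import heapq
from typing import List, Tuple

# One cell as seen on the wire: (timestamp in seconds, direction),
# with direction +1 for client-to-guard and -1 for guard-to-client.
Cell = Tuple[float, int]


def split_round_robin(trace: List[Cell], k: int) -> List[List[Cell]]:
    """Assign cells to k guard links in round-robin order (toy scheduler)."""
    links: List[List[Cell]] = [[] for _ in range(k)]
    for i, cell in enumerate(trace):
        links[i % k].append(cell)
    return links


def merge_by_time(links: List[List[Cell]]) -> List[Cell]:
    """An observer who sees every link re-interleaves the cells by timestamp.
    Each link is already time-ordered, so a k-way merge suffices."""
    return list(heapq.merge(*links))


trace = [(0.00, 1), (0.10, -1), (0.12, -1), (0.35, 1), (0.40, -1)]
links = split_round_robin(trace, 2)
print(links[0])                        # what a single guard sees
print(merge_by_time(links) == trace)   # True: the full trace is recovered
```

A guard that sees only one link gets a partial trace, while the external observer recovers the original one, which is why padding would have to do the real work against that adversary.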
Hopefully you are also aware of our attempts to prototype padding defenses to the first hop using pluggable transports. See in particular: https://gitweb.torproject.org/user/mikeperry/torspec.git/tree/proposals/idea... https://bitbucket.org/mjuarezm/obfsproxy-wfpadtools/ and https://lists.torproject.org/pipermail/tor-dev/2014-December/007977.html
Perhaps you could save some implementation effort by laying these pluggable transports on top of a native tor splitting/conflux mechanism?
You may also be able to collaborate with Marc Juarez in other aspects of this research, too. There's a lot to study here, I think.
On Jan 7, 2015, at 9:13 PM, Mike Perry mikeperry@torproject.org wrote:
I think regardless of our current entry guard choice (which is governed by the consensus and subject to relatively easy change, btw), having a datapoint on how traffic splitting affects Website Traffic Fingerprinting accuracy would be a very useful research contribution.
I am in general very concerned that basically one research paper caused us to make a decision to switch to a single, long-lived guard, and that a core assumption underlying this research paper is that traffic correlation is always perfect. Research that can at least shine some light on the actual tradeoff we made here seems generally useful and important.
Okay, this is enough motivation. My concerns were minor, but I wanted to ask beforehand.
One thing I would ask is that for this particular piece of research, you also investigate the accuracy of an adversary that gets to see both links, but externally (i.e., a censoring/surveilling national firewall). I suspect that for such adversaries, there won't be much benefit from just splitting by itself, but we may end up surprised by how splitting and padding interact together with conflux-style load balancing. There may be emergent effects there that further complicate the attack, even for a local observer such as this.
Yes, of course! We feel the same way, and we will evaluate the effect of the system when the adversary is able to see and merge the multiple streams.
Hopefully you are also aware of our attempts to prototype padding defenses to the first hop using pluggable transports. See in particular: https://gitweb.torproject.org/user/mikeperry/torspec.git/tree/proposals/idea... https://bitbucket.org/mjuarezm/obfsproxy-wfpadtools/ and https://lists.torproject.org/pipermail/tor-dev/2014-December/007977.html
Perhaps you could save some implementation effort by laying these pluggable transports on top of a native tor splitting/conflux mechanism?
You may also be able to collaborate with Marc Juarez in other aspects of this research, too. There's a lot to study here, I think.
-- Mike Perry
Thanks for the pointers, I wasn't aware of all of them. And yes, the plan is to implement the splitting part and then use pluggable transports for the padding part.
Daniel
tor-dev mailing list tor-dev@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
On 07/01/15 20:16, Daniel Forster wrote:
Hello Guys,
it would be great if I could get a few opinions regarding my upcoming master thesis topic.
My supervisor is Andriy Panchenko (you may know some of his work from Mike Perry's critique of website fingerprinting attacks). As a defense, we'd like to experiment with traffic splitting (like Conflux: split traffic over multiple entry guards, but merge it already at the middle relay) and padding.
I know that the number of entry guards was decreased from three to one. Is this worth researching, or is the approach heading in a not-so-great direction given the Tor Project's "only one entry node" decision? Or, actually, what do you think in general?
Thanks,
Daniel
Hi Daniel,
I find it a very interesting idea to explore.
I feel that a smart use of padding in combination with splitting will be necessary in order to see improvements. The most immediate effect of splitting is to conceal packet lengths, but Tor's fixed-length cells already make length an uninteresting feature to exploit in WF attacks. Even if the cells are routed through different entry guards, ISP-like adversaries sitting between the user and the entry guards have the advantage of knowing the origin of the fragments. However, DLP strategies combined with Conflux-like splitting could be interesting. Also, routing through different entries seems to raise the bar for internal adversaries that control only entry guards.
As Mike already mentioned, the framework we developed within the GSoC project makes it possible to implement a wide range of padding strategies at the first hop, including chopping packets at arbitrary lengths (e.g., following a length distribution). But, as Mike pointed out, the framework is implemented as a PT, whereas a Conflux-like strategy that reassembles fragments at the middle node would have to be implemented in Tor itself.
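A minimal sketch of what "chopping packets following a length distribution" can mean, assuming a discrete distribution given as candidate lengths with relative weights. The `chop` function and its parameters are hypothetical and are not the wfpadtools API.

```python
import random
from typing import List, Sequence


def chop(payload: bytes, lengths: Sequence[int], weights: Sequence[float],
         rng: random.Random) -> List[bytes]:
    """Split payload into fragments whose lengths are drawn from a discrete
    distribution (`lengths` with relative `weights`). The final fragment
    is shorter if fewer bytes remain than the drawn length."""
    if not lengths or min(lengths) <= 0:
        raise ValueError("lengths must be positive")
    fragments = []
    pos = 0
    while pos < len(payload):
        # Draw one length per fragment from the weighted distribution.
        n = rng.choices(lengths, weights=weights, k=1)[0]
        fragments.append(payload[pos:pos + n])
        pos += n
    return fragments


# Example: fragments of 100, 300 or 514 bytes, weighted toward 514.
rng = random.Random(1)
parts = chop(b"x" * 2000, lengths=[100, 300, 514], weights=[1, 1, 2], rng=rng)
print([len(p) for p in parts])
```

Seeding the `random.Random` instance keeps runs reproducible for evaluation; a deployed defense would use an unpredictable source instead.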
I'm still working on the framework, currently refactoring it and implementing new defenses. My goal now is to extend it into an evaluation framework for WF defenses. So I'm definitely interested in this topic. My research is closely related to WF, so I'm up for a collaboration on this as well as on other related problems.
Best,
marc
Hi Marc,
your plans for the wfpadtools framework sound really interesting. An evaluation framework for website fingerprinting defenses would be really useful! I would be happy to use it to evaluate the splitting/padding approach.
Like you and Mike said, I have to implement the splitting in Tor first, but I will definitely come back to you when this first step is done.
Thanks,
Daniel
Very nice.
It seems like the dark (tangent-like) line running through the middle of the uptimes (not the conspicuous vertical stripe you describe) is just a consequence of the sorting.
Are the vertical white lines periods where a consensus wasn’t reached?
All the dust (to the right at the top, to the left at the bottom) is kind of interesting.
On Tuesday, January 6, 2015 at 11:36 AM, David Fifield wrote:
Attachments:
microdescs-2014-11-short.jpg
microdescs-2014-12-short.jpg
On Thu, Jan 08, 2015 at 10:52:17AM -0800, Arlo Breault wrote:
It seems like the dark (tangent-like) line running through the middle of the uptimes (not the conspicuous vertical stripe you describe) is just a consequence of the sorting.
Yes. We first sort by center of mass, then by total uptime. The center of mass moves from left to right as you look from top to bottom.
It turns out that there are many descriptors that are only up for one or two hours out of an entire month. Those are what make the dark stripe.
The big surprise is how many descriptors there are. In November, there were 13,242 unique descriptors seen, more than double the steady-state total number of relays. Many relays appear for a short time and then disappear forever. The black band at the center consists of the actual stable relays. Above and below that is a lot of churn.
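A minimal sketch of that ordering, assuming each row is a list of per-hour booleans and that the total-uptime tie-break is ascending (the direction is an assumption). `center_of_mass` and `sort_rows` are hypothetical names, not functions from the `microdescs` source. A descriptor present for a single hour has its center of mass exactly at that hour, so short-lived descriptors spread over the month end up on a diagonal.

```python
from typing import List, Sequence


def center_of_mass(row: Sequence[bool]) -> float:
    """Mean hour index at which the descriptor appeared.
    Assumes the descriptor appeared at least once."""
    hours = [h for h, present in enumerate(row) if present]
    return sum(hours) / len(hours)


def sort_rows(rows: List[Sequence[bool]]) -> List[int]:
    """Row indices ordered by center of mass, ties broken by total uptime
    (ascending here; the real tool's tie-break direction is an assumption)."""
    return sorted(range(len(rows)),
                  key=lambda i: (center_of_mass(rows[i]), sum(rows[i])))


rows = [
    [False, False, True, True],   # center 2.5, uptime 2
    [True, False, False, False],  # center 0.0, uptime 1
    [True, True, True, True],     # center 1.5, uptime 4
    [False, True, True, False],   # center 1.5, uptime 2
]
print(sort_rows(rows))  # [1, 3, 2, 0]
```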
Are the vertical white lines periods where a consensus wasn’t reached?
Yes, there are some missing hours in the input files.
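Such gaps could be located with a sketch like the following, assuming one consensus per hour with valid-after times exactly on the hour. `missing_hours` is a hypothetical helper, not part of the `microdescs` tool.

```python
from datetime import datetime, timedelta
from typing import Iterable, List


def missing_hours(valid_after: Iterable[datetime], start: datetime,
                  end: datetime) -> List[datetime]:
    """Hours in [start, end) for which no consensus is present in the input."""
    seen = set(valid_after)
    gaps = []
    t = start
    while t < end:
        if t not in seen:
            gaps.append(t)
        t += timedelta(hours=1)
    return gaps


start = datetime(2014, 12, 1)
have = [start + timedelta(hours=h) for h in (0, 1, 3)]
print(missing_hours(have, start, start + timedelta(hours=4)))
```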
David Fifield