Was the Raccoon ACTUALLY RIGHT after ALL THESE YEARS: Proof of a SOLO CUTTING EDGE PARADIGM SHIFT, or an ALIEN INTERVENTION to EXPOSE ACADEMIC CONSPIRACY?
by Raccoon23
(You'll have to excuse my caps lock: I am going for a world record high score on the crackpot index[1,2]. Cypherpunk lore has a loooong history of high scores in this index, so I really do have to work very hard to get every point I reasonably can. However, I'm already at 226 points in the title, by my count. Plus I've got this[3] going for me, which is nice.)
But enough about that. Let's get to the meat[4]!
Recent advances in traffic analysis defenses have finally proved that my controversial but revolutionary theories[5,6] are valid, overturning decades of theories about traffic analysis attacks against anonymity networks!
Ok ok, so I might have made a mistake in the math[7]. But I'm just a Raccoon who reads discarded academic research papers in a dumpster. While I have been highly educated through my dumpster schooling, one can't expect raccoons to do math correctly. Such math is best left to others who can properly express my theory in terms of equations.
Others like Panchenko, Pulls, Danezis, Kadianakis, et al; and maybe Perry. (But probably not Perry.)
I have worked on this problem *alone* for many years, while I was in or around many highly respected mental institutions. I was even cautioned to take a break, by my therapeutic advisors. Probably because I was getting too close to the TRUTH!
However, as we shall see, this was not entirely a solo effort. This leaves only one possibility: ALIEN INTERVENTION[8].
But before we get to that, much has happened since my two posts on this topic, over a decade ago.
So let's recap:
My two posts were written in opposition to the orthodox view that anonymity networks would always be 100% broken by traffic correlation attacks, and that therefore side channels such as cryptographic tagging attacks were not worth including in Tor's threat model.
Paul Syverson, the famous self-appointed defender of the orthodoxy of onion routing, argued most vigorously against my points, carefully trying to correct me. But I would not be dissuaded! (I love you Paul! Just scoring crackpot points.)
Paul's clout with the "scientific establishment" is clearly very strong. Paul has had a distinguished career, being named in The Foreign Policy Top 100 Global Thinkers List in 2012[10], and named an ACM Fellow in 2014[9].
As evidence of his obvious involvement in the scientific conspiracy against me, consider that my original post was cut short "somehow" on the official Tor Project mailing list archive[11], compared to the original[5].
This is also not the first time a famous thinker has suppressed an idea that they secretly believed in. Elon Musk, renowned and respected former proponent of the Simulation Hypothesis[12], clearly knows that optimized versions of the Alcubierre Drive[13] are realizable using excess exotic matter accumulated during particle accelerator "downtime". Downtime, I might add, that is facilitated by a Tor Project volunteer administrator who goes by the codename 'weasel', who subsequently faked his own death to avoid implication[14]. These are the kinds of people we are dealing with here. They will stop at NOTHING to conceal the TRUTH!
Elon has since begun a cover-up of the Simulation Hypothesis[15], presumably because someone collected his bounty to help him escape The Matrix[16]. Despite all of this being so obviously exposed, Elon still pursues his wasteful and foolish rocketry! Merely because he likes lighting things on fire[17], digging tunnels[18], and playing with electricity[19]. Elon, the raccoons can relate -- we just wish you would come clean, and admit you did it all for the tunnel rave...
Anyway, I digress. Some time after my post, I would learn from my daily dumpster deliveries that Website Traffic Fingerprinting had become a fad in academic research circles. Website Traffic Fingerprinting is a form of traffic analysis that uses machine learning to recognize website access over Tor, by observing only the encrypted traffic patterns entering the Tor network.
For ethical and practical reasons, these attacks are performed in lab conditions on researcher generated web crawls, often using individual static web page access, rather than involving interactive user web browsing or other concurrent Tor activity common to real-world Tor usage.
Mike Perry, a suspiciously private employee of the Tor Project, critiqued the limitations of these lab conditions, and questioned the accuracy of many of these attacks in realistic settings[20]. Perry and Kadianakis also developed a framework for building traffic analysis defenses for Tor[21].
Despite this, or perhaps because of it, a long series of attacks and defenses ensued, and the advent of deep learning[22] caused many to believe that defenses against deep learning based Website Traffic Fingerprinting attacks were as hopeless as trying to defend against end-to-end correlation.
I watched from my dumpster as these hidebound reactionary responses to traffic analysis defenses continued for many years, seemingly trying to bait me out of hiding, to no avail[23]. (To be honest, I don't even know what hidebound means. I'm just a raccoon. But saying that gets me crackpot points. And it totally happened!)
However, I was eventually delighted to see that some recent advances by others had arrived in my dumpster, finally allowing my ideas to have vindication!
Recently, Panchenko et al found traffic splitting to be highly effective against state of the art Website Traffic Fingerprinting attacks based on deep learning[24].
Concurrently, Tobias Pulls used Perry and Kadianakis's padding machines in an optimization problem, using a Genetic Algorithm to evolve optimal padding machines against deep learning classifiers, for use in defense against Website Traffic Fingerprinting[25]. With this result, we have finally entered the age of the machines versus the machines. Raccoon math, while groundbreaking, is no longer necessary.
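For the uninitiated raccoons in the audience, here is a grossly simplified sketch of what "machines versus machines" means in practice: a toy genetic algorithm evolving a per-position padding probability vector against a stand-in distinguishability score. To be clear, none of this is Pulls's actual machine, his fitness function, or the circpad framework; the trace model, scoring, and parameters below are invented purely for illustration.

import random

# Toy GA sketch (illustration only; not Pulls's machines or circpad code):
# evolve a per-position padding probability vector that makes fake website
# traces look alike, under a small overhead penalty.
random.seed(23)

TRACE_LEN = 20
SITES = [[random.randint(1, 10) for _ in range(TRACE_LEN)] for _ in range(5)]

def pad(trace, probs):
    # Add 1-3 dummy cells at position i with probability probs[i].
    return [c + (random.randint(1, 3) if random.random() < p else 0)
            for c, p in zip(trace, probs)]

def fitness(probs):
    padded = [pad(t, probs) for t in SITES]
    # Stand-in "classifier": how far each padded trace sits from the mean trace.
    mean = [sum(col) / len(padded) for col in zip(*padded)]
    distinguishability = sum(abs(c - m) for t in padded for c, m in zip(t, mean))
    overhead = sum(sum(p) - sum(o) for p, o in zip(padded, SITES))
    return -(distinguishability + 0.1 * overhead)  # higher is better

def mutate(probs):
    return [min(1.0, max(0.0, p + random.gauss(0, 0.1))) for p in probs]

population = [[random.random() for _ in range(TRACE_LEN)] for _ in range(20)]
for _ in range(50):
    population.sort(key=fitness, reverse=True)
    parents = population[:5]
    population = parents + [mutate(random.choice(parents)) for _ in range(15)]

print("best fitness:", round(fitness(population[0]), 2))

The real version replaces the stand-in score with an actual deep learning classifier, and the padding vector with full circpad machine specifications, which is exactly why raccoon math is no longer load-bearing.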
Both of these defenses were highly successful on their own.
However, with the combination of traffic splitting and cover traffic defenses, Tor will be on the CUTTING EDGE of making a PARADIGM SHIFT in its threat model, to tackle the hardest problem of all: END-TO-END TRAFFIC CORRELATION.
Allow me to present my case:
Einstein once posited that time was relative. While Einstein's theory of relativity was fundamentally misguided, time is relative. Relative to the problem of both end-to-end traffic correlation, and Website Traffic Fingerprinting.
Einstein, in his later years, even alluded to quantum observation correlation as "spooky action at a distance", setting precedent for the connection that I have known all along to be true. Einstein might not have been as insightful as me, but for all his fumbling, Einstein was right about one thing: correlation attacks *are* spooky. Metadata kills people[26]. (Sadly, that is not a joke...)
Unfortunately, Einstein's results were incomplete, and his groping toward a unification of these problems was not properly understood by anyone, except yours truly. Deep learning was so successful at Website Traffic Fingerprinting that most researchers did not even bother to provide their deep learning classifiers with time-based features. They claimed their classifiers were accurate enough without considering time at all! We now know this to be false, thanks to the independent discoveries of new defenses by the teams of Panchenko and Pulls.
As it turns out, quantum mechanics is also fundamentally misguided. We now know that the universe appears to be made up of fragments of energy at a fundamental level[27]. In fact, after all these years, raccoons and their allies (alien or otherwise) seem to be almost as close to unifying physics as we are to addressing end-to-end correlation in Tor!
Indeed, once time is included as a feature, deep learning based Website Traffic Fingerprinting attacks will effectively be correlating the timing and traffic patterns of websites to their representations in the classifier's neural model. This model comparison is extremely similar to how end-to-end correlation compares the timing and traffic patterns of Tor entrance traffic to Tor exit traffic. In fact, deep learning classifiers have already shown success in correlating end-to-end traffic on Tor[28].
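To make the structural similarity concrete, consider this toy encoding and nearest-representation search (illustration only; real attacks like Deep Fingerprinting and DeepCorr learn their representations with CNNs rather than using a fixed cosine similarity, and my feature encoding here is invented):

import numpy as np

# Illustration only: encode a flow as a fixed-length vector of signed
# inter-arrival times (sign = direction). Fingerprinting and correlation
# are then both nearest-representation searches over such vectors:
#  - fingerprinting: compare a trace to stored per-site representations
#  - correlation:    compare an entry-side trace to candidate exit-side traces
rng = np.random.default_rng(23)

def encode(directions, times, length=50):
    v = np.zeros(length)
    iat = np.diff(np.concatenate(([0.0], times)))  # inter-arrival times
    n = min(length, len(directions))
    v[:n] = np.asarray(directions[:n]) * iat[:n]
    return v / (np.linalg.norm(v) + 1e-9)

def most_similar(query, references):
    scores = [float(query @ r) for r in references]
    return int(np.argmax(scores)), max(scores)

# Tiny fake example: three "site models", one noisy observation of site 1.
site_models = [encode(rng.choice([-1, 1], 50), np.sort(rng.random(50)))
               for _ in range(3)]
observed = site_models[1] + 0.05 * rng.standard_normal(50)
print(most_similar(observed / np.linalg.norm(observed), site_models))

Swap the reference set from "per-site model vectors" to "exit-side flow vectors" and the exact same comparison becomes end-to-end correlation. That is the whole point.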
Some say that Long Term Statistical Disclosure (LTSD) attacks will still always win the end-to-end correlation game against anonymity networks, in the fullness of time[29].
However, LTSD attacks are only a theory. And much like quantum mechanics, relativity, and LSD, these attacks also warp one's perception of reality, time, and space. All of these theories are fundamentally misguided.
LTSD attacks predict that over time, correlation gradually leaks enough information to fully deanonymize users of anonymity networks. But also much like quantum mechanics, they fail to fully define the mechanism.
Consider this thought experiment (feel free to use whatever mind expanding devices you have at hand to assist you): LTSD assumes that an adversary has complete high resolution information of all traffic that enters and exits an anonymity network. Additionally, LTSD assumes that an adversary has identifiers available to properly track traffic streams on *each* side of the correlation, over the full duration of observation and long-term correlation.
Several real-world effects undermine these assumptions. Widespread deployment of HTTPS[30], the trend towards encrypted DNS and SNI, shared cloud infrastructure, and the practical infeasibility of full Internet-wide traffic record keeping, all reduce the ability of the adversary to track repeated connections over time. Additionally, defenses that multiplex traffic entering the Tor network with traffic splitting and cover traffic undermine the adversary's ability to fully determine traffic time and quantity information that pertains to specific connections.
All of this means that the LTSD adversary, much like Einstein's light-riding cowboy and Schrödinger's cat, remains an idealized approximation of reality.
Despite the results of the DeepCorr experiment[28], this thought experiment makes it clear that correlation can be mitigated, perhaps even pushing long-term correlation attacks out to time durations that allow for many practical use cases, even web browsing.
As we know from the historical record[31], aliens need anonymity too[32]. And when their hyper-dimensional cats join with all of the raccoons[33] and other creatures who are using Tor on a regular basis, the quantity of co-incident events (and unmatched pairs due to incomplete observation) will rise high enough to cause LTSD to require larger and larger amounts of observation time, to perform effective correlation.
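Here is a back-of-the-dumpster simulation of that intuition. To be clear: this is a crude disclosure model of my own invention, not the actual attack from [29]; the parameters and the "clear leader" stopping rule are made up for illustration.

import random

# Crude statistical-disclosure intuition pump: a target always messages one
# fixed friend; b background senders message uniformly random recipients;
# with probability `miss`, the target's message is not observed at all.
# Count rounds until the friend is the clear leader in the attacker's tally.
def rounds_to_disclose(b, miss, recipients=100, lead=20, max_rounds=200000):
    random.seed(23)
    friend = 0
    counts = [0] * recipients
    streak = 0
    for r in range(1, max_rounds + 1):
        if random.random() > miss:
            counts[friend] += 1
        for _ in range(b):
            counts[random.randrange(recipients)] += 1
        top = max(range(recipients), key=lambda i: (counts[i], i))
        streak = streak + 1 if top == friend else 0
        if streak >= lead:
            return r
    return max_rounds

for b in (5, 20, 80):
    for miss in (0.0, 0.5):
        print(f"background={b:3d} miss={miss}: ~{rounds_to_disclose(b, miss)} rounds")

More co-incident background events and more unobserved traffic both push the required observation time up. The raccoons, cats, and aliens are doing their part.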
This by itself is a huge win. We can now say with certainty that The Raccoon Effect has thoroughly discromulated correlation attacks.
(Discromulation is my term to describe what this kind of defense does. Most interestingly, I am forced into winning this crackpot point. Because deep learning is an opaque machine generated attack, and because the GA-optimized defense is also machine generated, it is actually impossible to precisely describe the complete behaviors of either one, other than with the resulting model definitions themselves! Brave new world.)
Now, what about alien intervention? Well, assuming we do not consider the AI that participated in this work to be alien: if aliens did intervene, none would argue the discromulating conflugruity of The Raccoon Effect. Unfortunately however, I can neither confirm nor deny these allegations[34], at this time[35].
But that's not all! Since the circuit padding framework is implemented in Tor, this means that it is covered by Tor's bug bounty. While research papers that break padding defenses are not covered by the bounty (especially if those defenses are not actually deployed), there *is* in fact prize money for any flaws found in the framework that could lead to code execution, or deanonymization[36].
In conclusion:
Here I am, setting a world record high score on the crackpot index (despite many admirable high scores by previous cypherpunks), and I was RIGHT ALL ALONG. I deserve a Nobel Prize for this work. But I do not expect to get one. I expect the brownshirts and Nazis embedded in the scientific establishment to continue to work hard to suppress my TRUTH.
Even the crackpot index itself is involved in this conspiracy of scientific suppression. John Baez, the creator of the crackpot index, would surely confirm this, if he were not also part of the conspiracy against me! I have included him on CC, just in case he would like to recant his suppression of my original thinking, before his show trial.
Like Galileo, I will cede no ground in this Inquisition!
In fact, much like myself, even Newton struggled through a pandemic[37], and prevailed. We are more alike than you know. Newton tossed apples to many a raccoon in his garden. That's how we taught him to poorly approximate gravity (even though his misguided and simple calculus was barely up to the task).
In these tough times, it is very satisfying to finally have vindication. As the Base Rate Fallacy showed all those years ago: The Raccoon Effect is real. THEY are watching; but WE are Legion[38].
Unfortunately, the reproducibility crisis in science is also real. More than ten years later, it is *still* often hard to reproduce, confirm, and compare results across the various papers that end up in my dumpster. Venues that do not have artifact archival policies are, in fact, on the verge of becoming shams. I am glad that some venues, and some researchers, are recanting the old ways[39,40].
The paper is not the only product of good research!
P.S. You'll have to keep this all much more secret than I did. I don't want anyone to steal these ideas; the aliens might get upset. 2020 is not over!
P.P.S. This sentence gets me #4. (As a liar's paradox[41]). Some say this puts me in a superposition of both getting and not getting #4, and I inherently can't get those points in this way. But I say that should get me double points! (And thus earn more points for #5). Gotta collect 'em all.
P.P.P.S. At 1004 points on the crackpot index, I believe this post is now the highest scoring publication with a valid novel idea that has been written, to date[2].
P.P.P.P.S. Fucking bored as fuck during this fucking pandemic. Fuck![42]
1. https://math.ucr.edu/home/baez/crackpot.html
2. https://www.reddit.com/r/math/comments/4r05wh/has_anyone_with_a_high_crackpo...
3. https://en.m.wikipedia.org/wiki/Betteridge%27s_law_of_headlines
4. http://www.stinkymeat.net/
5. https://archives.seul.org/or/dev/Mar-2012/msg00019.html - Raccoon23 Post1
6. https://archives.seul.org/or/dev/Sep-2008/msg00016.html - Raccoon23 Post2
7. https://conspicuouschatter.wordpress.com/2008/09/30/the-base-rate-fallacy-an...
8. https://fahrplan.events.ccc.de/congress/2006/Fahrplan/speakers/1242.en.html
9. https://awards.acm.org/award_winners/syverson_5067587
10. https://web.archive.org/web/20121130072122/http://www.foreignpolicy.com/arti...
11. https://lists.torproject.org/pipermail/tor-dev/2008-September/002493.html
12. https://en.wikipedia.org/wiki/Simulation_hypothesis#The_simulation_argument
13. https://en.wikipedia.org/wiki/Alcubierre_drive
14. https://www.bbc.com/news/world-europe-36173247 - Weasel takes down LHC
15. https://www.slashgear.com/elon-musk-has-banned-hot-tub-talks-about-simulated...
16. https://www.forbes.com/sites/janetwburns/2016/10/13/elon-musk-and-friends-ar...
17. https://www.youtube.com/watch?v=qLcma0YyzhY - Elon Musk Flame Thrower
18. https://www.yogonet.com/international/noticias/2020/12/07/55695-boring-compa...
19. https://www.inverse.com/innovation/tesla-electric-jet-3-4-years-away
20. https://blog.torproject.org/critique-website-traffic-fingerprinting-attacks
21. https://github.com/torproject/tor/blob/master/doc/HACKING/CircuitPaddingDeve...
22. https://arxiv.org/pdf/1801.02265.pdf - Deep Fingerprinting Tor
23. https://www.youtube.com/watch?v=TvjMr6DU7C8 - Raccoon call
24. https://www.comsys.rwth-aachen.de/fileadmin/papers/2020/2020-delacadena-traf...
25. https://arxiv.org/abs/2011.13471 - Pulls GA Defense
26. https://abcnews.go.com/blogs/headlines/2014/05/ex-nsa-chief-we-kill-people-b...
27. https://www.full-thesis.net/fragments-of-energy-not-waves-or-particles-may-b...
28. https://people.cs.umass.edu/~amir/papers/CCS18-DeepCorr.pdf
29. https://www.freehaven.net/anonbib/cache/statistical-disclosure.pdf
30. https://transparencyreport.google.com/https/overview?hl=en
31. https://en.wikipedia.org/wiki/Accelerando#Characters
32. https://fahrplan.events.ccc.de/congress/2006/Fahrplan/attachments/1167-Speak...
33. https://www.youtube.com/watch?v=jSRfIMjvtFk - Raccoons and cats <3
34. https://edition.cnn.com/2020/04/27/politics/pentagon-ufo-videos/index.html
35. https://www.nbcnews.com/news/weird-news/former-israeli-space-security-chief-...
36. https://hackerone.com/torproject
37. https://www.msn.com/en-ie/news/coronavirus/during-a-pandemic-isaac-newton-ha...
38. https://www.youtube.com/watch?v=Ofp26_oc4CA - Raccoons are Legion
39. https://www.usenix.org/conference/usenixsecurity21/artifact-evaluation-infor...
40. https://petsymposium.org/artifacts.php
41. https://en.wikipedia.org/wiki/Liar_paradox
42. https://www.youtube.com/watch?v=04_rIuVc_qM - WTF
For Karsten: https://cs5.livemaster.ru/storage/3a/1f/1449eb23f3c3b318ab4960815fn4--waterc...
On 12/22/20 7:58 PM, The23rd Raccoon wrote:
Recent advances in traffic analysis defenses have finally proved that my controversial but revolutionary theories[5,6] are valid, overturning decades of theories about traffic analysis attacks against anonymity networks!
Ok ok, so I might have made a mistake in the math[7]. But I'm just a Raccoon who reads discarded academic research papers in a dumpster. While I have been highly educated through my dumpster schooling, one can't expect raccoons to do math correctly. Such math is best left to others who can properly express my theory in terms of equations.
I am glad you liked the papers!
Others like Panchenko, Pulls, Danezis, Kadianakis, et al; and maybe Perry. (But probably not Perry.)
Hey, I can MATH!
As Tor's Research Janitor, I confirm that your bulletin contains valid novel ideas, and they are very testable (see below). This is insanely great! I wish I thought of it!
In fact, unification of correlation and fingerprinting, along with the unification and combination of defenses, is an entire research area, with many possible paper topics.
Recently, Panchenko et al found traffic splitting to be highly effective against state of the art Website Traffic Fingerprinting attacks based on deep learning[24].
Concurrently, Tobias Pulls used Perry and Kadianakis's padding machines in an optimization problem, using a Genetic Algorithm to evolve optimal padding machines against deep learning classifiers, for use in defense against Website Traffic Fingerprinting[25]. With this result, we have finally entered the age of the machines versus the machines. Raccoon math, while groundbreaking, is no longer necessary.
Both of these defenses were highly successful on their own.
Pulls's methodology in your reference[25] was exemplary. Using the circpad simulator and the circpad framework allows us to rapidly and directly deploy exact research solutions on the Tor network, as-is.
In fact, we could deploy the GA-generated machine specifications in his paper on live Tor relays today.
We will need to re-tune everything once congestion control and conflux are deployed, and when timing is involved, so I think the best plan is to have another round or two of research into optimizing and tuning for that scenario.
However, with the combination of traffic splitting and cover traffic defenses, Tor will be on the CUTTING EDGE of making a PARADIGM SHIFT in its threat model, to tackle the hardest problem of all: END-TO-END TRAFFIC CORRELATION.
I also agree that the combination should require less overhead and better performance than either one by themselves. Obviously, testing this is a very promising research area. I encourage full collaboration between Pulls, Panchenko, Tor, wild raccoons, and others, in this area.
For those who are considering studying this, see: https://gitlab.torproject.org/mikeperry/torspec/-/blob/ticket40202_01/propos...
We are optimizing that using congestion control, to achieve high-speed, low-latency traffic splitting to exit relays and onion services. We will likely only use 2 circuits, to reduce exposure to guard relays with respect to other potential attacks, so some padding overhead will likely still be necessary.
The combination could also be tuned to help reduce the overhead needed by padding, in an optimization problem context, like Pulls's GA.
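As a toy illustration of the knobs involved (to be clear: this is not conflux or circpad code, just a sketch of the parameter space such an optimizer would search):

import random

# Toy splitter sketch: each application cell is assigned to one of two
# circuits by a weighted coin, and each circuit independently injects dummy
# cells at its own rate. The split weight and padding rates are exactly the
# kind of knobs a GA (or other optimizer) would tune against a classifier.
def split_and_pad(n_cells, split_weight=0.6, pad_rates=(0.1, 0.2), seed=23):
    random.seed(seed)
    circuits = ([], [])
    for cell in range(n_cells):
        idx = 0 if random.random() < split_weight else 1
        circuits[idx].append(("data", cell))
        for i, rate in enumerate(pad_rates):
            if random.random() < rate:
                circuits[i].append(("padding", None))
    return circuits

c0, c1 = split_and_pad(100)
overhead = sum(kind == "padding" for kind, _ in c0 + c1) / 100
print(len(c0), len(c1), f"padding overhead ~{overhead:.0%}")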
I will be updating that draft with more information as the proposal solidifies.
Note to those from the future: this proposal draft link will eventually be merged to the torspec repo. Check for the final version here: https://gitlab.torproject.org/tpo/core/torspec/-/tree/master/proposals
Indeed, once time is included as a feature, deep learning based Website Traffic Fingerprinting attacks will effectively be correlating the timing and traffic patterns of websites to their representations in the classifier's neural model. This model comparison is extremely similar to how end-to-end correlation compares the timing and traffic patterns of Tor entrance traffic to Tor exit traffic. In fact, deep learning classifiers have already shown success in correlating end-to-end traffic on Tor[28].
While you have offered no specific testable predictions for this theory, presumably to score more crackpot points, allow me to provide a reduction proof sketch, as well as an easily testable result.
To see that Deep Fingerprinting reduces to Deep Correlation, consider the construction where the correlator function from DeepCorr is used to correlate pairs of raw test traces to the raw training traces that were used to train the Deep Fingerprinting classifier. The correlated pairs would be constructed from the monitored set's test and training examples. This means that instead of correlating client traffic to Exit traffic, DeepCorr is correlating "live" client traces directly to the raw fingerprinting training traces, as you said.
This gets us "closed world" fingerprinting results. For "open world" results, include the unmonitored set as input that does not contain matches (to represent partial network observation that results in unmatched pairs).
If the accuracy from this DeepCorr Fingerprinting construction is better than Deep Fingerprinting for closed and open world scenarios, one can conclude that Deep Fingerprinting reduces to DeepCorr, in a computational complexity and information-theoretic sense. This is testable.
If the accuracy is worse, then Deep Fingerprinting is actually a more powerful attack than DeepCorr, and thus defenses against Deep Fingerprinting should perform even better against DeepCorr, for web traffic. This is also testable.
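To make that experiment concrete, here is a skeletal harness for the construction above. The correlator_score() function is just a cosine-similarity placeholder where a trained pairwise correlator (e.g. a DeepCorr-style model) would be plugged in, and the fake traces, set sizes, and threshold are all stand-ins:

import numpy as np

# Skeleton of the "DeepCorr as fingerprinting" experiment sketched above.
# correlator_score() is a placeholder for a trained pairwise correlator;
# here it is cosine similarity so the harness runs end to end on fake data.
rng = np.random.default_rng(23)

def correlator_score(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def fake_trace(signature):
    # Fake "trace": per-site signature plus noise.
    return signature + 0.3 * rng.standard_normal(signature.shape)

site_signatures = [rng.standard_normal(100) for _ in range(10)]
monitored_train = [(s, fake_trace(sig)) for s, sig in enumerate(site_signatures)]
monitored_test = [(s, fake_trace(sig)) for s, sig in enumerate(site_signatures)]
unmonitored = [rng.standard_normal(100) for _ in range(20)]

# Matched pairs: test trace vs. training trace of the same monitored site.
# Unmatched pairs: test trace vs. unmonitored trace (open-world negatives).
pairs = []
for s_test, t_test in monitored_test:
    for s_train, t_train in monitored_train:
        pairs.append((correlator_score(t_test, t_train), s_test == s_train))
    for u in unmonitored:
        pairs.append((correlator_score(t_test, u), False))

threshold = 0.5  # stand-in; would be tuned on held-out data
tp = sum(1 for score, match in pairs if score >= threshold and match)
fp = sum(1 for score, match in pairs if score >= threshold and not match)
fn = sum(1 for score, match in pairs if score < threshold and match)
print(f"precision={tp / (tp + fp + 1e-9):.2f} recall={tp / (tp + fn + 1e-9):.2f}")

The comparison of interest is then this construction's open- and closed-world accuracy against Deep Fingerprinting's, with the real models and real datasets in place of the placeholders.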
This reduction also makes sense intuitively. The most powerful correlation and fingerprinting attacks now use CNNs under the hood. So they should both have the same expressive power, and inference capability.
Interestingly, the dataset that Pulls used was significantly larger than what DeepCorr used, in terms of "pairs" that must be matched.
More interestingly, DeepCorr also found that truncating flows to the initial portion was still sufficient for high accuracy. Pulls's defenses also found that the beginning of website traces were most important to pad heavily.
Some say that Long Term Statistical Disclosure (LTSD) attacks will still always win the end-to-end correlation game against anonymity networks, in the fullness of time[29].
However, LTSD attacks are only a theory. And much like quantum mechanics, relativity, and LSD, these attacks also warp one's perception of reality, time, and space. All of these theories are fundamentally misguided.
LTSD attacks predict that over time, correlation gradually leaks enough information to fully deanonymize users of anonymity networks. But also much like quantum mechanics, they fail to fully define the mechanism.
Consider this thought experiment (feel free to use whatever mind expanding devices you have at hand to assist you): LTSD assumes that an adversary has complete high resolution information of all traffic that enters and exits an anonymity network. Additionally, LTSD assumes that an adversary has identifiers available to properly track traffic streams on *each* side of the correlation, over the full duration of observation and long-term correlation.
For a more modern treatment of LTSD-like correlation attack theory, see The Anonymity Trilemma: https://eprint.iacr.org/2017/954.pdf
Even so, all of the limitations you have identified still apply. Some have been incorporated into the theory and indeed show a decrease in efficacy, but others have still not been accounted for!
As I said in the circpad framework documentation, I prefer an empirical approach to pure formalism, for this reason. I agree that it looks like we can do much better than today, for a realistic amount of overhead.
All of that said, anonymity is a complicated problem. As your earlier posts indicate: targeting, stylometry, and mailing list post timing can degrade anonymity in surprising ways. The Raccoon Effect only works if we have enough raccoons who behave and look alike, and are exceedingly careful about it. The machines can do much more than correlate traffic patterns, these days!
This by itself is a huge win. We can now say with certainty that The Raccoon Effect has thoroughly discromulated correlation attacks.
(Discromulation is my term to describe what this kind of defense does. Most interestingly, I am forced into winning this crackpot point. Because deep learning is an opaque machine generated attack, and because the GA-optimized defense is also machine generated, it is actually impossible to precisely describe the complete behaviors of either one, other than with the resulting model definitions themselves! Brave new world.)
This *is* interesting. Pulls also pointed this out in his paper. This is another reason why it seems better to rely on reproducible empirical methods, rather than pure formalism.
Now, what about alien intervention? Well, assuming we do not consider the AI that participated in this work to be alien: if aliens did intervene, none would argue the discromulating conflugruity of The Raccoon Effect. Unfortunately however, I can neither confirm nor deny these allegations[34], at this time[35].
The fact that Pulls's AI named itself 'Interspace' has me curious and eager to subscribe to your newsletter!
But that's not all! Since the circuit padding framework is implemented in Tor, this means that it is covered by Tor's bug bounty. While research papers that break padding defenses are not covered by the bounty (especially if those defenses are not actually deployed), there *is* in fact prize money for any flaws found in the framework that could lead to code execution, or deanonymization[36].
Unfortunately, when OTF lost its funding due to the Trump administration's desire to fund closed-source Internet Freedom tools, we also lost the funding for this bug bounty, and had to temporarily suspend it while we look for a new sponsor.
However, to keep you honest (and preserve your crackpot points), I will personally honor the bounty for any bugs found in the circpad framework, as deployed in Tor, that lead to code execution or full deanonymization, as a result of that code (excluding correlation and fingerprinting attacks, until we deploy strong defenses). It is mostly my code anyway, and I doubt George Kadianakis made any mistakes.
If anyone wants to help support Tor's ability to make progress on these types of problems, please consider donating: https://donate.torproject.org/
P.P.P.S. At 1004 points on the crackpot index, I believe this post is now the highest scoring publication with a valid novel idea that has been written, to date[2].
If it helps to get a raccoon into the world record books: I again confirm this is a valid, novel idea. I have kept John Baez on Cc for this reason. We should probably take him off after this :).
P.P.P.P.S. Fucking bored as fuck during this fucking pandemic. Fuck![42]
I hear you. To help pass the time until the aliens reveal themselves, I've made a playlist: https://open.spotify.com/playlist/5iYQ0BZNEOaoRhf8Pydvqp
1. https://math.ucr.edu/home/baez/crackpot.html
2. https://www.reddit.com/r/math/comments/4r05wh/has_anyone_with_a_high_crackpo...
3. https://en.m.wikipedia.org/wiki/Betteridge%27s_law_of_headlines
4. http://www.stinkymeat.net/
5. https://archives.seul.org/or/dev/Mar-2012/msg00019.html - Raccoon23 Post1
6. https://archives.seul.org/or/dev/Sep-2008/msg00016.html - Raccoon23 Post2
7. https://conspicuouschatter.wordpress.com/2008/09/30/the-base-rate-fallacy-an...
8. https://fahrplan.events.ccc.de/congress/2006/Fahrplan/speakers/1242.en.html
9. https://awards.acm.org/award_winners/syverson_5067587
10. https://web.archive.org/web/20121130072122/http://www.foreignpolicy.com/arti...
11. https://lists.torproject.org/pipermail/tor-dev/2008-September/002493.html
12. https://en.wikipedia.org/wiki/Simulation_hypothesis#The_simulation_argument
13. https://en.wikipedia.org/wiki/Alcubierre_drive
14. https://www.bbc.com/news/world-europe-36173247 - Weasel takes down LHC
15. https://www.slashgear.com/elon-musk-has-banned-hot-tub-talks-about-simulated...
16. https://www.forbes.com/sites/janetwburns/2016/10/13/elon-musk-and-friends-ar...
17. https://www.youtube.com/watch?v=qLcma0YyzhY - Elon Musk Flame Thrower
18. https://www.yogonet.com/international/noticias/2020/12/07/55695-boring-compa...
19. https://www.inverse.com/innovation/tesla-electric-jet-3-4-years-away
20. https://blog.torproject.org/critique-website-traffic-fingerprinting-attacks
21. https://github.com/torproject/tor/blob/master/doc/HACKING/CircuitPaddingDeve...
22. https://arxiv.org/pdf/1801.02265.pdf - Deep Fingerprinting Tor
23. https://www.youtube.com/watch?v=TvjMr6DU7C8 - Raccoon call
24. https://www.comsys.rwth-aachen.de/fileadmin/papers/2020/2020-delacadena-traf...
25. https://arxiv.org/abs/2011.13471 - Pulls GA Defense
26. https://abcnews.go.com/blogs/headlines/2014/05/ex-nsa-chief-we-kill-people-b...
27. https://www.full-thesis.net/fragments-of-energy-not-waves-or-particles-may-b...
28. https://people.cs.umass.edu/~amir/papers/CCS18-DeepCorr.pdf
29. https://www.freehaven.net/anonbib/cache/statistical-disclosure.pdf
30. https://transparencyreport.google.com/https/overview?hl=en
31. https://en.wikipedia.org/wiki/Accelerando#Characters
32. https://fahrplan.events.ccc.de/congress/2006/Fahrplan/attachments/1167-Speak...
33. https://www.youtube.com/watch?v=jSRfIMjvtFk - Raccoons and cats <3
34. https://edition.cnn.com/2020/04/27/politics/pentagon-ufo-videos/index.html
35. https://www.nbcnews.com/news/weird-news/former-israeli-space-security-chief-...
36. https://hackerone.com/torproject
37. https://www.msn.com/en-ie/news/coronavirus/during-a-pandemic-isaac-newton-ha...
38. https://www.youtube.com/watch?v=Ofp26_oc4CA - Raccoons are Legion
39. https://www.usenix.org/conference/usenixsecurity21/artifact-evaluation-infor...
40. https://petsymposium.org/artifacts.php
41. https://en.wikipedia.org/wiki/Liar_paradox
42. https://www.youtube.com/watch?v=04_rIuVc_qM - WTF
This is an auspicious number of top-tier references!
For Karsten: https://cs5.livemaster.ru/storage/3a/1f/1449eb23f3c3b318ab4960815fn4--waterc...
It is comforting to know that Karsten had friends even among the raccoons. Probably among the aliens too.
On Wednesday, December 23, 2020 4:15 AM, Mike Perry mikeperry@torproject.org wrote:
On 12/22/20 7:58 PM, The23rd Raccoon wrote:
Indeed, once time is included as a feature, deep learning based Website Traffic Fingerprinting attacks will effectively be correlating the timing and traffic patterns of websites to their representations in the classifier's neural model. This model comparison is extremely similar to how end-to-end correlation compares the timing and traffic patterns of Tor entrance traffic to Tor exit traffic. In fact, deep learning classifiers have already shown success in correlating end-to-end traffic on Tor[28].
While you have offered no specific testable predictions for this theory, presumably to score more crackpot points, allow me to provide a reduction proof sketch, as well as an easily testable result.
To see that Deep Fingerprinting reduces to Deep Correlation, consider the construction where the correlator function from DeepCorr is used to correlate pairs of raw test traces to the raw training traces that were used to train the Deep Fingerprinting classifier. The correlated pairs would be constructed from the monitored set's test and training examples. This means that instead of correlating client traffic to Exit traffic, DeepCorr is correlating "live" client traces directly to the raw fingerprinting training traces, as you said.
This gets us "closed world" fingerprinting results. For "open world" results, include the unmonitored set as input that does not contain matches (to represent partial network observation that results in unmatched pairs).
Thank you for this clarification! This is exactly what I was talking about, in between scoring crackpot points.
If the accuracy from this DeepCorr Fingerprinting construction is better than Deep Fingerprinting for closed and open world scenarios, one can conclude that Deep Fingerprinting reduces to DeepCorr, in a computational complexity and information-theoretic sense. This is testable.
If the accuracy is worse, then Deep Fingerprinting is actually a more powerful attack than DeepCorr, and thus defenses against Deep Fingerprinting should perform even better against DeepCorr, for web traffic. This is also testable.
This reduction also makes sense intuitively. The most powerful correlation and fingerprinting attacks now use CNNs under the hood. So they should both have the same expressive power, and inference capability.
Interestingly, the dataset that Pulls used was significantly larger than what DeepCorr used, in terms of "pairs" that must be matched.
I am very suspicious of DeepCorr's finding, in Figure 7, that the false positive rate did not change with additional flows. This makes me suspect that this figure is reporting raw per-flow P(C|M) and P(C|~M), from my first post: https://archives.seul.org/or/dev/Mar-2012/msg00019.html
Again, Danezis argued against my math, saying that modern correlators perform correlation on all n^2 streams "as a whole", rather than pairwise: https://conspicuouschatter.wordpress.com/2008/09/30/the-base-rate-fallacy-an...
However, given that Website Traffic Fingerprinting works, shouldn't the correlation find false positives among concurrent flows to popular websites, as more of them are added? And then, what about defenses that make different websites correlate in this way?
Additionally, it is somewhat amusing to me that DeepCorr used almost the exact same scale of experimental flows as my 2008 post (~5000), and reports a false positive rate of the same magnitude. (0.001 FP; 0.999 TP, from eyeballing Figure 8): https://people.cs.umass.edu/~amir/papers/CCS18-DeepCorr.pdf
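For anyone who wants to check the raccoon arithmetic, here is a minimal Bayes sketch using those eyeballed numbers (the figures are my reading of the plots, not values reported in the paper):

# Minimal base-rate check, assuming the eyeballed numbers above.
tp = 0.999          # P(correlator fires | flows actually match)
fp = 0.001          # P(correlator fires | flows do not match)
prior = 1.0 / 5000  # P(a randomly chosen candidate pair actually matches)

# Bayes: P(match | correlator fires)
posterior = (tp * prior) / (tp * prior + fp * (1 - prior))
print(f"P(match | correlator fires) = {posterior:.3f}")                     # ~0.167
print(f"expected false positives per true match ~= {fp * (5000 - 1):.1f}")  # ~5

At that scale, only about one correlator hit in six is a true match, which was the entire point of my base rate posts.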
More science is needed! The construction above is a very good start!
More interestingly, DeepCorr also found that truncating flows to the initial portion was still sufficient for high accuracy. Pulls's defenses also found that the beginning of website traces were most important to pad heavily.
I actually agree with the dogma that "more packets means more information", and that DeepCorr should improve with longer flows.
However, research does indicate that the highest differentiating information gain is present in the initial portion of web traffic. Additionally, Pulls's alien AI confirmed this independently.
There is also a limit to how long website flows tend to be. The application layer can also get involved to enforce an arbitrary limit, before reconnecting via other paths, as Panchenko showed.
P.P.P.S. At 1004 points on the crackpot index, I believe this post is now the highest scoring publication with a valid novel idea that has been written, to date[2].
If it helps to get a raccoon into the world record books: I again confirm this is a valid, novel idea. I have kept John Baez on Cc for this reason. We should probably take him off after this :).
The suppression of my ideas remains extreme! My post never made it back to me, nor did it end up in my spam folder! Very suspicious!
In case anyone missed it, it did hit the list archives: https://lists.torproject.org/pipermail/tor-dev/2020-December/014496.html