Re: [tor-project] Constructing a real-world dataset for studying website fingerprinting

21 Apr 2023


On Apr 20, 2023, at 9:34 PM, Tom Ritter tom@ritter.vg wrote:
On Thu, 20 Apr 2023 at 17:16, Jansen, Robert G CIV USN NRL (5543) Washington DC (USA) via tor-project tor-project@lists.torproject.org wrote:
The primary information being measured is the directionality of the first 5k cells sent on a measurement circuit, and a keyed-HMAC of the first domain name requested on the circuit.
I suppose this is kind of a non-question, since you wouldn't be doing it otherwise, but I am surprised that associating the traffic patterns to a single key, that of the first domain name, is sufficient.  Every page or query made to that domain (e.g. duckduckgo) will have the same key, with potentially a lot of entirely disparate traffic patterns.
You are absolutely correct! I think it’s worth exploring the extent to which those different traffic patterns from the different subpages, perhaps even loaded in different orders, can be combined to identify the site, and how we can protect traffic in this scenario. This really is web*site* fingerprinting more than web*page* fingerprinting.
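To make the point about a single key per site concrete, here is a minimal sketch of the kind of record described above: the directions of the first 5k cells on a circuit plus a keyed HMAC of the first domain name. It is not the project's actual measurement code. The hash function (SHA-256), the domain normalization, the +1/-1 direction encoding, and the names `domain_label` and `record_circuit` are all assumptions made for illustration.

```python
import hashlib
import hmac

MAX_CELLS = 5000  # "first 5k cells", as described in the thread


def domain_label(secret_key: bytes, domain: str) -> str:
    """Return a keyed HMAC of a domain name as a hex string.

    Assumptions: SHA-256 as the hash, and lowercasing plus stripping a
    trailing dot as normalization. The thread specifies neither.
    """
    normalized = domain.strip().lower().rstrip(".")
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def record_circuit(secret_key: bytes, first_domain: str, cell_directions) -> dict:
    """Build one hypothetical record for a measurement circuit.

    cell_directions is an iterable of +1 / -1 values, one per cell
    (which sign means which direction is an arbitrary choice here).
    Only the first MAX_CELLS directions are kept.
    """
    directions = []
    for direction in cell_directions:
        if len(directions) >= MAX_CELLS:
            break
        if direction not in (1, -1):
            raise ValueError(f"unexpected cell direction: {direction!r}")
        directions.append(direction)
    return {
        "label": domain_label(secret_key, first_domain),
        "directions": directions,
    }


if __name__ == "__main__":
    key = b"example-secret-key"  # placeholder; a real key would be random and kept private
    search = record_circuit(key, "duckduckgo.com", [-1, 1, 1, 1, -1])
    settings = record_circuit(key, "duckduckgo.com", [-1, 1, -1, 1, 1, 1, 1])
    # Different pages, different traffic patterns, same label.
    print(search["label"] == settings["label"])
```

Two circuits that start at the same domain get the same label no matter which page was loaded, which is why the resulting dataset labels sites rather than individual pages.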
...
Obviously this is limited by what you can technically achieve in this scenario: you have the plaintext DNS requests, and everything else is going to be TLS-encrypted. The alternative would be to instrument a tor client/browser and find volunteers to opt-in to their data collection.
Yes, the volunteer approach has its own limitations too, particularly in terms of potential bias, lower traffic/browser diversity, etc. It’s also an entirely different beast from a research perspective because it involves direct participation from human subjects. I think this approach could be useful, especially if the volunteer pool was very large. But I think the one we’ve proposed is easier to get started with.
...
-tom
Thanks for the comments, tom!
Peace, love, and positivity,
Rob
