Re: [tor-dev] [tor-talk] Client simulation

7 Jun 2013


(Sorry for cross-posting, but I think this is a topic for tor-dev@, not
tor-talk@.  If you agree, please reply on tor-dev@ only.  tor-talk@
people can follow the thread here:
https://lists.torproject.org/pipermail/tor-dev/2013-June/thread.html)
On 6/6/13 7:32 PM, Norman Danner wrote:
...
> I have two questions regarding a possible research project.
>
> First, the research question:  can one use machine-learning techniques
> to construct a model of Tor client behavior?  Or in a more general form:
> can one use <fill-in-the-blank> to construct a model of Tor client
> behavior?  A student of mine did some work on this over the last year,
> and the results are encouraging, though not strong enough to do anything
> with yet.
>
> Second, the meta-question:  is it worthwhile to answer the first
> question?  It seems to me that if the answer to the first question is
> "yes," then the solution could be used to (at least) provide better
> simulations of Tor (e.g., via Shadow or ExperimenTor).  This possibly
> naive thought would imply that the answer to the second question is "yes."
>
> I'd be interested to hear responses to my second question, either
> validating my naive thought or explaining why the first question isn't
> worth answering.  I'd accept responses to my first question, too, in
> case this has already been done.

Hi Norman,
yes, it's worthwhile to answer this question!  I can imagine how at
least Shadow and the Tor path generator would benefit from better client
models.  User number estimates on the metrics website might benefit from
them, too.
I found two tickets where we asked similar questions before, and maybe
there are more tickets like these:
https://trac.torproject.org/projects/tor/ticket/2963
https://trac.torproject.org/projects/tor/ticket/6295
Some very early thoughts:
- How do we make sure that we ask a representative set of people to
instrument their clients and export data on their usage behavior?  If we
only ask people who read their favorite news site twice per day, our
client model will be just that, but not representative of all Tor
users.  (Still, we would know more than we know now.)
- Can we somehow aggregate usage information enough to make it safe for
people to send actual usage reports to us?  I could imagine having a
torrc flag that is disabled by default and that, when enabled, writes
sanitized usage information to disk.  For this we need a very good idea
of what we're planning to do with the data, and we'll need to specify the
aggregation approach in a tech report and get it reviewed by the community.
Are your student's results available somewhere?
Best,
Karsten
