Sorry for always picking random stuff from the volunteer page, but
having read this,
<quote>
Programs like Torbutton aim to hide your browser's UserAgent string by
replacing it with a uniform answer for every Tor user. That way the
attacker can't splinter Tor's anonymity set by looking at that header.
It tries to pick a string that is commonly used by non-Tor users too, so
it doesn't stand out. Question one: how badly do we hurt ourselves by
periodically updating the version of Firefox that Torbutton claims to
be? If we update it too often, we splinter the anonymity sets ourselves.
If we don't update it often enough, then all the Tor users stand out
because they claim to be running a quite old version of Firefox. The
answer here probably depends on the Firefox versions seen in the wild.
Question two: periodically people ask us to cycle through N UserAgent
strings rather than stick with one. Does this approach help, hurt, or
not matter? Consider: cookies and recognizing Torbutton users by their
rotating UserAgents; malicious websites who only attack certain
browsers; and whether the answers to question one impact this answer.
</quote>
I think the best answer is simple, although a little more than trivial
to implement.
If you want to anonymize the user agent, you need to mimic the user
agent changes made by normal users across the web. I'm positing that
in any statistically significant sample of web users, that behavior
will be similar and simple enough at nearly every available scale to
model, i.e., roughly normally distributed along whatever dimension you
analyze.
Think about the natural evolution of a typical web user's user agent.
What are the factors that will result in a change of user agent?
1) a user may have more than one browser that they use on the same
computer,
2) they may use the web on more than one device (phone, tablet, laptop,
desktop), and
3) they upgrade a given browser in typical, stochastic patterns.
For (1) and (2) there is not much torbutton could do to coordinate
reasonable obfuscation among instances that are separated by version,
logically, in space, and in time, at least not without far more effort
than it would be worth.
In fact, (1) and (2) ought to naturally provide most if not all the
anonymizing that can reasonably be accomplished from torbutton's
perspective without any action at all on torbutton's part.
For (3), though, there is something that could be done by torbutton.
Some factors to consider in constructing a stochastic user agent updater
are:
- which browsers automatically upgrade themselves?
- which browsers bother the user to upgrade?
- what are the typical user response patterns towards browser upgrades?
Remember, we're thinking about a single browser on a single system. If
there are things going on for that user external to this (reinstalling
Windows, upgrading Ubuntu, getting a new computer, etc.), those effects
are already accounted for by (1) and (2).
Some investigative questions/statements to lead an analysis on this
could be something like:
'what is the distribution of browsers for human web users?'
'what is the distribution of systems and system versions for human web
users?'
'what are the significant correlations on the cartesian product of these
two dimensions?'
'how do the browser versions in this product space evolve through time?'
...
'2/3 of Firefox users are on Windows and their upgrade habits follow a
temporal distribution that is a spike followed by an exponential decay
of order 2.3',
'0.8 of the remaining Firefox users are on Linux and their update habits
are dominated by package management systems',
etc.
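As a rough illustration of the first three questions, here is a minimal
Python sketch. The observation list is made up for the example (it is
not real data), and joint_distribution, marginal, and lift are
hypothetical names, not anything torbutton provides.

```python
from collections import Counter

def joint_distribution(observations):
    """Return {(browser, system): share} from (browser, system) pairs."""
    counts = Counter(observations)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}

def marginal(joint, axis):
    """Sum the joint shares over the other axis (0 = browser, 1 = system)."""
    out = Counter()
    for pair, share in joint.items():
        out[pair[axis]] += share
    return dict(out)

# Made-up observations, for illustration only.
observations = [
    ("firefox", "windows"), ("firefox", "windows"), ("firefox", "linux"),
    ("ie", "windows"), ("ie", "windows"), ("safari", "mac"),
]

joint = joint_distribution(observations)
browsers = marginal(joint, 0)
systems = marginal(joint, 1)

# A pair is over-represented when its joint share exceeds the product of
# its marginals (ratio > 1), which is a crude sign of correlation.
lift = {pair: share / (browsers[pair[0]] * systems[pair[1]])
        for pair, share in joint.items()}
```

Tracking how these shares move between snapshots taken at different
times would address the fourth question.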
Then, upon finding the most significant trends in browser update
patterns, construct a mechanism for torbutton that mimics them on a
per-user basis: once a user installs torbutton, it samples (selects) a
browser from the distribution of browsers and then follows that
browser's typical upgrade pattern. The problem with this idea, I guess,
is that torbutton will have to phone home to find out when browser
updates are being adopted by users, so that it can make its change at
around the expected adoption time.
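A minimal sketch of that per-user mechanism, in Python. Everything here
is an assumption made for illustration: the shares, versions, and mean
delays are invented numbers, the exponential delay is just one simple
way to get a "spike followed by decay" shape, and SimulatedUser is a
hypothetical name rather than anything in torbutton.

```python
import random

# Hypothetical shares, starting versions, and mean upgrade delays (days).
BROWSER_SHARE = {"firefox": 0.5, "ie": 0.4, "safari": 0.1}
CURRENT_VERSION = {"firefox": (3, 0), "ie": (7, 0), "safari": (3, 1)}
MEAN_UPGRADE_DELAY_DAYS = {"firefox": 10.0, "ie": 60.0, "safari": 30.0}

class SimulatedUser:
    def __init__(self, seed):
        # Seed once per install so the persona stays stable for this user.
        self.rng = random.Random(seed)
        names = list(BROWSER_SHARE)
        weights = [BROWSER_SHARE[n] for n in names]
        self.browser = self.rng.choices(names, weights=weights, k=1)[0]
        self.version = CURRENT_VERSION[self.browser]
        self.pending = []  # (switch_day, version) pairs not yet applied

    def on_release(self, browser, version, release_day):
        """Schedule this user's switch to a newly released version."""
        if browser != self.browser:
            return
        rate = 1.0 / MEAN_UPGRADE_DELAY_DAYS[browser]
        delay = self.rng.expovariate(rate)  # never negative
        self.pending.append((release_day + delay, version))

    def claimed_version(self, today):
        """Apply every switch that is due; never move to an older version."""
        for switch_day, version in self.pending:
            if switch_day <= today and version > self.version:
                self.version = version
        self.pending = [p for p in self.pending if p[0] > today]
        return self.version
```

Usage would look something like this: create one SimulatedUser per
install, call on_release whenever a phoned-home update announces a new
version, and build the UserAgent string from claimed_version(today).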
The goal is to distribute torbutton users among all user agents
according to the distribution of all web users, so you must first find
out what that latter distribution is and how it is likely to evolve.
This second part is not as bad as it seems, because I'm guessing that
the bulk of adoption, within a few standard deviations of the expected
adoption time, occurs late enough after the browser update that
torbutton can push the change out to most installed instances in time
(adoption is not going to happen before the update, causality and all).
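To put a rough number on that guess, here is a small Python sketch. It
assumes upgrade delays are exponentially distributed (my simplification
of the "exponential decay" example above, not a measured fact), and both
the function name fraction_missed and the input numbers are hypothetical.

```python
import math

def fraction_missed(push_latency_days, mean_upgrade_delay_days):
    """Share of simulated users whose sampled switch time falls before the
    update reaches them, assuming exponentially distributed delays."""
    return 1.0 - math.exp(-push_latency_days / mean_upgrade_delay_days)

# Hypothetical numbers: updates reach installs within 1 day,
# and real users take 10 days on average to upgrade.
print(round(fraction_missed(1.0, 10.0), 3))  # prints 0.095
```

Under those assumed numbers, under a tenth of the simulated users would
switch late; the slower real users are to upgrade relative to the push
latency, the smaller that share gets.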
Justin