Sorry for always picking random stuff from the volunteer page, but having read this,
<quote> Programs like Torbutton aim to hide your browser's UserAgent string by replacing it with a uniform answer for every Tor user. That way the attacker can't splinter Tor's anonymity set by looking at that header. It tries to pick a string that is commonly used by non-Tor users too, so it doesn't stand out. Question one: how badly do we hurt ourselves by periodically updating the version of Firefox that Torbutton claims to be? If we update it too often, we splinter the anonymity sets ourselves. If we don't update it often enough, then all the Tor users stand out because they claim to be running a quite old version of Firefox. The answer here probably depends on the Firefox versions seen in the wild. Question two: periodically people ask us to cycle through N UserAgent strings rather than stick with one. Does this approach help, hurt, or not matter? Consider: cookies and recognizing Torbutton users by their rotating UserAgents; malicious websites who only attack certain browsers; and whether the answers to question one impact this answer. </quote>
I think the best answer is simple, although not quite trivial to implement.
If you want to anonymize the user agent, you need to mimic the user agent changes that normal users make across the web. I'm positing that in any statistically significant sample of web users, this behavior will be similar and simple enough at nearly every scale available, i.e., normally distributed along any dimension you choose to analyze.
Think about the natural evolution of a typical web user's user agent. What are the factors that will result in a change of user agent?
1) a user may use more than one browser on the same computer, 2) they may use the web on more than one device (phone, tablet, laptop, desktop), and 3) they upgrade their browser in typical, stochastic patterns.
For (1) and (2) there is not much Torbutton could do: coordinating reasonable obfuscation among instances that differ in version and are separated logically, in space, and in time would take far more effort than it would be worth. In fact, (1) and (2) ought to naturally provide most if not all of the anonymizing that can reasonably be accomplished from Torbutton's perspective, without any action at all on Torbutton's part.
For (3), though, there is something that could be done by Torbutton. Some factors to consider in constructing a stochastic user agent updater are:
 - which browsers automatically upgrade themselves?
 - which browsers bother the user to upgrade?
 - what are the typical user response patterns towards browser upgrades?
Remember, we're thinking about a single browser on a single system. If there are things going on for that user external to this (reinstalling Windows, upgrading Ubuntu, getting a new computer, etc.), those effects are already accounted for by (1) and (2).
Some investigative questions and statements to guide an analysis of this could be something like:
 'what is the distribution of browsers for human web users?'
 'what is the distribution of systems and system versions for human web users?'
 'what are the significant correlations on the Cartesian product of these two dimensions?'
 'how do the browser versions in this product space evolve through time?'
 ...
 '2/3 of Firefox users are on Windows and their upgrade habits follow a temporal distribution that is a spike followed by an exponential decay of order 2.3'
 '0.8 of the remaining Firefox users are on Linux and their update habits are dominated by package management systems'
 etc.
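To make the "spike followed by an exponential decay" kind of statement concrete, here is a minimal Python sketch. It assumes, purely for illustration, that the delay between a release and a given user's upgrade is exponentially distributed. The function names, the parameter `decay_rate_per_week`, and the reading of "2.3" as a per-week rate are all my assumptions, not measured values.

```python
import math
import random

def sample_upgrade_delay(rng, decay_rate_per_week):
    """Draw one user's delay (in weeks) between a release and their upgrade.

    Assumption: adoption is densest right at release (the spike) and
    decays exponentially afterwards.
    """
    return rng.expovariate(decay_rate_per_week)

def fraction_upgraded(weeks_since_release, decay_rate_per_week):
    """Fraction of users expected to have upgraded by a given time."""
    return 1.0 - math.exp(-decay_rate_per_week * weeks_since_release)

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    # A few simulated delays, using the made-up rate of 2.3 per week.
    print([sample_upgrade_delay(rng, 2.3) for _ in range(5)])
    # Expected share of users on the new version one week after release.
    print(fraction_upgraded(1.0, 2.3))
```

A real analysis would fit the rate (and check whether an exponential is even the right shape) against observed user agent logs for each browser and platform.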
Then, upon finding the most significant trends in browser update patterns, construct a mechanism for Torbutton that mimics them on a per-user basis: once a user installs Torbutton, it samples (selects) a browser from the distribution of browsers and then follows that browser's typical upgrade pattern. The problem with this idea, I guess, is that Torbutton will have to phone home to find out when browser updates are adopted by users so that it can make its change at the expectation value or whatever.
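A rough sketch of that mechanism in Python, under stated assumptions: the share table `BROWSER_SHARES` is invented for illustration (real values would have to be measured), `install_seed` is a hypothetical per-install random value kept locally, and each install uses one fixed upgrade delay for every release, which is a simplification.

```python
import random

# Hypothetical shares of (browser, platform) pairs; NOT real measurements.
BROWSER_SHARES = {
    "Firefox/Windows": 0.40,
    "Firefox/Linux": 0.10,
    "IE/Windows": 0.35,
    "Safari/Mac": 0.15,
}

def pick_browser(install_seed, shares=BROWSER_SHARES):
    """Pick the browser this install will claim to be.

    Seeding a private generator with the per-install seed makes the
    choice stable for one install but distributed like `shares` across
    many installs.
    """
    rng = random.Random(install_seed)
    names = sorted(shares)
    weights = [shares[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]

def claimed_version(releases, now, delay):
    """Return the newest version this install should claim at time `now`.

    `releases` is a list of (release_time, version) sorted by time.
    The install adopts each release `delay` time units after it ships;
    None means no release has been adopted yet.
    """
    current = None
    for release_time, version in releases:
        if release_time + delay <= now:
            current = version
    return current

if __name__ == "__main__":
    print(pick_browser(12345))
    print(claimed_version([(0, "3.0"), (10, "3.5")], now=12, delay=2))
```

The release list is the part that would have to come from somewhere outside the install, which is where the phone-home problem shows up.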
The goal is to distribute Torbutton users among user agents according to the distribution of all web users, so you must first find out what that latter distribution is and how it is likely to evolve. The second part is not as bad as it seems: I'm guessing that the typical adoption time, even allowing for some standard deviations around its expectation value, falls late enough after the browser update that Torbutton can push the change out to most installed instances (adoption is not going to happen before the release, causality and all).
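One way to check that guess, sketched in Python under the same illustrative assumption of exponentially distributed upgrade delays: an install cannot switch before the news of a release reaches it, so any install whose sampled delay is shorter than the push latency switches late. The names `push_latency` and `decay_rate` are hypothetical parameters, and both must use the same time unit.

```python
import math

def effective_switch_time(release_time, sampled_delay, push_latency):
    """Time at which an install actually changes its claimed version.

    It wants to switch at release_time + sampled_delay, but cannot do so
    before the update notice arrives at release_time + push_latency.
    """
    return release_time + max(sampled_delay, push_latency)

def fraction_switching_late(push_latency, decay_rate):
    """Share of installs whose sampled delay is shorter than the push latency.

    Assumption: delays are exponential with rate `decay_rate`, so this is
    the exponential CDF evaluated at `push_latency`.
    """
    return 1.0 - math.exp(-decay_rate * push_latency)

if __name__ == "__main__":
    # Made-up numbers: rate 2.3 per week, notice arrives after 0.1 weeks.
    print(fraction_switching_late(0.1, 2.3))
    print(effective_switch_time(10.0, 0.5, 1.0))
```

If the measured late fraction is small for realistic push latencies, the guess holds; if real adoption has a sharp spike right at release, it may not.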
Justin