Hi everybody,
we're discussing in #5684 whether we can stop sanitizing nicknames in the bridge descriptors that we publish here:
https://metrics.torproject.org/data.html#bridgedesc
The sanitizing process is described here:
https://metrics.torproject.org/formats.html#bridgedesc
When we started making sanitized bridge descriptors available on the metrics website, we replaced all contained nicknames with "Unnamed". The reason was that "bridge nicknames might give hints on the location of the bridge if chosen without care; e.g. a bridge nickname might be very similar to the operators' relay nicknames which might be located on adjacent IP addresses."
This was an easy decision back then, because we didn't use the nickname for anything. This has changed with #5629, where we try to count EC2 bridges, which all have similar nicknames. So, while we don't have that information, there'd now be a use for it. Another advantage of having bridge nicknames would be that bridges would be easier to look up in a status website like Atlas (which doesn't support searching for bridges yet). We should reconsider whether it still makes sense to sanitize nicknames in bridge descriptors.
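To make the #5629 use case a bit more concrete, here is a rough sketch of how one might count bridges by nickname prefix if the nicknames were left in place. It only assumes that each descriptor contains a "router" line with the nickname as its second field; the prefix "ec2bridge", the function name, and the sample lines are made up for illustration and are not taken from real descriptors or from the metrics code.

```python
from collections import Counter

def count_by_nickname_prefix(descriptor_text, prefixes):
    """Count 'router' lines whose nickname starts with one of the prefixes."""
    counts = Counter()
    for line in descriptor_text.splitlines():
        parts = line.split()
        # A descriptor begins with: router <nickname> <address> <ORPort> ...
        if len(parts) >= 2 and parts[0] == "router":
            nickname = parts[1].lower()
            for prefix in prefixes:
                if nickname.startswith(prefix.lower()):
                    counts[prefix] += 1
                    break  # count each descriptor at most once
    return counts

# Hypothetical sample data; the nicknames and addresses are invented.
SAMPLE = """\
router ec2bridgeabc 10.0.0.1 443 0 0
router Unnamed 10.0.0.2 9001 0 0
router ec2bridgexyz 10.0.0.3 443 0 0
"""

if __name__ == "__main__":
    print(count_by_nickname_prefix(SAMPLE, ["ec2bridge"]))
```

With nicknames replaced by "Unnamed", as they are today, a count like this would always come out as zero, which is the limitation described above.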
Regarding the reasoning above, couldn't an adversary just scan adjacent IP addresses of all known relays, not just the ones with similar nicknames? And are we giving away anything else with the nicknames?
It would be great to get some feedback here on whether leaving nicknames in the sanitized descriptors is a terrible idea, and if so, why.
If nobody objects within the next, say, two weeks, I'm going to make an old tarball from 2008 available with original nicknames. And if nobody screams, I'll provide the remaining tarballs containing original nicknames another two weeks later.
Thanks!
Karsten