[Cc'ing tor-relays, because this discussion might be relevant for relay/bridge operators, too. Please keep the discussion on tor-dev. See https://lists.torproject.org/pipermail/tor-dev/2012-May/003489.html for the whole thread.]
Hi Sebastian,
On 5/2/12 9:35 PM, Sebastian G. <bastik.tor> wrote:
[...]
Do similar names actually mean that bridges are located where the relay is? (Apparently you've got the data to see these correlations)
A fine question.
How do we define "similar" and "located where the relay is?" I can see how a relay "bastik1" and a bridge "bastik2" have similar nicknames, but would we also teach a program that "bastikrelay" and "bastikbridge" are similar? And are two IP addresses in the same, say, /30 located nearby, or is the same /28 or even /24 okay, too?
So, while we have the data to see these correlations, I think that whatever similarity algorithm we come up with, somebody else might come up with something smarter. If we do the analysis you suggest and learn that it's safe to include nicknames, that doesn't say very much. Just because we have the data to confirm how well our attack would work doesn't automatically mean we're in a good position to design the attack.
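To make the two open questions concrete, here is a minimal sketch of what a naive correlation check could look like. It is only an illustration, not an analysis anyone has run: the function names are made up, the addresses come from the documentation range 192.0.2.0/24, and the string-similarity measure (Python's difflib ratio) is just one arbitrary choice among many.

```python
import ipaddress
from difflib import SequenceMatcher


def same_prefix(ip_a, ip_b, prefix_len):
    """Return True if both addresses fall into the same /prefix_len network.

    Hypothetical helper: the network is derived from ip_a, with host bits
    masked off (strict=False), and ip_b is tested for membership.
    """
    net_a = ipaddress.ip_network(f"{ip_a}/{prefix_len}", strict=False)
    return ipaddress.ip_address(ip_b) in net_a


def nickname_similarity(nick_a, nick_b):
    """Return a similarity score between 0.0 and 1.0 for two nicknames.

    Assumption: case-insensitive difflib ratio stands in for "similar".
    A smarter attacker could use a very different measure.
    """
    return SequenceMatcher(None, nick_a.lower(), nick_b.lower()).ratio()


if __name__ == "__main__":
    # The answer to "are they nearby?" changes with the prefix length.
    for prefix_len in (30, 28, 24):
        print(prefix_len, same_prefix("192.0.2.1", "192.0.2.9", prefix_len))

    # "bastik1"/"bastik2" score higher than "bastikrelay"/"bastikbridge",
    # although a human would call both pairs similar.
    print(nickname_similarity("bastik1", "bastik2"))
    print(nickname_similarity("bastikrelay", "bastikbridge"))
```

Any threshold on the score, and any choice of prefix length, is a judgment call, which is exactly the problem with declaring nicknames safe based on one such check.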
If you want to run this analysis with the 2008 tarball (assuming there won't be general objections within the next two weeks), I'm happy to take your list of likely bridge IP addresses and tell you how accurate your algorithm is.
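For illustration, "how accurate your algorithm is" could be reported as precision and recall of the guessed list against the real bridge addresses. This is a sketch under that assumption only; the function name is hypothetical, the addresses are from documentation ranges, and the actual comparison might well be reported differently.

```python
def score_guesses(guessed_ips, actual_bridge_ips):
    """Compare a list of guessed bridge addresses against the true ones.

    Returns (precision, recall):
      precision = share of guesses that really are bridges
      recall    = share of real bridges that were guessed
    Hypothetical helper for illustration only.
    """
    guessed = set(guessed_ips)
    actual = set(actual_bridge_ips)
    hits = guessed & actual
    precision = len(hits) / len(guessed) if guessed else 0.0
    recall = len(hits) / len(actual) if actual else 0.0
    return precision, recall


if __name__ == "__main__":
    guessed = ["192.0.2.1", "192.0.2.2", "198.51.100.7", "203.0.113.5"]
    actual = ["192.0.2.1", "198.51.100.7", "198.51.100.9"]
    print(score_guesses(guessed, actual))
```

Only the two numbers would need to go back to whoever submitted the list, so the real addresses never leave the unsanitized data set.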
"We don't need it, so better remove it." I really like that.
I think we're really conservative with giving out bridge data, and that's good.
At the same time, there's value in giving out information about bridges, so "remove everything" is not a good answer. For example, I think if we give bridge operators better feedback on how their bridge is doing, we'll suddenly have a lot more bridges. Making it easy for bridge operators to use Atlas would be a good step in that direction. The same applies to funders who realize from our statistics how successful the Tor Cloud project is and who then want to fund it more to make it more usable, support more cloud providers, etc.
And are we giving away anything else with the nicknames?
Maybe it's location ;)
When I read "hints on the location" for the first time, I thought it would mean that "TowerBridge" or "BridgeofLondon" would be bad since it could hint at London.
Well, in that case you'd learn that there's a (Tor) bridge in London. But that wouldn't help you very much, would it?
Could it make sense to ask the same question on the tor-relays list? Here you (the Tor people) have more data again and know who is subscribed to both lists. I myself assume that the relay and bridge operators who could object, because it's their naming scheme that could reveal something, are more likely to be subscribed to tor-relays.
Good idea. I added tor-relays to the Cc to let relay/bridge operators know. Let's keep this discussion on tor-dev though.
And if nobody screams, I'll provide the remaining tarballs containing original nicknames another two weeks later.
Probably two weeks later, since unpacking, processing and re-packing takes some time :) I know the sanitized ones are large when they are unpacked. Windows needs some time to delete the extracted files.
Right. :) I'll probably start sanitizing all bridge descriptors at once in two weeks, starting with the 2008 ones, and provide only the 2008 tarball then. It's going to keep my CPU and disks busy for a while.
Thanks for your input!
Best, Karsten