Karsten Loesing, 16.05.2012 08:47:
On 5/2/12 2:30 PM, Karsten Loesing wrote:
If nobody objects within the next, say, two weeks, I'm going to make an old tarball from 2008 available with original nicknames. And if nobody screams, I'll provide the remaining tarballs containing original nicknames another two weeks later.
Here we go. These are the sanitized bridge descriptors from May 2008 including original bridge nicknames:
http://freehaven.net/~karsten/volatile/bridges-2008-05-nicknames.tar.bz2
Here are the similarities I found between bridge and relay nicknames. Some bridges share a name with a relay, some seem to follow the same naming scheme, and others might be run by the same person on an adjacent IP address.
Attached you'll find "findings.txt", which contains a bridge line, followed by a relay line, followed by a comment line.
Each should appear on its own line. Windows and Linux users should see the same thing, since both CR (carriage return) and LF (line feed) are included. This might be a problem for other files I may release, as some of them contain LF only, which is the Linux convention; a proper text editor will still understand them, even on Windows. If no one objects, I'm going to publish those files somewhere else, so that others can check how I did things.
I'm not good at tracking time, but I spent approx. 4-5 hours processing the tarballs to get the lists of names for both relays and bridges.
Shortly after I agreed to do this, I downloaded the relay tarball and started figuring out how to extract only what I needed. After that, I waited for the bridge tarball and did the same with it.
Simply because I now know what I did, I assume it would be possible to do this more quickly. Other tools might be faster, and I assume it would be much faster if the tarballs were processed by a script. Unfortunately, I'm not able to write one.
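A script for the extraction step could look roughly like the following. This is only a sketch under stated assumptions: it assumes the descriptors in the tarballs follow the Tor directory-spec server-descriptor format, where each descriptor starts with a line "router <nickname> <address> <ORPort> <SOCKSPort> <DirPort>". Since the internal directory layout of the tarballs isn't described here, it simply scans every file. The function name extract_nicknames is hypothetical. Consensus documents use "r" lines instead and are not covered.

```python
import tarfile
from collections import Counter


def extract_nicknames(tarball_path):
    """Count nicknames found on 'router' lines in every file of a tarball.

    Assumption: descriptors start with the dir-spec line
    'router <nickname> <address> <ORPort> <SOCKSPort> <DirPort>'.
    """
    nicknames = Counter()
    # "r:*" lets tarfile detect the compression (bz2, gz or none) itself.
    with tarfile.open(tarball_path, "r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            f = tar.extractfile(member)
            if f is None:
                continue
            for raw in f:
                # Works for both CRLF and LF line endings.
                line = raw.decode("utf-8", errors="replace").rstrip("\r\n")
                parts = line.split()
                if len(parts) >= 2 and parts[0] == "router":
                    nicknames[parts[1]] += 1
    return nicknames
```

Something like sorted(extract_nicknames("bridges-2008-05-nicknames.tar.bz2")) would then give the list of distinct bridge names, and the same call on the relay tarball gives the relay list. The counts show how many descriptors each nickname published.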
The comparison was done manually, but I'm sure an algorithm would have found most of the similarities anyway. I took the list of bridge names and compared it to the list of relay names, looking for and including both exact matches and close similarities. After that, there wasn't much left to look for on the bridge list. I looked at each remaining bridge name, guessed what it could refer to, and searched the Internet for it. I may have found some links that aren't based on a naming scheme but connect the names in other ways.
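The exact-match and close-similarity pass described above could be automated along these lines. This is a hedged sketch, not the method actually used: it swaps in Python's difflib similarity ratio as the notion of "close", compares names case-insensitively, and uses an arbitrary cutoff of 0.8 that would need tuning. The function name match_nicknames is hypothetical.

```python
import difflib


def match_nicknames(bridges, relays, cutoff=0.8):
    """Pair bridge nicknames with relay nicknames that are equal or similar.

    Returns (bridge, relay, comment) tuples. 'cutoff' is an assumed
    threshold on difflib's similarity ratio (0.0 to 1.0).
    """
    # Group relay names by their lowercase form for case-insensitive lookup.
    by_lower = {}
    for r in relays:
        by_lower.setdefault(r.lower(), []).append(r)

    results = []
    for b in sorted(set(bridges)):
        key = b.lower()
        if key in by_lower:
            for r in by_lower[key]:
                note = "same name" if r == b else "same name, different case"
                results.append((b, r, note))
            continue
        # No exact match: look for the closest relay names above the cutoff.
        for close in difflib.get_close_matches(key, list(by_lower), n=3, cutoff=cutoff):
            ratio = difflib.SequenceMatcher(None, key, close).ratio()
            for r in by_lower[close]:
                results.append((b, r, f"similar name (ratio {ratio:.2f})"))
    return results
```

For example, match_nicknames(["Alpha", "torbridge1"], ["alpha", "torbridge2"]) pairs Alpha with alpha and torbridge1 with torbridge2. Each tuple could then be written out as three lines (bridge, relay, comment), mirroring the layout of findings.txt.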
The worst part was putting the lines together. I underestimated the time the plain comparison would take, but copying the lines into findings.txt took even longer than I had imagined. I'm not sure the way I put the lines together is the best way to present the data.
The nicest part, even if it was time-consuming, was looking for other things that could link relay and bridge names. It wasn't as successful as I had hoped, but it was fun, maybe because I like the universe and mythology, which may have influenced the findings more than is useful.
The last part could be done by a script as well. It would look up a bridge name in a list such as "moons" and, when it finds one, compare the "moons" list against the relay list to find relays whose names are also in that list. Of course, this would need many such lists, like "freedom activists".
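A minimal sketch of that themed-list script, assuming the lists ("moons", "freedom activists", ...) are compiled by hand as plain word collections. The function name match_by_theme and the example lists are hypothetical. As a design choice, it skips pairs where bridge and relay have the same name, since the name comparison already covers those.

```python
def match_by_theme(bridges, relays, themes):
    """Link bridge and relay nicknames that appear in the same themed list.

    'themes' maps a theme name (e.g. "moons") to a collection of words.
    Returns (bridge, relay, comment) tuples. Comparison ignores case.
    """
    relay_lower = {}
    for r in relays:
        relay_lower.setdefault(r.lower(), []).append(r)

    results = []
    for theme, words in themes.items():
        words_lower = {w.lower() for w in words}
        # Relays whose nickname is in this theme's word list.
        themed_relays = [r for key in sorted(words_lower.intersection(relay_lower))
                         for r in relay_lower[key]]
        for b in bridges:
            if b.lower() not in words_lower:
                continue
            for r in themed_relays:
                # Exact name matches are left to the name comparison.
                if r.lower() != b.lower():
                    results.append((b, r, f"both in list '{theme}'"))
    return results
```

With themes = {"moons": ["Io", "Europa", "Titan"]}, a bridge named Europa would be linked to relays named Io or Titan. The quality of the result depends entirely on how complete the hand-made lists are.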
The comparison, including putting the lines together, took approx. 8-10 hours. I really expected much less when I first saw the relay names.
The total time spent was 12-15 hours. I assume that this can be reduced by using scripts and/or tools that are designed to do parts of the job.
This mail is getting far too long, so all I'm going to say is that I'm looking forward to the results.
Best regards, bastik_tor