Forwarding my original answer to Sebastian here.
-------- Original Message --------
Subject: Re: [tor-dev] Can we stop sanitizing nicknames in bridge descriptors?
Date: Mon, 21 May 2012 19:56:34 +0200
From: Karsten Loesing karsten@torproject.org
To: Sebastian G. <bastik.tor> bastik.tor@googlemail.com
Hi Sebastian,
On 5/21/12 7:08 PM, Sebastian G. <bastik.tor> wrote:
(Did you intend to send this mail only to me, not to tor-dev? Feel free to move the discussion back to tor-dev if you want.)
Karsten Loesing, 21.05.2012 11:05:
Here we go with the similarities of bridge and relay nicknames.
Thanks for spending this much time on the analysis!
I could have done far worse, but also a lot better, in terms of the time spent on extracting the data that I wanted or at least thought might be useful.
Sometimes I'm just slow at things, e.g. writing this reply.
Here's what I did with your findings.txt:
- extract unique fingerprint pairs of relays and bridges that you found
as having similar nicknames,
- look through descriptor archives to see if relay and bridge were
running in the same /24 at any time in May 2008, and
- determine the absolute and relative number of bridges in a given
network status that could have been located via nickname similarity.
Results are that 24 of your 81 guesses (30%) were correct in the sense that a bridge was at least once running in the same /24 as the relay with a similar nickname. At any time in May 2008, you'd have located between 1 and 6 bridges (2.5% to 18%) via nickname similarity, with a mean of 3 bridges (10%).
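The three steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the script that was actually used: the function names, the fingerprint-to-address dictionaries, and the list of bridges in a status are all hypothetical inputs, and it assumes IPv4 addresses from non-sanitized descriptors are available.

```python
import ipaddress

def slash24(addr):
    """Return the /24 network that contains an IPv4 address string."""
    return ipaddress.ip_network(f"{addr}/24", strict=False)

def confirmed_pairs(pairs, relay_addrs, bridge_addrs):
    """Keep unique (relay_fp, bridge_fp) pairs that ever shared a /24.

    relay_addrs and bridge_addrs are hypothetical dicts mapping a
    fingerprint to every address seen for it in the archives.
    """
    confirmed = set()
    for relay_fp, bridge_fp in set(pairs):  # set() drops duplicate guesses
        relay_nets = {slash24(a) for a in relay_addrs.get(relay_fp, ())}
        bridge_nets = {slash24(a) for a in bridge_addrs.get(bridge_fp, ())}
        if relay_nets & bridge_nets:
            confirmed.add((relay_fp, bridge_fp))
    return confirmed

def located_share(status_bridges, confirmed):
    """Absolute and relative number of bridges in one status that were located."""
    located = {bridge_fp for _, bridge_fp in confirmed} & set(status_bridges)
    if not status_bridges:
        return 0, 0.0
    return len(located), len(located) / len(status_bridges)
```

Running `located_share` once per network status would give the per-status minimum, maximum, and mean figures of the kind quoted above.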
Not too bad.
I agree. :)
I think it's acceptable to publish more recent bridge descriptors with nicknames a week from now. Results may look quite different with 1000 bridges instead of 30.
May 2008 was the first month with bridges. I expected lots of relay operators to have tested a bridge with the same name. Things may have changed over time. I assume that further comparisons won't have such a "high" hit ratio.
That would be my guess, too. In May 2008, only a few early adopters were running bridges, and most of those probably ran relays at the same time, too. Plus, they were enthusiastic and put some energy into finding cool nicknames. It might be that this has changed since then. To be honest, I didn't look at 2012 tarballs yet.
Again, thanks for running this analysis! Maybe you're interested in automating your comparison and re-running it for a 2012 tarball?
My claim was that you have the data, so you can check. (Not for May 2008.)
To be honest, my first impression was that I wouldn't do anything useful and did not intend to do that. I guessed it wouldn't turn out that it doesn't hurt since at least 2011, so I wouldn't find anything good.
Then you asked and I agreed, but already thought "I couldn't keep my mouth shut!". I mean I replied to this topic. I surely could have said no there. I didn't.
While I was doing the analysis, and afterwards, I would have said no if asked whether I'm going to do this again. That held until Sunday night. Today I'm agreeing again.
That's a pretty long way to say: Yes!
Hah, great! :)
I'm going to make the 2012 tarballs available next Wednesday (May 30), assuming that my poor Linux box doesn't run out of $resource. I'll let you know.
Thank you, it's a 2012 tarball. The number of bridges is scary.
I'm going to upload some files somewhere and explain what I did, step by step (more or less), so anyone can check and reproduce it. It would be nice to hear feedback and ways to improve my approach.
Maybe you can tell me if the findings.txt was alright.
Yes, the file format was fine.
Unless someone objects or you disagree, I'm going to upload the files I created and explain how, and maybe even why.
No objections at all. Open discussion is good.
I created a blog, just because I wanted one at some point in the past, but found it silly. That's the channel I planned to use. Maybe it's OK to post it to a Tor list as well, but maybe it would be considered noise.
I wonder if the Tor wiki would be a better place to collect ideas for reversing the bridge descriptor sanitizing process. Feel free to grab a new page in doc/ and start describing what you did.
Best, Karsten