New subject: Fwd: Re: Can we stop sanitizing nicknames in bridge descriptors?

22 May 2012

Forwarding my original answer to Sebastian here.
-------- Original Message --------
Subject: Re: [tor-dev] Can we stop sanitizing nicknames in bridge
descriptors?
Date: Mon, 21 May 2012 19:56:34 +0200
From: Karsten Loesing karsten@torproject.org
To: Sebastian G. <bastik.tor> bastik.tor@googlemail.com
Hi Sebastian,
On 5/21/12 7:08 PM, Sebastian G. <bastik.tor> wrote:
(Did you intend to send this mail only to me, not to tor-dev?  Feel free
to move the discussion back to tor-dev if you want.)
...
Karsten Loesing, 21.05.2012 11:05:
...
...
Here we go with the similarities of bridge and relay nicknames.
Thanks for spending this much time on the analysis!
I could have done far worse, but also a lot better in terms of time
spent on extracting the data that I wanted or at least considered
potentially useful.
Sometimes I'm just slow at things, e.g. writing this reply.
...
Here's what I did with your findings.txt:

 - extract unique fingerprint pairs of relays and bridges that you found
   as having similar nicknames,
 - look through descriptor archives to see if relay and bridge were
   running in the same /24 at any time in May 2008, and
 - determine the absolute and relative number of bridges in a given
   network status that could have been located via nickname similarity.

Results are that 24 of your 81 guesses (30%) were correct in the sense
that a bridge was at least once running in the same /24 as the relay
with similar nickname.  At any time in May 2008, you'd have located
between 1 and 6 bridges (2.5% to 18%) with 3 bridges (10%) in the mean
via nickname similarity.
Not too bad.
I agree. :)
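
The three steps listed above (unique fingerprint pairs, same-/24 check, per-status share) can be sketched roughly as follows. This is only an illustration, not the script that was actually run: the function names, the dictionaries mapping fingerprints to observed IPv4 addresses, and the toy data are all assumptions, and it presumes the address histories have already been extracted from the descriptor archives.

```python
import ipaddress


def same_slash24(addr_a, addr_b):
    """Return True if two IPv4 address strings fall into the same /24."""
    net_a = ipaddress.ip_network(f"{addr_a}/24", strict=False)
    return ipaddress.ip_address(addr_b) in net_a


def confirmed_pairs(guesses, relay_addrs, bridge_addrs):
    """Keep the guessed (relay_fp, bridge_fp) pairs that shared a /24.

    guesses: iterable of (relay fingerprint, bridge fingerprint) pairs.
    relay_addrs / bridge_addrs: hypothetical dicts mapping a fingerprint
    to the set of addresses seen for it during the month.
    """
    hits = []
    for relay_fp, bridge_fp in sorted(set(guesses)):  # unique pairs only
        relay_ips = relay_addrs.get(relay_fp, ())
        bridge_ips = bridge_addrs.get(bridge_fp, ())
        if any(same_slash24(r, b) for r in relay_ips for b in bridge_ips):
            hits.append((relay_fp, bridge_fp))
    return hits


def located_share(status_bridge_fps, located_bridge_fps):
    """Absolute and relative number of bridges in one network status
    that appear among the bridges located via nickname similarity."""
    status = set(status_bridge_fps)
    count = len(status & set(located_bridge_fps))
    return count, (count / len(status) if status else 0.0)


if __name__ == "__main__":
    # Toy data using documentation address ranges, not real relays or bridges.
    guesses = [("RELAY_A", "BRIDGE_A"), ("RELAY_B", "BRIDGE_B")]
    relay_addrs = {"RELAY_A": {"192.0.2.10"}, "RELAY_B": {"198.51.100.7"}}
    bridge_addrs = {"BRIDGE_A": {"192.0.2.200"}, "BRIDGE_B": {"203.0.113.9"}}

    hits = confirmed_pairs(guesses, relay_addrs, bridge_addrs)
    print(hits, len(hits) / len(set(guesses)))
    print(located_share({"BRIDGE_A", "BRIDGE_B", "BRIDGE_C"},
                        {bridge for _, bridge in hits}))
```

With the numbers from the mail, the ratio of confirmed to guessed pairs is 24/81, which rounds to the 30% quoted above.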
...
...
I think it's acceptable to publish more recent bridge descriptors with
nicknames in a week from now.  Results may look quite different with
1000 bridges instead of 30.
May 2008 was the first month with bridges. I expected lots of relay
operators that tested a bridge with the same name. Things may have
changed over time. I assume that further comparisons won't have such a
"high" hit ratio.
That would be my guess, too.  In May 2008, only a few early adopters
were running bridges, and most of those probably ran relays at the same
time, too.  Plus, they were enthusiastic and put some energy into finding
cool nicknames.  It might be that this has changed since then.  To be
honest, I didn't look at 2012 tarballs yet.
...
...
Again, thanks for running this analysis!  Maybe you're interested in
automating your comparison and re-running it for a 2012 tarball?
My claim was you got the data, so you can check. (Not with May 2008)
To be honest, my first impression was that I wouldn't do anything useful
and did not intend to do that. I guessed it wouldn't turn out that it
doesn't hurt since at least 2011, so I wouldn't find anything good.
Then you asked and I agreed, but already thought "I couldn't keep my
mouth shut!". I mean I replied to this topic. I surely could have said
no there. I didn't.
While I was doing it, and afterwards, I would have said no if asked
whether I'm going to do this again. That was valid up to Sunday
night. Today I'm agreeing again.
That's a pretty long way to say: Yes!
Hah, great! :)
I'm going to make the 2012 tarballs available next Wednesday (May 30),
assuming that my poor Linux box doesn't run out of $resource.  I'll let
you know.
...
Thank you, it's a 2012 tarball. The number of bridges is scary.
I'm going to upload some files somewhere and explain what I did. Step by
step (somewhat around that). So anyone can check and reproduce what I
did. It would be nice to hear feedback and ways to improve the way I did
what I did.
Maybe you can tell me if the findings.txt was alright.
Yes, the file format was fine.
...
Unless someone objects or you disagree, I'm going to upload the files I
created and explain how, and maybe I can even say why.
No objections at all.  Open discussion is good.
...
I created a blog, just because I wanted one at some point in the past, but
found it silly. That's the channel I planned to use. Maybe it's OK to put
it on a Tor list as well, but maybe it would be considered noise.
I wonder if the Tor wiki would be a better place to collect ideas for
reversing the bridge descriptor sanitizing process.  Feel free to grab a
new page in doc/ and start describing what you did.
Best,
Karsten