Thanks, Karsten!
The bridge descriptor tarballs contain bridge network statuses, server descriptors, and extra-info descriptors. See:
Oops, I read 'contain similar documents as the relay descriptor archives' as meaning only server descriptors. Maybe the first sentence should explicitly say that it's a bundle of network statuses, server descriptors, and extra-info descriptors?
You'll find an example here:
https://metrics.torproject.org/formats.html#bridgedesc
(I'll also include an example of the suggested format below.)
Oops again. I didn't realize that we'd use the same scrubbing description for both. Personally I'd find it more intuitive if we had a separate section for each, though I see why you did it this way.
No, the fingerprint is the identity key digest, whereas the descriptor identifier is the descriptor digest.
Gotcha. Added support for the router-digest lines and flagged them as being required for bridge server descriptors... https://gitweb.torproject.org/stem.git/commitdiff/e7e03d2f61d6dcc7bc5e5ad4de...
Minor tweak for the is_scrubbed() method, but that's all.
Great.
Changed... https://gitweb.torproject.org/stem.git/commitdiff/f7fb726cc3dea8bfd294833b15...
After thinking more about it, I came to the conclusion that we should stop sanitizing *-stats lines altogether.
In that case the 'router-signature' lines are the only ones being scrubbed out of bridge extra-info descriptors, right? If so, we don't need a 'router-digest' here, since the digest can be calculated from the (now unscrubbed) content. Is that correct?
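To make the question concrete, here is a minimal sketch of that calculation. It assumes the digest is the SHA-1 over everything from the 'extra-info' line through the 'router-signature' line (inclusive of its trailing newline), and that every byte in that range is unchanged by the sanitizing. The function name extrainfo_digest is hypothetical and is not part of stem.

```python
import hashlib

# Assumption: the signed portion ends with this line, newline included.
SIGNATURE_LINE = "\nrouter-signature\n"


def extrainfo_digest(content: str) -> str:
    """Return the upper-case hex SHA-1 of the signed portion of an
    extra-info descriptor (hypothetical helper, not a stem function)."""

    # The descriptor must begin at the start of a line, so annotation lines
    # such as "@type bridge-extra-info 1.0" are not mistaken for its start.
    if content.startswith("extra-info "):
        start = 0
    else:
        start = content.find("\nextra-info ")
        if start == -1:
            raise ValueError("no 'extra-info' line found")
        start += 1  # skip the newline that preceded the keyword

    end = content.find(SIGNATURE_LINE, start)
    if end == -1:
        raise ValueError("no 'router-signature' line found")

    # Hash from the 'extra-info' keyword through the 'router-signature' line.
    signed = content[start:end + len(SIGNATURE_LINE)]
    return hashlib.sha1(signed.encode("ascii")).hexdigest().upper()
```

If any other field in that range is still altered, or if the 'router-signature' line itself is dropped rather than just the signature block, the result will not match the original digest and a 'router-digest' line would still be needed.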
Cheers! -Damian