Hi Damian, hi devs,
I'm planning to make microdescriptor tarballs available on the metrics website that contain both microdescriptor consensuses and microdescriptors.
Some background: Recent Tor clients no longer download the network status consensus and full server descriptors; instead, they download the microdescriptor consensus and the microdescriptors referenced from it. We haven't provided these formats on the metrics website so far, because they are derived from the formats we already provide and don't contain anything novel. But having the new formats will, for example, make it easier for developers to analyze the directory protocol and for researchers to understand what information is available to clients for making path selection decisions. If you need more background, see #2785 and search for "microdesc" in dir-spec.txt.
Here's a sample tarball:
https://people.torproject.org/~karsten/microdescs-2014-01.tar.bz2
Damian, can you try to parse these descriptors using stem, to see if the descriptor annotations are correct and if stem can parse them without issues?
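For a quick look at the annotations that doesn't depend on stem, something like the following might do. It's only a sketch, and it assumes the tarball has already been extracted into a directory and that each descriptor file starts with a single "@type ..." annotation line. The function name tally_annotations is made up for this example.

```python
import os
from collections import Counter


def tally_annotations(directory):
    """Count the first-line "@type" annotations of all files below directory."""
    tally = Counter()
    for root, _dirs, files in os.walk(directory):
        for name in files:
            # Read only the first line; the annotation is assumed to come first.
            with open(os.path.join(root, name), "rb") as descriptor_file:
                first_line = descriptor_file.readline().decode("ascii", "replace").strip()
            # Files without a leading annotation are grouped under one key.
            if first_line.startswith("@type "):
                tally[first_line] += 1
            else:
                tally["(no annotation)"] += 1
    return tally
```

Printing the returned counter for the extracted directory shows which annotation types occur and how often, and whether any files lack an annotation.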
If all goes well, microdescriptor tarballs will start to be available on the metrics website before the end of the month.
All the best, Karsten
> Damian, can you try to parse these descriptors using stem, to see if the descriptor annotations are correct and if stem can parse them without issues?
Hi Karsten, sorry about the delay! Yup, stem parses them just fine (though processing compressed tarballs still takes an unpleasantly long time)...
% du -h microdescs-2014-01.tar.bz2
1.8M microdescs-2014-01.tar.bz2

% cat parse.py
from stem.descriptor.reader import DescriptorReader

counter = 0

with DescriptorReader(["microdescs-2014-01.tar.bz2"]) as reader:
  for desc in reader:
    counter += 1

print "Found %i microdescriptors" % counter

% time python parse.py
Found 14999 microdescriptors

real 67m15.022s
user 65m50.259s
sys 1m13.717s
Cheers! -Damian
On 1/22/14 4:32 AM, Damian Johnson wrote:
> % time python parse.py
> Found 14999 microdescriptors
>
> real 67m15.022s
> user 65m50.259s
> sys 1m13.717s
Wow, that's indeed time-consuming. Inflating the tarball before feeding it into stem probably solves this problem. (That's what I usually do with metrics-lib, too.)
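Here's a rough sketch of what inflating first could look like in Python. It uses the standard tarfile module for the extraction and then points stem's DescriptorReader at the resulting directory. The helper names inflate_tarball and count_descriptors are made up for this example, the archive is assumed to come from a trusted source, and whether this really removes the slowdown is untested.

```python
import os
import tarfile


def inflate_tarball(tarball_path, target_dir):
    """Extract a (possibly compressed) tarball and return the extracted file paths."""
    # "r:*" lets tarfile detect the compression (bz2, gz, or none) on its own.
    with tarfile.open(tarball_path, "r:*") as archive:
        archive.extractall(target_dir)
    extracted = []
    for root, _dirs, files in os.walk(target_dir):
        for name in files:
            extracted.append(os.path.join(root, name))
    return sorted(extracted)


def count_descriptors(directory):
    """Count the descriptors stem finds below an already inflated directory."""
    # Imported here so that the extraction helper also works without stem.
    from stem.descriptor.reader import DescriptorReader

    counter = 0
    with DescriptorReader([directory]) as reader:
        for _desc in reader:
            counter += 1
    return counter
```

One would call inflate_tarball("microdescs-2014-01.tar.bz2", "microdescs-2014-01") once and then count_descriptors("microdescs-2014-01"), so that stem only ever reads plain files.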
Thanks for testing this! Will deploy the metrics-db changes on yatei.
All the best, Karsten
On 22/01/14 09:01, Karsten Loesing wrote:
> Thanks for testing this! Will deploy the metrics-db changes on yatei.
Microdescriptor tarballs are now available on the metrics website:
https://metrics.torproject.org/data.html#relaydesc
All the best, Karsten