Keep in mind that metrics tarballs can be huge. stem's tests probably shouldn't download any of these tarballs during an automated integ test run.
Oops yup. Should have mentioned that. We're just picking out a descriptor that seems to exercise most of the parsing. This is just for a sanity check that 'we can still parse something found in the wild'. Megan, Erik: the layout should be pretty obvious when you take a peek in test/integ/descriptor/data/*.
The Java metrics-lib doesn't understand microdescriptor consensuses, because they don't contain anything new for statistical analysis, but I think stem will want to parse them.
Definitely. Microdescriptors are available via the control protocol so we need to be able to parse them.
It probably makes sense to have an abstract NetworkStatusEntry class that does most of the parsing work but that can be specialized in its subclasses. Picking names like ConsensusEntry if the consensus class is called Consensus makes sense.
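A minimal sketch of that split. It assumes hypothetical class names (NetworkStatusEntry, ConsensusEntry, MicrodescriptorConsensusEntry) and only looks at the "r", "s" and "m" lines. The base class walks the entry line by line and handles what both flavors share. Each subclass handles its own "r" line layout: the microdescriptor flavor has no descriptor digest on its "r" line and carries an "m" line with the microdescriptor digest instead.

```python
from abc import ABC, abstractmethod


class NetworkStatusEntry(ABC):
    """Hypothetical base class: parses the lines every router status entry shares."""

    def __init__(self, content):
        self.nickname = None
        self.address = None
        self.or_port = None
        self.flags = []

        for line in content.strip().splitlines():
            keyword, _, value = line.partition(" ")

            if keyword == "r":
                self._parse_r_line(value.split())
            elif keyword == "s":
                self.flags = value.split()
            else:
                self._parse_extra_line(keyword, value)

    @abstractmethod
    def _parse_r_line(self, fields):
        """Each document flavor lays out the 'r' line differently."""

    def _parse_extra_line(self, keyword, value):
        """Hook for flavor-specific lines; ignored by default."""


class ConsensusEntry(NetworkStatusEntry):
    def __init__(self, content):
        self.digest = None
        super().__init__(content)

    def _parse_r_line(self, fields):
        # nickname identity digest date time address or_port dir_port
        self.nickname = fields[0]
        self.digest = fields[2]
        self.address = fields[5]
        self.or_port = int(fields[6])


class MicrodescriptorConsensusEntry(NetworkStatusEntry):
    def __init__(self, content):
        self.microdescriptor_digest = None
        super().__init__(content)

    def _parse_r_line(self, fields):
        # nickname identity date time address or_port dir_port (no digest)
        self.nickname = fields[0]
        self.address = fields[4]
        self.or_port = int(fields[5])

    def _parse_extra_line(self, keyword, value):
        if keyword == "m":
            self.microdescriptor_digest = value
```

Because the base class is abstract, only the concrete flavors can be instantiated, which keeps shared parsing in one place without letting callers build a half-parsed generic entry.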
Perfect, thanks. Megan, Erik: if I were in your shoes, the first thing I'd do is propose the following on this list:
- an object hierarchy (we already have a bit of one, e.g. ServerDescriptor vs. RelayDescriptor/BridgeDescriptor)
- a description for each of the classes, preferably something meaty that we can use for the pydocs of each class with the :var: entries
- your thoughts on which parsing logic should go where (look at the previous descriptor classes for a pattern that you might want to follow)
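For the pydoc part, a class docstring along these lines would list each attribute through a Sphinx :var: field. The class and its attributes are hypothetical placeholders used only to show the layout, not a proposal for the final hierarchy.

```python
class RouterStatusEntry:
    """
    Information about a single relay in a network status document. This is
    a hypothetical placeholder used only to show the docstring layout.

    :var str nickname: relay's nickname
    :var str address: IPv4 address of the relay
    :var int or_port: port the relay listens on for OR connections
    :var list flags: status flags assigned to the relay by the authorities
    """

    def __init__(self, nickname, address, or_port, flags=None):
        self.nickname = nickname
        self.address = address
        self.or_port = or_port
        # avoid a shared mutable default argument
        self.flags = flags if flags is not None else []
```

Keeping one :var: entry per attribute set in __init__ makes undocumented attributes easy to spot during review.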
If Python has a concept similar to Java's inner classes, something like Consensus.Entry might be a good choice, too, because this class will only be used as part of a Consensus.
Yup, there is.
>>> class Foo:
...   class Bar:
...     def __init__(self):
...       self.my_value = 5
...   def __init__(self):
...     self.my_bar = Foo.Bar()
...
>>> f = Foo()
>>> f.my_bar.my_value
5
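Applied to the Consensus.Entry naming idea, a sketch with hypothetical Consensus and Consensus.Entry classes. One difference from Java worth keeping in mind: a nested Python class behaves like a Java static nested class, so an Entry gets no implicit reference to the Consensus it belongs to and has to be handed one explicitly if it needs it.

```python
class Consensus:
    """Hypothetical consensus document holding its router status entries."""

    class Entry:
        # No implicit link to an enclosing instance exists in Python,
        # so the parent document is passed in explicitly.
        def __init__(self, document, nickname):
            self.document = document
            self.nickname = nickname

    def __init__(self, valid_after, nicknames):
        self.valid_after = valid_after
        self.entries = [Consensus.Entry(self, name) for name in nicknames]


consensus = Consensus("2012-08-06 12:00:00", ["moria1", "tor26"])
print([entry.nickname for entry in consensus.entries])  # ['moria1', 'tor26']
```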
A related question: can you give us a couple of use cases for the export functionality? E.g., is filtering (we only want fields X, Y, and Z when Q = ...) likely to be of use? Anything beyond a straight dump of descriptor/network status/etc. entries?
I'll mostly leave this question for Fabio since the CSV dumping functionality was his idea, though my thoughts on some use cases are...
- user writes a script that uses stem to parse the descriptors, filters the results (say, down to Syrian exit relays), then dumps them to a CSV file so they can make pretty graphs or otherwise analyze the data
- user has a Python script that hourly parses their cached descriptors to find any new exits that only allow plaintext traffic, then dumps just the fingerprint and IP address to a CSV file so those relays can later be scanned for malicious activity
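Both use cases boil down to "filter, then keep only some fields". A minimal sketch of that with the standard csv module. export_csv, PLAINTEXT_PORTS and the SimpleNamespace relays (with a made-up exit_ports attribute) are hypothetical stand-ins, not stem's API.

```python
import csv
import io
from types import SimpleNamespace


def export_csv(descriptors, fields, predicate=lambda desc: True):
    """Return CSV text with the given fields of each descriptor passing predicate."""
    output = io.StringIO()

    # extrasaction="ignore" makes DictWriter drop attributes not listed in fields
    writer = csv.DictWriter(output, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()

    for desc in descriptors:
        if predicate(desc):
            writer.writerow(vars(desc))

    return output.getvalue()


# Hypothetical set of ports treated as "plaintext only" for this example.
PLAINTEXT_PORTS = {21, 23, 80, 110, 143}

# Hypothetical parsed exits; real stem descriptors expose different attributes.
relays = [
    SimpleNamespace(nickname="relayA", fingerprint="AAAA", address="192.0.2.1", exit_ports={80}),
    SimpleNamespace(nickname="relayB", fingerprint="BBBB", address="192.0.2.2", exit_ports={80, 443}),
    SimpleNamespace(nickname="relayC", fingerprint="CCCC", address="192.0.2.3", exit_ports=set()),
]

# Second use case: exits whose allowed ports are all plaintext, fingerprint and address only.
plaintext_exits_csv = export_csv(
    relays,
    fields=["fingerprint", "address"],
    predicate=lambda desc: bool(desc.exit_ports) and desc.exit_ports <= PLAINTEXT_PORTS,
)
print(plaintext_exits_csv)
```

Taking the field list and the predicate as arguments covers both the "straight dump" case (default predicate, all fields) and the "only X, Y, Z when Q" case without separate code paths.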
Please use the built-in function vars() instead of __dict__ to retrieve instance attributes.
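A quick illustration, assuming hypothetical ExampleDescriptor and SlottedDescriptor classes: vars(obj) hands back the same dict that obj.__dict__ would (not a copy), and for objects that have no __dict__ (for instance, classes that only define __slots__) it raises TypeError.

```python
class ExampleDescriptor:
    """Hypothetical minimal descriptor, only used to show vars()."""

    def __init__(self, nickname, address):
        self.nickname = nickname
        self.address = address


class SlottedDescriptor:
    """Hypothetical class without a per-instance __dict__."""

    __slots__ = ("nickname",)


desc = ExampleDescriptor("moria1", "128.31.0.34")

# vars() returns the instance's own attribute dict
print(vars(desc))  # {'nickname': 'moria1', 'address': '128.31.0.34'}

try:
    vars(SlottedDescriptor())
except TypeError as exc:
    # objects without __dict__ can't be passed to vars()
    print("TypeError:", exc)
```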
Ah ha, thanks.