Damian Johnson atagar@torproject.org writes:
- Q: Why do you use stem instead of parsing consensuses with Python on your own?
This is another part where I might have made the wrong design decision, but I decided not to get into the consensus-parsing business and just rely on stem.
This is also because I was hoping to use stem to verify consensus signatures. However, now that we might use Daniel's patch to populate our consensus database, maybe we don't need to treat consensuses as untrusted anymore.
If you think that I should try to parse the consensuses on my own, please tell me and I will give it a try. Maybe it will be fast. Definitely not as fast as summary files, but maybe we can parse 3 months' worth of consensuses in 15 to 40 seconds.
I'm not sure why you think it was the wrong choice. If Stem isn't providing the performance you want, then speeding it up seems like the right option rather than writing your own parser. That is, of course, unless you're looking for something highly specialized, in which case have fun.
Nick improved parsing performance by around 30% in response to this...
https://trac.torproject.org/projects/tor/ticket/12859
Between that and turning off validation I'd be a little curious where the time is going if it's still too slow for you.
Indeed, our use case is quite specialized. The only thing the guardiness script cares about is whether relays have the guard flag. No other consensus parsing actually needs to happen.
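For that narrow use case, a hand-rolled pass over the consensus text would be tiny. As a sketch (not the actual guardiness code), this scans "r" router entries and collects the identities of relays whose following "s" line carries the Guard flag, ignoring everything else in the document:

```python
def guard_identities(consensus_text):
    """Return the base64 identity fields of relays that have the Guard flag.

    Relies only on the consensus text layout: each router entry starts with
    an "r" line ("r nickname identity digest date time ip orport dirport")
    followed by an "s" line listing its flags.
    """
    guards = set()
    current_identity = None
    for line in consensus_text.splitlines():
        if line.startswith('r '):
            current_identity = line.split(' ')[2]
        elif line.startswith('s ') and current_identity is not None:
            if 'Guard' in line.split(' ')[1:]:
                guards.add(current_identity)
    return guards
```

Since this skips validation, signature checking, and every other field, it should be much closer to summary-file speed than full stem parsing, at the cost of trusting the input.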
However, you have a point that stem performance could be improved and I will look a bit more into stem parsing and see what I can do.
That said, stem currently parses 24 consensuses (with validation enabled) in 25 seconds, i.e. about one consensus per second. If we are aiming for 7000 consensuses in less than a minute, we need to parse ~120 consensuses per second. That will probably require quite a bit of optimization in stem, I think.
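The ~120/second figure is just back-of-the-envelope arithmetic from the numbers above:

```python
# Throughput target vs. measured stem parsing rate (numbers from this thread).
consensuses = 7000
budget_seconds = 60
required_rate = consensuses / budget_seconds   # ~117 consensuses/sec needed

current_rate = 24 / 25                         # measured: 24 consensuses in 25s
speedup_needed = required_rate / current_rate  # roughly a 120x speedup
```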