Hi Damian,
I'm interested in building a lightweight, internal domain-specific language to explore archived Tor data. The goal is to make it easy to answer questions like the one that recently came up on tor-relays, "how many guards shift location significantly across the Internet, and how often?" Combining Stem and zoossh seems like a good solution.
Ideally, zoossh should do the heavy lifting as it's implemented in a compiled language. For data exploration, however, having a Stem-enabled Python shell with a set of analysis methods sounds better. Now the question is how to pass potentially large amounts of already-parsed consensuses and descriptors from zoossh to Stem. In a perfect world, we would have bindings to use zoossh in Python. The gopy [0] folks are working on that, but it's a young project and interfaces are not yet supported. Two workarounds come to mind until gopy catches up, both requiring some glue code:
1. Let zoossh do the data filtering and then return a list of files that are then parsed again by Stem. That's easy to implement, but can be quite inefficient if the filtering step still returns plenty of data.
2. Have some IPC mechanism that passes objects from zoossh to Stem. Objects could be serialised in some way to minimise unnecessary parsing. While that might be the most efficient option for now, it probably requires too much work.
3. ...something else I didn't consider?
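To make option 2 a bit more concrete, here is a rough sketch of what the Python side of the glue could look like. It assumes a small Go helper built on zoossh that prints one JSON object per line to stdout; that helper, its output format, and the field names (fingerprint, address, flags) are all hypothetical and not something zoossh provides today. A stand-in Python subprocess plays the role of the helper so the snippet runs on its own.

```python
import json
import subprocess
import sys


def iter_records(cmd):
    """Yield one dict per line of newline-delimited JSON printed by cmd."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    try:
        for line in proc.stdout:
            line = line.strip()
            if line:
                yield json.loads(line)
    finally:
        proc.stdout.close()
        returncode = proc.wait()
    if returncode != 0:
        raise subprocess.CalledProcessError(returncode, cmd)


# Stand-in for the hypothetical zoossh-based helper: it just prints two
# made-up router status entries as JSON lines.
FAKE_PRODUCER = [
    sys.executable, "-c",
    "import json\n"
    "print(json.dumps({'fingerprint': 'A' * 40, 'address': '192.0.2.1',"
    " 'flags': ['Guard']}))\n"
    "print(json.dumps({'fingerprint': 'B' * 40, 'address': '198.51.100.7',"
    " 'flags': ['Exit']}))\n",
]

if __name__ == "__main__":
    # Records arrive one at a time, so filtering does not need the whole
    # data set in memory.
    guards = [r for r in iter_records(FAKE_PRODUCER) if "Guard" in r["flags"]]
    for relay in guards:
        print(relay["fingerprint"], relay["address"])
```

The records here are plain dicts rather than Stem objects, so this only covers the transport half of the problem; mapping them onto Stem's descriptor classes would still need to be worked out.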
Please let me know if you have any thoughts.
[0] https://github.com/go-python/gopy
Cheers, Philipp