Hey all,
I apologize for the unusual timing of this status report; I ended up delaying it far longer than I should have, so better now than later, I guess. I can follow up with any updates soon; I just figure I'm long overdue on letting tor-dev know what's going on.
I started my project [1] later than usual, and more or less immediately ran into what I deemed to be a database / ORM scaling issue (the very thing I'd been trying to avoid since writing the proposal), or at least ORM behaviour that was suboptimal for what we have in mind: delivering, first and foremost, a searchable metrics archive backend/database that incorporates, as of the current plan, server descriptors (for relays and bridges alike; it turns out a single server descriptor model can happily serve both) and server/router statuses spanning a few years (currently using v3 consensus documents only), and that provides querying functionality able to extract relations between the two. The 'querying with relations between the two' part was what seemed to cause trouble when tested on a broader span of data. I ended up allocating a probably inefficiently large amount of time to this problem: rewriting the backend part and optimizing the queries underlying the ORM. It turns out I didn't need to strip off the ORM abstraction, and I learned a few things about SQLAlchemy along the way. I will follow up with an email pointing to the current code (sorry).
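To make the 'relations between the two' part concrete, here is a minimal sketch of the kind of two-model setup and join query involved, assuming SQLAlchemy's declarative ORM. The model, table, and column names are illustrative only, not the actual project code:

```python
# Hypothetical sketch: a descriptor table and a status-entry table linked by
# a foreign key, plus a join query relating the two. Names are illustrative.
import datetime

from sqlalchemy import (Column, DateTime, ForeignKey, Integer, String,
                        create_engine)
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

Base = declarative_base()


class Descriptor(Base):
    __tablename__ = "descriptor"
    descriptor_id = Column(String(40), primary_key=True)  # hex digest
    nickname = Column(String(19))
    address = Column(String(15))


class StatusEntry(Base):
    __tablename__ = "statusentry"
    id = Column(Integer, primary_key=True)
    valid_after = Column(DateTime)
    fingerprint = Column(String(40))
    descriptor_id = Column(String(40), ForeignKey("descriptor.descriptor_id"))
    descriptor = relationship("Descriptor")


engine = create_engine("sqlite://")  # in-memory DB for the sketch
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

session.add(Descriptor(descriptor_id="a" * 40, nickname="moria1",
                       address="128.31.0.34"))
session.add(StatusEntry(valid_after=datetime.datetime(2013, 8, 1),
                        fingerprint="f" * 40, descriptor_id="a" * 40))
session.commit()

# The 'relations between the two' query: status entries joined with the
# descriptors they reference, with an explicit ON clause.
rows = (session.query(StatusEntry, Descriptor)
        .join(Descriptor,
              StatusEntry.descriptor_id == Descriptor.descriptor_id)
        .filter(Descriptor.nickname == "moria1")
        .all())
```

Over a few years of consensuses the statusentry table grows to many millions of rows, which is where a join like this starts to need careful indexing and query tuning rather than dropping the ORM entirely.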
* The current iteration of the ORM model / backend (which actually is very simple) solves this problem.
* Stem descriptor and network status mapping to the ORM works, and is nicely (enough) integrated with the data import tools (for downloaded metrics archives), as well as with an API to make queries on the ORM.
* Implemented a partial Onionoo-protocol-adhering backend (without compression and without some fields) for ?summary and ?details Onionoo queries.
* Still tidying everything up. And *finally* writing a design document outlining what we actually ended up with, and what is required till full Onionoo integration.
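For the ?summary part, the shape of the response is roughly the following. This is a hedged sketch only: the `summary_document` helper is hypothetical, though the short field names ("n", "f", "a", "r") follow the Onionoo summary document format:

```python
# Hypothetical helper assembling an Onionoo-style ?summary response from
# query results; the function and its argument shape are illustrative.
import json


def summary_document(relays, published):
    """relays: iterable of (nickname, fingerprint, address, running) tuples."""
    return json.dumps({
        "relays_published": published,
        "relays": [
            # Onionoo summary fields: n = nickname, f = fingerprint,
            # a = addresses, r = running flag.
            {"n": n, "f": fp, "a": [addr], "r": running}
            for (n, fp, addr, running) in relays
        ],
        # Bridges omitted in this sketch.
        "bridges_published": published,
        "bridges": [],
    })


doc = summary_document([("moria1", "9695DFC3" + "0" * 32,
                         "128.31.0.34", True)],
                       "2013-08-01 12:00:00")
```

The ?details variant works the same way, just with the full (long-name) field set per relay instead of the compact summary fields.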
Code review will happen pretty soon, and hopefully we'll have some discussion about where to go from here. Karsten mentioned that it might be possible to use the existing Onionoo incarnation to continue providing bandwidth weights and similar data (basically the stuff from extra-info descriptors), and that it might be possible to join the two systems into an Onionoo-supporting backend covering all, or the majority of, the available archives. Another (or further) avenue would be to continue with the initially proposed plan to extend the query format, and to build a frontend which would make use of that extended format. Expect another email with links to (decent) code.