Hello,
This status update is less extensive/dramatic than the last one, but I'm happy to report that I'm still slowly moving ahead towards a stable searchable archive system. In short, I've been working, more or less, on what I said I would work on in the last report:
I should now move on with implementing/extending the Onionoo API, in particular working on date range queries and refining/rewriting the "list status entries" API endpoint (see below). I need to carefully plan some things and always keep the API document up to date. (I also need to update and publish a separate, more detailed specification document.)
The searchable metrics backend (encompassing part of the database and the Onionoo-like API) [2] is still happily chugging along online; quite a few folks have run queries on it, myself included. I'm in the process of expanding my benchmark.py tool to generate realistic-looking parallel/asynchronous traffic against different kinds of relays and API endpoints; from my limited parallelized benchmarking so far, everything looks good - the bottlenecks are still localized to the individual queries. I should try to generate human-readable, more-or-less rigorous benchmarking reports and publish them.
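For illustration, here's a minimal sketch of the kind of parallel traffic generation I mean, using only the Python standard library; the base URL and the sample endpoint queries are placeholders on my part, not the backend's actual paths:

    # Sketch of parallel API traffic generation (URL/paths are placeholders).
    import time
    import urllib.request
    from concurrent.futures import ThreadPoolExecutor

    BASE = "http://ts.mkj.lt"          # assumed base URL of the backend
    ENDPOINTS = [                      # hypothetical sample queries
        "/summary?search=moria1",
        "/details?search=turtles",
        "/statuses?search=gabelmoo",
    ]

    def timed_get(path):
        """Fetch one endpoint and return (path, elapsed seconds)."""
        start = time.time()
        with urllib.request.urlopen(BASE + path) as resp:
            resp.read()
        return path, time.time() - start

    # Fire repeated queries in parallel to approximate concurrent clients.
    with ThreadPoolExecutor(max_workers=8) as pool:
        for path, elapsed in pool.map(timed_get, ENDPOINTS * 10):
            print("%-35s %.3fs" % (path, elapsed))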
I've briefly run the backend on an EC2 instance again, to compare benchmark outputs and average query times. Natural database caching seems to be helping quite a bit for the current online database (at ts.mkj.lt): indexes and some query results get cached through ordinary use of the backend/database. I've been tinkering with a simple system to pre-warm indexes upon application start (without the need for any PostgreSQL extensions), for more uniformly distributed query times. Overall, though, we seem to be doing OK as regards actual query times.
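To make the pre-warming idea concrete, here is a rough sketch of what I have in mind, assuming psycopg2; the table, column, and DSN names are illustrative, not necessarily the real schema:

    # On application start, run a few cheap queries that touch the hot
    # indexes so their pages end up in the OS / shared-buffer cache.
    # No PostgreSQL extensions (e.g. pg_prewarm) required.
    import psycopg2

    WARMUP_QUERIES = [
        # Walk the (assumed) nickname index.
        "SELECT count(*) FROM statusentry WHERE nickname > ''",
        # Touch the (assumed) valid-after index at both ends.
        "SELECT min(validafter), max(validafter) FROM statusentry",
    ]

    def prewarm(dsn):
        conn = psycopg2.connect(dsn)
        cur = conn.cursor()
        for query in WARMUP_QUERIES:
            cur.execute(query)
            cur.fetchall()
        conn.close()

    prewarm("dbname=tordir user=metrics")  # hypothetical DSN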
I've made updates to the Onionoo API but haven't pushed them yet (I had hoped to do that before this report, but I'm finally learning not to delay things); expect them soon, and subsequent updates to the Onionoo API doc [3] as well, once that happens. I'm now trying to incorporate Karsten's most recent feedback regarding API endpoints, parameters, and some preliminary simplistic caching. Basically, once the date range parameters work nicely for all three types of documents currently provided by the API, and once the status entry endpoint returns valid-after summaries/ranges in a more intuitive document format, the whole thing should satisfy the actual usefulness criteria to a significant extent. Together with caching, it will hopefully qualify as an almost-proper (if, for now, smallish) subset of the Onionoo API/backend.
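To make the date range part concrete, here's a minimal sketch of how such parameters might be validated in a Flask-style handler; the parameter names ("from"/"to") and the endpoint path are assumptions on my part, not the final API:

    # Sketch of date range query parameter validation (names assumed).
    from datetime import datetime
    from flask import Flask, request, jsonify, abort

    app = Flask(__name__)

    def parse_date(name, default):
        """Parse an ISO date query parameter; 400 on malformed input."""
        raw = request.args.get(name)
        if raw is None:
            return default
        try:
            return datetime.strptime(raw, "%Y-%m-%d")
        except ValueError:
            abort(400)

    @app.route("/statuses")
    def statuses():
        date_from = parse_date("from", datetime(2008, 1, 1))
        date_to = parse_date("to", datetime.utcnow())
        if date_from > date_to:
            abort(400)
        # ... query status entries whose valid-after is in the range ...
        return jsonify(date_from=date_from.isoformat(),
                       date_to=date_to.isoformat())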
I did some experiments with PostgreSQL's pg_trgm module for full-text search (so that search strings matching only the middle of some descriptor field would work); I realize that's not a priority right now, but I was curious whether it would work. Nothing conclusive so far, unfortunately.
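For reference, this is roughly what the experiment looks like: a trigram GIN index is what lets an infix pattern like '%oria%' use an index at all. Table and column names are again illustrative:

    # pg_trgm sketch: a trigram GIN index enables indexed infix matches.
    import psycopg2

    conn = psycopg2.connect("dbname=tordir user=metrics")  # hypothetical DSN
    cur = conn.cursor()
    cur.execute("CREATE EXTENSION IF NOT EXISTS pg_trgm")
    cur.execute("CREATE INDEX IF NOT EXISTS nickname_trgm_idx "
                "ON statusentry USING gin (nickname gin_trgm_ops)")
    conn.commit()
    # With the index in place, a middle-of-the-string match can use it:
    cur.execute("SELECT DISTINCT nickname FROM statusentry "
                "WHERE nickname LIKE %s", ("%oria%",))
    print(cur.fetchall())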
The specification document explaining the more detailed design and applicable use cases for the Onionoo-like API is coming along.
Next, I should continue finalizing the Onionoo-like API into a working, non-hacky state; keep the codebase clean and maybe do some cleanup refactoring; continue observing database performance; write more documentation; and, if all is well, expand the list of fields contained in the three status documents. Besides all this, I plan to:
- update the database to the latest consensuses and descriptors;
- turn on the cronjob for rsync and import of newest archives;
- implement caching (a simplistic sketch of the idea follows this list);
- later, hopefully, integrate with the bandwidth and weights documents in Onionoo proper;
- import the 2008 data, etc.
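On the caching item above: what I mean by "simplistic caching" is along these lines, a small in-memory memoizer with a time-to-live. Purely a sketch; the real backend may well end up caching differently:

    # Minimal TTL memoization sketch for rendered API responses.
    import time
    import functools

    def ttl_cache(seconds):
        """Cache a function's results, per arguments, for `seconds` seconds."""
        def decorator(func):
            store = {}
            @functools.wraps(func)
            def wrapper(*args):
                now = time.time()
                if args in store:
                    cached_at, value = store[args]
                    if now - cached_at < seconds:
                        return value
                value = func(*args)
                store[args] = (now, value)
                return value
            return wrapper
        return decorator

    @ttl_cache(300)  # cache summary documents for five minutes
    def summary_document(search):
        # ... expensive database query would go here ...
        return {"search": search}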
These couple of weeks coincide with a (not quite planned for) apartment-moving period, but I *really* hope to have the vast majority of the things mentioned working in a decent state by the end of next week, to leave ourselves some wiggle room to ensure the resulting system is as stable as possible before the official end of the coding period. I'll probably be available throughout the weekend(s), just in case.
Cheers to you all
Kostas.