On 8/23/13 3:12 PM, Kostas Jakeliunas wrote:
Hello,
this accompanies my status report [1], and includes info on how to query the searchable metrics archive for anyone curious. I also refer to the original (now semi-outdated) project proposal/document. [0] Only sending to tor-dev@ for now.
The Onionoo-like backend is listening on
(backup URI where it's actually running (domain unrelated): ravinesmp.com:5555)
This document details how it can be queried:
https://github.com/wfn/torsearch/blob/master/docs/onionoo_api.md
(By design, it is almost a subset of the Onionoo API [2], though as of now it does change some things.)
Hi Kostas,
I finally managed to test your service and take a look at the specification document.
The few tests I tried ran pretty fast! I didn't hammer the service, so maybe there are still bottlenecks that I didn't find. But AFAICS, you did a great job there!
Thanks for writing down the specification.
So, would it be accurate to say that you're mostly not touching the summary, details, bandwidth, and weights resources, but that you're adding a new fifth resource, statuses?
In other words, does the attached diagram visualize what you're going to add to Onionoo? Some explanations:
- summary and details documents contain only the last known information about a relay or bridge, but at a pretty high level of detail (at least for details documents). In contrast to the current Onionoo, your service also returns summary and details documents for relays that didn't run in the last week, so basically for any relay since 2007. However, you're not going to provide summary or details for arbitrary points in time, right? (Which is okay, I'm just asking if I understood this correctly.)
- bandwidth and weights documents always contain information covering the whole lifetime of a relay or bridge, where recent events have a higher level of detail. Again, you're not going to change anything here besides providing these documents for relays and bridges that have been offline for more than a week.
- statuses have the same level of detail for any time in the past. These documents are new. They're designed for the relay search service and for a simplified version of ExoneraTor (which doesn't care about exit policies and doesn't provide original descriptor contents). There are no statuses documents for bridges, right?
If this is correct (and please tell me if it's not), this seems like a plausible extension of Onionoo.
A few ideas on statuses documents: how about you change the format of statuses, so that there's no longer one document per relay and valid-after time, but exactly one document per relay? That document could then contain an array of status objects saying when the relay was contained in the network status, together with information about its addresses.
It might be useful to group consecutive valid-after times when all addresses and other relevant information about a relay stayed the same. So, rather than adding "valid_after", put in "valid_after_from" and "valid_after_to". And maybe you can compress information even more by putting all relevant IP addresses in a list and referring to them by list index. Compare this to bandwidth and weights documents, which are optimized for size, too.
Maybe you could even generate these statuses documents in advance once per hour and store them as JSON documents in the database, similar to what's planned for the other document types? That might reduce database load a lot, though you'll still need most of your database foo for the search part.
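To make the grouping idea concrete, here is a minimal sketch of how one statuses document per relay might be built. Only "valid_after_from" and "valid_after_to" come from the suggestion above; the function name, the "fingerprint", "addresses", and "statuses" keys, the input shape, and the one-hour consensus interval are assumptions made for illustration and are not part of any existing Onionoo or torsearch format.

```python
from datetime import datetime, timedelta

# Assumption: one network status (consensus) per hour.
CONSENSUS_INTERVAL = timedelta(hours=1)
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def build_statuses_document(fingerprint, entries):
    """Build one hypothetical statuses document for a single relay.

    entries: iterable of (valid_after: datetime, addresses: list of str),
    one item per network status that listed the relay.
    """
    address_list = []   # every distinct address, stored once
    address_index = {}  # address -> position in address_list
    ranges = []         # grouped runs of consecutive valid-after times

    for valid_after, addresses in sorted(entries, key=lambda e: e[0]):
        # Replace each address by its index in the shared address list.
        indexes = []
        for address in addresses:
            if address not in address_index:
                address_index[address] = len(address_list)
                address_list.append(address)
            indexes.append(address_index[address])

        last = ranges[-1] if ranges else None
        if (last is not None
                and last["addresses"] == indexes
                and valid_after - last["to"] == CONSENSUS_INTERVAL):
            # Same addresses and directly consecutive: extend the run.
            last["to"] = valid_after
        else:
            # Addresses changed or the relay was missing in between.
            ranges.append({"from": valid_after, "to": valid_after,
                           "addresses": indexes})

    return {
        "fingerprint": fingerprint,
        "addresses": address_list,
        "statuses": [
            {"valid_after_from": r["from"].strftime(TIME_FORMAT),
             "valid_after_to": r["to"].strftime(TIME_FORMAT),
             "addresses": r["addresses"]}
            for r in ranges
        ],
    }
```

With this sketch, a relay that kept one address over many consecutive consensuses collapses into a single status object. A gap in the valid-after times or a change of address starts a new object.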
Happy to chat more about these ideas on IRC.
Please report any inconsistencies / errors / time-outs / anything that takes a few seconds or more to execute. I'm logging the queries (together with IP addresses for now - for shame!), so will be able to later correlate activity with database load, which will hopefully provide some realistic semi-benchmark-like data.
I could imagine that you'll get more testers if you provide instructions for using your service as relay search or ExoneraTor replacement. Maybe you could write down the five most common searches that people could perform to search for a relay or find out whether an IP address was a Tor relay at a given time? If you want, I can link to such a page from the relay search and the ExoneraTor page.
All in all, great work! Nice!
Thanks, Karsten