Greetings!
I'm a student who will be working on the Searchable Tor descriptor archive as part of Google Summer of Code. Yay!
I've been following Tor development for a while and hope that this opportunity will be my way of sneaking into the development kitchen of Tor. In any case, I hope to stay around for a long time to come.
The original GSoC project proposal is based on one of the Tor project ideas available [1] and is part of the Tor Metrics project [2]. The GSoC proposal itself is also available to read [3] (TXT; if there's any interest, I can work on reformatting.) My primary mentor is Karsten and my secondary mentor is Damian.
I will quote the abstract from the proposal to sum up the high-level goals of this project:
I'd like to create a more integrated and powerful descriptor archival search and browse system. (The current tools are very restrictive and the experience disjointed.) To do this, I'll write an archival browsing application wherein the results are interactive: they may act as further search filters. Together with a search string input tool which will have more filtering options, the application will provide a more cohesive archival browse & search experience and will be a more efficient tool.
As of now, we have an array of tools for inspecting, searching for and getting aggregate data about running relays. (For an overview, see the Tools page in the Metrics portal. [4]) These tools include relay search, consensus info, exit-by-IP search, and quite a few more; there are also two Onionoo-based [5] applications: Atlas and Compass.
This project proposes to:
- implement a more powerful backend that would allow one to search for all available relays since mid-2007, i.e., since v2 statuses became available [6]. (I should have clarified this in the previous discussions, and Karsten already includes this bit; I guess it can also be discussed.) "More powerful" here means, first and foremost, covering all (>= v2) archival data (relay descriptors and consensuses at the very least). Furthermore, at least per the original proposal, it means supporting more complex queries: at a minimum, I think, combined AND/OR filters over a wider range of the data fields available in the archival data, and the ability to specify multiple date ranges. Referring to consensus-related data while searching for relays, and vice versa, would also be possible. (The capabilities would therefore also include those of exoneraTor.)
- implement backend results which would, as things currently stand, aim for Onionoo compatibility (again, see the protocol design in [5]), or perhaps supersede it while providing backwards compatibility (e.g., returning paginated lists of the consensus status entries in which a specified relay was present).
- (as per the original proposal) implement a more powerful archival descriptor search & browse tool (frontend), which would provide a more uniform experience across "looking up relays", "searching by many criteria", and "further refining the search on the results page". Refining search results, i.e., adjusting filters, would be semantically the same as entering search criteria in the beginning; hence a more interactive experience and a more powerful search/browse tool.
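To make the "combined AND/OR filters plus multiple date ranges" and "paginated results" ideas a bit more concrete, here is a rough Python sketch. It is only an illustration under loud assumptions: StatusEntry, its fields, and all helper names are hypothetical, an in-memory list stands in for the real archive, and none of this reflects the actual Onionoo protocol or the eventual backend design.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, FrozenSet, List, Sequence, Tuple

@dataclass(frozen=True)
class StatusEntry:
    # Hypothetical, simplified stand-in for one relay's entry in a consensus.
    nickname: str
    fingerprint: str
    valid_after: datetime
    flags: FrozenSet[str]

Predicate = Callable[[StatusEntry], bool]

def all_of(*preds: Predicate) -> Predicate:
    # AND: every sub-filter must match.
    return lambda e: all(p(e) for p in preds)

def any_of(*preds: Predicate) -> Predicate:
    # OR: at least one sub-filter must match.
    return lambda e: any(p(e) for p in preds)

def has_flag(flag: str) -> Predicate:
    return lambda e: flag in e.flags

def nickname_is(name: str) -> Predicate:
    return lambda e: e.nickname.lower() == name.lower()

def in_ranges(ranges: Sequence[Tuple[datetime, datetime]]) -> Predicate:
    # Matches if valid_after falls into any of the half-open [start, end) ranges.
    return lambda e: any(start <= e.valid_after < end for start, end in ranges)

def search(entries: Sequence[StatusEntry], pred: Predicate,
           offset: int = 0, limit: int = 10) -> List[StatusEntry]:
    # Filter, then return one "page" of results.
    matches = [e for e in entries if pred(e)]
    return matches[offset:offset + limit]

# Made-up sample data, for illustration only.
ENTRIES = [
    StatusEntry("alpha", "A" * 40, datetime(2008, 1, 1, 12), frozenset({"Running", "Exit"})),
    StatusEntry("beta", "B" * 40, datetime(2009, 6, 1), frozenset({"Running", "Guard"})),
    StatusEntry("gamma", "C" * 40, datetime(2010, 3, 15, 6), frozenset({"Running", "Exit", "Guard"})),
    StatusEntry("delta", "D" * 40, datetime(2012, 1, 1), frozenset({"Exit"})),
]

# (Exit flag OR nickname "beta") AND (seen during 2008 OR during 2010)
QUERY = all_of(
    any_of(has_flag("Exit"), nickname_is("beta")),
    in_ranges([
        (datetime(2008, 1, 1), datetime(2009, 1, 1)),
        (datetime(2010, 1, 1), datetime(2011, 1, 1)),
    ]),
)

if __name__ == "__main__":
    for entry in search(ENTRIES, QUERY):
        print(entry.nickname, entry.valid_after.isoformat())
```

Because filters are plain composable predicates here, "refining" a result set is the same operation as the initial search: wrap the existing query in another all_of(...) with the extra criterion. A real backend would of course push such filters into database queries rather than scan a list.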
The goals and design of the project still have to be clarified, however. There is ongoing discussion (see, e.g., another tor-dev thread [7]) about whether the focus could perhaps be on creating a backend which would speak the full Onionoo protocol and therefore be a potential replacement not only for relay search and exoneraTor, but also for other components: all applications that currently speak Onionoo could be made to talk to the new backend, for example. The overall count of components will hopefully be reduced in any case, but ideally, we would end up with a much more integrated Tor Metrics (and maybe beyond) ecosystem.
Many questions remain open, however; see [7] again. Discussion is very welcome indeed!
I'm wfn on OFTC (#tor-dev, #nottor), also reachable via XMPP <phistopheles@jabber.org>, and am very much up for any kind of chat. :) I'll be busy with exams in the first three weeks of June, though - but will find time for sure!
Regards
Kostas.
[1] https://www.torproject.org/getinvolved/volunteer#metricsSearch
[2] https://metrics.torproject.org/
[3] http://kostas.mkj.lt/gsoc2013.txt
[4] https://metrics.torproject.org/tools.html
[5] https://onionoo.torproject.org/
[6] https://metrics.torproject.org/data.html#relaydesc
[7] https://lists.torproject.org/pipermail/tor-dev/2013-May/004940.html