Hello Karsten and everyone else :)

(TL;DR: would like to work on the searchable Tor descriptor archive project idea; considering drafting up a GSoC application)

I'm a student & backend+frontend programmer from Lithuania who'd be very much interested in contributing to the Tor project via Google Summer of Code. (Well, ideally at least; the plan is to volunteer some time to Tor in any case, but that has yet to happen, and GSoC is simply too awesome an opportunity not to try.)

The 'searchable Tor descriptor/metrics archive' project idea [1] would, I think, best fit my previous experience and general interests in terms of contributing to the Tor project. The project idea itself has a rather clear list of goals / generic constraints, and since I haven't contributed any code to the Tor project before, working with an existing general project idea (building a more concrete design proposal on top of it) probably makes the most sense.

This particular project, I think, would match my previous Python backend programming experience: building backends to work with large datasets / databases -- crafting efficient ORMs and responsive APIs to interact with them. [2]

Applying the knowledge/skills learned to something which is ideologically close to my heart, and the purpose of which is very obvious to me, sounds thrilling! (This year, as far as Python frameworks are concerned, I've mostly been working with Flask, and I have some (limited) experience with Django from before that. For a proof of concept of the searchable archive, I'm considering trying some things out with Flask, since it allows me to do some quick prototyping.)

I'd like to try and work out an implementation/design draft for what I could / would like to do. (This is a preliminary email - I know I'm a bit late!) Ideally it (and a simple proof-of-concept search form -> browseable/clickable results / relay descriptor navigation page) would serve as the base for my GSoC application, but I have to be realistic about being rather late to apply and not having participated in either Tor or GSoC before. I'd like to work out an application draft if possible, though. (Were I to get accepted, I would not need to do any part-time work this summer, or would at most need to take passive care of a couple of already running backends.)

I've been reading through the Tor Metrics portal pages (esp. Data Formats), and am trying to get acquainted with the existing archiving solution: I'm reading the 'metrics-web' Java source (under metrics-web/src/org/torproject/ernie/web) to see how the descriptor etc. archives are currently parsed / imported into Postgres and so on. The aim, first and foremost, is to be able to evaluate the scope of what I'd like to write.

I will presently work on a more specific list of constraints for the searchable archive project idea. I can then try producing a GSoC application draft.

Just to get an idea of what kind of system I'd be building / working on - at the very least, we'd be looking into:

Hopefully Karsten can help me with the application, assuming my idea for the project makes sense. :)

I will follow up with more details. Besides my email address <kostas@jakeliunas.com>, I can be reached on #tor-dev as 'wfn', or XMPP/Jabber via <phistopheles@jabber.org>.

Cheers to you all
Kostas.


[1] https://www.torproject.org/getinvolved/volunteer.html.en#metricsSearch

[2] My largest Python backend project was this winter: building a Redis (product likes/dislikes) + MySQL (everything else) product recommendation solution to work with a large dataset of product metainfo (such as user votes on products), and creating APIs on top of Flask (one for the frontend CMS and one for a mobile app) which include custom per-user product recommendation feeds, etc. Large data (nothing close to the Tor descriptor/metrics archives, though!) + interactive application logic architectures are of interest to me.