Hi,
I am Praveen Kumar from India. I would like to work on the project "Searchable Tor descriptor and Metrics data archive". I have taken part in past instances of GSoC with Melange and e-cidadania, and I have extensive experience developing in Python.
For the search application, I propose using Django with MongoDB as a NoSQL database backend. We have more than 100 GB of data, and it grows every day, so a NoSQL backend should help the application scale as both the data volume and the user traffic increase.
The application would consist of the following components:
1) Data Updater: This component will periodically connect to the Metrics website and retrieve data via rsync. It will also be responsible for pre-processing the data into the format the search application needs.
2) Storage End: A relay descriptor can be searched by nickname, fingerprint, IP address and the various other attributes that define it. We can therefore preprocess all of the data, extract the attributes that define each descriptor and store them in a suitable MongoDB document model. If we index the fields that users search on, lookups should stay fast even as the archive grows.
3) Search Front End: This is the user-facing part, where users enter their search queries.
4) Search Query Processor: This component will process a user's query and determine its type, for example whether the query is an IP address or a nickname. It will then query the Storage End and return the matching data to the Search Front End.
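To make components 2) and 4) more concrete, here is a minimal sketch under a few assumptions. It reads only the "router" and "fingerprint" lines of a plain-text server descriptor. It uses an in-memory list of dicts as a stand-in for the MongoDB collection. It classifies a query as an IP address, a 40-hex-digit fingerprint or a nickname of 1 to 19 alphanumeric characters. The function names are hypothetical and are not part of any existing code.

```python
import ipaddress
import re

# Hypothetical helpers for illustration only; a real parser would read many more fields.
FINGERPRINT_RE = re.compile(r"^\$?[0-9A-Fa-f]{40}$")  # optional "$" prefix, 40 hex digits
NICKNAME_RE = re.compile(r"^[A-Za-z0-9]{1,19}$")      # assumed nickname format


def parse_server_descriptor(text):
    """Extract nickname, address and fingerprint from one server descriptor."""
    doc = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "router" and len(parts) >= 3:
            # router <nickname> <address> <ORPort> <SOCKSPort> <DirPort>
            doc["nickname"] = parts[1]
            doc["address"] = parts[2]
        elif parts[0] == "fingerprint":
            # Printed as ten groups of four hex digits; store as one upper-case string.
            doc["fingerprint"] = "".join(parts[1:]).upper()
    return doc


def classify_query(query):
    """Guess whether a query is an IP address, a fingerprint or a nickname."""
    q = query.strip()
    try:
        ipaddress.ip_address(q)  # accepts both IPv4 and IPv6
        return "address", q
    except ValueError:
        pass
    compact = q.replace(" ", "")
    if FINGERPRINT_RE.match(compact):
        return "fingerprint", compact.lstrip("$").upper()
    if NICKNAME_RE.match(q):
        return "nickname", q
    return "unknown", q


def search(documents, query):
    """Return the stored documents whose classified field matches the query."""
    field, value = classify_query(query)
    if field == "unknown":
        return []
    if field == "nickname":
        # Compare nicknames case-insensitively.
        return [d for d in documents if d.get("nickname", "").lower() == value.lower()]
    return [d for d in documents if d.get(field) == value]
```

For example, after parsing a descriptor whose router line is "router ExampleRelay 192.0.2.10 9001 0 0" (illustrative values), searching for "examplerelay", "192.0.2.10" or the spaced fingerprint would all return that document. With pymongo, the exact-match cases would roughly become a `collection.find({field: value})` call on an indexed field.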
Above is a very high-level view of my approach to this project. After researching existing search frameworks, I think we could also use Django Haystack as the search framework. I can implement the application in an object-oriented way in Python. Because Python is a clean and readable language, it should be easy for others to understand the application and make changes to it quickly.
I would like to know whether I am thinking in the right direction, and I would especially like to hear what Karsten thinks about this.