Hi Karsten,
I am so sorry for the late reply. I had a seminar presentation on Friday and have another
on Monday, so I have been a little busy preparing for them.
I downloaded about 1 GB of server descriptors from the metrics website. I wanted to
generate some performance metrics for a search application in Django, once with a MySQL
backend and once with a MongoDB backend, so I implemented two basic apps, one for each.
I processed each server descriptor file and extracted router_nickname, router_ip,
tor_version and platform_os as searchable fields. At the time of writing this email, I have
processed around 330,000 files for MySQL and have the data from 670,000 files in MongoDB.
I cannot process all the files, as that 1 GB of data is composed of millions of files and
processing is slow on my system.
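A minimal sketch of the extraction step, assuming each file holds one descriptor with a
"router <nickname> <address> ..." line and a platform line of the usual
"Tor <version> on <OS>" shape. The function name extract_fields and the sample values are
made up for illustration; this is not the exact code in the two apps.

```python
def extract_fields(descriptor_text):
    """Pull the four searchable fields out of one server descriptor."""
    fields = {}
    for line in descriptor_text.splitlines():
        line = line.strip()
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "router" and len(parts) >= 3:
            # "router <nickname> <address> <ORPort> <SOCKSPort> <DirPort>"
            fields["router_nickname"] = parts[1]
            fields["router_ip"] = parts[2]
        elif parts[0] == "platform":
            # Assumed shape: "platform Tor <version> on <OS>".
            rest = line[len("platform"):].strip()
            version_part, sep, os_part = rest.partition(" on ")
            fields["tor_version"] = version_part.replace("Tor", "", 1).strip()
            fields["platform_os"] = os_part.strip() if sep else ""
    return fields


# Hypothetical descriptor fragment (documentation IP range, invented nickname).
sample = """router ExampleRelay 192.0.2.1 9001 0 9030
platform Tor 0.2.3.25 on Linux
"""
fields = extract_fields(sample)
```

Platform lines that do not follow the assumed shape would leave platform_os empty, so real
data probably needs a fallback.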
My aim is to issue the same queries to both apps and see which one performs better. Both
databases are indexed on the same fields. I will send you the metrics the day after
tomorrow, i.e. on Tuesday.
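The comparison could be driven by a small timing harness along these lines. This is only a
sketch assuming Python 3; time_query and compare_backends are hypothetical helper names, and
the callables in the example stand in for the real MySQL and MongoDB lookups.

```python
import time


def time_query(run_query, repeats=5):
    """Run a zero-argument callable several times; return the best wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best


def compare_backends(queries, repeats=5):
    """queries maps a label to {"mysql": callable, "mongodb": callable}."""
    results = {}
    for label, backends in queries.items():
        results[label] = {
            name: time_query(fn, repeats) for name, fn in backends.items()
        }
    return results


# Placeholder callables; the real ones would run the same search on each backend.
results = compare_backends(
    {
        "by_nickname": {
            "mysql": lambda: sum(range(1000)),
            "mongodb": lambda: sum(range(1000)),
        }
    },
    repeats=2,
)
```

Since Django querysets are lazy, each real callable should force evaluation (for example by
wrapping the queryset in list()), otherwise only query construction is timed.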
Theoretically speaking, though, MongoDB should be fast because every document is stored in a
JSON-like format (BSON), it is schemaless, and it does not have to perform any joins. Its
indexes are based on B-trees, which have a worst-case time complexity of O(log n) for
insertion, lookup and deletion. MongoDB also keeps the indexes in RAM as required, for
faster searches and fewer disk reads, and it is designed to scale out efficiently.
I am now somewhat in favor of Django Haystack with Solr as the search engine. Using MongoDB
would require us to spend considerable time developing a search interface that can handle
complicated queries, and then creating appropriate indices to support those queries.