A couple of thoughts to add from my experience working with DB's:
-Consider a partitioning strategy for the data if it makes sense. Queries which will hit only a single partition will be much faster in general. -Check out advanced indexing strategies like Solr. this can sometimes speed up certain queries by orders of magnitude, and works well with batch systems. Often it is easier to setup & maintain than advanced DB tuning. -I second Karsten's comment on multi-column indexes. Designing the right indexes & columns to include isn't simple though - you have to know what is going to be searched. -If you know in advance what kind's of where clauses you will see, consider implementing partial indexes.
If you have specific problem queries w/ explain plans, feel free to post them and I would be glad to spend a few minutes looking them over.
Charlie
On Tue, Jul 30, 2013 at 3:35 AM, Karsten Loesing karsten@torproject.orgwrote:
On 7/29/13 10:15 PM, Kostas Jakeliunas wrote:
It should also be possible to do efficient *estimated* COUNTs (using reltuples [1, 2], provided the DB can be regularly VACUUMed + ANALYZEd (postgres-specific awesomeness)) - i.e. if everything is set up right, doing COUNTs would be efficient. This would be nice not only because one could run very quick queries asking e.g. "how many consensuses include nickname LIKE %moo% between [daterange1, daterange2]?" (if e.g. full text search is set up) but also, if we have to resort to sometimes returning
an
arbitrary subset of results (or sorted however we wish, but the sorting being done already on a small subset of results, if that makes sense),
we'd
be able to also supply info how many other results matching these particular criteria there are, and so on. The usefulness of all this
really
depends on intended use cases, and I suppose here some discussion could
be
had who / how would an Onionoo system covering all / most of all the descriptor+consensus archives and hopefully having an extended set of filter / result options be used?
I can see how estimated counts could be valuable information. Or not. Do you want to first specify what type of queries you're planning to support?
(I didn't spot anything in the other two mails that requires a reply. If there are still open questions, please let me know.)
Best, Karsten
tor-dev mailing list tor-dev@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev