On Tue, Jun 10, 2014 at 10:38 AM, Karsten Loesing karsten@torproject.org wrote:
On 10/06/14 05:41, Damian Johnson wrote:
let me make one remark about optimizing Postgres defaults: I have written quite a few database queries in the past, and some of them perform horribly (relay search) whereas others perform really well (ExoneraTor). I believe that the majority of performance gains can be achieved by designing good tables, indexes, and queries. Only as a last resort should we consider optimizing the Postgres defaults.
You realize that a searchable descriptor archive focuses much more on database optimization than the ExoneraTor rewrite from Java to Python (which would leave the database untouched)?
Are other datastore models such as Splunk or MongoDB useful? [Splunk ships a free but proprietary, feature-limited binary... and proprietary tools have a history of woes and takebacks; it's mentioned here just as an example.]
Earlier I mentioned the idea of Dynamo. Unless I'm mistaken, this lends itself pretty naturally to addresses as the hash key and descriptor dates as the range key. Lookups would then be O(log(n)), where n is the total number of descriptors an address has published (... that is to say, very, very quick).
This would be a fun project to give Boto a try. *sigh*... there really should be more hours in the day...
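If anyone wants to give it a spin, here's a minimal sketch of that hash/range layout. It uses boto3 rather than the classic boto mentioned above, and the table and attribute names are made up for illustration:

    import boto3
    from boto3.dynamodb.conditions import Key

    # Hypothetical table: the address is the hash (partition) key, the
    # descriptor publication time is the range (sort) key.
    dynamodb = boto3.resource('dynamodb')
    table = dynamodb.create_table(
        TableName='descriptors',
        KeySchema=[
            {'AttributeName': 'address', 'KeyType': 'HASH'},
            {'AttributeName': 'published', 'KeyType': 'RANGE'},
        ],
        AttributeDefinitions=[
            {'AttributeName': 'address', 'AttributeType': 'S'},
            {'AttributeName': 'published', 'AttributeType': 'S'},
        ],
        ProvisionedThroughput={'ReadCapacityUnits': 5, 'WriteCapacityUnits': 5},
    )
    table.wait_until_exists()

    # A lookup fetches everything one address published in a date window;
    # DynamoDB answers it from the sorted range key, hence the O(log(n)).
    resp = table.query(
        KeyConditionExpression=Key('address').eq('192.0.2.1')
        & Key('published').between('2014-01-01', '2014-06-30')
    )
    for item in resp['Items']:
        print(item['published'])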
Quoting my reply to a similar question from Damian earlier in the thread:
I'm wary about moving to another database, especially NoSQL ones and/or cloud-based ones. They don't magically make things faster, and Postgres is something I understand quite well by now. [...] Not saying that DynamoDB can't be the better choice, but switching the database is not a priority for me.
If somebody wants to give, say, MongoDB a try, I'd be interested in seeing the performance comparison to the current Postgres schema. When you do, please consider all three search_* functions that the current schema offers, including searches for other IPv4 addresses in the same /24 and other IPv6 addresses in the same /48.
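For anyone attempting that comparison: the /24 and /48 searches are easy to forget, since they're more than a plain key lookup. Here's a minimal sketch of the containment test they boil down to, assuming a hypothetical table with an inet-typed address column (not the actual ExoneraTor schema) and using psycopg2:

    import psycopg2

    conn = psycopg2.connect('dbname=exonerator')
    cur = conn.cursor()

    # Other IPv4 addresses in the same /24 as the given address;
    # for the IPv6 case, use a mask length of 48 instead of 24.
    cur.execute("""
        SELECT DISTINCT address
          FROM statusentry
         WHERE address << network(set_masklen(%s::inet, 24))
    """, ('192.0.2.1',))
    print(cur.fetchall())

Whatever replaces Postgres would need an equally cheap way to answer these prefix queries.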
Personally, the only NoSQL thing I've come across (and have had some really good experiences with in the past) was Redis, which is a kind of in-memory key-value store with some nice, simple data structures (like sets, and operations on sets). So if you can reduce your problem to, e.g., sets and set operations, Redis might be a good fit.
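For example, here's a minimal sketch of what "reduce your problem to sets" can look like with the redis-py client (the key names are made up):

    import redis

    r = redis.StrictRedis()

    # Hypothetical keys: one set of relay fingerprints per consensus date.
    r.sadd('relays:2014-06-09', 'FP1', 'FP2')
    r.sadd('relays:2014-06-10', 'FP2', 'FP3')

    # Set operations run server-side: relays seen on both days...
    print(r.sinter('relays:2014-06-09', 'relays:2014-06-10'))
    # ...and relays that vanished between the two consensuses.
    print(r.sdiff('relays:2014-06-09', 'relays:2014-06-10'))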
(I think that isis is actually experimenting with Redis right now, to do prop 226-bridgedb-database-improvements.txt)
If the things that you store in Redis can't be made to fit into memory, you'll probably have a bad time.
So to generalize: if some relational data that needs to be searchable can be made to fit into memory ("we can guarantee it won't exceed x GB [for t time]"), offloading that part onto some key-value (or some more elaborate) system *might* make sense.
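One crude way to check such a guarantee against a running instance (the budget number here is made up):

    import redis

    r = redis.StrictRedis()

    # Hypothetical budget: stop offloading once Redis nears 4 GB.
    BUDGET_BYTES = 4 * 1024 ** 3
    used = int(r.info('memory')['used_memory'])
    if used > BUDGET_BYTES:
        raise RuntimeError('dataset exceeds the %d-byte budget' % BUDGET_BYTES)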
Also, I mixed up the link in footnote [2]. It should have linked to this diagnostic Postgres query:
https://github.com/wfn/torsearch/blob/master/misc/list_indexes_in_memory.sql
--
regards Kostas