On 04/04/14 21:24, Lukas Erlacher wrote:
Hello everyone (reply all ftw),
Hi Lukas,
On 04/04/2014 07:13 PM, Karsten Loesing wrote:
Christian, Lukas, everyone,
I learned today that we should have something working in a week or two. That's why I started hacking on this today and produced some code:
https://github.com/kloesing/challenger
Here are a few things I could use help with:
- Anybody want to help turning this script into a web app,
possibly using Flask? See the first next step in README.md.
I might be able to do that, but currently I don't have enough free time to make a commitment.
Okay. Maybe I'll give it a try by stealing heavily from Sathya's Compass code. Unless somebody else wants to give this a try?
- Lukas, you announced OnionPy on tor-dev@ the other day. Want to
look into the "Add local cache for ..." bullet points under "Next steps"? Is this something OnionPy could support? Want to write the glue code?
onion-py already supports transparent caching using memcached. I use a (hopefully) unique serialisation of the query as the key (see serializer functions here: https://github.com/duk3luk3/onion-py/blob/master/onion_py/manager.py#L7) and have a bit of spaghetti code to check for available cached data and the 304 response status from onionoo (https://github.com/duk3luk3/onion-py/blob/master/onion_py/manager.py#L97).
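The cache-check-plus-304 pattern described here can be sketched roughly as follows. This is a hedged illustration, not onion-py's actual API: the `cached_fetch` function, the shape of the cache, and the `fetch(path, if_modified_since)` signature are all made up for the example.

```python
def cached_fetch(cache, path, fetch):
    """Return a document for `path`, using `cache` (a dict from path to
    (last_modified, document)) and asking the server to skip the body
    when nothing changed since our last fetch.

    `fetch(path, if_modified_since)` is assumed to return a
    (status, last_modified, document) tuple; on a 304 the document is
    None and we reuse the cached copy instead."""
    cached = cache.get(path)
    if_modified_since = cached[0] if cached else None
    status, last_modified, document = fetch(path, if_modified_since)
    if status == 304 and cached:
        return cached[1]  # server says our cached copy is still current
    cache[path] = (last_modified, document)
    return document
```

The real code also has to handle error statuses and cache misses racing with server updates; this only shows the happy path.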
On second thought, and after sleeping over this, I'm less convinced that we should use an external library for the caching. We should rather start with a simple dict in memory and flush it based on some simple rules. That would allow us to tweak the caching specifically for our use case. And it would mean avoiding a dependency.
We can think about moving to onion-py at a later point. That gives you the opportunity to unspaghettize your code, and once that is done we'll have a better idea what caching needs we have for the challenger tool to decide whether to move to onion-py or not.
Would you still want to help write the simple caching code for challenger?
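For concreteness, a minimal sketch of what such a dict-based cache might look like, assuming we key on the request path and flush entries with a single rule (older than `max_age` seconds). The class name, the TTL value, and the injectable clock are all illustrative choices, nothing that has been decided:

```python
import time

class SimpleCache(object):
    """Tiny in-memory cache: a dict from request path to
    (fetch_time, document).  One flush rule: entries older than
    max_age seconds are treated as stale and dropped."""

    def __init__(self, max_age=1800, clock=time.time):
        self.max_age = max_age
        self.clock = clock  # injectable so tests can fake the time
        self.entries = {}

    def get(self, path):
        entry = self.entries.get(path)
        if entry is None:
            return None
        fetched, document = entry
        if self.clock() - fetched > self.max_age:
            del self.entries[path]  # flush the stale entry
            return None
        return document

    def put(self, path, document):
        self.entries[path] = (self.clock(), document)
```

Because the rules live in one small class, tweaking them for our use case (say, different TTLs per document type) stays a local change, and there is no external dependency to install on the server.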
I don't really understand what the code does. What is meant by "combining" documents? What exactly are we trying to measure? Once I know that and have thought of a sensible way to integrate it into onion-py, I'm confident I can in fact write that glue code :)
Right now, the script sums up all graphs contained in Onionoo's bandwidth, clients, uptime, and weights documents. It also limits the range of the new graphs to run from max(first) to max(last) of the given input graphs.
For example, assume we want to know the total bandwidth provided by the following 2 relays participating in the relay challenge:
datetime:    0   1   2   3   4   5  ...
relay 1:        [5,  4,  5,  6]
relay 2:    [4,  3,  5,  4]
combined:       [8,  9,  9,  6]
This is not perfect for various reasons, but it's the best I came up with yesterday. Also, as we all know, perfect is the enemy of good.
(If you're curious, reason #1: the graph goes down at the end, and we can't say whether it's because relay 2 disappeared or did not report data yet; reason #2: we're weighting both relays' B/s equally, though relay 1 might have been online 24/7 and relay 2 only long enough that Onionoo doesn't put in null; there may be more reasons.)
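Under the assumptions above, the combination rule can be sketched like this: each relay's graph is a (first, values) pair with one data point per interval, the combined graph runs from max(first) to max(last), and overlapping values are summed. The function name `combine_graphs` and the input representation are mine for illustration, not taken from the script:

```python
def combine_graphs(graphs):
    """Sum per-relay graphs into one combined graph.

    Each input graph is a (first, values) pair, where `first` is the
    index of the first interval and `values` holds one data point per
    interval.  The combined graph covers max(first) .. max(last): it
    only starts once all relays report data, but keeps running until
    the last relay stops reporting.
    """
    start = max(first for first, values in graphs)
    end = max(first + len(values) - 1 for first, values in graphs)
    combined = []
    for t in range(start, end + 1):
        total = 0
        for first, values in graphs:
            if first <= t < first + len(values):
                total += values[t - first]
        combined.append(total)
    return start, combined
```

With the two example relays (relay 2 starting one interval before relay 1), `combine_graphs([(1, [5, 4, 5, 6]), (0, [4, 3, 5, 4])])` yields `(1, [8, 9, 9, 6])`, reproducing the example, including the drop at the end where relay 2's data runs out.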
Cutting off the rest of the quote tree here (is that a polite thing to do on mailing lists? Sorry if not.), I just have two more comments towards Roger's thoughts:
- Groups of relays taking the challenge together could just form
relay families and we could count relay families in aggregate. (I'm already thinking about relay families a lot because gamambel wants me to overhaul the torservers exit-funding scripts to use relay families.)
Relay families are a difficult topic. I remember spending a day or two figuring out how to group by family in Compass a while back. There must be some notes or thoughts on Trac if you're curious.
Regarding these graphs, I'm not sure what we would gain from grouping new relays by family. My current plan is to provide only graphs that have a single graph line for all relays and bridges participating in the challenge. So, "total bytes read", "total bytes written", "total number of new relays and bridges", "total consensus weight fraction added", "total advertised bandwidth added", etc. I don't think we should add categories by family or any other criteria. KISS.
- If you want to do something with consensus weight, why
not compare against all other new relays based on the first_seen property? ("new" can be adjusted until sufficiently pretty graphs emerge; and we'd need to periodically (every 4 or 12 or 24 hours?) fetch the consensus_weights from onionoo)
I'm not sure what you mean. We do have consensus weight fractions in (combined) weights documents. I'm also planning to add absolute consensus weights to those documents in the future.
By "fetching something periodically from Onionoo", do you mean keeping a local state other than the latest cached Onionoo documents? I'm explicitly trying to avoid that. Keeping a state means you need to back it up and restore it, and most importantly, fix it whenever there's a bug. I'm already feeling that pain with Onionoo, so I'd want to keep all state in Onionoo and not make the new tool any more complex than required.
PS: If you'd like me to support different backends for the caching in onion-py, I'm open to integrating anything that has a python 3 library.
See above. Happy to discuss caching more when we know what caching needs we have.
I'm also not sure about Python 3. Whatever we write needs to run on Debian Wheezy with whatever libraries are present there. If they're all Python 3, great. If not, can't do.
Thanks for your feedback!
All the best, Karsten