OK, bringing back to tor-dev...
On 7/5/12 1:44 PM, Damian Johnson wrote:
Hi Norman.
(Taking this off tor-dev@ for the moment until we get things straightened out...)
Actually, this would have been an interesting discussion for the list. Feel free to add it back in.
The TorExport functionality seems pretty straightforward, but the stem-interesting parts (new descriptor parsers) already seem to be on Ravi's punch-list.
My understanding is that the csv work is to add export (and maybe import) functionality to the Descriptor class. This would provide those capabilities to the current ServerDescriptor and ExtraInfoDescriptor classes in addition to anything we add in the future, like consensus-based entities. If it really isn't possible to do in the Descriptor base class then it would be an abstract method that's implemented by Descriptor subclasses as we go along.
OK, there's the first confusion on my part; I thought the export functionality was to be something like utility scripts rather than built into stem itself.
So is export intended to be an instance method of descriptor, one that just dumps a single csv line of the instance attributes (maybe subject to some selection of those attributes)? Or a static method that takes a collection?
It seems like it might be awkward to have to hack stem itself to add a new export format (for example). Is this a concern?
Actually, this makes me wonder a bit about what exactly stem is. It seems like it is:
* stem the Tor Control Protocol interface,
and
* stem the (relay, network status,...) descriptor utility library.
It seems that the former depends on the latter (for stem to provide a list of cached descriptor objects, it needs a descriptor utility library that defines those objects), but not the reverse (the utilities don't much care where the descriptors come from). This isn't completely correct, since the descriptor utilities might provide APIs for parsing various sources of descriptors (Tor Control messages, cached-consensus, metrics), but making the descriptor utility library one module among many in stem makes the two seem more intertwined than they actually are. Of course, you've been thinking about this a lot longer than I have. Do all the known use-cases need both an interface to Tor Control and a descriptor utility library? I guess I'm not quite sure what the design philosophy for stem is.
Onionoo is always a possibility, though we'd probably need a bit more guidance on which part to work on (front end, back end, both?). But regardless, it still seems like it depends on various parsers that Ravi is working on.
For an Onionoo project I would be the main mentor for stem based parts, and Karsten would mentor the work on Onionoo itself since he wrote the java version (though I'll still do the code reviews). First step would be to start a thread with both of us (and tor-dev@) to figure out milestones.
FYI: at the Tor dev meeting Sathyanarayanan (gsathya.ceg@gmail.com) also expressed interest in taking on this project, though with his summer internship I'm not sure how much time he'll have to help.
Do I understand Onionoo correctly to be basically a small webservice that returns a JSON formatted description of data read from a file based on the HTTP request parameters, along with a program that presumably runs with some frequency to create that file? It seems that at least porting the webservice side to a Django webapp might be a reasonable project for the rest of our summer.
However, the little bit of software development that I've learned does make me want to ask: Why a Python port of this component?
- Norman
On Fri, Jul 6, 2012 at 4:25 AM, Norman Danner ndanner@wesleyan.edu wrote:
Do I understand Onionoo correctly to be basically a small webservice that returns a JSON formatted description of data read from a file based on the HTTP request parameters, along with a program that presumably runs with some frequency to create that file?
Yes, that pretty much describes it.
It seems that at least porting the webservice side to a Django webapp might be a reasonable project for the rest of our summer.
Sounds great!
Would it be possible for the Django webapp to offer the same protocol (as in, GET requests) as the current Java servlet?
However, the little bit of software development that I've learned does make me want to ask: Why a Python port of this component?
The main reason is that there are far more potential developers who can maintain and extend the Python version of Onionoo than the Java version of it.
Best, Karsten
On 7/6/12 4:10 AM, Karsten Loesing wrote:
On Fri, Jul 6, 2012 at 4:25 AM, Norman Danner ndanner@wesleyan.edu wrote:
Do I understand Onionoo correctly to be basically a small webservice that returns a JSON formatted description of data read from a file based on the HTTP request parameters, along with a program that presumably runs with some frequency to create that file?
Yes, that pretty much describes it.
It seems that at least porting the webservice side to a Django webapp might be a reasonable project for the rest of our summer.
Sounds great!
OK; Megan and Erik, after you incorporate the export function into Descriptor in stem, please start reading through the Django tutorial.
We'll start working out milestones on Monday (I'm away until Monday morning, but I'll probably have occasional e-mail access); Sathyanarayanan, you should probably chime in too.
Would it be possible for the Django webapp to offer the same protocol (as in, GET requests) as the current Java servlet?
From what I remember of Django, I don't think this will be a problem.
- Norman
OK; Megan and Erik, after you incorporate the export function into Descriptor in stem, please start reading through the Django tutorial.
I'm not sure if Django is a good choice for this project. We don't require such a heavy web framework with a templating engine, auth, etc. I'd rather use Tornado or Cyclone+Twisted. Since tor2web and APAF use Twisted, it would make sense to have Onionoo use the same thing.
I've been hacking on this since morning and I have a very simple Cyclone-based prototype that parses only the summary documents and provides an API for them ("/summary"). I'll clean it up and put it on GitHub in a bit.
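Roughly along these lines, just as an illustration of the shape of it rather than the actual prototype (this sketch uses plain Tornado instead of Cyclone, and the handler name and summary-file path are made up):

import tornado.ioloop
import tornado.web

SUMMARY_PATH = "out/summary.json"  # hypothetical location of the pre-generated summary document

class SummaryHandler(tornado.web.RequestHandler):
    def get(self):
        # Serve the summary document that the periodic updater wrote to disk.
        with open(SUMMARY_PATH) as summary_file:
            summary = summary_file.read()
        self.set_header("Content-Type", "application/json")
        self.write(summary)

application = tornado.web.Application([
    (r"/summary", SummaryHandler),
])

if __name__ == "__main__":
    application.listen(8080)
    tornado.ioloop.IOLoop.instance().start()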
We'll start working out milestones on Monday (I'm away until Monday morning, but I'll probably have occasional e-mail access); Sathyanarayanan, you should probably chime in too.
Sounds good.
Yes, I was wondering whether there would be something simpler than Django after I wrote that message.
Megan and Erik: take a look through the websites for Django, Tornado, and Cyclone/Twisted to get a sense as to what each does.
- Norman
On 7/6/12 10:34 AM, Sathyanarayanan Gunasekaran wrote:
OK; Megan and Erik, after you incorporate the export function into Descriptor in stem, please start reading through the Django tutorial.
I'm not sure if Django is a good choice for this project. We don't require such a heavy web framework with a templating engine, auth, etc. I'd rather use Tornado or Cyclone+Twisted. Since tor2web and APAF use Twisted, it would make sense to have Onionoo use the same thing.
I've been hacking on this since morning and I have a very simple Cyclone-based prototype that parses only the summary documents and provides an API for them ("/summary"). I'll clean it up and put it on GitHub in a bit.
We'll start working out milestones on Monday (I'm away until Monday morning, but I'll probably have occasional e-mail access); Sathyanarayanan, you should probably chime in too.
Sounds good.
So is export intended to be an instance method of descriptor, one that just dumps a single csv line of the instance attributes (maybe subject to some selection of those attributes)? Or a static method that takes a collection?
Either would work fine. I was envisioning the former, though on reflection stem/descriptor/export.py module would probably be better since that localizes this functionality and allows for better expansion in the future (other formats such as json, or the inclusion of import functionality).
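To make that concrete, something along these lines is what I have in mind, though the name, signature and field handling are just a sketch rather than a settled API (it assumes the descriptor's parsed attributes live in its __dict__):

import csv
import io

def export_csv(descriptor, include_fields=None, exclude_fields=None):
    # Pick the attributes to dump; by default use everything the descriptor has.
    attributes = vars(descriptor)
    fields = list(include_fields) if include_fields else sorted(attributes.keys())
    if exclude_fields:
        fields = [field for field in fields if field not in exclude_fields]

    # Emit a single csv row for this descriptor.
    output = io.StringIO()
    writer = csv.writer(output)
    writer.writerow([attributes.get(field, "") for field in fields])
    return output.getvalue()

A json variant would then just be another small function in the same module.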
It seems like it might be awkward to have to hack stem itself to add a new export format (for example). Is this a concern?
That depends on how useful users would find it to be. If researchers commonly want csv export functionality then we might as well support it. However, if it's a rarely desired feature then there's little reason to clutter our API. My understanding is that this feature is mostly for researchers and sysadmins, so as part of the target audience I'm happy to defer to you on how we handle this.
Do all the known use-cases need both an interface to Tor Control and a descriptor utility library?
No, you're completely right. Stem's controller functionality utilizes its descriptor functionality but not vice versa. Another design that we could go with is to make several smaller libraries (descriptors, controller, response parsing, shared utilities, etc) if stem grows unwieldy. However, we're nowhere near that yet and keeping stem as a single library makes development, testing, installation and usage far easier.
Stem is a library to make working with Tor easier for developers and researchers, with the current scope of the Tor control and dir specs. My plan is to complete that, release it to the community, then see based on feedback where we should go from there.
Hello,
Megan and I have been working on the CSV export functionality that was being discussed a little over a week ago, and given the recent discussion, we would like to clarify the expected/desired implementation of this feature.
We have created an export.py module within /stem/descriptor, which for now contains a single method that takes a descriptor object and two optional lists of fields: either the descriptor attributes to explicitly include, or the attributes to exclude. As we continue to work on this code, Megan and I were wondering if it wouldn't be better to accept a file object as well, in addition to accepting any number of descriptor objects (i.e. def csv_exp(..., *descriptors)). Or are there other suggestions or requests concerning what sort of input such a method should take?
-Erik & Megan
On Fri, Jul 6, 2012 at 1:49 PM, Damian Johnson atagar@torproject.org wrote:
So is export intended to be an instance method of descriptor, one that just dumps a single csv line of the instance attributes (maybe subject to some selection of those attributes)? Or a static method that takes a collection?
Either would work fine. I was envisioning the former, though on reflection stem/descriptor/export.py module would probably be better since that localizes this functionality and allows for better expansion in the future (other formats such as json, or the inclusion of import functionality).
It seems like it might be awkward to have to hack stem itself to add a new export format (for example). Is this a concern?
That depends on how useful users would find it to be. If researchers commonly want csv export functionality then we might as well support it. However, if it's a rarely desired feature then there's little reason to clutter our API. My understanding is that this feature is mostly for researchers and sysadmins, so as part of the target audience I'm happy to defer to you on how we handle this.
Do all the known use-cases need both an interface to Tor Control and a descriptor utility library?
No, you're completely right. Stem's controller functionality utilizes its descriptor functionality but not vice versa. Another design that we could go with is to make several smaller libraries (descriptors, controller, response parsing, shared utilities, etc) if stem grows unwieldy. However, we're nowhere near that yet and keeping stem as a single library makes development, testing, installation and usage far easier.
Stem is a library to make working with Tor easier for developers and researchers, with the current scope of the Tor control and dir specs. My plan is to complete that, release it to the community, then see based on feedback where we should go from there.
On Mon, Jul 9, 2012 at 8:40 AM, Erik I Islo eislo@wesleyan.edu wrote:
Hello,
Megan and I have been working on the CSV export functionality that was being discussed a little over a week ago, and given the recent discussion, we would like to clarify the expected/desired implementation of this feature.
We have created an export.py module within /stem/descriptor, which for now contains a single method that takes a descriptor object and two optional lists of fields: either the descriptor attributes to explicitly include, or the attributes to exclude. As we continue to work on this code, Megan and I were wondering if it wouldn't be better to accept a file object as well, in addition to accepting any number of descriptor objects (i.e. def csv_exp(..., *descriptors)). Or are there other suggestions or requests concerning what sort of input such a method should take?
-Erik & Megan
On Fri, Jul 6, 2012 at 1:49 PM, Damian Johnson atagar@torproject.org wrote:
So is export intended to be an instance method of descriptor, one that just dumps a single csv line of the instance attributes (maybe subject to some selection of those attributes)? Or a static method that takes a collection?
Either would work fine. I was envisioning the former, though on reflection stem/descriptor/export.py module would probably be better since that localizes this functionality and allows for better expansion in the future (other formats such as json, or the inclusion of import functionality).
It seems like it might be awkward to have to hack stem itself to add a new export format (for example). Is this a concern?
That depends on how useful users would find it to be. If researchers commonly want csv export functionality then we might as well support it. However, if it's a rarely desired feature then there's little reason to clutter our API. My understanding is that this feature is mostly for researchers and sysadmins, so as part of the target audience I'm happy to defer to you on how we handle this.
Do all the known use-cases need both an interface to Tor Control and a descriptor utility library?
No, you're completely right. Stem's controller functionality utilizes its descriptor functionality but not vice versa. Another design that we could go with is to make several smaller libraries (descriptors, controller, response parsing, shared utilities, etc) if stem grows unwieldy. However, we're nowhere near that yet and keeping stem as a single library makes development, testing, installation and usage far easier.
Stem is a library to make working with Tor easier for developers and researchers, with the current scope of the Tor control and dir specs. My plan is to complete that, release it to the community, then see based on feedback where we should go from there.
Naif: This was your feature request. Thoughts?
Megan and I were wondering if it wouldn't be better to accept a file object as well, in addition to accepting any number of descriptor objects (i.e. def csv_exp(..., *descriptors)).
If we can make it work then that would be nice, though having a *list entry generally doesn't work well for optional keyword fields. Ie, if you had the signature...
def csv_exp(include_fields = None, exclude_fields = None, destination = None, *descriptors)
Then the caller needs to provide all of those keyword fields which kinda defeats the purpose of them being optional. For instance, to call it with the defaults and a single descriptor it would be...
csv_exp(None, None, None, my_descriptor)
My suggestion is to just accept a single argument that can either be a single descriptor or a list of descriptors.
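The normalization for that is tiny, something like (just a sketch)...

def _to_list(descriptors):
    # Accept either a single descriptor or a list/tuple of them and
    # always hand back a list for the rest of the function to iterate over.
    if isinstance(descriptors, (list, tuple)):
        return list(descriptors)
    else:
        return [descriptors]

That way csv_exp(my_descriptor) and csv_exp([desc1, desc2]) both work, and the keyword arguments stay optional.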
Cheers! -Damian
Hi Damian,
After looking at possible use cases, wouldn't it make sense to allow the caller to specify a file to be written to? Regardless, we were thinking of creating two methods, one that takes a list of descriptors, and one that takes a single descriptor. This would remove the need to check for a list versus an object, allowing more consistent typing.
Just to clarify, the include_fields and exclude_fields parameters would have default values of None, and since we are taking in descriptors as a list rather than a *arg, we don't need to worry about specifying the keyword parameters. That said, if a caller doesn't specify either, all attributes would be included. Otherwise, it is expected that only one of these parameters would be specified by the caller.
Also, going back to features expected by the community, would users want a csv header to be written? Or simply a csv file?
- Erik & Megan
On Mon, Jul 9, 2012 at 1:22 PM, Damian Johnson atagar@torproject.org wrote:
On Mon, Jul 9, 2012 at 8:40 AM, Erik I Islo eislo@wesleyan.edu wrote:
Hello,
Megan and I have been working on the CSV export functionality that was being discussed a little over a week ago, and given the recent discussion, we would like to clarify the expected/desired implementation of this feature.
We have created an export.py module within /stem/descriptor, which for now contains a single method that takes a descriptor object and two optional lists of fields: either the descriptor attributes to explicitly include, or the attributes to exclude. As we continue to work on this code, Megan and I were wondering if it wouldn't be better to accept a file object as well, in addition to accepting any number of descriptor objects (i.e. def csv_exp(..., *descriptors)). Or are there other suggestions or requests concerning what sort of input such a method should take?
-Erik & Megan
On Fri, Jul 6, 2012 at 1:49 PM, Damian Johnson atagar@torproject.org wrote:
So is export intended to be an instance method of descriptor, one that just dumps a single csv line of the instance attributes (maybe subject to some selection of those attributes)? Or a static method that takes a collection?
Either would work fine. I was envisioning the former, though on reflection stem/descriptor/export.py module would probably be better since that localizes this functionality and allows for better expansion in the future (other formats such as json, or the inclusion of import functionality).
It seems like it might be awkward to have to hack stem itself to add a new export format (for example). Is this a concern?
That depends on how useful users would find it to be. If researchers commonly want csv export functionality then we might as well support it. However, if it's a rarely desired feature then there's little reason to clutter our API. My understanding is that this feature is mostly for researchers and sysadmins, so as part of the target audience I'm happy to defer to you on how we handle this.
Do all the known use-cases need both an interface to Tor Control and a descriptor utility library?
No, you're completely right. Stem's controller functionality utilizes its descriptor functionality but not vice versa. Another design that we could go with is to make several smaller libraries (descriptors, controller, response parsing, shared utilities, etc) if stem grows unwieldy. However, we're nowhere near that yet and keeping stem as a single library makes development, testing, installation and usage far easier.
Stem is a library to make working with Tor easier for developers and researchers, with the current scope of the Tor control and dir specs. My plan is to complete that, release it to the community, then see based on feedback where we should go from there.
Naif: This was your feature request. Thoughts?
Megan and I were wondering if it wouldn't be better to accept a file object as well, in addition to accepting any number of descriptor objects (i.e. def csv_exp(..., *descriptors)).
If we can make it work then that would be nice, though having a *list entry generally doesn't work well for optional keyword fields. Ie, if you had the signature...
def csv_exp(include_fields = None, exclude_fields = None, destination = None, *descriptors)
Then the caller needs to provide all of those keyword fields which kinda defeats the purpose of them being optional. For instance, to call it with the defaults and a single descriptor it would be...
csv_exp(None, None, None, my_descriptor)
My suggestion is to just accept a single argument that can either be a single descriptor or a list of descriptors.
Cheers! -Damian
After looking at possible use cases, wouldn't it make sense to allow the caller to specify a file to be written to?
Makes sense, though via a convenience method. Libraries should provide basic building blocks (such as 'give me the csv string for these descriptors') in addition to less flexible but user-friendly functions ('write the csv to path X or file Y').
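For instance (the names and signatures here are just illustrative, not what you have to use)...

import csv
import io

def export_csv(descriptors, fields):
    # Building block: 'give me the csv string for these descriptors'.
    output = io.StringIO()
    writer = csv.writer(output)
    for descriptor in descriptors:
        writer.writerow([getattr(descriptor, field, "") for field in fields])
    return output.getvalue()

def export_csv_file(destination, descriptors, fields):
    # Convenience wrapper: 'write the csv to path X or file Y'.
    csv_output = export_csv(descriptors, fields)
    if isinstance(destination, str):
        with open(destination, "w") as destination_file:
            destination_file.write(csv_output)
    else:
        destination.write(csv_output)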
Regardless, we were thinking of creating two methods, one that takes a list of descriptors, and one that takes a single descriptor.
The problem was using a *descriptors argument, not accepting a list versus a single descriptor. Accepting either a single value or a list can make things quite a bit nicer, for instance the 'target' argument of the DescriptorReader... https://gitweb.torproject.org/stem.git/blob/HEAD:/stem/descriptor/reader.py#...
That said, I haven't seen the code yet so do what you think is best.
Just to clarify, the include_fields and exclude_fields parameters would have default values of None
I was just using that as an example since I didn't know what you were defaulting it to. My assumption was that they'd both default to None to indicate "user didn't provide anything" and the behavior was...
include_fields - default is to include everything (ie, all the fields that a descriptor has)
exclude_fields - default is to exclude nothing
... and since we are taking in descriptors as a list rather than a *arg, we don't need to worry about specifying the keyword parameters.
Sorry, I'm coming up with two interpretations of the sentence "we don't need to worry about specifying the keyword parameters". If you mean...
... "we don't need those parameters to have a keyword" then no, we definitely want them to have keywords so users can pick and choose what they want to set.
... "users don't need to supply those parameters" then yup, without a *descriptors argument they'll be completely optional.
That said, if a caller doesn't specify either, all attributes would be included.
Yup, sounds good. Defaulting to "give me a csv with all of the descriptor's attributes" makes sense.
Otherwise, it is expected that only one of these parameters would be specified by the caller.
It would be weird if the user set both, but we can easily handle it. Just remove anything in the exclude_fields from the include_fields.
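Ie, something along the lines of...

if include_fields and exclude_fields:
    include_fields = [field for field in include_fields if field not in exclude_fields]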
Also, going back to features expected by the community, would users want a csv header to be written? Or simply a csv file?
Yup. Users will need a header so they can figure out what the fields are (otherwise adding new descriptor fields will break all of the old csvs that were only based on position). However, we might as well accept a 'header' boolean argument to let them turn it off if they want.
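Here's a rough sketch of how the toggle could look with the stdlib csv module (the 'header' argument and the field handling are just for illustration)...

import csv
import io

def export_csv(descriptors, fields, header=True):
    output = io.StringIO()
    writer = csv.DictWriter(output, fieldnames=fields)
    if header:
        # The first row names the fields, so the csv stays parseable even
        # after we add new descriptor attributes later on.
        writer.writeheader()
    for descriptor in descriptors:
        writer.writerow({field: getattr(descriptor, field, "") for field in fields})
    return output.getvalue()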
Cheers! -Damian