-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
Erring on the side of "release early, release often" I have put my Twisted-based (asynchronous, Python) Tor control protocol implementation online:
http://readthedocs.org/docs/txtorcon/en/latest/ https://github.com/meejah/txtorcon
It is MIT licensed (to match Twisted). I would certainly not consider it "done", and I made it to learn more about Twisted and Python -- criticisms, comments appreciated.
Currently it has the following features (see the above-linked documentation for more, and examples):
. TorControlProtocol implements the control protocol
. TorState tracks the state of Tor (streams, circuits, routers, address-map), listening for updates
. TorConfig provides read/write configuration access, with HS abstraction (still needs some work)
. IStreamAttacher, a stream-to-circuit attacher interface for new streams
. launch_tor can launch slave Tor processes
. integrates into Twisted's endpoints with TCPHiddenServiceEndpoint
The main code is about 1600 LOC, ~4000 with tests and 25% comments (according to ohcount). There is currently 98% test coverage, if one believes code-coverage is a good metric.
In the short term, be aware that I'm planning to re-organize where things live in files. If you "import txtorcon" and use the classes like "txtorcon.TorConfig" it will all still work.
Thanks for your attention, mike
I have tagged txtorcon 0.2, which adds:
. incremental parsing;
. faster TorState startup;
. SAFECOOKIE support;
. several bug fixes;
. options to circuit_failure_rates.py example to make it actually-useful;
. include built documentation + sources in tarball;
. include tests in tarball;
. improved logging;
. a few patches from mmaker and kneufeld
(If this type of mail isn't appropriate for tor-dev please let me know...)
-- mike
(If this type of mail isn't appropriate for tor-dev please let me know...)
It's perfectly appropriate - glad to hear about the improvements!
On a side note, do you think that any txtorcon/stem work would be appropriate? They're both aiming to be a library that does largely the same things. The twisted/threading differences mean that our controller classes are incompatible, but other bits of the parsing and such should be interchangeable. For instance, I've invested an immense amount of effort into parsing (and tests) for descriptor content... https://gitweb.torproject.org/stem.git/blob/HEAD:/stem/descriptor/server_des... https://gitweb.torproject.org/stem.git/blob/HEAD:/stem/descriptor/extrainfo_...
so that things like "GETINFO desc/*" will provide usefully parsed information. We could probably also share connection and authentication code.
Cheers! -Damian
On 6/1/12 8:45 PM, Damian Johnson wrote:
(If this type of mail isn't appropriate for tor-dev please let me know...)
It's perfectly appropriate - glad to hear about the improvements!
On a side note, do you think that any txtorcon/stem work would be appropriate? They're both aiming to be a library that does largely the same things. The twisted/threading differences mean that our controller classes are incompatible, but other bits of the parsing and such should be interchangeable. For instance, I've invested an immense amount of effort into parsing (and tests) for descriptor content... https://gitweb.torproject.org/stem.git/blob/HEAD:/stem/descriptor/server_des... https://gitweb.torproject.org/stem.git/blob/HEAD:/stem/descriptor/extrainfo_...
so that things like "GETINFO desc/*" will provide usefully parsed information. We could probably also share connection and authentication code.
Yes please, try to make the sub-libraries as small and generic as possible.
It seems to me that within the Tor community, for everything that's not the "tor core C code", there's a lot of duplicated code and abandoned projects.
Additionally you might consider rewriting, based on Stem, the Onionoo software that is now written in heavy Java.
Based on your parsers it should probably take a few hours to provide Atlas's REST interface.
-naif
It seems to me that within the Tor community, for everything that's not the "tor core C code", there's a lot of duplicated code and abandoned projects.
We do have quite a few abandoned projects, though the core codebase certainly isn't the only one that's actively maintained... https://www.torproject.org/getinvolved/volunteer.html.en#Projects
Stem aims to be a replacement for TorCtl which has been collecting dust for years due to maintainability problems (long story short, its codebase lacks tests and its maintainers are fearful of any substantial changes to it).
Additionally you might consider rewriting, based on Stem, the Onionoo software that is now written in heavy Java.
Based on your parsers it should probably take a few hours to provide Atlas's REST interface.
That's actually already our plan. Stem will eventually replace metrics-lib (a Java library for descriptor parsing), and be used for a Python Onionoo. This has been discussed on occasion in the 'Python metrics-lib' thread... https://lists.torproject.org/pipermail/tor-dev/2012-May/003504.html
Cheers! -Damian
Damian Johnson atagar@torproject.org writes:
(If this type of mail isn't appropriate for tor-dev please let me know...)
On a side note, do you think that any txtorcon/stem work would be appropriate? They're both aiming to be a library that does largely the same things. The twisted/threading differences mean that our controller classes are incompatible, but other bits of the parsing and such should be interchangeable. For instance, I've invested an immense amount of effort into parsing (and tests) for descriptor content...
I was thinking about this a little last week -- it would certainly be nice to abstract more of the "general parsing stuff". There are a few gotchas since the threaded versus event-based way to get information from the protocol is pretty different. For authentication, for example, SAFECOOKIE is a two-part affair: you have to wait for a response halfway through, which looks quite different in an event-based vs. threaded API.
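To make the "two-part affair" concrete, here is a rough sketch of the SAFECOOKIE computation split into the two steps. The key strings and the cookie | client-nonce | server-nonce ordering are as I recall them from control-spec, so please check them against the spec before relying on this. The SafeCookieAuth class and its method names are hypothetical, not txtorcon or stem API, and the caller is assumed to do all the socket I/O between the two steps.

```python
import hashlib
import hmac
import os

# HMAC keys as recalled from Tor's control-spec (assumption: verify there).
SERVER_KEY = b"Tor safe cookie authentication server-to-controller hash"
CLIENT_KEY = b"Tor safe cookie authentication controller-to-server hash"


def safecookie_hash(key, cookie, client_nonce, server_nonce):
    """HMAC-SHA256 over cookie | client_nonce | server_nonce."""
    message = cookie + client_nonce + server_nonce
    return hmac.new(key, message, hashlib.sha256).digest()


class SafeCookieAuth:
    """Hypothetical two-step helper; it never touches the network itself."""

    def __init__(self, cookie, client_nonce=None):
        self.cookie = cookie
        self.client_nonce = client_nonce or os.urandom(32)

    def challenge_command(self):
        # Step 1: send this, then wait for Tor's AUTHCHALLENGE reply.
        return "AUTHCHALLENGE SAFECOOKIE %s" % self.client_nonce.hex()

    def authenticate_command(self, server_hash, server_nonce):
        # Step 2: can only run once the reply to step 1 has arrived.
        expected = safecookie_hash(
            SERVER_KEY, self.cookie, self.client_nonce, server_nonce)
        if not hmac.compare_digest(expected, server_hash):
            raise ValueError("server hash mismatch; refusing to authenticate")
        client_hash = safecookie_hash(
            CLIENT_KEY, self.cookie, self.client_nonce, server_nonce)
        return "AUTHENTICATE %s" % client_hash.hex()
```

A threaded controller would call the two methods back to back around a blocking read; an event-based one would call the second from whatever callback fires when the AUTHCHALLENGE reply has been parsed.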
I've tried to imagine a threaded-friendly wrapper around at least txtorcon.TorControlProtocol which might not be hard for the simple command-response things (but see below).
Certainly at least the parsing should be able to be shared somehow. Further to naif's email, I would imagine this would be most useful as a "Python utilities for Tor" library. The only thing I can really imagine abstracting from txtorcon is the simple descriptors, like what "getinfo ns/all" returns. Most of the other parsing is pretty protocol-specific, IMO.
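As an illustration of the kind of networking-free parsing that could live in a shared utility library, here is a minimal sketch that reads the "r" and "s" lines of router status entries such as "getinfo ns/all" returns. The RouterStatus tuple and parse_router_statuses function are hypothetical names, only those two line types are handled, and the field order is assumed to be nickname, identity, digest, publication date and time, IP, ORPort, DirPort.

```python
from collections import namedtuple

# Hypothetical container; not a txtorcon or stem class.
RouterStatus = namedtuple(
    "RouterStatus",
    "nickname identity digest published ip or_port dir_port flags")


def parse_router_statuses(text):
    """Parse 'r' and 's' lines; every other line type is ignored."""
    routers = []
    current = None
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "r":
            if len(parts) < 9:
                current = None  # malformed entry: skip it and its 's' line
                continue
            current = RouterStatus(
                nickname=parts[1], identity=parts[2], digest=parts[3],
                published=parts[4] + " " + parts[5],
                ip=parts[6], or_port=int(parts[7]), dir_port=int(parts[8]),
                flags=())
            routers.append(current)
        elif parts[0] == "s" and current is not None:
            # Flags belong to the most recent 'r' line.
            current = current._replace(flags=tuple(parts[1:]))
            routers[-1] = current
    return routers
```

Because the function takes a plain string and returns plain tuples, it does not care whether the text arrived via a Deferred or a blocking socket read.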
The main issue with abstracting more than that in a controller is that at some point there will be a need in the API to wait for something from Tor -- and at that point, you have to make the API event-based or threaded. txtorcon.TorState is so far pretty de-coupled from the underlying networking library. txtorcon.TorConfig is less so. As things like TorState generate callbacks (e.g. stream added, deleted, etc) via listeners, there's also probably a slight issue that these callbacks would need to execute "fast" (i.e. can't wait for disk/net IO) and this would probably be surprising to threaded implementors.
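One way a threaded user could live with the "callbacks must be fast" constraint is to have the listener do nothing but enqueue, and let a worker thread do the slow part. A minimal sketch follows; QueueingListener and its stream_new/stream_closed methods are hypothetical names chosen for illustration and are not claimed to match any real listener interface.

```python
import queue
import threading


class QueueingListener:
    """Hypothetical listener: callbacks only enqueue, a worker does the rest."""

    def __init__(self, handler):
        self._events = queue.Queue()
        self._handler = handler
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    # These return immediately, so the event loop is never held up.
    def stream_new(self, stream_id):
        self._events.put(("new", stream_id))

    def stream_closed(self, stream_id):
        self._events.put(("closed", stream_id))

    def _drain(self):
        while True:
            event = self._events.get()
            if event is None:  # sentinel pushed by close()
                break
            self._handler(*event)  # free to block on disk or network

    def close(self):
        self._events.put(None)
        self._worker.join()
```

The handler sees events in the order they were delivered, but possibly some time after the fact, so it should not assume the stream still exists by the time it runs.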
so that things like "GETINFO desc/*" will provide usefully parsed information. We could probably also share connection and authentication code.
Like I said, the main issue will be "how do I wait for things I need from the protocol"? For example, I can imagine a Twisted / event-based "low-level" TorControlProtocol class being wrapped by a threading-friendly API of some sort (which just pauses the caller thread until Twisted gets back with the answer) with the "nicer" classes layered on top (TorState, etc) which could take either one and hence be implemented in a threaded or event-based fashion, as they like.
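The "pause the caller thread until the event loop gets back with the answer" idea can be sketched with only the standard library. Note the swap: this uses asyncio as a stand-in for Twisted's reactor (with Twisted itself, twisted.internet.threads.blockingCallFromThread plays a similar role). BlockingWrapper, FakeProtocol and get_info are hypothetical names, and the returned value is made up.

```python
import asyncio
import threading


class FakeProtocol:
    """Stand-in for a real event-based control connection."""

    async def get_info(self, key):
        await asyncio.sleep(0.01)  # pretend to wait for Tor's reply
        return {key: "example-value"}


class BlockingWrapper:
    """Hypothetical threaded facade over an event-based protocol object."""

    def __init__(self, protocol):
        self._protocol = protocol
        self._loop = asyncio.new_event_loop()
        # The event loop runs in its own thread, away from the callers.
        self._thread = threading.Thread(
            target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def get_info(self, key, timeout=5.0):
        # Hand the coroutine to the loop thread, then block this thread
        # until the result (or an exception, or the timeout) comes back.
        future = asyncio.run_coroutine_threadsafe(
            self._protocol.get_info(key), self._loop)
        return future.result(timeout)

    def close(self):
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()
```

This covers simple command-response calls only; anything that delivers unsolicited events still needs some listener or queue arrangement on the threaded side.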
I don't really see that this gains a whole bunch, though: then you're depending on Twisted but not using the event-based stuff "outside". One big "pro" for a threaded version like stem that I see is only standard-lib dependencies. Besides, anyone excited about a Twisted dependency probably wants Deferreds returned, not a threaded API... ;)
So, I see a use for a good Python utility + parsing library which stem + txtorcon (+ whatever) could use to do their heavy lifting, and the network/protocol details would be "all" that's in the controller libs.
*Ideally*, such a library could leverage the parsing code in Tor itself -- if at least the "utility" methods in Tor could be published as a shared library, a "ctypes" wrapper could easily be made with a more-Pythonic interface around that. Then, there's only one chunk of "parse descriptors" (for example) code, and it would be used by Tor and the controller, so no chance of being out of sync. Perhaps there are other reasons not to do shared libraries...and I haven't actually looked at these C methods very hard; but routerparse.c has 5200+ lines of code that'd be nice to leverage.
Another thing I think would be really nice is to be able to get grouping and documentation information about config options from Tor, or from the tor-spec file (i.e. by parsing it). This would keep documentation that users see consistent across Tor control protocol clients, and make it easier to more-automatically generate GUIs (i.e. with grouping and maybe ordering information). Anyway, just brainstorming here.
I'm mostly-away until around the 18th, but perhaps we could meet on #tor-dev after that and discuss further? Are there specific things besides descriptors that you think could be easily abstracted out of stem (and/or useful for txtorcon)?
p.s. may I encourage you to consider the way-more-standard 4 spaces for indenting...? I've never seen Python code with 2-space indenting before.
-- mike
I was thinking about this a little last week -- it would certainly be nice to abstract more of the "general parsing stuff".
We plan to put most of the parsing stuff into the stem.response module, which txtorcon could use without touching any of the threading stuff. For example...
https://gitweb.torproject.org/stem.git/blob/HEAD:/stem/response/protocolinfo...
There is not much there at present since I've been focused on other areas. Ravi's GSoC project is to make this and our Controller class cover the complete control spec.
One minor gotcha that I should sort out, if you want to use stem, is for the ControlMessage class to have a simple string constructor since, at present, it expects to get partly parsed content from the socket...
https://gitweb.torproject.org/stem.git/blob/HEAD:/stem/response/__init__.py
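For illustration, parsing a complete reply from a plain string (rather than from a socket) might look roughly like the sketch below. It assumes the usual control-port line shape of a three-digit status code followed by " " (final line), "-" (more lines follow) or "+" (data block terminated by a lone "."). parse_reply is a hypothetical function and is not stem's ControlMessage API.

```python
def parse_reply(raw):
    """Split a raw reply string into (status, divider, content) tuples."""
    lines = raw.split("\r\n")
    parsed = []
    i = 0
    while i < len(lines):
        line = lines[i]
        i += 1
        if not line:
            continue  # trailing empty piece after the final CRLF
        if len(line) < 4 or not line[:3].isdigit() or line[3] not in " -+":
            raise ValueError("malformed reply line: %r" % line)
        status, divider, content = line[:3], line[3], line[4:]
        if divider == "+":
            # Data block: collect lines until the terminating lone ".".
            data = []
            while i < len(lines) and lines[i] != ".":
                entry = lines[i]
                # Undo dot-stuffing on lines that began with ".".
                data.append(entry[1:] if entry.startswith(".") else entry)
                i += 1
            i += 1  # step over the "." terminator
            content = content + "\n" + "\n".join(data)
        parsed.append((status, divider, content))
    return parsed
```

Since the input is just a string, either library could feed it from its own transport, which is the property a shared response module would need.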
There are a few gotchas since the threaded versus event-based way to get information from the protocol is pretty different.
It is, but this just means that we'd have trouble sharing the ControlSocket and BaseController subclasses. That's a fair bit (including the stem.connection module), but still plenty that could be shared.
I've tried to imagine a threaded-friendly wrapper around at least txtorcon.TorControlProtocol which might not be hard for the simple command-response things (but see below).
Great. If we could come up with a common interface for simply sending a message and receiving its reply then the stem.connection module would be no trouble.
*Ideally*, such a library could leverage the parsing code in Tor itself -- if at least the "utility" methods in Tor could be published as a shared library, a "ctypes" wrapper could easily be made with a more-Pythonic interface around that.
Eek! I'd hate to have the distribution pains that any C component would bring. If Tor does not conform to its spec and allow for independent implementations in other languages then that's a Tor bug, and testing that Tor conforms to its own spec is part of the goals for the integ tests that I'm writing.
Another thing I think would be really nice is to be able to get grouping and documentation information about config options from Tor
Do you mean if a particular config option belongs to Relays, Hidden Services, etc? If so then this might interest you...
https://gitweb.torproject.org/arm.git/blob/HEAD:/src/resources/torConfigDesc...
Arm parses the tor man page when it starts up into files like these so it can present the category of configuration options, and help text. Here's the parsing code...
https://gitweb.torproject.org/arm.git/blob/HEAD:/src/util/torConfig.py#l97
I plan to clean it up and move those capabilities to stem, but that'll be a while.
I'm mostly-away until around the 18th, but perhaps we could meet on #tor-dev after that and discuss further?
Certainly.
Are there specific things besides descriptors that you think could be easily abstracted out of stem (and/or useful for txtorcon)?
Pretty much anything outside of stem.socket, stem.control, and maybe stem.connection (depending on the common interface discussion above).
Cheers! -Damian