Hey,
I favor GSoC projects that are collections of small items rather than
a single big one (that way the incremental pieces are helpful, even
if the project doesn't quite fit into the summer).
We're on the same page here; I think more tasks of a smaller scope are the way to go.
The next area I was planning to primarily focus on was client migrations,
which would involve work with TorBEL and Tor Weather...
I've looked through both TorBEL and Weather and while undoubtedly they could use some love, I think I'll follow my heart and stick with Stem :)
Please don't take my vagueness about concrete ideas as a lack of interest on my side. It's that the more I learn about Stem (and I try to dedicate most of my free time to it), the better I can assess its current needs and gaps and come up with something substantial.
As far as tutorials go, I was thinking about covering manual stream attachment (as a follow-up to #8728) and exploring more operations on circuits, e.g. tearing them down or forcing hops through specified countries or address ranges (maybe a more robust example as an excuse to come up with good use cases for REDIRECTSTREAM and TRUNCATE).
As for Stem itself, I'd love to work on adding any missing functionality (against the Tor protocol or expanding the lib itself), but first I'd like to hear your suggestions on where the real priorities and missing spots are.
Thanks, Tom
As far as tutorials go, I was thinking about covering manual stream attachment (as a follow-up to #8728) and exploring more operations on circuits, e.g. tearing them down or forcing hops through specified countries or address ranges (maybe a more robust example as an excuse to come up with good use cases for REDIRECTSTREAM and TRUNCATE).
Great! Actually I was just thinking about this yesterday too. Indeed, a couple things that would be great to have are...
* A truncate_circuit() method. From what I can tell TRUNCATE isn't an operation (just a REMOTE_REASON for CIRC events), so this might require expanding the tor controller interface.
* The ability for new_circuit() and extend_circuit() to handle paths where tor picks some of the hops. For instance...
circ_id = controller.new_circuit([None, None, 'amunet1'])
... to construct a circuit where tor picks the first two hops normally but we exit through amunet1. There are a couple ways of doing this. The right way is to add this capability to tor, and the hacky way is to construct a normal circuit, truncate the exit, then replace it.
If you're willing to get your hands dirty with tor code as well as stem then two other things that have been high on my wish list are...
https://trac.torproject.org/7953 https://trac.torproject.org/8323
Though that said while these would certainly make for a meaty proposal a lot of this work would be C rather than Python. I'd understand if that makes this less than appealing - there's plenty of other options too.
Some stem specific ideas are...
* Navigation of our API documentation presently sucks a bit. Two ideas for things that could help it are...
https://trac.torproject.org/7632 https://trac.torproject.org/8780
* Controllers often iterate over descriptors but it's presently horribly inefficient. The trouble is that we read the full controller response into memory which can be hundreds of kilobytes and parse it before providing the first reply. Unfortunately fixing this will be tricky, requiring some additional capabilities in the stem.socket module.
https://trac.torproject.org/8248
* As you've found, the process for attaching a stream to a circuit is presently pretty clunky. Maybe the controller should have a method that takes two arguments: a circuit id and a functor which will generate a stream. It then makes the stream, attaches it to the circuit, and returns to the caller when it's done?
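To make that last idea concrete, here is a rough sketch of what such a helper could look like. It is only an illustration under a few assumptions: attach_next_stream and make_stream are made-up names, the controller argument is expected to behave like stem's Controller (add_event_listener, set_conf, attach_stream and so on), and the 'STREAM' string stands in for stem.control.EventType.STREAM.

```python
def attach_next_stream(controller, circuit_id, make_stream):
    """Run make_stream() while attaching any new stream to circuit_id.

    Hypothetical helper: 'controller' is assumed to behave like stem's
    Controller, and 'make_stream' is any callable that opens a connection
    through tor (for instance an HTTP request over the SOCKS port).
    """

    def on_stream(event):
        # Tor announces streams that still need a circuit with a NEW status.
        if event.status == 'NEW':
            controller.attach_stream(event.id, circuit_id)

    controller.add_event_listener(on_stream, 'STREAM')

    # Ask tor to leave new streams alone so that we can place them ourselves.
    controller.set_conf('__LeaveStreamsUnattached', '1')

    try:
        return make_stream()
    finally:
        # Always hand stream attachment back to tor, even if the functor fails.
        controller.remove_event_listener(on_stream)
        controller.reset_conf('__LeaveStreamsUnattached')
```

Note that while the functor runs, every new stream (not just the one it creates) ends up on the given circuit, so a real implementation would probably want to filter by target address.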
Besides that I'm really having trouble coming up with meaty ideas that solely involve working on stem. Most ideas that I'm coming up with are either limited by tor's control protocol (so would require work in C) or would involve a new project.
One that Karsten warned me away from suggesting for GSoC but I'd love to see happen is Pyonionoo, a python rewrite of our Onionoo service. That project would involve a fair bit of stem expansion...
https://trac.torproject.org/8252
All that said, maybe a fresh pair of eyes is best here. Do you have any ideas, maybe some expansion to stem's site, capabilities, or tests that you think would make a good project?
Cheers! -Damian
Hi Tomasz. I gave this some more thought throughout the day and here's a few other tasks that would be nicely suited for GSoC. Together with some tutorial and test expansions this should make for a pretty full summer.
Honestly expanding the tor controller interface (and the corresponding stem changes to take advantage of it) would be the most impactful for stem users, but these would still be nice capabilities to have...
* Remote descriptor fetching [1]. This would allow stem applications to query descriptors without piggybacking on a tor instance (a very handy capability to have).
* Support for parsing the tor man page. This is a capability that arm has [2], allowing it to show information about tor's configuration options in the editor panel. We'd want to snag arm's code, clean it up, then add lots 'o tests.
* Support for connection resolution [3]. This is another capability that arm has but stem lacks [4].
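As a very rough illustration of the remote fetching idea, the sketch below downloads the current consensus straight from a relay's directory port. The function names are made up for this example, and it assumes the relay at the given address serves directory requests over plain HTTP at the /tor/status-vote/current/consensus path from tor's directory protocol. It does no validation or parsing.

```python
import urllib.request


def consensus_url(address, dir_port):
    """Build the URL of the current consensus on a relay's DirPort."""
    return 'http://%s:%i/tor/status-vote/current/consensus' % (address, dir_port)


def fetch_consensus(address, dir_port, timeout=30):
    """Download the consensus as text without going through a local tor."""
    url = consensus_url(address, dir_port)

    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode('utf-8')
```

A caller would pass the address and DirPort of a directory authority or mirror it already knows about; choosing that source (and falling back to another when it fails) is the part a real implementation would need to design.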
Cheers! -Damian
[1] https://trac.torproject.org/8257
[2] https://gitweb.torproject.org/arm.git/blob/HEAD:/src/util/torConfig.py
[3] https://trac.torproject.org/7910
[4] https://gitweb.torproject.org/arm.git/blob/HEAD:/src/util/connections.py
Hey Damian,
(I hope I got replying to digest right this time around).
Thanks for all the feedback and assistance so far, it's an immense help.
The right way is to add this capability to tor, and the hacky way is to construct a normal circuit, truncate the exit, then replace it.
The closest thing possible at the moment, I imagine, is:
circ1 = new_circuit( [] )
... getting nodes on the path ...
circ2 = new_circuit( [circ1_node_1, circ1_node_2, desired_exit_node] )
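Spelled out a bit more, that workaround might look like the sketch below. The helper name circuit_with_exit is made up; the controller is assumed to behave like stem's Controller, with new_circuit() accepting a path and await_build, get_circuit() returning an object whose path is a list of (fingerprint, nickname) pairs, and close_circuit() tearing the throwaway circuit down.

```python
def circuit_with_exit(controller, exit_fingerprint):
    """Build a circuit whose last hop is exit_fingerprint.

    Hypothetical helper: tor picks a full path first, then we reuse all of
    its hops except the last one in a second, explicitly specified circuit.
    """

    # Let tor build a circuit using its normal path selection.
    template_id = controller.new_circuit(await_build=True)

    # Each path entry is assumed to be a (fingerprint, nickname) pair.
    hops = [fingerprint for fingerprint, _nickname in controller.get_circuit(template_id).path]

    # The template circuit was only needed for its path.
    controller.close_circuit(template_id)

    # Keep tor's choices for the earlier hops and swap in our exit.
    return controller.new_circuit(hops[:-1] + [exit_fingerprint], await_build=True)
```

This costs an extra circuit build, and the hops tor picked were chosen with a different exit in mind, so it is a stopgap rather than a replacement for proper support in tor.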
- Navigation of our API documentation presently sucks a bit. Two ideas
for things that could help it are...
https://trac.torproject.org/7632 https://trac.torproject.org/8780
Actually #7632 caught my eye from the very beginning as an interesting project, but the issue holding me back from mentioning it is that it's not just a matter of mere implementation, but also researching the right solution, and only then assessing e.g. how much time it would take. That being said, I'm really interested in this one and if I gather more intel on the matter in the following days I'll definitely get back to it.
#8780 sounds like fun. I thought about alternatives since I am not the biggest fan of horizontal drop-down submenus, but I couldn't find a better way to get a quick overview of a module's insides, so I'm planning on including this one in my application.
Next comes Controller.assign_upcoming_stream() (working name). Skipping this functionality just seems like a missed opportunity, especially since I'm already in the process of getting some hands-on experience with it.
Both #8248 and #8257 require some more research on my part and I'm looking forward to taking them on.
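For #8248, my current understanding of the goal is to hand out each descriptor as soon as its lines have arrived instead of after the whole reply has been buffered and parsed. A toy sketch of that idea with a plain generator follows. It is not stem's parser: iter_router_entries is a made-up name, and it only assumes that the input is an iterable of lines in which every router status entry starts with an "r " line.

```python
def iter_router_entries(lines):
    """Lazily group network status lines into per-router entries.

    'lines' can be any iterable (a file, a socket wrapper, a generator), so
    only one entry's worth of content is held in memory at a time.
    """

    entry = []

    for line in lines:
        starts_entry = line.startswith('r ')

        # A new 'r ' line means the previous entry is complete.
        if starts_entry and entry:
            yield entry
            entry = []

        # Lines before the first 'r ' line are skipped.
        if starts_entry or entry:
            entry.append(line.rstrip('\n'))

    if entry:
        yield entry
```

The hard part for stem is presumably not this grouping but getting the socket layer to expose the reply incrementally in the first place.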
As for ports from arm, I'm really excited about the torrc parser because I never really got around to writing a proper config parser and finally there's a good reason for it.
connections.py looks interesting, doing *nix specific tasks is nice for a change, since I'm dealing with win32api much more often.
Coupled with tutorial ideas how is the above shaping for a project proposal?
Do you think there's enough to make it a busy summer?
Regards, Tom
connections.py looks interesting, doing *nix specific tasks is nice for a change, since I'm dealing with win32api much more often.
I just finished attending Linux Fest Northwest during which Lee and I talked quite a bit about IronPython for stem. Considering that you have some Windows experience, maybe a tutorial for making stem work with IronPython and/or Visual Studio would be a good fit? I'm adding Lee to the cc in case he has some thoughts.
Speaking of connections.py, if you improved it during the port to stem to also support Windows' netstat then that would greatly help in porting arm to Windows...
https://trac.torproject.org/projects/tor/wiki/doc/arm#Windows
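For a sense of what a netstat based resolver on Windows involves, here's a small sketch that picks a process's TCP connections out of 'netstat -ano' output. The function name and the sample text are made up for illustration, and it assumes the usual five column layout for TCP rows (protocol, local address, foreign address, state, pid).

```python
SAMPLE_OUTPUT = """
Active Connections

  Proto  Local Address          Foreign Address        State           PID
  TCP    127.0.0.1:9051         0.0.0.0:0              LISTENING       1234
  TCP    192.0.2.10:49200       198.51.100.7:9001      ESTABLISHED     1234
  TCP    192.0.2.10:49201       198.51.100.8:443       ESTABLISHED     4321
  UDP    0.0.0.0:123            *:*                                    1234
"""


def parse_netstat(output, pid):
    """Return (local_addr, local_port, remote_addr, remote_port, state) tuples."""
    connections = []

    for line in output.splitlines():
        fields = line.split()

        # Skip headers, UDP rows (they have no state column) and other pids.
        if len(fields) != 5 or fields[0] != 'TCP' or fields[4] != str(pid):
            continue

        # Split on the last colon so that bracketed IPv6 addresses survive.
        local_addr, local_port = fields[1].rsplit(':', 1)
        remote_addr, remote_port = fields[2].rsplit(':', 1)

        connections.append((local_addr, int(local_port), remote_addr, int(remote_port), fields[3]))

    return connections
```

In practice the output would come from running netstat in a subprocess, and the pid would be tor's.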
Coupled with tutorial ideas how is the above shaping for a project proposal? Do you think there's enough to make it a busy summer?
It would certainly make for a fine proposal. I *think* it'll keep you pretty busy throughout the summer (development rarely finishes early after all, especially when testing is involved). Any thoughts on what kind of tutorials you would like to add?
Cheers! -Damian
Speaking of connections.py, if you improved it during the port to stem to also support Windows' netstat then that would greatly help in porting arm to Windows...
https://trac.torproject.org/projects/tor/wiki/doc/arm#Windows
Oh. Speaking of connections.py improvements, that module presently doesn't work when 'DisableDebuggerAttachment' is enabled within tor (which is the default). Jake and I discussed a gross hack to work around this but I never implemented it...
https://trac.torproject.org/3313
Making that module work despite the setting would greatly benefit arm users.
Cheers! -Damian
Hey,
talked quite a bit about IronPython for stem. Considering that you have some Windows experience, maybe a tutorial for making stem work with IronPython and/or Visual Studio would be a good fit? I'm adding Lee to the cc in case he has some thoughts.
Interesting idea, after some light tinkering today I managed to run it with stem onboard, I'm unsure how much patching this will need (or rather how much can be done), but I'm on it.
After giving it some thought I've come up with a polished and hopefully even more appealing version of GSoC project.
The task list includes:
* connections.py (#7910, porting from arm). Since I will be digging deep into it anyway, throwing netstat.exe and a workaround for the anti-debugging patch (#3313) into the mix would make sense and makes it quite a meaty task.
* Migrating config utils (#8251, porting from arm)
* Extending Stem's module overview by additional horizontal submenus (#8780, the nesting gets a bit wilder but it's doable)
* Fetching descriptors independently (#8257)
* Optimizing desc queries (#8248)
* Making stem work with IronPython (there are some things in stem.process and stem.util.system that require attention from what I can tell so far, and there will probably be more to it.)
* Controller.assign_next_stream(). Pretty self-explanatory: there's a bit of back and forth needed to make use of __LeaveStreamsUnattached. It's not horrible, but we can do better.
Tutorials. Timewise I'm listing them as the last thing, mainly because there's the least uncertainty involved: by then stem will (hopefully) have undergone some changes and I will (also hopefully) have gotten much better at making the most of the library.
There is room for setting up and managing a relay and for gathering statistics. I definitely want to push the controller's potential further (the more I work on #8728 the more fascinated I am by the possibilities enabled by manual circuit creation and stream attachment. :-) )
Something Windows-specific will be useful to go along with making stem IronPython enabled, there's a constant problem with lack of good curses counterpart on Win, so that would mean reaching for non standard module (I'm aware of pdcurses with python bindings floating around).
That's how, in the main, I envision spending the summer under Tor's flag.
I'd very much like to hear your opinion and any suggestions.
Regards, Tom
Interesting idea, after some light tinkering today I managed to run it with stem onboard, I'm unsure how much patching this will need (or rather how much can be done), but I'm on it.
Thanks! Personally I'm not at all familiar with IronPython - what kind of modifications do CPython apps usually need to run under it?
After giving it some thought I've come up with a polished and hopefully even more appealing version of GSoC project.
This looks great! I'm especially looking forward to #8257 since it'll let us improve our descriptor integ tests. Presently stem tests against our cached descriptors which can take several test runs to be populated (the tests are skipped if they're not around). Once we can fetch descriptors directly we can change the test to check if the *current* consensus is valid or not.
Tutorials. Timewise I'm listing them as the last thing, mainly because there's the least uncertainty involved: by then stem will (hopefully) have undergone some changes and I will (also hopefully) have gotten much better at making the most of the library.
Good idea.
There is room for setting up and managing a relay and for gathering statistics. I definitely want to push the controller's potential further (the more I work on #8728 the more fascinated I am by the possibilities enabled by manual circuit creation and stream attachment. :-) )
I'm not quite sure what you mean about gathering statistics but I'm glad to hear that you want to make the manual circuit functionality better. That's functionality that has quite a few potential users, but likely has a lot of room for improvement within stem.
Something Windows-specific will be useful to go along with making stem IronPython enabled, there's a constant problem with lack of good curses counterpart on Win, so that would mean reaching for non standard module (I'm aware of pdcurses with python bindings floating around).
What are the disadvantages of PDCurses? When I last looked into the various curses alternatives for Windows that looked to be the best option, though that was years ago.
Looking forward to seeing the application in Melange! -Damian