Hello
I'm Ravi Chandra Padmala, a 20-year-old computer programmer from India.
I want to apply to the Tor Project/EFF for Google's Summer of Code program this summer. I would like to spend the summer building the PathSupport counterpart for Stem. I have submitted a work-in-progress proposal at http://www.google-melange.com/gsoc/proposal/review/google/gsoc2012/neena/1
I've submitted a WIP proposal primarily because I need feedback on the direction my API design is taking. I have only answered question (a) from https://www.torproject.org/about/gsoc.html.en#Template . I will be updating the proposal with the rest soon.
Any feedback will be greatly appreciated.
-- neena
I want to apply to the Tor Project/EFF for Google's Summer of Code program this summer. I would like to spend the summer building the PathSupport counterpart for Stem. I have submitted a work-in-progress proposal at http://www.google-melange.com/gsoc/proposal/review/google/gsoc2012/neena/1
I'm sending the contents of my proposal here (with minor modifications), since not everyone has access to that page. Not everyone can comment on it even if I make it public, so, I would prefer it if people with access to that page also commented here.
What project would you like to work on?
============
I would like to work on the PathSupport counterpart for Stem.
Apart from simply writing code, I want to focus on creating a well designed API (during the community bonding period) and on writing comprehensive documentation (very important for a library).
During the community bonding period, I will spend my time preparing a proper API and interacting with the maintainers of projects which use TorCtl and getting their feedback. Their feedback will ensure that the API will be usable. I also will communicate with my mentor about my progress and hopefully, will have an intuitive, easy to use API design ready before the coding period starts.
As suggested, I will be coding iteratively, writing documentation and tests as I write each class/function/module of the project. Implementation implies writing the code, tests and the documentation.
Rough API Draft
---------------
Introducing a new submodule - stem.path
This submodule will contain all PathSupport specific code.
stem.path.PathController
An amalgamation of the PathSupport.PathBuilder and the PathSupport.ConsensusTracker classes.

Justification:
- PathSupport.ConsensusTracker can be merged because TorCtl.PathSupport.ConsensusTracker isn't being used anywhere but as a superclass of TorCtl.PathSupport.PathBuilder. Among pytorctl, arm, torbel and torflow, the only other mention of ConsensusTracker is TorCtl.SQLSupport.ConsensusTrackerListener, which is used by Torflow for storing stats and SQLSupport.
- TorCtl.PathSupport.PathBuilder uses a TorCtl.PathSupport.SelectionManager, a helper class for handling (router) configuration updates. I will merge a part of this into stem.path.PathController too, in particular the configuration update handling code. I will look into moving the NodeGenerator handling functionality and the circuit generation functionality of TorCtl.PathSupport.SelectionManager into stem.path.PathGenerator.
- In the process of merging these three classes together, if stem.path.PathController grows too large or becomes unwieldy, I will make helper classes as necessary.

Details:
- Is a direct subclass of stem.control.BaseController.
- A major change would be to make PathController fully thread-safe instead of an event/queue system. Stem's BaseController class prefers locking and we will use the same approach.
- Handles tracking the NEWCONSENSUS and NEWDESC events just like TorCtl.PathSupport.ConsensusTracker.
- Write integration tests that check circuit construction and the like, where possible.
- Write unit tests to ensure all the individual methods of the class are working as expected.
The following classes will be renamed
TorCtl.PathSupport.CircuitHandler -> stem.path.CircuitController
- CircuitHandler gives the impression that the class 'handles' one Circuit.
- CircuitController is more appropriate since it 'controls' the building of (multiple) Circuits, and it is also what people should expect since it is a subclass of stem.control.BaseController.
TorCtl.PathSupport.StreamHandler -> stem.path.StreamController
- StreamHandler gives the impression that the class 'handles' one Stream.
- StreamController is more appropriate since it 'controls' Stream-Circuit attachment, and it is also what people should expect since it is a subclass of stem.control.BaseController.
- Torflow (OPAddon) has PingHandler, which is a subclass of this. Check how it is subclassing StreamHandler and whether this can be made easier to use. This will be done during the community bonding period.
TorCtl.PathSupport.PathSelector -> stem.path.PathGenerator
- Maintain naming consistency with NodeGenerator.
- Merge parts of TorCtl.PathSupport.SelectionManager into this, in particular the NodeGenerator handling code.
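To make the NodeGenerator/PathGenerator split concrete, here is a minimal sketch of how the two could fit together. Everything in it is an assumption made for illustration: the class bodies, the generate() method name and the use of plain strings as routers are hypothetical, and none of it is the existing TorCtl code or a settled Stem API.

```python
import random


class NodeGenerator:
    """Hypothetical: yields candidate routers for a single hop."""

    def __init__(self, routers, rng=None):
        self._routers = list(routers)
        self._rng = rng or random.Random()

    def generate(self):
        # Yield every router once, in a random order.
        candidates = list(self._routers)
        self._rng.shuffle(candidates)
        for router in candidates:
            yield router


class PathGenerator:
    """Hypothetical: builds a path by drawing one router per hop."""

    def __init__(self, node_generators):
        self._node_generators = list(node_generators)

    def generate(self):
        path = []
        for node_gen in self._node_generators:
            # Take the first candidate that is not already in the path.
            for router in node_gen.generate():
                if router not in path:
                    path.append(router)
                    break
            else:
                # The generator ran dry without a usable candidate.
                raise ValueError("no usable router for hop %d" % len(path))
        return path
```

With one NodeGenerator per hop, PathGenerator only has to combine their output, which is roughly the part of SelectionManager that would move here.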
stem.path.NodeRestriction
stem.path.PathRestriction
Both of these Restriction classes have one restriction checking function, r_is_ok and path_is_ok respectively. These will be renamed to 'check'.
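As a rough illustration of the unified 'check' name, the two base classes could look like the sketch below. The example subclasses (RequiredFlags, NoRepeats) and the dict-based router representation are stand-ins invented for this example. They are not the restrictions that ship with TorCtl.PathSupport.

```python
class NodeRestriction:
    """Base class: decides whether a single router is acceptable."""

    def check(self, router):
        raise NotImplementedError


class PathRestriction:
    """Base class: decides whether a whole path is acceptable."""

    def check(self, path):
        raise NotImplementedError


class RequiredFlags(NodeRestriction):
    """Hypothetical example: the router must carry all the given flags."""

    def __init__(self, required_flags):
        self._required = frozenset(required_flags)

    def check(self, router):
        # 'router' is assumed to be a dict with a 'flags' collection.
        return self._required.issubset(router["flags"])


class NoRepeats(PathRestriction):
    """Hypothetical example: no router may appear twice in the path."""

    def check(self, path):
        fingerprints = [router["fingerprint"] for router in path]
        return len(fingerprints) == len(set(fingerprints))
```

Sharing one method name means calling code can apply a list of restrictions of either kind with the same loop, for example `all(r.check(router) for r in node_restrictions)`.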
Implement the various NodeRestrictions and PathRestrictions available in TorCtl.PathSupport
There's a TODO comment in PathSupport.py (also in torflow/TODO) which mentions other NodeRestrictions and PathRestrictions. If time permits, implement these Restrictions too.
Deliverables
------------
Mid term evaluation:
The following will be implemented as deliverables for the mid term evaluation
- Required base classes, such as the stem.path.Circuit class and the stem.path.Stream class.
- NodeRestrictions and PathRestrictions currently available in TorCtl.PathSupport.
- NodeGenerators and PathGenerators.
- stem.path.PathController (this might be split into 2 or more classes, if the merge doesn't seem like the right thing to do).
Final evaluation:
The following will be implemented as deliverables for the final evaluation

- stem.path.CircuitController
- stem.path.StreamController
The following will be ported to use Stem:
Torflow
Note: Torflow is our main 'testbed' for testing the new API, since it is the only major codebase that uses PathSupport. Depending on how this goes, I might refactor or make changes to the stem.path submodule.
Optional Deliverables:
In case I finish my deliverables with time to spare, I'll spend it doing the following tasks
- Implementing the extra Restrictions mentioned in the TODO
- Porting Arm to use Stem
- Porting TorBEL to use Stem

Note: Neither TorBEL nor Arm uses PathSupport.
Timeline
--------
April 23rd - May 20th Community bonding period
I will spend this time designing the final API for Stem's TorCtl.PathSupport equivalent. I will be communicating with people in the Tor community who have used TorCtl and TorCtl.PathSupport and getting their feedback, thus ensuring that we have a nice API design ready by the beginning of the coding period.
There are some unimplemented parts of the general controller class that are required for the implementation of PathSupport, such as the Router class. Atagar is currently working on this. I will help with implementing these so that they will be ready before the coding period begins.
May 20th - July 9th Coding Period (Pre-mid term evaluation)
During the pre-mid term coding period, I will implement the deliverables for the mid term evaluation as follows:
Week 1 -
Implement required base classes, such as the stem.path.Circuit class and the stem.path.Stream class.
Week 2 -
Implement NodeRestrictions and PathRestrictions currently available in TorCtl.PathSupport.
Week 3 -
Implement NodeGenerators and PathGenerators.
Week 4-7 -
Implement stem.path.PathController. This might be split into 2 or more classes, if the merge doesn't seem like the right thing to do. This is a major chunk of the project, will take a significant amount of time, and might consume a part of week 8.
Note: The individual blocks that PathController is composed of make up about 50% of the existing PathSupport.py code. I will also be writing tests and documentation. This is why I have allocated 3 weeks to this.
Week 8 -
Leaving this week free as buffer time. I have optional deliverables that will be worked on if I have extra time.
July 9th - August 13th (Post-mid term evaluation)
Week 9 -
Implement stem.path.CircuitController and stem.path.StreamController
Week 10 -
Port Torflow to use Stem. This will consume a part of week 11, since Torflow depends on Stem and PathSupport a fair bit. The codebase itself consists of a variety of scripts and utilities and is relatively large (compared to something like TorBEL).
Week 11 -
More buffer time. Depending on the status of my work, I will spend this time doing the following things (in order of importance):
- Finishing stuff
- Write the additional Restrictions
- Begin porting Arm and TorBEL to Stem
August 13th - August 20th (Post-soft pencils down deadline)
Week 12 -
The gsoc page suggests taking a week to scrub code, write tests, improve documentation, etc. Since this is a library, documentation takes priority. So, assuming the code is complete and assuming it has sufficient test coverage I will spend this week writing and improving the documentation. This will involve writing examples and comprehensive documentation.
Point us to a code sample: something good and clean to demonstrate that you know what you're doing, ideally from an existing project.
I have written a few patches for some Tor Project projects: #1667 and #5032, plus two to Stem, #5199 and #5472, which have been committed to the repository.
Why do you want to work with The Tor Project / EFF in particular?
===============
I began reading about The Tor Project about 2 months ago after a friend suggested that I contribute to it. Now, I love the internet, and it is responsible for a large part of who I am. The Tor Project and the EFF work to defend the things that make the internet what it is, i.e. (among other things) free speech.
I can relate with this goal very easily.
Tell us about your experiences in free software development environments. We especially want to hear examples of how you have collaborated with others rather than just working on a project by yourself.
===============
Though I have been using free software for a long time (I switched to Linux about 7 years ago), I haven't made any significant contributions to free software, apart from a few bug reports and minor patches. However, I am familiar with version control software, bug trackers etc. I have used them while submitting the patches mentioned earlier.
Will you be working full-time on the project for the summer, or will you have other commitments too (a second job, classes, etc)? If you won't be available full-time, please explain, and list timing if you know them for other major deadlines (e.g. exams). Having other activities isn't a deal-breaker, but we don't want to be surprised.
===============
I have exams until the 29th of April, so I will be missing a few days of the community bonding period, though I hope to show up on the IRC channels even then, albeit sporadically. I might also have to write an exam in either July or August, though that depends on me flunking. It won't cost me more than 2 days, and I will work extra during the weekends to make up for it.
Will your project need more work and/or maintenance after the summer ends? What are the chances you will stick around and help out with that and other related projects?
===============
Stem, like all libraries implementing an API for a moving target, requires maintenance. I will co-maintain Stem in the future. By the time I'm done with the SoC program, I would've also gained familiarity with other related projects such as Torflow, TorBEL and Arm. I'll be in a position where I can help out with those if there is a need.
Though, personally, I am interested in getting involved in Tor development, and the re-implementation of Thandy (if/when it happens), and that is what I'll do unless any other project requires more attention.
What is your ideal approach to keeping everybody informed of your progress, problems, and questions over the course of the project? Said another way, how much of a "manager" will you need your mentor to be?
===============
IRC is my preferred mode of communication, and I will be using it to ask questions and for help with my problems. If I'm unable to get the answer I want on IRC, I will ask on the mailing list.
I will keep people informed about my progress by sending (probably monthly, or as often as required) reports to the mailing list.
What school are you attending? What year are you, and what's your major/degree/focus? If you're part of a research group, which one?
===============
I'm an undergraduate student majoring in computer science at GITAM University. I'm currently working on my final year project, which involves computer network modelling.
How can we contact you to ask you further questions? Google doesn't share your contact details with us automatically, so you should include that in your application. In addition, what's your IRC nickname? Interacting with us on IRC will help us get to know you, and help you get to know our community.
===============
I'm available via email at neenaoffline@gmail.com. I'm also subscribed to many of the tor-* mailing lists, including tor-dev and tor-talk. My nickname on OFTC is 'neena'. My email account also doubles up as my Jabber account, though, I prefer IRC.
Are you applying to other projects for GSoC and, if so, what would be your preference if you're accepted to both? Having a stated preference helps with the deduplication process and will not impact if we accept your application or not.
===============
I am not applying to any other projects for GSoC.
Hi Ravi. This is a nice first draft and please keep in mind that I'm pretty green with PathSupport (I've never used it myself) so feel free to push back on any suggestions.
The high level approach that you seem to be taking is to copy PathSupport into stem, then refactor and test it. Is that right? If so then a few questions...
* Did you get Mike's permission for that? TorCtl is under the BSD license (I think) and stem is LGPLv3.
* Is this the design that we want? PathSupport is modeled as a narrow object hierarchy built upon TorCtl.EventHandler. We have the opportunity to make any API we want so, as a user, what would you find to be the most intuitive?
My suggestion for starting tasks would be to...
1. Write a simple script to use PathSupport to, say, run wget from a target locale ('./my_script FR http://www.torproject.org/'). See where the pain points were in using PathSupport and what, as a user, you would rather that it did differently.
My understanding is that PathSupport is highly focused on experimentation since that is what Mike needed for his work. However, that is just one consumer and I'm most interested in providing an elegant, simple API that handles basic use cases (like the wget example) easily and can be *extended* for experiments.
2. Talk with the users of PathSupport to figure out their use cases. We should either include those capabilities in our PathSupport counterpart *or* provide what they need to easily make it themselves (if it's a specialized use case). Only three people or places to contact come to mind...
* Mike for SoaT and the bandwidth authorities
* Sebastian for TorBEL
* tor-dev@ for researchers and other developers using PathSupport, Roger might have some suggestions
3. Part of why I was dubious about this being a quick and easy project is that Stem currently lacks the controller capabilities that you need. You mention using stem.control.BaseController at several points which makes sense since it... well, exists. However, as its pydocs say this is not the class you are looking for...
"Don't use this directly - subclasses provide higher level functionality."
... or they will once we have them. Part of this project would be to start the general controller class to provide the capabilities that you need (plus tests of course). On first glance the things that a PathSupport copy would need are...
* Event handling for, at least, NEWCONSENSUS and NEWDESC.
* A Network Status class. This would be similar to stem.descriptor.server_descriptor but *far* easier (there's only around three network status lines).
These are easy and I'm happy to work on them with you. We will, of course, need more before actually migrating any clients.
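To give a feel for the size of the Network Status piece, below is a minimal sketch of a parser for one router entry. It assumes the "r", "s" and "w" line layout of a v3 consensus as described in Tor's directory specification. The RouterStatus name, its fields and the sample entry are made up for this example and are not an existing stem class.

```python
from collections import namedtuple

# Hypothetical container, not part of stem.
RouterStatus = namedtuple(
    "RouterStatus",
    ["nickname", "identity", "address", "or_port", "flags", "bandwidth"],
)


def parse_router_status(lines):
    """Parses the 'r', 's' and 'w' lines of a single consensus entry."""
    nickname = identity = address = or_port = bandwidth = None
    flags = ()

    for line in lines:
        keyword, _, rest = line.partition(" ")

        if keyword == "r":
            # r nickname identity digest date time address ORPort DirPort
            fields = rest.split()
            nickname, identity = fields[0], fields[1]
            address, or_port = fields[5], int(fields[6])
        elif keyword == "s":
            flags = tuple(rest.split())
        elif keyword == "w":
            # w Bandwidth=N, possibly followed by other key=value pairs
            for item in rest.split():
                key, _, value = item.partition("=")
                if key == "Bandwidth":
                    bandwidth = int(value)

    return RouterStatus(nickname, identity, address, or_port, flags, bandwidth)


# Made-up entry for illustration only.
entry = [
    "r example AAAA BBBB 2012-01-01 00:00:00 192.0.2.1 9001 9030",
    "s Fast Running Valid",
    "w Bandwidth=120",
]
status = parse_router_status(entry)
```

A real class would also need validation and the remaining optional lines, but the core of it is about this small.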
Their feedback will ensure that the API will be usable.
Don't count on it. This will give a nice first draft but expect to rewrite things quite a few times as we go along. Actually using your API for real clients will certainly reveal some things that we could do better. ;)
I also will communicate with my mentor about my progress and hopefully, will have an intuitive, easy to use API design ready before the coding period starts.
I would like to see a rough first draft of an API as part of the application, which we could then incrementally refine. Maybe a trac subpage under stem would be the best place for this?
Implementation implies writing the code, tests and the documentation.
Yay!
An amalgamation of the PathSupport.PathBuilder and the PathSupport.ConsensusTracker classes.
I understand why Mike made them separate. A few things to think about...
a. The ConsensusTracker is useful as a standalone class by providing the current consensus and descriptors. I used this for a short time with arm but stopped due to 'b'.
b. Loading all of the consensus and descriptor data is... a lot.
atagar@morrigan:~$ du -h ~/.tor/cached-consensus ~/.tor/cached-descriptors
672K /home/atagar/.tor/cached-consensus
3.1M /home/atagar/.tor/cached-descriptors
When I did this with arm a couple years ago it choked the application for several seconds and caused high memory usage. I've heard that this is better, but still we should figure out what is really necessary for the PathSupport functionality that we want.
c. This will be moot, of course, if we go with a different design.
TorCtl.PathSupport.PathBuilder uses a TorCtl.PathSupport.SelectionManager. A helper class for handling (router) configuration updates. I will merge a part of this into stem.path.PathController too
Not quite following. I thought that the SelectionManager was an argument for the configuration the user wanted to run PathSupport with. Keeping those separate conceptually seems like a good idea, though again I haven't actually tried it in practice.
Is a direct subclass of stem.control.BaseController
Why?
A major change would be to make PathController fully thread-safe instead of an event/queue system.
Slight correction, stem uses almost the exact same event/queue based model as TorCtl. The difference is that it also adds read/write locks to provide more complete thread safety.
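For anyone following along, the general pattern being described (events pushed onto a queue and handled by a worker thread, with locks guarding shared state) can be sketched as below. This is a generic illustration under my own assumptions. The EventDispatcher class and its method names are hypothetical and are not how stem or TorCtl actually name or structure things.

```python
import queue
import threading


class EventDispatcher:
    """Hypothetical sketch of an event queue with lock-guarded listeners."""

    def __init__(self):
        self._events = queue.Queue()
        self._listeners = []
        self._listeners_lock = threading.RLock()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def add_listener(self, callback):
        # The lock lets callers register listeners from any thread.
        with self._listeners_lock:
            self._listeners.append(callback)

    def post(self, event):
        # Called by whatever reads from the control socket.
        self._events.put(event)

    def close(self):
        # A None sentinel tells the worker to stop once the queue drains.
        self._events.put(None)
        self._worker.join()

    def _run(self):
        while True:
            event = self._events.get()
            if event is None:
                break
            # Copy under the lock, then call listeners without holding it.
            with self._listeners_lock:
                listeners = list(self._listeners)
            for callback in listeners:
                callback(event)


seen = []
dispatcher = EventDispatcher()
dispatcher.add_listener(seen.append)
dispatcher.post("NEWCONSENSUS")
dispatcher.post("NEWDESC")
dispatcher.close()
```

The queue keeps slow listeners from blocking the socket reader, while the lock covers state that several threads can touch.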
The following will be ported to use Stem:
- Torflow
Woah, bad idea. Torflow = SoaT + Bandwidth Authorities. That is both way bigger than you want to take on, and probably the last things that will migrate (if they ever do at all). Doesn't TorBEL manually construct circuits? If so then that would be a far better client.
That said, I see where you're getting this from and I might be completely misunderstanding how TorBEL works...
04:34 < logan> please recommend some TorCtl clients which use the PathSupport module
04:42 < Sebastian> logan: I think there's just torflow
04:44 < logan> what about torbel ?
04:47 < logan> and SoaT ?
04:50 < Sebastian> soat is a part of torflow
04:50 < Sebastian> torbel doesn't use it
04:51 < Sebastian> torbel uses TorCtl.Router and TorCtl.TorUtil
There are some unimplemented parts of the general controller class that are required for the implementation of PathSupport, such as the Router class. atagar is currently working on this.
Oh, good that you spotted this. In an ideal world I'd be working on this but, if the last couple months are any guide, I wouldn't count on it.
I will help with implementing these so that they will be ready before the coding period begins.
Great. The top slot on my dance card usually goes to anything that has people actively offering to help. At the moment that's mostly around descriptor parsing, but I'm happy to swap back to the controller if you want to work on it with me.
Port Torflow to use Stem. This will consume a part of week 11,
/me chokes, realizing that ten days are being allocated to this
... er, ambitious
I have written a few patches for some Tor Project projects, #1667 (Tor), #5032 (Thandy). Two to Stem, which have been committed to the repository #5199 and #5472.
Many thanks for those, btw. :)
Do you have any standalone code samples (preferably python) that you've written? Possibly for school?
I have exams until the 29th of April, so I will be missing a few days of the community bonding period...
No problem.
Stem, like all libraries implementing an API for a moving target, requires maintenance. I will co-maintain Stem in the future. By the time I'm done with the SoC program, I would've also gained familiarity with other related projects such as Torflow, TorBEL and Arm. I'll be in a position where I can help out with those if there is a need.
Great, we're always glad when people stick around after GSoC. It's unpleasantly rare, but always good to hope for.
I will keep people informed about my progress by sending (probably monthly, or as often as required) reports the mailing list.
Last year we did bi-weekly status updates. I think that I'd like to work directly with whoever is selected rather than just having code tossed over the fence, but we'll see if that works out (it's not everyone's cup of tea). If you'd rather work on things more independently then let me know.
I'm a little uncomfortable with how nebulous the individual PathSupport tasks are. Please more concretely say what they include and your approach. Alternatively, feel free to make this a "semi-PathSupport and other stem tasks" proposal, taking on some general stem tasks (like Safe Cookie, metrics-lib migration, general controller work, etc) plus _exploratory_ work on PathSupport.
* The advantage of that approach would be better defined tasks without the unknowns that often derail projects.
* The disadvantage is that you'd finish lots of small, useful features rather than a big one (personally I count this as a plus, but some people like just having a single big goal).
Completely up to you. Feel free to continue focusing your application on PathSupport if you want, the above is just a potential alternative.
Cheers! -Damian