Hi all,
To help test the Tor program, I thought an independent implementation might be useful, so I started writing TorPylle with that in mind.
The purpose is NOT to provide a secure or robust implementation that could serve as an alternative to Tor.
It relies on Scapy (http://www.secdev.org/projects/scapy/) and is supposed to be used more or less the same way.
The code is here: https://github.com/cea-sec/TorPylle and includes an example file.
It is at an early stage of development. Comments, fixes and questions are welcome!
Pierre
Hi Pierre, sounds neat! Two quick thoughts:
* The tor project generally dislikes other projects using the name 'tor' within them. This is why I called my projects arm and stem (arm was actually originally called tormoni, a name I still like a lot better)...
https://www.torproject.org/docs/trademark-faq.html.en
* Stem might be able to help you a great deal with this project. It's not just a client of tor's control port, but also a python implementation of its directory specification. For starters I'd look at its new remote descriptor fetching module to pull descriptor content...
https://stem.torproject.org/ https://stem.torproject.org/api/descriptor/remote.html
Best of luck! -Damian
Hi Damian !
Thanks for your answer.
On 07/23/2013 06:00 PM, Damian Johnson wrote:
> * The tor project generally dislikes other projects using the name 'tor' within them. This is why I called my projects arm and stem (arm was actually originally called tormoni, a name I still like a lot better)...
I was not aware of that. But when I read "That said, if the research paper comes with software, and that software could potentially confuse people (users or other researchers) about whether it's written or endorsed by The Tor Project, then we're back to the above scenario" (https://www.torproject.org/docs/trademark-faq.html.en#combining), I think there is no way my "software" could cause that kind of confusion. It is not even really software, more a framework or DSL to speak the Tor protocol. Think of it as a research paper. Written in Python ;-).
More seriously, it's not software intended to be run by Tor users or even Tor relay operators; it's a framework intended to be used by Tor developers / hackers to play with implementations of the Tor protocol.
> * Stem might be able to help you a great deal with this project. It's not just a client of tor's control port, but also a python implementation of its directory specification. For starters I'd look at its new remote descriptor fetching module to pull descriptor content...
> https://stem.torproject.org/ https://stem.torproject.org/api/descriptor/remote.html
I just had a look at this project, and if I understand correctly, it talks to a Tor control port and gets its data from the Tor client. With TorPylle, I want to be able to fetch data from any Directory Server directly, and act more or less "as if" I were a Tor client.
To be sure people understand what I mean, the examples.py file in the repository shows what you can do with TorPylle (for now, keep in mind it's a work in progress).
But there are ideas in stem I could use in TorPylle!
Thanks again,
Pierre
> More seriously, it's not software intended to be run by Tor users or even Tor relay operators; it's a framework intended to be used by Tor developers / hackers to play with implementations of the Tor protocol.
Gotcha. It might be worthwhile to clear this with tor-assistants@lists.torproject.org to ensure that this won't be a problem down the road if TorPylle grows. I can speak from experience when I say that renaming a project is a pain in the ass, especially once it's been established for a while. I'd loooove to rename arm (years back when I picked the name the processor didn't have much prominence) but doing so is now just too daunting.
On a side note the appearance of your project has kinda funny timing. Just last week I was thinking "Gah! Why does tor's reference implementation need to be C?". In my not-so-humble opinion that's dragging the application down in terms of maintainability and continued development...
* Tor has only three people (mostly just Nick) routinely touching the core codebase. This means effectively no code reviews and little collaboration. This is not likely to get better with time - universities have been shifting toward higher level languages for years, as has industry.
* Mocking is a pain with C. Nick had some ideas six months back to get around this, but I'm not sure if they ever really took off or whether the result is maintainable.
* C is simply difficult to get right. Besides the risk of stack overflows and memory leaks, there are countless pitfalls that necessitate years of C development experience before touching a line of code.
* Tor really doesn't *need* to be in C. Descriptors, controller, consensus voting, and much of its other functionality would do better with a higher level language, with small C modules for networking and crypto parts that truly need Libevent and such. This would be fine with Java's JNI, Python, Ruby, or any of a handful of languages.
All this said, Nick no doubt could list a dozen reasons why this is a terrible idea, not the least being the monumental amount of work and wanting a tor executable without the need for an interpreter. Oh well, I can still dream.
> I just had a look at this project, and if I understand correctly, it talks to a Tor control port and gets its data from the Tor client. With TorPylle, I want to be able to fetch data from any Directory Server directly, and act more or less "as if" I were a Tor client.
As I said previously, "It's not just a client of tor's control port, but also a python implementation of its directory specification."
Take another look at the remote descriptor module I pointed out. It's exactly what you want - it pulls descriptor content from the tor directory authorities and mirrors.
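To make "pulling content from a directory server" concrete, here is a minimal standard-library sketch. It is neither Stem's nor TorPylle's API: the function names are made up for illustration, and the host and DirPort you pass in are placeholders you would have to supply yourself. It assumes only what the directory specification describes, namely that the current consensus is served over plain HTTP at /tor/status-vote/current/consensus and that each relay appears on an "r" line of the form "r nickname identity digest date time address ORPort DirPort".

```python
import base64
import binascii
import urllib.request

# Path of the current consensus on a directory server (from the directory spec).
CONSENSUS_PATH = "/tor/status-vote/current/consensus"


def consensus_url(host, dir_port):
    """Build the consensus URL for a directory server (host/port are caller-supplied)."""
    return "http://%s:%d%s" % (host, dir_port, CONSENSUS_PATH)


def parse_r_line(line):
    """Parse one router status 'r' line into a dict (hypothetical helper)."""
    fields = line.split(" ")
    if len(fields) != 9 or fields[0] != "r":
        raise ValueError("not a router status line: %r" % line)
    _, nickname, identity, _digest, date, time, address, or_port, dir_port = fields
    # The identity is base64 with the trailing '=' padding stripped; restore it.
    raw = base64.b64decode(identity + "=" * (-len(identity) % 4))
    return {
        "nickname": nickname,
        "fingerprint": binascii.hexlify(raw).decode("ascii").upper(),
        "published": date + " " + time,
        "address": address,
        "or_port": int(or_port),
        "dir_port": int(dir_port),
    }


def fetch_relays(host, dir_port, timeout=30):
    """Download the consensus from one directory server and return its relays."""
    url = consensus_url(host, dir_port)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        text = response.read().decode("utf-8", "replace")
    return [parse_r_line(l) for l in text.splitlines() if l.startswith("r ")]
```

Calling fetch_relays() with the address and DirPort of an authority or mirror would return one dict per relay. This sketch does no signature checking at all, so it is only good for poking at the protocol.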
Cheers! -Damian
On Wed, Jul 24, 2013 at 7:20 PM, Damian Johnson atagar@torproject.org wrote: [...]
> On a side note the appearance of your project has kinda funny timing. Just last week I was thinking "Gah! Why does tor's reference implementation need to be C?". In my not-so-humble opinion that's dragging the application down in terms of maintainability and continued development...
> * Tor has only three people (mostly just Nick) routinely touching the core codebase. This means effectively no code reviews and little collaboration.
This part isn't actually true. We review each other's code, and don't merge stuff without reviewing it. Further, Andrea is full-time on the tor codebase, just like me. The code review slows us down a fair bit, but we do do it.
> * Mocking is a pain with C. Nick had some ideas six months back to get around this, but I'm not sure if they ever really took off or whether the result is maintainable.
I merged it. It's in master.
> * C is simply difficult to get right. Besides the risk of stack overflows and memory leaks, there are countless pitfalls that necessitate years of C development experience before touching a line of code.
Agreed.
> * Tor really doesn't *need* to be in C. Descriptors, controller, consensus voting, and much of its other functionality would do better with a higher level language, with small C modules for networking and crypto parts that truly need Libevent and such. This would be fine with Java's JNI, Python, Ruby, or any of a handful of languages.
> All this said, Nick no doubt could list a dozen reasons why this is a terrible idea, not the least being the monumental amount of work and wanting a tor executable without the need for an interpreter. Oh well, I can still dream.
Actually, I think we have a path to get to a less-pure-C Tor implementation. For sandboxing reasons, we'll want to move Tor to work as a set of multiple processes that communicate over well-defined IPC interfaces via a master process. Once we get there, it's no longer too much to think about doing some of those processes in a language other than C.
(What I'm *not* thrilled about is the idea of using an embedded interpreter for this kind of stuff, or embarking on any direction that requires us to rewrite too much of the program at once. That way, in my opinion, lies long-term destabilization.)
The main obstacle for most of these cases is that Tor hasn't been written with modularity in mind from the start, and so therefore some of the parts of Tor which we would do well to disentangle into implementations in other languages are not easily split off from the rest of the codebase. There's interest in doing this for some particular modules, though, and I suspect that once we get started, we'll be able to do it more easily for others.
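A minimal sketch of the "master process plus workers talking over a well-defined IPC interface" shape described above, written in Python purely for illustration. Nothing here is Tor code or a design anyone has committed to: the JSON-lines message format, the Master class and the toy worker (which just upper-cases its input) are all assumptions. The point is only that once requests cross a process boundary in a fixed format, the worker could be written in any language.

```python
import json
import subprocess
import sys

# Toy worker (hypothetical): reads one JSON request per line on stdin and
# writes one JSON reply per line on stdout. It stands in for a component
# that could be implemented in a language other than the master's.
WORKER_SOURCE = r'''
import json, sys
for line in sys.stdin:
    request = json.loads(line)
    reply = {"id": request["id"], "result": request["payload"].upper()}
    sys.stdout.write(json.dumps(reply) + "\n")
    sys.stdout.flush()
'''


class Master:
    """Owns one worker process and talks to it over pipes (illustrative only)."""

    def __init__(self):
        self.worker = subprocess.Popen(
            [sys.executable, "-c", WORKER_SOURCE],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )
        self.next_id = 0

    def call(self, payload):
        """Send one request and block until the matching reply arrives."""
        self.next_id += 1
        message = {"id": self.next_id, "payload": payload}
        self.worker.stdin.write(json.dumps(message) + "\n")
        self.worker.stdin.flush()
        reply = json.loads(self.worker.stdout.readline())
        if reply["id"] != self.next_id:
            raise RuntimeError("reply does not match request")
        return reply["result"]

    def close(self):
        """Closing stdin ends the worker's read loop; then reap the process."""
        self.worker.stdin.close()
        self.worker.wait()
        self.worker.stdout.close()
```

Because the worker lives in its own process, it can be sandboxed or replaced independently of the master, which is the property that distinguishes this layout from an interpreter embedded in the same address space.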
best wishes,
> This part isn't actually true. We review each other's code, and don't merge stuff without reviewing it. Further, Andrea is full-time on the tor codebase, just like me. The code review slows us down a fair bit, but we do do it.
My bad then. From interactions on tickets and commit history it simply seemed like you are far and away the primary person engaged with the codebase...
atagar@odin:~/Desktop/tor/tor$ git log HEAD~200.. | grep "^Author:" | sort | uniq -c | sort -rn
    404 Author: Nick Mathewson nickm@torproject.org
     40 Author: Andrea Shepard andrea@torproject.org
     25 Author: Roger Dingledine arma@torproject.org
     20 Author: Mike Perry mikeperry-git@fscked.org
     12 Author: Karsten Loesing karsten.loesing@gmx.net
    ...
Of course commits are roughly as meaningful as lines of code for involvement (which is to say, they're meaningless). Just the stat I had handy. Besides that Andrea hasn't sent a status report since March so I haven't a clue what she's up to.
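For anyone who wants the same tally without the shell pipeline, a rough Python equivalent is sketched below. The function names are made up for illustration, and it assumes `git` is on the PATH and that `git log` prints its usual "Author: ..." header lines. Like the pipeline, it counts commits per author line and says nothing more about involvement than that.

```python
import collections
import subprocess


def count_authors(log_text):
    """Count 'Author:' lines in git log output, most frequent first."""
    counts = collections.Counter(
        line[len("Author:"):].strip()
        for line in log_text.splitlines()
        if line.startswith("Author:")
    )
    return counts.most_common()


def authors_in_range(rev_range, repo_path="."):
    """Run `git log <rev_range>` in repo_path and tally its authors."""
    result = subprocess.run(
        ["git", "log", rev_range],
        cwd=repo_path,
        capture_output=True,
        text=True,
        check=True,
    )
    return count_authors(result.stdout)
```

For example, authors_in_range("HEAD~200..") covers the same revision range as the command above.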
> Actually, I think we have a path to get to a less-pure-C Tor implementation. For sandboxing reasons, we'll want to move Tor to work as a set of multiple processes that communicate over well-defined IPC interfaces via a master process. Once we get there, it's no longer too much to think about doing some of those processes in a language other than C.
Neat! I didn't know we had plans around this. Is this on the horizon or off in unscheduled wishlist territory? If this starts with descriptor or controller functionality then I'd be interested in helping.
> (What I'm *not* thrilled about is the idea of using an embedded interpreter for this kind of stuff, or embarking on any direction that requires us to rewrite too much of the program at once. That way, in my opinion, lies long-term destabilization.)
Understandable, though doesn't avoiding an interpreter drop most modern languages from consideration (and any sandboxing an interpreter would provide)? What did you have in mind instead?
Cheers! -Damian
On Thu, Jul 25, 2013 at 10:23 AM, Damian Johnson atagar@torproject.org wrote:
>> (What I'm *not* thrilled about is the idea of using an embedded interpreter for this kind of stuff, or embarking on any direction that requires us to rewrite too much of the program at once. That way, in my opinion, lies long-term destabilization.)
> Understandable, though doesn't avoiding an interpreter drop most modern languages from consideration (and any sandboxing an interpreter would provide)? What did you have in mind instead?
I would like to point out the existence of some modern, high-level, compiled languages. For instance: Haskell, OCaml, and Go.
> Cheers! -Damian
_______________________________________________
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
On Thu, Jul 25, 2013 at 5:23 PM, Damian Johnson atagar@torproject.org wrote:
>> Actually, I think we have a path to get to a less-pure-C Tor implementation. For sandboxing reasons, we'll want to move Tor to work as a set of multiple processes that communicate over well-defined IPC interfaces via a master process. Once we get there, it's no longer too much to think about doing some of those processes in a language other than C.
> Neat! I didn't know we had plans around this. Is this on the horizon or off in unscheduled wishlist territory? If this starts with descriptor or controller functionality then I'd be interested in helping.
Somewhere in the middle. The sandboxing is now; the partitioning is "not too long I hope"; the multiprocess-transition is "after that, as possible"; and the reimplementation of bits and pieces is "time permitting, as relevant."
>> (What I'm *not* thrilled about is the idea of using an embedded interpreter for this kind of stuff, or embarking on any direction that requires us to rewrite too much of the program at once. That way, in my opinion, lies long-term destabilization.)
> Understandable, though doesn't avoiding an interpreter drop most modern languages from consideration (and any sandboxing an interpreter would provide)? What did you have in mind instead?
What Brandon said here. Also, I'm not opposed to an interpreter as such, only to an _embedded_ interpreter (that is, one running in the same process space as all the rest of the Tor code).
yrs,