Hi everyone,
To help the Tor project, I'll contribute some of my spare time to improve multithreading for the Tor code base.
I've spoken a bit with Nick M., and it seems the crypto lib is an important part to begin with. The wiki page (https://trac.torproject.org/projects/tor/wiki/org/projects/Tor/Multithreaded...) indicates, basically, that a worker thread pool with a work queue to dispatch crypto events should be the right approach, and I agree.
Is it acceptable to link an external library to the project as a dependency?
The library I'm thinking about is "liburcu", which stands for user-space RCU (http://lttng.org/urcu). It's a complete set of lockless data structures, including a wait-free queue, which can be very useful for our case. It supports a large variety of architectures and works on BSD and Linux. The Linux kernel uses RCU mechanisms for a lot of internal data structures today, so it's quite tested and solid.
The question, I think, is whether we want lockless data structures in Tor, or whether they are not, and will not be, necessary for this type of workload. (Lockless resizable hash tables, red-black trees, stacks, linked lists (doubly linked too), and queues are available as of today.)
Waiting on your feedback, guys. Either way, I'll begin implementing parallel crypto largely based on the wiki page (really good ideas there).
Thanks a lot! David
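[Editor's sketch] To make the "worker thread pool with a work queue" idea from the message above concrete, here is a minimal illustration in C with POSIX threads. It is not Tor code and not the design from the wiki page: the names `work_queue_t`, `wq_push`, `wq_pop`, `fake_crypto` and `run_pool_demo` are all hypothetical, the queue is a plain mutex-protected ring buffer (not a lockless one), and `fake_crypto` merely stands in for an expensive crypto operation.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define QUEUE_CAP 64
#define N_WORKERS 4

/* One unit of work; a real job would carry e.g. an onion skin (assumption). */
typedef struct job_t { unsigned input; unsigned output; } job_t;

/* Bounded FIFO protected by a mutex and a condition variable. */
typedef struct work_queue_t {
  job_t *slots[QUEUE_CAP];
  size_t head, tail, count;
  int closed;
  pthread_mutex_t lock;
  pthread_cond_t not_empty;
} work_queue_t;

static void wq_init(work_queue_t *q) {
  q->head = q->tail = q->count = 0;
  q->closed = 0;
  pthread_mutex_init(&q->lock, NULL);
  pthread_cond_init(&q->not_empty, NULL);
}

/* Returns 0 on success, -1 if the queue is full or already closed. */
static int wq_push(work_queue_t *q, job_t *job) {
  int rv = -1;
  pthread_mutex_lock(&q->lock);
  if (!q->closed && q->count < QUEUE_CAP) {
    q->slots[q->tail] = job;
    q->tail = (q->tail + 1) % QUEUE_CAP;
    q->count++;
    rv = 0;
    pthread_cond_signal(&q->not_empty);
  }
  pthread_mutex_unlock(&q->lock);
  return rv;
}

/* Blocks until a job is available; returns NULL once closed and drained. */
static job_t *wq_pop(work_queue_t *q) {
  job_t *job = NULL;
  pthread_mutex_lock(&q->lock);
  while (q->count == 0 && !q->closed)
    pthread_cond_wait(&q->not_empty, &q->lock);
  if (q->count > 0) {
    job = q->slots[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
  }
  pthread_mutex_unlock(&q->lock);
  return job;
}

/* Marks the queue closed and wakes every waiting worker. */
static void wq_close(work_queue_t *q) {
  pthread_mutex_lock(&q->lock);
  q->closed = 1;
  pthread_cond_broadcast(&q->not_empty);
  pthread_mutex_unlock(&q->lock);
}

/* Stand-in for an expensive crypto operation (assumption). */
static unsigned fake_crypto(unsigned x) { return x * 2654435761u; }

static void *worker_main(void *arg) {
  work_queue_t *q = arg;
  job_t *job;
  while ((job = wq_pop(q)) != NULL)
    job->output = fake_crypto(job->input);
  return NULL;
}

/* Pushes n_jobs jobs through the pool; returns how many came back correct. */
int run_pool_demo(unsigned n_jobs) {
  work_queue_t q;
  pthread_t workers[N_WORKERS];
  job_t jobs[QUEUE_CAP];
  unsigned i, ok = 0;
  assert(n_jobs <= QUEUE_CAP);
  wq_init(&q);
  for (i = 0; i < N_WORKERS; i++)
    pthread_create(&workers[i], NULL, worker_main, &q);
  for (i = 0; i < n_jobs; i++) {
    jobs[i].input = i + 1;
    jobs[i].output = 0;
    wq_push(&q, &jobs[i]);
  }
  wq_close(&q);                      /* workers drain the queue, then exit */
  for (i = 0; i < N_WORKERS; i++)
    pthread_join(workers[i], NULL);
  for (i = 0; i < n_jobs; i++)
    ok += (jobs[i].output == fake_crypto(i + 1));
  pthread_cond_destroy(&q.not_empty);
  pthread_mutex_destroy(&q.lock);
  return (int)ok;
}
```

Calling `run_pool_demo(64)` from a `main` (built with `-pthread`) runs 64 jobs across four workers. Error handling for `pthread_create` is omitted to keep the sketch short.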
On Tue, Jan 31, 2012 at 1:46 PM, David Goulet dgoulet@ev0ke.net wrote:
> Hi everyone,
> To help the Tor project, I'll contribute some of my spare time to improve multithreading for the Tor code base.
Color me confused: This is for taking advantage of multiprocessor systems, correct?
> I've spoken a bit with Nick M., and it seems the crypto lib is an important part to begin with. The wiki page (https://trac.torproject.org/projects/tor/wiki/org/projects/Tor/Multithreaded...) indicates, basically, that a worker thread pool with a work queue to dispatch crypto events should be the right approach, and I agree.
> Is it acceptable to link an external library to the project as a dependency?
> The library I'm thinking about is "liburcu", which stands for user-space RCU (http://lttng.org/urcu). It's a complete set of lockless data structures, including a wait-free queue, which can be very useful for our case. It supports a large variety of architectures and works on BSD and Linux. The Linux kernel uses RCU mechanisms for a lot of internal data structures today, so it's quite tested and solid.
One future issue I foresee is the use of batching oriented crypto operations. If we use binary Edwards curves for speed in the onion skins, then batching becomes a major timesaver. The obvious way to do this is with workers grabbing and putting back batches, but we also want to maintain responsiveness. If a router is getting low traffic it shouldn't wait forever to fill up a batch.
> The question, I think, is whether we want lockless data structures in Tor, or whether they are not, and will not be, necessary for this type of workload. (Lockless resizable hash tables, red-black trees, stacks, linked lists (doubly linked too), and queues are available as of today.)
> Waiting on your feedback, guys. Either way, I'll begin implementing parallel crypto largely based on the wiki page (really good ideas there).
> Thanks a lot! David
_______________________________________________ tor-dev mailing list tor-dev@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
Sincerely, Watson Ladd
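[Editor's sketch] The batching concern raised above (fill a batch when traffic is high, but never wait forever when it is low) can be illustrated with a timed condition-variable wait. This is only one possible approach, written under assumptions: `batch_queue_t`, `bq_push`, `bq_take_batch` and `run_batch_demo` are hypothetical names, items are plain `int`s rather than onion skins, and the wait bound `max_wait_ms` is an arbitrary parameter chosen for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <time.h>

#define BQ_CAP 128

typedef struct batch_queue_t {
  int items[BQ_CAP];
  size_t head, count;
  pthread_mutex_t lock;
  pthread_cond_t changed;
} batch_queue_t;

static void bq_init(batch_queue_t *q) {
  q->head = q->count = 0;
  pthread_mutex_init(&q->lock, NULL);
  pthread_cond_init(&q->changed, NULL);
}

/* Returns 0 on success, -1 if the queue is full. */
static int bq_push(batch_queue_t *q, int item) {
  int rv = -1;
  pthread_mutex_lock(&q->lock);
  if (q->count < BQ_CAP) {
    q->items[(q->head + q->count) % BQ_CAP] = item;
    q->count++;
    rv = 0;
    pthread_cond_signal(&q->changed);
  }
  pthread_mutex_unlock(&q->lock);
  return rv;
}

/* Absolute CLOCK_REALTIME deadline `ms` milliseconds from now. */
static struct timespec deadline_after_ms(long ms) {
  struct timespec ts;
  clock_gettime(CLOCK_REALTIME, &ts);
  ts.tv_sec += ms / 1000;
  ts.tv_nsec += (ms % 1000) * 1000000L;
  if (ts.tv_nsec >= 1000000000L) {
    ts.tv_sec += 1;
    ts.tv_nsec -= 1000000000L;
  }
  return ts;
}

/* Waits for at least one item, then waits at most max_wait_ms for the batch
 * to fill up to max_batch. Returns the number of items copied into out. */
static size_t bq_take_batch(batch_queue_t *q, int *out, size_t max_batch,
                            long max_wait_ms) {
  size_t n = 0;
  struct timespec deadline;
  pthread_mutex_lock(&q->lock);
  while (q->count == 0)
    pthread_cond_wait(&q->changed, &q->lock);
  /* The clock starts once there is something to process. */
  deadline = deadline_after_ms(max_wait_ms);
  while (q->count < max_batch) {
    int rc = pthread_cond_timedwait(&q->changed, &q->lock, &deadline);
    if (rc != 0)               /* ETIMEDOUT (or an error): stop waiting */
      break;
  }
  while (n < max_batch && q->count > 0) {
    out[n++] = q->items[q->head];
    q->head = (q->head + 1) % BQ_CAP;
    q->count--;
  }
  pthread_mutex_unlock(&q->lock);
  return n;
}

/* Queues n_queued items, takes one batch, and returns the batch size. */
size_t run_batch_demo(size_t n_queued, size_t max_batch, long max_wait_ms) {
  batch_queue_t q;
  int out[BQ_CAP];
  size_t i, n;
  assert(n_queued >= 1 && n_queued <= BQ_CAP && max_batch <= BQ_CAP);
  bq_init(&q);
  for (i = 0; i < n_queued; i++)
    bq_push(&q, (int)i);
  n = bq_take_batch(&q, out, max_batch, max_wait_ms);
  pthread_cond_destroy(&q.changed);
  pthread_mutex_destroy(&q.lock);
  return n;
}
```

With three queued items and a batch size of eight, the worker gives up after the timeout and processes a partial batch; with ten queued items and a batch size of four, it returns a full batch without waiting. The right timeout for a relay would have to be measured, not guessed.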
On 12-01-31 03:08 PM, Watson Ladd wrote:
> On Tue, Jan 31, 2012 at 1:46 PM, David Goulet dgoulet@ev0ke.net wrote:
>> Hi everyone,
>> To help the Tor project, I'll contribute some of my spare time to improve multithreading for the Tor code base.
> Color me confused: This is for taking advantage of multiprocessor systems, correct?
Yep :)
>> I've spoken a bit with Nick M., and it seems the crypto lib is an important part to begin with. The wiki page (https://trac.torproject.org/projects/tor/wiki/org/projects/Tor/Multithreaded...) indicates, basically, that a worker thread pool with a work queue to dispatch crypto events should be the right approach, and I agree.
>> Is it acceptable to link an external library to the project as a dependency?
>> The library I'm thinking about is "liburcu", which stands for user-space RCU (http://lttng.org/urcu). It's a complete set of lockless data structures, including a wait-free queue, which can be very useful for our case. It supports a large variety of architectures and works on BSD and Linux. The Linux kernel uses RCU mechanisms for a lot of internal data structures today, so it's quite tested and solid.
> One future issue I foresee is the use of batching oriented crypto operations. If we use binary Edwards curves for speed in the onion skins, then batching becomes a major timesaver. The obvious way to do this is with workers grabbing and putting back batches, but we also want to maintain responsiveness. If a router is getting low traffic it shouldn't wait forever to fill up a batch.
Indeed. Adding latency to a node is just a no-go, I think, so this is definitely something to consider.
Thanks! David
>> The question, I think, is whether we want lockless data structures in Tor, or whether they are not, and will not be, necessary for this type of workload. (Lockless resizable hash tables, red-black trees, stacks, linked lists (doubly linked too), and queues are available as of today.)
>> Waiting on your feedback, guys. Either way, I'll begin implementing parallel crypto largely based on the wiki page (really good ideas there).
>> Thanks a lot! David
> Sincerely, Watson Ladd
On Tue, Jan 31, 2012 at 2:46 PM, David Goulet dgoulet@ev0ke.net wrote:
> Hi everyone,
> To help the Tor project, I'll contribute some of my spare time to improve multithreading for the Tor code base.
> I've spoken a bit with Nick M., and it seems the crypto lib is an important part to begin with. The wiki page (https://trac.torproject.org/projects/tor/wiki/org/projects/Tor/Multithreaded...) indicates, basically, that a worker thread pool with a work queue to dispatch crypto events should be the right approach, and I agree.
> Is it acceptable to link an external library to the project as a dependency?
It depends, I'd say. Most of the data structures we're talking about here are ones that allow both lockless and locked implementations. So my ideal implementation would be to have the ability to use lockless structures where available, but a locked implementation otherwise. This would let us work with better lockless libraries if they come along, and continue to run on operating systems or on CPUs that don't support liburcu, and also migrate to another system in the future in case a better one comes along.
But personally, I would be very surprised if this turned out to make a very big difference: even symmetric crypto is pretty slow in comparison to even the most obvious work-queue implementations, right? (If I'm missing something there, please let me know.)
cheers,
On 12-01-31 03:42 PM, Nick Mathewson wrote:
> On Tue, Jan 31, 2012 at 2:46 PM, David Goulet dgoulet@ev0ke.net wrote:
>> Hi everyone,
>> To help the Tor project, I'll contribute some of my spare time to improve multithreading for the Tor code base.
>> I've spoken a bit with Nick M., and it seems the crypto lib is an important part to begin with. The wiki page (https://trac.torproject.org/projects/tor/wiki/org/projects/Tor/Multithreaded...) indicates, basically, that a worker thread pool with a work queue to dispatch crypto events should be the right approach, and I agree.
>> Is it acceptable to link an external library to the project as a dependency?
> It depends, I'd say. Most of the data structures we're talking about here are ones that allow both lockless and locked implementations. So my ideal implementation would be to have the ability to use lockless structures where available, but a locked implementation otherwise. This would let us work with better lockless libraries if they come along, and continue to run on operating systems or on CPUs that don't support liburcu, and also migrate to another system in the future in case a better one comes along.
I do agree on that! However, sometimes the APIs of those kinds of libs can be quite complex (if I think about the red-black tree in URCU...), so having a compat layer between locked and lockless is sometimes a bit of work.
So going for both a wait-free queue and a normal locked queue is not that difficult (in terms of API/ABI handling), but the question, I think, is whether we first want to do a "normal locking queue" in the Tor code tree and then go for a lockless one from an external lib, with a compat layer between locked and lockless.
Personally, I think we should go straight for one type of data structure and make sure we create a decent compat layer on top, to be able to switch from one technology to another easily.
Does it make sense to you?
> But personally, I would be very surprised if this turned out to make a very big difference: even symmetric crypto is pretty slow in comparison to even the most obvious work-queue implementations, right? (If I'm missing something there, please let me know.)
Well, I'm not too knowledgeable about crypto implementations, but if some hardware can be used to do the job, it will considerably speed up the process, so a situation where the crypto goes faster than queuing events is a possibility (if I understand the question right).
Cheers! David
> cheers,
On Tue, Jan 31, 2012 at 8:49 PM, David Goulet dgoulet@ev0ke.net wrote: [...]
> So going for both a wait-free queue and a normal locked queue is not that difficult (in terms of API/ABI handling), but the question, I think, is whether we first want to do a "normal locking queue" in the Tor code tree and then go for a lockless one from an external lib, with a compat layer between locked and lockless.
> Personally, I think we should go straight for one type of data structure and make sure we create a decent compat layer on top, to be able to switch from one technology to another easily.
> Does it make sense to you?
I think that makes sense, if I understand you correctly. If what we need is a work queue, for example, I don't much care what the initial implementation is, so long as it is easy to add others.
yrs,
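[Editor's sketch] The "compat layer" agreed on in the exchange above could look roughly like a small table of function pointers that callers use without knowing which backend sits behind it. This is an illustration only, not Tor code: `workq_ops_t`, `workq_locked_ops`, `workq_backend` and `run_compat_demo` are hypothetical names, and only a mutex-based backend is shown. A lockless backend (for example one built on liburcu) would be a second table with the same shape; no liburcu calls are shown here because their exact API is not discussed in the thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Backend-neutral interface: every queue implementation fills in this table. */
typedef struct workq_ops_t {
  void *(*create)(void);
  int (*enqueue)(void *state, void *item);  /* 0 on success, -1 on error */
  void *(*dequeue)(void *state);            /* NULL when the queue is empty */
  void (*destroy)(void *state);
} workq_ops_t;

/* ---- Locked backend: singly linked FIFO under one mutex ---- */
typedef struct node_t { void *item; struct node_t *next; } node_t;
typedef struct locked_q_t {
  node_t *head, *tail;
  pthread_mutex_t lock;
} locked_q_t;

static void *locked_create(void) {
  locked_q_t *q = calloc(1, sizeof(*q));
  if (q)
    pthread_mutex_init(&q->lock, NULL);
  return q;
}

static int locked_enqueue(void *state, void *item) {
  locked_q_t *q = state;
  node_t *n;
  if (item == NULL)            /* NULL is reserved to mean "empty" */
    return -1;
  n = malloc(sizeof(*n));
  if (n == NULL)
    return -1;
  n->item = item;
  n->next = NULL;
  pthread_mutex_lock(&q->lock);
  if (q->tail)
    q->tail->next = n;
  else
    q->head = n;
  q->tail = n;
  pthread_mutex_unlock(&q->lock);
  return 0;
}

static void *locked_dequeue(void *state) {
  locked_q_t *q = state;
  node_t *n;
  void *item = NULL;
  pthread_mutex_lock(&q->lock);
  n = q->head;
  if (n) {
    q->head = n->next;
    if (q->head == NULL)
      q->tail = NULL;
  }
  pthread_mutex_unlock(&q->lock);
  if (n) {
    item = n->item;
    free(n);
  }
  return item;
}

static void locked_destroy(void *state) {
  locked_q_t *q = state;
  while (locked_dequeue(q) != NULL)
    ;                          /* free any nodes still queued */
  pthread_mutex_destroy(&q->lock);
  free(q);
}

static const workq_ops_t workq_locked_ops = {
  locked_create, locked_enqueue, locked_dequeue, locked_destroy
};

/* The one place a different backend would be selected (hypothetical). */
static const workq_ops_t *workq_backend = &workq_locked_ops;

/* Enqueues 1, 2, 3 through the interface and returns them as the digits of
 * one number, so FIFO order is visible in the result. */
int run_compat_demo(void) {
  int a = 1, b = 2, c = 3, result = 0;
  int *p;
  void *q = workq_backend->create();
  assert(q != NULL);
  workq_backend->enqueue(q, &a);
  workq_backend->enqueue(q, &b);
  workq_backend->enqueue(q, &c);
  while ((p = workq_backend->dequeue(q)) != NULL)
    result = result * 10 + *p;
  workq_backend->destroy(q);
  return result;
}
```

Because callers only touch `workq_backend`, swapping the implementation later means changing one pointer (or selecting it at build time), which matches the "easy to add others" requirement stated above.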
Thus spake Nick Mathewson (nickm@alum.mit.edu):
> On Tue, Jan 31, 2012 at 2:46 PM, David Goulet dgoulet@ev0ke.net wrote:
>> To help the Tor project, I'll contribute some of my spare time to improve multithreading for the Tor code base.
>> I've spoken a bit with Nick M., and it seems the crypto lib is an important part to begin with. The wiki page (https://trac.torproject.org/projects/tor/wiki/org/projects/Tor/Multithreaded...) indicates, basically, that a worker thread pool with a work queue to dispatch crypto events should be the right approach, and I agree.
>> Is it acceptable to link an external library to the project as a dependency?
> It depends, I'd say. Most of the data structures we're talking about here are ones that allow both lockless and locked implementations. So my ideal implementation would be to have the ability to use lockless structures where available, but a locked implementation otherwise. This would let us work with better lockless libraries if they come along, and continue to run on operating systems or on CPUs that don't support liburcu, and also migrate to another system in the future in case a better one comes along.
> But personally, I would be very surprised if this turned out to make a very big difference: even symmetric crypto is pretty slow in comparison to even the most obvious work-queue implementations, right? (If I'm missing something there, please let me know.)
Linus Torvalds emerges from 2008 to agree with Nick, and to add: http://www.realworldtech.com/forums/index.cfm?action=detail&id=91906&...
For extra lullz, note the "topic" is "1000 cores SMP is going to happen". Is this a fake thread, or just some sketch web forum? I can't find it online anywhere else, and it has a pretty high volume of idiocy, even for 2008. Other than Linus, of course. My heart goes out to you, young(er) Torvalds, if that is the real you...
It's also possible that RCU has been through enough trial by fire since then to have caused Real 2012 Linus to disagree with Fake 2008 Linus.
Either way, it sounds like good sense to make sure we have the option to say to people "Omg, you hit that crazy crash under heavy crypto load? Try building with --disable-non-determinism this time."
On 12-02-03 02:46 AM, Mike Perry wrote:
> Either way, it sounds like good sense to make sure we have the option to say to people "Omg, you hit that crazy crash under heavy crypto load? Try building with --disable-non-determinism this time."
Haha! I'll remember that one! :).
I do agree with you!
This class of lockless algorithms can bring more edge cases, and debugging them can take a tremendous amount of work because they are often not easily reproducible. The Tor project is too "sensitive", I think, to hit those kinds of issues in production, so normally, before adding RCU data structures, a huge number of tests has to be done.
Anyhow, you are right, it should be done with the two options at least. We'll see if RCU brings a significant performance/scalability improvement before going "full throttle" with it.
Cheers! David