Parallel Crypto - Library dep. - tor-dev


David Goulet

31 Jan 2012

7:46 p.m.


Hi everyone,

To help the Tor project, I'll contribute some of my spare time to improving multithreading in the Tor code base.

I've spoken a bit with Nick M., and it seems the crypto lib is an important place to begin. The wiki page (https://trac.torproject.org/projects/tor/wiki/org/projects/Tor/Multithreaded...) indicates, basically, that a worker thread pool with a work queue to dispatch crypto events should be the right approach, and I agree.

Is it acceptable to link an external library to the project as a dependency?

The library I'm thinking about is "liburcu", which stands for user-space RCU (http://lttng.org/urcu). It's a complete set of lockless data structures, including a wait-free queue, which could be very useful for our case. It supports a large variety of architectures and works on BSD and Linux. The Linux kernel uses the RCU mechanism for a lot of internal data structures today, so it's quite well tested and solid.

The question, I think, is whether we want lockless data structures in Tor, or whether they are not, and will not be, necessary for this type of workload. (Lockless resizable hash tables, red-black trees, stacks, linked lists (doubly linked too), and queues are available as of today.)

Waiting on your feedback, guys. Either way, I'll begin implementing parallel crypto, largely based on the wiki page (really good ideas there).

Thanks a lot! David
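
A minimal sketch of the worker-pool-plus-work-queue design mentioned above, written in Python with only the standard library. It is not Tor code: SHA-256 stands in for the real crypto work, and the names `crypto_job`, `worker`, and `run_pool` are hypothetical. In CPython the global interpreter lock limits how much of this runs in parallel, so the sketch shows the dispatch structure rather than any real speedup.

```python
import hashlib
import queue
import threading

_STOP = object()  # sentinel telling a worker thread to exit


def crypto_job(payload: bytes) -> str:
    """Stand-in for an expensive crypto operation (here: SHA-256)."""
    return hashlib.sha256(payload).hexdigest()


def worker(jobs: queue.Queue, results: dict, lock: threading.Lock) -> None:
    """Pull (job_id, payload) items until the stop sentinel arrives."""
    while True:
        item = jobs.get()
        if item is _STOP:
            break
        job_id, payload = item
        digest = crypto_job(payload)
        with lock:  # protect the shared results dict
            results[job_id] = digest


def run_pool(payloads: list, n_workers: int = 4) -> dict:
    """Dispatch payloads to n_workers threads; return results keyed by index."""
    jobs = queue.Queue()
    results = {}
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(jobs, results, lock))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for i, payload in enumerate(payloads):
        jobs.put((i, payload))
    for _ in threads:  # one sentinel per worker
        jobs.put(_STOP)
    for t in threads:
        t.join()
    return results
```

The main thread only enqueues work and waits; each worker owns the expensive step, which is the division of labour the wiki page is said to propose.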



Watson Ladd

31 Jan

8:08 p.m.

On Tue, Jan 31, 2012 at 1:46 PM, David Goulet dgoulet@ev0ke.net wrote:

> Hi everyone,
>
> To help the Tor project, I'll contribute some of my spare time to improving multithreading in the Tor code base.

Color me confused: This is for taking advantage of multiprocessor systems, correct?

> I've spoken a bit with Nick M., and it seems the crypto lib is an important place to begin. The wiki page (https://trac.torproject.org/projects/tor/wiki/org/projects/Tor/Multithreaded...) indicates, basically, that a worker thread pool with a work queue to dispatch crypto events should be the right approach, and I agree.
>
> Is it acceptable to link an external library to the project as a dependency?
>
> The library I'm thinking about is "liburcu", which stands for user-space RCU (http://lttng.org/urcu). It's a complete set of lockless data structures, including a wait-free queue, which could be very useful for our case. It supports a large variety of architectures and works on BSD and Linux. The Linux kernel uses the RCU mechanism for a lot of internal data structures today, so it's quite well tested and solid.

One future issue I foresee is the use of batching-oriented crypto operations. If we use binary Edwards curves for speed in the onion skins, then batching becomes a major timesaver. The obvious way to do this is with workers grabbing and putting back batches, but we also want to maintain responsiveness. If a router is getting low traffic, it shouldn't wait forever to fill up a batch.

> The question, I think, is whether we want lockless data structures in Tor, or whether they are not, and will not be, necessary for this type of workload. (Lockless resizable hash tables, red-black trees, stacks, linked lists (doubly linked too), and queues are available as of today.)
>
> Waiting on your feedback, guys. Either way, I'll begin implementing parallel crypto, largely based on the wiki page (really good ideas there).
>
> Thanks a lot! David
> _______________________________________________
> tor-dev mailing list tor-dev@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev

Sincerely, Watson Ladd

-- "Those who would give up Essential Liberty to purchase a little Temporary Safety deserve neither Liberty nor Safety." -- Benjamin Franklin
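
A minimal sketch of the batching concern raised above: a worker takes up to a full batch, but gives up waiting after a bounded delay so that a low-traffic router does not stall. This is an illustration in Python, not Tor code; `collect_batch`, `max_batch`, and `max_wait` are hypothetical names, and suitable values would have to come from measurement.

```python
import queue
import time


def collect_batch(jobs: queue.Queue, max_batch: int, max_wait: float) -> list:
    """Block for the first job, then gather more until the batch is full
    or max_wait seconds have passed since the first job arrived."""
    batch = [jobs.get()]  # wait as long as needed for the first item
    deadline = time.monotonic() + max_wait
    while len(batch) < max_batch:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # deadline hit: process a partial batch
        try:
            batch.append(jobs.get(timeout=remaining))
        except queue.Empty:
            break  # nothing else arrived in time
    return batch
```

Under heavy load the batch fills immediately and the deadline never matters; under light load the added latency is capped at `max_wait`.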

David Goulet

1 Feb

1:52 a.m.


On 12-01-31 03:08 PM, Watson Ladd wrote:

> On Tue, Jan 31, 2012 at 1:46 PM, David Goulet dgoulet@ev0ke.net wrote:
>> Hi everyone,
>>
>> To help the Tor project, I'll contribute some of my spare time to improving multithreading in the Tor code base.
>
> Color me confused: This is for taking advantage of multiprocessor systems, correct?

Yep :)

>> I've spoken a bit with Nick M., and it seems the crypto lib is an important place to begin. The wiki page (https://trac.torproject.org/projects/tor/wiki/org/projects/Tor/Multithreaded...) indicates, basically, that a worker thread pool with a work queue to dispatch crypto events should be the right approach, and I agree.
>>
>> Is it acceptable to link an external library to the project as a dependency?
>>
>> The library I'm thinking about is "liburcu", which stands for user-space RCU (http://lttng.org/urcu). It's a complete set of lockless data structures, including a wait-free queue, which could be very useful for our case. It supports a large variety of architectures and works on BSD and Linux. The Linux kernel uses the RCU mechanism for a lot of internal data structures today, so it's quite well tested and solid.
>
> One future issue I foresee is the use of batching-oriented crypto operations. If we use binary Edwards curves for speed in the onion skins, then batching becomes a major timesaver. The obvious way to do this is with workers grabbing and putting back batches, but we also want to maintain responsiveness. If a router is getting low traffic, it shouldn't wait forever to fill up a batch.

Indeed. Adding latency to a node is a no-go, I think, so that is definitely something to consider.

Thanks! David


Nick Mathewson

31 Jan

8:42 p.m.

On Tue, Jan 31, 2012 at 2:46 PM, David Goulet dgoulet@ev0ke.net wrote:

> Hi everyone,
>
> To help the Tor project, I'll contribute some of my spare time to improving multithreading in the Tor code base.
>
> I've spoken a bit with Nick M., and it seems the crypto lib is an important place to begin. The wiki page (https://trac.torproject.org/projects/tor/wiki/org/projects/Tor/Multithreaded...) indicates, basically, that a worker thread pool with a work queue to dispatch crypto events should be the right approach, and I agree.
>
> Is it acceptable to link an external library to the project as a dependency?

It depends, I'd say. Most of the data structures we're talking about here are ones that allow both lockless and locked implementations. So my ideal implementation would be to have the ability to use lockless structures where available, but a locked implementation otherwise. This would let us work with better lockless libraries if they come along, continue to run on operating systems or CPUs that don't support liburcu, and migrate to another system in the future in case a better one comes along.

But personally, I would be very surprised if this turned out to make a very big difference: even symmetric crypto is pretty slow in comparison to even the most obvious work-queue implementations, right? (If I'm missing something there, please let me know.)

cheers,

-- Nick
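
A minimal sketch of the "lockless where available, locked otherwise" idea: callers depend only on a small queue interface, and a factory picks whichever backend is registered. This is a Python illustration with hypothetical names (`WorkQueue`, `LockedWorkQueue`, `make_work_queue`); no lockless backend is implemented here, and nothing below reflects liburcu's actual API.

```python
import collections
import threading
from abc import ABC, abstractmethod


class WorkQueue(ABC):
    """Interface the rest of the code depends on."""

    @abstractmethod
    def push(self, item) -> None:
        """Append an item to the tail of the queue."""

    @abstractmethod
    def pop(self):
        """Return the oldest item, or None if the queue is empty."""


class LockedWorkQueue(WorkQueue):
    """Portable fallback: a deque guarded by a mutex."""

    def __init__(self) -> None:
        self._items = collections.deque()
        self._lock = threading.Lock()

    def push(self, item) -> None:
        with self._lock:
            self._items.append(item)

    def pop(self):
        with self._lock:
            return self._items.popleft() if self._items else None


# Registry of backends. A lockless backend would register itself here on
# platforms where one is available (hypothetical; not implemented).
_BACKENDS = {"locked": LockedWorkQueue}


def make_work_queue(preferred: str = "lockless") -> WorkQueue:
    """Return the preferred backend if registered, else the locked one."""
    return _BACKENDS.get(preferred, LockedWorkQueue)()
```

Because callers only see `push` and `pop`, swapping in another backend later means adding one entry to the registry rather than touching the call sites.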

David Goulet

1 Feb

1:49 a.m.


On 12-01-31 03:42 PM, Nick Mathewson wrote:

> On Tue, Jan 31, 2012 at 2:46 PM, David Goulet dgoulet@ev0ke.net wrote:
>> Hi everyone,
>>
>> To help the Tor project, I'll contribute some of my spare time to improving multithreading in the Tor code base.
>>
>> I've spoken a bit with Nick M., and it seems the crypto lib is an important place to begin. The wiki page (https://trac.torproject.org/projects/tor/wiki/org/projects/Tor/Multithreaded...) indicates, basically, that a worker thread pool with a work queue to dispatch crypto events should be the right approach, and I agree.
>>
>> Is it acceptable to link an external library to the project as a dependency?
>
> It depends, I'd say. Most of the data structures we're talking about here are ones that allow both lockless and locked implementations. So my ideal implementation would be to have the ability to use lockless structures where available, but a locked implementation otherwise. This would let us work with better lockless libraries if they come along, continue to run on operating systems or CPUs that don't support liburcu, and migrate to another system in the future in case a better one comes along.

I do agree on that! However, the APIs of those kinds of libs can sometimes be quite complex (if I think about the red-black tree in URCU...), so having a compat layer between locked and lockless is sometimes a bit of work.

Going for a wait-free queue and a normal locked queue is not that difficult (in terms of API/ABI handling), but the question, I think, is this: do we first want to do a "normal locking queue" in the Tor code tree and then go for a lockless one from an external lib, with a compat layer between locked and lockless?

Personally, I think we should go straight for one type of data structure and make sure we create a decent compat layer on top, so that we can switch from one technology to another easily.

Does that make sense to you?

> But personally, I would be very surprised if this turned out to make a very big difference: even symmetric crypto is pretty slow in comparison to even the most obvious work-queue implementations, right? (If I'm missing something there, please let me know.)

Well, I'm not too knowledgeable about crypto implementations, but if some hardware can be used to do the job, it will considerably speed up the process, so a situation where the crypto goes faster than queuing events is a possibility (if I understand the question right).

Cheers! David


Nick Mathewson

3 Feb

4:46 p.m.

On Tue, Jan 31, 2012 at 8:49 PM, David Goulet dgoulet@ev0ke.net wrote: [...]

> Going for a wait-free queue and a normal locked queue is not that difficult (in terms of API/ABI handling), but the question, I think, is this: do we first want to do a "normal locking queue" in the Tor code tree and then go for a lockless one from an external lib, with a compat layer between locked and lockless?
>
> Personally, I think we should go straight for one type of data structure and make sure we create a decent compat layer on top, so that we can switch from one technology to another easily.
>
> Does that make sense to you?

I think that makes sense, if I understand you correctly. If what we need is a work queue, for example, I don't much care what the initial implementation is, so long as it is easy to add others.

yrs,

-- Nick

Mike Perry

7:46 a.m.

Thus spake Nick Mathewson (nickm@alum.mit.edu):

> On Tue, Jan 31, 2012 at 2:46 PM, David Goulet dgoulet@ev0ke.net wrote:
>> To help the Tor project, I'll contribute some of my spare time to improving multithreading in the Tor code base.
>>
>> I've spoken a bit with Nick M., and it seems the crypto lib is an important place to begin. The wiki page (https://trac.torproject.org/projects/tor/wiki/org/projects/Tor/Multithreaded...) indicates, basically, that a worker thread pool with a work queue to dispatch crypto events should be the right approach, and I agree.
>>
>> Is it acceptable to link an external library to the project as a dependency?
>
> It depends, I'd say. Most of the data structures we're talking about here are ones that allow both lockless and locked implementations. So my ideal implementation would be to have the ability to use lockless structures where available, but a locked implementation otherwise. This would let us work with better lockless libraries if they come along, continue to run on operating systems or CPUs that don't support liburcu, and migrate to another system in the future in case a better one comes along.
>
> But personally, I would be very surprised if this turned out to make a very big difference: even symmetric crypto is pretty slow in comparison to even the most obvious work-queue implementations, right? (If I'm missing something there, please let me know.)

Linus Torvalds emerges from 2008 to agree with Nick, and to add: http://www.realworldtech.com/forums/index.cfm?action=detail&id=91906&...

For extra lullz, note the "topic" is "1000 cores SMP is going to happen". Is this a fake thread, or just some sketch web forum? I can't find it online anywhere else, and it has a pretty high volume of idiocy, even for 2008. Other than Linus, of course. My heart goes out to you, young(er) Torvalds, if that is the real you...

It's also possible that RCU has been through enough trial by fire since then to have caused Real 2012 Linus to disagree with Fake 2008 Linus.

Either way, it sounds like good sense to make sure we have the option to say to people "Omg, you hit that crazy crash under heavy crypto load? Try building with --disable-non-determinism this time."

-- Mike Perry
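
A minimal sketch of the "have a deterministic option" idea: the same entry point can run jobs serially in the calling thread or hand them to a thread pool, chosen by a flag. This is a Python illustration, not Tor code; the `deterministic` parameter and the function names are hypothetical, and SHA-256 again stands in for real crypto work.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def crypto_job(payload: bytes) -> str:
    """Stand-in for real crypto work."""
    return hashlib.sha256(payload).hexdigest()


def process_all(payloads, deterministic: bool = False, n_workers: int = 4):
    """Run crypto_job over payloads, returning results in input order.

    deterministic=True runs everything serially in the calling thread,
    which removes thread scheduling as a variable while debugging.
    """
    if deterministic:
        return [crypto_job(p) for p in payloads]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Executor.map yields results in the order of the inputs.
        return list(pool.map(crypto_job, payloads))
```

Both paths return identical results for the same input, so a crash that appears only with `deterministic=False` points at the concurrency layer rather than the crypto itself.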

David Goulet

3:02 p.m.


On 12-02-03 02:46 AM, Mike Perry wrote:

> Either way, it sounds like good sense to make sure we have the option to say to people "Omg, you hit that crazy crash under heavy crypto load? Try building with --disable-non-determinism this time."

Haha! I'll remember that one! :).

I do agree with you!

This class of lockless algorithms can bring more edge cases and cause a tremendous amount of debugging work, since the bugs are often not easily reproducible. The Tor project is too "sensitive", I think, to hit those kinds of issues in production, so normally, before adding RCU data structures, a huge number of tests have to be done.

Anyhow, you are right, it should be done with the two options at least. We'll see if RCU brings a significant performance/scalability improvement before going "full throttle" with it.

Cheers! David



tor-dev@lists.torproject.org


participants (4)

David Goulet
Mike Perry
Nick Mathewson
Watson Ladd