In reference to: https://lists.torproject.org/pipermail/tor-dev/2011-November/002999.html
FOR A HASH FUNCTION: SHA256, switching to SHA3 in 2012 when it comes out. It might be worthwhile waiting for SHA3 in most places and skipping over the SHA256 stage entirely.
The AES contest resulted in a cipher that was much faster than 3-DES and probably safer as well.
It is looking like the SHA-3 contest will result in a hash function that is slightly slower than SHA-256, and not obviously safer either!
There are some details about the performance issue: the most efficient SHA-3 candidates are faster than SHA-256 on large, expensive, powerful x86_64 servers, and they are faster on long messages (more than a couple thousand bytes). This almost certainly doesn't matter to the tor project! (Nor, I suspect, to almost anyone else.)
I'm guessing (sorry for my ignorance about these important facts) that tor uses secure hashes in two ways: first as the "nails" holding the crypto together, such as in commitments, key-derivation, HMAC, and so forth, and second to integrity-check the bulk data in chunks that are approximately "packet sized" -- a few thousand bytes. On a powerful x86_64 server for 4096 bytes [1], the most efficient SHA-3 candidate (Blake-256) takes about 33,000 cycles and SHA-256 takes about 73,000. So the difference is about 40,000 CPU cycles. Assuming that all of this is done on a single core of a 3.4 GHz chip, that means SHA-256 takes about 12 microseconds longer to hash this 4096-byte packet.
I don't think that makes much of a difference to anyone. You'd have to process data at more than 190,000,000 bytes per second before this would exceed your ability to do it that with SHA-256 on a single one of the 4 cores that come in that chip. Is that a realistic amount of data for a single tor node to process per second in forseeable future? (Honest question: I have no idea if it is although I would guess not.) Anyway, if you *do* want to process more than 190 MBytes/s on a fancy server in the next few years, you can probably just use more than one of its cores.
On the other hand, what if someone wants to deploy Tor in a 32-bit ARM CPU such as in a Freedom Box or in a smart phone. When it is doing 4096-byte packets, Blake-256 is actually less efficient than SHA-256 by about 36,000 cycles. Since the chip in that device is running at a slower clock rate (maybe 800 MHz), then it takes 45 microseconds more to hash that 4096-byte packet with Blake-256 than with SHA-256. If you hash those packets with the slower of the two algorithms (Blake-256), you could handle about 25 MBytes/s using 100% of the only CPU in the system. If you used SHA-256 instead you could handle 35 MBytes/s. If you are using say, 90% of that CPU for other tasks. such as playing a game or watching a movie while running Tor in the background, or even playing a movie which is streaming in over Tor live, then this is the difference between being able to process 2.5 MBytes/s (Blake-256) and 3.5 MBytes/s (SHA-256). That seems be a difference that might matter in practice, unlike the performance difference on expensive x86_64 servers.
Okay, what about the "not obviously safer" part? I think there was a bit of a panic a few years ago, in the aftermath of Wang Xiaoyun's breakthrough on SHA-1, that someone might suddenly figure out a way to find collisions in SHA-256. This panic spurred the creation of the SHA-3 project. However, it seems like in the intervening years nobody has published any techniques that really threaten the safety of SHA-256, so now I'm personally no longer so confident that SHA-3 candidates like Blake will endure longer before someone finds a fatal flaw in them than SHA-256 will. SHA-256 has endured substantial analysis by experts for about a decade now. Blake and its fellow competitors have had about three years.
I'm not saying that I'm confident that SHA-256 will outlast Blake! I'm saying it could go either way.
Bottom line: I would probably move ahead toward SHA-256 and let SHA-3 mature for a few extra years before planning to move to that, if I were you.
Regards,
Zooko
Turns out that almost everything you said about SHA3 vs SHA256 performance is wrong: http://bench.cr.yp.to/impl-hash/blake256.html http://bench.cr.yp.to/impl-hash/blake256.html Blake256 performs better except on the Cortex A. On the ARM v6 it outperforms SHA256. This includes the ppc32, hardly anyones idea of a server powerhouse. Furthermore, crypto efficiency is less likely to be a bottleneck on a client then a node: server architectures matter much more because we do a lot more crypto on them. (This isn't true for each connection but servers handle more connections then clients.)
Secondly SHA256 is already weaker then an ideal hash function. Joux's multicollision attack works on all Merkel-Damgard constructions, and gives multicollisions faster then is possible for an ideal hash. Length extension attacks make HMAC use 2 hashes instead of 1, something that any speed comparison should remember. (HMAC is a bad idea anyway: quadratic security bounds are not the best possible, we have to use nonces anyway to prevent replay attacks, so Wegman-Carter is a better idea for better in{faster, more secure}. GCM would be an example of this.)
As a KDF none of this really matters, and for signatures collision resistance is still the most important thing. But sometimes we do depend on random oracle assumptions in proofs, and SHA3 is designed to be a better approximation to a random oracle then SHA2.
Sincerely, Watson Ladd
I too have been following the development of SHA-3 and will toss in my 2c here.
On 11/01/2011 06:50 AM, Watson Ladd wrote:
Turns out that almost everything you said about SHA3 vs SHA256 performance is wrong: http://bench.cr.yp.to/impl-hash/blake256.html http://bench.cr.yp.to/impl-hash/blake256.html Blake256 performs better except on the Cortex A. On the ARM v6 it outperforms SHA256. This includes the ppc32, hardly anyones idea of a server powerhouse.
I don't know about the specific benchmarks you mentioned, but most of the choices fall in this range of 5 to 12 cycles per hashed byte on modern CPUs.
SHA-3 is also being developed with attention to the amount of circuitry ("die area") needed to implement it in hardware. So it's possible that hardware acceleration will appear for SHA-3 sooner/instead of SHA-2.
Furthermore, crypto efficiency is less likely to be a bottleneck on a client then a node:
Desktop PCs with a 50 W CPU are shrinking relative to the whole client pie. Mobile devices are the growing slice, there the concerns are different: is hardware acceleration available? and what is power consumption?
If I had to guess what would be most available and power-efficient on mobile devices 5 years from now, I'd guess SHA-3.
server architectures matter much more because we do a lot more crypto on them. (This isn't true for each connection but servers handle more connections then clients.)
How big of a network pipe does a dedicated Tor server need to bottleneck on the crypto?
Doesn't the architecture of Tor prefer a larger number of smaller nodes?
Secondly SHA256 is already weaker then an ideal hash function. Joux's multicollision attack works on all Merkel-Damgard constructions, and gives multicollisions faster then is possible for an ideal hash.
Agreed, SHA-3 will fix some problems. Some of these things we've been working around so long that they seem normal.
Length extension attacks make HMAC use 2 hashes instead of 1, something that any speed comparison should remember. (HMAC is a bad idea anyway: quadratic security bounds are not the best possible, we have to use nonces anyway to prevent replay attacks, so Wegman-Carter is a better idea for better in{faster, more secure}. GCM would be an example of this.)
I know Wegman-Carter is not new, but where is it being used in practice?
It looks like NIST took a while to figure out the security on GCM:
http://www.csrc.nist.gov/groups/ST/toolkit/BCM/documents/comments/CWC-GCM/Fe... http://csrc.nist.gov/groups/ST/toolkit/BCM/documents/Joux_comments.pdf
As a KDF none of this really matters,
What matters for a KDF is some assurance that the attacker will not have access to a significantly faster implementation than the defender. I believe scrypt has the best claim to that right now, although something based on the arcfour algorithm could do a little better.
This is a case where performance for the defender translates to additional security (he can set the iteration count higher).
Having benchmarks on optimized hardware implementations is thus important.
and for signatures collision resistance is still the most important thing. But sometimes we do depend on random oracle assumptions in proofs, and SHA3 is designed to be a better approximation to a random oracle then SHA2.
There's sometimes also a benefit of being with the current NIST recommendation. I suspect more users will migrate off of SHA-1 to SHA-3 than they will to SHA-2.
NIST may eventually 'deprecate' SHA-2 in favor of SHA-3 due to just the length extension issue. Which is not to say that I think there's a real problem using SHA-2 correctly, only that you may end up having to explain repeatedly why it's not a problem.
- Marsh
On Tue, Nov 1, 2011 at 9:30 AM, Marsh Ray marsh@extendedsubset.com wrote:
I too have been following the development of SHA-3 and will toss in my 2c here.
Hi Marsh! You made several good points, a few of which I quoted below. Your points make me think, speaking loosely, that SHA-3 will turn out to be the best function to use due to reasons of popularity or bureaucratic fiat. Perhaps someday the chips you buy will come with SHA-3 circuits built-in but not SHA-2, and perhaps people will start looking at you funny if you use SHA-2 when everyone else is using SHA-3. These may be good reasons—perhaps better reasons that the moderate performance issues I previously mentioned or the completely indefensible vague unease I've expressed about SHA-3 being too new.
You did make one small technical mistake though:
SHA-3 is also being developed with attention to the amount of circuitry ("die area") needed to implement it in hardware. So it's possible that hardware acceleration will appear for SHA-3 sooner/instead of SHA-2.
Although the SHA-3 designers have indeed tried to optimize for that, I think SHA-256 is actually still better. See Fig. 17 of http://eprint.iacr.org/2009/510.pdf .
Below my signature is just me quoting a few of the points you made. :-)
Regards,
Zooko
Agreed, SHA-3 will fix some problems. Some of these things we've been working around so long that they seem normal.
…
There's sometimes also a benefit of being with the current NIST recommendation. I suspect more users will migrate off of SHA-1 to SHA-3 than they will to SHA-2.
…
NIST may eventually 'deprecate' SHA-2 in favor of SHA-3 due to just the length extension issue. Which is not to say that I think there's a real problem using SHA-2 correctly, only that you may end up having to explain repeatedly why it's not a problem.
On Tue, Nov 1, 2011 at 12:46 PM, Zooko O'Whielacronx zooko@zooko.com wrote:
On Tue, Nov 1, 2011 at 9:30 AM, Marsh Ray marsh@extendedsubset.com wrote:
I too have been following the development of SHA-3 and will toss in my 2c here.
[....ommitted...]
Although the SHA-3 designers have indeed tried to optimize for that, I think SHA-256 is actually still better. See Fig. 17 of http://eprint.iacr.org/2009/510.pdf .
Its wonderful that you provided references, and even told me what diagram to look for. But figure 17 has every finalist other then Skein outperforming SHA2 in hardware (last column is bits per second), and that was optimizing for speed. In the case of Keccak, that performance is impressively greater. Its possible at the 512 level these reverse, but I don't see that in there. Sincerely, Watson Ladd
Below my signature is just me quoting a few of the points you made. :-)
Regards,
Zooko
Agreed, SHA-3 will fix some problems. Some of these things we've been working around so long that they seem normal.
…
There's sometimes also a benefit of being with the current NIST recommendation. I suspect more users will migrate off of SHA-1 to SHA-3 than they will to SHA-2.
…
NIST may eventually 'deprecate' SHA-2 in favor of SHA-3 due to just the length extension issue. Which is not to say that I think there's a real problem using SHA-2 correctly, only that you may end up having to explain repeatedly why it's not a problem.
tor-dev mailing list tor-dev@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
-- "Those who would give up Essential Liberty to purchase a little Temporary Safety deserve neither Liberty nor Safety." -- Benjamin Franklin
On Tue, Nov 1, 2011 at 1:36 PM, Watson Ladd watsonbladd@gmail.com wrote:
See Fig. 17 of http://eprint.iacr.org/2009/510.pdf .
Its wonderful that you provided references, and even told me what diagram to look for. But figure 17 has every finalist other then Skein outperforming SHA2 in hardware (last column is bits per second), and that was optimizing for speed. In the case of Keccak, that performance is impressively greater. Its possible at the 512 level these reverse, but I don't see that in there.
I'm sorry, once again I failed to spell out explicitly what I was thinking.
In this case it is that the relevant metric is area rather than throughput. This is because of what Marsh Ray brought up—the prospect that future chips might come with a SHA-3 or a SHA-256 circuit built in. The advantage to a chip designer of adding such a circuit in is, of course, that their customers may want it and so buy their chip instead of their competitors'. The disadvantage is the cost in design complexity (~= time) and die area (~= marginal cost to print one of these chips). I'm told that some of these embedded chips are exquisitely sensitive to marginal costs, such that a few pennies can make the difference between success and failure of the product!
Therefore, in the context of whether we can expect SHA-3 and/or SHA-256 circuits to come built into our chips in the future, the fact that SHA-256 can be implemented in a smaller circuit means it would be cheaper for a chip maker to include it.
As for performance, note that the vertical axis of Fig. 17 is in Gbit/s. Even the slowest implementation of SHA-256 was at something like 0.8 Gbit/s, which is about 0.1 Gbyte/s which is about 100 MByte/s, which is more than any one circuit will probably be asked to handle. If the chip designer expects the user to need more than 100 MByte/s throughput, he can put multiple circuits in there. For example the new SPARC T4 chip comes with 8 CPU cores, each with its own SHA-256 circuit (as well as AES and other algorithms).
On the other hand, I still think back to Marsh's observation that the *perception* of superiority of SHA-3 over SHA-2 might mean that the actual chips of the future come with SHA-3 even if it is more expensive.
Oh neat! I just learned that the 64-bit ARMv8 is going to come with SHA-256: http://www.theregister.co.uk/2011/10/28/arm_holdings_arm_v8/
Very cool.
Another factor which might prolong SHA-256's life is its role as the proof-of-work in Bitcoin. This causes there to be a global race for efficient SHA-256 implementation, and whoever gets even a little bit ahead in that race can rake in profits. The current leading technologies are ATI GPUs and FPGAs, but if there were a chip with an efficient enough SHA-256 built in, perhaps they could sell it to Bitcoin miners.
Regards,
Zooko
On Tue, Nov 1, 2011 at 1:20 PM, Zooko O'Whielacronx zooko@zooko.com wrote:
... Therefore, in the context of whether we can expect SHA-3 and/or SHA-256 circuits to come built into our chips in the future, the fact that SHA-256 can be implemented in a smaller circuit means it would be cheaper for a chip maker to include it.
my strong preference for SHA-2-256 is precisely for this reason. i use multiple systems with hardware accelerated SHA-2-256. these systems will never have accelerated SHA-3.
adoption of SHA-3 into hardware designs may change this in the future; i am skeptical :)
On Tue, 1 Nov 2011 14:51:00 -0700 coderman coderman@gmail.com wrote:
On Tue, Nov 1, 2011 at 1:20 PM, Zooko O'Whielacronx zooko@zooko.com wrote:
... Therefore, in the context of whether we can expect SHA-3 and/or SHA-256 circuits to come built into our chips in the future, the fact that SHA-256 can be implemented in a smaller circuit means it would be cheaper for a chip maker to include it.
my strong preference for SHA-2-256 is precisely for this reason. i use multiple systems with hardware accelerated SHA-2-256. these systems will never have accelerated SHA-3.
adoption of SHA-3 into hardware designs may change this in the future; i am skeptical :) _______________________________________________ tor-dev mailing list tor-dev@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
I'm very enthusiastic about one of five SHA-3 finalist -- Keccak. I contact with the Keccak team about some ideas and they responded readily. IMHO Keccak is more perspespective than Skein or ChaCha as a universal cryptoprimitive to make most of symmetryc algos obsolete.
Keccak is not only a hash with any possible length of output but PBKDF, KDF, MAC, old-style HMAC, Stream cipher, random acces Stream Cipher, stronge authenticated Stream Cipher, per block or per complete message authenticated Stream Cipher and possible many more, proved to be secure in random oracle model and easy to use to make most of protocols simple.
The Keccak team pointed me to a method for executing stream cipher encryption and authenticated encryption based on sponge.
The first presentation of the so called duplexing mode, using a sponge for MACing and encryption was at the SHA-3 conference in Santa Barbara in 2010. You can download the paper from here http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SH... And recently presented at SAC2011, here you can have a look at the presentation http://sac2011.ryerson.ca/SAC2011/BDPVA.pdf
If NIST make the Keccak a SHA-3 finalist then be prepare to integrate it as a good flexible choice. Not only as a hash but virtually as everything symmetric algos. Unfortunately, most of the Keccak properties may be standartizated so slow.
And most of that non-hash properties seems non-conservative, experimental, innovatory and ambitious but very amazingly perspective and good designed with respectful research works and good reputations of authors.
See www.keccak.noekeon.org for details.
On Tue, Nov 1, 2011 at 5:50 AM, Watson Ladd watsonbladd@gmail.com wrote:
Turns out that almost everything you said about SHA3 vs SHA256 performance is wrong:
I should have specified that from the page I referenced -- http://bench.cr.yp.to/results-hash.html -- I was looking at the first x86_64 machine: amd64 Sandy Bridge (206a7); 2011 Intel Core i7-2600K; 4 x 3400MHz; "sandy0" and the first ARM machine: armeabi (v7-A, Cortex A8); 2009 Freescale i.MX515; 1 x 800MHz; "gcc33". Given that I was using the numbers from those two machines, I believe what I wrote was correct.
Now, perhaps those weren't the right machines to use for examples. My reasoning was that they are the newest and most efficient of their respective instruction sets, so they're likely to be more representative of their successors. I don't know why SHA-256 was slower on the ARMv6 that you mentioned.
Furthermore, crypto efficiency is less likely to be a bottleneck on a client then a node: server architectures matter much more because we do a lot more crypto on them. (This isn't true for each connection but servers handle more connections then clients.)
My letter, to which you replied is a quantitative argument to the contrary. The bottom line quantities that I came up with were that if you used 100% of one Sandy Bridge core, then you could hash at least 190 Mbytes/s, even with the less efficient of the two hash functions (SHA-256), and if you used 10% of a Cortex-A8, then you could hash at least 2.5 Mbytes/s even with the less efficient of the two (Blake-256).
As far as I understand, the case where a server would be asked to handle more than 190 Mbytes/s will be rarer than the case where a client will be asked to handle more than 2.5 Mbytes/s, during the lifetime of the next version of the Tor protocol. I don't know much about Tor performance so I could be all wrong about that.
Now, maybe the assumption that you can afford to dedicate 100% of a server core but only 10% of the embedded CPU is wrong. Even if you change the assumption to 50% of one server core vs. 50% of your embedded CPU, the server will probably still be network-limited but the embedded chip may be CPU-limited.
Secondly SHA256 is already weaker then an ideal hash function.
…
and SHA3 is designed to be a better approximation to a random oracle then SHA2.
I'm aware of these considerations, and I don't disagree! Thank you for bringing them up. But they don't prove that SHA-256 will prove vulnerable to any real attack. Likewise, there could be some unforeseen critical weakness in a SHA-3 candidate. We don't have any proof one way or the other about these questions. My personal, under-informed guess is that none of SHA-256, Blake, Skein will prove vulnerable to any real attack during the lifespan of the next version of the Tor protocol.
I would be interested in other people's personal guesses about that question, especially if you've studied the actual algorithms and attacks closely.
Regards,
Zooko