Turns out that almost everything you said about SHA3 vs SHA256 performance is wrong: http://bench.cr.yp.to/impl-hash/blake256.html Blake256 performs better except on the Cortex A. On the ARM v6 it outperforms SHA256. The architectures where it wins include the ppc32, hardly anyone's idea of a server powerhouse. Furthermore, crypto efficiency is less likely to be a bottleneck on a client than on a node: server architectures matter much more because we do a lot more crypto on them. (This isn't true for each connection, but servers handle more connections than clients.)
Secondly, SHA256 is already weaker than an ideal hash function. Joux's multicollision attack works on all Merkle-Damgård constructions, and finds multicollisions faster than is possible for an ideal hash. Length extension attacks are why HMAC needs 2 hash invocations instead of 1, something that any speed comparison should take into account. (HMAC is a bad idea anyway: its quadratic security bounds are not the best possible, and we have to use nonces anyway to prevent replay attacks, so Wegman-Carter is the better choice, being both faster and more secure. GCM would be an example of this.)
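To make the "2 hashes instead of 1" point concrete, here is a minimal sketch of HMAC-SHA256 written out by hand in Python. The function name hmac_sha256_manual is made up for illustration; the construction itself follows the standard HMAC definition (RFC 2104), and the result is checked against the standard library's hmac module.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-256 processes input in 64-byte blocks


def hmac_sha256_manual(key: bytes, msg: bytes) -> bytes:
    """Illustrative HMAC-SHA256: two nested SHA-256 invocations."""
    # Keys longer than one block are hashed down first.
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha256(key).digest()
    # Pad the key with zero bytes to exactly one block.
    key = key.ljust(BLOCK_SIZE, b"\x00")

    ipad = bytes(b ^ 0x36 for b in key)
    opad = bytes(b ^ 0x5C for b in key)

    # First hash: inner digest over the padded key and the message.
    inner = hashlib.sha256(ipad + msg).digest()
    # Second hash: the outer wrap is what blocks length extension.
    return hashlib.sha256(opad + inner).digest()


if __name__ == "__main__":
    tag = hmac_sha256_manual(b"key", b"message")
    # Should agree with the standard library implementation.
    print(tag == hmac.new(b"key", b"message", hashlib.sha256).digest())
```

The outer call only hashes one extra key block plus a 32-byte digest, so the overhead is a fixed cost per message: it matters most for short messages and fades for long ones.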
As a KDF, none of this really matters, and for signatures collision resistance is still the most important thing. But sometimes we do depend on random oracle assumptions in proofs, and SHA3 is designed to be a better approximation to a random oracle than SHA2.
Sincerely, Watson Ladd