On Sat, Sep 2, 2017 at 4:16 AM, Peter Schwabe <peter@cryptojedi.org> wrote:
Yawning Angel <yawning@schwanenlied.me> wrote:
Hi Yawning, hi all,
Note: I'm not hating on Farfalle; I need to look at it more, and the last time I gave serious thought to this question in a Tor context was back around when Prop 261 was being drafted.
The answer to this from my point of view is "not slow to the point where the network falls over", which I'll admit is extremely handwavy. Truth be told, I have no idea what fraction of relays run on which microarchitectures these days.
Looking at the Farfalle and KangarooTwelve papers, Kravatte may be OK with AVX2, assuming I'm extrapolating correctly. But while it's probably reasonable to assume that all the fast existing relays have AES-NI, I don't know what fraction of those predate AVX2.
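As a rough illustration of the AES-NI/AVX2 question, an individual operator on x86 Linux could check whether their CPU advertises both features. This is a minimal sketch, assuming the usual /proc/cpuinfo layout where a "flags" line lists "aes" and "avx2"; the helper names are hypothetical, and it says nothing about the network-wide fraction, which relays do not report.

```python
def cpu_crypto_flags(cpuinfo_text: str) -> dict:
    """Report whether the first 'flags' line of /proc/cpuinfo-style text
    lists the AES-NI ('aes') and AVX2 ('avx2') feature flags (x86 Linux)."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags = set(value.split())  # exact token match, so 'avx' != 'avx2'
            return {"aes": "aes" in flags, "avx2": "avx2" in flags}
    return {"aes": False, "avx2": False}  # no flags line found


def read_local_cpu_flags(path: str = "/proc/cpuinfo") -> dict:
    """Hypothetical helper: read this machine's cpuinfo (Linux only)."""
    with open(path) as f:
        return cpu_crypto_flags(f.read())


if __name__ == "__main__":
    sample = "processor\t: 0\nflags\t\t: fpu sse2 aes avx\n"
    print(cpu_crypto_flags(sample))  # {'aes': True, 'avx2': False}
```

A CPU like the one in the sample would be an AES-NI machine that predates AVX2, which is exactly the case the question above worries about.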
You should end up with something like 13 cycles per byte for Farfalle with the Keccak permutation on Skylake. Would there be some way to test what effects this has on overall performance without harming any users?
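To get a feel for what 13 cycles per byte means in practice, the cost can be converted into single-core throughput. This is a back-of-the-envelope sketch; the 3.0 GHz clock is an assumed example, not a measured relay figure.

```python
def throughput_mb_per_s(cycles_per_byte: float, clock_hz: float) -> float:
    """Bytes one core can process per second, in MB/s (10**6 bytes)."""
    if cycles_per_byte <= 0 or clock_hz <= 0:
        raise ValueError("cycles_per_byte and clock_hz must be positive")
    return clock_hz / cycles_per_byte / 1e6


if __name__ == "__main__":
    # Hypothetical 3.0 GHz core; 13 cpb is the Farfalle/Keccak figure quoted above.
    print(f"{throughput_mb_per_s(13, 3.0e9):.0f} MB/s per core")  # prints "231 MB/s per core"
```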
If this is *clearly* too slow, then it might be interesting to try the Farfalle construction with different permutations to see how far you can push performance.
I think the first step here is to instrument relays to figure out what fraction of their cryptography is relay cell cryptography: this would tell us what slowdown to expect. (It _should_ be about a third of our current cell crypto load, but surprises have certainly been known to happen!)
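Once that fraction is measured, the expected overall slowdown follows from an Amdahl's-law style calculation: only the relay-cell-crypto share of CPU time gets more expensive. This is a simplified sketch, assuming the relay is CPU-bound and that everything outside cell crypto keeps its current cost; the example inputs are hypothetical.

```python
def overall_slowdown(crypto_fraction: float, crypto_cost_ratio: float) -> float:
    """Amdahl-style estimate of total CPU time relative to today.

    crypto_fraction:   share of relay CPU time spent on relay cell crypto (0..1).
    crypto_cost_ratio: new cell-crypto cost divided by the current cost.
    """
    if not 0.0 <= crypto_fraction <= 1.0:
        raise ValueError("crypto_fraction must be in [0, 1]")
    # Unchanged part + the crypto part scaled by its new relative cost.
    return (1.0 - crypto_fraction) + crypto_fraction * crypto_cost_ratio


if __name__ == "__main__":
    # Hypothetical: 20% of CPU in cell crypto, new scheme 3x as expensive.
    print(round(overall_slowdown(0.2, 3.0), 2))
```

With those made-up inputs the relay would need roughly 1.4x its current CPU time, which shows why the measured fraction matters as much as the raw cycles-per-byte figure.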
The current performance we have is much faster than 13 cpb: per byte, we're at approximately one AES pass plus one third of a SHA-1 pass. (The "one third" is because only clients and exits do the SHA-1 step.)
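The comparison can be written out explicitly: today's per-byte cost is one AES pass plus a one-third-weighted SHA-1 pass, and the ratio against 13 cpb is the crypto-cost multiplier to combine with the measured fraction of relay crypto load. This is a sketch only; the AES and SHA-1 figures in the example are placeholders, not benchmark results, and should be replaced with numbers measured on real relay hardware.

```python
def current_cpb(aes_cpb: float, sha1_cpb: float, sha1_share: float = 1 / 3) -> float:
    """Average per-byte cell-crypto cost today: every hop does one AES pass,
    while only a share of hops (clients and exits) also run SHA-1."""
    return aes_cpb + sha1_share * sha1_cpb


def cost_ratio(new_cpb: float, aes_cpb: float, sha1_cpb: float) -> float:
    """How many times more expensive the new scheme would be per byte."""
    return new_cpb / current_cpb(aes_cpb, sha1_cpb)


if __name__ == "__main__":
    # Placeholder measurements, NOT real benchmark numbers: substitute the
    # cycles/byte you measure for AES (with AES-NI) and SHA-1 on your hardware.
    aes, sha1 = 1.0, 6.0
    print(f"current ~{current_cpb(aes, sha1):.1f} cpb, "
          f"ratio ~{cost_ratio(13, aes, sha1):.1f}x")
```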
It would be hard to experiment to see whether some slowdown would be acceptable: the major increase in load would be on the relay side, and it's hard to tell how extra relay load affects the actual network without actually deploying it.
Yawning is right that doing multithreaded cell crypto is important here too: there are unused cores at the moment.