Hi George,
Thanks for the update.
On Wed, Jun 10, 2020 at 2:05 PM George Kadianakis <desnacked@riseup.net> wrote:
Tevador, thanks a lot for your tailored work on equix. This is fantastic. I have a question that I don't see addressed in your very well-written README. In your initial email, we discussed how Equihash does not have good GPU resistance: https://lists.torproject.org/pipermail/tor-dev/2020-May/014268.html
Since equix is using Equihash, isn't this gonna be a problem here too? I'm not too worried about ASIC resistance, since I doubt someone is gonna build ASICs for this problem just yet, but script kiddies with their CS:GO graphics cards attacking equix is something I'm concerned about. I bet you have thought of this, so I'm wondering what your take is here.
Equihash runs much faster on GPUs only if the memory requirements exceed the size of the CPU cache. This is the case for most Equihash variants that are in use by cryptocurrencies (e.g. 200,9 and 144,5), but doesn't apply to Equi-X, which uses only 2 MB.
The GPU resistance of Equi-X is based on two facts:

1) Each solution attempt uses a different hash function. GPUs cannot compile new kernels fast enough (it would require >1000 kernels per second), so they have to run it in emulation, which is much slower. GPUs are also impacted by thread divergence.

2) The entire sorting phase fits into the CPU cache, so CPUs can benefit from memory bandwidth comparable to GPUs (~500 GB/s).
In tevador's initial mail, they point out that the cell should include POW_EFFORT and that we should specify a "minimum effort" value instead of just inserting any effort in the pqueue. I can understand how this can have benefits (see the June discussion between tevador and yoehoduv), but I'm also concerned that this can make us more vulnerable to [ATTACK_BOTTOM_HALF] types of attacks, by completely dropping introduction requests instead of queueing them for an abstract future. I wouldn't be surprised if my concerns are invalid here. Does anyone have intuition?
I don't see any downsides to including the PoW effort in the cell and specifying the minimum effort. The minimum effort is needed to reduce the verification overhead and to ensure that the queue doesn't grow indefinitely. If the effort of the introduction request is lower than the minimum, the service can simply treat it like a request without any PoW (saving a verification call). The exact outcome would depend on the circumstances; normally the request would go to the back of the queue.
I suggest a minimum effort equivalent to ~1 second of CPU time, adjusted upward if the queue is full. This is more efficient than trimming the queue.
The size of the queue should be enough to handle short attack bursts without dropping any requests. I'd say that a reasonable maximum queue size is {bottom half throughput} * {number of seconds the client will wait before retrying}. There is no point in queueing up more requests than this because the client will have already given up on the request by the earliest time it could be serviced.
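For a rough sense of scale, here is the arithmetic with made-up placeholder numbers (not measurements of tor):

    /* Queue sizing sketch; both inputs are hypothetical placeholders. */
    const unsigned bottom_half_throughput = 100; /* requests serviced per second */
    const unsigned client_retry_timeout   = 30;  /* seconds before the client gives up */
    const unsigned max_queue_size =
        bottom_half_throughput * client_retry_timeout; /* = 3000 requests */

Anything queued beyond that bound would only be serviced after the client has already given up.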
- tevador suggests we use two seeds, and always accept introductions with the previous seed. I agree this is a good idea, and it's not as complex as I originally thought (I have trauma from the v3 design where we tried to support multiple time periods at the same time). However, because this doubles the verification time, I decided to wait for dgoulet's scheduler numbers, and for the PoW function to be finalized, to understand whether we can afford the verification overhead.
There is no verification overhead if the seed is included in the request. If an additional 32 bytes is too much, the request can include e.g. the first 4 bytes of the seed. This is enough for the service to select the correct seed from the two active ones. The chance that two subsequent seeds have the same first 32 bits is negligible (and can even be avoided completely).
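For concreteness, a minimal sketch of how the service could pick the full seed from the 4-byte prefix; the struct and function names are made up for illustration and are not part of equix or tor:

    #include <stdint.h>
    #include <string.h>

    #define SEED_LEN        32
    #define SEED_PREFIX_LEN 4

    /* Hypothetical bookkeeping: the service keeps the current and the
     * previous seed active at the same time. */
    struct active_seeds {
        uint8_t current[SEED_LEN];
        uint8_t previous[SEED_LEN];
    };

    /* Return the full seed whose first 4 bytes match the prefix sent by
     * the client, or NULL if neither active seed matches (in which case
     * the request is treated as carrying no valid PoW). */
    static const uint8_t *
    select_seed(const struct active_seeds *seeds,
                const uint8_t prefix[SEED_PREFIX_LEN])
    {
        if (memcmp(prefix, seeds->current, SEED_PREFIX_LEN) == 0)
            return seeds->current;
        if (memcmp(prefix, seeds->previous, SEED_PREFIX_LEN) == 0)
            return seeds->previous;
        return NULL;
    }

When rotating, the service can simply regenerate the new seed if its first 4 bytes happen to collide with the previous one, which is how the ambiguity can be avoided completely.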
- Solar Designer suggested we do Ethash's anti-DDoS trick to avoid instances of [ATTACK_TOP_HALF]. This involves wrapping the final PoW token in a fast hash with a really low difficulty, and having the verifier check that fast-hash PoW first. This means that an attacker trying to flood us with invalid PoW would need to do some work for every PoW instead of it being free. This is a decision we should take at the end, after we do some number crunching and see where we are in terms of verification time and attack models.
This trick is mostly relevant to slow-to-verify algorithms, but it can also be applied to Equi-X by reordering the server algorithm steps:
Input: C, N, E, S

1) Check that C is a valid seed (there may be multiple seeds active at a time).
2) Check that E is above the minimum effort.
3) Check that N hasn't been used before with C.
4) Calculate R = blake2b(C || N || E || S).
5) Check R * E <= UINT32_MAX.
6) Check equix_verify(C || N || E, S) == EQUIX_OK.
7) Put the request in the queue with weight E.
Simple spam attacks will fail at step 5, avoiding the call to equix_verify.
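To make the ordering concrete, here is a rough C sketch of the server-side check. All the helpers below are stand-ins invented for this example (the real code would use tor's blake2b and the equix library API directly); only the order of the steps is the point:

    #include <stdbool.h>
    #include <stdint.h>

    #define MIN_EFFORT 10000u  /* placeholder minimum effort */

    /* Hypothetical helpers corresponding to the steps above. */
    bool     seed_is_active(const uint8_t *seed);                         /* step 1 */
    bool     nonce_seen_with_seed(const uint8_t *seed,
                                  const uint8_t *nonce);                  /* step 3 */
    uint32_t blake2b_r(const uint8_t *seed, const uint8_t *nonce,
                       uint32_t effort, const uint8_t *solution);         /* step 4:
                          first 32 bits of blake2b(C || N || E || S)      */
    bool     equix_solution_ok(const uint8_t *seed, const uint8_t *nonce,
                               uint32_t effort, const uint8_t *solution); /* step 6:
                          wraps equix_verify() on challenge C || N || E   */
    void     enqueue_with_weight(uint32_t effort);                        /* step 7 */

    /* Returns true if the request was accepted into the queue. */
    bool handle_intro_request(const uint8_t *C, const uint8_t *N,
                              uint32_t E, const uint8_t *S)
    {
        if (!seed_is_active(C))              /* 1) valid seed?                       */
            return false;
        if (E < MIN_EFFORT)                  /* 2) below minimum effort: could also
                                                   be treated as a no-PoW request    */
            return false;
        if (nonce_seen_with_seed(C, N))      /* 3) nonce replay?                     */
            return false;

        uint32_t R = blake2b_r(C, N, E, S);  /* 4) one cheap hash                    */
        if ((uint64_t)R * E > UINT32_MAX)    /* 5) spam fails here, before the
                                                   expensive verification            */
            return false;

        if (!equix_solution_ok(C, N, E, S))  /* 6) full Equi-X verification          */
            return false;

        enqueue_with_weight(E);              /* 7) queue with weight E               */
        return true;
    }

The point of the ordering is that everything cheaper than the Equi-X verification is checked first, so a bogus request costs the service at most one blake2b call.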
However, I would not overly rely on this feature because even a single GPU can provide enough fast hashes (5-10 GH/s) for [ATTACK_TOP_HALF] on schemes like Yespower, Argon2 or RandomX. It works for Ethash because the required effort is determined by the computing power of the entire Ethereum network, so the attacker would need to compute billions of fast hashes even to pass the first check.