Hi George,
Thanks for the update.
On Wed, Jun 10, 2020 at 2:05 PM George Kadianakis <desnacked@riseup.net> wrote:
Tevador, thanks a lot for your tailored work on equix. This is fantastic. I have a question that I don't see addressed in your very well-written README. In your initial email, we discussed how Equihash does not have good GPU resistance: https://lists.torproject.org/pipermail/tor-dev/2020-May/014268.html
Since equix is using Equihash, isn't this gonna be a problem here too? I'm not too worried about ASIC resistance, since I doubt someone is gonna build ASICs for this problem just yet, but script kiddies with their CS:GO graphics cards attacking equix is something I'm concerned about. I bet you have thought of this, so I'm wondering what your take is here.
Equihash runs much faster on GPUs only if the memory requirements exceed the size of the CPU cache. This is the case for most Equihash variants that are in use by cryptocurrencies (e.g. 200,9 and 144,5), but doesn't apply to Equi-X, which uses only 2 MB.
The GPU resistance of Equi-X is based on two facts:

1) Each solution attempt uses a different hash function. GPUs cannot compile new kernels fast enough (it would require >1000 kernels per second), so they have to run it in emulation, which is much slower. GPUs are also impacted by thread divergence.

2) The entire sorting phase fits into the CPU cache, so CPUs can benefit from memory bandwidth comparable to GPUs (~500 GB/s).
In tevador's initial mail, they point out that the cell should include POW_EFFORT and that we should specify a "minimum effort" value instead of just inserting any effort in the pqueue. I can understand how this can have benefits (see the June discussion between tevador and yoehoduv), but I'm also concerned that this can make us more vulnerable to [ATTACK_BOTTOM_HALF] types of attacks, by completely dropping introduction requests instead of queueing them for an abstract future. I wouldn't be surprised if my concerns are invalid here. Does anyone have intuition?
I don't see any downsides to including the PoW effort in the cell and specifying the minimum effort. The minimum effort is needed to reduce the verification overhead and to ensure that the queue doesn't grow indefinitely. If the effort of the introduction request is lower than the minimum, the service can simply treat it like a request without any PoW (saving a verification call). The exact outcome would depend on the circumstances; normally the request would go to the back of the queue.
I suggest a minimum effort equivalent to ~1 second of CPU time, adjusted upward if the queue is full. This is more efficient than trimming the queue.
The size of the queue should be enough to handle short attack bursts without dropping any requests. I'd say that a reasonable maximum queue size is {bottom half throughput} * {number of seconds the client will wait before retrying}. There is no point in queueing up more requests than this because the client will have already given up on the request by the earliest time it could be serviced.
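For a rough sense of scale, here is the arithmetic with made-up placeholder numbers (not measurements of tor):

    /* Queue sizing sketch; both inputs are hypothetical placeholders. */
    const unsigned bottom_half_throughput = 100; /* requests serviced per second */
    const unsigned client_retry_timeout   = 30;  /* seconds before the client gives up */
    const unsigned max_queue_size =
        bottom_half_throughput * client_retry_timeout; /* = 3000 requests */

Anything queued beyond that bound would only be serviced after the client has already given up.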
- tevador suggests we use two seeds, and always accept introductions with the previous seed. I agree this is a good idea, and it's not as complex as I originally thought (I have trauma from the v3 design where we tried to support multiple time periods at the same time). However, because this doubles the verification time, I decided to wait for dgoulet's scheduler numbers, and for the PoW function to be finalized, to understand whether we can afford the verification overhead.
There is no verification overhead if the seed is included in the request. If an additional 32 bytes is too much, the request can include e.g. the first 4 bytes of the seed. This is enough for the service to select the correct seed from the two active ones. The chance that two subsequent seeds have the same first 32 bits is negligible (and can even be avoided completely).
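For concreteness, a minimal sketch of how the service could pick the full seed from the 4-byte prefix; the struct and function names are made up for illustration and are not part of equix or tor:

    #include <stdint.h>
    #include <string.h>

    #define SEED_LEN        32
    #define SEED_PREFIX_LEN 4

    /* Hypothetical bookkeeping: the service keeps the current and the
     * previous seed active at the same time. */
    struct active_seeds {
        uint8_t current[SEED_LEN];
        uint8_t previous[SEED_LEN];
    };

    /* Return the full seed whose first 4 bytes match the prefix sent by
     * the client, or NULL if neither active seed matches (in which case
     * the request is treated as carrying no valid PoW). */
    static const uint8_t *
    select_seed(const struct active_seeds *seeds,
                const uint8_t prefix[SEED_PREFIX_LEN])
    {
        if (memcmp(prefix, seeds->current, SEED_PREFIX_LEN) == 0)
            return seeds->current;
        if (memcmp(prefix, seeds->previous, SEED_PREFIX_LEN) == 0)
            return seeds->previous;
        return NULL;
    }

When rotating, the service can simply regenerate the new seed if its first 4 bytes happen to collide with the previous one, which is how the ambiguity can be avoided completely.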
- Solar Designer suggested we do Ethash's anti-DDoS trick to avoid instances of [ATTACK_TOP_HALF]. This involves wrapping the final PoW token in a fast hash with a really low difficulty, and having the verifier check that fast-hash PoW first. This means that an attacker trying to flood us with invalid PoW would need to do some work for every PoW instead of it being free. This is a decision we should take at the end, after we do some number crunching and see where we are in terms of verification time and attack models.
This trick is mostly relevant to slow-to-verify algorithms, but it can also be applied to Equi-X by reordering the server algorithm steps:
Input: C, N, E, S

1) Check that C is a valid seed (there may be multiple seeds active at a time).
2) Check that E is above the minimum effort.
3) Check that N hasn't been used before with C.
4) Calculate R = blake2b(C || N || E || S).
5) Check R * E <= UINT32_MAX.
6) Check equix_verify(C || N || E, S) == EQUIX_OK.
7) Put the request in the queue with weight E.
Simple spam attacks will fail at step 5, avoiding the call to equix_verify.
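To make the ordering concrete, here is a rough C sketch of the server-side check. All the helpers below are stand-ins invented for this example (the real code would use tor's blake2b and the equix library API directly); only the order of the steps is the point:

    #include <stdbool.h>
    #include <stdint.h>

    #define MIN_EFFORT 10000u  /* placeholder minimum effort */

    /* Hypothetical helpers corresponding to the steps above. */
    bool     seed_is_active(const uint8_t *seed);                         /* step 1 */
    bool     nonce_seen_with_seed(const uint8_t *seed,
                                  const uint8_t *nonce);                  /* step 3 */
    uint32_t blake2b_r(const uint8_t *seed, const uint8_t *nonce,
                       uint32_t effort, const uint8_t *solution);         /* step 4:
                          first 32 bits of blake2b(C || N || E || S)      */
    bool     equix_solution_ok(const uint8_t *seed, const uint8_t *nonce,
                               uint32_t effort, const uint8_t *solution); /* step 6:
                          wraps equix_verify() on challenge C || N || E   */
    void     enqueue_with_weight(uint32_t effort);                        /* step 7 */

    /* Returns true if the request was accepted into the queue. */
    bool handle_intro_request(const uint8_t *C, const uint8_t *N,
                              uint32_t E, const uint8_t *S)
    {
        if (!seed_is_active(C))              /* 1) valid seed?                       */
            return false;
        if (E < MIN_EFFORT)                  /* 2) below minimum effort: could also
                                                   be treated as a no-PoW request    */
            return false;
        if (nonce_seen_with_seed(C, N))      /* 3) nonce replay?                     */
            return false;

        uint32_t R = blake2b_r(C, N, E, S);  /* 4) one cheap hash                    */
        if ((uint64_t)R * E > UINT32_MAX)    /* 5) spam fails here, before the
                                                   expensive verification            */
            return false;

        if (!equix_solution_ok(C, N, E, S))  /* 6) full Equi-X verification          */
            return false;

        enqueue_with_weight(E);              /* 7) queue with weight E               */
        return true;
    }

The point of the ordering is that everything cheaper than the Equi-X verification is checked first, so a bogus request costs the service at most one blake2b call.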
However, I would not overly rely on this feature because even a single GPU can provide enough fast hashes (5-10 GH/s) for [ATTACK_TOP_HALF] on schemes like Yespower, Argon2 or RandomX. It works for Ethash because the required effort is determined by the computing power of the entire Ethereum network, so the attacker would need to compute billions of fast hashes even to pass the first check.