On 06 Apr (17:08:12), Mike Perry wrote:
[snip]
I think you'll be bound by the amount of data a connection inbuf can take, which has an upper bound of 32 cells per read event.
Then tor will have to empty the inbuf at once, queue all INTRODUCE2 cells (at most 32) in that priority queue, and once done, process them before returning to handling the connection inbuf.
In other words, with tor's architecture, the queue size is bounded by the maximum number of cells a single recv() pass can yield, which is 32.
Nevertheless, that limit is weirdly hardcoded in tor, so you should definitely think of a way to upper-bound the queue and just drop the rest. A good starting point would be that 32-cell number?
dgoulet and I think the following will work better than always doing 32 cells at a time. Basically, the idea is to split our INTRO2 handling into "top-half" and "bottom-half" handlers.
Top-half handling:
1) Read 32 cells off inbuf as usual
2) Do AES relay cell decryption as usual
3) Parse relay headers and handle all cells as usual, except:
   a) In hs_service_receive_introduce2(), add to pqueue and return without further processing of those cells
4) Return to rest of mainloop
Agree with the above. It is trivial to do this today, so the engineering cost is very low.
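To make "trivial" concrete, here is a rough sketch of the top-half queueing in tor-style C. Everything named pending_intro2_* is made up for illustration; only the smartlist pqueue helpers are tor's existing API:

  /* Sketch only; assumes tor's smartlist.h, hs_cell.h, <stdint.h> and
   * <stddef.h>.  pending_intro2_t and its fields are hypothetical. */
  typedef struct pending_intro2_t {
    int pqueue_idx;        /* Heap index, managed by smartlist_pqueue_*(). */
    uint32_t effort;       /* PoW effort (difficulty) the client spent. */
    hs_cell_introduce2_data_t *data; /* Decrypted INTRODUCE2 payload. */
  } pending_intro2_t;

  /* Max-heap comparator: higher effort wins. */
  static int
  compare_intro2_effort_(const void *a, const void *b)
  {
    const pending_intro2_t *e1 = a, *e2 = b;
    if (e1->effort > e2->effort) return -1;
    if (e1->effort < e2->effort) return 1;
    return 0;
  }

  /* Top-half: hs_service_receive_introduce2() would call this instead
   * of launching the RP circuit, then return to the mainloop. */
  static void
  intro2_enqueue(smartlist_t *intro2_pqueue, pending_intro2_t *item)
  {
    smartlist_pqueue_add(intro2_pqueue, compare_intro2_effort_,
                         offsetof(pending_intro2_t, pqueue_idx), item);
  }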
Then, separately, also in mainloop, do the bottom half. (TODO: What group priority?)
Bottom-half handling:
  I) Pop a single intro2 off of pqueue (ie: max difficulty in queue)
 II) Compare this difficulty to desc difficulty. If lower, lower desc difficulty
III) Parse it and launch RP circuit (as per current bottom half of hs_service_receive_introduce2())
 IV) Trim pqueue elements, if queue "too big" (TODO: how to trim?)
  V) Compare trim point difficulty to descriptor difficulty; if trim point was higher than descriptor value, raise desc difficulty
 VI) Return to libevent/mainloop again
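A matching sketch of that bottom half. Again, all service-side names (intro2_pqueue, desc_pow_effort, pending_intro2_free) are made up; only mainloop_event_t (tor's compat_libevent wrapper) and the pqueue helpers are real API:

  /* Bottom-half: its own mainloop event, activated by the top-half
   * whenever a cell gets queued.  Follows steps I)-VI) above. */
  static void
  intro2_bottom_half_cb(mainloop_event_t *ev, void *arg)
  {
    hs_service_t *service = arg;
    smartlist_t *pq = service->intro2_pqueue;      /* hypothetical field */

    /* I) Pop the single highest-effort INTRODUCE2. */
    pending_intro2_t *item =
      smartlist_pqueue_pop(pq, compare_intro2_effort_,
                           offsetof(pending_intro2_t, pqueue_idx));
    if (!item)
      return;

    /* II) If this effort is below what the descriptor advertises,
     *     lower the advertised difficulty. */
    if (item->effort < service->desc_pow_effort)   /* hypothetical field */
      service->desc_pow_effort = item->effort;

    /* III) As per the current bottom half of
     *      hs_service_receive_introduce2(): parse the payload and
     *      launch the RP circuit (elided here). */

    /* IV)+V) Trim the pqueue if "too big", raising the descriptor
     *        difficulty if the trim point was above it (a trim sketch
     *        is further down, in the next-steps discussion). */

    /* VI) Back to libevent; re-activate ourselves if work remains. */
    pending_intro2_free(item);                     /* hypothetical */
    if (smartlist_len(pq) > 0)
      mainloop_event_activate(ev);
  }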
I would maybe try to convince you that we could dequeue more than one cell here, because this behavior changes the current state of HS quite a bit.
Right now, we get 32 cells out of the inbuf, process them one at a time, and then go back to the mainloop.
This new algorithm means that we would process 1 single cell at each mainloop event instead of 32. This is quite a decrease. OK, it is not exactly that ratio, because draining the inbuf without processing the decrypted INTRO2 cells is probably fast, but one full mainloop run per cell is still clearly slower than what we do right now.
We need some sort of performance measurements here to make an informed decision, but my gut feeling tells me that we might want to process, I don't know, 5 or 10 cells instead of 1 per mainloop round.
We _should_ run timing measurements here to see how much deferring INTRO2 processing to another mainloop event affects the overall rate of introduction.
But let's say a full mainloop run takes 100 msec: at one cell per run, that is only 10 introductions per second, and even at 5 cells per run, only 50 per second... that looks quite low? But it could already be what we do now; unknown.
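For what it's worth, batching would be a tiny change to the bottom-half sketch above; the constant and the helper are made up, and the constant is exactly what the measurements should pin down:

  /* Hypothetical batch size; the 5-10 guessed above, to be measured. */
  #define INTRO2_BATCH_SIZE 5

  static void
  intro2_bottom_half_batched_cb(mainloop_event_t *ev, void *arg)
  {
    /* Same per-cell work as the single-cell bottom half, batched so
     * one mainloop round handles several INTRODUCE2 cells. */
    for (int i = 0; i < INTRO2_BATCH_SIZE; i++) {
      if (!intro2_process_one(arg))  /* hypothetical helper; false on
                                      * empty pqueue. */
        break;
    }
    (void) ev;
  }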
The astute reader will note that even without PoW, the above can provide almost exactly the same functionality as the naive rate limiting currently done at intropoints - just cut the queue arbitrarily. Basically, PoW and the pqueue just give us a smarter way to decide whom to reply to.
However, it still has the following potential issues:
A) AES will bottleneck us at ~100Mbit-300Mbit at #2 in top-half above
B) Extra mainloop() iterations for INTRO2s may be expensive (or not?)
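Rough numbers for (A), back-of-the-envelope only: a link cell is 514 bytes, so 100 Mbit/s of relay crypto works out to about 12.5 MB/s / 514 B ≈ 24,000 cells/s, and 300 Mbit/s to roughly 73,000 cells/s. Real throughput needs profiling, but it gives a scale to compare against the per-mainloop-round introduction rates discussed above.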
Some analysis of the above should probably happen; I can easily do that once we get the tracing API upstream.
For A, onionbalance will still help by adding new back-end instances providing intropoints via separate back-end Tor daemons, either on the same box or different boxes.
But it will only help up to a point. An HSDesc maxes out at 30k, so at some point we'll run out of space to list more intropoints in a single descriptor. At that point, we can still list different intropoints at each HSDir position, but after that, we are screwed.
Small correction: HSDesc maxes out at 50k for v3 and 20k for v2. But let's just consider v3 for the foreseeable future :D.
[snip]
I would _love_ to, but it could be too early for that, considering we are still unsure whether this defense will be useful or not (according to Mike, from a discussion on IRC).
As described above, multithreading still provides a multiplier in the AES bottleneck case, even over onionbalance.
But there may be more bottlenecks than just AES crypto, so this is a further argument for not jumping the gun just yet: try v1 first (or even a simple prototype without PoW that just cuts the queue arbitrarily) and get some more profiling data.
As an initial step, I agree. Onionbalance provides an easy way for a service to outsource client introduction to more CPUs.
But, in my opinion, onionbalance is a power-user solution, so usually only a small percentage of our .onion users can take advantage of it. As .onion services move more and more into the mobile sphere and client-to-client applications (OnionShare, Ricochet), it ain't much of an option :S.
Without using more CPUs in Tor, I have a _hard_ time seeing tor scale over time, especially for services. As long as we keep that in mind with our designs, I'm good :).
Next steps (not necessarily in order):
a) pqueue plan review + more detailed description
b) Figure out pqueue trim mechanism - can we do better than O(n)?
Initially, we could simply go with an upper limit and just drop cells as we queue them if we are above the limit? As in, drop the back() every time we queue past the limit?
Else, we can get into more complicated schemes with queueing rate versus processing rate and come up with a golden number to strike a memory and CPU balance...?
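A sketch of that simple scheme (cap and free helper made up; smartlist_pqueue_remove() and the foreach macros are tor's real API). Note that the O(n) part is finding the minimum, since a max-heap only gives cheap access to its maximum; doing better probably means a different structure (e.g. a min-max heap):

  #define INTRO2_PQUEUE_MAX 1024   /* Made-up cap; needs tuning. */

  /* Drop lowest-effort entries while we are above the cap. */
  static void
  intro2_pqueue_trim(smartlist_t *pq, hs_service_t *service)
  {
    while (smartlist_len(pq) > INTRO2_PQUEUE_MAX) {
      /* O(n) scan: the max-heap only knows where its maximum is. */
      pending_intro2_t *min = NULL;
      SMARTLIST_FOREACH_BEGIN(pq, pending_intro2_t *, e) {
        if (!min || e->effort < min->effort)
          min = e;
      } SMARTLIST_FOREACH_END(e);

      /* Remove it from the heap in O(log n). */
      smartlist_pqueue_remove(pq, compare_intro2_effort_,
                              offsetof(pending_intro2_t, pqueue_idx), min);

      /* Step V) of the bottom half: if even the dropped entry beat the
       * advertised difficulty, raise the descriptor difficulty. */
      if (min->effort > service->desc_pow_effort)  /* hypothetical field */
        service->desc_pow_effort = min->effort;

      pending_intro2_free(min);                    /* hypothetical */
    }
  }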
c) Test using RandomX as a hash function in various ways, especially wrt key usage, cache setup, and VM info
d) Test pqueue refactoring, maybe without PoW involved yet
e) Specify a v1.5 that works at the intropoint (to avoid the AES bottleneck)
f) Merge this thread back into a single proposal document
We could put the engineering details in an annex of the proposal, if we don't want to lose track of them and have them dispersed on a mailing list :)?
I'm happy to help with this, let me know.
g) other stuff we forgot, XXX's, TODOs, etc.
Cheers! David