Re: [tor-dev] Weaving a Faster Tor: paper/video of possible interest

20 Jul 2021


      On 2021-07-19 3:24 p.m., David Goulet wrote:
...
On 06 Jul (13:52:50), Ian Goldberg wrote:
...
Hello tor-dev,
Steve Engler (currently part of the Shadow team) and I have a paper of
possible interest to appear at ARES 2021 next month.
It's a design of a multi-threaded relay architecture, of possible
particular interest when considering relay support in arti, for example
(though the proof-of-concept implementation is based on the usual C
codebase).
If you're interested, here are previews of:
Paper: https://cs.uwaterloo.ca/~iang/pubs/mttor-ares21.pdf
Video: https://www.youtube.com/watch?v=41a6nLUJye8
Code: https://git-crysp.uwaterloo.ca/sengler/tor-parallel-relay-conn
       https://git-crysp.uwaterloo.ca/sengler/relay-throughput-testing
Interesting!!!
One part caught my eye in section 4.3:
"The large amount of locking would harm the relay’s performance, and mes-
   sage passing would break some of the assumptions of Tor’s primary scheduler,
   the KIST scheduler. Rather than using a global scheduler, each local
   connection manager uses its own local scheduler which processes only the
   connections it owns."
So KIST was also put in place in order to be able to consider _all_ channels
within one scheduling loop so to properly applied EWMA scheduling that is
basically loud circuits (lots of traffic) are less prioritize from quiet ones.
Moving to a scheduler per thread (as in only handling its set of connections),
we loose that property no? And so loud circuits end up crushing quiet
circuits on the physical link?
(Sending this from an address that is a tor-dev list member.)
If the scheduler's prioritization needs to be exact across all circuits 
in the relay, then yes that isn't possible with a scheduler per thread.
Our argument is that we don't believe the relay needs prioritization to 
be perfect across all circuits in the relay. For relays that currently 
aren't able to saturate their physical link due to CPU performance 
limitations, the greater total throughput from multi-threading might be 
worth partially relaxing the scheduling prioritization. Performing 
per-thread circuit EWMA scheduling should still be enough to prevent 
loud circuits from crushing quiet circuits, as long as connections are 
load-balanced across threads in a way that each thread has a mix of loud 
and quiet circuits.
...
Another thing. Looking at figure (c), it appears only "relay/edge connections"
can queue cells on a circuit half *directly* that is not using a channel. I
assume that concurrency there between a relay connection writing a cell on a
circuit half and a cell received through a channel (from another circuit half)
has been thought of? :)
Since the connection object holds the only reference to the circuit 
half, it can access the circuit half directly without locking or message 
passing. This is good for performance and for limiting buffer bloat. 
Cells arriving at a circuit half on a channel should be queued in that 
channel and shouldn't directly trigger any access on the circuit half 
(the channel shouldn't hold a reference to that circuit half). Since the 
connection object is in charge of both writing a cell on a circuit half 
and having that circuit half process cells that are queued on its 
channel, there shouldn't be any concurrency issues here.
...
I'm asking also because within Tor, there are numerous places where a cell is
enqueued on a circuit (even an OR circuit like TRUNCATE for instance) and so
those place would need to use a channel or use the way relay connections write
them (which appears to be without a channel)?
We were only able to discuss this briefly in the paper, but there's a 
bit more about that in section 5.3.4 here:
https://uwspace.uwaterloo.ca/bitstream/handle/10012/16108/Engler_Steven.pdf
Generally, we want cells to only be enqueued by the circuit half's 
connection object rather than having many places in tor try to 
communicate directly with circuit objects. When other parts of tor want 
to enqueue a cell on a circuit half, they should use the relay 
controller to send a message to the other thread's connection manager 
along one of the async channels shown in figure 4.a. The connection 
manager can then have the corresponding relay connection enqueue a cell 
directly on the circuit half.
...
Anyhow, good read, thanks for this paper!
Thanks for reading it! :)
- Steve

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

Re: [tor-dev] Weaving a Faster Tor: paper/video of possible interest