On 2021-07-19 3:24 p.m., David Goulet wrote:
On 06 Jul (13:52:50), Ian Goldberg wrote:
Hello tor-dev,
Steve Engler (currently part of the Shadow team) and I have a paper of possible interest to appear at ARES 2021 next month.
It's a design of a multi-threaded relay architecture, of possible particular interest when considering relay support in arti, for example (though the proof-of-concept implementation is based on the usual C codebase).
If you're interested, here are previews of:
Paper: https://cs.uwaterloo.ca/~iang/pubs/mttor-ares21.pdf Video: https://www.youtube.com/watch?v=41a6nLUJye8 Code: https://git-crysp.uwaterloo.ca/sengler/tor-parallel-relay-conn https://git-crysp.uwaterloo.ca/sengler/relay-throughput-testing
Interesting!!!
One part caught my eye in section 4.3:
"The large amount of locking would harm the relay’s performance, and mes- sage passing would break some of the assumptions of Tor’s primary scheduler, the KIST scheduler. Rather than using a global scheduler, each local connection manager uses its own local scheduler which processes only the connections it owns."
So KIST was also put in place in order to be able to consider _all_ channels within one scheduling loop so to properly applied EWMA scheduling that is basically loud circuits (lots of traffic) are less prioritize from quiet ones.
Moving to a scheduler per thread (as in only handling its set of connections), we loose that property no? And so loud circuits end up crushing quiet circuits on the physical link?
(Sending this from an address that is a tor-dev list member.)
If the scheduler's prioritization needs to be exact across all circuits in the relay, then yes that isn't possible with a scheduler per thread.
Our argument is that we don't believe the relay needs prioritization to be perfect across all circuits in the relay. For relays that currently aren't able to saturate their physical link due to CPU performance limitations, the greater total throughput from multi-threading might be worth partially relaxing the scheduling prioritization. Performing per-thread circuit EWMA scheduling should still be enough to prevent loud circuits from crushing quiet circuits, as long as connections are load-balanced across threads in a way that each thread has a mix of loud and quiet circuits.
Another thing. Looking at figure (c), it appears only "relay/edge connections" can queue cells on a circuit half *directly* that is not using a channel. I assume that concurrency there between a relay connection writing a cell on a circuit half and a cell received through a channel (from another circuit half) has been thought of? :)
Since the connection object holds the only reference to the circuit half, it can access the circuit half directly without locking or message passing. This is good for performance and for limiting buffer bloat. Cells arriving at a circuit half on a channel should be queued in that channel and shouldn't directly trigger any access on the circuit half (the channel shouldn't hold a reference to that circuit half). Since the connection object is in charge of both writing a cell on a circuit half and having that circuit half process cells that are queued on its channel, there shouldn't be any concurrency issues here.
I'm asking also because within Tor, there are numerous places where a cell is enqueued on a circuit (even an OR circuit like TRUNCATE for instance) and so those place would need to use a channel or use the way relay connections write them (which appears to be without a channel)?
We were only able to discuss this briefly in the paper, but there's a bit more about that in section 5.3.4 here: https://uwspace.uwaterloo.ca/bitstream/handle/10012/16108/Engler_Steven.pdf
Generally, we want cells to only be enqueued by the circuit half's connection object rather than having many places in tor try to communicate directly with circuit objects. When other parts of tor want to enqueue a cell on a circuit half, they should use the relay controller to send a message to the other thread's connection manager along one of the async channels shown in figure 4.a. The connection manager can then have the corresponding relay connection enqueue a cell directly on the circuit half.
Anyhow, good read, thanks for this paper!
Thanks for reading it! :)
- Steve