Hello tor-dev,
Steve Engler (currently part of the Shadow team) and I have a paper of possible interest to appear at ARES 2021 next month.
It's a design of a multi-threaded relay architecture, of possible particular interest when considering relay support in arti, for example (though the proof-of-concept implementation is based on the usual C codebase).
If you're interested, here are previews of:
Paper: https://cs.uwaterloo.ca/~iang/pubs/mttor-ares21.pdf
Video: https://www.youtube.com/watch?v=41a6nLUJye8
Code:  https://git-crysp.uwaterloo.ca/sengler/tor-parallel-relay-conn
       https://git-crysp.uwaterloo.ca/sengler/relay-throughput-testing
On 06 Jul (13:52:50), Ian Goldberg wrote:
Interesting!!!
One part caught my eye in section 4.3:
"The large amount of locking would harm the relay’s performance, and message passing would break some of the assumptions of Tor’s primary scheduler, the KIST scheduler. Rather than using a global scheduler, each local connection manager uses its own local scheduler which processes only the connections it owns."
So KIST was also put in place in order to consider _all_ channels within one scheduling loop, so as to properly apply EWMA scheduling: basically, loud circuits (lots of traffic) get lower priority than quiet ones.
Moving to a scheduler per thread (each handling only its own set of connections), we lose that property, no? And so loud circuits end up crushing quiet circuits on the physical link?
Another thing. Looking at figure (c), it appears only "relay/edge connections" can queue cells on a circuit half *directly*, that is, without using a channel. I assume the concurrency there, between a relay connection writing a cell on a circuit half and a cell received through a channel (from another circuit half), has been thought of? :)
I'm asking also because within Tor there are numerous places where a cell is enqueued on a circuit (even on an OR circuit, TRUNCATE for instance), and so those places would need to either use a channel or write cells the way relay connections do (which appears to be without a channel)?
Anyhow, good read, thanks for this paper!
Cheers! David
On 2021-07-19 3:24 p.m., David Goulet wrote:
On 06 Jul (13:52:50), Ian Goldberg wrote:
> Interesting!!!
> One part caught my eye in section 4.3:
> "The large amount of locking would harm the relay’s performance, and message passing would break some of the assumptions of Tor’s primary scheduler, the KIST scheduler. Rather than using a global scheduler, each local connection manager uses its own local scheduler which processes only the connections it owns."
> So KIST was also put in place in order to consider _all_ channels within one scheduling loop, so as to properly apply EWMA scheduling: basically, loud circuits (lots of traffic) get lower priority than quiet ones.
> Moving to a scheduler per thread (each handling only its own set of connections), we lose that property, no? And so loud circuits end up crushing quiet circuits on the physical link?
(Sending this from an address that is a tor-dev list member.)
If the scheduler's prioritization needs to be exact across all circuits in the relay, then yes that isn't possible with a scheduler per thread.
Our argument is that we don't believe the relay needs prioritization to be perfect across all circuits in the relay. For relays that currently aren't able to saturate their physical link due to CPU performance limitations, the greater total throughput from multi-threading might be worth partially relaxing the scheduling prioritization. Performing per-thread circuit EWMA scheduling should still be enough to prevent loud circuits from crushing quiet circuits, as long as connections are load-balanced across threads in a way that each thread has a mix of loud and quiet circuits.
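To make that concrete, here's a rough sketch (invented for this email, not code from the paper or from tor) of what per-thread EWMA circuit scheduling could look like. I'll use Rust since arti came up; each scheduler owns only its own circuits, so it needs no locks:

```rust
// Hypothetical per-thread scheduler: pick the quietest circuit (lowest
// EWMA cell count) that has cells queued, and decay counts over time so
// old traffic stops counting against a circuit.

struct Circuit {
    id: u32,
    ewma: f64,      // exponentially weighted moving average of cells sent
    queued: usize,  // cells waiting to be written to the connection
}

struct LocalScheduler {
    decay: f64,           // per-tick multiplier, e.g. 0.5 per half-life
    circuits: Vec<Circuit>,
}

impl LocalScheduler {
    /// Schedule one cell from the quietest circuit with pending cells.
    fn next_circuit(&mut self) -> Option<u32> {
        self.circuits
            .iter_mut()
            .filter(|c| c.queued > 0)
            .min_by(|a, b| a.ewma.partial_cmp(&b.ewma).unwrap())
            .map(|c| {
                c.queued -= 1;
                c.ewma += 1.0; // count the cell we just scheduled
                c.id
            })
    }

    /// Exponential decay, applied periodically by this thread only.
    fn tick(&mut self) {
        for c in self.circuits.iter_mut() {
            c.ewma *= self.decay;
        }
    }
}

fn main() {
    let mut sched = LocalScheduler {
        decay: 0.5,
        circuits: vec![
            Circuit { id: 1, ewma: 100.0, queued: 10 }, // loud circuit
            Circuit { id: 2, ewma: 1.0, queued: 1 },    // quiet circuit
        ],
    };
    // The quiet circuit wins even though the loud one has more cells queued.
    assert_eq!(sched.next_circuit(), Some(2));
    // Once the quiet circuit is drained, the loud one gets scheduled.
    assert_eq!(sched.next_circuit(), Some(1));
    sched.tick();
    println!("ok");
}
```

As long as each thread sees a mix of loud and quiet circuits, this local prioritization should behave much like the global one.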
> Another thing. Looking at figure (c), it appears only "relay/edge connections" can queue cells on a circuit half *directly*, that is, without using a channel. I assume the concurrency there, between a relay connection writing a cell on a circuit half and a cell received through a channel (from another circuit half), has been thought of? :)
Since the connection object holds the only reference to the circuit half, it can access the circuit half directly without locking or message passing. This is good for performance and for limiting buffer bloat. Cells arriving at a circuit half on a channel should be queued in that channel and shouldn't directly trigger any access on the circuit half (the channel shouldn't hold a reference to that circuit half). Since the connection object is in charge of both writing a cell on a circuit half and having that circuit half process cells that are queued on its channel, there shouldn't be any concurrency issues here.
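In Rust-ish terms (again just a hypothetical sketch with made-up names, not the paper's code), the ownership rule is easy to make explicit: the connection owns its circuit half outright, and the peer half only ever holds the sending end of the channel:

```rust
// Sketch of the ownership rule: only the connection holds the circuit
// half, so it can touch it without locks; the peer circuit half can only
// reach it indirectly, through the channel's Sender.
use std::sync::mpsc::{channel, Receiver, Sender};

type Cell = Vec<u8>;

struct CircuitHalf {
    from_peer: Receiver<Cell>, // cells sent by the other circuit half
    outbound: Vec<Cell>,       // cells ready for this half's connection
}

struct Connection {
    circuit: CircuitHalf, // sole reference: no locking needed
}

impl Connection {
    /// The connection both writes cells directly onto its circuit half and
    /// drains the cells the peer half queued on the channel, so the two
    /// operations can never race.
    fn process(&mut self) {
        while let Ok(cell) = self.circuit.from_peer.try_recv() {
            self.circuit.outbound.push(cell);
        }
    }
}

fn main() {
    let (to_half, from_peer): (Sender<Cell>, Receiver<Cell>) = channel();
    let mut conn = Connection {
        circuit: CircuitHalf { from_peer, outbound: Vec::new() },
    };
    // Another thread (the peer circuit half) would hold `to_half`.
    to_half.send(vec![1, 2, 3]).unwrap();
    conn.process();
    assert_eq!(conn.circuit.outbound.len(), 1);
    println!("ok");
}
```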
> I'm asking also because within Tor there are numerous places where a cell is enqueued on a circuit (even on an OR circuit, TRUNCATE for instance), and so those places would need to either use a channel or write cells the way relay connections do (which appears to be without a channel)?
We were only able to discuss this briefly in the paper, but there's a bit more about that in section 5.3.4 here: https://uwspace.uwaterloo.ca/bitstream/handle/10012/16108/Engler_Steven.pdf
Generally, we want cells to only be enqueued by the circuit half's connection object rather than having many places in tor try to communicate directly with circuit objects. When other parts of tor want to enqueue a cell on a circuit half, they should use the relay controller to send a message to the other thread's connection manager along one of the async channels shown in figure 4.a. The connection manager can then have the corresponding relay connection enqueue a cell directly on the circuit half.
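Here's a hypothetical sketch of that indirection (invented names, not the actual interfaces): the relay controller only knows which connection manager owns the circuit, and forwards the request over an async channel; the manager then does the direct enqueue locally:

```rust
// Other parts of tor hand a cell to the relay controller, which forwards
// it to the owning connection manager's thread over a channel, rather
// than touching the circuit object directly.
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

type CircId = u32;
type Cell = &'static str;

enum CtrlMsg {
    EnqueueCell(CircId, Cell), // "please queue this cell on circuit N"
}

struct RelayController {
    // For each circuit, the channel to the manager thread that owns it.
    managers: HashMap<CircId, Sender<CtrlMsg>>,
}

impl RelayController {
    fn enqueue_cell(&self, circ: CircId, cell: Cell) -> bool {
        match self.managers.get(&circ) {
            Some(tx) => tx.send(CtrlMsg::EnqueueCell(circ, cell)).is_ok(),
            None => false,
        }
    }
}

/// Manager side: drain control messages and enqueue cells locally, on the
/// thread that owns the circuit halves.
fn manager_drain(rx: &Receiver<CtrlMsg>, queues: &mut HashMap<CircId, Vec<Cell>>) {
    while let Ok(CtrlMsg::EnqueueCell(circ, cell)) = rx.try_recv() {
        queues.entry(circ).or_default().push(cell);
    }
}

fn main() {
    let (tx, rx) = channel();
    let mut managers = HashMap::new();
    managers.insert(7, tx);
    let ctrl = RelayController { managers };

    // e.g. circuit teardown logic wants to send a TRUNCATED-style cell:
    assert!(ctrl.enqueue_cell(7, "truncated"));

    let mut queues = HashMap::new();
    manager_drain(&rx, &mut queues);
    assert_eq!(queues.get(&7).map(|q| q.len()), Some(1));
    println!("ok");
}
```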
> Anyhow, good read, thanks for this paper!
Thanks for reading it! :)
- Steve
On 20 Jul (01:29:54), Steven Engler wrote:
On 2021-07-19 3:24 p.m., David Goulet wrote:
>> So KIST was also put in place in order to consider _all_ channels within one scheduling loop, so as to properly apply EWMA scheduling: basically, loud circuits (lots of traffic) get lower priority than quiet ones.
>> Moving to a scheduler per thread (each handling only its own set of connections), we lose that property, no? And so loud circuits end up crushing quiet circuits on the physical link?
> (Sending this from an address that is a tor-dev list member.)
> If the scheduler's prioritization needs to be exact across all circuits in the relay, then yes that isn't possible with a scheduler per thread.
> Our argument is that we don't believe the relay needs prioritization to be perfect across all circuits in the relay. For relays that currently aren't able to saturate their physical link due to CPU performance limitations, the greater total throughput from multi-threading might be worth partially relaxing the scheduling prioritization. Performing per-thread circuit EWMA scheduling should still be enough to prevent loud circuits from crushing quiet circuits, as long as connections are load-balanced across threads in a way that each thread has a mix of loud and quiet circuits.
Interesting.
We would need the load balancing to be very active then, basically rebalancing connections across the thread pool. But yeah, I think I can see that the current property of needing to look at all connections for prioritization could be relaxed if we pull off proper load balancing between threads; still, my gut tells me it would need to be active rebalancing.
In my experience, active rebalancing like that can be a complex beast in terms of concurrency, but not impossible. Likely easier in a more modern language also :P.
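Just to illustrate what I mean (a toy sketch, nothing from the paper): even a simple greedy rebalance over recent per-connection throughput would spread loud and quiet connections across threads, if run periodically:

```rust
// Toy rebalancer: given (connection id, recent bytes/sec) measurements,
// reassign connections to threads with a greedy longest-processing-time
// heuristic, loudest first onto the least-loaded thread.
fn rebalance(loads: &[(u32, u64)], threads: usize) -> Vec<Vec<u32>> {
    let mut sorted: Vec<_> = loads.to_vec();
    sorted.sort_by(|a, b| b.1.cmp(&a.1)); // loudest first
    let mut assignment = vec![Vec::new(); threads];
    let mut totals = vec![0u64; threads];
    for (id, load) in sorted {
        // Put each connection on the currently least-loaded thread.
        let t = (0..threads).min_by_key(|&t| totals[t]).unwrap();
        totals[t] += load;
        assignment[t].push(id);
    }
    assignment
}

fn main() {
    // Two loud connections (1, 2) and two quiet ones (3, 4), two threads:
    let assignment = rebalance(&[(1, 1000), (2, 900), (3, 200), (4, 100)], 2);
    // Each thread ends up with one loud and one quiet connection.
    assert_eq!(assignment[0], vec![1, 4]);
    assert_eq!(assignment[1], vec![2, 3]);
    println!("{:?}", assignment);
}
```

The concurrency pain is of course in actually migrating a live connection (and its circuit halves) between threads, which this toy ignores entirely.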
>> Another thing. Looking at figure (c), it appears only "relay/edge connections" can queue cells on a circuit half *directly*, that is, without using a channel. I assume the concurrency there, between a relay connection writing a cell on a circuit half and a cell received through a channel (from another circuit half), has been thought of? :)
> Since the connection object holds the only reference to the circuit half, it can access the circuit half directly without locking or message passing. This is good for performance and for limiting buffer bloat. Cells arriving at a circuit half on a channel should be queued in that channel and shouldn't directly trigger any access on the circuit half (the channel shouldn't hold a reference to that circuit half). Since the connection object is in charge of both writing a cell on a circuit half and having that circuit half process cells that are queued on its channel, there shouldn't be any concurrency issues here.
Ok. Let me try to summarize: a connection reads its inbound cells and queues them onto the correct circuit half (which I assume sticks to that connection, and thus no concurrent access); that circuit half then processes them and sends them on its channel (where a channel always has a 1:1 relationship with a circuit half), and the receiving circuit half (at that point on another thread) enqueues them in the scheduler list (if need be) so they can end up being sent on the right connection.
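If I got that right, the handoff between the two threads is basically this (toy Rust sketch with invented structure, nothing from the paper):

```rust
// Cross-thread handoff for one circuit: the inbound connection's circuit
// half forwards a cell over its 1:1 channel, and the peer half's thread
// picks it up and queues it for its own connection to write out.
use std::sync::mpsc::channel;
use std::thread;

fn main() {
    // 1:1 channel between the two halves of one circuit.
    let (tx, rx) = channel::<Vec<u8>>();

    // Thread A: connection reads an inbound cell, queues it on its circuit
    // half, which processes it and forwards it on the channel.
    let reader = thread::spawn(move || {
        let cell = vec![0x02, 0x03]; // a cell read from the socket
        tx.send(cell).unwrap();      // handed to the peer circuit half
    });

    // Thread B: the receiving side drains the channel and queues the cell
    // for its scheduler to write on the right connection.
    let writer = thread::spawn(move || {
        let mut outbound = Vec::new();
        if let Ok(cell) = rx.recv() {
            outbound.push(cell);
        }
        outbound
    });

    reader.join().unwrap();
    let outbound = writer.join().unwrap();
    assert_eq!(outbound.len(), 1);
    println!("delivered {} cell(s)", outbound.len());
}
```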
>> I'm asking also because within Tor there are numerous places where a cell is enqueued on a circuit (even on an OR circuit, TRUNCATE for instance), and so those places would need to either use a channel or write cells the way relay connections do (which appears to be without a channel)?
> We were only able to discuss this briefly in the paper, but there's a bit more about that in section 5.3.4 here: https://uwspace.uwaterloo.ca/bitstream/handle/10012/16108/Engler_Steven.pdf
> Generally, we want cells to only be enqueued by the circuit half's connection object rather than having many places in tor try to communicate directly with circuit objects. When other parts of tor want to enqueue a cell on a circuit half, they should use the relay controller to send a message to the other thread's connection manager along one of the async channels shown in figure 4.a. The connection manager can then have the corresponding relay connection enqueue a cell directly on the circuit half.
Ok!
Thanks for the answers! David
Thanks a lot for this information!
On Tue, Jul 6, 2021 at 11:53 PM Ian Goldberg iang@uwaterloo.ca wrote:
--
Ian Goldberg
Canada Research Chair in Privacy Enhancing Technologies
Professor, Cheriton School of Computer Science
University of Waterloo