Linus Nordberg and I are preparing a submission for FOCI about the special way we run tor on the Snowflake bridge. We run many tor processes with the same identity and onion keys, because otherwise tor being limited to one CPU would be the main bottleneck.
I'm writing to fact-check a claim about Arti and how we hope the current complicated procedure will not be needed in the future:
The first and most important bottleneck to overcome is the single-threaded nature of the Tor implementation.² A single Tor process is limited to one CPU core: once Tor hits 100% CPU, the performance of the bridge is capped, no matter the speed of the network connection or the number of CPU cores.
²We expect that Arti, the in-progress reimplementation of Tor, will be natively multi-threaded, and remove this primary complication.
Is this correct? Is a relay that uses a future version of Arti expected to be able to use all its CPU resources?
Here is a draft of the submission. If you have any comments, our submission deadline is 2023-03-15.
https://www.bamsoftware.com/papers/pt-bridge-hiperf/pt-bridge-hiperf.2023030...
On Tue, Mar 7, 2023 at 4:07 PM David Fifield david@bamsoftware.com wrote:
Linus Nordberg and I are preparing a submission for FOCI about the special way we run tor on the Snowflake bridge. We run many tor processes with the same identity and onion keys, because otherwise tor being limited to one CPU would be the main bottleneck.
I'm writing to fact-check a claim about Arti and how we hope the current complicated procedure will not be needed in the future:
The first and most important bottleneck to overcome is the single-threaded nature of the Tor implementation.² A single Tor process is limited to one CPU core: once Tor hits 100% CPU, the performance of the bridge is capped, no matter the speed of the network connection or the number of CPU cores.
²We expect that Arti, the in-progress reimplementation of Tor, will be natively multi-threaded, and remove this primary complication.
Is this correct? Is a relay that uses a future version of Arti expected to be able to use all its CPU resources?
Yes, that's right. There is no "main thread" in Arti; it's written in an asynchronous task-oriented style, and we use a runtime written in Rust (Tokio by default, but we abstract over runtimes so you can swap them out) to schedule tasks across multiple threads.
That said, we have spent approximately zero time so far tuning this multithreading, and I'd be surprised if it scales perfectly the first time. Our first opportunity to show off here will be when we get onion service support later this year.
cheers,
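To make the "no main thread" idea concrete, here is a minimal sketch of independent units of work being spread over several threads. It is not Arti code and it does not use Tokio or async tasks: it uses only the Rust standard library, and the names `busy_work` and `run_on_pool` are hypothetical stand-ins invented for this illustration. It also says nothing about how well Arti will scale in practice.

```rust
use std::collections::HashSet;
use std::sync::{mpsc, Arc, Mutex};
use std::thread::{self, ThreadId};

// Hypothetical stand-in for one unit of CPU-bound work.
fn busy_work(n: u64) -> u64 {
    (0..=n).fold(0u64, |acc, x| acc.wrapping_add(x.wrapping_mul(x)))
}

// Runs `busy_work` on every input using `workers` threads that pull from a
// shared queue. Returns the results in input order, plus the set of threads
// that executed at least one task.
fn run_on_pool(workers: usize, inputs: Vec<u64>) -> (Vec<u64>, HashSet<ThreadId>) {
    let count = inputs.len();
    let (task_tx, task_rx) = mpsc::channel::<(usize, u64)>();
    let task_rx = Arc::new(Mutex::new(task_rx));
    let (done_tx, done_rx) = mpsc::channel::<(usize, u64, ThreadId)>();

    for (i, n) in inputs.into_iter().enumerate() {
        task_tx.send((i, n)).expect("task queue is open");
    }
    drop(task_tx); // closing the queue lets idle workers exit

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let task_rx = Arc::clone(&task_rx);
            let done_tx = done_tx.clone();
            thread::spawn(move || loop {
                // The lock is held only while taking a task, not while running it.
                let next = task_rx.lock().unwrap().recv();
                match next {
                    Ok((i, n)) => done_tx
                        .send((i, busy_work(n), thread::current().id()))
                        .unwrap(),
                    Err(_) => break, // queue closed and drained
                }
            })
        })
        .collect();
    drop(done_tx); // only the workers hold result senders now

    let mut results = vec![0u64; count];
    let mut seen = HashSet::new();
    for (i, value, id) in done_rx {
        results[i] = value;
        seen.insert(id);
    }
    for handle in handles {
        handle.join().unwrap();
    }
    (results, seen)
}
```

Calling `run_on_pool(4, inputs)` returns the same values as running `busy_work` sequentially. Which worker handles which input is up to the scheduler, so the number of distinct thread IDs observed can vary from run to run.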
On Wed, Mar 08, 2023 at 06:30:42AM -0500, Nick Mathewson wrote:
That said, we have spent approximately zero time so far tuning this multithreading, and I'd be surprised if it scales perfectly the first time.
Somewhat related: Rust programs generally tend to perform better than their C counterparts when their authors really want them to. This is mainly because crazy thread optimizations can be done safely.
A prominent example is [fd](https://github.com/sharkdp/fd), which uses multiple threads to traverse the file system and is thereby around 50% faster than find(1). Just imagining parallel FS access in C gives me nightmares. :^)
-- Emil
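As a small illustration of the point about threading being checked by the compiler, here is a sketch that sums a slice using several scoped threads. It is not taken from fd or Arti; `parallel_sum` is a hypothetical helper written for this example, and it assumes Rust 1.63 or later for `std::thread::scope`. The compiler verifies that the borrowed data outlives every thread and that no thread mutates it, which in C would be left to programmer discipline.

```rust
use std::thread;

/// Sums `data` by splitting it into at most `parts` chunks and summing each
/// chunk on its own scoped thread.
fn parallel_sum(data: &[u64], parts: usize) -> u64 {
    let parts = parts.max(1);
    // Ceiling division so every element lands in a chunk; the length is kept
    // at least 1 because `chunks(0)` would panic.
    let chunk_len = ((data.len() + parts - 1) / parts).max(1);

    thread::scope(|s| {
        // Each thread borrows its chunk; no copying and no locking is needed.
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || chunk.iter().sum::<u64>()))
            .collect();

        // Every thread is joined before `scope` returns, so the borrows end here.
        handles
            .into_iter()
            .map(|h| h.join().unwrap())
            .sum::<u64>()
    })
}
```

For example, `parallel_sum(&data, 4)` on a vector holding 1 through 1000 gives 500500, the same as a sequential sum.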
On Wed, Mar 08, 2023 at 06:30:42AM -0500, Nick Mathewson wrote:
On Tue, Mar 7, 2023 at 4:07 PM David Fifield <david@bamsoftware.com> wrote:
Linus Nordberg and I are preparing a submission for FOCI about the special way we run tor on the Snowflake bridge. We run many tor processes with the same identity and onion keys, because otherwise tor being limited to one CPU would be the main bottleneck.
I'm writing to fact-check a claim about Arti and how we hope the current complicated procedure will not be needed in the future:
The first and most important bottleneck to overcome is the single-threaded nature of the Tor implementation.² A single Tor process is limited to one CPU core: once Tor hits 100% CPU, the performance of the bridge is capped, no matter the speed of the network connection or the number of CPU cores.
²We expect that Arti, the in-progress reimplementation of Tor, will be natively multi-threaded, and remove this primary complication.
Is this correct? Is a relay that uses a future version of Arti expected to be able to use all its CPU resources?
Yes, that's right. There is no "main thread" in Arti; it's written in an asynchronous task-oriented style, and we use a runtime written in Rust (Tokio by default, but we abstract over runtimes so you can swap them out) to schedule tasks across multiple threads.
That said, we have spent approximately zero time so far tuning this multithreading, and I'd be surprised if it scales perfectly the first time. Our first opportunity to show off here will be when we get onion service support later this year.
Thank you, Nick.