After the blocking of Tor in Russia in December 2021, the number of Snowflake users rapidly increased. Eventually the tor process became the limiting factor for performance, using all of one CPU core.
In a thread on tor-relays, we worked out a design where we run multiple instances of tor on the same host, all with the same identity keys, in order to effectively use all the server's CPU resources. It's running on the live bridge now, and as a result the bridge's bandwidth use has roughly doubled.
Design thread: https://forum.torproject.net/t/tor-relays-how-to-reduce-tor-cpu-load-on-a-si...
Installation instructions: https://gitlab.torproject.org/tpo/anti-censorship/team/-/wikis/Survival-Guid...
Two details came up that are awkward to deal with. We have workarounds for them, but they could benefit from support from core tor. They are:
1. Provide a way to disable onion key rotation, or configure a custom onion key.
2. Provide a way to set a specific authentication cookie for ExtORPort SAFE_COOKIE authentication, or a new authentication type that doesn't require credentials that change whenever tor is restarted.
I should mention that, apart from the load-balancing design we settled on, we have brainstormed some other options for scaling the Snowflake bridge or bridges. At this point, none of these ideas can immediately be put into practice, because there's no way to tell tor "connect to one of these bridges at random, but only one," or "connect to this bridge, but accept any of these fingerprints." https://bugs.torproject.org/tpo/anti-censorship/pluggable-transports/snowfla...
# Disable onion key rotation
Multiple tor instances with the same identity keys will work fine for the first 5 weeks (onion-key-rotation-days + onion-key-grace-period-days), but after that time the instances will have independently rotated their onion keys, and clients will have connection failures unless the load balancer happens to connect them to the instance whose descriptor they have cached. This post investigates what the failure looks like: https://lists.torproject.org/pipermail/tor-relays/2022-January/020238.html
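The "5 weeks" figure can be checked with a little arithmetic. The sketch below assumes the consensus defaults are 28 days for onion-key-rotation-days and 7 days for onion-key-grace-period-days; the function names, the instance count, and the uniformly random load balancer are illustrative assumptions, not part of the deployment described here.

```python
# Illustrative arithmetic only. The two defaults are assumed values for the
# consensus parameters named in the text; check them against the current
# consensus before relying on them.
ONION_KEY_ROTATION_DAYS = 28
ONION_KEY_GRACE_PERIOD_DAYS = 7


def days_until_keys_diverge(rotation_days=ONION_KEY_ROTATION_DAYS,
                            grace_days=ONION_KEY_GRACE_PERIOD_DAYS):
    """Days after which independently rotated onion keys start to matter."""
    return rotation_days + grace_days


def match_probability(num_instances):
    """Chance that a uniformly random balancer (an assumption) sends a client
    to the one instance whose onion key the client has cached."""
    return 1 / num_instances


print(days_until_keys_diverge())   # 28 + 7 days, i.e. 5 weeks
print(match_probability(4))        # hypothetical 4-instance bridge
```

Under these assumptions, the more instances there are, the less likely a client is to land on the instance that matches its cached descriptor once the keys have diverged.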
Examples of what could work here are a torrc option to set onion-key-rotation-days to a large value, an option to disable onion key rotation, or an option to set a certain named file as the onion key.
What we are doing now is a bit of a nasty hack: we create a directory named secret_onion_key.old, so that a failed replace_file causes an early exit from rotate_onion_key. https://gitweb.torproject.org/tor.git/tree/src/feature/relay/router.c?h=tor-... There are a few apparently benign side effects, like tor trying to rebuild its descriptor every hour, but it's effective at stopping onion key rotation. https://lists.torproject.org/pipermail/tor-relays/2022-January/020277.html
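The mechanism behind the hack can be reproduced outside of tor. The following is a minimal sketch, not tor's code: it creates a directory named secret_onion_key.old in a temporary stand-in for the keys directory and then attempts the kind of rename a rotation would need, using Python's os.replace as a stand-in for tor's replace_file. The helper names and the placeholder key file are hypothetical.

```python
import os
import tempfile


def block_onion_key_rotation(keys_dir):
    """Create a directory named secret_onion_key.old inside keys_dir."""
    blocker = os.path.join(keys_dir, "secret_onion_key.old")
    os.makedirs(blocker, mode=0o700, exist_ok=True)
    return blocker


def rename_is_blocked(keys_dir):
    """Try to move secret_onion_key onto secret_onion_key.old, as a key
    rotation would, and report whether the operating system refuses."""
    src = os.path.join(keys_dir, "secret_onion_key")
    dst = os.path.join(keys_dir, "secret_onion_key.old")
    try:
        os.replace(src, dst)
    except OSError:
        # A regular file cannot replace an existing directory.
        return True
    return False


# Demonstration in a throwaway directory; a real keys directory is not touched.
with tempfile.TemporaryDirectory() as keys_dir:
    with open(os.path.join(keys_dir, "secret_onion_key"), "w") as f:
        f.write("placeholder, not a real key\n")
    block_onion_key_rotation(keys_dir)
    print(rename_is_blocked(keys_dir))
```

Because the rename fails, the existing key file stays in place, which is the effect the workaround relies on.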
# Stable ExtORPort authentication
ExtORPort (extended ORPort) is a protocol that lets a pluggable transport attach transport and client IP metadata to a connection, for metrics purposes. In order to connect to the ExtORPort, the pluggable transport needs to authenticate using a scheme like ControlPort authentication. https://gitweb.torproject.org/torspec.git/tree/proposals/217-ext-orport-auth... tor generates a secret auth cookie and stores it in a file. When the pluggable transport process is managed by tor, tor tells the pluggable transport where to find the file by setting the TOR_PT_AUTH_COOKIE_FILE environment variable.
In the load-balanced configuration, the pluggable transport server (snowflake-server) is not run and managed by tor. It is an independent daemon, so it doesn't have access to TOR_PT_AUTH_COOKIE_FILE (which anyway would be a different path for every tor instance). The bigger problem is that tor regenerates the auth cookie and rewrites the file on every restart. All the tor instances have different cookies, and snowflake-server does not know which it will get through the load balancer, so it doesn't know what cookie to use.
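To make the cookie mismatch concrete, here is a small sketch of the hash a SAFE_COOKIE server sends to prove it knows the cookie. It assumes the construction from proposal 217 as I understand it: HMAC-SHA256 keyed with the 32-byte cookie, over a fixed label followed by the client and server nonces. The function and variable names are illustrative, and the cookies are random stand-ins for two tor instances.

```python
import hashlib
import hmac
import os

# Label assumed from proposal 217 for the server-to-client direction.
SERVER_TO_CLIENT = b"ExtORPort authentication server-to-client hash"


def server_hash(cookie, client_nonce, server_nonce):
    """Hash an ExtORPort server sends to prove it knows the cookie."""
    message = SERVER_TO_CLIENT + client_nonce + server_nonce
    return hmac.new(cookie, message, hashlib.sha256).digest()


# Two tor instances, each with its own randomly generated cookie.
cookie_instance_a = os.urandom(32)
cookie_instance_b = os.urandom(32)
client_nonce = os.urandom(32)
server_nonce = os.urandom(32)

# snowflake-server holds instance A's cookie, but the load balancer
# happens to route the connection to instance B.
expected = server_hash(cookie_instance_a, client_nonce, server_nonce)
received = server_hash(cookie_instance_b, client_nonce, server_nonce)
print(hmac.compare_digest(expected, received))  # mismatch: authentication fails
```

Authentication only succeeds when both ends hold the same cookie, which is why per-instance cookies that change on restart do not work behind a load balancer.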
Examples of what would work here are an option to use a certain file as the auth cookie, an option to leave the auth cookie file alone if it already exists, or a new ExtORPort authentication type that can use the same credentials across multiple instances.
What we're doing now is using a shim program, extor-static-cookie, which presents an ExtORPort interface with a static auth cookie for snowflake-server to authenticate with, then re-authenticates to the ExtORPort of its respective instance of tor, using that instance's auth cookie. https://lists.torproject.org/pipermail/tor-relays/2022-January/020183.html
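A static cookie has to be created once and then left alone. The sketch below shows one way that could look, assuming the static cookie is stored in the same 64-byte layout tor uses for its own extended ORPort cookie file (a 32-byte header followed by 32 random bytes). The file name and helper functions are hypothetical, and this is not taken from extor-static-cookie itself.

```python
import os
import secrets
import tempfile

# Assumed file layout: fixed 32-byte header, then a 32-byte cookie.
COOKIE_HEADER = b"! Extended ORPort Auth Cookie !\x0a"
COOKIE_LEN = 32


def write_static_cookie(path):
    """Create a new cookie file, refusing to overwrite an existing one."""
    cookie = secrets.token_bytes(COOKIE_LEN)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(COOKIE_HEADER + cookie)
    return cookie


def read_cookie(path):
    """Read a cookie file and return only the 32-byte cookie."""
    with open(path, "rb") as f:
        data = f.read()
    if len(data) != len(COOKIE_HEADER) + COOKIE_LEN or not data.startswith(COOKIE_HEADER):
        raise ValueError("not an extended ORPort auth cookie file")
    return data[len(COOKIE_HEADER):]


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "static_extended_orport_auth_cookie")  # hypothetical name
    written = write_static_cookie(path)
    print(read_cookie(path) == written)
```

Opening the file with O_EXCL means an existing cookie is never silently replaced, which is the property the load-balanced setup needs from its shared credential.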
Linus Nordberg and I will have a paper at FOCI this summer on the special way we run tor on the Snowflake bridges to permit better scaling. It discusses the two workarounds from the post below, namely a shim for predictable ExtORPort auth, and disabling onion key rotation. This setup has been in place on Snowflake bridges since January 2022. About 2.5% of Tor users (all users, not just bridge users) access Tor using Snowflake, so it's not a niche use case even if it's just us.
One of the reviewers asked if there was a chance changes might be made in tor that make our workarounds unnecessary. Is there anything to say to this question? Might tor get a feature to control ExtORPort authentication or onion key rotation, or is that something that's planned to stay as it is in favor of Arti? (Arti will probably remove the need for the load-balanced multi-tor configuration, which will also remove the need to disable onion key rotation, though better control over ExtORPort auth could still be useful for running server PTs that are not child processes of arti.)
Here is a draft of the paper; the relevant sections are 3.1 and 3.2. https://www.bamsoftware.com/papers/pt-bridge-hiperf/pt-bridge-hiperf.2023030... https://www.bamsoftware.com/papers/pt-bridge-hiperf/pt-bridge-hiperf.2023030...
On Mon, Feb 07, 2022 at 07:26:37PM -0700, David Fifield wrote:
On Wed, May 24, 2023 at 9:10 PM David Fifield david@bamsoftware.com wrote:
Linus Nordberg and I will have a paper at FOCI this summer on the special way we run tor on the Snowflake bridges to permit better scaling. It discusses the two workarounds from the post below, namely a shim for predictable ExtORPort auth, and disabling onion key rotation. This setup has been in place on Snowflake bridges since January 2022. About 2.5% of Tor users (all users, not just bridge users) access Tor using Snowflake, so it's not a niche use case even if it's just us.
One of the reviewers asked if there was a chance changes might be made in tor that make our workarounds unnecessary. Is there anything to say to this question? Might tor get a feature to control ExtORPort authentication or onion key rotation, or is that something that's planned to stay as it is in favor of Arti? (Arti will probably remove the need for the load-balanced multi-tor configuration, which will also remove the need to disable onion key rotation, though better control over ExtORPort auth could still be useful for running server PTs that are not child processes of arti.)
I can't speak definitively about the plans for the C tor implementation. My understanding is that they are hoping to keep new features there to a minimum, so I don't know if this would be something they can do.
In Arti, we're a lot more open to that kind of stuff. We're not planning to start a relay implementation till next year at the earliest, but please remind us when we start working on ExtORPort and related things and we can try to figure out a design that works.