The main Snowflake bridge (https://metrics.torproject.org/rs.html#details/5481936581E23D2D178105D44DB69...) is becoming overloaded because of a recent substantial increase in users. I think the host has sufficient CPU and memory headroom, and the pluggable transport process (which receives WebSocket connections and forwards them to tor) is scaling across multiple cores. But the tor process is constantly using 100% of one CPU core, and I suspect that the tor process has become the bottleneck.
Here are issues about a recent CPU upgrade on the bridge, and observations about the proportion of CPU used by different processes: https://bugs.torproject.org/tpo/anti-censorship/pluggable-transports/snowfla... https://bugs.torproject.org/tpo/anti-censorship/pluggable-transports/snowfla...
I have the impression that tor cannot use more than one CPU core—is that correct? If so, what can be done to permit a bridge to scale beyond 1×100% CPU? We can fairly easily scale the Snowflake-specific components around the tor process, but ultimately, a tor client process expects to connect to a bridge having a certain fingerprint, and that is the part I don't know how to easily scale.
* Surely it's not possible to run multiple instances of tor with the same fingerprint? Or is it? Does the answer change if all instances are on the same IP address? If the OR ports are never used?
* OnionBalance does not help with this, correct?
* Are there configuration options we could set to increase parallelism?
* Is migrating to a host with better single-core performance the only immediate option for scaling the tor process?
Separate from the topic of scaling a single bridge, here is a past issue with thoughts on scaling beyond one bridge. It looks as though there is no way to do it that does not require changes to how tor handles its Bridge lines. https://bugs.torproject.org/tpo/anti-censorship/pluggable-transports/snowfla...
* Using multiple snowflake Bridge lines does not work well, even though we could arrange to have the Snowflake proxy connect the user to the expected bridge, because tor will try to connect to all of them rather than choose one at random.
* Removing the fingerprint from the snowflake Bridge line in Tor Browser would permit the Snowflake proxies to round-robin clients over several bridges, but then the first hop would be unauthenticated (at the Tor layer). It would be nice if it were possible to specify a small set of permitted bridge fingerprints.
On Mon, Dec 27, 2021 at 12:05:26PM -0700, David Fifield wrote:
I have the impression that tor cannot use more than one CPU core—is that correct? If so, what can be done to permit a bridge to scale beyond 1×100% CPU? We can fairly easily scale the Snowflake-specific components around the tor process, but ultimately, a tor client process expects to connect to a bridge having a certain fingerprint, and that is the part I don't know how to easily scale.
- Surely it's not possible to run multiple instances of tor with the same fingerprint? Or is it? Does the answer change if all instances are on the same IP address? If the OR ports are never used?
Good timing -- Cecylia pointed out the higher load on Flakey a few days ago, and I've been meaning to post a suggestion somewhere. You actually *can* run more than one bridge with the same fingerprint. Just set it up in two places, with the same identity key, and then whichever one the client connects to, the client will be satisfied that it's reaching the right bridge.
There are two catches to the idea:
(A) Even though the bridges will have the same identity key, they won't have the same circuit-level onion key, so it will be smart to "pin" each client to a single bridge instance -- so when they fetch the bridge descriptor, which specifies the onion key, they will continue to use that bridge instance with that onion key. Snowflake in particular might also want to pin clients to specific bridges because of the KCP state.
(Another option, instead of pinning clients to specific instances, would be to try to share state among all the bridges on the backend, e.g. so they use the same onion key, can resume the same KCP sessions, etc. This option seems hard.)
(B) It's been a long time since anybody tried this, so there might be surprises. :) But it *should* work, so if there are surprises, we should try to fix them.
This overall idea is similar to the "router twins" idea from the distant distant past: https://lists.torproject.org/pipermail/tor-dev/2002-July/001122.html https://lists.torproject.org/pipermail/tor-commits/2003-October/024388.html https://lists.torproject.org/pipermail/tor-dev/2003-August/000236.html
- Removing the fingerprint from the snowflake Bridge line in Tor Browser would permit the Snowflake proxies to round-robin clients over several bridges, but then the first hop would be unauthenticated (at the Tor layer). It would be nice if it were possible to specify a small set of permitted bridge fingerprints.
This approach would also require clients to pin to a particular bridge, right? Because of the different state that each bridge will have?
--Roger
David/Roger:

Search the tor-relay mail archive for my previous responses on loadbalancing Tor Relays, which I've been successfully doing for the past 6 months with Nginx (it's possible to do with HAProxy as well). I haven't had time to implement it with a Tor Bridge, but I assume it will be very similar. Keep in mind it's critical to configure each Tor instance to use the same DirectoryAuthority and to disable the upstream timeouts on Nginx/HAProxy.

Happy Tor Loadbalancing!

Respectfully,

Gary

P.S. I believe there's a torrc config option to specify which cpu core a given Tor instance should use, too.
BTW... I just fact-checked my post-script, and the cpu affinity configuration I was thinking of is for Nginx (not Tor). Tor should consider adding a cpu affinity configuration option. What happens if you configure additional Tor instances on the same machine (my Tor instances are on different machines) and start them up? Do they bind to a different or the same cpu core?

Respectfully,
Gary
Hi Gary,
why would that be needed? Linux has a pretty good thread scheduler imo and will shuffle loads around as needed.
Even Windows' thread scheduler is quite decent these days and tools like "Process Lasso" exist if additional fine tuning is needed.
Attached is one of my servers running multiple tor instances on a 12/24C platform. The load is spread quite evenly across all cores.
Best Regards, Kristian
Hi Kristian,

Thanks for the screenshot. Nice Machine! Not everyone is as fortunate as you when it comes to resources for their Tor deployments. While a cpu affinity option isn't high on the priority list, as you point out, many operating systems do a decent job of load management and there are third-party options available for cpu affinity, but it might be helpful for some to have an application layer option to tune their implementations natively.

As an aside... Presently, are you using a single, public address with many ports or many, public addresses with a single port for your Tor deployments? Have you ever considered putting all those Tor instances behind a single, public address:port (fingerprint) to create one super bridge/relay? I'm just wondering if it makes sense to conserve and rotate through public address space to stay ahead of the blacklisting curve?

Also... Do you mind disclosing what all your screen instances are for? Are you running your Tor instances manually and not in daemon mode? "Inquiring minds want to know." 😁

As always... It is great to engage in dialogue with you.

Respectfully,
Gary
Hi Gary,
thanks!
As an aside... Presently, are you using a single, public address with many ports or many, public addresses with a single port for your Tor deployments? Have you ever considered putting all those Tor instances behind a single, public address:port (fingerprint) to create one super bridge/relay? I'm just wondering if it makes sense to conserve and rotate through public address space to stay ahead of the blacklisting curve?
Almost all of my dedicated servers have multiple IPv4 addresses, and you can have up to two tor relays per IPv4. So, the answer is multiple IPs and on multiple different ports. A "super relay" still has no real merit for me. I am not really concerned about my IPs being blacklisted as these are normal relays, not bridges.
What I am doing now for new servers is running them for a week or two as bridges and only then I move them over to hosting relays. In the past I have not seen a lot of traffic on bridges, but this has changed very recently. I saw 200+ unique users in the past 6 hours on one of my new bridges yesterday with close to 100 Mbit/s of consistent traffic. There appears to be an increased need right now, which I am happy to tend to.
Also... Do you mind disclosing what all your screen instances are for? Are you running your Tor instances manually and not in daemon mode? "Inquiring minds want to know." 😁
In that area I am a little bit old school, and I am indeed running them manually for now. I don’t think there is a technical reason for it. It’s just me being me.
Best Regards, Kristian
Kristian,
I am not really concerned about my IPs being blacklisted as these are normal relays, not bridges.
I suppose if you have the address space and are running your relays in a server environment, it's your prerogative. In my case, I'm running my super relay from home with limited address space, so it is better suited to my needs.
In that area I am a little bit old school, and I am indeed running them manually for now. I don’t think there is a technical reason for it. It’s just me being me.
I'm a proponent of individuality. Keep being you.
Respectfully,
Gary
On Tue, 28 Dec 2021 21:39:27 +0100 (CET) abuse--- via tor-relays tor-relays@lists.torproject.org wrote:
why would that be needed? Linux has a pretty good thread scheduler imo and will shuffle loads around as needed.
To improve cache locality: in modern CPUs the L1/L2/L3 caches are partitioned in various schemes per core or core cluster. So it is beneficial if a running thread stays stuck to a particular core or set of cores, where its data is still warm in cache from its previous timeslices, rather than being shuffled around to other cores.
But in theory the OS scheduler should be smart enough to ensure that without manual intervention.
Also I am not sure how relevant that is for the kind of computation that Tor does. And in any case, it is a "nice to have" which usually shouldn't make a huge difference.
Ideally though, the application thread handling the incoming data should also run on the same CPU core that just handled the incoming IRQ from NIC. But that requires support across all of the application, OS, NIC hardware and driver, and very careful tuning.
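For operators who do want to pin a tor instance manually, the ordinary OS tools are enough. A rough sketch (the instance name and core number are just examples, not anything from this thread):

    # One-off: pin a running tor instance to CPU core 0.
    taskset -cp 0 "$(systemctl show -p MainPID --value tor@o1.service)"

    # Persistent alternative: a systemd drop-in for that instance, e.g.
    # /etc/systemd/system/tor@o1.service.d/affinity.conf containing:
    #   [Service]
    #   CPUAffinity=0
    # then: systemctl daemon-reload && systemctl restart tor@o1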
On Mon, Dec 27, 2021 at 04:00:34PM -0500, Roger Dingledine wrote:
On Mon, Dec 27, 2021 at 12:05:26PM -0700, David Fifield wrote:
I have the impression that tor cannot use more than one CPU core—is that correct? If so, what can be done to permit a bridge to scale beyond 1×100% CPU? We can fairly easily scale the Snowflake-specific components around the tor process, but ultimately, a tor client process expects to connect to a bridge having a certain fingerprint, and that is the part I don't know how to easily scale.
- Surely it's not possible to run multiple instances of tor with the same fingerprint? Or is it? Does the answer change if all instances are on the same IP address? If the OR ports are never used?
Good timing -- Cecylia pointed out the higher load on Flakey a few days ago, and I've been meaning to post a suggestion somewhere. You actually *can* run more than one bridge with the same fingerprint. Just set it up in two places, with the same identity key, and then whichever one the client connects to, the client will be satisfied that it's reaching the right bridge.
Thanks for this information. I've done a test with one instance of obfs4proxy forwarding through a load balancer to two instances of tor that have the same keys, and it seems to work. It seems like this could work for Snowflake.
(A) Even though the bridges will have the same identity key, they won't have the same circuit-level onion key, so it will be smart to "pin" each client to a single bridge instance -- so when they fetch the bridge descriptor, which specifies the onion key, they will continue to use that bridge instance with that onion key. Snowflake in particular might also want to pin clients to specific bridges because of the KCP state.
(Another option, instead of pinning clients to specific instances, would be to try to share state among all the bridges on the backend, e.g. so they use the same onion key, can resume the same KCP sessions, etc. This option seems hard.)
Let's make a distinction between the "frontend" snowflake-server pluggable transport process, and the "backend" tor process. These don't necessarily have to be 1:1; either one could be run in multiple instances. Currently, the "backend" tor is the limiting factor, because it uses only 1 CPU core. The "frontend" snowflake-server can scale to multiple cores in a single process and is comparatively unrestrained. So I propose to keep snowflake-server as a single process, and to run multiple tor processes. That eliminates the dimension of KCP state coordination, and should last us until snowflake-server outgrows the resources of a single host.
The snowflake-server program is a managed proxy; i.e., it expects to run with certain environment variables set by a managing process, normally tor. We'll need to instead run snowflake-server apart from any single tor instance. Probably the easiest way to do that in the short term is with ptadapter (https://github.com/twisteroidambassador/ptadapter), which converts a pluggable transport into a TCP proxy, forwarding to an address you specify.
Then we can have ptadapter forward to a load balancer like haproxy. The load balancer will then round-robin over the ORPorts of the available tor instances. The tor instances can all be on the same host (run as many instances as you have CPU cores), which may or may not be the same host on which snowflake-server is running.
Currently we have this:

        ________________     ___
    -->|snowflake-server|-->|tor|
        ----------------     ---
        (run by tor)

The design I'm proposing is this:

                                          ___
                                      .->|tor|
        ________________     _______  |   ---
    -->|snowflake-server|-->|haproxy|-+->|tor|
        ----------------     -------  |   ---
        (run by ptadapter)            '->|tor|
                                           ---
I believe that the "pinning" of a client session to particular tor instance will work automatically by the fact that snowflake-server keeps an outgoing connection alive (i.e., through the load balancer) as long as a KCP session exists.
One complication we'll have to work out is that ptadapter doesn't have a setting for ExtORPort forwarding. ptadapter absorbs any ExtORPort information and forwards an unadorned connection onward. The idea I had to work around this limitation is to have ptadapter, rather than execute snowflake-server directly, execute a shell script that sets TOR_PT_EXTENDED_SERVER_PORT to a hardcoded address (i.e., to haproxy) before running snowflake-server. Though, I am not sure what to do about the extended_orport_auth_cookie file, which will be different for different tor instances.
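In sketch form, such a wrapper might look like this (the addresses and paths are placeholders, and which auth cookie file to point at is exactly the unresolved question above):

    #!/bin/sh
    # Hypothetical wrapper that ptadapter would exec in place of snowflake-server.
    # Point the extended ORPort at the load balancer instead of any single tor.
    TOR_PT_EXTENDED_SERVER_PORT=127.0.0.1:10000 \
    TOR_PT_AUTH_COOKIE_FILE=/path/to/some/extended_orport_auth_cookie \
    exec /usr/bin/snowflake-server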
## Demo instructions
This is what I did to do a test of one instance of obfs4proxy communicating with two instances of tor that have the same keys, on Debian 11.
Install a first instance of tor and configure it as a bridge:

    # apt install tor
    # tor-instance-create o1

/etc/tor/instances/o1/torrc:

    BridgeRelay 1
    PublishServerDescriptor 0
    AssumeReachable 1
    SocksPort 0
    ORPort 127.0.0.1:9001

Start the first instance, which will generate keys:

    systemctl start tor@o1
Install a second instance of tor and configure it as a bridge (with a different ORPort):

    # tor-instance-create o2

/etc/tor/instances/o2/torrc:

    BridgeRelay 1
    PublishServerDescriptor 0
    AssumeReachable 1
    SocksPort 0
    ORPort 127.0.0.1:9002

But before starting the second instance the first time, copy keys from the first instance:

    # cp -r /var/lib/tor-instances/o1/keys /var/lib/tor-instances/o2/
    # chown -R _tor-o2:_tor-o2 /var/lib/tor-instances/o2/keys/
    # systemctl start tor@o2
The two instances should have the same fingerprint:

    # cat /var/lib/tor-instances/*/fingerprint
    Unnamed 4808CD98E4C1D4F282DA741A860A44D755701F2F
    Unnamed 4808CD98E4C1D4F282DA741A860A44D755701F2F
Install haproxy and configure it to forward to the two tor instances:

    # apt install haproxy

/etc/haproxy/haproxy.cfg:

    frontend tor
        mode tcp
        bind 127.0.0.1:9000
        default_backend tor-o

    backend tor-o
        mode tcp
        server o1 127.0.0.1:9001
        server o2 127.0.0.1:9002

Restart haproxy with the new configuration:

    # systemctl restart haproxy
Install ptadapter and configure it to listen on an external address and forward to haproxy:

    # apt install python3-pip
    # pip3 install ptadapter

ptadapter.ini:

    [server]
    exec = /usr/bin/obfs4proxy
    state = pt_state
    forward = 127.0.0.1:9000
    tunnels = server_obfs4

    [server_obfs4]
    transport = obfs4
    listen = [::]:443

Run ptadapter:

    ptadapter -S ptadapter.ini
On the client, make a torrc file with the information from the pt_state/obfs4_bridgeline.txt file created by ptadapter:

    UseBridges 1
    SocksPort auto
    Bridge obfs4 172.105.3.197:443 4808CD98E4C1D4F282DA741A860A44D755701F2F cert=1SCzqyYyPh/SiXTJa9nLFxMyjWQITVCKeICME+SwxgNcTTSUQ7+vM/ghofU7oaalIRBILg iat-mode=0
    ClientTransportPlugin obfs4 exec /usr/bin/obfs4proxy
    DataDir datadir

Then run tor with the torrc:

    tor -f torrc
If you restart tor multiple times on the client, you can see haproxy alternating between the two backend servers (o1 and o2) in /var/log/haproxy.log:

    Dec 31 04:30:31 localhost haproxy[9707]: 127.0.0.1:55500 [31/Dec/2021:04:30:21.235] tor tor-o/o1 1/0/10176 11435 -- 1/1/0/0/0 0/0
    Dec 31 04:30:51 localhost haproxy[9707]: 127.0.0.1:55514 [31/Dec/2021:04:30:46.925] tor tor-o/o2 1/0/4506 17682 -- 1/1/0/0/0 0/0
    Dec 31 04:38:41 localhost haproxy[9707]: 127.0.0.1:55528 [31/Dec/2021:04:30:55.540] tor tor-o/o1 1/0/466049 78751 -- 1/1/0/0/0 0/0
    Dec 31 05:34:52 localhost haproxy[9707]: 127.0.0.1:55594 [31/Dec/2021:05:34:50.083] tor tor-o/o2 1/0/2209 13886 -- 1/1/0/0/0 0/0
On Thu, Dec 30, 2021 at 10:42:51PM -0700, David Fifield wrote:
One complication we'll have to work out is that ptadapter doesn't have a setting for ExtORPort forwarding. ptadapter absorbs any ExtORPort information and forwards an unadorned connection onward. The idea I had to work around this limitation is to have ptadapter, rather than execute snowflake-server directly, execute a shell script that sets TOR_PT_EXTENDED_SERVER_PORT to a hardcoded address (i.e., to haproxy) before running snowflake-server. Though, I am not sure what to do about the extended_orport_auth_cookie file, which will be different for different tor instances.
There are a number of potential ways to deal with the complication of ExtORPort authentication, from alternative ExtORPort authentication types, to ExtORPort-aware load balancing. With a view towards deploying something in the near future, I wrote this program that enables an external pluggable transport to talk to tor's ExtORPort and authenticate as if it had an unchanging authentication cookie.
https://gitlab.torproject.org/dcf/extor-static-cookie
The difficulty with load-balancing multiple tor instances, with respect to ExtORPort, is that to authenticate with the ExtORPort you need to read a cookie from a file on disk, which tor overwrites randomly every time it starts. If you do not know which instance of tor will receive your forwarded traffic, you do not know which ExtORPort cookie to use.
The extor-static-cookie program presents an ExtORPort interface, but it reads its authentication cookie from a file that is independent of any instance of tor, which you can write once and then leave alone. The external server pluggable transport can read from the shared authentication cookie file as well. Every instance of tor runs a copy of extor-static-cookie, all using the same authentication cookie file. The extor-static-cookie instances receive ExtORPort authentication from the external server pluggable transport, along with the USERADDR and TRANSPORT metadata, then re-authenticate and echo that information to their respective tor's ExtORPort.
So we change from this:

                                        ___
                                    .->|tor|
       ________________    _______  |   ---
    ->|snowflake-server|->|haproxy|-+->|tor|
       ----------------    -------  |   ---
                                    '->|tor|
                                         ---

to this:

                                        ___________________    ___
                                    .->|extor-static-cookie|->|tor|
       ________________    _______  |   -------------------    ---
    ->|snowflake-server|->|haproxy|-+->|extor-static-cookie|->|tor|
       ----------------    -------  |   -------------------    ---
                                    '->|extor-static-cookie|->|tor|
                                         -------------------    ---
I have a similar setup running now on a test bridge, with one instance of obfs4proxy load-balancing to two instances of tor.
## Setup notes
Install extor-static-cookie:

    # apt install golang
    # git clone https://gitlab.torproject.org/dcf/extor-static-cookie
    # (cd extor-static-cookie && go build)
    # install -o root -g root extor-static-cookie/extor-static-cookie /usr/local/bin/
Generate a shared authentication cookie file:

    # mkdir -m 755 /var/lib/extor-static-cookie
    # extor-static-cookie/gen-auth-cookie > /var/lib/extor-static-cookie/static_extended_orport_auth_cookie
Install a first instance of tor and configure it as a bridge:

    # apt install tor
    # tor-instance-create o1

/etc/tor/instances/o1/torrc:

    BridgeRelay 1
    PublishServerDescriptor 0
    AssumeReachable 1
    SocksPort 0
    ORPort 127.0.0.1:auto
    ExtORPort auto
    ServerTransportPlugin extor_static_cookie exec /usr/local/bin/extor-static-cookie /var/lib/extor-static-cookie/static_extended_orport_auth_cookie
    ServerTransportListenAddr extor_static_cookie 127.0.0.1:10001

Notice we set `ExtORPort auto` (this is tor's own ExtORPort), and also pass `127.0.0.1:10001` to extor-static-cookie, which is the ExtORPort that the external server pluggable transport will talk to.

Start the first instance, which will generate keys:

    systemctl start tor@o1
Install a second instance of tor and configure it as a bridge (with a different ServerTransportListenAddr port):

    # tor-instance-create o2

/etc/tor/instances/o2/torrc:

    BridgeRelay 1
    PublishServerDescriptor 0
    AssumeReachable 1
    SocksPort 0
    ORPort 127.0.0.1:auto
    ExtORPort auto
    ServerTransportPlugin extor_static_cookie exec /usr/local/bin/extor-static-cookie /var/lib/extor-static-cookie/static_extended_orport_auth_cookie
    ServerTransportListenAddr extor_static_cookie 127.0.0.1:10002

But before starting the second instance the first time, copy keys from the first instance:

    # cp -r /var/lib/tor-instances/o1/keys /var/lib/tor-instances/o2/
    # chown -R _tor-o2:_tor-o2 /var/lib/tor-instances/o2/keys/
    # systemctl start tor@o2
The two instances should have the same fingerprint:

    # cat /var/lib/tor-instances/*/fingerprint
    Unnamed 4808CD98E4C1D4F282DA741A860A44D755701F2F
    Unnamed 4808CD98E4C1D4F282DA741A860A44D755701F2F
Install haproxy and configure it to forward to the two instances of extor-static-cookie (which will then forward to the ExtORPort of their respective tor instances):

    # apt install haproxy

/etc/haproxy/haproxy.cfg:

    frontend tor
        mode tcp
        bind 127.0.0.1:10000
        default_backend tor-o

    backend tor-o
        mode tcp
        server o1 127.0.0.1:10001
        server o2 127.0.0.1:10002

Restart haproxy with the new configuration:

    # systemctl restart haproxy
Instead of ptadapter, I found it more convenient to start the external server pluggable transport with a shell script that sets up the necessary environment variables.

extor.sh:

    #!/bin/sh
    # Usage: extor.sh 127.0.0.1:10000 /var/lib/extor-static-cookie/static_extended_orport_auth_cookie /usr/bin/obfs4proxy

    set -e

    EXTOR_ADDR="${1:?missing ExtORPort address}"
    EXTOR_COOKIE_FILE="${2:?missing ExtORPort auth cookie file}"
    shift 2

    BINDADDR='[::]:443'
    TRANSPORT=obfs4

    TOR_PT_MANAGED_TRANSPORT_VER=1 \
    TOR_PT_SERVER_TRANSPORTS="$TRANSPORT" \
    TOR_PT_SERVER_BINDADDR="$TRANSPORT"-"$BINDADDR" \
    TOR_PT_EXTENDED_SERVER_PORT="$EXTOR_ADDR" \
    TOR_PT_AUTH_COOKIE_FILE="$EXTOR_COOKIE_FILE" \
    TOR_PT_STATE_LOCATION=pt_state \
    TOR_PT_EXIT_ON_STDIN_CLOSE=1 \
    exec "$@"

Then I run the shell script, giving the address of the haproxy frontend, the path to the shared authentication cookie file, and a command to run:

    # ./extor.sh 127.0.0.1:10000 /var/lib/extor-static-cookie/static_extended_orport_auth_cookie /usr/bin/obfs4proxy
On the client, make a torrc file with the information from pt_state/obfs4_bridgeline.txt:

    UseBridges 1
    SocksPort auto
    Bridge obfs4 172.105.3.197:443 4808CD98E4C1D4F282DA741A860A44D755701F2F cert=1SCzqyYyPh/SiXTJa9nLFxMyjWQITVCKeICME+SwxgNcTTSUQ7+vM/ghofU7oaalIRBILg iat-mode=0
    ClientTransportPlugin obfs4 exec /usr/bin/obfs4proxy
    DataDir datadir

Then run tor with the torrc:

    $ tor -f torrc
[I'm about to go off-line for some days, so I am sending my current suboptimally-organized reply, which I hope is better than waiting another week to respond :)]
On Thu, Dec 30, 2021 at 10:42:51PM -0700, David Fifield wrote:
Let's make a distinction between the "frontend" snowflake-server pluggable transport process, and the "backend" tor process. These don't necessarily have to be 1:1; either one could be run in multiple instances. Currently, the "backend" tor is the limiting factor, because it uses only 1 CPU core. The "frontend" snowflake-server can scale to multiple cores in a single process and is comparatively unrestrained.
Excellent point, and yes this simplifies. Great.
I believe that the "pinning" of a client session to particular tor instance will work automatically by the fact that snowflake-server keeps an outgoing connection alive (i.e., through the load balancer) as long as a KCP session exists. [...] But before starting the second instance the first time, copy keys from the first instance:
Hm. It looks promising! But we might still have a Tor-side problem remaining. I think it boils down to how long the KCP sessions last.
The details on how exactly these bridge instances will diverge over time:
The keys directory will start out the same, but after four weeks (DEFAULT_ONION_KEY_LIFETIME_DAYS, used to be one week but in Tor 0.3.1.1-alpha, proposal 274, we bumped it up to four weeks) each bridge will rotate its onion key (the one clients use for circuit-level crypto). That is, each instance will generate its own fresh onion key.
The two bridge instances actually haven't diverged completely at that point, since Tor remembers the previous onion key (i.e. the onion key from the previous period) and is willing to receive create cells that use it for one further week (DEFAULT_ONION_KEY_GRACE_PERIOD_DAYS). So it is after 5 weeks that the original (shared) onion key will no longer work.
Where this matters is (after this 5 weeks have passed) if the client connects to the bridge, fetches and caches the bridge descriptor of instance A, and then later it connects to the bridge again and gets passed to instance B. In this case, the create cell that the client generates will use the onion key for instance A, and instance B won't know how to decrypt it so it will send a destroy cell back.
If this is an issue, we can definitely work around it, by e.g. disabling the onion key rotation on the bridges, or setting up a periodic rsync+hup between the bridges, or teaching clients to use createfast cells in this situation (this type of circuit crypto doesn't use the onion key at all, and just relies on TLS for security -- which can only be done for the first hop of the circuit but that's the one we're talking about here).
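For concreteness, the rsync+hup variant could be a periodic job along these lines — an untested sketch, assuming the tor-instance-create layout used elsewhere in this thread:

    # Copy the primary instance's onion keys to the secondary and reload it,
    # so both instances keep answering with the same onion key.
    rsync -a /var/lib/tor-instances/o1/keys/secret_onion_key* /var/lib/tor-instances/o2/keys/
    chown _tor-o2:_tor-o2 /var/lib/tor-instances/o2/keys/secret_onion_key*
    systemctl reload tor@o2    # sends tor a HUP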
But before we think about workarounds, maybe we don't need one: how long does "the KCP session" last?
Tor clients try to fetch a fresh bridge descriptor every three-ish hours, and once they fetch a bridge descriptor from their "current" bridge instance, they should know the onion key that it wants to use. So it is that up-to-three-hour window where I think things could go wrong. And that timeframe sounds promising.
(I also want to double-check that clients don't try to use the onion key from the current cached descriptor while fetching the updated descriptor. That could become an ugly bug in the wrong circumstances, and would be something we want to fix if it's happening.)
Here's how you can simulate a pair of bridge instances that have diverged after five weeks, so you can test how things would work with them:
Copy the keys directory as before, but "rm secret_onion_key*" in the keys directory on n-1 of the instances, before starting them.
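In other words, something along these lines (a sketch; the instance directory paths are hypothetical):

# Give instance 2 a copy of instance 1's keys, then delete its onion keys so
# it generates fresh ones at startup, simulating the divergence described above.
rsync -a /var/lib/tor-instance1/keys/ /var/lib/tor-instance2/keys/
rm /var/lib/tor-instance2/keys/secret_onion_key*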
Thanks! --Roger
David, Roger, et al.,
I just got back from holidays and really enjoyed this thread!
I run my Loadbalanced Tor Relay as a Guard/Middle Relay, very similar to David's topology diagram, without the Snowflake-Server proxy. I'm using Nginx (which forks a child process per core) instead of HAProxy. My Backend Tor Relay Nodes are running on several, different Physical Servers; thus, I'm using Private Address Space instead of Loopback Address Space.
In this configuration, I discovered that I had to: configure Nginx/HAProxy to use Transparent Streaming Mode; use Source IP Address Sticky Sessions (Pinning); configure the Loadbalancer to send the Backend Tor Relay Nodes' traffic back to Nginx/HAProxy (Kernel & IPTables); configure all Backend Tor Relay Nodes to use a copy of the same .tordb (I wasn't able to get the Backend Tor Relay Nodes working with a single shared .tordb over NFS without the DirectoryAuthorities complaining); and configure the Backend Tor Relay Nodes to use the same DirectoryAuthority (to ensure each Backend Tor Relay Node sends meta-data to the same DirectoryAuthority). Moreover, I've enabled logging to a central Syslog Server for each Backend Tor Relay Node and created a number of Shell Scripts to help remotely manage each Backend Tor Relay Node.
Here are some sample configurations for reference.
Nginx Config:
upstream orport_tornodes {
    #least_conn;
    hash $remote_addr consistent;
    #server 192.168.0.1:9001 weight=1 max_fails=1 fail_timeout=10s;
    #server 192.168.0.1:9001 down;
    server 192.168.0.11:9001 weight=4 max_fails=0 fail_timeout=0s;
    server 192.168.0.21:9001 weight=4 max_fails=0 fail_timeout=0s;
    #server 192.168.0.31:9001 weight=4 max_fails=3 fail_timeout=300s;
    server 192.168.0.41:9001 weight=4 max_fails=0 fail_timeout=0s;
    server 192.168.0.51:9001 weight=4 max_fails=0 fail_timeout=0s;
    #zone orport_torfarm 64k;
}
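For context, the upstream block above has to live inside nginx's stream{} context, next to a server{} block that listens on the ORPort and proxies to it. A minimal sketch follows; the file name and include layout are assumptions, not taken from the actual configuration:

# Hypothetical wrapper file; nginx.conf would need a top-level
# "include /etc/nginx/tor-orport-stream.conf;" (outside http{}) to pick it up.
cat > /etc/nginx/tor-orport-stream.conf <<'EOF'
stream {
    # ... the orport_tornodes upstream shown above goes here ...
    server {
        listen 9001;
        # proxy_bind $remote_addr transparent;  # transparent mode, as described above
        proxy_pass orport_tornodes;
    }
}
EOF
nginx -t && nginx -s reload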
HAProxy Config (Alternate):
frontend tornodes
    # Log to global config
    log global

    # Bind to port 443 on a specified interface
    bind 0.0.0.0:9001 transparent

    # We're proxying TCP here...
    mode tcp

    default_backend orport_tornodes

# Simple TCP source consistent over several servers using the specified
# source 0.0.0.0 usesrc clientip
backend orport_tornodes
    balance source
    hash-type consistent
    #server tornode1 192.168.0.1:9001 check disabled
    #server tornode11 192.168.0.11:9001 source 192.168.0.1
    server tornode11 192.168.0.11:9001 source 0.0.0.0 usesrc clientip check disabled
    server tornode21 192.168.0.21:9001 source 0.0.0.0 usesrc clientip check disabled
    #server tornode31 192.168.0.31:9001 source 0.0.0.0 usesrc clientip check disabled
    server tornode41 192.168.0.41:9001 source 0.0.0.0 usesrc clientip check disabled
    server tornode51 192.168.0.51:9001 source 0.0.0.0 usesrc clientip check disabled
Linux Kernel & IPTables Config:
modprobe xt_socket
modprobe xt_TPROXY

echo 1 > /proc/sys/net/ipv4/ip_forward; cat /proc/sys/net/ipv4/ip_forward
echo 1 > /proc/sys/net/ipv4/ip_nonlocal_bind; cat /proc/sys/net/ipv4/ip_nonlocal_bind
echo 15000 64000 > /proc/sys/net/ipv4/ip_local_port_range; cat /proc/sys/net/ipv4/ip_local_port_range

ip rule del fwmark 1 lookup 100 2>/dev/null # Ensure Duplicate Rule is not Created
ip rule add fwmark 1 lookup 100 # ip rule show
ip route add local 0.0.0.0/0 dev lo table 100 # ip route show table wan0; ip route show table 100

iptables -I INPUT -p tcp --dport 9001 -j ACCEPT
iptables -t mangle -N TOR
iptables -t mangle -A PREROUTING -p tcp -m socket -j TOR
iptables -t mangle -A TOR -j MARK --set-mark 1
iptables -t mangle -A TOR -j ACCEPT
#iptables -t mangle -A PREROUTING -p tcp -s 192.168.0.0/24 --sport 9001 -j MARK --set-xmark 0x1/0xffffffff
#iptables -t mangle -A PREROUTING -p tcp --dport 9001 -j TPROXY --tproxy-mark 0x1/0x1 --on-port 9001 --on-ip 127.0.0.1
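As an aside, the echo commands above do not survive a reboot; one way to persist them, assuming a sysctl.d-style distribution (the file name is arbitrary):

cat > /etc/sysctl.d/99-tor-loadbalancer.conf <<'EOF'
net.ipv4.ip_forward = 1
net.ipv4.ip_nonlocal_bind = 1
net.ipv4.ip_local_port_range = 15000 64000
EOF
sysctl --system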
Backend Tor Relay Node Configs:
# cat /tmp/torrc
Nickname xxxxxxxxxxxxxxxxxx
ORPort xxx.xxx.xxx.xxx:9001 NoListen
ORPort 192.168.0.11:9001 NoAdvertise
SocksPort 9050
SocksPort 192.168.0.11:9050
ControlPort 9051
DirAuthority longclaw orport=443 no-v2 v3ident=23D15D965BC35114467363C165C4F724B64B4F66 199.58.81.140:80 74A910646BCEEFBCD2E874FC1DC997430F968145
FallbackDir 193.23.244.244:80 orport=443 id=7BE683E65D48141321C5ED92F075C55364AC7123
DirCache 0
ExitRelay 0
MaxMemInQueues 192 MB
GeoIPFile /opt/share/tor/geoip
Log notice file /tmp/torlog
Log notice syslog
VirtualAddrNetwork 10.192.0.0/10
AutomapHostsOnResolve 1
TransPort 192.168.0.11:9040
DNSPort 192.168.0.11:9053
RunAsDaemon 1
DataDirectory /tmp/tor/torrc.d/.tordb
AvoidDiskWrites 1
User tor
ContactInfo tor-operator@your-emailaddress-domain

# cat /tmp/torrc
Nickname xxxxxxxxxxxxxxxxxx
ORPort xxx.xxx.xxx.xxx:9001 NoListen
ORPort 192.168.0.41:9001 NoAdvertise
SocksPort 9050
SocksPort 192.168.0.41:9050
ControlPort 9051
DirAuthority longclaw orport=443 no-v2 v3ident=23D15D965BC35114467363C165C4F724B64B4F66 199.58.81.140:80 74A910646BCEEFBCD2E874FC1DC997430F968145
FallbackDir 193.23.244.244:80 orport=443 id=7BE683E65D48141321C5ED92F075C55364AC7123
DirCache 0
ExitRelay 0
MaxMemInQueues 192 MB
GeoIPFile /opt/share/tor/geoip
Log notice file /tmp/torlog
Log notice syslog
VirtualAddrNetwork 10.192.0.0/10
AutomapHostsOnResolve 1
TransPort 192.168.0.41:9040
DNSPort 192.168.0.41:9053
RunAsDaemon 1
DataDirectory /tmp/tor/torrc.d/.tordb
AvoidDiskWrites 1
User tor
ContactInfo tor-operator@your-emailaddress-domain
Shell Scripts to Remotely Manage Tor Relay Nodes:
# cat /usr/sbin/stat-tor-nodes
#!/bin/sh
uptime-all-nodes; memfree-all-nodes; netstat-tor-nodes
# cat /usr/sbin/uptime-all-nodes
#!/bin/sh
/usr/bin/ssh -t admin@192.168.0.11 'hostname; uptime'
/usr/bin/ssh -t admin@192.168.0.21 'hostname; uptime'
/usr/bin/ssh -t admin@192.168.0.31 'hostname; uptime'
/usr/bin/ssh -t admin@192.168.0.41 'hostname; uptime'
/usr/bin/ssh -t admin@192.168.0.51 'hostname; uptime'
# cat /usr/sbin/memfree-all-nodes
#!/bin/sh
/usr/bin/ssh -t admin@192.168.0.11 'hostname; grep MemFree /proc/meminfo'
/usr/bin/ssh -t admin@192.168.0.21 'hostname; grep MemFree /proc/meminfo'
/usr/bin/ssh -t admin@192.168.0.31 'hostname; grep MemFree /proc/meminfo'
/usr/bin/ssh -t admin@192.168.0.41 'hostname; grep MemFree /proc/meminfo'
/usr/bin/ssh -t admin@192.168.0.51 'hostname; grep MemFree /proc/meminfo'
# cat /usr/sbin/netstat-tor-nodes
#!/bin/sh
/usr/bin/ssh -t admin@192.168.0.11 'hostname; netstat -anp | grep -i tor | grep -v 192.168.0.1: | wc -l'
/usr/bin/ssh -t admin@192.168.0.21 'hostname; netstat -anp | grep -i tor | grep -v 192.168.0.1: | wc -l'
/usr/bin/ssh -t admin@192.168.0.31 'hostname; netstat -anp | grep -i tor | grep -v 192.168.0.1: | wc -l'
/usr/bin/ssh -t admin@192.168.0.41 'hostname; netstat -anp | grep -i tor | grep -v 192.168.0.1: | wc -l'
/usr/bin/ssh -t admin@192.168.0.51 'hostname; netstat -anp | grep -i tor | grep -v 192.168.0.1: | wc -l'
# cat /jffs/sbin/ps-tor-nodes
#!/bin/sh
/usr/bin/ssh -t admin@192.168.0.11 'hostname; ps w | grep -i tor'
/usr/bin/ssh -t admin@192.168.0.21 'hostname; ps w | grep -i tor'
/usr/bin/ssh -t admin@192.168.0.31 'hostname; ps w | grep -i tor'
/usr/bin/ssh -t admin@192.168.0.41 'hostname; ps w | grep -i tor'
/usr/bin/ssh -t admin@192.168.0.51 'hostname; ps w | grep -i tor'
# cat /usr/sbin/killall-tor-nodes
#!/bin/sh
read -r -p "Are you sure? [y/N] " input
case "$input" in
    [yY])
        /usr/bin/ssh -t admin@192.168.0.11 'killall tor'
        /usr/bin/ssh -t admin@192.168.0.21 'killall tor'
        #/usr/bin/ssh -t admin@192.168.0.31 'killall tor'
        /usr/bin/ssh -t admin@192.168.0.41 'killall tor'
        /usr/bin/ssh -t admin@192.168.0.51 'killall tor'
        return 0
        ;;
    *)
        return 1
        ;;
esac
# cat /usr/sbin/restart-tor-nodes
#!/bin/sh
read -r -p "Are you sure? [y/N] " input
case "$input" in
    [yY])
        /usr/bin/ssh -t admin@192.168.0.11 '/usr/sbin/tor -f /tmp/torrc --quiet'
        /usr/bin/ssh -t admin@192.168.0.21 '/usr/sbin/tor -f /tmp/torrc --quiet'
        #/usr/bin/ssh -t admin@192.168.0.31 '/usr/sbin/tor -f /tmp/torrc --quiet'
        /usr/bin/ssh -t admin@192.168.0.41 '/usr/sbin/tor -f /tmp/torrc --quiet'
        /usr/bin/ssh -t admin@192.168.0.51 '/usr/sbin/tor -f /tmp/torrc --quiet'
        return 0
        ;;
    *)
        return 1
        ;;
esac
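For what it's worth, these per-node scripts could also be collapsed into a single loop; here is a minimal sketch (the script name is hypothetical; the node list is copied from the configs above):

# cat /usr/sbin/for-all-nodes
#!/bin/sh
# Usage: for-all-nodes 'hostname; uptime'
# Runs the given command on every backend node over ssh.
for node in 192.168.0.11 192.168.0.21 192.168.0.31 192.168.0.41 192.168.0.51; do
    /usr/bin/ssh -t admin@"$node" "$1"
done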
I've been meaning to put together a tutorial on Loadbalancing Tor Relays, but haven't found the time as of yet. Perhaps this will help until I'm able to find the time.
I appreciate your knowledge sharing and furthering of the topic of Loadbalancing Tor Relays, especially with regard to Bridge and Exit Relays.
Keep up the Great Work!
Respectfully,
Gary
On Tue, Jan 04, 2022 at 11:57:36PM -0500, Roger Dingledine wrote:
Hm. It looks promising! But we might still have a Tor-side problem remaining. I think it boils down to how long the KCP sessions last.
The details on how exactly these bridge instances will diverge over time:
The keys directory will start out the same, but after four weeks (DEFAULT_ONION_KEY_LIFETIME_DAYS, used to be one week but in Tor 0.3.1.1-alpha, proposal 274, we bumped it up to four weeks) each bridge will rotate its onion key (the one clients use for circuit-level crypto). That is, each instance will generate its own fresh onion key.
The two bridge instances actually haven't diverged completely at that point, since Tor remembers the previous onion key (i.e. the onion key from the previous period) and is willing to receive create cells that use it for one further week (DEFAULT_ONION_KEY_GRACE_PERIOD_DAYS). So it is after 5 weeks that the original (shared) onion key will no longer work.
Where this matters is (after this 5 weeks have passed) if the client connects to the bridge, fetches and caches the bridge descriptor of instance A, and then later it connects to the bridge again and gets passed to instance B. In this case, the create cell that the client generates will use the onion key for instance A, and instance B won't know how to decrypt it so it will send a destroy cell back.
I've done an experiment with a second snowflake bridge that has the same identity keys but different onion keys. A client can bootstrap with either one starting from a clean state, but it fails if you bootstrap with one and then try to bootstrap with the other using the same DataDirectory. The error you get is

```plain
onion_skin_ntor_client_handshake(): Invalid result from curve25519 handshake: 4
```
The first bridge is the existing "prod" snowflake bridge with nickname:

* flakey
The other "staging" bridge is the load-balanced configuration with four instances. All four instances currently have the same onion keys; which however are different from the "prod"'s onion keys. (The onion keys actually come from a backup I made.) * flakey1 * flakey2 * flakey3 * flakey4
Bootstrapping "prod" with a fresh DataDirectory "datadir.prod" works. Here is torrc.prod:
```plain
UseBridges 1
SocksPort auto
DataDirectory datadir.prod
ClientTransportPlugin snowflake exec ./client -keep-local-addresses -log snowflake.log
Bridge snowflake 192.0.2.3:1 2B280B23E1107BB62ABFC40DDCC8824814F80A72 url=https://snowflake-broker.torproject.net/ max=1 ice=stun:stun.voip.blackberry.com:3478,stun:stun.altar.com.pl:3478,stun:stun.antisip.com:3478,stun:stun.bluesip.net:3478,stun:stun.dus.net:3478,stun:stun.epygi.com:3478,stun:stun.sonetel.com:3478,stun:stun.sonetel.net:3478,stun:stun.stunprotocol.org:3478,stun:stun.uls.co.za:3478,stun:stun.voipgate.com:3478,stun:stun.voys.nl:3478
```
Notice `new bridge descriptor 'flakey' (fresh)`:
```plain
snowflake/client$ tor -f torrc.prod
[notice] Tor 0.3.5.16 running on Linux with Libevent 2.1.8-stable, OpenSSL 1.1.1d, Zlib 1.2.11, Liblzma 5.2.4, and Libzstd 1.3.8.
[notice] Bootstrapped 0%: Starting
[notice] Starting with guard context "bridges"
[notice] Delaying directory fetches: No running bridges
[notice] Bootstrapped 5%: Connecting to directory server
[notice] Bootstrapped 10%: Finishing handshake with directory server
[notice] Bootstrapped 15%: Establishing an encrypted directory connection
[notice] Bootstrapped 20%: Asking for networkstatus consensus
[notice] new bridge descriptor 'flakey' (fresh): $2B280B23E1107BB62ABFC40DDCC8824814F80A72~flakey at 192.0.2.3
[notice] Bootstrapped 25%: Loading networkstatus consensus
[notice] I learned some more directory information, but not enough to build a circuit: We have no usable consensus.
[notice] Bootstrapped 40%: Loading authority key certs
[notice] The current consensus has no exit nodes. Tor can only build internal paths, such as paths to onion services.
[notice] Bootstrapped 45%: Asking for relay descriptors for internal paths
[notice] I learned some more directory information, but not enough to build a circuit: We need more microdescriptors: we have 0/6673, and can only build 0% of likely paths. (We have 100% of guards bw, 0% of midpoint bw, and 0% of end bw (no exits in consensus, using mid) = 0% of path bw.)
[notice] Bootstrapped 50%: Loading relay descriptors for internal paths
[notice] The current consensus contains exit nodes. Tor can build exit and internal paths.
[notice] Bootstrapped 57%: Loading relay descriptors
[notice] Bootstrapped 64%: Loading relay descriptors
[notice] Bootstrapped 73%: Loading relay descriptors
[notice] Bootstrapped 78%: Loading relay descriptors
[notice] Bootstrapped 80%: Connecting to the Tor network
[notice] Bootstrapped 85%: Finishing handshake with first hop
[notice] Bootstrapped 90%: Establishing a Tor circuit
[notice] Bootstrapped 100%: Done
```
Bootstrapping "staging" with a fresh DataDirectory "datadir.staging" also works. Here is torrc.staging:
```plain
UseBridges 1
SocksPort auto
DataDirectory datadir.staging
ClientTransportPlugin snowflake exec ./client -keep-local-addresses -log snowflake.log
Bridge snowflake 192.0.2.3:1 2B280B23E1107BB62ABFC40DDCC8824814F80A72 url=http://127.0.0.1:8000/ max=1 ice=stun:stun.voip.blackberry.com:3478,stun:stun.altar.com.pl:3478,stun:stun.antisip.com:3478,stun:stun.bluesip.net:3478,stun:stun.dus.net:3478,stun:stun.epygi.com:3478,stun:stun.sonetel.com:3478,stun:stun.sonetel.net:3478,stun:stun.stunprotocol.org:3478,stun:stun.uls.co.za:3478,stun:stun.voipgate.com:3478,stun:stun.voys.nl:3478
```
Notice `new bridge descriptor 'flakey4' (fresh)`:
```plain
snowflake/broker$ ./broker -disable-tls -addr 127.0.0.1:8000
snowflake/proxy$ ./proxy -capacity 10 -broker http://127.0.0.1:8000/ -keep-local-addresses -relay wss://snowflake-staging.bamsoftware.com/
snowflake/client$ tor -f torrc.staging
[notice] Tor 0.3.5.16 running on Linux with Libevent 2.1.8-stable, OpenSSL 1.1.1d, Zlib 1.2.11, Liblzma 5.2.4, and Libzstd 1.3.8.
[notice] Bootstrapped 0%: Starting
[notice] Starting with guard context "bridges"
[notice] Delaying directory fetches: No running bridges
[notice] Bootstrapped 5%: Connecting to directory server
[notice] Bootstrapped 10%: Finishing handshake with directory server
[notice] Bootstrapped 15%: Establishing an encrypted directory connection
[notice] Bootstrapped 20%: Asking for networkstatus consensus
[notice] new bridge descriptor 'flakey4' (fresh): $2B280B23E1107BB62ABFC40DDCC8824814F80A72~flakey4 at 192.0.2.3
[notice] Bootstrapped 25%: Loading networkstatus consensus
[notice] I learned some more directory information, but not enough to build a circuit: We have no usable consensus.
[notice] Bootstrapped 40%: Loading authority key certs
[notice] The current consensus has no exit nodes. Tor can only build internal paths, such as paths to onion services.
[notice] Bootstrapped 45%: Asking for relay descriptors for internal paths
[notice] I learned some more directory information, but not enough to build a circuit: We need more microdescriptors: we have 0/6673, and can only build 0% of likely paths. (We have 100% of guards bw, 0% of midpoint bw, and 0% of end bw (no exits in consensus, using mid) = 0% of path bw.)
[notice] Bootstrapped 50%: Loading relay descriptors for internal paths
[notice] The current consensus contains exit nodes. Tor can build exit and internal paths.
[notice] Bootstrapped 57%: Loading relay descriptors
[notice] Bootstrapped 63%: Loading relay descriptors
[notice] Bootstrapped 72%: Loading relay descriptors
[notice] Bootstrapped 77%: Loading relay descriptors
[notice] Bootstrapped 80%: Connecting to the Tor network
[notice] Bootstrapped 85%: Finishing handshake with first hop
[notice] Bootstrapped 90%: Establishing a Tor circuit
[notice] Bootstrapped 100%: Done
```
But now, if we try running torrc.staging but give it the DataDirectory "datadir.prod", it fails at 90%. Notice `new bridge descriptor 'flakey' (cached)`: if the descriptor had not been cached it would have been flakey[1234] instead.
```plain
$ tor -f torrc.staging DataDirectory datadir.prod Log "notice stderr" Log "info file info.log"
[notice] Tor 0.3.5.16 running on Linux with Libevent 2.1.8-stable, OpenSSL 1.1.1d, Zlib 1.2.11, Liblzma 5.2.4, and Libzstd 1.3.8.
[notice] Bootstrapped 0%: Starting
[notice] Starting with guard context "bridges"
[notice] new bridge descriptor 'flakey' (cached): $2B280B23E1107BB62ABFC40DDCC8824814F80A72~flakey at 192.0.2.3
[notice] Delaying directory fetches: Pluggable transport proxies still configuring
[notice] Bootstrapped 5%: Connecting to directory server
[notice] Bootstrapped 10%: Finishing handshake with directory server
[notice] Bootstrapped 80%: Connecting to the Tor network
[notice] Bootstrapped 90%: Establishing a Tor circuit
[notice] Delaying directory fetches: No running bridges
```
Here is an excerpt from the info-level log that shows the error. The important part seems to be `onion_skin_ntor_client_handshake(): Invalid result from curve25519 handshake: 4`.
```plain
[notice] new bridge descriptor 'flakey' (cached): $2B280B23E1107BB62ABFC40DDCC8824814F80A72~flakey at 192.0.2.3
[notice] Delaying directory fetches: Pluggable transport proxies still configuring
[info] extend_info_from_node(): Including Ed25519 ID for $2B280B23E1107BB62ABFC40DDCC8824814F80A72~flakey at 192.0.2.3
[info] onion_pick_cpath_exit(): Using requested exit node '$2B280B23E1107BB62ABFC40DDCC8824814F80A72~flakey at 192.0.2.3'
[info] circuit_handle_first_hop(): Next router is [scrubbed]: Not connected. Connecting.
[notice] Bootstrapped 5%: Connecting to directory server
[info] connection_or_set_canonical(): Channel 0 chose an idle timeout of 247.
[info] connection_or_set_identity_digest(): Set identity digest for 0x55c3f9356770 ([scrubbed]): 2B280B23E1107BB62ABFC40DDCC8824814F80A72 1zOHpg+FxqQfi/6jDLtCpHHqBTH8gjYmCKXkus1D5Ko.
[info] connection_or_set_identity_digest(): (Previously: 0000000000000000000000000000000000000000 <unset>)
[info] connection_or_set_canonical(): Channel 1 chose an idle timeout of 232.
[info] circuit_predict_and_launch_new(): Have 0 clean circs (0 internal), need another exit circ.
[info] choose_good_exit_server_general(): Found 1336 servers that might support 0/0 pending connections.
[info] choose_good_exit_server_general(): Chose exit server '$0F1C8168DFD0AADBE61BD71194D37C867FED5A21~FreeExit at 81.17.18.60'
[info] extend_info_from_node(): Including Ed25519 ID for $0F1C8168DFD0AADBE61BD71194D37C867FED5A21~FreeExit at 81.17.18.60
[info] select_primary_guard_for_circuit(): Selected primary guard $2B280B23E1107BB62ABFC40DDCC8824814F80A72 ($2B280B23E1107BB62ABFC40DDCC8824814F80A72) for circuit.
[info] extend_info_from_node(): Including Ed25519 ID for $2B280B23E1107BB62ABFC40DDCC8824814F80A72~flakey at 192.0.2.3
[info] extend_info_from_node(): Including Ed25519 ID for $7158D1E0D9F90F7999ACB3B073DA762C9B2C3275~maltimore at 207.180.224.17
[info] circuit_handle_first_hop(): Next router is [scrubbed]: Connection in progress; waiting.
[info] connection_edge_process_inbuf(): data from edge while in 'waiting for circuit' state. Leaving it on buffer.
[info] connection_edge_process_inbuf(): data from edge while in 'waiting for circuit' state. Leaving it on buffer.
[notice] Bootstrapped 10%: Finishing handshake with directory server
[notice] Bootstrapped 80%: Connecting to the Tor network
[info] parse_socks_client(): SOCKS 5 client: need authentication.
[info] parse_socks_client(): SOCKS 5 client: authentication successful.
[info] connection_read_proxy_handshake(): Proxy Client: connection to 192.0.2.3:1 successful
[info] circuit_predict_and_launch_new(): Have 1 clean circs (0 internal), need another exit circ.
[info] choose_good_exit_server_general(): Found 1336 servers that might support 0/0 pending connections.
[info] choose_good_exit_server_general(): Chose exit server '$D8A1F5A8EA1AF53E3414B9C48FE6B10C31ACC9B2~privexse1exit at 185.130.44.108'
[info] extend_info_from_node(): Including Ed25519 ID for $D8A1F5A8EA1AF53E3414B9C48FE6B10C31ACC9B2~privexse1exit at 185.130.44.108
[info] select_primary_guard_for_circuit(): Selected primary guard $2B280B23E1107BB62ABFC40DDCC8824814F80A72 ($2B280B23E1107BB62ABFC40DDCC8824814F80A72) for circuit.
[info] extend_info_from_node(): Including Ed25519 ID for $2B280B23E1107BB62ABFC40DDCC8824814F80A72~flakey at 192.0.2.3
[info] extend_info_from_node(): Including Ed25519 ID for $2F9AFDE43DC8E3F05803304C01BD3DBF329169AC~dutreuil at 213.152.168.27
[info] circuit_handle_first_hop(): Next router is [scrubbed]: Connection in progress; waiting.
[info] circuit_predict_and_launch_new(): Have 2 clean circs (0 uptime-internal, 0 internal), need another hidden service circ.
[info] extend_info_from_node(): Including Ed25519 ID for $8967A8912E61070FCFA9B8EC9869E5AC8F94949A~4Freunde at 145.239.154.56
[info] select_primary_guard_for_circuit(): Selected primary guard $2B280B23E1107BB62ABFC40DDCC8824814F80A72 ($2B280B23E1107BB62ABFC40DDCC8824814F80A72) for circuit.
[info] extend_info_from_node(): Including Ed25519 ID for $2B280B23E1107BB62ABFC40DDCC8824814F80A72~flakey at 192.0.2.3
[info] extend_info_from_node(): Including Ed25519 ID for $9367EB01DF75DE6265A0971249204029D6A55877~oddling at 5.182.210.231
[info] circuit_handle_first_hop(): Next router is [scrubbed]: Connection in progress; waiting.
[info] circuit_predict_and_launch_new(): Have 3 clean circs (1 uptime-internal, 1 internal), need another hidden service circ.
[info] extend_info_from_node(): Including Ed25519 ID for $AF85E6556FD5692BC554A93BAC9FACBFC2D79EFD~whoUSicebeer09b at 192.187.103.74
[info] select_primary_guard_for_circuit(): Selected primary guard $2B280B23E1107BB62ABFC40DDCC8824814F80A72 ($2B280B23E1107BB62ABFC40DDCC8824814F80A72) for circuit.
[info] extend_info_from_node(): Including Ed25519 ID for $2B280B23E1107BB62ABFC40DDCC8824814F80A72~flakey at 192.0.2.3
[info] extend_info_from_node(): Including Ed25519 ID for $9515B435D8D063E537AB137FCF5A97B1ACE3CA2A~corvuscorone at 135.181.178.197
[info] circuit_handle_first_hop(): Next router is [scrubbed]: Connection in progress; waiting.
[info] circuit_predict_and_launch_new(): Have 4 clean circs (2 uptime-internal, 2 internal), need another hidden service circ.
[info] extend_info_from_node(): Including Ed25519 ID for $68A9F0DFFC7C8F57B3DEA3801D6CF001652A809F~vpskilobug at 213.164.206.145
[info] select_primary_guard_for_circuit(): Selected primary guard $2B280B23E1107BB62ABFC40DDCC8824814F80A72 ($2B280B23E1107BB62ABFC40DDCC8824814F80A72) for circuit.
[info] extend_info_from_node(): Including Ed25519 ID for $2B280B23E1107BB62ABFC40DDCC8824814F80A72~flakey at 192.0.2.3
[info] extend_info_from_node(): Including Ed25519 ID for $2C13A54E3E8A6AFB18E0DE5890E5B08AAF5B0F36~history at 138.201.123.109
[info] circuit_handle_first_hop(): Next router is [scrubbed]: Connection in progress; waiting.
[info] channel_tls_process_versions_cell(): Negotiated version 5 with [scrubbed]:1; Waiting for CERTS cell
[info] connection_or_client_learned_peer_id(): learned peer id for 0x55c3f9356770 ([scrubbed]): 2B280B23E1107BB62ABFC40DDCC8824814F80A72, 1zOHpg+FxqQfi/6jDLtCpHHqBTH8gjYmCKXkus1D5Ko
[info] channel_tls_process_certs_cell(): Got some good certificates from [scrubbed]:1: Authenticated it with RSA and Ed25519
[info] circuit_send_first_onion_skin(): First hop: finished sending CREATE cell to '$2B280B23E1107BB62ABFC40DDCC8824814F80A72~flakey at 192.0.2.3'
[notice] Bootstrapped 90%: Establishing a Tor circuit
[info] circuit_send_first_onion_skin(): First hop: finished sending CREATE cell to '$2B280B23E1107BB62ABFC40DDCC8824814F80A72~flakey at 192.0.2.3'
[info] circuit_send_first_onion_skin(): First hop: finished sending CREATE cell to '$2B280B23E1107BB62ABFC40DDCC8824814F80A72~flakey at 192.0.2.3'
[info] circuit_send_first_onion_skin(): First hop: finished sending CREATE cell to '$2B280B23E1107BB62ABFC40DDCC8824814F80A72~flakey at 192.0.2.3'
[info] circuit_send_first_onion_skin(): First hop: finished sending CREATE cell to '$2B280B23E1107BB62ABFC40DDCC8824814F80A72~flakey at 192.0.2.3'
[info] circuit_send_first_onion_skin(): First hop: finished sending CREATE cell to '$2B280B23E1107BB62ABFC40DDCC8824814F80A72~flakey at 192.0.2.3'
[info] channel_tls_process_netinfo_cell(): Got good NETINFO cell from [scrubbed]:1; OR connection is now open, using protocol version 5. Its ID digest is 2B280B23E1107BB62ABFC40DDCC8824814F80A72. Our address is apparently [scrubbed].
[info] onion_skin_ntor_client_handshake(): Invalid result from curve25519 handshake: 4
[info] circuit_mark_for_close_(): Circuit 3457244666 (id: 1) marked for close at ../src/core/or/command.c:443 (orig reason: 1, new reason: 0)
[info] onion_skin_ntor_client_handshake(): Invalid result from curve25519 handshake: 4
[info] circuit_mark_for_close_(): Circuit 4237434553 (id: 2) marked for close at ../src/core/or/command.c:443 (orig reason: 1, new reason: 0)
[info] onion_skin_ntor_client_handshake(): Invalid result from curve25519 handshake: 4
[info] circuit_mark_for_close_(): Circuit 3082862549 (id: 6) marked for close at ../src/core/or/command.c:443 (orig reason: 1, new reason: 0)
[info] onion_skin_ntor_client_handshake(): Invalid result from curve25519 handshake: 4
[info] circuit_mark_for_close_(): Circuit 2596950236 (id: 4) marked for close at ../src/core/or/command.c:443 (orig reason: 1, new reason: 0)
[info] circuit_build_failed(): Our circuit 3457244666 (id: 1) failed to get a response from the first hop (192.0.2.3:1). I'm going to try to rotate to a better connection.
[info] connection_ap_fail_onehop(): Closing one-hop stream to '$2B280B23E1107BB62ABFC40DDCC8824814F80A72/192.0.2.3' because the OR conn just failed.
[info] circuit_free_(): Circuit 0 (id: 1) has been freed.
[info] circuit_build_failed(): Our circuit 4237434553 (id: 2) failed to get a response from the first hop (192.0.2.3:1). I'm going to try to rotate to a better connection.
[info] circuit_free_(): Circuit 0 (id: 2) has been freed.
[info] circuit_build_failed(): Our circuit 3082862549 (id: 6) failed to get a response from the first hop (192.0.2.3:1). I'm going to try to rotate to a better connection.
[info] circuit_free_(): Circuit 0 (id: 6) has been freed.
[info] circuit_build_failed(): Our circuit 2596950236 (id: 4) failed to get a response from the first hop (192.0.2.3:1). I'm going to try to rotate to a better connection.
[info] circuit_free_(): Circuit 0 (id: 4) has been freed.
[info] connection_free_minimal(): Freeing linked Socks connection [waiting for circuit] with 121 bytes on inbuf, 0 on outbuf.
[info] connection_dir_client_reached_eof(): 'fetch' response not all here, but we're at eof. Closing.
[info] entry_guards_note_guard_failure(): Recorded failure for primary confirmed guard $2B280B23E1107BB62ABFC40DDCC8824814F80A72 ($2B280B23E1107BB62ABFC40DDCC8824814F80A72)
[info] connection_dir_client_request_failed(): Giving up on serverdesc/extrainfo fetch from directory server at '192.0.2.3'; retrying
[info] connection_free_minimal(): Freeing linked Directory connection [client reading] with 0 bytes on inbuf, 0 on outbuf.
[info] onion_skin_ntor_client_handshake(): Invalid result from curve25519 handshake: 4
[info] circuit_mark_for_close_(): Circuit 2912328161 (id: 5) marked for close at ../src/core/or/command.c:443 (orig reason: 1, new reason: 0)
[info] onion_skin_ntor_client_handshake(): Invalid result from curve25519 handshake: 4
[info] circuit_mark_for_close_(): Circuit 2793970028 (id: 3) marked for close at ../src/core/or/command.c:443 (orig reason: 1, new reason: 0)
[info] circuit_build_failed(): Our circuit 2912328161 (id: 5) failed to get a response from the first hop (192.0.2.3:1). I'm going to try to rotate to a better connection.
[info] circuit_free_(): Circuit 0 (id: 5) has been freed.
[info] circuit_build_failed(): Our circuit 2793970028 (id: 3) failed to get a response from the first hop (192.0.2.3:1). I'm going to try to rotate to a better connection.
[info] circuit_free_(): Circuit 0 (id: 3) has been freed.
[info] connection_ap_make_link(): Making internal direct tunnel to [scrubbed]:1 ...
[info] connection_ap_make_link(): ... application connection created and linked.
[info] should_delay_dir_fetches(): Delaying dir fetches (no running bridges known)
[notice] Delaying directory fetches: No running bridges
```
As you suggested, CREATE_FAST in place of CREATE works. I hacked `should_use_create_fast_for_circuit` to always return true:
```diff
diff --git a/src/core/or/circuitbuild.c b/src/core/or/circuitbuild.c
index 2bcc642a97..4005ba56ce 100644
--- a/src/core/or/circuitbuild.c
+++ b/src/core/or/circuitbuild.c
@@ -801,6 +801,7 @@ should_use_create_fast_for_circuit(origin_circuit_t *circ)
   tor_assert(circ->cpath);
   tor_assert(circ->cpath->extend_info);
 
+  return true;
   return ! circuit_has_usable_onion_key(circ);
 }
```
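For reference, a sketch of how a tor client patched this way can be built and pointed at the test setup (it assumes a tor source checkout with standard build dependencies; the patch file name is hypothetical):

```
cd ~/tor
git apply create-fast-always.patch   # the one-line diff shown above
sh autogen.sh && ./configure && make -j"$(nproc)"
./src/app/tor -f torrc.staging DataDirectory datadir.prod
```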
And then the mixed configuration with the "staging" bridge and the "prod" DataDirectory bootstraps. Notice `new bridge descriptor 'flakey' (cached)` followed later by `new bridge descriptor 'flakey1' (fresh)`.
```plain
$ ~/tor/src/app/tor -f torrc.staging DataDirectory datadir.prod
[notice] Tor 0.4.6.8 (git-d5efc2c98619568e) running on Linux with Libevent 2.1.8-stable, OpenSSL 1.1.1d, Zlib 1.2.11, Liblzma 5.2.4, Libzstd N/A and Glibc 2.28 as libc.
[notice] Bootstrapped 0% (starting): Starting
[notice] Starting with guard context "bridges"
[notice] new bridge descriptor 'flakey' (cached): $2B280B23E1107BB62ABFC40DDCC8824814F80A72~flakey [1zOHpg+FxqQfi/6jDLtCpHHqBTH8gjYmCKXkus1D5Ko] at 192.0.2.3
[notice] Delaying directory fetches: Pluggable transport proxies still configuring
[notice] Bootstrapped 1% (conn_pt): Connecting to pluggable transport
[notice] Bootstrapped 2% (conn_done_pt): Connected to pluggable transport
[notice] Bootstrapped 10% (conn_done): Connected to a relay
[notice] Bootstrapped 14% (handshake): Handshaking with a relay
[notice] Bootstrapped 15% (handshake_done): Handshake with a relay done
[notice] Bootstrapped 75% (enough_dirinfo): Loaded enough directory info to build circuits
[notice] Bootstrapped 95% (circuit_create): Establishing a Tor circuit
[notice] new bridge descriptor 'flakey1' (fresh): $2B280B23E1107BB62ABFC40DDCC8824814F80A72~flakey1 [1zOHpg+FxqQfi/6jDLtCpHHqBTH8gjYmCKXkus1D5Ko] at 192.0.2.3
[notice] Bootstrapped 100% (done): Done
```
If this is an issue, we can definitely work around it, by e.g. disabling the onion key rotation on the bridges, or setting up a periodic rsync+hup between the bridges, or teaching clients to use createfast cells in this situation (this type of circuit crypto doesn't use the onion key at all, and just relies on TLS for security -- which can only be done for the first hop of the circuit but that's the one we're talking about here).
What do you recommend trying? I guess the quickest way to get more capacity on the snowflake bridge is to disable onion key rotation by patching the tor source code, though I wouldn't want to maintain that long-term.
Gary, I was wondering how you are dealing with the changing onion key issue, and I suppose it is [this](https://forum.torproject.net/t/tor-relays-how-to-reduce-tor-cpu-load-on-a-si...):
use Source IP Address Sticky Sessions (Pinning)
The same client source address gets pinned to the same tor instance and therefore the same onion key. If I understand correctly, there's a potential failure if a client changes its IP address and later gets mapped to a different instance. Is that right?
On Monday, January 17, 2022, 11:47:11 AM MST, David Fifield david@bamsoftware.com wrote:
Gary, I was wondering how you are dealing with the changing onion key issue, and I suppose it is [this](https://forum.torproject.net/t/tor-relays-how-to-reduce-tor-cpu-load-on-a-si...):
use Source IP Address Sticky Sessions (Pinning)
The same client source address gets pinned to the same tor instance and therefore the same onion key. If I understand correctly, there's a potential failure if a client changes its IP address and later gets mapped to a different instance. Is that right?
Yes... That is correct. As long as circuits originate from the same Source IP Address, Nginx/HAProxy ensures they are pinned to the same loadbalanced Upstream Tor Node, unless the originating Source IP Address changes (low-risk) or one of the Upstream Tor Nodes goes down (low-risk with UPS) and surviving circuits migrate to the remaining Upstream Tor Nodes, which effectively forces building of new circuits with the relevant keys.

The issue I find more challenging, in loadbalancing Upstream Tor Nodes, is when the Medium-Term Key is updated after running for some time (it's consistent with the previously mentioned 4 - 5 week time period). It is at this point that I notice all circuits bleed off from the Upstream Tor Nodes except the Tor Node where the Medium-Term Key was successfully updated. I am then forced to shut down all Upstream Tor Nodes, copy the .tordb containing the updated Medium-Term Key to the other Upstream Tor Nodes, and restart all Upstream Tor Nodes. If there were a way for a Family of Tor Instances to share a Medium-Term Key, I believe that might solve the long-term issue of running a Loadbalanced Tor Relay. As it stands... I can run my Loadbalanced Tor Relay for 4 - 5 weeks without any intervention.

Hope that answers your question.

Respectfully,
Gary
The DNS record for the Snowflake bridge was switched to a temporary staging server, running the load balancing setup, at 2022-01-25 17:41:00. We were debugging some initial problems until 2022-01-25 18:47:00. You can read about it here:
https://bugs.torproject.org/tpo/anti-censorship/pluggable-transports/snowfla...
Snowflake sessions are now using the staging bridge, except for those that started before the change happened and haven't finished yet, and perhaps some proxies that still have the IP address of the production bridge in their DNS cache. I am not sure yet what will happen with metrics, but we'll see after a few days.
On the matter of onion key rotation, I had the idea of making the onion key files read-only. Roger did some source code investigation and said that it might work to prevent onion key rotation, with some minor side effects. I plan to give the idea a try on a different bridge. The possible side effects are that tor will continue trying and failing to rotate the onion key every hour, and "force a router descriptor rebuild, so it will try to publish a new descriptor each hour."
https://gitweb.torproject.org/tor.git/tree/src/feature/relay/router.c?h=tor-...
```
  if (curve25519_keypair_write_to_file(&new_curve25519_keypair,
                                       fname, "onion") < 0) {
    log_err(LD_FS, "Couldn't write curve25519 onion key to \"%s\".", fname);
    goto error;
  }
  // ...
 error:
  log_warn(LD_GENERAL, "Couldn't rotate onion key.");
  if (prkey)
    crypto_pk_free(prkey);
```
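A minimal sketch of the read-only experiment (the keys directory path is an assumption; whether file permissions alone are enough depends on how tor replaces the files, hence the commented-out directory variant):

```
cd /var/lib/tor/keys
# Make the current onion keys unwritable so the hourly rotation attempt fails.
chmod a-w secret_onion_key secret_onion_key_ntor
# chmod a-w .   # possibly needed too, if tor writes a temp file and renames it
```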
David, Excellent documentation of your loadbalanced Snowflake endeavors!
The DNS record for the Snowflake bridge was switched to a temporary staging server, running the load balancing setup, at 2022-01-25 17:41:00. We were debugging some initial problems until 2022-01-25 18:47:00. You can read about it here:
https://bugs.torproject.org/tpo/anti-censorship/pluggable-transports/snowfla...
It's nice to see that the Snowflake daemon offers a native configuration option for LimitNOFile. I ran into a similar issue with my initial loadbalanced Tor Relay Nodes that was solved at the O/S level using ulimit. It would be nice if torrc had a similar option.
From your documentation, it sounds like you're running everything on the same machine? When expanding to additional machines, similar to the file limit issue, you'll have to expand the usable ports as well.
I'd like to see more of your HAProxy configuration. Do you not have to use transparent proxy mode with Snowflake instances as you do with Tor Relay instances? I hadn't realized HAProxy had a client timeout. Thank you for that tidbit. And thank you for referencing my comments as well.
Snowflake sessions are now using the staging bridge, except for those that started before the change happened and haven't finished yet, and perhaps some proxies that still have the IP address of the production bridge in their DNS cache. I am not sure yet what will happen with metrics, but we'll see after a few days.
Currently, as I only use IPv4, I can't offer much insight as to the lack of IPv6 connections being reported (that's what my logs report, too). Your Heartbeat messages are looking good, with a symmetric balance of connections and data. They look very similar to my Heartbeat logs, except you can tell you offer more computing power, which is great to see extrapolated! I've found that the Heartbeat logs are key to knowing the health of your loadbalanced Tor implementation. You might consider setting up syslog with a Snowflake filter to aggregate your Snowflake logs for easier readability.
Regarding metrics.torproject.org... I expect you'll see that written-bytes and read-bytes only reflect that of a single Snowflake instance. However, your consensus weight will reflect the aggregate of all Snowflake instances.
On the matter of onion key rotation, I had the idea of making the onion key files read-only. Roger did some source code investigation and said that it might work to prevent onion key rotation, with some minor side effects. I plan to give the idea a try on a different bridge. The possible side effects are that tor will continue trying and failing to rotate the onion key every hour, and "force a router descriptor rebuild, so it will try to publish a new descriptor each hour."
I'm interested to hear how the prospective read-only file fix plays out. However, from my observations, I would assume that connections will eventually start bleeding off any instances that fail to update the key. We really need a long-term solution to this issue for this style of deployment. Keep up the Great Work! Respectfully,
Gary
David,
I'd like to see more of your HAProxy configuration. Do you not have to use transparent proxy mode with Snowflake instances as you do with Tor Relay instances? I hadn't realized HAProxy had a client timeout. Thank you for that tidbit. And thank you for referencing my comments as well.
I found your HAProxy configuration in your "Draft installation guide." It seems you're using regular TCP streaming mode with the Snowflake instances vs transparent TCP streaming mode, which is a notable difference from the directly loadbalanced Tor Relay configuration. I also noticed you've configured the backend node timeout globally vs per node, which is just a nuance. You might test using a timeout value of 0s (to disable the timeout at the loadbalancer) and allow the Snowflake instances to perform state checking, to ensure HAProxy isn't throttling your bridge. I've tested both and I'm still not sure which timeout configuration makes the most sense for this style of implementation. Currently, I'm running with the 0s (disabled) timeout.

Any reason why you chose HAProxy over Nginx?

I did notice that you're using the AssumeReachable 1 directive in your torrc files. Are you running into an issue where your Tor instances are failing the reachability test? Initially, I ran into a reachability issue, and after digging through mountains of Tor debug logs I discovered I needed to use transparent TCP streaming mode along with the Linux kernel and iptables changes to route the Tor traffic back from the Tor Relay Nodes to the loadbalancer. You shouldn't need to run your Tor instances with the AssumeReachable 1 directive; this might suggest something in your configuration isn't quite right.

One of my initial tests was staggering the startup of my instances to see how they randomly reported to the DirectoryAuthorities. It's how I discovered that Tor instances push meta-data rather than having it polled (different uptimes). The latter would work better in a loadbalanced style of deployment. Do your Snowflake instances not have issues reporting to different DirectoryAuthorities? My Tor instances have issues if I don't have them all report to the same DirectoryAuthority.

Keep up the excellent work.

Respectfully,
Gary
On Tue, Jan 25, 2022 at 11:21:10PM +0000, Gary C. New via tor-relays wrote:
It's nice to see that the Snowflake daemon offers a native configuration option for LimitNOFile. I ran into a similar issue with my initial loadbalanced Tor Relay Nodes that was solved at the O/S level using ulimit. It would be nice if torrc had a similar option.
LimitNOFile is actually not a Snowflake thing, it's a systemd thing. It's the same as `ulimit -n`. See: https://www.freedesktop.org/software/systemd/man/systemd.exec.html#Process%2...
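For anyone running tor itself under systemd, the same knob can be set with a drop-in; a sketch (the unit name depends on how your service is installed):

```
mkdir -p /etc/systemd/system/tor.service.d
cat > /etc/systemd/system/tor.service.d/limits.conf <<'EOF'
[Service]
LimitNOFILE=65536
EOF
systemctl daemon-reload
systemctl restart tor
```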
From your documentation, it sounds like you're running everything on the same machine? When expanding to additional machines, similar to the file limit issue, you'll have to expand the usable ports as well.
I don't think I understand your point. At 64K simultaneous connections, you run out of source ports for making connection 4-tuple unique, but I don't see how the same or different hosts makes a difference, in that respect.
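A rough way to watch for that kind of 4-tuple exhaustion is to count established connections toward a single backend address; the address below is the haproxy frontend used in the earlier obfs4 example, so substitute your own:

```
# Each of these connections consumes one local ephemeral source port;
# getting anywhere near 64K of them means source ports are running out.
ss -tan state established 'dst 127.0.0.1:10000' | wc -l
```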
I found your HAProxy configuration in your “Draft installation guide.” It seems you’re using regular TCP streaming mode with the Snowflake instances vs transparent TCP streaming mode, which is a notable difference with the directly loadbalanced Tor Relay configuration.
I admit I did not understand your point about transparent proxying. If it's about retaining the client's source IP address for source IP address pinning, I don't think that helps us. This is a bridge, not a relay, and the source IP address that haproxy sees is several steps removed from the client's actual IP address. haproxy receives connections from a localhost web server (the server pluggable transport that receives WebSocket connections); the web server receives connections from Snowflake proxies (which can and do have different IP addresses during the lifetime of a client session); only the Snowflake proxies themselves receive direct traffic from the client's own source IP address. The client's IP address is tunnelled all the way through to tor, for metrics purposes, but that uses the ExtORPort protocol and the load balancer isn't going to understand that. I think that transparent proxying would only transparently proxy the localhost IP addresses from the web server, which doesn't have any benefit, I don't think.
What's written in the draft installation guide is not the whole file. There's additionally the default settings as follows:
```
global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
    stats timeout 30s
    user haproxy
    group haproxy
    daemon

    # Default SSL material locations
    ca-base /etc/ssl/certs
    crt-base /etc/ssl/private

    # See: https://ssl-config.mozilla.org/#server=haproxy&server-version=2.0.3&...
    ssl-default-bind-ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384
    ssl-default-bind-ciphersuites TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256
    ssl-default-bind-options ssl-min-ver TLSv1.2 no-tls-tickets

defaults
    log global
    mode http
    option httplog
    option dontlognull
    timeout connect 5000
    timeout client 50000
    timeout server 50000
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http
```
You might test using a timeout value of 0s (to disable the timeout at the loadbalancer) and allow the Snowflake instances to perform state checking to ensure HAProxy isn’t throttling your bridge.
Thanks for that hint. So far, 10-minute timeouts seem not to be causing a problem. I don't know this software too well, but I think it's an idle timeout, not an absolute limit on connection lifetime.
Currently, as I only use IPv4, I can't offer much insight as to the lack of IPv6 connections being reported (that's what my logs report, too).
On further reflection, I don't think there's a problem here. The instances' bridge-stats and end-stats show a mix of countries and v4/v6. https://bugs.torproject.org/tpo/anti-censorship/pluggable-transports/snowfla...
Regarding metrics.torproject.org... I expect you'll see that written-bytes and read-bytes only reflect that of a single Snowflake instance. However, your consensus weight will reflect the aggregate of all Snowflake instances.
Indeed, the first few data points after the switchover show an apparent decrease in read/written bytes per second, even though the on-bridge bandwidth monitors show much more bandwidth being used than before. I suppose it could be selecting from any of 5 instances that currently share the same identity fingerprint: the 4 new load-balanced instances on the "staging" bridge, plus the 1 instance which is still running concurrently on the "production" bridge. When we finish the upgrade and get all the instances back on the production bridge, if the metrics are wrong, they will at least be uniformly wrong. https://metrics.torproject.org/rs.html#details/5481936581E23D2D178105D44DB69... https://bugs.torproject.org/tpo/anti-censorship/pluggable-transports/snowfla...
Any reason why you chose HAProxy over Nginx?
Shelikhoo drafted a configuration using Nginx, which for the time being you can see here: https://bugs.torproject.org/tpo/anti-censorship/pluggable-transports/snowfla... https://pad.riseup.net/p/pvKoxaIcejfiIbvVAV7j#L416
I don't have a strong preference and I don't have a lot of experience with either one. haproxy seemed to offer fewer opportunities for error, because the default Nginx installation expects to run a web server, which I would have to disable and ensure it did not fight with snowflake-server for port 443. It just seemed simpler to have one configuration file to edit and restart the daemon.
I did notice that you’re using the AssumeReachable 1 directive in your torrc files. Are you running into an issue where your Tor instances are failing the reachability test?
It's because this bridge does not expose its ORPort, which is the recommended configuration for default bridges. The torrc has `ORPort 127.0.0.1:auto`, so the bridges will never be reachable over their ORPort, which is intentional. Bridges that want to be distributed by BridgeDB need to expose their ORPort, which is an unfortunate technical limitation that makes the bridges more detectable (https://bugs.torproject.org/tpo/core/tor/7349), but for default bridges it's not necessary. To be honest, I'm not sure that `AssumeReachable` is even required anymore for this kind of configuration; it's just something I remember having to do years ago for some reason. It may be superfluous now that we have `BridgeDistribution none`.
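For reference, a sketch of the torrc lines being described here (values are illustrative, not copied from the real bridge's configuration):

```
cat >> /etc/tor/torrc <<'EOF'
BridgeRelay 1
# The ORPort is deliberately unreachable from outside.
ORPort 127.0.0.1:auto
AssumeReachable 1
# Default bridge: do not distribute through BridgeDB.
BridgeDistribution none
EOF
```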
Do your Snowflake instances not have issues reporting to different DirectoryAuthorities?
Other than the possible metrics anomalies, I don't know what kind of issue you mean. It could be that, being a bridge, it has fewer constraints than your relays. A bridge doesn't have to be listed in the consensus, for example.
David,
Snowflake sessions are now using the staging bridge, except for those that started before the change happened and haven't finished yet, and perhaps some proxies that still have the IP address of the production bridge in their DNS cache. I am not sure yet what will happen with metrics, but we'll see after a few days.
With regard to loadbalanced Snowflake sessions, I'm curious to know what connections (i.e., inbound, outbound, directory, control, etc) are being displayed within nyx? Much Appreciated.
Gary
David, I've been following your progress in the "Add load balancing to bridge (#40095)" issue.
The apparent decrease has to be spurious, since even at the beginning the bridge was moving more than 10 MB/s in both directions. A couple of hypotheses about what might be happening:
- Onionoo is only showing us one instance out of the four. The actual numbers are four times higher.

Per my previous response, my findings are consistent with yours in that Onionoo only shows metrics for a single instance, except for consensus weight.
Here are the most recent heartbeat logs. It looks like the load is fairly balanced, with each of the four tor instances having sent between 400 and 500 GB since being started.
Your Heartbeat logs continue to appear to be in good health. When keys are rotated, the Heartbeat logs will be a key indicator for validating health, i.e., whether connections are bleeding off from or remaining with a particular instance.
I worried a bit about the "0 with IPv6" in a previous comment. Looking at the bridge-stats files, I don't think there's a problem.
I'm glad to hear you feel the IPv6 reporting appears to be a false-negative. Does this mean there's something wrong with IPv6 Heartbeat reporting?
Despite the load balancing, the 8 CPUs are pretty close to maxed. I would not mind having 16 cores right now. We may be in an induced demand situation where we make the bridge faster → the bridge gets more users → bridge gets slower.
I believe your observation is correct with regard to an induced traffic situation. As CPU resources increase, increased traffic will likely follow, until demand is satisfied or you run out of CPU resources again. Are your existing 8 CPUs only single cores? Is it too difficult to upgrade with your VPS provider? The O/S should detect the virtual hardware changes and add them accordingly. My current resource constraint is RAM, but I'm using bare-metal machines. Great Progress!
Gary
With regard to loadbalanced Snowflake sessions, I'm curious to know what connections (i.e., inbound, outbound, directory, control, etc) are being displayed within nyx?
I'm not using nyx. I'm just looking at the bandwidth on the network interface.
Your Heartbeat logs continue to appear to be in good health. When keys are rotated,
We're trying to avoid rotating keys at all. If the read-only files do not work, we'll instead probably periodically rewrite the state file to push the rotation into the future.
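For the record, a sketch of that state-file approach (the DataDirectory path is an assumption; tor should be stopped while the file is edited):

```
systemctl stop tor
# Pretend the onion key was rotated just now, pushing the next rotation
# roughly four weeks into the future.
sed -i "s/^LastRotatedOnionKey .*/LastRotatedOnionKey $(date -u '+%Y-%m-%d %H:%M:%S')/" \
    /var/lib/tor/state
systemctl start tor
```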
I worried a bit about the "0 with IPv6" in a previous comment. Looking at the bridge-stats files, I don't think there's a problem.
I'm glad to hear you feel the IPv6 reporting appears to be a false-negative. Does this mean there's something wrong with IPv6 Heartbeat reporting?
I don't know if it's wrong, exactly. It's reporting something different than what ExtORPort is providing. The proximate connections to tor are indeed all IPv4.
Are your existing 8 CPUs only single cores? Is it too difficult to upgrade with your VPS provider?
Sure, there are plenty of ways to increase resources of the bridge, but I feel that's a different topic.
Thanks for your comments.
David,
On Thursday, January 27, 2022, 1:03:25 AM MST, David Fifield david@bamsoftware.com wrote:
It's nice to see that the Snowflake daemon offers a native configuration option for LimitNOFILE. I ran into a similar issue with my initial loadbalanced Tor Relay Nodes that was solved at the O/S level using ulimit. It would be nice if torrc had a similar option.
LimitNOFILE is actually not a Snowflake thing; it's a systemd thing. It's the same as `ulimit -n`. See:
https://www.freedesktop.org/software/systemd/man/systemd.exec.html#Process%2...
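As a minimal sketch of that mechanism, a per-unit drop-in raising the limit might look like this (the unit name and value are only illustrative):
```plain
# /etc/systemd/system/tor@snowflake1.service.d/override.conf (hypothetical unit name)
[Service]
LimitNOFILE=65536
```
followed by `systemctl daemon-reload` and a restart of the unit to apply it.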
Ah... My mistake. In my cursory review of your "Draft installation guide" I only saw the snowflake-server. prefix and assumed the file was .conf, when in actuality it is .service. I should have noticed the /etc/systemd path. Thank you for the correction.
From your documentation, it sounds like you're running everything on the same machine? When expanding to additional machines, similar to the file limit issue, you'll have to expand the usable ports as well.
I don't think I understand your point. At 64K simultaneous connections, you run out of source ports for making the connection 4-tuple unique, but I don't see how the same or different hosts make a difference, in that respect.
On many Linux distros, the default ip_local_port_range is 32768 to 61000.
```plain
# cat /proc/sys/net/ipv4/ip_local_port_range
32768 61000
```
The Tor Project recommends increasing it.
```plain
# echo 15000 64000 > /proc/sys/net/ipv4/ip_local_port_range
```
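To make that setting survive reboots, a sysctl drop-in along these lines should work (a sketch; the file name is arbitrary):
```plain
# /etc/sysctl.d/99-tor-port-range.conf
net.ipv4.ip_local_port_range = 15000 64000
```
It is applied with `sysctl --system` or at the next boot.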
I found your HAProxy configuration in your “Draft installation guide.” It seems you’re using regular TCP streaming mode with the Snowflake instances rather than transparent TCP streaming mode, which is a notable difference from the directly loadbalanced Tor Relay configuration.
I admit I did not understand your point about transparent proxying. If it's about retaining the client's source IP address for source IP address pinning, I don't think that helps us.
In Transparent TCP Stream mode, the Loadbalancer clones the IP address of the connecting Tor Client/Relay for use on the internal interface with connections to the upstream Tor Relay Nodes, so the upstream Tor Relay Nodes believe they're talking to the actual connecting Tor Client/Relay.
This is a bridge, not a relay, and the source IP address that haproxy sees is several steps removed from the client's actual IP address. haproxy receives connections from a localhost web server (the server pluggable transport that receives WebSocket connections); the web server receives connections from Snowflake proxies (which can and do have different IP addresses during the lifetime of a client session); only the Snowflake proxies themselves receive direct traffic from the client's own source IP address.
You are correct. This makes it clearer why HAProxy's Regular TCP Streaming Mode works in this paradigm. I believe what was confusing was the naming convention of your Tor instances (i.e., snowflake#), which led me to believe that your Snowflake proxy instances were upstream and not downstream. However, correlating the IP address assignments between configurations confirms HAProxy is loadbalancing upstream to your Tor Nodes.
The client's IP address is tunnelled all the way through to tor, for metrics purposes, but that uses the ExtORPort protocol and the load balancer isn't going to understand that.
As long as HAProxy is configured to use TCP Streaming Mode, it doesn't matter what protocol is used as it will be passed through encapsulated in TCP. That's the beauty of TCP Streaming Mode.
I think that transparent proxying would only transparently proxy the localhost IP addresses from the web server, which doesn't have any benefit, I don't think.
Agreed.
You might test using a timeout value of 0s (to disable the timeout at the loadbalancer) and allow the Snowflake instances to perform state checking, to ensure HAProxy isn’t throttling your bridge.
Thanks for that hint. So far, 10-minute timeouts seem not to be causing a problem. I don't know this software too well, but I think it's an idle timeout, not an absolute limit on connection lifetime.
It's HAProxy's Passive Health Check Timeout. The reason I disabled this timeout (0s) is that I felt the Tor instances know their state threshold better and, if they became overloaded, would tell the DirectoryAuthorities. One scenario where a lengthy HAProxy timeout might be of value is if a single instance were having issues and causing a reported overloaded state for the rest. However, this would more likely occur in a multi-physical/virtual-node environment. You'll have to continue to update me with your thoughts on this subject as you continue your testing.
Any reason why you chose HAProxy over Nginx?
Shelikhoo drafted a configuration using Nginx, which for the time being you can see here:
https://bugs.torproject.org/tpo/anti-censorship/pluggable-transports/snowfla...
https://pad.riseup.net/p/pvKoxaIcejfiIbvVAV7j#L416
I don't have a strong preference and I don't have a lot of experience with either one. haproxy seemed to offer fewer opportunities for error, because the default Nginx installation expects to run a web server, which I would have to disable and ensure it did not fight with snowflake-server for port 443. It just seemed simpler to have one configuration file to edit and restart the daemon.
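For orientation, a stripped-down sketch of the kind of TCP-mode haproxy configuration being discussed (the ports and server names here are made up for illustration; the authoritative version is in the draft installation guide):
```plain
defaults
    mode tcp
    timeout connect 10s
    timeout client 10m
    timeout server 10m

frontend snowflake
    bind 127.0.0.1:10000
    default_backend tor-instances

backend tor-instances
    # haproxy's default balance algorithm is round-robin
    server snowflake1 127.0.0.1:10001
    server snowflake2 127.0.0.1:10002
    server snowflake3 127.0.0.1:10003
    server snowflake4 127.0.0.1:10004
```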
My Nginx configuration is actually smaller than my HAProxy configuration. All you really need from either the Nginx or HAProxy configuration are the global default settings (especially the file/connection limits) and your TCP streaming settings. As stated previously, I would recommend Nginx simply because it forks additional child processes as connections/demand increase, which I could never figure out how to do with HAProxy.
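As a rough sketch (assuming nginx is built with the stream module, and reusing the illustrative ports from the haproxy example above), the whole thing fits in a single stream block:
```plain
# top level of nginx.conf, outside any http block (requires the stream module)
stream {
    upstream tor_instances {
        server 127.0.0.1:10001;
        server 127.0.0.1:10002;
        server 127.0.0.1:10003;
        server 127.0.0.1:10004;
    }
    server {
        listen 127.0.0.1:10000;
        proxy_pass tor_instances;
    }
}
```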
I did notice that you’re using the AssumeReachable 1 directive in your torrc files. Are you running into an issue where your Tor instances are failing the reachability test?
It's because this bridge does not expose its ORPort, which is the recommended configuration for default bridges. The torrc has `ORPort 127.0.0.1:auto`, so the bridges will never be reachable over their ORPort, which is intentional. Bridges that want to be distributed by BridgeDB need to expose their ORPort, which is an unfortunate technical limitation that makes the bridges more detectable (https://bugs.torproject.org/tpo/core/tor/7349), but for default bridges it's not necessary. To be honest, I'm not sure that `AssumeReachable` is even required anymore for this kind of configuration; it's just something I remember having to do years ago for some reason. It may be superfluous now that we have `BridgeDistribution none`.
Interesting... This shows my lack of knowledge regarding bridges as I have never run a bridge. Additionally, it highlights the major differences in running a Loadbalanced Tor Bridge vs a Loadbalanced Tor Relay and the necessity of using Transparent TCP Streaming Mode when the ORPort is exposed vs using Regular TCP Streaming Mode when the ORPort is not exposed. My Nginx Loadbalancer sits on the border of my network, listens on ORPort 9001, and uses Transparent TCP Streaming to loadbalance connections upstream to my Tor Relay Nodes.
Do your Snowflake instances not have issues reporting to different DirectoryAuthorities?
Other than the possible metrics anomalies, I don't know what kind of issue you mean. It could be that, being a bridge, it has fewer constraints than your relays. A bridge doesn't have to be listed in the consensus, for example.
Yes... It's issues with consensus that I run into, if I don't configure my Tor Relay Nodes to send updates to a single DirectoryAuthority. This appears to be another major difference between running a Loadbalanced Tor Bridge vs a Loadbalanced Tor Relay.
With regard to loadbalanced Snowflake sessions, I'm curious to know what connections (i.e., inbound, outbound, directory, control, etc) are being displayed within nyx?
I'm not using nyx. I'm just looking at the bandwidth on the network interface.
If you have time, would you mind installing nyx to validate observed similarities/differences between our loadbalanced configurations?
Your Heartbeat logs continue to appear to be in good health. When keys are rotated,
We're trying to avoid rotating keys at all. If the read-only files do not work, we'll instead probably periodically rewrite the state file to push the rotation into the future.
I'm especially interested in this topic. Please keep me updated!
I worried a bit about the "0 with IPv6" in a previous comment. Looking at the bridge-stats files, I don't think there's a problem.
I'm glad to hear you feel the IPv6 reporting appears to be a false-negative. Does this mean there's something wrong with IPv6 Heartbeat reporting?
I don't know if it's wrong, exactly. It's reporting something different than what ExtORPort is providing. The proximate connections to tor are indeed all IPv4.
I see. Perhaps IPv6 connections are less prolific and require more time to ramp?
Are your existing 8 CPUs only single cores? Is it too difficult to upgrade with your VPS provider?
Sure, there are plenty of ways to increase resources of the bridge, but I feel that's a different topic.
After expanding my reading of your related "issues," I see that your VPS provider only offers up to 8 cores. Is it possible to spin-up another VPS environment, with the same provider, on a separate VLAN, allowing route/firewall access between the two VPS environments? This way you could test loadbalancing a Tor Bridge over a local network using multiple virtual environments. Perhaps the Tor Project might even assist you with such a short-term investment (I read the meeting notes). ;-)
Thanks for your comments.
Thank you for your responses.
Respectfully,
Gary
On Sat, Jan 29, 2022 at 02:54:40AM +0000, Gary C. New via tor-relays wrote:
From your documentation, it sounds like you're running everything on the same machine? When expanding to additional machines, similar to the file limit issue, you'll have to expand the usable ports as well.
I don't think I understand your point. At 64K simultaneous connections, you run out of source ports for making the connection 4-tuple unique, but I don't see how the same or different hosts make a difference, in that respect.
On many Linux distros, the default ip_local_port_range is between 32768 - 61000.
The Tor Project recommends increasing it.
# echo 15000 64000 > /proc/sys/net/ipv4/ip_local_port_range
Thanks, that's a good tip. I added it to the installation guide.
I'm not using nyx. I'm just looking at the bandwidth on the network interface.
If you have time, would you mind installing nyx to validate observed similarities/differences between our loadbalanced configurations?
I don't have plans to do that.
I'm glad to hear you feel the IPv6 reporting appears to be a false-negative. Does this mean there's something wrong with IPv6 Heartbeat reporting?
I don't know if it's wrong, exactly. It's reporting something different than what ExtORPort is providing. The proximate connections to tor are indeed all IPv4.
I see. Perhaps IPv6 connections are less prolific and require more time to ramp?
No, it's not that. The bridge has plenty of connections from clients that use an IPv6 address, as the bridge-stats file shows:
```plain
bridge-ip-versions v4=15352,v6=1160
```
It's just that, unlike a direct TCP connection, as is the case with a guard relay, the client connections pass through a chain of proxies and processes on the way to tor: client → Snowflake proxy → snowflake-server WebSocket server → extor-static-cookie adapter → tor. The last link in the chain is IPv4, and evidently that is what the heartbeat log reports. The client's actual IP address is tunnelled, for metrics purposes, through this chain of proxies and processes to tor, using a special protocol called ExtORPort (see USERADDR at https://gitweb.torproject.org/torspec.git/tree/proposals/196-transport-contr...). It looks like the bridge-stats descriptor pays attention to the USERADDR information and the heartbeat log does not, that's all.
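For concreteness, the USERADDR command in that protocol is just a line in the Extended ORPort handshake carrying the original client address and port as a string, something like (address made up):
```plain
USERADDR 198.51.100.7:54321
```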
After expanding my reading of your related "issues," I see that your VPS provider only offers up to 8 cores. Is it possible to spin-up another VPS environment, with the same provider, on a separate VLAN, allowing route/ firewall access between the two VPS environments? This way you could test loadbalancing a Tor Bridge over a local network using multiple virtual environments.
Yes, there are many other potential ways to further expand the deployment, but I do not have much interest in that topic right now. I started the thread for help with a non-obvious point, namely getting past the bottleneck of a single-core tor process. I think that we have collectively found a satisfactory solution for that. The steps after that for further scaling are relatively straightforward, I think. Running one instance of snowflake-server on one host and all the instances of tor on a nearby host is a logical next step.
On Saturday, January 29, 2022, 9:46:59 PM PST, David Fifield david@bamsoftware.com wrote:
I'm not using nyx. I'm just looking at the bandwidth on the network interface.
If you have time, would you mind installing nyx to validate observed similarities/differences between our loadbalanced configurations?
I don't have plans to do that.
I appreciate you setting expectations.
I'm glad to hear you feel the IPv6 reporting appears to be a false-negative. Does this mean there's something wrong with IPv6 Heartbeat reporting?
I don't know if it's wrong, exactly. It's reporting something different than what ExtORPort is providing. The proximate connections to tor are indeed all IPv4.
I see. Perhaps IPv6 connections are less prolific and require more time to ramp?
No, it's not that. The bridge has plenty of connections from clients that use an IPv6 address, as the bridge-stats file shows:
```plain
bridge-ip-versions v4=15352,v6=1160
```
It's just that, unlike a direct TCP connection, as is the case with a guard relay, the client connections pass through a chain of proxies and processes on the way to tor: client → Snowflake proxy → snowflake-server WebSocket server → extor-static-cookie adapter → tor. The last link in the chain is IPv4, and evidently that is what the heartbeat log reports. The client's actual IP address is tunnelled, for metrics purposes, through this chain of proxies and processes to tor, using a special protocol called ExtORPort (see USERADDR at https://gitweb.torproject.org/torspec.git/tree/proposals/196-transport-contr...). It looks like the bridge-stats descriptor pays attention to the USERADDR information and the heartbeat log does not, that's all.
Ah... Gotcha. Thank you for clarifying.
After expanding my reading of your related "issues," I see that your VPS provider only offers up to 8 cores. Is it possible to spin-up another VPS environment, with the same provider, on a separate VLAN, allowing route/ firewall access between the two VPS environments? This way you could test loadbalancing a Tor Bridge over a local network using multiple virtual environments.
Yes, there are many other potential ways to further expand the deployment, but I do not have much interest in that topic right now. I started the thread for help with a non-obvious point, namely getting past the bottleneck of a single-core tor process. I think that we have collectively found a satisfactory solution for that. The steps after that for further scaling are relatively straightforward, I think. Running one instance of snowflake-server on one host and all the instances of tor on a nearby host is a logical next step.
Understand. I appreciate the work you have done and the opportunity to compare and contrast Loadbalanced Tor Bridges vs Loadbalanced Tor Relays. Please update the tor-relays mailing list with any new findings related to the subversion of onion key rotation. Excellent Work! Respectfully,
Gary
On the matter of onion key rotation, I had the idea of making the onion key files read-only. Roger did some source code investigation and said that it might work to prevent onion key rotation, with some minor side effects. I plan to give the idea a try on a different bridge. The possible side effects are that tor will continue trying and failing to rotate the onion key every hour, and "force a router descriptor rebuild, so it will try to publish a new descriptor each hour."
Making secret_onion_key and secret_onion_key_ntor read-only does not quite work, because tor first renames them to secret_onion_key.old and secret_onion_key_ntor.old before writing new files. (Making the *.old files read-only does not work either, because the `tor_rename` function first unlinks the destination.) https://gitweb.torproject.org/tor.git/tree/src/feature/relay/router.c?h=tor-...
But a slight variation does work: make secret_onion_key.old and secret_onion_key_ntor.old *directories*, so that tor_rename cannot rename a file over them. It does result in an hourly `BUG` stack trace, but otherwise it seems effective.
I did a test with two tor instances. The rot1 instance had the directory hack to prevent onion key rotation. The rot2 had nothing to prevent onion key rotation.
```plain
# tor-instance-create rot1
# tor-instance-create rot2
```
/etc/tor/instances/rot1/torrc:
```plain
Log info file /var/lib/tor-instances/rot1/onionrotate.info.log
BridgeRelay 1
AssumeReachable 1
BridgeDistribution none
ORPort 127.0.0.1:auto
ExtORPort auto
SocksPort 0
Nickname onionrotate1
```
/etc/tor/instances/rot2/torrc:
```plain
Log info file /var/lib/tor-instances/rot2/onionrotate.info.log
BridgeRelay 1
AssumeReachable 1
BridgeDistribution none
ORPort 127.0.0.1:auto
ExtORPort auto
SocksPort 0
Nickname onionrotate2
```
Start rot1, copy its keys to rot2, then start rot2:
```plain
# service tor@rot1 start
# cp -r /var/lib/tor-instances/rot1/keys /var/lib/tor-instances/rot2/
# chown -R _tor-rot2:_tor-rot2 /var/lib/tor-instances/rot2/keys
# service tor@rot2 start
```
Stop the two instances, check that the onion keys are the same, and that `LastRotatedOnionKey` is set in both state files:
```plain
# service tor@rot1 stop
# service tor@rot2 stop
# ls -l /var/lib/tor-instances/rot*/keys/secret_onion_key*
-rw------- 1 _tor-rot1 _tor-rot1 888 Jan 28 22:57 /var/lib/tor-instances/rot1/keys/secret_onion_key
-rw------- 1 _tor-rot1 _tor-rot1 96 Jan 28 22:57 /var/lib/tor-instances/rot1/keys/secret_onion_key_ntor
-rw------- 1 _tor-rot2 _tor-rot2 888 Jan 28 23:05 /var/lib/tor-instances/rot2/keys/secret_onion_key
-rw------- 1 _tor-rot2 _tor-rot2 96 Jan 28 23:05 /var/lib/tor-instances/rot2/keys/secret_onion_key_ntor
# md5sum /var/lib/tor-instances/rot*/keys/secret_onion_key*
fb2a8a8f9de56f061eccbb3fedd700c4 /var/lib/tor-instances/rot1/keys/secret_onion_key
2066ab7e01595adf42fc791ad36e1fc5 /var/lib/tor-instances/rot1/keys/secret_onion_key_ntor
fb2a8a8f9de56f061eccbb3fedd700c4 /var/lib/tor-instances/rot2/keys/secret_onion_key
2066ab7e01595adf42fc791ad36e1fc5 /var/lib/tor-instances/rot2/keys/secret_onion_key_ntor
# grep LastRotatedOnionKey /var/lib/tor-instances/rot*/state
/var/lib/tor-instances/rot1/state:LastRotatedOnionKey 2022-01-28 22:57:14
/var/lib/tor-instances/rot2/state:LastRotatedOnionKey 2022-01-28 23:11:04
```
Set `LastRotatedOnionKey` 6 weeks into the past to force an attempt to rotate the keys the next time tor is restarted:
```plain
# sed -i -e 's/^LastRotatedOnionKey .*/LastRotatedOnionKey 2021-12-15 00:00:00/' /var/lib/tor-instances/rot*/state
# grep LastRotatedOnionKey /var/lib/tor-instances/rot*/state
/var/lib/tor-instances/rot1/state:LastRotatedOnionKey 2021-12-15 00:00:00
/var/lib/tor-instances/rot2/state:LastRotatedOnionKey 2021-12-15 00:00:00
```
Create the secret_onion_key.old and secret_onion_key_ntor.old directories in the rot1 instance.
```plain
# mkdir -m 700 /var/lib/tor-instances/rot1/keys/secret_onion_key{,_ntor}.old
```
Check the identity of keys before starting:
```plain
# md5sum /var/lib/tor-instances/rot*/keys/secret_onion_key*
fb2a8a8f9de56f061eccbb3fedd700c4 /var/lib/tor-instances/rot1/keys/secret_onion_key
2066ab7e01595adf42fc791ad36e1fc5 /var/lib/tor-instances/rot1/keys/secret_onion_key_ntor
md5sum: /var/lib/tor-instances/rot1/keys/secret_onion_key_ntor.old: Is a directory
md5sum: /var/lib/tor-instances/rot1/keys/secret_onion_key.old: Is a directory
fb2a8a8f9de56f061eccbb3fedd700c4 /var/lib/tor-instances/rot2/keys/secret_onion_key
2066ab7e01595adf42fc791ad36e1fc5 /var/lib/tor-instances/rot2/keys/secret_onion_key_ntor
```
Start both instances:
```plain
# service tor@rot1 start
# service tor@rot2 start
```
Verify that the rot1 instance is still using the same onion keys, while rot2 has rotated them:
```plain
# ls -ld /var/lib/tor-instances/rot*/keys/secret_onion_key*
-rw------- 1 _tor-rot1 _tor-rot1 888 Jan 28 23:45 /var/lib/tor-instances/rot1/keys/secret_onion_key
-rw------- 1 _tor-rot1 _tor-rot1 96 Jan 28 23:45 /var/lib/tor-instances/rot1/keys/secret_onion_key_ntor
drwx--S--- 2 root _tor-rot1 4096 Jan 28 23:44 /var/lib/tor-instances/rot1/keys/secret_onion_key_ntor.old
drwx--S--- 2 root _tor-rot1 4096 Jan 28 23:44 /var/lib/tor-instances/rot1/keys/secret_onion_key.old
-rw------- 1 _tor-rot2 _tor-rot2 888 Jan 28 23:47 /var/lib/tor-instances/rot2/keys/secret_onion_key
-rw------- 1 _tor-rot2 _tor-rot2 96 Jan 28 23:47 /var/lib/tor-instances/rot2/keys/secret_onion_key_ntor
-rw------- 1 _tor-rot2 _tor-rot2 96 Jan 28 23:05 /var/lib/tor-instances/rot2/keys/secret_onion_key_ntor.old
-rw------- 1 _tor-rot2 _tor-rot2 888 Jan 28 23:05 /var/lib/tor-instances/rot2/keys/secret_onion_key.old
# md5sum /var/lib/tor-instances/rot*/keys/secret_onion_key*
fb2a8a8f9de56f061eccbb3fedd700c4 /var/lib/tor-instances/rot1/keys/secret_onion_key
2066ab7e01595adf42fc791ad36e1fc5 /var/lib/tor-instances/rot1/keys/secret_onion_key_ntor
md5sum: /var/lib/tor-instances/rot1/keys/secret_onion_key_ntor.old: Is a directory
md5sum: /var/lib/tor-instances/rot1/keys/secret_onion_key.old: Is a directory
fb8a5e8787141dba4e935267f818cc2a /var/lib/tor-instances/rot2/keys/secret_onion_key
2c3f7d81e96641e2c04fb9c452296337 /var/lib/tor-instances/rot2/keys/secret_onion_key_ntor
2066ab7e01595adf42fc791ad36e1fc5 /var/lib/tor-instances/rot2/keys/secret_onion_key_ntor.old
fb2a8a8f9de56f061eccbb3fedd700c4 /var/lib/tor-instances/rot2/keys/secret_onion_key.old
```
The rot1 instance's `LastRotatedOnionKey` remains the same, while rot2's is updated:
```plain
# grep LastRotatedOnionKey /var/lib/tor-instances/rot*/state
/var/lib/tor-instances/rot1/state:LastRotatedOnionKey 2021-12-15 00:00:00
/var/lib/tor-instances/rot2/state:LastRotatedOnionKey 2022-01-28 23:47:02
```
The rot1 instance's log shows the failure to rotate the keys:
/var/lib/tor-instances/rot1/onionrotate.info.log
```plain
Jan 28 23:46:59.000 [info] rotate_onion_key_callback(): Rotating onion key.
Jan 28 23:46:59.000 [warn] Couldn't rotate onion key.
Jan 28 23:46:59.000 [info] router_rebuild_descriptor(): Rebuilding relay descriptor (forced)
...
Jan 28 23:46:59.000 [info] check_onion_keys_expiry_time_callback(): Expiring old onion keys.
```
While the rot2 rotation was successful:
/var/lib/tor-instances/rot2/onionrotate.info.log
```plain
Jan 28 23:47:02.000 [info] rotate_onion_key_callback(): Rotating onion key.
Jan 28 23:47:02.000 [info] rotate_onion_key(): Rotating onion key
Jan 28 23:47:02.000 [info] mark_my_descriptor_dirty(): Decided to publish new relay descriptor: rotated onion key
```
After 1 hour, the rot1 instance tries to rebuild its relay descriptor, and triggers a `BUG` non-fatal assertion failure in [`router_rebuild_descriptor`](https://gitweb.torproject.org/tor.git/tree/src/feature/relay/router.c?h=tor-...). I let it run for 1 more hour after that, and it happened again.
/var/lib/tor-instances/rot1/onionrotate.info.log
```plain
Jan 29 00:46:59.000 [info] router_rebuild_descriptor(): Rebuilding relay descriptor (forced)
Jan 29 00:46:59.000 [warn] The IPv4 ORPort address 127.0.0.1 does not match the descriptor address 172.105.3.197. If you have a static public IPv4 address, use 'Address <IPv4>' and 'OutboundBindAddress <IPv4>'. If you are behind a NAT, use two ORPort lines: 'ORPort <PublicPort> NoListen' and 'ORPort <InternalPort> NoAdvertise'.
Jan 29 00:46:59.000 [info] extrainfo_dump_to_string_stats_helper(): Adding stats to extra-info descriptor.
Jan 29 00:46:59.000 [info] read_file_to_str(): Could not open "/var/lib/tor-instances/rot1/stats/bridge-stats": No such file or directory
Jan 29 00:46:59.000 [warn] tor_bug_occurred_(): Bug: ../src/feature/relay/router.c:2452: router_rebuild_descriptor: Non-fatal assertion !(desc_gen_reason == NULL) failed. (on Tor 0.4.5.10 )
Jan 29 00:46:59.000 [warn] Bug: Tor 0.4.5.10: Non-fatal assertion !(desc_gen_reason == NULL) failed in router_rebuild_descriptor at ../src/feature/relay/router.c:2452. Stack trace: (on Tor 0.4.5.10 )
Jan 29 00:46:59.000 [warn] Bug: /usr/bin/tor(log_backtrace_impl+0x57) [0x5638b9538047] (on Tor 0.4.5.10 )
Jan 29 00:46:59.000 [warn] Bug: /usr/bin/tor(tor_bug_occurred_+0x16b) [0x5638b954327b] (on Tor 0.4.5.10 )
Jan 29 00:46:59.000 [warn] Bug: /usr/bin/tor(router_rebuild_descriptor+0x13d) [0x5638b94f4e1d] (on Tor 0.4.5.10 )
Jan 29 00:46:59.000 [warn] Bug: /usr/bin/tor(+0x21f163) [0x5638b9665163] (on Tor 0.4.5.10 )
Jan 29 00:46:59.000 [warn] Bug: /usr/bin/tor(+0x83577) [0x5638b94c9577] (on Tor 0.4.5.10 )
Jan 29 00:46:59.000 [warn] Bug: /lib/x86_64-linux-gnu/libevent-2.1.so.7(+0x239ef) [0x7f701bae49ef] (on Tor 0.4.5.10 )
Jan 29 00:46:59.000 [warn] Bug: /lib/x86_64-linux-gnu/libevent-2.1.so.7(event_base_loop+0x52f) [0x7f701bae528f] (on Tor 0.4.5.10 )
Jan 29 00:46:59.000 [warn] Bug: /usr/bin/tor(do_main_loop+0x101) [0x5638b94b1321] (on Tor 0.4.5.10 )
Jan 29 00:46:59.000 [warn] Bug: /usr/bin/tor(tor_run_main+0x1d5) [0x5638b94acdd5] (on Tor 0.4.5.10 )
Jan 29 00:46:59.000 [warn] Bug: /usr/bin/tor(tor_main+0x49) [0x5638b94a92e9] (on Tor 0.4.5.10 )
Jan 29 00:46:59.000 [warn] Bug: /usr/bin/tor(main+0x19) [0x5638b94a8ec9] (on Tor 0.4.5.10 )
Jan 29 00:46:59.000 [warn] Bug: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xea) [0x7f701b391d0a] (on Tor 0.4.5.10 )
Jan 29 00:46:59.000 [warn] Bug: /usr/bin/tor(_start+0x2a) [0x5638b94a8f1a] (on Tor 0.4.5.10 )
Jan 29 00:46:59.000 [info] router_upload_dir_desc_to_dirservers(): Uploading relay descriptor to directory authorities
Jan 29 00:46:59.000 [info] directory_post_to_dirservers(): Uploading an extrainfo too (length 822)
Jan 29 00:46:59.000 [info] rep_hist_note_used_internal(): New port prediction added. Will continue predictive circ building for 3332 more seconds.
Jan 29 00:46:59.000 [info] connection_ap_make_link(): Making internal anonymized tunnel to [scrubbed]:9001 ...
Jan 29 00:46:59.000 [info] connection_ap_make_link(): ... application connection created and linked.
Jan 29 00:46:59.000 [info] check_onion_keys_expiry_time_callback(): Expiring old onion keys.
```
Stopping and restarting the rot1 instance keeps the same onion keys, and the first rotation does not hit the assertion failure:
```plain
# service tor@rot1 stop
# service tor@rot1 start
# md5sum /var/lib/tor-instances/rot*/keys/secret_onion_key*
fb2a8a8f9de56f061eccbb3fedd700c4 /var/lib/tor-instances/rot1/keys/secret_onion_key
2066ab7e01595adf42fc791ad36e1fc5 /var/lib/tor-instances/rot1/keys/secret_onion_key_ntor
md5sum: /var/lib/tor-instances/rot1/keys/secret_onion_key_ntor.old: Is a directory
md5sum: /var/lib/tor-instances/rot1/keys/secret_onion_key.old: Is a directory
fb8a5e8787141dba4e935267f818cc2a /var/lib/tor-instances/rot2/keys/secret_onion_key
2c3f7d81e96641e2c04fb9c452296337 /var/lib/tor-instances/rot2/keys/secret_onion_key_ntor
2066ab7e01595adf42fc791ad36e1fc5 /var/lib/tor-instances/rot2/keys/secret_onion_key_ntor.old
fb2a8a8f9de56f061eccbb3fedd700c4 /var/lib/tor-instances/rot2/keys/secret_onion_key.old
```
/var/lib/tor-instances/rot1/onionrotate.info.log
```plain
Jan 29 02:06:13.000 [info] rotate_onion_key_callback(): Rotating onion key.
Jan 29 02:06:13.000 [warn] Couldn't rotate onion key.
Jan 29 02:06:13.000 [info] router_rebuild_descriptor(): Rebuilding relay descriptor (forced)
...
Jan 29 02:06:13.000 [info] check_onion_keys_expiry_time_callback(): Expiring old onion keys.
```
David,
Making secret_onion_key and secret_onion_key_ntor read-only does not quite work, because tor first renames them to secret_onion_key.old and secret_onion_key_ntor.old before writing new files. (Making the *.old files read-only does not work either, because the `tor_rename` function first unlinks the destination.)
https://gitweb.torproject.org/tor.git/tree/src/feature/relay/router.c?h=tor-...
But a slight variation does work: make secret_onion_key.old and secret_onion_key_ntor.old *directories*, so that tor_rename cannot rename a file over them. It does result in an hourly `BUG` stack trace, but otherwise it seems effective.
Directories instead of read-only files. Nice Out-Of-The-Box Thinking!
Now the question becomes whether there are any adverse side effects with the DirectoryAuthorities from the secret_onion_keys not being updated over time.
Excellent Work! Much Respect.
Gary
On Fri, 28 Jan 2022 19:58:49 -0700 David Fifield david@bamsoftware.com wrote:
On the matter of onion key rotation, I had the idea of making the onion key files read-only. Roger did some source code investigation and said that it might work to prevent onion key rotation, with some minor side effects. I plan to give the idea a try on a different bridge. The possible side effects are that tor will continue trying and failing to rotate the onion key every hour, and "force a router descriptor rebuild, so it will try to publish a new descriptor each hour."
Making secret_onion_key and secret_onion_key_ntor read-only does not quite work, because tor first renames them to secret_onion_key.old and secret_onion_key_ntor.old before writing new files. (Making the *.old files read-only does not work either, because the `tor_rename` function first unlinks the destination.) https://gitweb.torproject.org/tor.git/tree/src/feature/relay/router.c?h=tor-...
But a slight variation does work: make secret_onion_key.old and secret_onion_key_ntor.old *directories*, so that tor_rename cannot rename a file over them. It does result in an hourly `BUG` stack trace, but otherwise it seems effective.
I did a test with two tor instances. The rot1 instance had the directory hack to prevent onion key rotation. The rot2 had nothing to prevent onion key rotation.
I did not follow the thread closely, but if you want a file or directory contents unchangeable, and not allowed to rename/delete even by root, there's the "immutable" attribute (chattr +i).
On Sunday, January 30, 2022, 2:26:08 AM PST, Roman Mamedov rm@romanrm.net wrote:
On Fri, 28 Jan 2022 19:58:49 -0700 David Fifield david@bamsoftware.com wrote:
But a slight variation does work: make secret_onion_key.old and secret_onion_key_ntor.old *directories*, so that tor_rename cannot rename a file over them. It does result in an hourly `BUG` stack trace, but otherwise it seems effective.
I did a test with two tor instances. The rot1 instance had the directory hack to prevent onion key rotation. The rot2 had nothing to prevent onion key rotation.
I did not follow the thread closely, but if you want a file or directory contents unchangeable, and not allowed to rename/delete even by root, there's the "immutable" attribute (chattr +i).
I like the immutable attribute approach. It can be applied to the original secret_onion_key and secret_onion_key_ntor files. Appreciate the input. Respectfully,
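A minimal sketch of that variant (untested here; it requires root and a filesystem that supports attributes, such as ext4), with `chattr -i` available to undo it if the keys ever do need to change:
```plain
# chattr +i /var/lib/tor-instances/rot1/keys/secret_onion_key
# chattr +i /var/lib/tor-instances/rot1/keys/secret_onion_key_ntor
# lsattr /var/lib/tor-instances/rot1/keys/secret_onion_key*
```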
Gary
The load-balanced Snowflake bridge has been running in production since 2022-01-31. Thanks Roger, Gary, Roman for your input.
Hopefully reproducible installation instructions: https://gitlab.torproject.org/tpo/anti-censorship/team/-/wikis/Survival-Guid... Observations since: https://bugs.torproject.org/tpo/anti-censorship/pluggable-transports/snowfla...
Metrics graphs are currently confused by multiple instances of tor uploading descriptors under the same fingerprint, particularly in the interval between 2022-01-25 and 2022-02-03, when a production bridge and a staging bridge were running in parallel, with four instances being used and another four being mostly unused. https://metrics.torproject.org/rs.html#details/5481936581E23D2D178105D44DB69... https://metrics.torproject.org/userstats-bridge-transport.html?start=2021-11... Since 2022-02-03, it appears that Metrics is showing only one of the four running instances per day. Because all four instances are about equally used (as if load balanced, go figure), the values on the graph are 1/4 of what they should be. The reported bandwidth of 5 MB/s is actually 20 MB/s, and the 2500 clients are actually 10000. All the necessary data are present in Collector; it's just a question of data processing. I opened an issue for the Metrics graphs, where you can also see some manually made graphs that are closer to the true values. https://bugs.torproject.org/tpo/network-health/metrics/onionoo/40022
I started a thread on tor-dev about the issues of onion key rotation and ExtORPort authentication. https://lists.torproject.org/pipermail/tor-dev/2022-February/thread.html
David, Excellent Documentation and References! I hope the proposed RFCs (auth, key, and metrics) for loadbalanced Tor topologies are seriously considered and implemented by Tor Core and Tor Metrics. Great Work! Respectfully,
Gary
David, Has Tor Metrics implemented your RFC related to Written Bytes per Second and Read Bytes per Second on Onionoo? As of the 27th of February, I've noticed a change in reporting that accurately reflects the aggregate of my Tor Relay Nodes as opposed to the previously reported single Tor Node. Are you seeing a similar change for snowflake.torproject.org? Additionally, other than the hourly stacktrace errors in the syslog, the secret_onion_key workaround seems to be working well without any ill side-effects. I've been able to operate with the same secret_onion_key for close to 5 weeks now. Have you run into any issues? Thank you for your response. Respectfully,
Gary
Gary C. New via tor-relays:
David, Has Tor Metrics implemented your RFC related to Written Bytes per Second and Read Bytes per Second on Onionoo?
That's probably https://gitlab.torproject.org/tpo/network-health/metrics/onionoo/-/issues/40..., no?
Georg
Georg, Yes! That is precisely it! Please know that the change appears to be working with my loadbalanced Tor Relay deployment as well. Are there any "Issues" submitted for a similar change to Consensus Weight and Relay Probability in Tor Metrics on Onionoo? It appears these values are still only being reported for a single Tor Node. A BIG Thank You to the Tor Metrics Team for the Issue-40022 implementation. Respectfully,
Gary
Gary C. New via tor-relays:
Georg, Yes! That is precisely it! Please know that the change appears to be working with my loadbalanced Tor Relay deployment as well. Are there any "Issues" submitted for a similar change to Consensus Weight and Relay Probability in Tor Metrics on Onionoo? It appears these values are still only being reported for a single Tor Node.
Hrm, good question. I don't think so, and I am not sure yet whether we should make such a change.
A BIG Thank You to the Tor Metrics Team for the Issue-40022 implementation.
You are welcome. It seems, though, the implementation was not correct. We therefore reverted it for now. However, we are on it. :)
Georg
Georg,
Are there any "Issues" submitted for a similar change to Concensus Weight and Relay Probability to Tor Metrics on Onionoo? It appears these values are still only being reported for a Single Tor Node.
Hrm, good question. I don't think so and I am not sure yet, whether we
should make such a change.
Do you mind me asking what the reluctance might be to extending Tor Metrics to include correct reporting of Concensus Weight and Relay Probability for Loadbalanced Tor Relays? It would provide a more accurate assessment of Tor Network Resources and assist DirectoryAuthorities in making more informed decisions. I would be happy to open an "Issue" on the topic for official Request For Consideration. Thank you and the Tor Metrics Team for all that you do in improving the Tor Network. Respectfully,
Gary
On Friday, March 4, 2022, 12:22:06 AM MST, Georg Koppen gk@torproject.org wrote:
Gary C. New via tor-relays:
Georg, Yes! That is precisely it! Please know that the change appears to be working with my loadbalanced Tor Relay deployment as well. Are there any "Issues" submitted for a similar change to Concensus Weight and Relay Probability to Tor Metrics on Onionoo? It appears these values are still only being reported for a Single Tor Node.
Hrm, good question. I don't think so and I am not sure yet, whether we should make such a change.
A BIG Thank You to the Tor Metrics Team for the Issue-40022 implementation.
You are welcome. It seems, though, the implementation was not correct. We therefore reverted it for now. However, we are on it. :)
Georg
Respectfully,
Gary— This Message Originated by the Sun. iBigBlue 63W Solar Array (~12 Hour Charge)
- 2 x Charmast 26800mAh Power Banks
= iPhone XS Max 512GB (~2 Weeks Charged)
On Thursday, March 3, 2022, 1:28:12 PM MST, Georg Koppen gk@torproject.org wrote: Gary C. New via tor-relays:
David, Has Tor Metrics implemented your RFC related to Written Bytes per Second and Read Bytes per Second on Onionoo?
That's probably
https://gitlab.torproject.org/tpo/network-health/metrics/onionoo/-/issues/40...
, no?
Georg
As of the 27th of February, I've noticed a change in reporting that accurately reflects the aggregate of my Tor Relay Nodes opposed to the previously reported Single Tor Node. Are you seeing a similar change for snowflake.torproject.org? Additionally, other than the hourly stacktrace errors in the syslog, the secure_onion_key workaround seems to be working well without any ill side-effects. I've been able to operate with the same secure_onion_key for close to 5 weeks, now. Have you run into any issues? Thank you for your response. Respectfully,
Gary— This Message Originated by the Sun. iBigBlue 63W Solar Array (~12 Hour Charge)
- 2 x Charmast 26800mAh Power Banks
= iPhone XS Max 512GB (~2 Weeks Charged)
On Tuesday, February 8, 2022, 11:49:47 PM MST, Gary C. New via tor-relays tor-relays@lists.torproject.org wrote: David, Excellent Documentation and References! I hope the proposed RFC's (auth, key, and metrics) for loadbalanced Tor topologies are seriously considered and implemented by Tor Core and Tor Metrics. Great Work! Respectfully,
Gary— This Message Originated by the Sun. iBigBlue 63W Solar Array (~12 Hour Charge)
- 2 x Charmast 26800mAh Power Banks
= iPhone XS Max 512GB (~2 Weeks Charged)
On Tuesday, February 8, 2022, 10:02:53 AM MST, David Fifield david@bamsoftware.com wrote: The load-balanced Snowflake bridge is running in production since 2022-01-31. Thanks Roger, Gary, Roman for your input.
Hopefully reproducible installation instructions: https://gitlab.torproject.org/tpo/anti-censorship/team/-/wikis/Survival-Guid... Observations since: https://bugs.torproject.org/tpo/anti-censorship/pluggable-transports/snowfla...
Metrics graphs are currently confused by multiple instances of tor uploading descriptors under the same fingerprint. Particularly in the interval between 2022-01-25 and 2022-02-03, when a production bridge and staging bridge were running in parallel, with four instances being used and another four being mostly unused. https://metrics.torproject.org/rs.html#details/5481936581E23D2D178105D44DB69... https://metrics.torproject.org/userstats-bridge-transport.html?start=2021-11... Since 2022-02-03, it appears that Metrics is showing only one of the four running instances per day. Because all four instances are about equally used (as if load balanced, go figure), the values on the graph are 1/4 what they should be. The reported bandwidth of 5 MB/s is actually 20 MB/s, and the 2500 clients are actually 10000. All the necessary data are present in Collector, it's just a question of data processing. I opened an issue for the Metrics graphs, where you can also see some manually made graphs that are closer to the true values. https://bugs.torproject.org/tpo/network-health/metrics/onionoo/40022
I started a thread on tor-dev about the issues of onion key rotation and ExtORPort authentication. https://lists.torproject.org/pipermail/tor-dev/2022-February/thread.html
Gary C. New via tor-relays:
Georg,
Are there any "Issues" submitted for a similar change to Concensus Weight and Relay Probability to Tor Metrics on Onionoo? It appears these values are still only being reported for a Single Tor Node.
Hrm, good question. I don't think so, and I am not sure yet whether we should make such a change.
Do you mind me asking what the reluctance might be to extending Tor Metrics to include correct reporting of Consensus Weight and Relay Probability for Loadbalanced Tor Relays? It would provide a more accurate assessment of Tor Network Resources and assist Directory Authorities in making more informed decisions.
There is no real reluctance here on my side. It's just that I haven't thought yet about what kind of extra work it would involve and what the pros and cons of that actually are.
I would be happy to open an "Issue" on the topic for official Request For Consideration.
Yes, please do. I think https://gitlab.torproject.org/tpo/network-health/metrics/relay-search is a good project to file the issue in and have some discussion and context, and then we can open child tickets in other projects in case we need to do work somewhere else as well to make your request happen.
Thank you and the Tor Metrics Team for all that you do in improving the Tor Network.
You are welcome! Thanks for running relays.
Georg
Respectfully,
Gary— This Message Originated by the Sun. iBigBlue 63W Solar Array (~12 Hour Charge) + 2 x Charmast 26800mAh Power Banks = iPhone XS Max 512GB (~2 Weeks Charged)
On Friday, March 4, 2022, 12:22:06 AM MST, Georg Koppen <gk@torproject.org> wrote:
Gary C. New via tor-relays:
Georg, Yes! That is precisely it! Please know that the change appears to be working with my loadbalanced Tor Relay deployment as well. Are there any "Issues" submitted for a similar change to Consensus Weight and Relay Probability to Tor Metrics on Onionoo? It appears these values are still only being reported for a Single Tor Node.
On Thu, Mar 03, 2022 at 08:13:34PM +0000, Gary C. New wrote:
Has Tor Metrics implemented your RFC related to Written Bytes per Second and Read Bytes per Second on Onionoo?
As of the 27th of February, I've noticed a change in reporting that accurately reflects the aggregate of my Tor Relay Nodes opposed to the previously reported Single Tor Node. Are you seeing a similar change for snowflake.torproject.org?
You're right. I see a change since 2022-02-27, but in the case of the snowflake bridge the numbers look wrong, about 8× too high. I posted an update on the issue. Thanks for noticing.
https://gitlab.torproject.org/tpo/network-health/metrics/onionoo/-/issues/40...
Additionally, other than the hourly stacktrace errors in the syslog, the secure_onion_key workaround seems to be working well without any ill side-effects. I've been able to operate with the same secure_onion_key for close to 5 weeks, now. Have you run into any issues?
Yes, it's still working well here.
David, I see that the metrics change has been reverted. If/When the metrics change is implemented, will loadbalanced Tor Relay Nodes need to be uniquely named or will they all be able to use the same nickname? I'm glad to hear your loadbalanced Snowflake Relay continues to work well. Thanks, again, for your efforts. Respectfully,
Gary— This Message Originated by the Sun. iBigBlue 63W Solar Array (~12 Hour Charge) + 2 x Charmast 26800mAh Power Banks = iPhone XS Max 512GB (~2 Weeks Charged)
On Fri, Mar 04, 2022 at 09:40:01PM +0000, Gary C. New wrote:
I see that the metrics change has been reverted.
If/When the metrics change is implemented, will loadbalanced Tor Relay Nodes need to be uniquely named or will they all be able to use the same nickname?
When I made my own combined graphs, I relied on different instances having different nicknames. I don't know an easy way to distinguish the descriptors of different instances otherwise.
You could conceivably do it by analyzing the periodicity of different instances' publishing schedules. (Start one instance on the hour, another at :10, another at :20, etc.) But that seems fragile, not to mention annoying to deal with.
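For context, a minimal sketch of what distinct nicknames could look like in per-instance torrc files (the instance names and paths here are illustrative, not the actual bridge's configuration):

    # torrc for instance 1
    Nickname flakey1
    DataDirectory /var/lib/tor-instances/1

    # torrc for instance 2
    Nickname flakey2
    DataDirectory /var/lib/tor-instances/2

With distinct nicknames, each instance's descriptors can be told apart even though they share the same fingerprint.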
David,
When I made my own combined graphs, I relied on different instances
having different nicknames. I don't know an easy way to distinguish the descriptors of different instances otherwise.
Please let me know what the Tor Metrics Team decides, if/when they reimplement the change.
You could conceivably do it by analyzing the periodicity of different
instances' publishing schedules. (Start one instance on the hour, another at :10, another at :20, etc.) But that seems fragile, not to mention annoying to deal with.
I agree. I'd rather manage unique nicknames. Thanks, again.
Gary— This Message Originated by the Sun. iBigBlue 63W Solar Array (~12 Hour Charge) + 2 x Charmast 26800mAh Power Banks = iPhone XS Max 512GB (~2 Weeks Charged)
David,
I finally have time to migrate my loadbalanced Tor relay to a loadbalanced Tor obfs4proxy configuration.
In the process, I've been reviewing this thread and was hoping you could help with one clarification regarding your loadbalanced Tor snowflake configuration?
I noticed that you are using "AssumeReachable 1" in your torrc and was wondering whether you are exposing your ORPort to the World?
In the obfs4proxy configuration examples, it states that the ORPort needs to be open to the World, but it isn't clear in your torrc example whether you expose it to the World.
Is it truly necessary to expose the ORPort to the World in a pluggable transport configuration?
Thank you for your assistance.
Respectfully,
Gary — This Message Originated by the Sun. iBigBlue 63W Solar Array (~12 Hour Charge) + 2 x Charmast 26800mAh Power Banks = iPhone XS Max 512GB (~2 Weeks Charged)
On Fri, Dec 09, 2022 at 01:09:05AM +0000, Gary C. New wrote:
Is it truly necessary to expose the ORPort to the World in a pluggable transport configuration?
I don't know if it is necessary for ordinary bridges to expose the ORPort. For a long time, it was necessary, because BridgeDB used the ORPort to check that a bridge was running, before distributing it to users. See: https://bugs.torproject.org/tpo/core/tor/7349 But now there is rdsys and bridgestrap, which may have the ability to test the obfs4 port rather than the ORPort. I cannot say whether that removes the requirement to expose the ORPort. https://gitlab.torproject.org/tpo/anti-censorship/rdsys/-/merge_requests/36
For the special case of the default bridges shipped with Tor Browser, it is not necessary to expose the ORPort, because those bridges are not distributed by rdsys.
David,
In my implementation of the loadbalanced OBFS4 configuration, it appears that BridgeDB still tests the ORPort for availability and, without it, marks the OBFS4 bridge as being down.
I gather that default bridges don't require a DistributionMethod, as your loadbalanced Snowflake configuration is set to "none"?
BTW... I have the loadbalanced OBFS4 configuration up and running, and am able to manually confirm that loadbalanced OBFS4 connections are successful.
nginx => obfs4proxy => tor
I believe it's time to enable a DistributionMethod. Thank you for the clarifications. Respectfully,
Gary— This Message Originated by the Sun. iBigBlue 63W Solar Array (~12 Hour Charge) + 2 x Charmast 26800mAh Power Banks = iPhone XS Max 512GB (~2 Weeks Charged)
On Fri, Dec 09, 2022 at 08:43:26AM +0000, Gary C. New wrote:
In my implementation of the loadbalanced OBFS4 configuration, it appears that BridgeDB still tests the ORPort for availability and without it marks the OBFS4 bridge as being down.
I see. Then yes, I suppose it is still necessary to expose the ORPort.
I gather that default bridges don't require a DistributionMethod as your loadbalanced Snowflake configuration is set to "none?"
That's correct. Default bridges are not distributed by rdsys, they are distributed in the configuration of Tor Browser itself. See extensions.torlauncher.default_bridge.* in about:config.
David, I'm in the process of trying to cross-compile snowflake for OpenWRT and Entware. Are there any other dependencies to compile snowflake other than Go? Do you know if it's possible to configure multiple pluggable transports with different listeners within a single torrc? Thanks, again.
Gary— This Message Originated by the Sun. iBigBlue 63W Solar Array (~12 Hour Charge) + 2 x Charmast 26800mAh Power Banks = iPhone XS Max 512GB (~2 Weeks Charged)
On Sat, Dec 10, 2022 at 05:19:43AM +0000, Gary C. New via tor-relays wrote:
I'm in the process of trying to cross-compile snowflake for OpenWRT and Entware. Are there any other dependencies to compile snowflake other than Go?
The README should list dependencies. Setting GOOS and GOARCH should be sufficient.
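As a rough example, cross-compiling the standalone proxy for an ARMv7 target might look like this (the proxy/ subdirectory and the GOARM value are assumptions about the snowflake source layout and the target hardware):

    # from a checkout of the snowflake repository
    cd proxy
    GOOS=linux GOARCH=arm GOARM=7 go build -o proxy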
Do you know if it's possible to configure multiple pluggable transports with different listeners within a single torrc?
Yes. You cannot configure multiple listeners for the same transport, but you can use multiple different transports at once. Use different sets of ServerTransportPlugin / ServerTransportListenAddr / ServerTransportOptions, or ClientTransportPlugin / Bridge for the client side.
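As a rough sketch, a server-side torrc running two different transports side by side might look like the following; the second transport name and its binary are placeholders for whatever other transport you run, not a real program:

    BridgeRelay 1
    ORPort 9001
    ExtORPort auto
    ServerTransportPlugin obfs4 exec /opt/bin/obfs4proxy
    ServerTransportListenAddr obfs4 0.0.0.0:3031
    # hypothetical second transport; name and path are placeholders
    ServerTransportPlugin examplept exec /opt/bin/examplept-server
    ServerTransportListenAddr examplept 0.0.0.0:6031

Each transport gets its own ServerTransportPlugin/ServerTransportListenAddr pair, but both hand connections to the same tor instance through the ExtORPort.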
Great! I'll work on compiling the Standalone Snowflake Proxy and see about implementing a loadbalanced OBFS & Snowflake configuration in parallel. Thank you for your assistance. Respectfully,
Gary
David,
I was successfully able to get Snowflake cross-compiled and installed for OpenWRT and Entware as a package.
# opkg install ./snowflake_2.4.1-1_armv7-2.6.ipk
Installing snowflake (2.4.1-1) to root...
Configuring snowflake.

# opkg info snowflake
Package: snowflake
Version: 2.4.1-1
Depends: libc, libssp, librt, libpthread
Status: install user installed
Architecture: armv7-2.6
Installed-Time: 1670730403

# opkg depends snowflake
snowflake depends on:
	libc
	libssp
	librt
	libpthread

# opkg files snowflake
Package snowflake (2.4.1-1) is installed on root and has the following files:
/opt/bin/proxy
/opt/bin/client
/opt/bin/probetest
/opt/bin/broker
/opt/bin/server
/opt/bin/distinctcounter

# /opt/bin/proxy -version
snowflake-proxy 2.4.1
However, I still need to configure it within the torrc file and test it with its own listener in parallel with the loadbalanced OBFS configuration.
Thanks, again, for your guidance.
Respectfully,
Gary
P.S. I posted the Snowflake Package Makefile on the OpenWRT forum for reference:
https://forum.openwrt.org/t/snowflake-makefile/145259
On Sun, Dec 11, 2022 at 04:25:06AM +0000, Gary C. New via tor-relays wrote:
I was successfully able to get Snowflake cross-compiled and installed for OpenWRT and Entware as a package.
Thanks, nice work.
# opkg files snowflake Package snowflake (2.4.1-1) is installed on root and has the following files: /opt/bin/proxy /opt/bin/client /opt/bin/probetest /opt/bin/broker /opt/bin/server /opt/bin/distinctcounter
I don't think it makes sense to package the server or broker for OpenWRT. The client and proxy, sure. But the server and broker do not even run on the same host in an actual deployment. distinctcounter is just a metrics utility for the broker: https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowf...
I agree it makes sense to package the client and proxy separate from the broker and server. This was just a quick and dirty test to see if I could get Snowflake cross-compiled and working on the OpenWRT and Entware platforms.
I am having some issues or misunderstandings with implementing Snowflake Proxy within Tor. I assumed that implementing Snowflake Proxy within Tor would be similar to OBFS4Bridge in that Tor would initialize Snowflake Proxy as a managed Pluggable Transport listening on the assigned ServerTransportListenAddr. I can see Snowflake Proxy initiate outbound requests, but I don't see it listen on the specified ServerTransportListenAddr and Port.
NOTE: Basic Inbound Connection Flow is Nginx (xxx.xxx.xxx.xxx:6031) => Snowflake Proxy (192.168.0.31:6031) => Tor (192.168.0.31:9001)
NOTE: I am only running Snowflake Proxy within the test torrc configuration.
# cat torrc
...
Nickname Snowflake31
ORPort xxx.xxx.xxx.xxx:443 NoListen
ORPort 192.168.0.31:9001 NoAdvertise
BridgeRelay 1
BridgeDistribution moat
ExtORPort 192.168.0.31:auto
###ServerTransportPlugin obfs31-1 exec /opt/bin/obfs4proxy -enableLogging
###ServerTransportListenAddr obfs31-1 192.168.0.31:3031
ServerTransportPlugin snowflake31-1 exec /opt/bin/proxy -log /tmp/snowflake.log -verbose
ServerTransportListenAddr snowflake31-1 192.168.0.31:6031
# ps w | grep -I tor
26303 tor   253m S  /opt/sbin/tor -f /tmp/torrc --quiet
26304 tor   795m S  /opt/bin/proxy -log /tmp/snowflake.log -verbose
# netstat -anp | grep proxy
tcp   0  0 192.168.0.31:49850  37.218.245.111:443  ESTABLISHED  26304/proxy
udp   0  0 192.168.0.31:33961  0.0.0.0:*                        26304/proxy
udp   0  0 0.0.0.0:52654       0.0.0.0:*                        26304/proxy
# tail -f /tmp/snowflake.log
...
2022/12/12 04:28:33 snowflake-proxy 2.4.1
2022/12/12 04:28:33 Proxy starting
2022/12/12 04:28:33 WebRTC: Created offer
2022/12/12 04:28:33 WebRTC: Set local description
2022/12/12 04:28:33 Offer: {"type":"offer","sdp":"v=0\r\no=- 4129729503856148472 1670819313 IN IP4 [scrubbed]\r\ns=-\r\nt=0 0\r\na=fingerprint:sha-256 3B:60:50:33:72:A1:35:91:44:7E:02:2E:F2:4E:0E:21:C2:24:1C:47:F7:43:A1:A7:F3:DE:BA:AB:3E:82:9E:11\r\na=extmap-allow-mixed\r\na=group:BUNDLE 0\r\nm=application 9 UDP/DTLS/SCTP webrtc-datachannel\r\nc=IN IP4 [scrubbed]\r\na=setup:actpass\r\na=mid:0\r\na=sendrecv\r\na=sctp-port:5000\r\na=ice-ufrag:glNJtRHnBjaRYRkg\r\na=ice-pwd:OxntNuRslEPhLgSstUnzwJFTPzPUGmzt\r\na=candidate:551460743 1 udp 2130706431 [scrubbed] 50786 typ host\r\na=candidate:551460743 2 udp 2130706431 [scrubbed] 50786 typ host\r\na=candidate:1335998215 1 udp 1694498815 [scrubbed] 45684 typ srflx raddr [scrubbed] rport 45684\r\na=candidate:1335998215 2 udp 1694498815 [scrubbed] 45684 typ srflx raddr [scrubbed] rport 45684\r\na=end-of-candidates\r\n"}
2022/12/12 04:29:00 NAT Type measurement: unknown -> restricted = restricted
2022/12/12 04:29:00 NAT type: restricted
...
2022/12/12 04:29:11 sdp offer successfully received.
2022/12/12 04:29:11 Generating answer...
...
2022/12/12 04:29:31 Timed out waiting for client to open data channel.
2022/12/12 04:29:41 sdp offer successfully received.
2022/12/12 04:29:41 Generating answer...
2022/12/12 04:30:02 Timed out waiting for client to open data channel.
...
2022/12/12 04:32:05 sdp offer successfully received.
2022/12/12 04:32:05 Generating answer...
2022/12/12 04:32:26 Timed out waiting for client to open data channel.
Is it possible to use Snowflake Proxy as a managed Pluggable Transport similar to OBFS4Bridge within Tor? It would be helpful to have a torrc configuration example within the Standalone Snowflake Proxy documentation.
Thanks, again, for your guidance and assistance.
Respectfully,
Gary
On Mon, Dec 12, 2022 at 08:19:53PM +0000, Gary C. New via tor-relays wrote:
I am having some issues or misunderstandings with implementing Snowflake Proxy within Tor. I assumed that implementing Snowflake Proxy within Tor would be similar to OBFS4Bridge in that Tor would initialize Snowflake Proxy as a managed Pluggable Transport listening on the assigned ServerTransportListenAddr. I can see Snowflake Proxy initiate outbound requests, but I don't see it listen on the specified ServerTransportListenAddr and Port.
The Snowflake proxy is not a pluggable transport. You just run it as a normal command-line program. There is no torrc involved, and the proxy does not interact with a tor process at all.
Unlike, say, obfs4, in Snowflake the bridges are centralized and the proxies are decentralized. If you run a proxy you don't also run a bridge.
If it helps the mental model: the standalone proxy program in Snowflake does exactly the same thing as the browser extension proxy (https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowf...). Those browser proxies don't have an attached tor process; neither does the command-line proxy.
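In other words, running the standalone proxy is just a matter of starting the binary, for example with the same flags that appear earlier in this thread:

    /opt/bin/proxy -log /tmp/snowflake.log -verbose

It registers itself with the Snowflake broker over the network and needs no entry in torrc.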
On Tuesday, December 13, 2022, 10:11:41 AM PST, David Fifield david@bamsoftware.com wrote:
The Snowflake proxy is not a pluggable transport. You just run it as a normal command-line program. There is no torrc involved, and the proxy does not interact with a tor process at all.
Thank you for the clarification. It seems I incorrectly assumed that extor-static-cookie was a wrapper for snowflake-proxy.
"To work around this problem, there is a shim called extor-static-cookie that presents an ExtORPort with a fixed, unchanging authentication key on a static port, and forwards the connections (again as ExtORPort) to tor, using that instance of tor's authentication key on an ephemeral port. One extor-static-cookie process is run per instance of tor, using ServerTransportPlugin and ServerTransportListenAddr." Am I correct in assuming extor-static-cookie is only useful within the context of bridging connections between snowflake-server and tor (not as a pluggable transport similar to obfs4proxy)? What about a connection flow of haproxy/nginx => (snowflake-server => extor-static-cookie => tor) on separate servers? Thanks, again.
Gary
On Tue, Dec 13, 2022 at 07:29:45PM +0000, Gary C. New via tor-relays wrote:
On Tuesday, December 13, 2022, 10:11:41 AM PST, David Fifield david@bamsoftware.com wrote:
The Snowflake proxy is not a pluggable transport. You just run it as a normal command-line program. There is no torrc involved, and the proxy does not interact with a tor process at all.
Thank you for the clarification. It seems I incorrectly assumed that extor-static-cookie was a wrapper for snowflake-proxy.
"To work around this problem, there is a shim called [1]extor-static-cookie that presents an ExtORPort with a fixed, unchanging authentication key on a static port, and forwards the connections (again as ExtORPort) to tor, using that instance of tor's authentication key on an ephemeral port. One extor-static-cookie process is run per instance of tor, using ServerTransportPlugin and ServerTransportListenAddr."
Am I correct in assuming extor-static-cookie is only useful within the context of bridging connections between snowflake-server and tor (not as a pluggable transport similar to obfs4proxy)?
That's correct. extor-static-cookie is a workaround for a technical problem with tor's Extended ORPort. It serves a narrow and specialized purpose. It happens to use the normal pluggable transports machinery, but it is not a circumvention transport on its own. It's strictly for interprocess communication and is not exposed to the Internet. You don't need it to run a Snowflake proxy.
I am not sure what your plans are with running multiple obfs4proxy, but if you just want multiple obfs4 listeners, with different keys, running on different ports on the same host, you don't need a load balancer, extor-static-cookie, or any of that. Just run multiple instances of tor, each with its corresponding instance of obfs4proxy. The separate instances don't need any coordination or communication. The reason for all the complexity in Snowflake is that there is *one* instance of the pluggable transport (snowflake-server) that needs to communicate with N instances of tor. And the only reason for doing that is that tor (C-tor) doesn't scale beyond one CPU. If tor could use more than one CPU (as we hope Arti will), we would not need or use any of these workarounds.
You could, in principle, use the same load-balanced setup with obfs4proxy, but I expect that a normal bridge will not get enough users to justify it. It only makes sense when the tor process hits 100% CPU and becomes a bottleneck, which for the Snowflake bridge only started to happen at around 6,000 simultaneous users.
What about a connection flow of haproxy/nginx => (snowflake-server => extor-static-cookie => tor) on separate servers?
You have the order wrong (it's snowflake-server → haproxy → extor-static-cookie → tor), but yes, you could divide the chain at any of the arrows and run things on different hosts. You could also run half the extor-static-cookie + tor on one host and half on another, etc.
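A minimal sketch of the haproxy hop in that chain, assuming four local extor-static-cookie/tor instances on ports 10001-10004 (the ports and names are illustrative, not the production configuration):

    frontend snowflake-server-in
        mode tcp
        bind 127.0.0.1:10000
        default_backend tor-instances

    backend tor-instances
        mode tcp
        server tor1 127.0.0.1:10001
        server tor2 127.0.0.1:10002
        server tor3 127.0.0.1:10003
        server tor4 127.0.0.1:10004

snowflake-server sends its ExtORPort connections to the frontend, and haproxy spreads those TCP connections across the backend instances.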
On Tuesday, December 13, 2022, 07:35:23 PM MST, David Fifield david@bamsoftware.com wrote:
On Tue, Dec 13, 2022 at 07:29:45PM +0000, Gary C. New via tor-relays wrote:
On Tuesday, December 13, 2022, 10:11:41 AM PST, David Fifield david@bamsoftware.com wrote:
Am I correct in assuming extor-static-cookie is only useful within the context of bridging connections between snowflake-server and tor (not as a pluggable transport similar to obfs4proxy)?
That's correct. extor-static-cookie is a workaround for a technical problem with tor's Extended ORPort. It serves a narrow and specialized purpose. It happens to use the normal pluggable transports machinery, but it is not a circumvention transport on its own. It's strictly for interprocess communication and is not exposed to the Internet. You don't need it to run a Snowflake proxy.
Created a Makefile for extor-static-cookie for OpenWRT and Entware:
https://forum.openwrt.org/t/extor-static-cookie-makefile/145694
I am not sure what your plans are with running multiple obfs4proxy, but if you just want multiple obfs4 listeners, with different keys, running on different ports on the same host, you don't need a load balancer, extor-static-cookie, or any of that. Just run multiple instances of tor, each with its corresponding instance of obfs4proxy. The separate instances don't need any coordination or communication.
The goal of running multiple obfs4proxy listeners is to offer numerous, unique bridges distributed across several servers maximizing resources and availability.
You could, in principle, use the same load-balanced setup with obfs4proxy, but I expect that a normal bridge will not get enough users to justify it. It only makes sense when the tor process hits 100% CPU and becomes a bottleneck, which for the Snowflake bridge only started to happen at around 6,000 simultaneous users.
Hmm... If normal bridges will not see enough users to justify the deployment of numerous, unique bridges distributed over several servers--this may be a deciding factor. I don't have enough experience with normal bridges to know.
What about a connection flow of haproxy/nginx => (snowflake-server => extor-static-cookie => tor) on separate servers?
You have the order wrong (it's snowflake-server → haproxy → extor-static-cookie → tor), but yes, you could divide the chain at any of the arrows and run things on different hosts. You could also run half the extor-static-cookie + tor on one host and half on another, etc.
I've installed and started configuring snowflake-server and have some questions after reading the README:
https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowf...
1. How are Snowflake Bridges advertised? Will they compromise a Normal Bridge running on the same public addresses?
2. I already have a DNS Let's Encrypt process in place for certificates and port 80 (HTTP) is already in use by another daemon on my server. Is there an alternative method to provide snowflake-server with the required certificates?
3. I'm using an init.d (not systemd) operating system. Do you have any init.d examples for snowflake-server?
In short, I'm trying to get a sense of whether it makes sense to run a Snowflake Bridge and Normal Bridge on the same public addresses?
Thanks, again, for your assistance.
Respectfully,
Gary
On Fri, Dec 16, 2022 at 04:27:06AM +0000, Gary C. New via tor-relays wrote:
Created a Makefile for extor-static-cookie for OpenWRT and Entware:
https://forum.openwrt.org/t/extor-static-cookie-makefile/145694
I appreciate the enthusiasm, but I should reiterate: there is no reason to ever use this tool on OpenWRT. Packaging it is a mistake. If you think you need it, you misunderstand what it is for.
I am not sure what your plans are with running multiple obfs4proxy, but if you just want multiple obfs4 listeners, with different keys, running on different ports on the same host, you don't need a load balancer, extor-static-cookie, or any of that. Just run multiple instances of tor, each with its corresponding instance of obfs4proxy. The separate instances don't need any coordination or communication.
The goal of running multiple obfs4proxy listeners is to offer numerous, unique bridges distributed across several servers maximizing resources and availability.
If the purpose is running on several different servers, you don't need a load balancer and you don't need extor-static-cookie. Those tools are meant for running *one* instance of a pluggable transport on *one* server. If you want to distribute bridges over multiple servers, just run one instance each of tor and obfs4proxy on multiple servers, in the normal way. You don't need anything fancy.
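For reference, an independent obfs4 bridge on each server needs only an ordinary bridge torrc, roughly along these lines (ports, path, and contact details are illustrative):

    BridgeRelay 1
    ORPort 9001
    ExtORPort auto
    ServerTransportPlugin obfs4 exec /opt/bin/obfs4proxy
    ServerTransportListenAddr obfs4 0.0.0.0:443
    Nickname MyObfs4Bridge
    ContactInfo bridge-operator@example.com

Each server keeps its own keys and fingerprint, so the bridges are handed out to users as separate, unrelated bridges.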
You could, in principle, use the same load-balanced setup with obfs4proxy, but I expect that a normal bridge will not get enough users to justify it. It only makes sense when the tor process hits 100% CPU and becomes a bottleneck, which for the Snowflake bridge only started to happen at around 6,000 simultaneous users.
Hmm... If normal bridges will not see enough users to justify the deployment of numerous, unique bridges distributed over several servers--this may be a deciding factor. I don't have enough experience with normal bridges to know.
Some pluggable transports, like obfs4, need there to be many bridges, because they are vulnerable to being blocked by IP address. Each individual bridge does not get much traffic, because there are so many of them. With obfs4, it's not about load, it's about address diversity. Just run multiple independent bridges if you want to increase your contribution.
Snowflake is unlike obfs4 in that it does not depend on there being multiple bridges for its blocking resistance. Snowflake gets its address diversity at a different layer—the Snowflake proxies. There are many proxies, but there only needs to be one bridge. However, that one bridge, because it receives the concentrated traffic of many users, needs special scaling techniques.
What about a connection flow of haproxy/nginx => (snowflake-server => extor-static-cookie => tor) on separate servers?
You have the order wrong (it's snowflake-server → haproxy → extor-static-cookie → tor), but yes, you could divide the chain at any of the arrows and run things on different hosts. You could also run half the extor-static-cookie + tor on one host and half on another, etc.
I've installed and started configuring snowflake-server and have some questions after reading the README:
In short, I'm trying to get a sense of whether it makes sense to run a Snowflake Bridge and Normal Bridge on the same public addresses?
There is no reason at all to run a Snowflake bridge. No user will ever connect to it, because Snowflake bridges are not distributed through BridgeDB like obfs4 bridges are; they are shipped in configuration files with Tor Browser or Orbot. There is no need for volunteers to run Snowflake bridges, and no benefit to them doing so. If you want to help, run a Snowflake proxy.
There is no reason for a volunteer bridge operator to run snowflake-server or extor-static-cookie, ever. Packaging them for OpenWRT can only cause confusion. You do not need these programs.
On 12/9/22 07:02, David Fifield wrote:
But now there is rdsys and bridgestrap, which may have the ability to test the obfs4 port rather than the ORPort. I cannot say whether that removes the requirement to expose the ORPort.
It would be a step toward making scanning for bridges harder, IMO, if the ORPort no longer needed to be exposed.
On Fri, Dec 09, 2022 at 10:16:47AM +0100, Toralf Förster wrote:
On 12/9/22 07:02, David Fifield wrote:
But now there is rdsys and bridgestrap, which may have the ability to test the obfs4 port rather than the ORPort. I cannot say whether that removes the requirement to expose the ORPort.
It would be a step toward making scanning for bridges harder, IMO, if the ORPort no longer needed to be exposed.
You are entirely correct. It's been noted as a discoverability vulnerability for over 10 years now. But so far attempts to resolve https://bugs.torproject.org/tpo/core/tor/7349 have fallen short.