Greetings,
I hope this is an appropriate mailing list to discuss a technical issue with Tor's Snowflake project. Please redirect me to the right place if not.
I am the original author and maintainer of the open source project, Stuntman. Stuntman is an implementation of the STUN protocol, which includes the STUN server. More details at www.stunprotocol.org. In short, a STUN server helps bootstrap direct "p2p" connections such as WebRTC sessions or similar VOIP scenarios by allowing internet devices to self-discover their own public IP address and obtain a (UDP) port for communicating with another node.
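For readers unfamiliar with the protocol, the exchange described above can be sketched in Go roughly as follows. This is only an illustration based on the RFC 5389 wire format, not code from Stuntman or Snowflake; the function names `NewBindingRequest` and `ParseXorMappedAddress` are made up for this example, and only IPv4 is handled.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"errors"
	"fmt"
	"net"
)

// magicCookie is the fixed value defined by RFC 5389.
const magicCookie = 0x2112A442

// NewBindingRequest builds a 20-byte STUN Binding Request with no attributes.
// (Hypothetical helper name, for illustration only.)
func NewBindingRequest() ([]byte, error) {
	msg := make([]byte, 20)
	binary.BigEndian.PutUint16(msg[0:2], 0x0001) // message type: Binding Request
	binary.BigEndian.PutUint16(msg[2:4], 0)      // message length: no attributes
	binary.BigEndian.PutUint32(msg[4:8], magicCookie)
	if _, err := rand.Read(msg[8:20]); err != nil { // 96-bit transaction ID
		return nil, err
	}
	return msg, nil
}

// ParseXorMappedAddress returns the public IPv4 address and port carried in
// the first XOR-MAPPED-ADDRESS attribute of a STUN response.
func ParseXorMappedAddress(resp []byte) (net.IP, int, error) {
	if len(resp) < 20 {
		return nil, 0, errors.New("short STUN message")
	}
	attrs := resp[20:] // attributes follow the 20-byte header
	for len(attrs) >= 4 {
		typ := binary.BigEndian.Uint16(attrs[0:2])
		length := int(binary.BigEndian.Uint16(attrs[2:4]))
		if len(attrs) < 4+length {
			return nil, 0, errors.New("truncated attribute")
		}
		val := attrs[4 : 4+length]
		// 0x0020 is XOR-MAPPED-ADDRESS; family 0x01 is IPv4.
		if typ == 0x0020 && length >= 8 && val[1] == 0x01 {
			port := int(binary.BigEndian.Uint16(val[2:4]) ^ uint16(magicCookie>>16))
			addr := binary.BigEndian.Uint32(val[4:8]) ^ magicCookie
			ip := make(net.IP, 4)
			binary.BigEndian.PutUint32(ip, addr)
			return ip, port, nil
		}
		// Attributes are padded to a multiple of 4 bytes.
		next := 4 + (length+3)/4*4
		if next > len(attrs) {
			break
		}
		attrs = attrs[next:]
	}
	return nil, 0, errors.New("no IPv4 XOR-MAPPED-ADDRESS attribute")
}

func main() {
	req, err := NewBindingRequest()
	if err != nil {
		panic(err)
	}
	fmt.Printf("binding request: %x\n", req)
	// To query a real server, one would send req over UDP (STUN's standard
	// port is 3478) and pass the reply to ParseXorMappedAddress.
}
```

Each query is a single small UDP datagram in each direction, which is why the cost only becomes noticeable at very large request volumes.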
I also run a public instance of a STUN server with the code at stun.stunprotocol.org. It's been up and running for about 10 years now. It's hosted on AWS. In recent years, the hosting bills for this server have started to get on the high side, even with reserved instances. The number of STUN queries it processes per day is now on the order of hundreds of millions. The stunprotocol.org domain receives nearly a million DNS queries on Route 53 daily. What used to cost a trivial number of dollars to run is now starting to reach $1000 in annual service costs. This isn't paid for by a corporation or well-funded internet organization. I pay for this out of my own pocket.
It's been a mystery what has been driving the increasing traffic to the server, especially redundant requests from the same IPs. I was inspecting the DNS logs the other day and started to investigate the nodes that repeatedly send redundant DNS requests, trying to understand why these nodes wouldn't leverage DNS caching. To my surprise, one of the IPs was running a web server that presented a Tor landing page. That led me to discover this discussion online:
https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowf...
And a quick inspection of the Snowflake code leads me to find that stun.stunprotocol.org is the default STUN server for Snowflake proxy and listed throughout the documentation as well.
While the Snowflake project has good intentions, it doesn't appear to take my hosting costs into consideration. I'm hoping we can have a good discussion on the following:
1) How many Snowflake clients and proxies are active, and how many STUN requests is each generating towards stunprotocol.org? Do we think the entire worldwide usage of Snowflake could be responsible for millions of STUN queries to stunprotocol.org per day?
2) Expected number of DNS queries (it's a 3-day TTL on these DNS entries, so it blows my mind that there are so many redundant requests). Does Pion or any other part of the Snowflake code tend to go directly to the authoritative name server itself?
3) Removing stun.stunprotocol.org as the default STUN server.
OR...
4) Alternatively, I'm always open to accepting donations to help cover the service costs of stunprotocol.org. I'm definitely not getting rich running this thing.
Thanks, John Selbie
On 12/26/22 00:55, John Selbie wrote:
[...]
Hi John,
Thank you for reaching out. This is exactly the right place to discuss this. This was an oversight on my part not to reach out to you as the operator of our configured default STUN server and I'm very sorry for the unexpected increased costs. We can absolutely remove stunprotocol.org as the default.
First, to answer some of your questions:
1) I would definitely believe the amount of snowflake traffic to stunprotocol.org to be this high. We have over 100,000 proxies. According to recent metrics[0], there are around 8 million matches a day and therefore that many WebRTC ICE gathering requests coming from just the proxies. The clients use a randomized subset of configured STUN servers, so the number is slightly different but it's safe to say they are also generating a few million STUN queries to your server.
2) I'm not sure about the DNS queries. It also surprises me that there are this many; I'll open an issue to investigate why.
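To put the figures in this thread on a common scale, here is a small back-of-the-envelope sketch in Go. The inputs are the numbers quoted in the thread (8 million matches a day, "hundreds of millions" of STUN queries a day, taken here as 100 million as a lower bound, nearly a million DNS queries a day, and a 3-day TTL). The `ImpliedResolvers` estimate assumes, purely for illustration, that every resolver caches perfectly and asks only once per TTL.

```go
package main

import "fmt"

const secondsPerDay = 24 * 60 * 60

// PerSecond converts a daily count into an average per-second rate.
func PerSecond(perDay float64) float64 {
	return perDay / secondsPerDay
}

// ImpliedResolvers estimates how many distinct resolvers would be needed to
// produce queriesPerDay if each one cached the record for the full TTL and
// therefore asked only once every ttlDays days. This is an idealized
// assumption, not a measurement.
func ImpliedResolvers(queriesPerDay, ttlDays float64) float64 {
	return queriesPerDay * ttlDays
}

func main() {
	fmt.Printf("8 million matches/day:        %.1f per second\n", PerSecond(8e6))
	fmt.Printf("100 million STUN queries/day: %.1f per second\n", PerSecond(100e6))
	fmt.Printf("1 million DNS queries/day at a 3-day TTL: %.0f perfectly caching resolvers\n",
		ImpliedResolvers(1e6, 3))
}
```

Eight million matches a day averages out to roughly 93 per second, about an order of magnitude below the lower bound of the total STUN load John reports. Under the perfect-caching assumption, a million DNS queries a day would require about three million distinct resolvers, which is one way to see why the DNS volume looks like a caching problem rather than ordinary use.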
Now for immediate next steps. I've sent an email to people internally to start the process of looking into sending some funds your way for the costs. We might not get an answer until after everyone is back at work in January. In the meantime:
- I'd like to remove stunprotocol.org as the default STUN server for the proxies. The reason we added it to our list in the first place was because it implements RFC 5780, which we are using on the client side to determine NAT mapping and filtering types (thank you for this implementation BTW!). The proxies no longer use this, however, so there's no reason we can't have them use any number of other public STUN servers.
- We can remove stunprotocol.org from the list of *default* client STUN servers and reserve it for a small subset of users instead. This would really cut down on the traffic, but keep it open as an option for clients in places that have blocked other industry STUN servers.
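The idea of spreading load by having each client use a randomized subset of the configured STUN servers can be sketched as below. This is a hedged illustration, not Snowflake's actual selection code; `PickSubset` is a hypothetical helper and the `example` host names are placeholders rather than the real default list.

```go
package main

import (
	"fmt"
	"math/rand"
)

// PickSubset returns up to n servers chosen uniformly at random without
// repetition. The input slice is not modified.
// (Hypothetical helper, for illustration only.)
func PickSubset(servers []string, n int) []string {
	if n > len(servers) {
		n = len(servers)
	}
	if n < 0 {
		n = 0
	}
	picked := make([]string, 0, n)
	// rand.Perm yields a random ordering of the indices 0..len(servers)-1;
	// taking the first n gives a random subset.
	for _, i := range rand.Perm(len(servers))[:n] {
		picked = append(picked, servers[i])
	}
	return picked
}

func main() {
	// Placeholder host names, not Snowflake's real default list.
	servers := []string{
		"stun.example.com:3478",
		"stun.example.net:3478",
		"stun.example.org:3478",
		"stun.stunprotocol.org:3478",
	}
	fmt.Println(PickSubset(servers, 2))
}
```

With k servers chosen out of N, any single server sees on average k/N of the client queries, so lengthening the list or shrinking the subset both reduce the share that lands on one operator.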
How does this sound to you?
Thanks again for all the work you've done on Stuntman and in running a public server. I'm glad you reached out about this.
- Cecylia
On Mon, Dec 26, 2022 at 01:43:32PM -0500, Cecylia Bocovich wrote:
On 12/26/22 00:55, John Selbie wrote:
While the Snowflake project has good intentions, it doesn't appear to take my hosting costs into consideration. I'm hoping we can have a good discussion on the following:
- How many snowflake clients and proxies are active and how many STUN requests are each generating towards stunprotocol.org? Do we think the entire worldwide usage of Snowflake could be responsible for millions of STUN queries to stunprotocol.org per day?
- Expected number of DNS queries (it's a 3-day TTL on these DNS entries, so it blows my mind that there are so many redundant requests). Does Pion or any other part of the Snowflake code tend to go direct to the namespace server itself?
- Removing stun.stunprotocol.org as the default STUN server.
OR...
- Alternatively, I'm always open to accepting donations to help run the service costs of stunprotocol.org. I'm definitely not getting rich running this thing.
Thank you for reaching out. This is exactly the right place to discuss this. This was an oversight on my part not to reach out to you as the operator of our configured default STUN server and I'm very sorry for the unexpected increased costs. We can absolutely remove stunprotocol.org as the default.
Likewise, thanks for reaching out. Snowflake has grown faster than expected, in part because of a few crisis situations when it was one of few unblocked communications channels. Load on STUN servers was not much of a consideration in the early days when the system was small, but clearly it's something we need to think about now.
- I would definitely believe the amount of snowflake traffic to stunprotocol.org to be this high. We have over 100,000 proxies. According to recent metrics[0], there are around 8 million matches a day and therefore that many WebRTC ICE gathering requests coming from just the proxies. The clients use a randomized subset of configured STUN servers, so the number is slightly different but it's safe to say they are also generating a few million STUN queries to your server.
- I'm not sure about the DNS queries. It also surprises me that there are this many, I'll open an issue to investigate why.
My guess is that the proxies that fail to cache DNS are standalone proxies (and to a lesser extent clients). The Go standard library may or may not cache DNS, depending on how it was compiled and possibly even on runtime conditions.
https://pkg.go.dev/net#hdr-Name_Resolution
    On Unix systems, the resolver has two options for resolving names. It can use a pure Go resolver that sends DNS requests directly to the servers listed in /etc/resolv.conf, or it can use a cgo-based resolver that calls C library routines such as getaddrinfo and getnameinfo. ... The resolver decision can be overridden by setting the netdns value of the GODEBUG environment variable to go or cgo ... or while building the Go source tree by setting the netgo or netcgo build tag.
When I run `GODEBUG=netdns=1 ./proxy`, it says
    go package net: dynamic selection of DNS resolver
Perhaps we should document `go build -tags netcgo` as the way to build the proxy, in order to use the OS resolver which is more likely to cache response records.
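Besides choosing the resolver at build time or through GODEBUG, a program could also keep its own small cache in front of whichever resolver it ends up with. The following is a minimal sketch of that idea, not something the Snowflake proxy is known to do; `CachingResolver` and `LookupFunc` are hypothetical names. Go's `net.LookupHost` does not expose the record's TTL, so the cache lifetime here is a fixed value chosen by the caller, which is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"net"
	"sync"
	"time"
)

// LookupFunc has the same signature as net.LookupHost.
type LookupFunc func(host string) ([]string, error)

type cacheEntry struct {
	addrs   []string
	expires time.Time
}

// CachingResolver remembers successful lookups for a fixed duration.
// (Hypothetical type, for illustration only.)
type CachingResolver struct {
	Lookup LookupFunc    // underlying resolver, e.g. net.LookupHost
	TTL    time.Duration // how long to reuse an answer (assumed, not from DNS)

	mu    sync.Mutex
	cache map[string]cacheEntry
}

// LookupHost returns a cached answer when one is still fresh and otherwise
// calls the underlying resolver. For simplicity the lock is held during the
// lookup, which serializes concurrent callers.
func (r *CachingResolver) LookupHost(host string) ([]string, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if e, ok := r.cache[host]; ok && time.Now().Before(e.expires) {
		return e.addrs, nil
	}
	addrs, err := r.Lookup(host)
	if err != nil {
		return nil, err // failures are not cached
	}
	if r.cache == nil {
		r.cache = make(map[string]cacheEntry)
	}
	r.cache[host] = cacheEntry{addrs: addrs, expires: time.Now().Add(r.TTL)}
	return addrs, nil
}

func main() {
	r := &CachingResolver{Lookup: net.LookupHost, TTL: 10 * time.Minute}
	for i := 0; i < 3; i++ {
		// Only the first iteration reaches the underlying resolver.
		addrs, err := r.LookupHost("localhost")
		fmt.Println(addrs, err)
	}
}
```

Whether this is preferable to relying on the OS resolver is a design question for the project; the sketch only shows that repeated lookups of the same STUN host name need not each produce a DNS query.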
Now for immediate next steps. I've sent an email to people internally to start the process of looking into sending some funds your way for the costs. We might not get an answer until after everyone is back at work in January. In the meantime:
- I'd like to remove stunprotocol.org as the default STUN server for the proxies. The reason we added to our list in the first place was because it implements RFC 5780 which we are using on the client side to determine NAT matching and filtering types (thank you for this implementation BTW!). The proxies no longer use this however, so there's no reason we can't have them use any number of other public STUN servers.
- We can remove stunprotocol.org from the list of *default* client STUN servers and reserve it for a small subset of users instead. This would really cut down on the traffic, but keep it open as an option for clients in places that have blocked other industry STUN servers.
Another thing to consider is reducing the default polling frequency of standalone proxies and/or increasing the long-polling delay at the broker.
Thank you David. As I mentioned in the other thread, I'm not 100% sure if Snowflake is the only service generating excessive DNS queries. But I'll keep an eye on it and let you guys know if there's something unusual.
I ran some Wireshark traces after compiling the proxy code on Windows last night. I didn't see anything unusual. I'll try Linux later.
On Tue, Dec 27, 2022 at 8:34 AM David Fifield david@bamsoftware.com wrote:
[...]
Thank you Cecylia. I think this is a good plan. I like the idea of stun.stunprotocol.org being "in the rotation" for these nodes. Just not the "exclusive default" unless a user manually configures it that way. Does that work for you?
That being said, I don't have hard evidence that Snowflake is causing load issues or excessive DNS queries. There are no headers or attributes in these STUN requests that indicate the originating app. A few million STUN requests per day from a service isn't much bother. A few hundred million hits per day gets noticed.
As for the DNS queries, it was only a sample of a few IP addresses in the Route 53 logs that led me to this group. stun.stunprotocol.org has continuously been hit with DDoS attacks over the years. It wouldn't surprise me if these same attackers are going after the DNS service as well.
I ran the Snowflake extension from the browser and the proxy code from the command line while simultaneously running Wireshark to monitor traffic. I didn't see anything unusual with regards to STUN. I'll double check DNS, but I thought I evaluated that too last night.
It's nice to meet everyone. I'll respond to David's messages next.
Thanks, jrs
On Mon, Dec 26, 2022 at 10:43 AM Cecylia Bocovich cohosh@torproject.org wrote:
[...]
[0] https://snowflake-broker.bamsoftware.com/metrics
anti-censorship-team mailing list anti-censorship-team@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/anti-censorship-team
On 12/27/22 15:41, John Selbie wrote:
Thank you Cecylia. I think this is a good plan. I like the idea of stun.stunprotocol.org being "in the rotation" for these nodes. Just not the "exclusive default" unless a user manually configures it that way. Does that work for you?
Sounds good. Here's the issue where we're tracking the changes: https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowf...
It could be a few weeks until you see the traffic drop. Snowflake is distributed and there are a variety of update channels we have to push changes to. For the client traffic, we're dependent on the Tor Browser release schedule.
- Cecylia
On Dec 31, 2022, at 12:28 PM, Cecylia Bocovich cohosh@torproject.org wrote:
[...]
We’ll take a look at some of the questions around DNS caching and STUN server rotation in the mobile IPtProxy library and Orbot's use of Snowflake this week.
Best, Nathan