I was alerted by trinity-1686a on IRC that Snowflake standalone proxy operators in #tor-relays were reporting increased OOM errors from increased load as of 2023-08-28.
Looking at the Snowflake broker metrics[0], there's a huge jump in client polls (seen by summing the client-denied-count and client-snowflake-match-count).
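For anyone who wants to reproduce that number from the descriptors behind [0], here is a minimal sketch, assuming the snowflake-stats files are plain "keyword value" lines (the two counter names are the ones mentioned above; everything else is just illustrative Go):

    package main

    import (
        "bufio"
        "fmt"
        "os"
        "strconv"
        "strings"
    )

    func main() {
        total := 0
        scanner := bufio.NewScanner(os.Stdin)
        for scanner.Scan() {
            fields := strings.Fields(scanner.Text())
            if len(fields) < 2 {
                continue
            }
            // Sum the two poll counters; together they approximate the
            // total client polls the broker saw in the reporting period.
            if fields[0] == "client-denied-count" || fields[0] == "client-snowflake-match-count" {
                if n, err := strconv.Atoi(fields[1]); err == nil {
                    total += n
                }
            }
        }
        fmt.Println("client polls:", total)
    }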
I've attached a graph of the collected Prometheus metrics that shows this spike happening at exactly 17:40 UTC on 2023-08-27. The increase looks far too sharp to me to be a censorship event; perhaps it is a DoS?
It's still too early to see the bridge metrics from the metrics page, but we should start to see the effects there tomorrow.
[0] https://metrics.torproject.org/collector.html#snowflake-stats
On 2023-08-30 14:28, Cecylia Bocovich wrote:
I should add that it looks to have stopped around 12:25 UTC earlier today (2023-08-30).
On 2023-08-30 14:31, Cecylia Bocovich wrote:
The bridge metrics show a small increase in usage on August 30th (the day it stopped), but not for August 28th or 29th, and definitely not proportional to the increase in client polls. This could indicate a DoS attack that only affected the broker, where connections were severed or the exchanged SDP didn't contain enough information to establish a data channel between the client and the matched proxy. Or it could be that most of these polls were coming from a small set of IP addresses.
Whatever it was, the biggest consequences were a shortage of proxies with unrestricted NAT types, and that standalone proxies with the default configuration of no limit on the number of clients were OOM killed[0].
I haven't noticed any unusual activity again since August 30th.
[0] https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowf...
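As an aside for standalone proxy operators hit by the OOM kills above: if I remember right, the standalone proxy has a -capacity flag (default 0, meaning no limit) that caps the number of concurrent clients, so something along the lines of

    ./proxy -capacity 100

should keep memory use bounded the next time this happens. The flag name is from memory, so please check it against your proxy version before relying on it.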
On Sat, Sep 02, 2023 at 09:59:39AM -0400, Cecylia Bocovich wrote:
> The bridge metrics show a small increase in usage on August 30th (the day it stopped), but not for August 28th or 29th, and definitely not proportional to the increase in client polls. [...]
Is this behavior also compatible with somebody who tried to enumerate all of the available snowflake proxies?
That is, is this what it would look like if somebody did an enumeration attack on the broker?
--Roger
On 2023-09-02 14:33, Roger Dingledine wrote:
> Is this behavior also compatible with somebody who tried to enumerate all of the available snowflake proxies? That is, is this what it would look like if somebody did an enumeration attack on the broker?
Yes, though to be honest I'd expect it to be quieter. I'm sure you could get just as much information over the course of three days and not be as obvious about it. But yeah, it's very possible that's what this was.
On Wed, Aug 30, 2023 at 02:31:09PM -0400, Cecylia Bocovich wrote:
> I should add that it looks to have stopped around 12:25 UTC earlier today (2023-08-30).
I was having a look at the graphs and I have an explanation. The snowflake-02 bridge was, for some reason, not processing any connections during this time. The times in the logs match up exactly with the client polls graph. It stopped working at 2023-08-27 17:45 and began working again at 2023-08-30 12:30.
2023/08/27 17:45:33 reading token: read tcp [scrubbed]->[scrubbed]: read: connection timed out
  (766 more "connection timed out" lines)
2023/08/27 17:47:28 reading token: read tcp [scrubbed]->[scrubbed]: read: connection timed out
2023/08/28 11:03:26 in the past 86400 s, 105514/105990 connections had client_ip
2023/08/29 11:03:26 in the past 86400 s, 0/0 connections had client_ip
2023/08/30 11:03:26 in the past 86400 s, 0/0 connections had client_ip
2023/08/30 12:30:14 reading token: websocket: close 1006 (abnormal closure): unexpected EOF
  (working again)
What would have happened is that clients that randomly selected snowflake-02 as their bridge would have timed out and had to re-rendezvous until they happened to randomly select snowflake-01. Meanwhile, snowflake-01 was likely overloaded, because it alone is not able to handle all existing clients.
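To put a rough number on that, here is a toy simulation (not anything from the Snowflake code, and it ignores the extra retries from polls denied once unrestricted proxies ran out): clients picking uniformly between the two bridges and re-rendezvousing until they land on the working one need, on average, twice the usual number of broker polls.

    package main

    import (
        "fmt"
        "math/rand"
    )

    func main() {
        const clients = 100000
        polls := 0
        for i := 0; i < clients; i++ {
            // Re-rendezvous until the randomly chosen bridge is the one
            // that is still processing connections (snowflake-01).
            for {
                polls++
                if rand.Intn(2) == 0 { // 0 = snowflake-01 (up), 1 = snowflake-02 (down)
                    break
                }
            }
        }
        // Prints roughly 2.00.
        fmt.Printf("average polls per client: %.2f\n", float64(polls)/clients)
    }

In practice the multiplier was presumably much larger, since clients already connected through snowflake-02 also came back to the broker, and the resulting shortage of unrestricted proxies caused denied polls and further retries.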
I don't know what went wrong with snowflake-02. The server did not reboot, and as far as I can tell all the processes kept running.
On Mon, Sep 04, 2023 at 01:30:29AM -0600, David Fifield wrote:
> I don't know what went wrong with snowflake-02. The server did not reboot, and as far as I can tell all the processes kept running.
The cause of the outage was a disconnection at the university campus where the bridge is hosted.
https://www.insidehighered.com/news/tech-innovation/teaching-learning/2023/0...
https://web.archive.org/web/20230905230409/https://umich.edu/announcements/