General overload -> DNS timeouts


Intrepid Ibex

5 Nov 2021

3:05 p.m.

Hi everybody,

I am new to Tor as a server operator and am trying to support the network by operating an exit node.

Machine is running Ubuntu 20.04 - Tor 0.4.6.8.

nyx is giving me a notice every 10 minutes or so: [NOTICE] General overload -> DNS timeouts (6) fraction 1.4742% is above threshold of 1.0000%
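For reference, the numbers in that notice can be pulled apart programmatically. The following is a minimal Python sketch, assuming the notice always has the layout of the single line quoted above and that the reported fraction is the number of timeouts divided by the total number of DNS queries in the measurement window. The function name and the regular expression are illustrative and not part of tor or nyx.

```python
import re

# Layout taken from the one notice quoted in this thread; treating it as a
# stable format is an assumption.
NOTICE_RE = re.compile(
    r"DNS timeouts \((?P<timeouts>\d+)\) fraction (?P<fraction>[\d.]+)% "
    r"is above threshold of (?P<threshold>[\d.]+)%"
)


def parse_dns_overload_notice(line):
    """Extract the counters from a 'General overload -> DNS timeouts' notice.

    Returns None when the line does not look like such a notice.
    """
    match = NOTICE_RE.search(line)
    if match is None:
        return None
    timeouts = int(match.group("timeouts"))
    fraction = float(match.group("fraction")) / 100.0
    threshold = float(match.group("threshold")) / 100.0
    # Assumption: fraction = timeouts / total, so total = timeouts / fraction.
    implied_total = round(timeouts / fraction) if fraction > 0 else None
    return {
        "timeouts": timeouts,
        "fraction": fraction,
        "threshold": threshold,
        "implied_total": implied_total,
        "above_threshold": fraction > threshold,
    }


notice = (
    "[NOTICE] General overload -> DNS timeouts (6) fraction 1.4742% "
    "is above threshold of 1.0000%"
)
print(parse_dns_overload_notice(notice))
```

Under that assumption, 6 timeouts at 1.4742% correspond to roughly 407 queries in the window, so a handful of slow lookups is enough to cross the 1% threshold.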

However, DNS on this machine works perfectly. I told my Tor Browser to use my specific exit node, and everything works fine.

The node has been running for 4 days now. Load is as expected.

Thanks in advance for any help!

Ibex


nusenu

7 Nov

10:43 a.m.

Hi,

Since more than 400 of the 447 exit relays that support the new overload system (added in tor 0.4.6.x, which recently reached the Tor Project's Debian repo) are overloaded by tor's definition (not counting those affected by an Onionoo bug), I'll write some general recommendations for the DNS timeout case, because many more operators will want to solve similar issues in the near future. A general overload does not always imply a DNS issue, but in your case it does.

Generally speaking, it is a bit unfortunate that the new MetricsPort Prometheus feature in tor is not available in the same tor releases as the new overload design, since one of the first recommendations for investigating DNS timeouts is to enable MetricsPort and monitor the DNS timeout rate.

Currently the best tor version to use MetricsPort with is 0.4.7.2-alpha, since 0.4.7.1-alpha is affected by a bug in that area. However, due to another issue [1], there are no Debian/Ubuntu tor alpha packages for 0.4.7.2-alpha on deb.torproject.org yet.

I recommend upgrading to the alpha packages once they become available.

An exit relay can generate a large number of DNS queries that the configured resolvers need to handle. Your exit relay is still ramping up, and so will the DNS query rate. So if tor is already seeing timeouts now, the situation might become more problematic as your exit gets more traffic.

How does DNS resolution work on your exit relay? Do you have a local recursive resolver running? Do you have operational monitoring for it that shows you timeout rates?

The relay documentation has a short section about DNS on exits: https://community.torproject.org/relay/setup/exit/#dns-on-exit-relays

The overload documentation also has a short section on DNS, but since you are running on Linux, the default timeout (5s) in resolv.conf is more than enough and I would not change it. https://support.torproject.org/relay-operators/relay-bridge-overloaded/

[1] https://gitlab.torproject.org/tpo/core/tor/-/issues/40505

Intrepid Ibex via tor-relays:

> nyx is giving me a notice every 10 minutes or so: [NOTICE] General overload -> DNS timeouts (6) fraction 1.4742% is above threshold of 1.0000%
>
> DNS on this machine however works perfectly. I told my tor browser to use my specific exit node, everything works fine.

A timeout rate of about 1% is likely hard to "see" when testing manually, because it also depends on which queries you send (due to DNS caching), but it still affects Tor users of your exit.

That said, since the entire overload system is new in tor it is good to have some operational monitoring of the DNS resolver (ideally only used by your tor daemon) that can confirm what tor reports.

Once you have MetricsPort set up and graphs for the data, you can try adapting the DNS settings or tuning your resolver and see whether things improve (the timeout rate goes down).
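As a rough illustration of what such monitoring boils down to, here is a small Python sketch that computes a timeout rate from Prometheus text-format output. The metric names `tor_relay_exit_dns_query_total` and `tor_relay_exit_dns_error_total`, the `reason="timeout"` label, and all numbers in the sample are assumptions made for the example; check what your own MetricsPort actually emits before relying on them.

```python
import re

# One Prometheus text-format sample line: name, optional {labels}, value.
SAMPLE_RE = re.compile(
    r"^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)"
    r"(?:\{(?P<labels>[^}]*)\})?"
    r"\s+(?P<value>\S+)$"
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Return (name, labels, value) tuples, skipping comments and blanks."""
    samples = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def dns_timeout_rate(
    text,
    query_metric="tor_relay_exit_dns_query_total",  # assumed metric name
    error_metric="tor_relay_exit_dns_error_total",  # assumed metric name
    timeout_reason="timeout",                       # assumed label value
):
    """Timeouts divided by queries, summed over all record types."""
    queries = 0.0
    timeouts = 0.0
    for name, labels, value in parse_samples(text):
        if name == query_metric:
            queries += value
        elif name == error_metric and labels.get("reason") == timeout_reason:
            timeouts += value
    return timeouts / queries if queries else None


# Made-up sample data, for illustration only.
SAMPLE = """\
# TYPE tor_relay_exit_dns_query_total counter
tor_relay_exit_dns_query_total{record="A"} 900
tor_relay_exit_dns_query_total{record="AAAA"} 100
# TYPE tor_relay_exit_dns_error_total counter
tor_relay_exit_dns_error_total{record="A",reason="timeout"} 12
tor_relay_exit_dns_error_total{record="AAAA",reason="timeout"} 3
tor_relay_exit_dns_error_total{record="A",reason="serverfailed"} 40
"""

print(dns_timeout_rate(SAMPLE))
```

Counters like these are cumulative, so a real dashboard would compare two scrapes taken some time apart rather than a single snapshot.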

kind regards, nusenu

-- https://nusenu.github.io

Anders Trier Olesen

8 Nov

11:25 p.m.

Hi Ibex

There's some discussion about this issue on #tor-relays. First of all, I actually don't think there's any problem with your Exit node. IMO 1.5% is expected and the current threshold is just too low (or the timeout too low).

We're hosting some fairly high-capacity exit nodes (around 100 Mbit/s each), generating about 100 DNS queries/sec in total. We're also seeing around 1.5% failed DNS queries.

Here's what I think is going on: For some queries (1.5%?), the authoritative DNS servers for the requested domain are not responding. Recursive DNS resolvers are by default more patient than the Tor software. Eventually the recursive resolver will return a 'SERVFAIL' to Tor, but by then, Tor has already given up, and counts it as a timeout.

One example of a domain that is currently (2021-11-08T23:40+01:00) failing to resolve because its authoritative DNS servers are down is mzfgbh.com. Try running `dig +trace +additional mzfgbh.com` (for some reason, dig ignores the 'additional' section of one of the answers). Essentially, the problem is that this query (and a few others) times out: `dig mzfgbh.com @1.1.1.20`.

On a related note, when this metric was introduced, we were seeing around 5-6% failed queries. The recursive DNS resolver we were using was hosted on the same IP as a guard node. Apparently many authoritative DNS servers block traffic from all Tor relays!

The most extreme example of this I found is that the authoritative DNS servers for the entire .by ccTLD (Belarus) are blocking DNS requests from guard and exit nodes. This resulted in all <domain>.by queries timing out!

By moving our recursive DNS resolver to an IP not used by any Tor relays, the DNS timeouts fraction reported by Tor dropped from 5-6% to 1.5%.

The Tor relay guide should recommend running your recursive resolver (unbound) on a different IP than your exit: https://community.torproject.org/relay/setup/exit/

- Anders Trier Olesen


John Csuti

9 Nov

11:25 a.m.

Hello all,

I have to agree on this: it appears that the DNS failure timeout is too low. I have more than enough bandwidth to host Tor exit nodes and my own full recursive Unbound resolver, and yet I still get the timeout message at 1-1.5%, and sometimes even odd amounts such as 40-50%.

I have been working with a few people on this issue, and nothing we have tried has fixed it. The other thing is that none of the other servers I run have any issue with DNS timeouts. It appears to be a Tor-only issue. I would even say that some of the DNS queries Tor makes are for taken-down sites, fake sites or non-existent domains.

Thanks, John C. +1 (216) XXX-XXXX


Imre Jonk

17 Nov

6:38 p.m.


I've been scratching my head with this as well. My exit family is shown as overloaded on Tor Metrics [1]. All four instances run on one OpenBSD box with ~50% CPU utilization. I've tried a local Unbound resolver as well as the resolver provided by my colocation network, but the Tor log and the metrics port keep showing ~1.5% DNS timeouts. I myself don't notice any DNS issues, but I'm not actively monitoring it. The metrics port and Tor log don't show any other issues besides DNS timeouts.

I don't know what the default OpenBSD DNS timeout is. It's not configurable in /etc/resolv.conf, nor is it described in its man page. My own testing shows that an nslookup timeout takes 15 seconds.

Should I just ignore Tor Metrics saying that my relay is overloaded and the Tor log saying that the DNS timeouts are above threshold? I understand that DNS issues are really bad for UX so I want to fix this if possible.

Thanks,

Imre

[1] https://metrics.torproject.org/rs.html#search/family:1C4147BDE31ED65715FE1CF088570E145BF46AA1

Olaf Grimm

8:11 p.m.

My large family shows the same behavior: in the metrics, all relays turned "yellow" after the update of the tor software.

Olaf


bobby stickel

8:25 p.m.

Georg Koppen

18 Nov

8:08 a.m.

bobby stickel:

> I get that too. I've noticed that Tor makes a lot of requests to non-existent domains. I run a Pi-hole DNS without the ad blocking. I think this is a bug. They should at least give us the ability to control the warning level.

It seems only one of your exit relays is affected by a general overload, right? So, it's not clear whether you see the same DNS overload issue other folks are reporting, given that one would expect to see that on all of your relays. Maybe that's a different overload you are seeing which is worth investigating?

Tor does indeed make requests to non-existent domains. In short, that is to test whether your resolver is behaving as it is supposed to. If you are interested in what tor is actually doing here, then dns_launch_correctness_checks() in dns.c[1] is the entry point and your friend.
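To make the idea concrete, here is a toy Python sketch of such a check: look up random names that should not exist and flag a resolver that answers anyway. It is only an illustration of the concept and not a transcription of dns_launch_correctness_checks(). The function names, the `.invalid` suffix and the injected `resolve` callable are all choices made for this example.

```python
import secrets
import string


def random_test_hostname(suffix=".invalid", length=16):
    """Build a random hostname that is very unlikely to exist."""
    alphabet = string.ascii_lowercase + string.digits
    label = "".join(secrets.choice(alphabet) for _ in range(length))
    return label + suffix


def resolver_answers_nonexistent_names(resolve, attempts=3):
    """Return True if the resolver returns addresses for made-up names.

    `resolve` is any callable that takes a hostname and returns a list of
    addresses, with an empty list meaning "no such name".
    """
    for _ in range(attempts):
        if resolve(random_test_hostname()):
            return True
    return False


def honest_resolver(hostname):
    # Stand-in for a well-behaved resolver: unknown names yield nothing.
    return []


def hijacking_resolver(hostname):
    # Stand-in for a resolver that redirects failed lookups somewhere else.
    return ["203.0.113.1"]


print(resolver_answers_nonexistent_names(honest_resolver))
print(resolver_answers_nonexistent_names(hijacking_resolver))
```

Passing the resolver in as a callable keeps the sketch free of network access; in practice it would wrap whatever lookup mechanism the host uses.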

Georg

[1] https://gitlab.torproject.org/tpo/core/tor/-/blob/main/src/feature/relay/dns...


Georg Koppen

8:30 a.m.

Imre Jonk:


> Should I just ignore Tor Metrics saying that my relay is overloaded and the Tor log saying that the DNS timeouts are above threshold? I understand that DNS issues are really bad for UX so I want to fix this if possible.

If the overload is related to non-DNS issues, please address it. For the DNS case it is currently a bit tricky. We are actively investigating what is going on and suspect we are dealing with a bunch of different issues leading to the DNS timeouts you and others are seeing. E.g. there might still be bugs in our code and there is probably blacklisting of DNS requests stemming from Tor related IP addresses involved and likely things we do not fully understand yet.

So, until we have gotten down to the root(s) of the DNS timeout problem and have a clear understanding of what is going on and how to fix things, I'd say please ignore the problem for now. We heard that having the local resolver use non-Tor IP addresses does make a difference timeout-wise[1], which seems related to the Tor-IP-addresses-getting-blocked-at-DNS-level angle I mentioned above. Thus, you could set that up if you have not already.

Some folks might consider switching to non-exit nodes to just get rid of the overload message. Please bear with us while we are debugging the problem and don't do that. :) We'll keep this list in the loop.

Thanks, Georg

[1] https://gitlab.torproject.org/tpo/web/community/-/issues/239


Arlen Yaroslav

10:01 a.m.

> Some folks might consider switching to non-exit nodes to just get rid of the overload message. Please bear with us while we are debugging the problem and don't do that. :) We'll keep this list in the loop.

The undocumented configuration option 'OverloadStatistics' can be used to disable the reporting of an overloaded state. E.g. place the following in your torrc:

OverloadStatistics 0

May be worth considering until the reporting feature becomes a bit more mature and the issues around DNS resolution become a bit clearer.

nusenu

12:24 p.m.

New subject: OverloadStatistics man page

Arlen Yaroslav via tor-relays:

> The undocumented configuration option 'OverloadStatistics'

-- https://nusenu.github.io

David Goulet

9 Dec

2:58 p.m.


Greetings everyone!

We wanted to follow up with all of you on this. It has been a while, but we finally got to the bottom of the problem.

We made this ticket public which is where we pulled together the information we had from Exit operators helping us in private:

https://gitlab.torproject.org/tpo/network-health/team/-/issues/139

You can find here the summary of the problem: https://gitlab.torproject.org/tpo/network-health/team/-/issues/139#note_2764...

The gist is that tor imposes a 5-second timeout, basically telling libevent to give up on the DNS resolution after 5 seconds. It will do that 3 times before an error is returned to tor.

That very error is a "DNS TIMEOUT" which is what we expose on the MetricsPort and also use for the overload general indicator.

The problem lies with that very error. It is in fact _not_ a "real" DNS timeout but rather just "took too long for the parameters I have". So these timeouts should more be seen as a "UX issue" rather than "network issue".
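The arithmetic behind that explanation can be sketched in a few lines of Python. This is a toy model, not tor's or libevent's code: it treats the three attempts as one combined time budget, which is a simplification, and the class and function names are made up for the example. Only the 5-second and 3-attempt figures come from the message above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DnsBudget:
    per_attempt_timeout_s: float = 5.0  # figure quoted in this message
    attempts: int = 3                   # figure quoted in this message

    @property
    def total_s(self):
        """Longest time waited before a timeout error is reported."""
        return self.per_attempt_timeout_s * self.attempts


def classify(resolver_latency_s, budget=DnsBudget()):
    """What this toy model records for one lookup.

    `resolver_latency_s` is how long the resolver takes to return anything
    at all (an answer or an error such as SERVFAIL); None means it never
    responds.
    """
    if resolver_latency_s is not None and resolver_latency_s <= budget.total_s:
        return "answered"
    return "timeout"


print(DnsBudget().total_s)  # combined budget in seconds
print(classify(2.0))        # fast response
print(classify(20.0))       # resolver does respond, but too late
```

In this model a resolver that patiently returns SERVFAIL after 20 seconds is still counted as a timeout, which matches the point that the error means "took too long for my parameters" rather than "the network is broken".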

For that reason, we will remove the DNS timeout from the overload general indicator, and we will also rename the "dns timeout" metrics on the MetricsPort to something more meaningful.

Operators can still use the DNS metrics to monitor the health of their DNS by looking at all the other possible errors, especially "serverfailed".

Finally, we will most likely also bring the Tor DNS timeout down from 5 seconds to 1 second in order to improve UX:

https://gitlab.torproject.org/tpo/core/tor/-/issues/40312

We will likely fix this in the current 0.4.7.x development version and backport it to 0.4.6 stable. The release timeline is still to come, but we hope it will be as soon as possible.

Thanks everyone for your help, feedback and patience with this problem! In particular, thanks a lot to Anders Trier for their help and providing us with an Exit relay we could experiment with and toralf for providing so much useful information from their relays.

Cheers! David

-- u6A7qkchZSncFBzpYV44fV8NYMmiQ60PU5/P9VOyegk=

nusenu

16 Dec

1:41 p.m.

To stop confusing operators it would make sense to remove the

"This relay is overloaded since"

banner from Relay Search for all tor versions prior to 0.4.6.9 and 0.4.7.3-alpha, no?

kind regards, nusenu

-- https://nusenu.github.io

John Csuti

1:47 p.m.

I agree, it's kind of pointless if you know the issue already...

Thanks, John C.


abuse department

5:55 p.m.


lists＠for-privacy.net

6:18 p.m.

On Thursday, December 16, 2021 2:41:27 PM CET nusenu wrote:

+1 Very wise suggestion.


-- ╰_╯ Ciao Marco! Debian GNU/Linux It's free software and it gives you freedom!

Georg Koppen

17 Dec

8:21 a.m.

nusenu:

> To stop confusing operators it would make sense to remove the "This relay is overloaded since" banner from Relay Search for all tor versions prior to 0.4.6.9 and 0.4.7.3-alpha, no?

Well, not all potential overload is DNS-related overload. There are a bunch of different criteria for emitting a general overload warning. Onionoo and Relay Search have a hard time differentiating between DNS-related (general) overload and other (general) overload. Thus, I don't think this change is easy to make.

I think the best option here is to upgrade swiftly to 0.4.6.9/0.4.7.3-alpha.

That said, we should update our documentation accordingly. I've filed a ticket for that.[1]

Georg

[1] https://gitlab.torproject.org/tpo/web/support/-/issues/279

nusenu

10:03 a.m.

Georg Koppen:

> Well, not all potential overload is DNS related overload. There are a bunch of different criteria for emitting a general overload warning. Onionoo and this relay search have a hard time differentiating between DNS related (general) overload and other (general) overload. Thus, I don't think this change is easy to make.

To have the DNS trigger included in a shared trigger info was a deliberate design decision as I understood it.

In my opinion it is better to remove this notice from Relay Search for all affected versions, even if that also removes the warning in cases where the trigger was not DNS related, because it potentially causes alarm fatigue and operators will continue to ignore the banner even after it has been improved.

> I think the best option here is to upgrade swiftly to 0.4.6.9/0.4.7.3-alpha.

That is not easy for all of the operators that use the Tor Project's Debian repos, since these versions are usually not "swiftly" available on deb.torproject.org (unless you switch to nightly packages, which I wouldn't recommend). Currently: Version: 0.4.6.8-1~d10.buster+1 [1]

kind regards, nusenu

[1] https://deb.torproject.org/torproject.org/dists/buster/main/binary-amd64/Pac...

-- https://nusenu.github.io

bobby stickel

20 Dec

1:39 a.m.

nusenu

21 Dec

12:39 p.m.

bobby stickel:

> It would be nice if we could make the DNS timeout percentage threshold higher in our config file so Tor isn't reporting our exit relays as overloaded

If you run Debian and use deb.torproject.org packages, running `apt update && apt upgrade` should solve this now, since the stable repo has been updated to tor 0.4.6.9 and the experimental repo contains 0.4.7.3-alpha, which also includes your desired change.
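For operators juggling several relays, a quick way to tell which ones still predate the change is to compare version strings. The Python sketch below is a minimal example built only on the two version numbers named in this thread (0.4.6.9 and 0.4.7.3-alpha). The function names are made up, and it assumes that every later release series also contains the change.

```python
import re


def parse_tor_version(version):
    """Turn '0.4.6.8', '0.4.7.3-alpha' or a Debian package version such as
    '0.4.6.8-1~d10.buster+1' into a tuple of four integers."""
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognised tor version: {version!r}")
    return tuple(int(part) for part in match.groups())


def has_dns_overload_change(version):
    """True if the version is at least 0.4.6.9 within the 0.4.6 series,
    or at least 0.4.7.3 otherwise (assumption: later series keep it)."""
    parsed = parse_tor_version(version)
    if parsed[:3] == (0, 4, 6):
        return parsed >= (0, 4, 6, 9)
    return parsed >= (0, 4, 7, 3)


for candidate in ("0.4.6.8", "0.4.6.9", "0.4.7.2-alpha", "0.4.7.3-alpha"):
    print(candidate, has_dns_overload_change(candidate))
```

Releases older than 0.4.6 come out as False here simply because they are below both thresholds.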

kind regards, nusenu

-- https://nusenu.github.io

John Csuti

24 Dec

5:26 a.m.

Well, I have to say that, thanks to the update to tor 0.4.6.9, the DNS overload issue is gone. My consensus weight went down slightly due to the constant overload flag. Let's see if time will help heal that.

Good work so far.

Thanks, John C.


AMuse

7 Jan

7:19 a.m.

Hey all, I wanted to chime in on this thread because I'm suddenly seeing DNS "Overload" errors (and corresponding notices that my system is overloaded on prometheus) lately as well.

The hardware, OS and configs for my public exit haven't changed. What has changed is that I upgraded tor itself and added IPv6.

I suspect a decent amount of my DNS failures are actually lookups for AAAA records that don't exist, because my exit supports v6 but the destination site doesn't, or only half-configured it.

The system itself is definitely NOT overloaded. ( load averages: 0.07, 0.23, 0.24 )


nusenu

2:30 p.m.

AMuse:

> Hey all, I wanted to chime in on this thread because I'm suddenly seeing DNS "Overload" errors (and corresponding notices that my system is overloaded on prometheus) lately as well.
>
> The hardware and OS and configs for my public exit haven't changed - what has changed is that I upgraded tor itself, and added ipv6.

You appear to be running tor 0.4.6.8 on FreeBSD.

As has been previously stated on this thread the code involved has been changed in tor 0.4.6.9 and 0.4.7.3-alpha.

So you will have to wait till FreeBSD ports ship that version.

https://www.freshports.org/security/tor/

After upgrading tor, the overload indicator will disappear if it was DNS related (there can be other reasons).

kind regards, nusenu

-- https://nusenu.github.io

Imre Jonk

18 Nov

6:42 p.m.

On Thu, Nov 18, 2021 at 08:30:16AM +0000, Georg Koppen wrote:

> If the overload is related to non-DNS issues, please address it. For the DNS case it is currently a bit tricky. We are actively investigating what is going on and suspect we are dealing with a bunch of different issues leading to the DNS timeouts you and others are seeing. E.g. there might still be bugs in our code and there is probably blacklisting of DNS requests stemming from Tor related IP addresses involved and likely things we do not fully understand yet.
>
> So, I think until we got down to the root(s) of the DNS timeout problem and have a clear understanding about what is going on and how to fix things I'd say please ignore the problem for now. We heard that having the local resolver using non-Tor IP addresses does make a difference timeout-wise[1] which seems related to the Tor-IP-addresses-getting-blocked-at-DNS-level angle I mentioned above. Thus, you could set up that if you have not already.

Thanks, I'll keep an eye on this list for further developments on this topic.

To clarify, I'm currently using my colocation network's DNS resolver. The fallback is Hurricane Electric's anycast resolver. Both perform DNSSEC validation.

> Some folks might consider switching to non-exit nodes to just get rid of the overload message. Please bear with us while we are debugging the problem and don't do that. :) We'll keep this list in the loop.

Don't worry, this is not something I would quit running an exit for :)

nusenu

9 Nov

6:43 p.m.

Anders Trier Olesen:

> The Tor relay guide should recommend running your recursive resolver (unbound) on a different IP than your exit: https://community.torproject.org/relay/setup/exit/

Yes, that is a good idea. Here is a PR for it:

https://github.com/torproject/community/pull/169/files

-- https://nusenu.github.io

Georg Koppen

10 Nov

11:24 a.m.

nusenu:

> Anders Trier Olesen:
>
>> The Tor relay guide should recommend running your recursive resolver (unbound) on a different IP than your exit: https://community.torproject.org/relay/setup/exit/
>
> yes, that is a good idea, here is a PR for it:
>
> https://github.com/torproject/community/pull/169/files

Thanks. I created a ticket[1] for it in our bug tracker, so your PR does not fall through the cracks.

Georg

[1] https://gitlab.torproject.org/tpo/web/community/-/issues/239
