Hello all,
One of our goals with our current performance work is to reduce the overload of relays in the network. The implementation of proposal 328[1] a while back made different overload indicators available to relay operators, and as of a couple of weeks ago those can be tracked via Onionoo[2] as well.
Since a lot of our relay operators use Relay Search to check the health of their relays, we have launched a new feature there, too, to help them know when their relays are overloaded.
When a relay is in the overloaded state we show an amber dot next to the relay nickname.
Currently we are counting between 50 and 80 overloaded relays and between 10 and 20 overloaded bridges. The overloaded state is reached when one or more of the possible load metrics have been triggered. When this happens we show it for 72 hours after the relay has recovered [3]. Note, though, that not all of the exposed overload metrics trigger the overload indicator on Relay Search yet.
If you notice your relay is overloaded, please check the following support article to find out how you can recover to a "normal" state:
https://support.torproject.org/relay-operators/relay-bridge-overloaded/
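For operators who prefer to check programmatically rather than via Relay Search, the overload state announced above is also exposed in Onionoo's details documents. As a minimal sketch (assuming the field is named `overload_general_timestamp`, per the Onionoo change referenced in [2]; the sample object below is hand-written, not live data):

```python
import json

def is_overloaded(relay: dict) -> bool:
    """Return True if an Onionoo 'details' relay object carries the
    overload flag (field name assumed: overload_general_timestamp)."""
    return relay.get("overload_general_timestamp") is not None

# Minimal hand-written example of a details object (not real relay data):
sample = json.loads(
    '{"nickname": "ExampleRelay", '
    '"overload_general_timestamp": 1632355200000}'
)
print(is_overloaded(sample))  # True for this sample
```

In a real check you would fetch the details document for your relay from the Onionoo service and apply the same test to each entry in its "relays" list.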
Let us know how you find this new feature.
Cheers, -hiro
[1] https://gitweb.torproject.org/torspec.git/tree/proposals/328-relay-overload-... [2] https://lists.torproject.org/pipermail/tor-project/2021-August/003168.html [3] https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n637
This looks like an awesome feature! I super appreciate it.
Random question though (and I'm the first to admit I may be doing something wrong): I notice that on mobile it says my relays are overloaded; however, when I view it on a normal computer I don't get the overloaded indicator. I've tried refreshing multiple times but get the same results. Is anyone else seeing the same thing?
Family Members: F01E382DA524A57F2BFB3C4FF270A23D5CD3311D 623CCCC1A1370700DD03046A85D953D35CAB5C21 F9A28AB71D7E4E446308641A556EA53BA55FCB50 23F74D581DE92AC59D3527DE4D448E036139D81E A00E900534DFF76371064C03714753EAF8B88820 C232D8EE677E6BDF5CFFDDCAC4E2B1682DCE7AE5
- The Friendly Exit Node Family
Sent with ProtonMail Secure Email.
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Thursday, September 23rd, 2021 at 8:39 AM, Silvia/Hiro hiro@torproject.org wrote:
tor-relays mailing list
tor-relays@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-relays
On 9/23/21 10:54 PM, friendlyexitnode via tor-relays wrote:
Hi,
could you let me know approximately when you accessed the page via mobile?
I'll try to check if any of your relays were overloaded in the past. When a node is overloaded the state is kept for 72 hours.
Cheers, -hiro
Hey hiro, thanks!
I've attached some screenshots as well, if it helps (sorry, I should have done that before). I first noticed this around 3:45 PM CST on September 23.
On Friday, September 24th, 2021 at 4:47 AM, Silvia/Hiro hiro@torproject.org wrote:
Hi, I went back in history and tried to find out when your node FriendlyExit1 was overloaded, but I couldn't find the exact descriptor.
One thing I can think of is that on the 22nd, when I deployed this, I noticed a few typos in the code and had to make a second release. Maybe something was cached for a while and what you were accessing from mobile was the buggy page.
If it happens again, there are two buttons at the end of the page where you can see the latest server and extra-info descriptors. If you download the server one, you should be able to verify that there is an "overload-general" field in there. If there isn't, we have a bug :).
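To illustrate the check described above: a sketch of scanning a downloaded server descriptor for the "overload-general" line. This assumes the dir-spec line shape `overload-general <version> <YYYY-MM-DD HH:MM:SS>`; the sample descriptor below is hand-written for illustration, not from a real relay.

```python
def find_overload_general(descriptor: str):
    """Return the 'overload-general' line from a server descriptor,
    or None if the relay did not report general overload."""
    for line in descriptor.splitlines():
        if line.startswith("overload-general "):
            return line
    return None

# Hand-written sample descriptor fragment (illustrative values only):
sample_descriptor = """router ExampleRelay 192.0.2.1 9001 0 0
platform Tor 0.4.6.7 on Linux
overload-general 1 2021-09-23 12:00:00
bandwidth 1073741824 1073741824 650000000
"""

print(find_overload_general(sample_descriptor))
```

If the function returns None for a relay that Relay Search still shows as overloaded, that would point at the kind of caching or display bug discussed here.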
Please let me know if this happens again.
Cheers, -hiro
On 9/24/21 2:39 PM, friendlyexitnode via tor-relays wrote:
Toralf Förster:
On 9/25/21 4:11 PM, Silvia/Hiro wrote:
If it happens again there are two buttons at the end of the page where you can see the latest server and extra-info descriptors.
Only if the DirPort is (still) open, right?
Yes, I think so. (Good catch)
Georg
Hiro,
Presently, I'm seeing a similar issue. On my laptop, I'm observing an overloaded status for my relay. However, the same relay shows a green status on my phone. Do you do any user-agent detection?
I'm still interested in those magic numbers, which determine whether a relay has reached an overloaded state.
Thank you.
Respectfully,
Gary
On Saturday, September 25, 2021, 7:11:31 AM PDT, Silvia/Hiro hiro@torproject.org wrote:
Gary,
Replying off list. Can I ask which relay is yours? We don't do any user-agent detection.
Cheers, -hiro
On 9/26/21 4:27 AM, Gary C. New via tor-relays wrote:
Of course, I meant that you can reply off list.
On 9/27/21 11:16 AM, Silvia/Hiro wrote:
Gary C. New via tor-relays:
Hiro, Presently, I'm seeing a similar issue. On my laptop, I'm observing an overloaded status for my relay. However, the same relay shows a green status on my phone. Do you do any user-agent detection? I'm still interested in those magic numbers, which determine whether a relay has reached an overloaded state.
Which numbers do you mean? Is there anything missing from the support article[1] you feel should be there?
Georg
[1] https://support.torproject.org/relay-operators/relay-bridge-overloaded/
Thank You. Respectfully,
Gary
On Saturday, September 25, 2021, 7:11:31 AM PDT, Silvia/Hiro <hiro@torproject.org> wrote:
Hi, I went back in history and tried to find out whenever your node FriendlyExit1 was overloaded. I couldn't find the exact descriptor.
One thing I can think of is that on the 22nd when I deployed this I noticed a few typos in the code and had to make a second release. Maybe something was cached for a while and you what you were accessing from mobile was the buggy page.
If it happens again there are two buttons at the end of the page where you can see the latest server and extra-info descriptors. If you download the server one you would be able to verify that there is a "overload-general" field in there. If there isn't we have a bug :).
Please let me know if this happens again.
Cheers, -hiro
On 9/24/21 2:39 PM, friendlyexitnode via tor-relays wrote:
Hey hiro, thanks!
I've also attached some screenshots too if it helps (sorry, I should have done that before). I had first noticed this around 3:45 PM CST on September 23.
- The Friendly Exit Node Family
Sent with ProtonMail Secure Email.
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Friday, September 24th, 2021 at 4:47 AM, Silvia/Hiro hiro@torproject.org wrote:
On 9/23/21 10:54 PM, friendlyexitnode via tor-relays wrote:
This looks like an awesome feature! I super appreciate it.
Random question though (and I'm the first to admit I may be doing something wrong), I notice that on Mobile it says my relays are overloaded however when I view it on a normal computer I don't get the overloaded indicator. I've tried refreshing multiple times but getting the same results. Is anyone seeing the same thing?
Hi,
could you let me know when you accessed the page via mobile approximately?
I'll try to check if any of your relays were overloaded in the past.
When a node is overloaded the state is kept for 72 hours.
Cheers,
-hiro
Family Members:
F01E382DA524A57F2BFB3C4FF270A23D5CD3311D
623CCCC1A1370700DD03046A85D953D35CAB5C21
F9A28AB71D7E4E446308641A556EA53BA55FCB50
23F74D581DE92AC59D3527DE4D448E036139D81E
A00E900534DFF76371064C03714753EAF8B88820
C232D8EE677E6BDF5CFFDDCAC4E2B1682DCE7AE5
- The Friendly Exit Node Family
Sent with ProtonMail Secure Email.
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Thursday, September 23rd, 2021 at 8:39 AM, Silvia/Hiro hiro@torproject.org wrote:
Hello all,
One of our goals with our current performance work is to reduce the
overload of relays in the network. The implementation of proposal 328[1]
a while back made different overload indicators available to relay
operators and since a couple of weeks ago those can be tracked via
Onionoo[2] as well.
As we know that a lot of our relay operators use relay search to check
for the health of their relays, we have launched a new feature there,
too, to help them know when their relays are overloaded.
When a relay is in the overloaded state we show an amber dot next to the
relay nickname.
Currently we are counting between 50 and 80 overloaded relays and
between 10 and 20 overloaded bridges.
The overloaded state is reached when one or many of the possible load
metrics have been triggered. When this happens we show it for 72 hours
after the relay has recovered [3]. Note, though, that not all of the
exposed overload metrics are triggering the overload indicator on relay
search yet.
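As a sketch of that display rule (a hypothetical helper for illustration, not the actual relay-search code):

```python
from datetime import datetime, timedelta, timezone

def overload_dot_visible(last_overload, now=None, window_hours=72):
    # Show the amber dot until 72 hours have passed since the relay's
    # last reported overload, per the paragraph above.
    now = now or datetime.now(timezone.utc)
    return now - last_overload <= timedelta(hours=window_hours)
```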
If you noticed your relay is overloaded, please check the following
support article to find out how you can recover to a "normal" state:
https://support.torproject.org/relay-operators/relay-bridge-overloaded/
Let us know how you find this new feature.
Cheers,
-hiro
[1]
https://gitweb.torproject.org/torspec.git/tree/proposals/328-relay-overload-...
[2]
https://lists.torproject.org/pipermail/tor-project/2021-August/003168.html
[3] https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n637
tor-relays mailing list
tor-relays@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-relays
Georg,
The referenced support article provides recommendations as to what might be causing the overloaded state, but it doesn't provide the metric(s) Tor uses to decide whether a relay is overloaded. I'm trying to ascertain the latter.
I would assume the overloaded-state metric(s) is/are a maximum timeout value and/or recurrence value, etc. By knowing what the overloaded-state metric is, I can tune my Tor relay to stay just short of it.
Thank you for your reply. Respectfully,
Gary
On Monday, September 27, 2021, 2:44:35 AM PDT, Georg Koppen gk@torproject.org wrote:
Gary C. New via tor-relays:
Hiro, Presently, I'm seeing a similar issue. On my laptop, I'm observing an overloaded status for my relay. However, the same relay shows a green status on my phone. Do you do any user-agent detection? I'm still interested in those magic numbers, which determine whether a relay has reached an overloaded state.
Which numbers do you mean? Is there anything missing from the support article[1] you feel should be there?
Georg
[1] https://support.torproject.org/relay-operators/relay-bridge-overloaded/
Agreed, can the thresholds be published publicly for easy reference? Sometimes I get the overloaded flag but have nothing in my logs, and my CPU/memory is abundant. It would be much easier to ascertain the cause if we knew what we were looking for!
Thank you
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐ On Monday, September 27th, 2021 at 10:23 AM, Gary C. New via tor-relays tor-relays@lists.torproject.org wrote:
On 27 Sep (14:23:34), Gary C. New via tor-relays wrote:
Hi Gary!
I'll try to answer as best I can from what we've worked on for these overload metrics.
Essentially, there are a few places within a Tor relay where we can easily notice an "overloaded" state. I'll list them and tell you how we decide:
1. Out-Of-Memory invocation
Tor has its own OOM handler, and it is invoked when 75% of the total memory tor thinks it can use is reached. Thus, let's say tor thinks it can use 2GB in total: then at 1.5GB of memory usage it will start freeing memory. That is considered an overload state.
Now the real question is what memory "tor thinks" it has. Unfortunately, it is not the greatest estimation, but it is what it is. When tor starts, it will use MaxMemInQueues for that value, or else look at the total RAM available on the system and apply this algorithm:
    if RAM >= 8GB {
        memory = RAM * 40%
    } else {
        memory = RAM * 75%
    }
    /* Capped. (8GB on 64-bit, 2GB on 32-bit) */
    memory = min(memory, 8GB)
    /* Minimum value. */
    memory = max(250MB, memory)
Why we picked those numbers, I can't tell you; they come from the very early days of the tor software.
And so to avoid such an overload state, running a relay with more than 2GB of RAM on 64-bit should be the bare minimum, in my opinion. 4GB would be much, much better. In DDoS circumstances, there is a whole lot of memory pressure.
A keen observer will notice that this approach also has the problem that it doesn't shield tor from being killed by the OS OOM killer itself. Because we take the total memory on the system when tor starts, if the overall system has many other applications using RAM, we can end up eating too much memory and the OS could OOM-kill tor without tor ever noticing memory pressure. Fortunately, this is not a problem affecting the overload status situation.
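A Python sketch of that estimate, using the constants David describes (the function names and the choice of integer arithmetic are mine, for illustration only):

```python
GB = 1 << 30
MB = 1 << 20

def tor_memory_estimate(total_ram_bytes, is_64bit=True, max_mem_in_queues=None):
    # If MaxMemInQueues is set, tor uses it directly; otherwise it
    # derives a budget from total system RAM as described above.
    if max_mem_in_queues is not None:
        return max_mem_in_queues
    if total_ram_bytes >= 8 * GB:
        memory = total_ram_bytes * 40 // 100
    else:
        memory = total_ram_bytes * 75 // 100
    cap = 8 * GB if is_64bit else 2 * GB   # capped
    memory = min(memory, cap)
    return max(250 * MB, memory)           # minimum value

def oom_threshold(total_ram_bytes, **kw):
    # Tor's own OOM handler kicks in at 75% of the estimate.
    return tor_memory_estimate(total_ram_bytes, **kw) * 75 // 100
```

For example, with MaxMemInQueues at 2GB, the handler starts freeing memory at 1.5GB, matching David's example.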
2. Onionskins processing
Tor is sadly single-threaded, _except_ for when the "onion skins" are processed, that is, the cryptographic work that needs to be done on the famous "onion layers" of every circuit.
For that we have a thread pool and outsource all of that work to it. It can happen that this pool starts dropping work due to back pressure, and that in turn is an overload state.
Why can this happen? Essentially, CPU pressure. If your server is running at capacity, and not only with your tor process, then this is likely to trigger.
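This is not tor's implementation, just a toy illustration of how a bounded work queue turns CPU pressure into dropped work, the condition counted as overload:

```python
import queue
import threading

class BoundedPool:
    """Toy worker pool: when the queue is full (back pressure),
    new work is dropped instead of queued without bound."""

    def __init__(self, workers=2, max_queue=8):
        self.q = queue.Queue(maxsize=max_queue)
        self.dropped = 0
        for _ in range(workers):
            threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            job = self.q.get()
            job()
            self.q.task_done()

    def submit(self, job):
        try:
            self.q.put_nowait(job)
            return True
        except queue.Full:
            self.dropped += 1   # tor would count this toward overload
            return False
```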
3. DNS Timeout
This applies only to Exits. If tor starts noticing DNS timeouts, you'll get the overload flag. This might not be because your relay is overloaded in terms of resources, but it signals a problem on the network.
And DNS timeouts at the Exits are a _huge_ UX problem for tor users, so Exit operators really need to stay on top of them. The overload line doesn't say which cause triggered it, but if an operator notices the line, they can investigate DNS timeouts in case there is no resource pressure.
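As a rough way to spot-check resolution latency from an Exit host, one could time lookups like this (a hypothetical helper; tor's actual timeout detection lives inside its DNS subsystem and is not this code):

```python
import socket
import time

def resolves_within(host, timeout=5.0):
    # Time one resolution and report whether it beat `timeout` seconds;
    # slow or failing lookups are the kind of event behind the flag.
    start = time.monotonic()
    try:
        socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False
    return (time.monotonic() - start) <= timeout
```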
4. TCP port exhaustion
This should be extremely rare, though. The idea here is that you ran out of TCP source ports, a range that on Linux is usually 32768-60999, so having that many connections would lead to the overload state.
However, I think (I might be wrong) that nowadays this range is per source IP rather than process-wide, so someone would likely have to deliberately put your relay in that state.
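The arithmetic on that default range (the range values come from David's message; whether the limit applies per source IP is, as he says, uncertain):

```python
# Default Linux ephemeral port range cited above
# (kernel setting: net.ipv4.ip_local_port_range).
LOW, HIGH = 32768, 60999

def ephemeral_port_count(low=LOW, high=HIGH):
    # Number of simultaneously usable source ports before exhaustion.
    return high - low + 1
```

On a live system, the actual range can be read from /proc/sys/net/ipv4/ip_local_port_range.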
There are two other overload lines that relays report, "overload-ratelimits" and "overload-fd-exhausted", but they are not used yet for the overload status on Metrics. You can find them in your relay descriptor[0] if you are curious.
They cover when your relay reaches its global connection limit too often and when it runs out of file descriptors.
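A sketch for pulling all of the overload-* lines out of a descriptor (the keyword names come from this thread; the sample line layout follows dir-spec as I understand it, so treat it as an assumption):

```python
def parse_overload_lines(descriptor_text):
    # Map each "overload-*" keyword to the rest of its line's fields.
    found = {}
    for line in descriptor_text.splitlines():
        parts = line.split()
        if parts and parts[0].startswith("overload-"):
            found[parts[0]] = parts[1:]
    return found
```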
Hope this helps. Overall, as you can see, a lot of factors can influence these metrics, so the ideal situation for a tor relay is to run alone on a fairly good machine. Any kind of pullback from a tor relay, like being overloaded, has cascading effects on the network, both in terms of UX and in terms of load balancing, which tor is not yet very good at (but we are working hard on making it much better!).
Cheers! David
[0]
https://collector.torproject.org/recent/relay-descriptors/server-descriptors...
David,
This is exactly the type of information I was hoping for. You should make this an article and link it from the overloaded support page.
I had assumed that Tor performed external timeout monitoring as opposed to relay-reported resource monitoring. It's interesting that you mention load balancing Tor, as that is precisely what my recent efforts have been geared toward. I'm fairly confident that my last overloaded state was due to migrating one of my Tor relay nodes onto a previously provisioned BotFarm node and forgetting to kill the existing bot processes, thus creating competing resource demands.
I can confirm that when load balancing Tor relay nodes, the whole is only as good as the weakest link; it's important to have identical Tor relay nodes to evenly distribute circuits and maintain consensus. In this paradigm, I was hoping to define a timeout value associated with the overloaded state and tune the load balancer to redistribute to different upstream nodes should a Tor relay node reach that value. However, this seems to be a moot point after reading your summary of the reporting process. At present, I have the upstream load-balancing timeout values disabled and let the Tor nodes build or tear down circuits based on available resources per node. I do see spikes alternate through various nodes throughout the day, and it would be nice to find an upstream timeout value to better manage them. Any recommendations would be greatly appreciated. Respectfully,
Gary
P.S. This is all being done on ASUSWRT-Merlin using AiMesh nodes, but isn't limited to that architecture. I hope to publish a tutorial after ironing out all the kinks.
On Tuesday, September 28, 2021, 7:01:04 AM MDT, David Goulet dgoulet@torproject.org wrote:
I am having a lot of trouble figuring out why my relay keeps showing as overloaded on the search page. I believe I have more than enough memory and CPU power not to be overloaded on hardware, and my server's internet connection is 10Gb up/down, unmetered.
I cannot for the life of me figure out why the relay search page continuously tells me I am overloaded. Can someone assist me in troubleshooting this?
Thank you. Pertinent hardware information is pasted below:
output of /proc/meminfo:
--[ BEGIN PASTE ]--
MemTotal:       65777296 kB
MemFree:        63088388 kB
MemAvailable:   63415088 kB
Buffers:          180096 kB
Cached:           736396 kB
SwapCached:            0 kB
Active:           449428 kB
Inactive:        1729304 kB
Active(anon):      14552 kB
Inactive(anon): 1292048 kB
Active(file):     434876 kB
Inactive(file):   437256 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:      33520636 kB
SwapFree:       33520636 kB
Dirty:               236 kB
Writeback:             0 kB
AnonPages:       1262300 kB
Mapped:           273164 kB
Shmem:             48592 kB
KReclaimable:      94828 kB
Slab:             204624 kB
SReclaimable:      94828 kB
SUnreclaim:       109796 kB
KernelStack:        5040 kB
PageTables:         9308 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    66409284 kB
Committed_AS:    2374432 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       34348 kB
VmallocChunk:          0 kB
Percpu:            16384 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:      332988 kB
DirectMap2M:     7981056 kB
DirectMap1G:    58720256 kB
--[ END PASTE ]--
output of /proc/cpuinfo:
--[ BEGIN PASTE ]--
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 23
model           : 113
model name      : AMD Ryzen 5 3600 6-Core Processor
stepping        : 0
microcode       : 0x8701021
cpu MHz         : 2200.000
cache size      : 512 KB
physical id     : 0
siblings        : 12
core id         : 0
cpu cores       : 6
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 16
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es
bugs            : sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass
bogomips        : 7202.22
TLB size        : 3072 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 43 bits physical, 48 bits virtual
power management: ts ttp tm hwpstate cpb eff_freq_ro
processors 1-4  : identical to processor 0 except core id (1, 2, 4, 5), apicid (2, 4, 8, 10), and minor cpu MHz variation (processor 3 reports 2199.014)
processor : 5 vendor_id : AuthenticAMD cpu family : 23 model : 113 model name : AMD Ryzen 5 3600 6-Core Processor stepping : 0 microcode : 0x8701021 cpu MHz : 2200.000 cache size : 512 KB physical id : 0 siblings : 12 core id : 6 cpu cores : 6 apicid : 12 initial apicid : 12 fpu : yes fpu_exception : yes cpuid level : 16 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs sk init wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsav eopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es bugs : sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass bogomips : 7202.22 TLB size : 3072 4K pages clflush size : 64 cache_alignment : 64 address sizes : 43 bits physical, 48 bits virtual power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]
processor : 6 vendor_id : AuthenticAMD cpu family : 23 model : 113 model name : AMD Ryzen 5 3600 6-Core Processor stepping : 0 microcode : 0x8701021 cpu MHz : 2200.000 cache size : 512 KB physical id : 0 siblings : 12 core id : 0 cpu cores : 6 apicid : 1 initial apicid : 1 fpu : yes fpu_exception : yes cpuid level : 16 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs sk init wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsav eopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es bugs : sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass bogomips : 7202.22 TLB size : 3072 4K pages clflush size : 64 clflush size : 64 cache_alignment : 64 address sizes : 43 bits physical, 48 bits virtual power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]
processor : 7 vendor_id : AuthenticAMD cpu family : 23 model : 113 model name : AMD Ryzen 5 3600 6-Core Processor stepping : 0 microcode : 0x8701021 cpu MHz : 2199.809 cache size : 512 KB physical id : 0 siblings : 12 core id : 1 cpu cores : 6 apicid : 3 initial apicid : 3 fpu : yes fpu_exception : yes cpuid level : 16 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs sk init wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsav eopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es bugs : sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass bogomips : 7202.22 TLB size : 3072 4K pages clflush size : 64 cache_alignment : 64 address sizes : 43 bits physical, 48 bits virtual power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]
processor : 8 vendor_id : AuthenticAMD cpu family : 23 model : 113 model name : AMD Ryzen 5 3600 6-Core Processor stepping : 0 microcode : 0x8701021 cpu MHz : 2200.000 cache size : 512 KB physical id : 0 siblings : 12 core id : 2 cpu cores : 6 apicid : 5 initial apicid : 5 fpu : yes fpu_exception : yes cpuid level : 16 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs sk init wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsav eopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es bugs : sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass bogomips : 7202.22 TLB size : 3072 4K pages clflush size : 64 cache_alignment : 64 address sizes : 43 bits physical, 48 bits virtual power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]
processor : 9 vendor_id : AuthenticAMD cpu family : 23 model : 113 model name : AMD Ryzen 5 3600 6-Core Processor stepping : 0 microcode : 0x8701021 cpu MHz : 2199.644 cache size : 512 KB physical id : 0 siblings : 12 core id : 4 cpu cores : 6 apicid : 9 initial apicid : 9 fpu : yes fpu_exception : yes cpuid level : 16 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs sk init wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsav eopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es bugs : sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass bogomips : 7202.22 TLB size : 3072 4K pages clflush size : 64 cache_alignment : 64 address sizes : 43 bits physical, 48 bits virtual power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]
processor : 10 vendor_id : AuthenticAMD cpu family : 23 model : 113 model name : AMD Ryzen 5 3600 6-Core Processor stepping : 0 microcode : 0x8701021 cpu MHz : 2200.000 cache size : 512 KB physical id : 0 siblings : 12 core id : 5 cpu cores : 6 apicid : 11 initial apicid : 11 fpu : yes fpu_exception : yes cpuid level : 16 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs sk init wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsav eopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es bugs : sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass bogomips : 7202.22 TLB size : 3072 4K pages clflush size : 64 cache_alignment : 64 address sizes : 43 bits physical, 48 bits virtual power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]
processor : 11 vendor_id : AuthenticAMD cpu family : 23 model : 113 model name : AMD Ryzen 5 3600 6-Core Processor stepping : 0 microcode : 0x8701021 cpu MHz : 2200.000 cache size : 512 KB physical id : 0 siblings : 12 core id : 6 cpu cores : 6 apicid : 13 initial apicid : 13 fpu : yes fpu_exception : yes cpuid level : 16 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es bugs : sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass bogomips : 7202.22 TLB size : 3072 4K pages clflush size : 64 cache_alignment : 64 address sizes : 43 bits physical, 48 bits virtual power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]
--[END PASTE]--
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐ On Tuesday, September 28th, 2021 at 11:31 AM, Gary C. New via tor-relays tor-relays@lists.torproject.org wrote:
David,
This is exactly the type of information I was hoping for. You should make this an article and link it to the overloaded support page.
I guess I assumed that Tor performed external timeout monitoring, as opposed to relay-reported resource monitoring.
It's interesting that you mention loadbalancing Tor as that is precisely what my recent efforts have been geared toward.
I'm fairly confident that my last overloaded state was due to migrating one of my Tor relay nodes onto a previously provisioned BotFarm node and forgetting to kill the existing bot processes, leaving them competing for resources. I can confirm that when load balancing Tor relay nodes, the whole is only as good as the weakest link; thus, it's important to have identical Tor relay nodes to evenly distribute circuits and maintain consensus.
In this paradigm, I was hoping to be able to define a timeout value associated with the overloaded state and tune the loadbalancer to redistribute to different upstream nodes should a Tor relay node reach such a value. However, it seems this is a moot point, after reading your summary of the reporting process.
At present, I have the upstream, loadbalancing timeout values disabled and let the Tor nodes build or teardown circuits based on available resources per node. I do see spikes alternate through various nodes throughout the day. It would be nice to find an upstream timeout value to better manage those spikes.
Any recommendations would be greatly appreciated.
Respectfully,
Gary
P.S. This is all being done on ASUSWRT-Merlin using AiMesh nodes, but isn't limited to that architecture. I hope to publish a tutorial, after ironing out all the kinks.
On Tuesday, September 28, 2021, 7:01:04 AM MDT, David Goulet dgoulet@torproject.org wrote:
On 27 Sep (14:23:34), Gary C. New via tor-relays wrote:
George,
The referenced support article provides recommendations as to what might be
causing the overloaded state, but it doesn't provide the metric(s) for how
Tor decides whether a relay is overloaded. I'm trying to ascertain the
latter. I would assume the overloaded-state metric(s) is/are a maximum
timeout value and/or recurrence value, etc. By knowing what the
overloaded-state metric is, I can tune my Tor relay to stay just short of
it. Thank you for your reply. Respectfully,
Hi Gary!
I'll try to answer as best I can from what we have worked on for these
overload metrics.
Essentially, there are a few places within a Tor relay where we can easily notice an
"overloaded" state. I'll list them and tell you how we decide:
- Out-Of-Memory invocation
Tor has its own OOM handler, and it is invoked when 75% of the total memory tor
thinks it can use is reached. So, say tor thinks it can use 2GB in
total; then at 1.5GB of memory usage, it will start freeing memory. That is
considered an overload state.
Now the real question here is how much memory tor "thinks" it has.
Unfortunately, it is not the greatest estimation, but it is what it is.
When tor starts, it will use MaxMemInQueues for that value, or else it will
look at the total RAM available on the system and apply this algorithm:
if RAM >= 8GB {
    memory = RAM * 40%
} else {
    memory = RAM * 75%
}
/* Capped at 8GB on 64-bit, 2GB on 32-bit. */
memory = min(memory, 8GB)
/* Minimum value. */
memory = max(250MB, memory)
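To make that concrete, here is a rough sketch of that sizing rule in Python (a paraphrase of the description above, not the actual tor source; the function name is made up for illustration):

```python
GiB = 1024 ** 3
MiB = 1024 ** 2

def default_mem_in_queues(total_ram, is_64bit=True):
    """Approximate the default memory budget described above
    (used when MaxMemInQueues is not set)."""
    if total_ram >= 8 * GiB:
        memory = int(total_ram * 0.40)
    else:
        memory = int(total_ram * 0.75)
    # Capped at 8GB on 64-bit, 2GB on 32-bit.
    memory = min(memory, 8 * GiB if is_64bit else 2 * GiB)
    # Minimum value.
    memory = max(250 * MiB, memory)
    return memory

print(default_mem_in_queues(4 * GiB) // MiB, "MiB")  # prints: 3072 MiB
```

Remember that the OOM handler fires at 75% of this budget, which is how the 2GB example above starts freeing memory at 1.5GB.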
Why we picked those numbers, I can't tell you; they come from the very
early days of the tor software.
And so to avoid such an overload state, running a relay with more than 2GB of RAM on
64-bit should be the bare minimum, in my opinion. 4GB would be much, much
better. Under DDoS circumstances, there is a whole lot of memory pressure.
A keen observer will notice that this approach also has the problem that it
doesn't shield tor from the OS's own OOM killer. Because we take the total
memory on the system when tor starts, if the overall system has many other
applications running and using RAM, we can end up eating too much memory and
the OS could OOM-kill tor without tor ever noticing memory pressure.
Fortunately, this is not a problem affecting the overload status situation.
- Onionskins processing
Tor is sadly single-threaded, _except_ for when the "onion skins" are
processed, that is, the cryptographic work that needs to be done on the famous
"onion layers" of every circuit.
For that we have a thread pool and outsource all of that work to that pool.
It can happen that this pool starts dropping work due to back pressure and
that in turn is an overload state.
Why can this happen? Essentially, CPU pressure. If your server is running at
capacity and it is not running only tor, then this is likely to trigger.
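As an illustration only (this is not tor's actual code, and the names are made up), a worker pool with a bounded queue drops work under backpressure roughly like this:

```python
import queue

# Illustrative bounded work queue: when submissions outpace the workers
# draining it, put_nowait() raises queue.Full and the job is dropped --
# analogous to the onionskin drops that count as an overload signal.
jobs = queue.Queue(maxsize=4)
dropped = 0

def submit(job):
    global dropped
    try:
        jobs.put_nowait(job)
        return True
    except queue.Full:
        dropped += 1  # back pressure: this is the "overload" event
        return False

# Submit six jobs with no worker draining the queue: the last two drop.
results = [submit(i) for i in range(6)]
print(results)  # prints: [True, True, True, True, False, False]
```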
- DNS Timeout
This applies only to Exits. If tor starts noticing DNS timeouts, you'll get
the overload flag. This might not be because your relay is overloaded in
terms of resources but it signals a problem on the network.
And DNS timeouts at the Exits are a _huge_ UX problem for tor users, so
Exit operators really need to stay on top of those to help. The overload
line doesn't say which condition triggered it, but if an operator notices the
overload line and there is no resource pressure, they can then investigate
DNS timeouts.
- TCP port exhaustion
This should be extremely rare, though. The idea is that you run
out of TCP source ports; on Linux the ephemeral port range is usually 32768-60999,
so having that many connections would lead to the overload state.
However, I think (I might be wrong) that nowadays this range is per
source IP rather than process-wide, so someone would likely have to act
deliberately to put your relay in that state.
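If you want to check your own ephemeral port range on Linux, a small sketch (it reads the standard procfs knob and falls back to the defaults mentioned above if the file is absent):

```python
from pathlib import Path

range_file = Path("/proc/sys/net/ipv4/ip_local_port_range")
if range_file.exists():
    # Two whitespace-separated integers: low and high bounds.
    low, high = map(int, range_file.read_text().split())
else:
    low, high = 32768, 60999  # common Linux default

print(f"ephemeral ports per source address: {high - low + 1}")
```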
There are two other overload lines that tor relays report,
"overload-ratelimits" and "overload-fd-exhausted", but they are not yet used
for the overload status on Metrics. You can find them in your relay
descriptor[0] if you are curious.
They are about when your relay reaches its connection global limit too often
and when your relay runs out of file descriptors.
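For the file-descriptor case, an operator can at least check the limits tor would inherit; a sketch using Python's standard resource module (Unix only):

```python
import resource

# A relay holds roughly one socket per open connection, so a soft limit
# in the low thousands can trigger the overload-fd-exhausted condition.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"file descriptor limits: soft={soft} hard={hard}")
```

Raising the limit via ulimit -n, or LimitNOFILE= in a systemd unit, is the usual fix.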
Hope this helps, but overall, as you can see, a lot of factors can influence
these metrics, and so the ideal situation for a tor relay is that
it runs alone on a fairly good machine. Any kind of pullback from a tor relay,
like being overloaded, has cascading effects on the network, both in terms of UX
and in terms of load balancing, which tor is not yet very good at (but we
are working hard on making it much better!).
Cheers!
David
[0]
https://collector.torproject.org/recent/relay-descriptors/server-descriptors...
--
tor-relays mailing list
tor-relays@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-relays
David Goulet:
[snip]
There are two other overload lines that tor relays report, "overload-ratelimits" and "overload-fd-exhausted", but they are not yet used for the overload status on Metrics. You can find them in your relay descriptor[0] if you are curious.
Small correction here: those two made it into the *extra-info* descriptors. Thus, you want to have a look at those[1] and not the server descriptors in case you want to figure out what is going on.
That said, I am currently working on network-health tools to get a better overview about relays' overload and I plan to contact all operators to help with configuration/tuning to avoid the overload we are seeing. I started with the "overload-fd-exhausted" condition as that is likely the easiest to fix and will get to the other ones once that one is resolved.
They are about when your relay reaches its connection global limit too often and when your relay runs out of file descriptors.
[snip]
Georg
[0]
https://collector.torproject.org/recent/relay-descriptors/server-descriptors...
[1] https://collector.torproject.org/recent/relay-descriptors/extra-infos/
Hi All! Curious... What are the magic numbers (i.e., max timeout, recurrence, etc.) that earn a relay overloaded status? I'm trying to tune my portion of the Tor network, and finding that sweet spot has proven elusive. Thanks!
Gary
On Friday, September 24, 2021, 3:48:18 AM MDT, Silvia/Hiro hiro@torproject.org wrote:
On 9/23/21 10:54 PM, friendlyexitnode via tor-relays wrote:
This looks like an awesome feature! I super appreciate it.
Random question though (and I'm the first to admit I may be doing something wrong), I notice that on Mobile it says my relays are overloaded however when I view it on a normal computer I don't get the overloaded indicator. I've tried refreshing multiple times but getting the same results. Is anyone seeing the same thing?
Hi,
could you let me know when you accessed the page via mobile approximately?
I'll try to check if any of your relays were overloaded in the past. When a node is overloaded the state is kept for 72 hours.
Cheers, -hiro
Family Members: F01E382DA524A57F2BFB3C4FF270A23D5CD3311D 623CCCC1A1370700DD03046A85D953D35CAB5C21 F9A28AB71D7E4E446308641A556EA53BA55FCB50 23F74D581DE92AC59D3527DE4D448E036139D81E A00E900534DFF76371064C03714753EAF8B88820 C232D8EE677E6BDF5CFFDDCAC4E2B1682DCE7AE5
- The Friendly Exit Node Family
Sent with ProtonMail Secure Email.
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Thursday, September 23rd, 2021 at 8:39 AM, Silvia/Hiro hiro@torproject.org wrote:
Hello all,
One of our goals with our current performance work is to reduce the
overload of relays in the network. The implementation of proposal 328[1]
a while back made different overload indicators available to relay
operators and since a couple of weeks ago those can be tracked via
Onionoo[2] as well.
As we know that a lot of our relay operators use relay search to check
for the health of their relays, we have launched a new feature there,
too, to help them know when their relays are overloaded.
When a relay is in the overloaded state we show an amber dot next to the
relay nickname.
Currently we are counting between 50 and 80 overloaded relays and
between 10 and 20 overloaded bridges.
The overloaded state is reached when one or many of the possible load
metrics have been triggered. When this happens we show it for 72 hours
after the relay has recovered [3]. Note, though, that not all of the
exposed overload metrics are triggering the overload indicator on relay
search yet.
If you noticed your relay is overloaded, please check the following
support article to find out how you can recover to a "normal" state:
https://support.torproject.org/relay-operators/relay-bridge-overloaded/
Let us know how you find this new feature.
Cheers,
-hiro
[1]
https://gitweb.torproject.org/torspec.git/tree/proposals/328-relay-overload-...
[2]
https://lists.torproject.org/pipermail/tor-project/2021-August/003168.html
[3] https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n637
On Thursday, September 23, 2021 3:39:08 PM CEST Silvia/Hiro wrote:
When a relay is in the overloaded state we show an amber dot next to the relay nickname.
Nice thing. I noticed this flag a few days ago.
If you noticed your relay is overloaded, please check the following support article to find out how you can recover to a "normal" state:
https://support.torproject.org/relay-operators/relay-bridge-overloaded/
A question about enabling the MetricsPort: is Prometheus definitely necessary? Or can the MetricsPort write to a log file / text file?
On 24 Sep (12:36:17), lists@for-privacy.net wrote:
On Thursday, September 23, 2021 3:39:08 PM CEST Silvia/Hiro wrote:
When a relay is in the overloaded state we show an amber dot next to the relay nickname.
Nice thing. I noticed this flag a few days ago.
If you noticed your relay is overloaded, please check the following support article to find out how you can recover to a "normal" state:
https://support.torproject.org/relay-operators/relay-bridge-overloaded/
A question about enabling the MetricsPort: is Prometheus definitely necessary? Or can the MetricsPort write to a log file / text file?
The output format is Prometheus but you don't need a prometheus server to get it.
Once opened, you can simply fetch it like this:
wget http://<IP_OF_METRICSPORT>:<METRICS_PORT>/metrics -O /tmp/output.txt
or
curl http://<IP_OF_METRICSPORT>:<METRICS_PORT>/metrics -o /tmp/output.txt
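Once you have that output, the Prometheus text format is also trivial to parse yourself; a rough sketch (the sample metric below is illustrative, so check your own output for the exact names your tor version exposes):

```python
# Hypothetical sample of MetricsPort output, in Prometheus text format.
SAMPLE = """\
# HELP tor_relay_load_onionskins_total Total onionskins handled
tor_relay_load_onionskins_total{type="ntor",action="processed"} 1234
tor_relay_load_onionskins_total{type="ntor",action="dropped"} 7
"""

def parse_metrics(text):
    """Parse 'name{labels} value' lines, skipping comments and blanks."""
    out = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")
        out[name] = float(value)
    return out

metrics = parse_metrics(SAMPLE)
print(metrics['tor_relay_load_onionskins_total{type="ntor",action="dropped"}'])
# prints: 7.0
```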
Cheers! David
Hello David!
On Mon, Sep 27, 2021 at 08:22:08AM -0400, David Goulet wrote:
On 24 Sep (12:36:17), lists@for-privacy.net wrote:
On Thursday, September 23, 2021 3:39:08 PM CEST Silvia/Hiro wrote:
When a relay is in the overloaded state we show an amber dot next to the relay nickname.
Nice thing. I noticed this flag a few days ago.
If you noticed your relay is overloaded, please check the following support article to find out how you can recover to a "normal" state:
https://support.torproject.org/relay-operators/relay-bridge-overloaded/
A question about enabling the MetricsPort: is Prometheus definitely necessary? Or can the MetricsPort write to a log file / text file?
The output format is Prometheus but you don't need a prometheus server to get it.
Once opened, you can simply fetch it like this:
wget http://<IP_OF_METRICSPORT>:<METRICS_PORT>/metrics -O /tmp/output.txt
or
curl http://<IP_OF_METRICSPORT>:<METRICS_PORT>/metrics -o /tmp/output.txt
I've only ever gotten an empty reply when trying to extract metrics, and I still do on Tor 0.4.6.7. I have some vague memory of the metrics previously being only for hidden services.
All the metrics mentioned in the overload guide[1] seem to be missing. Having a quick look at the Git repository, it seems that only 0.4.7.1-alpha and the latest master contain the necessary changes for these metrics.
Is my assumption correct or am I doing something wrong?
Cordially, Andreas Kempe
[1]: https://support.torproject.org/relay-operators/relay-bridge-overloaded/
On 01 Oct (03:08:20), Andreas Kempe wrote:
Hello David!
On Mon, Sep 27, 2021 at 08:22:08AM -0400, David Goulet wrote:
On 24 Sep (12:36:17), lists@for-privacy.net wrote:
On Thursday, September 23, 2021 3:39:08 PM CEST Silvia/Hiro wrote:
When a relay is in the overloaded state we show an amber dot next to the relay nickname.
Nice thing. I noticed this flag a few days ago.
If you noticed your relay is overloaded, please check the following support article to find out how you can recover to a "normal" state:
https://support.torproject.org/relay-operators/relay-bridge-overloaded/
A question about enabling the MetricsPort: is Prometheus definitely necessary? Or can the MetricsPort write to a log file / text file?
The output format is Prometheus but you don't need a prometheus server to get it.
Once opened, you can simply fetch it like this:
wget http://<IP_OF_METRICSPORT>:<METRICS_PORT>/metrics -O /tmp/output.txt
or
curl http://<IP_OF_METRICSPORT>:<METRICS_PORT>/metrics -o /tmp/output.txt
I've only ever gotten an empty reply when trying to extract metrics, and I still do on Tor 0.4.6.7. I have some vague memory of the metrics previously being only for hidden services.
All the metrics mentioned in the overload guide[1] seem to be missing. Having a quick look at the Git repository, it seems that only 0.4.7.1-alpha and the latest master contain the necessary changes for these metrics.
Is my assumption correct or am I doing something wrong?
Correct, relay metrics are only available on >= 0.4.7.1-alpha. Hopefully, we should have a 0.4.7 stable by the end of the year (or around that time).
David
My relays (Aramis) marked overloaded don't make any sense either. Two of the ones marked with orange are the two with the lowest traffic I have (2-5 MiB/s and 4-9 MiB/s - not pushing any limits here); the third one with that host has more traffic and is fine.
So far this indicator seems to be no help to me.
--Torix
Sent with ProtonMail Secure Email.
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Friday, October 1st, 2021 at 12:22 PM, David Goulet dgoulet@torproject.org wrote:
On 01 Oct (03:08:20), Andreas Kempe wrote:
Hello David!
On Mon, Sep 27, 2021 at 08:22:08AM -0400, David Goulet wrote:
On 24 Sep (12:36:17), lists@for-privacy.net wrote:
On Thursday, September 23, 2021 3:39:08 PM CEST Silvia/Hiro wrote:
When a relay is in the overloaded state we show an amber dot next to the
relay nickname.
Nice thing. I noticed this flag a few days ago.
If you noticed your relay is overloaded, please check the following
support article to find out how you can recover to a "normal" state:
https://support.torproject.org/relay-operators/relay-bridge-overloaded/
A question about enabling the MetricsPort: is Prometheus definitely necessary?
Or can the MetricsPort write to a log file / text file?
The output format is Prometheus but you don't need a prometheus server to get
it.
Once opened, you can simply fetch it like this:
wget http://<IP_OF_METRICSPORT>:<METRICS_PORT>/metrics -O /tmp/output.txt
or
curl http://<IP_OF_METRICSPORT>:<METRICS_PORT>/metrics -o /tmp/output.txt
I've only ever gotten an empty reply when trying to extract metrics
and I still do on Tor 0.4.6.7. I have some vague memory of the metrics
previously only being for hidden services.
All the metrics mentioned in the overload guide[1] seem to be missing.
Having a quick look at the Git repository, it seems that only
0.4.7.1-alpha and the latest master contain the necessary changes for these
metrics.
Is my assumption correct or am I doing something wrong?
Correct, relay metrics are only available on >= 0.4.7.1-alpha. Hopefully, we
should have a 0.4.7 stable by the end of the year (or around that time).
David
On Sunday, October 3, 2021 6:52:46 PM CEST Bleedangel Tor Admin wrote:
The troubleshooting page for an overloaded relay mentions enabling the MetricsPort, but gives no instructions, or even links, explaining how to do this.
??
marco@w530:~$ man torrc | grep Metrics

MetricsPort [address:]port [format]
    ... Set an access policy with MetricsPortPolicy and consider using your
    operating system's firewall features for defense in depth. ...
    MetricsPortPolicy must be defined, else every request will be rejected.
    Example:
        MetricsPort 1.2.3.4:9035
        MetricsPortPolicy accept 5.6.7.8

MetricsPortPolicy policy,policy,...
    Set an entrance policy for the MetricsPort, to limit who can access it.
    The policies have the same form as exit policies below, except that port
    specifiers are ignored. For multiple entries, this line can be used ...
    Please keep in mind that if the server collecting metrics on the
    MetricsPort is behind a NAT, then everything behind it can access it.
    This is similar for the case of allowing localhost: every user on the
    server ...
And David explained to me: https://lists.torproject.org/pipermail/tor-relays/2021-September/019841.html
My relay has a dirport of 9030 in my config and nyx confirms this. The relay status website says I have no dirport configured.
Check iptables or port forward on your router.
On 02 Oct (01:29:56), torix via tor-relays wrote:
My relays (Aramis) marked overloaded don't make any sense either. Two of the ones marked with orange are the two with the lowest traffic I have (2-5 MiB/s and 4-9 MiB/s - not pushing any limits here); the third one with that host has more traffic and is fine.
So far this indicator seems to be no help to me.
Keep in mind that the overload state might not only be about traffic capacity. As this page states, there are other factors, including CPU and memory pressure.
https://support.torproject.org/relay-operators/relay-bridge-overloaded/
We are continuously improving this with feedback from the relay community. It is a hard problem because so many things can change or influence the state, and the variety of OSes also makes it challenging.
Another thing to remember here: the overload state will be set for 72 hours even if only a SINGLE overload event occurred.
For more details: https://lists.torproject.org/pipermail/tor-relays/2021-September/019844.html
(FYI, we are in the process of adding this information in the support page ^).
If you can't find anything sticking out, that is OK; move on and see if the state continues to stick. If so, it may be worth digging more, and once 0.4.7 is stable you'll be able to enable the MetricsPort (see man tor) to go a bit deeper down the rabbit hole.
Cheers! David
We have now updated the support article at: https://support.torproject.org/relay-operators/relay-bridge-overloaded/
We have tried to clarify how and why the overloaded state is triggered. I hope this can help operators understand better why their relays can be found in this state and how a normal state can be recovered.
Please do let us know what you think.
Cheers, -hiro
This is much more informational. Great job!
As someone with mystery "overloaded" problems, I'd recommend / request / beg for the following:
1) When the relay is overloaded, a yellow indicator appears on the web page and remains for 72 hours after the overloaded state is remedied. This is not helpful for diagnosis: even if the problem is solved, there is no sign for 72 hours that the relay is no longer overloaded. Once the relay recovers, it should instead show a purple (or any other color) "recovery" indicator for 72 hours, so the operator at least knows the overloaded state has been repaired.
2) MetricsPort is an enigma to anyone not familiar with Prometheus, or with torrc beyond the basics. As a matter of fact, when installing the tor-git package on Arch Linux, the man pages don't install automatically, so 'man torrc' gives a very helpful:
$ man torrc
No manual entry for torrc
The official Tor website, under the manuals section (-alpha: https://2019.www.torproject.org/docs/tor-manual-dev.html.en), does not include documentation for MetricsPort.
I think the troubleshooting guide should contain directions for enabling MetricsPort and viewing the results:
...
To enable MetricsPort for advanced diagnosis:
In torrc, set the MetricsPort and MetricsPortPolicy options as follows:
MetricsPort <server ip address>:<port>
MetricsPortPolicy accept <ip address to accept MetricsPort queries from>
It is good policy to only allow connections to the MetricsPort from localhost:
MetricsPort 127.0.0.1:9035           # open the metrics server on port 9035, listening only on localhost (127.0.0.1)
MetricsPortPolicy accept 127.0.0.1   # allow only localhost (127.0.0.1) to query the metrics server
Once these are set and the configuration reloaded (via SIGHUP or tor restart), the data can be queried as follows:
wget http://127.0.0.1:9035/metrics -O metricsport.txt
This will place a file named 'metricsport.txt' in the current directory that can be used to troubleshoot overloaded-relay issues with the information in this document.
...
On Tuesday, October 5, 2021 11:13:27 PM CEST Bleedangel Tor Admin wrote:
i think the troubleshooting guide should contain directions to enable metricsport, and how to view the results:
+1
Great summary
I have to agree with Bleedangel on both points: 1. Attempting to troubleshoot a relay only once every 3 days takes some serious patience, and 2. Guidance on setting up MetricsPort is as important as understanding its output. Excellent suggestions! Respectfully,
Gary
Tor has always been lax about documentation, both creating and updating it. I've never seen a thorough site map or directory published. I think this discourages folks who would like to start a Tor relay. potlatch
On Wednesday, October 6, 2021 8:55:01 PM CEST potlatch via tor-relays wrote:
Tor has always been very lax at documentation--both creation and updating. I've never seen a thorough site map or directory published. I think this discourages folks who would like to start a tor relay.
Mmm. I'm a stupid hobby admin with no IT background. Tor entry servers were the first thing I set up on rented servers in the data center. The TorRelayGuide on Torproject.org was easy for me. https://community.torproject.org/relay/ Debian's sample torrc has always been well documented.
And micahflee's tor-relay-bootstrap script has been forked a dozen times. A relay can be set up in just a few minutes.
The biggest problem is finding a provider that rents cheap unmetered servers and allows exits to operate. Ideally, the provider should also be far away from DE-CIX Frankfurt, with no or few other Tor servers there.
And don't forget the Tor Project is a community project. Everyone can edit the torproject.org documentation. ;-)
I can always recommend these pages to beginners: https://tor-relay.co/ https://www.torservers.net/wiki/guides
Bleedangel Tor Admin:
Can you link to where I can edit the torproject.org documentation? I cannot find this feature.
This is nowadays tracked in our Gitlab instance.[1] Thus, you could fork the respective project and make a merge request. We'd be happy to review it and improve our documentation that way.
Georg
[1] https://gitlab.torproject.org
Thanks
On Fri, Oct 08, 2021 at 06:54:01AM +0000, Georg Koppen wrote:
Bleedangel Tor Admin:
Can you link to where I can edit the torproject.org documentation? I cannot find this feature.
This is nowadays tracked in our Gitlab instance.[1] Thus, you could fork the respective project and make a merge request. We'd be happy to review it and improve our documentation that way.
Georg
If you want to edit Tor Support pages (support.torproject.org), you just need to scroll down and click "Edit this page". It will redirect to GitHub. See for example: https://support.torproject.org/relay-operators/relay-bridge-overloaded/
If you don't like GitHub, you can either import 'Support' git repository to your favorite git platform or just fork on GitLab: https://gitlab.torproject.org/tpo/web/support/
To edit and improve the Tor Relay guide, you need to clone and edit the 'Community' repository. You can send a merge request or a pull request: https://gitlab.torproject.org/tpo/web/community/ https://github.com/torproject/community/
If you need help to edit the website, please join us on IRC: #tor-www (or Element/matrix: #tor-www:matrix.org).
Gus
On Friday, October 8, 2021 12:18:16 AM CEST Bleedangel Tor Admin wrote:
Can you link to where I can edit the torproject.org documentation? I cannot find this feature.
https://community.torproject.org/relay/setup/ at the bottom.
You can do this with a normal github.com account: https://github.com/torproject/community#how-to-contribute
Some things are on: https://gitlab.torproject.org You can register here for this: https://gitlab.onionize.space/
Hi all,
My relay (77D08850C1EE8587451F838D3F49874F75B0B1AC) is showing as overloaded on the Relay Search page:
https://metrics.torproject.org/rs.html#details/77D08850C1EE8587451F838D3F498...
I have enabled MetricsPort and MetricsPortPolicy in the torrc configuration file. I have retrieved the metrics into a file and inspected it but it is still not clear to me what the problem is. The only non-zero values I can see are reason="success" or action="processed". Nothing sticks out in the process logs either.
Can anyone assist?
Regards,
Arlen
Did you use a lot of RAM or CPU power recently? I got flagged as overloaded when I was compiling something and used a lot of CPU.
Nothing out of the ordinary. The server is a virtual machine which is dedicated to running a Tor relay. I don't use it for anything else.
I've just checked the server descriptor. It seems to be missing the 'overload-general' flag, which means it now isn't overloaded? Yet the relay search page says it is?
This all seems a bit more troublesome than it should be. Presumably the relay itself knows why it is overloaded, so can it not just state the reason clearly in its logs?
Arlen Yaroslav via tor-relays:
Did you use a lot of ram or cpu power recently? I got flagged as overloaded when I was compiling something and used a lot of cpu.
Nothing out of the ordinary. The server is a virtual machine which is dedicated to running a Tor relay. I don't use it for anything else.
I've just checked the server descriptor. It seems to be missing the 'overload-general' flag which means it now isn't overloaded? Yet the relay search page says it is?
I am not sure where you are looking, but if you take a look at what Onionoo[1] is saying, you get
overload_general_timestamp 1634180400000
which means 2021-10-14 03:00:00 UTC, thus today.
Georg
[1] https://onionoo.torproject.org/details?limit=4&search=VinculumGate
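The Onionoo timestamp is milliseconds since the Unix epoch; a quick sanity check of the conversion above:

```python
from datetime import datetime, timezone

# overload_general_timestamp from the Onionoo details document, in ms
ts_ms = 1634180400000
when = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
print(when.isoformat())  # 2021-10-14T03:00:00+00:00
```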
Thanks, I was looking at the cached-descriptors file, which I can see now is well out of date.
Would anyone have any suggestions as to why the relay is overloaded?
The problem is that it’s *not* currently overloaded, there’s nothing to see. Maybe you can check your syslogs for anything out of the ordinary system-wide?
I've checked dmesg. Nothing stands out as problematic. Even if something did, there's no way to tell whether it's actually the cause :-/
Arlen Yaroslav via tor-relays:
The problem is that it’s *not* currently overloaded, there’s nothing to see. Maybe you can check your syslogs for anything out of the ordinary system-wide?
I've checked dmesg. Nothing that stands out as being problematic. Even if there was something there's no way to tell if it's actually the cause :-/
Feel free to send the metrics port data over to us (see the mail address on the support article). Maybe we can find ideas for a further investigation based on that.
Georg
Hi,
I've done some further analysis on this. The reason my relay is being marked as overloaded is because of DNS timeout errors. I had to dive into the source code to figure this out.
In dns.c, a libevent DNS_ERR_TIMEOUT is recorded as an OVERLOAD_GENERAL error. Am I correct in saying that a single DNS timeout within a 72-hour period will result in an overloaded state? If so, that seems overly stringent, given that there are no options available to tune the DNS timeout, maximum retries, etc. Lower-specced servers with less than optimal access to DNS resolvers will suffer because of this.
Also, I was wondering why these timeouts were not being recorded in the Metrics output. I've done some digging and I believe there is a bug in the evdns_callback() function. The rep_hist_note_dns_error() is being called as follows:
rep_hist_note_dns_error(type, result);
but I've noticed that 'type' is set to zero whenever libevent returns a DNS error, which means the correct dns_stats_t structure is never found, as zero is outside the expected range of values (DNS_IPv4_A, DNS_PTR, DNS_IPv6_AAAA). Adding a BUG assertion confirms this.
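As a toy illustration of that failure mode (the constants and the counter table below are hypothetical stand-ins, not tor's actual dns_stats_t code): when the per-type lookup is keyed on the query type and libevent hands back type 0, the error matches no entry and is silently dropped.

```python
# Hypothetical stand-ins for the DNS query-type constants named above.
DNS_IPv4_A, DNS_PTR, DNS_IPv6_AAAA = 1, 12, 28

error_counts = {DNS_IPv4_A: 0, DNS_PTR: 0, DNS_IPv6_AAAA: 0}

def note_dns_error(qtype):
    """Count an error against its query type; return False if it was dropped."""
    if qtype not in error_counts:
        return False  # type 0 matches no entry, so the error is lost
    error_counts[qtype] += 1
    return True

print(note_dns_error(DNS_IPv4_A))  # True: counted
print(note_dns_error(0))           # False: silently dropped
```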
Please let me know if I should raise this in the bug tracker or if you need anything else.
Thanks,
Arlen
On 17 Oct (13:54:22), Arlen Yaroslav via tor-relays wrote:
Hi,
Hi Arlen!
I've done some further analysis on this. The reason my relay is being marked as overloaded is because of DNS timeout errors. I had to dive into the source code to figure this out.
In dns.c, a libevent DNS_ERR_TIMEOUT is being recorded as an OVERLOAD_GENERAL error. Am I correct in saying that a single DNS timeout error within a 72-hour period will result in an overloaded state? If so, it seems overly-stringent given that there are no options available to tune the DNS timeout, max retry etc. parameters. Some lower-specced servers with less than optimal access to DNS resolvers will suffer because of this.
Correct, a single DNS timeout will trigger the general overload flag. There were discussions about only reporting overload once N% of all requests time out, with N around 1%, but unfortunately it was never implemented that way. So, at the moment, one timeout is enough to trigger the flag.
And I think you are right, we would benefit from raising that threshold big time.
Also, I was wondering why these timeouts were not being recorded in the Metrics output. I've done some digging and I believe there is a bug in the evdns_callback() function. The rep_hist_note_dns_error() is being called as follows:
rep_hist_note_dns_error(type, result);
but I've noticed the 'type' being set to zero whenever libevent returns a DNS error which means the correct dns_stats_t structure is never found, as zero is outside the expected range of values (DNS_IPv4_A, DNS_PTR, DNS_IPv6_AAAA). Adding the BUG assertion confirms this.
Please let me know if I should raise this in the bug tracker or if you need anything else.
This is an _excellent_ find!
I have opened:
https://gitlab.torproject.org/tpo/core/tor/-/issues/40490
We'll likely attempt to submit a patch to libevent and then fix it in Tor. Until this is fixed in libevent and the entire network has migrated (which can take years...), we'll have to live with DNS errors _not_ being per-type on the MetricsPort, likely going from:
tor_relay_exit_dns_error_total{record="A",reason="timeout"} 0 ...
to a line without a "record" because we can't tell:
tor_relay_exit_dns_error_total{reason="timeout"} 0
Note that for a successful request (reason="success") we can tell the record type, but not for errors, because of that.
To everyone: expect that API breakage on the MetricsPort in the next 0.4.7.x version and, evidently, in the stable when it comes out.
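If you script against the MetricsPort, parsing the label set rather than matching whole lines will survive this kind of change. A small sketch (the sample line mirrors the format shown above; the regexes are a simplification of the full Prometheus text exposition grammar):

```python
import re

LINE_RE = re.compile(r'(?P<name>[\w:]+)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')

def parse_metric(line):
    """Split a Prometheus text-format line into (name, labels_dict, value)."""
    m = LINE_RE.match(line)
    labels = dict(LABEL_RE.findall(m.group("labels") or ""))
    return m.group("name"), labels, float(m.group("value"))

name, labels, value = parse_metric(
    'tor_relay_exit_dns_error_total{record="A",reason="timeout"} 0')
print(name, labels, value)
# A consumer keyed only on the name plus the "reason" label keeps working
# even if the "record" label is dropped later.
```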
Big thanks for this find!
Cheers! David
David Goulet:
On 17 Oct (13:54:22), Arlen Yaroslav via tor-relays wrote:
Hi,
Hi Arlen!
I've done some further analysis on this. The reason my relay is being marked as overloaded is because of DNS timeout errors. I had to dive into the source code to figure this out.
In dns.c, a libevent DNS_ERR_TIMEOUT is being recorded as an OVERLOAD_GENERAL error. Am I correct in saying that a single DNS timeout error within a 72-hour period will result in an overloaded state? If so, it seems overly-stringent given that there are no options available to tune the DNS timeout, max retry etc. parameters. Some lower-specced servers with less than optimal access to DNS resolvers will suffer because of this.
Correct, 1 single DNS timeout will trigger the general overload flag. There were discussion to make it N% of all request to timeout before we would report it with a N being around 1% but unfortunately that was never implemented that way. And so, at the moment, 1 timeout is enough to trigger the problem.
And I think you are right, we would benefit on raising that threshold big time.
FWIW: that's tracked in
https://gitlab.torproject.org/tpo/core/tor/-/issues/40491
We had that on our radar previously but it fell through the cracks. :(
Georg
Hiro, It took me a minute to find the newly added causal information inline with the example MetricsPort output. I was expecting two distinct sections: Possible Causes and Possible Remedies. Either way, thank you for adding the information. Respectfully,
Gary
On Tuesday, October 5, 2021, 2:33:23 PM MDT, Silvia/Hiro hiro@torproject.org wrote:
On 10/4/21 1:36 PM, David Goulet wrote:
On 02 Oct (01:29:56), torix via tor-relays wrote:
My relays (Aramis) marked overloaded don't make any sense either. Two of the ones marked with orange are the two with the lowest traffic I have (2-5 MiB/s and 4-9 MiB/s - not pushing any limits here); the third one with that host has more traffic and is fine.
So far this indicator seems to be no help to me.
Keep in mind that the overload state might not be only about traffic capacity. As this page states, there are other factors, including CPU and memory pressure.
https://support.torproject.org/relay-operators/relay-bridge-overloaded/
We are in a continuous process of making it better with feedback from the relay community. It is a hard problem because so many things can change or influence the result, and the different OSes also make it challenging.
Another thing here to remember, the overload state will be set for 72 hours even if a SINGLE overload event occurred.
For more details: https://lists.torproject.org/pipermail/tor-relays/2021-September/019844.html
(FYI, we are in the process of adding this information in the support page ^).
We have now updated the support article at: https://support.torproject.org/relay-operators/relay-bridge-overloaded/
We have tried to clarify how and why the overloaded state is triggered. I hope this can help operators understand better why their relays can be found in this state and how a normal state can be recovered.
Please do let us know what you think.
Cheers, -hiro
If you can't find anything sticking out, that is OK; you can move on and see if it continues to happen. If so, maybe it's worth digging more, and once 0.4.7 is stable you'll be able to enable the MetricsPort (see man tor) to go down the rabbit hole a bit deeper.
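Once the MetricsPort is enabled, it serves Prometheus-style text that can be scanned for non-zero load counters. The sketch below shows one way to filter such output; the metric names in the sample are illustrative examples in the style of proposal 328, not guaranteed to match your tor version, so check `man tor` and your own relay's output for the actual names:

```python
def find_nonzero_load_metrics(metrics_text: str) -> list[str]:
    """Return metric lines with a non-zero counter value, restricted
    to load-related metrics, from Prometheus-style exposition text."""
    hits = []
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blanks and comment/metadata lines (# HELP, # TYPE).
        if not line or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")
        if "load" in name and value.isdigit() and int(value) > 0:
            hits.append(line)
    return hits

# Sample output with made-up values; metric names are examples only.
sample = """\
# HELP tor_relay_load_tcp_exhaustion_total Total tcp exhaustion
tor_relay_load_tcp_exhaustion_total 0
tor_relay_load_global_rate_limit_reached_total{side="read"} 42
tor_relay_connections{type="initiated"} 12
"""
```

Feeding this relay's own `/metrics` output (e.g. fetched with curl from the configured MetricsPort address) through such a filter quickly surfaces which overload counter is ticking.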
Cheers! David
tor-relays mailing list tor-relays@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-relays
On Tuesday, October 5, 2021 10:33:04 PM CEST Silvia/Hiro wrote:
We have tried to clarify how and why the overloaded state is triggered. I hope this can help operators understand better why their relays can be found in this state and how a normal state can be recovered.
Please do let us know what you think.
DNS timeout issues... Maybe add how important a local DNS resolver is on exit servers,
and a link to the "DNS on Exit Relays" section of: https://community.torproject.org/relay/setup/exit/
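For exit operators who have not set one up yet, a local caching resolver can be as small as the following unbound fragment. This is a hedged example only (paths and defaults vary by distribution; see unbound.conf(5) and the community guide linked above):

```
# Minimal unbound.conf for a local caching resolver on an exit.
# Example only; adapt to your distribution's packaged defaults.
server:
    interface: 127.0.0.1
    access-control: 127.0.0.0/8 allow
```

The system then needs to use it, e.g. `nameserver 127.0.0.1` in /etc/resolv.conf.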
On 9/23/21 3:39 PM, Silvia/Hiro wrote:
Let us know how you find this new feature.
It would be nice if the search form had that feature too. Currently everything is green here: https://metrics.torproject.org/rs.html#search/zwiebeltoralf whereas the details page of each of the 2 relays shows the overload indicator.
-- Toralf
On 9/28/21 8:40 PM, Toralf Förster wrote:
On 9/23/21 3:39 PM, Silvia/Hiro wrote:
Let us know how you find this new feature.
It would be nice if the search form had that feature too. Currently everything is green here: https://metrics.torproject.org/rs.html#search/zwiebeltoralf whereas the details page of each of the 2 relays shows the overload indicator.
Yes, good catch. I have just deployed a few minor fixes, including the overloaded indicator in the search form. I intended to announce them tomorrow together with a few updates to the support article following the email threads on the list, but since you mentioned it, I thought you should know already :))
Cheers, -hiro
-- Toralf