Hi everybody,
15 days, 16 hours and 15 minutes have passed (without any restart) since the 2 Tor daemons involved in this problem were restarted (both started at the same time).
And there is something very clear and interesting to see. They both restarted from nearly zero (~100kB/s) in the first days, and the problem is still completely present on 1 of the 2 identities:
-> The first one (also the older one) is stuck at less than 1MB/sec (5,510 in consensus weight) - ArachnideFR94
-> The second one is growing, growing and growing, now around 9MB/sec (and more than 75,000 in consensus weight) - ArachnideFR94v2
It can be seen very clearly by looking at the graphs on Tor Atlas. (But there is no more bandwidth earthquake like the one I had in December.)
On this machine, the first daemon is launched by /etc/init.d, while the second one is launched manually by another non-admin user.
No other differences: same nice value, exactly the same torrc file apart from Nickname, DirPort, ORPort and MyFamily. Pretty large max bandwidth for both of them:
RelayBandwidthRate 25600 KB
RelayBandwidthBurst 122070 KB
ShutdownWaitLength 90
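For reference, these limits can be converted to more familiar units. The sketch below assumes that tor reads "KB" as 1024 bytes (that is my understanding of how tor parses these units, it is not stated above), and the helper name kb_to_mib is only illustrative.

```python
KIB = 1024  # assumption: tor treats "KB" in torrc as 1024 bytes


def kb_to_mib(value_kb: int) -> float:
    """Convert a torrc value given in KB to MiB (hypothetical helper)."""
    return value_kb * KIB / (1024 * 1024)


rate_mib = kb_to_mib(25600)    # RelayBandwidthRate
burst_mib = kb_to_mib(122070)  # RelayBandwidthBurst

print(f"rate:  {rate_mib:.1f} MiB/s")
print(f"burst: {burst_mib:.1f} MiB")
```

Under that assumption, both daemons are allowed about 25 MiB/s sustained, well above what either of them currently carries.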
-- In the log files:
The only difference was this (I just corrected it), and it was on the server that is working perfectly fine (ArachnideFR94v2):
Jan 30 06:07:03.000 [warn] Failing because we have 4063 connections already. Please raise your ulimit -n. [6978 similar message(s) suppressed in last 21600 seconds]
Jan 30 12:07:17.000 [warn] Failing because we have 4063 connections already. Please raise your ulimit -n. [20004 similar message(s) suppressed in last 21600 seconds]
-> Oops :s I know this problem very well and I just forgot to prevent it this time, sorry for that. I corrected it by editing /etc/security/limits.conf and adding some lines for the user that runs my second tor daemon. I restarted this daemon today at 12:35 UTC: SIGINT, wait for the shutdown, then a new launch (from a session with a much better ulimit!).
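To confirm which limit the unprivileged user really gets, a small check like the following can be run as that user before launching tor. It is only a sketch: it reports the limit of the Python process itself (inherited from the login session), not of a running tor daemon, and the 4096 threshold is my own guess based on the 4063 connections in the warning.

```python
import resource  # Unix-only standard library module


def open_file_limits():
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


soft, hard = open_file_limits()
print(f"soft limit: {soft}, hard limit: {hard}")

# Assumption: the warning appeared at 4063 connections, so a soft limit
# of 4096 or less is treated here as too low for a busy relay.
if soft != resource.RLIM_INFINITY and soft <= 4096:
    print("soft limit looks too low for a busy relay")
```

This is the same value that `ulimit -n` prints in the shell of that user.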
Absolutely nothing else abnormal in the logs of the 2 daemons. And the ulimit problem was on the one that runs perfectly well! --
Any idea why the first one is completely stuck at such a low value?
The next step would be to run both tor daemons from /etc/init.d on this machine, using the Multiple Tor Processes example at www.torservers.net, and to watch whether ArachnideFR94v2 still works as well as today once it is launched by /etc/init.d.
If no solution appears, it might be interesting to re-install the entire system and watch what happens. If there is still no solution after that, it will be complicated to understand!
Best regards, and thank you in advance for your ideas
Julien ROBIN
[Solved]
Hi folks,
This is probably my last email about this because, without my having made any change, the bandwidth authority algorithm seems to have reconciled with the server affected by this problem. That is useful information, after all! It is not the only thing I saw.
The recovery was very slow (see 6E7DA10B115976457A2A2BD42CFD141D1430B91D in Tor Atlas). Meanwhile, a few days ago, the second daemon was hitting very high bandwidth (between 20 and 25MB/s) and began to be a little bit "shaky" in consensus weight.
With such high but unstable bandwidth (I wonder whether I had the same graphs before the problems began in December), I was afraid of seeing the problem appear again, so I decided to put the following value into the torrc file:
RelayBandwidthRate 15360 KB
And now it seems to be more peaceful.
My conclusion after all this:
When the machine is running at full capacity but the still-growing bandwidth reaches an awesome value that is not reasonable for your machine, it can start to shake. In this case, if you are impressed and want to see how far it is going to rise, the bandwidth authority algorithm may "break" your machine, and when the break happens, it's too late: the bandwidth will inevitably and gradually go down to a few hundred kilobytes per second. The recovery will be horribly slow and painful, since for approximately 2 months you are not going to see any improvement.
PS: This machine (Intel Xeon E3-1220 V2 @ 3.10GHz, 4 cores) seems to be more capricious when approaching maximum capacity than my other ones (Atom k510, Athlon II X2 240 and Athlon II X2 270). For those, the bandwidth authority algorithm can choose a good value, and they work completely fine without any bandwidth limitation, without too many "Your computer is too slow to handle [...]" messages and without shaky bandwidth.
Here is everything I saw and understood! Have a nice day (or night)
Regards,
Julien ROBIN
----- Original message -----
From: "julien robin28" julien.robin28@free.fr
To: tor-relays@lists.torproject.org
Sent: Thursday, 30 January 2014 14:02:00
Subject: [tor-relays] Problem encountered with Bandwith Authority algorithm (was : "bandwidth authority algorithm is cracked")
_______________________________________________
tor-relays mailing list
tor-relays@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-relays