Over Thanksgiving and into early this week, we ran an experiment to test a feedback mechanism that tries to allocate usage of the Tor network so that the measured stream capacities through all relays become equal: https://gitweb.torproject.org/torflow.git/blob/HEAD:/NetworkScanners/BwAutho...
This experiment failed in 3 ways:
1. It drove many relays down to 0 utilization. Scott Bennett noted this in his post to tor-relays, and at least one 10Mbit relay operator also commented on the traffic drop-off of their node in #tor.
2. It only created one PID 'setpoint' for the entire network, even though different types of nodes see different load characteristics, and despite it being impossible to shift load from an Exit node to a Middle node, for example.
3. It kept allocating bandwidth to some relays (especially Middle and non-default-policy Exits) until they hit INT32_MAX in the consensus, and everything finally exploded. We then shut off the feedback by removing the consensus parameters.
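To illustrate the third failure, here is a toy sketch of how feedback that never settles can run a weight up to INT32_MAX. It assumes a simple multiplicative update (new = old * (1 + error)) and a relay whose error stays positive every round. The function name, the update rule, and the starting numbers are all made up for illustration and are not taken from the bandwidth authority code.

```python
INT32_MAX = 2**31 - 1

def run_feedback(start_bw, error, rounds):
    """Apply an assumed multiplicative update each round.

    The weight saturates at INT32_MAX, since that is the largest value
    the field can hold. Returns the weight after every round.
    """
    bw = start_bw
    history = []
    for _ in range(rounds):
        # Assumed update rule: scale the weight by (1 + error).
        bw = min(int(bw * (1.0 + error)), INT32_MAX)
        history.append(bw)
    return history

if __name__ == "__main__":
    # A relay that always looks 50% "too fast" keeps gaining weight.
    history = run_feedback(start_bw=1000, error=0.5, rounds=100)
    print(history[:5], "...", history[-1])
```

With a constant positive error the growth is geometric, so nothing in the loop itself ever stops it; only the integer ceiling does.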
I've made five major changes to try to address these issues:
1. Don't perform multiple rounds of negative feedback for slow nodes.
2. We now group nodes by their flags into four categories (Guard, Middle, Exit, and Guard+Exit), and compute a different PID setpoint for each class.
3. Circuit failure now counts more. Circuit failure is our CPU overload signal, as nodes that hit CPU overload begin dropping onionskins and failing extends. Instead of using the circuit success rate as a multiplier against the pid_error, we now actually compute a circ_error similar to the pid_error, and use it as the pid_error if it is more negative. We also now set FastFirstHopPK 0 to ensure that Guard nodes also get tested for circuit failure.
4. Raised the PID setpoint slightly, which should prevent us from piling quite so much weight onto fast relays.
5. Cap feedback via a consensus parameter.
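As a rough sketch of the per-class setpoint and the circ_error rule described above: the code below groups relays by flags into the four classes, computes one setpoint per class, and substitutes circ_error for pid_error when it is more negative. It assumes the setpoint is the class mean of measured stream bandwidth and that both errors are relative deviations from their setpoints. Those formulas, the function names, and the dict layout are assumptions for illustration, not the actual implementation.

```python
from collections import defaultdict
from statistics import mean

def node_class(flags):
    """Map a relay's flags to Guard, Middle, Exit, or Guard+Exit."""
    is_guard = "Guard" in flags
    is_exit = "Exit" in flags
    if is_guard and is_exit:
        return "Guard+Exit"
    if is_guard:
        return "Guard"
    if is_exit:
        return "Exit"
    return "Middle"

def class_setpoints(relays):
    """Compute one setpoint per class (assumed: mean stream bandwidth)."""
    by_class = defaultdict(list)
    for relay in relays:
        by_class[node_class(relay["flags"])].append(relay["stream_bw"])
    return {cls: mean(bws) for cls, bws in by_class.items()}

def feedback_error(stream_bw, setpoint, circ_success, circ_setpoint):
    """Return pid_error, or circ_error if that is more negative."""
    # Assumed form: relative deviation from the setpoint.
    pid_error = (stream_bw - setpoint) / setpoint
    circ_error = (circ_success - circ_setpoint) / circ_setpoint
    if circ_error < 0 and circ_error < pid_error:
        return circ_error
    return pid_error

if __name__ == "__main__":
    relays = [
        {"flags": {"Guard"}, "stream_bw": 100.0},
        {"flags": {"Guard"}, "stream_bw": 300.0},
        {"flags": {"Guard", "Exit"}, "stream_bw": 50.0},
        {"flags": set(), "stream_bw": 80.0},
    ]
    print(class_setpoints(relays))
```

Under these assumptions, a fast Guard that fails half as many circuits as its class is expected to complete gets negative feedback even though its stream bandwidth is above the setpoint.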
All of these changes are governed by consensus parameters. See: https://gitweb.torproject.org/torflow.git/blob/HEAD:/NetworkScanners/BwAutho... for more details.
The parameters governing feedback are bwauthnsbw=1 and bwauthti. So long as one or both of these are present and non-zero in the consensus parameter list, the feedback experiment is active.
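If you want to check this yourself, the rule above can be sketched as a small helper that reads the "params" line from a consensus document. The helper name is made up, and it assumes you have already extracted the params line as a string of space-separated key=value pairs.

```python
def feedback_active(params_line):
    """Return True if bwauthnsbw or bwauthti is present and non-zero."""
    fields = params_line.split()
    # Drop the leading "params" keyword if the whole line was passed in.
    if fields and fields[0] == "params":
        fields = fields[1:]
    params = {}
    for field in fields:
        key, sep, value = field.partition("=")
        if sep:
            params[key] = int(value)
    return any(params.get(name, 0) != 0
               for name in ("bwauthnsbw", "bwauthti"))

if __name__ == "__main__":
    print(feedback_active("params bwauthnsbw=1"))
    print(feedback_active("params bwauthnsbw=0 bwauthti=0"))
```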
We'll probably be running this next experiment for about a week (or perhaps longer if it doesn't explode and seems to improve performance on https://metrics.torproject.org/performance.html) starting tonight or tomorrow.
Please keep an eye on your relays and tell us if anything unexpected happens over the next week or so.
Thanks!