Hey all,
I am playing with the load balancing equations in Section 3.8.3 of dir-spec.txt (https://gitweb.torproject.org/torspec.git/tree//dir-spec.txt#n2454) in order to account for padding, hidden/onion services, and directory traffic. I've noticed that the fact that nodes can be both Exits and Guards not only vastly complicates the current equations, but will also make any attempt at accounting for this additional overhead nearly impossible.
If you've previously tried to understand the current load balancing equations, or wondered why they have 3 major cases and roughly 9 sub-cases, this is entirely to account for Guard+Exit nodes, and to handle the relative scarcity of pure Guard and pure Exit nodes in various conditions.
Removing the Guard+Exit ('D') nodes makes all of those cases condense to one single, simple set of solutions:
  Wee = (E + G + M) / (3*E)
  Wgg = (E + G + M) / (3*G)
  Wme = (2*E - M - G) / (3*E)
  Wmg = (2*G - M - E) / (3*G)
No missing constraints, no sub-cases, no boundary conditions. Easy to reason about. Easy to intuitively convince yourself that the solutions are correct.
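For anyone who wants to check this without doing the algebra by hand, here is a minimal sketch in Python. The function name and the sample bandwidth totals are made up for illustration (they are not measured network values), and the totals are picked so that all four weights land between 0 and 1. It just plugs the closed-form weights back into the balance conditions (Guard bw == middle bw == exit bw) using exact fractions.

```python
from fractions import Fraction

def weights_no_guard_exit(G, M, E):
    """Closed-form weights for the case with no Guard+Exit nodes.

    G, M, E are the total Guard, Middle, and Exit bandwidths.
    """
    G, M, E = Fraction(G), Fraction(M), Fraction(E)
    total = G + M + E
    return {
        "Wee": total / (3 * E),
        "Wgg": total / (3 * G),
        "Wme": (2 * E - M - G) / (3 * E),
        "Wmg": (2 * G - M - E) / (3 * G),
    }

# Hypothetical totals, for illustration only.
G, M, E = 40, 20, 40
w = weights_no_guard_exit(G, M, E)

guard_bw = w["Wgg"] * G
exit_bw = w["Wee"] * E
middle_bw = M + w["Wme"] * E + w["Wmg"] * G

# Every position ends up with one third of the total bandwidth.
assert guard_bw == middle_bw == exit_bw == Fraction(G + M + E, 3)
# Each node class splits all of its bandwidth between its two roles.
assert w["Wgg"] + w["Wmg"] == 1
assert w["Wee"] + w["Wme"] == 1
```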
What I want to do is add two variables to the consensus that allow us to specify expected (or measured) overhead at Guard nodes (G_o) and at Middle nodes (M_o). These two variables would represent the sum of padding, directory overhead, and hidden service traffic experienced by Guard and Middle nodes, respectively. If we ignore Guard+Exit nodes, this makes the load balancing equations become:
  (1-G_o)*Wgg*G == (1-M_o)*(M + Wme*E + Wmg*G)   # Guard bw == middle bw
  (1-G_o)*Wgg*G == Wee*E                         # Guard bw == exit bw
  Wmg*G + Wgg*G == G                             # Guards can be middles or Guards
  Wme*E + Wee*E == E                             # Exits can be middles or Exits
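As a rough sketch of how this system can be solved (assuming no Guard+Exit nodes), the Python below substitutes x = Wgg*G, which turns the four equations into a single linear equation in x. The function name, the sample totals, and the overhead values are hypothetical and only there to exercise the algebra; nothing here is a proposed consensus format.

```python
from fractions import Fraction

def weights_with_overhead(G, M, E, G_o, M_o):
    """Solve the four overhead-aware equations, assuming no Guard+Exit nodes.

    G_o and M_o are the overhead fractions at Guard and Middle nodes.
    """
    G, M, E = Fraction(G), Fraction(M), Fraction(E)
    a = 1 - Fraction(G_o)   # usable fraction of Guard bandwidth
    b = 1 - Fraction(M_o)   # usable fraction of Middle bandwidth
    total = G + M + E
    # Let x = Wgg*G. Then Wee*E = a*x, Wmg*G = G - x, Wme*E = E - a*x,
    # and the first equation becomes a*x = b*(total - (1 + a)*x).
    x = b * total / (a + b + a * b)
    Wgg = x / G
    Wee = a * x / E
    return {"Wgg": Wgg, "Wee": Wee, "Wmg": 1 - Wgg, "Wme": 1 - Wee}

# Hypothetical inputs: 10% overhead at Guards, 5% at Middles.
G, M, E = 40, 20, 40
G_o, M_o = Fraction(1, 10), Fraction(1, 20)
w = weights_with_overhead(G, M, E, G_o, M_o)

guard_side = (1 - G_o) * w["Wgg"] * G
middle_side = (1 - M_o) * (M + w["Wme"] * E + w["Wmg"] * G)
exit_side = w["Wee"] * E
assert guard_side == middle_side == exit_side

# With zero overhead this reduces to the simple closed form, e.g. Wgg = (E+G+M)/(3*G).
w0 = weights_with_overhead(G, M, E, 0, 0)
assert w0["Wgg"] == Fraction(G + M + E, 3 * G)
```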
These solutions are a bit longer so I won't paste them here, but they are still reasonable if we ditch the Guard+Exit idea. They can be simplified even further by using an approach similar to the GuardFraction idea from Proposal 236 (Section 1.3): we can simply adjust each Guard's bandwidth by G_o, either before solving the equations or by adjusting Wgg and Wmg after the solution, so that we only need to include the (1-M_o) factor in the equations themselves.
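To make the "adjust before solving" variant concrete, here is one possible reading of it as a Python sketch. This is an assumption about how the adjustment would work, not something taken from Proposal 236: the total Guard bandwidth is scaled by (1-G_o) up front, and the system is then solved with only the (1-M_o) factor left in it. The function name and sample values are hypothetical.

```python
from fractions import Fraction

def weights_prescaled_guards(G, M, E, G_o, M_o):
    """Scale Guard bandwidth by (1 - G_o) first, then solve with only M_o.

    Assumed system after scaling (G_adj = (1 - G_o)*G):
      Wgg*G_adj == (1 - M_o)*(M + Wme*E + Wmg*G_adj)
      Wgg*G_adj == Wee*E
      Wmg + Wgg == 1,  Wme + Wee == 1
    """
    M, E = Fraction(M), Fraction(E)
    G_adj = (1 - Fraction(G_o)) * Fraction(G)
    b = 1 - Fraction(M_o)
    # Let x = Wgg*G_adj = Wee*E; the first equation becomes
    # x = b*(M + E + G_adj - 2*x).
    x = b * (M + E + G_adj) / (1 + 2 * b)
    Wgg = x / G_adj
    Wee = x / E
    return G_adj, {"Wgg": Wgg, "Wee": Wee, "Wmg": 1 - Wgg, "Wme": 1 - Wee}

# Hypothetical inputs: 10% overhead at Guards, 5% at Middles.
G_adj, w = weights_prescaled_guards(40, 20, 40, Fraction(1, 10), Fraction(1, 20))

lhs = w["Wgg"] * G_adj
rhs = (1 - Fraction(1, 20)) * (20 + w["Wme"] * 40 + w["Wmg"] * G_adj)
assert lhs == rhs == w["Wee"] * 40
```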
We can't really do this simplification if we keep Guard+Exit, because of the interplay between Exit and Guard weights in the system of equations and cases.
Additionally, if we keep Guard+Exit, the solutions literally consume an entire page of text to represent symbolically, and the sub-cases become even more complicated to evaluate and check for proper bounds and correctness. In fact, George has already run into such edge cases while trying to implement Proposal 236 (See https://trac.torproject.org/projects/tor/ticket/16255).
Moreover, I've been thinking about the surveillance incentives for Guard+Exit nodes, and I actually think it is a net risk to the network for nodes to be both Guards and Exits. It's easy to find an excuse to watch an Exit node. If anything bad ever happens from an Exit, you may be able to convince someone to let you keep an eye on it** "just in case". A Guard-only node shouldn't cause anybody any problems, though, and should be much harder to justify monitoring.
So this means that Guard+Exit nodes allow an adversary to justify monitoring large portions of both the entry and exit traffic of the network without resorting to any strange/technical/esoteric arguments about the need to perform complicated statistical attacks with questionable accuracy and high collateral damage. Removing the Guard flag from Exit nodes would instead force them to justify monitoring entry traffic some other way.
I also hear from large node operators that it is still the Exit flag that causes their nodes to really become overloaded, so if that's actually true, then it's likely that Exit nodes make poor Guards anyway, especially if Exit bandwidth is scarce (which it often is).
So, what do people think? Does anyone violently disagree with removing the Guard flag from Guard+Exit nodes?
** I used to run an Exit node, and my lawyer at the time was told exactly this at one point during a police visit about a nonsensical bomb threat to some 4chan equivalent. He was specifically told "We will be monitoring this node from now on" by one of the visiting officers. I was unable to ever discover what that actually meant, and the EFF unfortunately did not feel it was worth pursuing to help me find out. It may have been a scare tactic, but it could just as easily have not been, and for all I know they may have been actually able to convince a judge to let them do this.