Mike Perry:
On 3/2/21 6:01 PM, George Kadianakis wrote:
David Goulet dgoulet@torproject.org writes:
Greetings,
Attached is a proposal from Mike Perry and me. The merge request is here:
https://gitlab.torproject.org/tpo/core/torspec/-/merge_requests/22
Hello all,
while working on this proposal I had to change it slightly to add a few more metrics and also to simplify some engineering issues that we would encounter. You can find the changes here: https://gitlab.torproject.org/asn/torspec/-/commit/b57743b9764bd8e6ef8de689d...
Mike, based on your comments in the #40222 ticket, I would appreciate comments on the way the DNS issues will be reported. David argued that they should not be part of the "overload-general" line because they are not an overload and are not the fault of the network in any way. This is why we added them as separate lines. Furthermore, David suggested we turn them into a threshold ("only report if 25% of the total requests have timed out") instead of "only report if at least one timeout has occurred", since that would be more useful.
I'm confused by this confusion. There's pretty clear precedent for treating packet drops as a sign of network capacity overload. We've also seen it experimentally specifically with respect to DNS, during Rob's experiment. We discussed this on Monday.
However, I agree there's a chance that a single packet drop can be spurious, and/or could be due to the kind of ephemeral overload that TCP congestion causes. But 25% is waaaaaaaaaay too high. Even 1% is high IMO, but is more reasonable. We should ask some exits what they see now. The fact that our DNS scanners are not currently seeing this at all, and that the issue appeared only for the exact duration of Rob's experiment, suggests that DNS packet drops are extremely rare in healthy network conditions.
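To make the threshold question concrete, here is a minimal sketch in Python of a ratio-based check. It is not Tor code: the function name, the counter arguments, and the 1% default are assumptions for illustration only, and the proposal does not specify how the counters would be kept.

```python
def dns_timeout_overloaded(timeouts: int, total: int, threshold: float = 0.01) -> bool:
    """Return True if the fraction of timed-out DNS requests reaches `threshold`.

    `timeouts` and `total` are hypothetical counters for one reporting period.
    """
    if total <= 0:
        # No requests in the period, so there is nothing to report.
        return False
    return timeouts / total >= threshold


# 3 timeouts out of 200 requests is 1.5%: over a 1% threshold,
# but far below a 25% threshold.
print(dns_timeout_overloaded(3, 200, threshold=0.01))
print(dns_timeout_overloaded(3, 200, threshold=0.25))
```

With a 25% threshold, a relay dropping 1.5% of its DNS requests would never be flagged, which is the concern with setting the bar that high.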
Furthermore, revealing the specific type of overload condition increases the adversary's ability to use this information for various attacks. I'd rather it be combined in all cases, so that the specific cause is not visible. In all cases, the reaction of our systems should be the same: direct less load to relays with this line. If we need to dig, that's what MetricsPort is for.
+1
In fact, this DNS packet drop signal may be particularly useful in traffic analysis attacks. Its reporting, and likely all of this overload reporting, should probably be delayed until something like the top of the hour after it happens. We may even want this delay to be a consensus parameter. Something like "Report only after N minutes", or "Report only N minute windows", perhaps?
That's a good idea, thanks. I am not sure we really need a consensus parameter for that, but some delay that ensures the DNS packet drop does not aid traffic analysis does indeed seem like a smart idea.
Georg
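For the delayed reporting suggested above ("top of the hour", or "N minute windows"), a rough Python sketch of the rounding could look like the following. It is not Tor code: the function name and the `window_minutes` parameter are hypothetical, standing in for whatever fixed value or consensus parameter would actually be chosen.

```python
from datetime import datetime, timezone


def earliest_report_time(event: datetime, window_minutes: int = 60) -> datetime:
    """Round `event` up to the next window boundary strictly after it.

    `event` must be a timezone-aware datetime. `window_minutes` is a
    hypothetical stand-in for a fixed delay or a consensus parameter.
    """
    window = window_minutes * 60
    ts = int(event.timestamp())
    # Integer-divide into windows since the Unix epoch, then step to the next one.
    next_boundary = (ts // window + 1) * window
    return datetime.fromtimestamp(next_boundary, tz=timezone.utc)


# An event observed at 18:01 UTC would not be reported before 19:00 UTC.
event = datetime(2021, 3, 2, 18, 1, tzinfo=timezone.utc)
print(earliest_report_time(event))
```

Rounding to a shared boundary, rather than adding a fixed delay to each event, means the report time no longer reveals when within the window the drop happened.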
We also decided to simplify the 'overload-ratelimits' line to make it easier to implement (learning whether it was a burst or rate overload in Tor seems to be quite hard, so we decided to merge these two events).
Ok, this makes sense.