On 27 Feb 2017, at 11:35, Nick Mathewson <nickm@alum.mit.edu> wrote:
> Is there any reason to think this amount of smoothing would not be safe?
> ... Given that the existing variance is around 2%, and the existing rounding introduces 0.5%, rounding by 5% seems quite sensible.
> (Typical variance *between* bandwidth authorities is much more than this anyway.)
I was wrong: mikeperry raised some serious issues with this scheme.
Issue 1: Increased Inaccuracy
This scheme increases the inaccuracy of the bandwidth authority votes, compared to the bandwidth measurements. This effect is particularly pronounced for high-bandwidth relays, because the bandwidths are distributed according to a power law.
(Mike, feel free to correct me if I've got this wrong.)
For example, the top relay is approximately 1% of the network, or 750 Mbps. The existing variance due to rounding is 0.5% of that, or 4 Mbps.
The variance due to this scheme would be 5.5%, or 41 Mbps.
If the top 10 relays were rounded down, which would happen in approximately 1 in 2**10 consensuses (once every 6 weeks), this could distribute ~500 Mbps of client bandwidth to the rest of the relays arbitrarily and without warning.
This effect gets worse as relay performance increases due to multithreading or optimisation or processor improvements.
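Here's that arithmetic as a few lines of Python, for concreteness (using the approximate figures above):

# Approximate figures for the top relay (~1% of the network, ~750 Mbps).
top_relay_mbps = 750.0

existing_variance_mbps = top_relay_mbps * 0.005    # ~4 Mbps at 0.5%
proposed_variance_mbps = top_relay_mbps * 0.055    # ~41 Mbps at 5.5%

# If each of the top 10 relays rounds down about half the time, all ten
# rounding down together happens in roughly 1 in 2**10 consensuses.
# With hourly consensuses, that's about once every 6 weeks:
weeks_between_events = 2**10 / 24.0 / 7.0          # ~6.1 weeks

print(existing_variance_mbps, proposed_variance_mbps, weeks_between_events)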
Issue 2: Different Levels of Inaccuracy
As a separate but related issue, rounding to 3 significant digits means that the error ratio is much higher when those digits are 100..., compared with when those digits are 999...
The maximum error ratios vary by 10x:

  100-101     0.00499
  ...
  999-1000    0.00050
This is an existing bug, and contributes to network instability. We should fix it anyway, and this is a good opportunity to do so.
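For reference, a few lines of Python reproduce the ratios above (this assumes the ratio is measured relative to the lower rounded value, with the worst case at the geometric midpoint, which matches the figures quoted here and in the geometric table further down):

import math

# Worst case: a true value at the geometric midpoint sqrt(low*high) gets
# rounded down to low, so the error relative to low is sqrt(high/low) - 1.
def max_error_ratio(low, high):
    return math.sqrt(high / low) - 1.0

print("%.5f" % max_error_ratio(100.0, 101.0))    # 0.00499
print("%.5f" % max_error_ratio(999.0, 1000.0))   # 0.00050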
> Is there a greedier smoothing algorithm that would produce better results?
> ...
To address issue 1, I suggest that we report higher bandwidths more accurately, and lower bandwidths less accurately.
Here's a sketch of how that could work:
Top 1%:   3+1 significant digits (900 geometric values per decile)
Next 9%:  2+1 significant digits (90 geometric values per decile)
Next 90%: 1+1 significant digits (9 geometric values per decile)
0-100:    set values: 0, 1, 5, 22 (4 geometric values for 4 deciles)

(0-100 is special, because the most common values are 0 and 20.)
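Here's a rough Python sketch of those tiers (how a relay's percentile rank would be determined is out of scope here; the boundaries and value counts are just the ones listed above):

# "percentile" is a relay's rank in the network by bandwidth, as a
# percentage (the top relay is near 0). Bandwidths of 0-100 KBps are
# handled separately, with the fixed set of values below.
LOW_BANDWIDTH_VALUES = (0, 1, 5, 22)    # for measurements of 0-100 KBps

def significant_digits(percentile):
    if percentile <= 1:
        return 3 + 1    # top 1%: 900 geometric values
    elif percentile <= 10:
        return 2 + 1    # next 9%: 90 geometric values
    else:
        return 1 + 1    # next 90%: 9 geometric values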
To address issue 2, I suggest that we round bandwidths geometrically, rather than linearly.
Here's a sketch of how that could work when rounding 100-1000 to 2 significant digits:
Normal linear rounding yields 90 values:
  100, 110, 120, ..., 980, 990, 1000
Using the linear midpoints (a+b)/2:
  105, 115, ..., 985, 995

Geometric rounding yields 90 values (using an extra significant digit):
  100, 103, 105, ..., 950, 975, 1000
Using the geometric midpoints sqrt(a*b):
  101, 104, ..., 962, 987

The maximum geometric error ratios vary by up to 2x, and that's due to the rounding to 3 significant digits (otherwise, all the ratios would be equal):

  100-103     0.01489
  ...
  975-1000    0.01274
But this is much better than linear rounding, where the ratios vary by up to 10x.
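Here's a minimal Python sketch of geometric rounding over one factor of 10 at 2+1 significant digits (an illustration of the idea only, not the attached script):

import math

# Generate count+1 geometrically spaced values between low and high, rounding
# each to sig_digits significant digits. With sig_digits = 3 ("2+1") over
# 100-1000 this gives the series 100, 103, 105, ..., 950, 975, 1000.
def geometric_series(low, high, count, sig_digits):
    ratio = (high / low) ** (1.0 / count)
    values = []
    for i in range(count + 1):
        exact = low * ratio ** i
        digits = sig_digits - int(math.floor(math.log10(exact))) - 1
        # the values in this example are all integers after rounding
        values.append(int(round(exact, digits)))
    return values

# Round a measurement to the nearest series value, comparing against the
# geometric midpoint sqrt(a*b) rather than the linear midpoint (a+b)/2.
# Assumes series[0] <= value <= series[-1].
def geometric_round(value, series):
    for a, b in zip(series, series[1:]):
        if a <= value <= b:
            return a if value < math.sqrt(a * b) else b
    return series[-1]

series = geometric_series(100.0, 1000.0, 90, 3)
print(series[:3], series[-3:])       # [100, 103, 105] [950, 975, 1000]
print(geometric_round(101, series))  # 100 (101 is below sqrt(100*103) ~= 101.5)
print(geometric_round(102, series))  # 103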
I've attached a script that calculates these geometric series: it's configured to calculate the 900 values for Top 1% rounding, their geometric error ratios, and their linear error differences.
(Changing precision to 2+1 and 1+1 gets you the 90 and 9 series, changing precision to 1, values to 3, and high to 100.0 gets you the tail-end series.)
This would be relatively easy to implement in C, or it could be precalculated and stored using ~2 kilobytes (1003 * uint16_t).
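Here's a sketch of that table lookup in Python (the real table has 900 + 90 + 9 + 4 = 1003 entries; the tiny list below is just a stand-in):

import bisect
import math

# Store the rounded values in one sorted array (each entry fits in a
# uint16_t), then rounding is a binary search plus a geometric-midpoint
# comparison. This TABLE is a tiny stand-in, not the real 1003-entry table.
TABLE = [0, 1, 5, 22, 100, 103, 105]

def round_to_table(value):
    i = bisect.bisect_left(TABLE, value)
    if i == 0:
        return TABLE[0]
    if i == len(TABLE):
        return TABLE[-1]
    low, high = TABLE[i - 1], TABLE[i]
    if low == 0:
        return high    # anything between 0 and 1 rounds up in this sketch
    return low if value < math.sqrt(low * high) else high

print(round_to_table(20))    # 22
print(round_to_table(101))   # 100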
And as we're also doing work on compression:
There are also opportunities for table-driven compression: each value can be specified using 10 bits for the table index, plus a few bits for the power of 10 of the multiplier:
Top:  750 Mbps * 1024 / 8 = 96000 KBps = 9599 * 10**1 (1 bit)
1%:   100 Mbps * 1024 / 8 = 12800 KBps =  129 * 10**2 (2 bits)
10%:   25 Mbps * 1024 / 8 =  3200 KBps =   36 * 10**2 (2 bits)
99%:    4 Mbps * 1024 / 8 =   512 KBps =   46 * 10**1 (1 bit)
End:              20 KBps =    20 KBps =   22 * 10**0 (1 bit)
6 extra bits (2 bytes total) would be sufficient to cover a 10,000-fold increase in maximum individual relay bandwidth (up to 8 Tbps).
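Here's a sketch of the packing, just to show the sizes work out (the field layout is arbitrary, and the index in the example is made up):

# Pack a 10-bit table index and a power-of-10 exponent into 16 bits.
# 10 + 6 = 16 bits (2 bytes), and a 6-bit exponent is far more headroom
# than the ~10,000-fold (10**4) growth mentioned above would need.
def pack(index, exponent):
    assert 0 <= index < 1024     # 10 bits: enough for 1003 table entries
    assert 0 <= exponent < 64    # 6 bits
    return (exponent << 10) | index

def unpack(word):
    return word & 0x3FF, word >> 10

# Example: the "Top" row above, 96000 KBps ~= 9599 * 10**1, where 9599 sits
# at some index in the precomputed table (the index here is made up).
word = pack(599, 1)
print(unpack(word))              # (599, 1)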
T
--
Tim Wilson-Brown (teor)

teor2345 at gmail dot com
PGP C855 6CED 5D90 A0C5 29F6 4D43 450C BA7F 968F 094B
ricochet:ekmygaiu4rzgsk6n
xmpp: teor at torproject dot org