Filename: 276-lower-bw-granularity.txt
Title: Report bandwidth with lower granularity in consensus documents
Author: Nick Mathewson
Created: 20-Feb-2017
Status: Open
Target: 0.3.1.x-alpha
1. Overview
This document proposes that, in order to limit the bandwidth needed for
networkstatus diffs, we lower the granularity with which bandwidth is
reported in consensus documents.
Making this change will reduce the total compressed ed diff download
volume by around 10%.
2. Motivation
Consensus documents currently report bandwidth values as the median
of the measured bandwidth values in the votes. (Or as the median of
all votes' values if there are not enough measurements.) And when
voting, in turn, authorities simply report whatever measured value
they most recently encountered, clipped to 3 significant base-10
figures.
This means that, from one consensus to the next, these weights change
very often and with little significance: A large fraction of bandwidth
transitions are under 2% in magnitude.
As we begin to use consensus diffs, each change will take space to
transmit. So lowering the number of changes will lower client
bandwidth requirements significantly.
3. Proposal
I propose that we round the bandwidth values as they are placed in
the votes to no more than two significant digits. In addition, for
values beginning with decimal "2" through "4", we should round the
first two digits to the nearest multiple of 2. For values beginning
with decimal "5" through "9", we should round to the nearest multiple
of 5.
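The rounding rule above can be sketched in Python as follows. This is
only an illustration, not the implementation used by the authorities.
The function name round_bandwidth is hypothetical, and two details are
assumptions that the proposal does not specify: ties are rounded
upward, and single-digit values are left unchanged.

```python
def round_bandwidth(value: int) -> int:
    """Round a non-negative bandwidth value as sketched in section 3.

    Assumptions (not stated in the proposal): ties round upward, and
    values below 10 are returned unchanged.
    """
    if value < 10:
        return value

    # Find the power of ten such that value // scale is a two-digit prefix.
    scale = 1
    while value >= 100 * scale:
        scale *= 10

    leading_digit = value // (10 * scale)
    if leading_digit == 1:
        step = 1      # plain two-significant-digit rounding
    elif leading_digit <= 4:
        step = 2      # first two digits go to a multiple of 2
    else:
        step = 5      # first two digits go to a multiple of 5

    unit = step * scale
    # Integer round-to-nearest multiple of unit (half rounds up).
    return ((value + unit // 2) // unit) * unit


for sample in (1234, 2345, 5678, 999):
    print(sample, "->", round_bandwidth(sample))
```

Under these assumptions, 1234 becomes 1200, 2345 becomes 2400, 5678
becomes 5500, and 999 becomes 1000.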
This change does not require a consensus method; it will take effect
once enough authorities have upgraded.
4. Analysis
The rounding proposed above will not change any value by more than
5%, so the overall impact on bandwidth balancing should be small.
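As a rough check of the 5% figure, the worst case in each band can be
computed directly. The sketch below assumes round-to-nearest within
each band and treats values as continuous. The names BANDS and
worst_relative_error are hypothetical. Within a band, the largest
relative error occurs at the first midpoint between two allowed
two-digit prefixes, since the absolute error there is half a step and
the value is as small as it can be.

```python
from fractions import Fraction

# (lowest two-digit prefix in the band, step between allowed prefixes)
BANDS = [(10, 1), (20, 2), (50, 5)]


def worst_relative_error(lowest_prefix: int, step: int) -> Fraction:
    """Relative error at the first midpoint above lowest_prefix."""
    half_step = Fraction(step, 2)
    return half_step / (lowest_prefix + half_step)


worst = max(worst_relative_error(lo, step) for lo, step in BANDS)
print(f"worst-case relative change: {float(worst):.2%}")
```

Under these assumptions, all three bands share the same worst case of
1/21, a little under 5%.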
In order to assess the bandwidth savings of this approach, I
smoothed the January 2017 consensus documents' Bandwidth fields,
using scripts from [1]. I found that if clients download
consensus diffs once an hour, they can expect 11-13% mean savings
after xz or gz compression. For two-hour intervals, the savings
is 8-10%; for three-hour or four-hour intervals, the savings is
only 6-8%. After that point, we start seeing diminishing returns,
with only 1-2% savings on a 72-hour interval's diff.
[1] https://github.com/nmathewson/consensus-diff-analysis
5. Open questions:
Is there a greedier smoothing algorithm that would produce better
results?
Is there any reason to think this amount of smoothing would not
be safe?
Would a time-aware smoothing mechanism work better?