Hi Nick,
This is from week 3:
On 30 Mar 2020, at 07:16, Nick Mathewson nickm@freehaven.net wrote:
There's a faster calculation and more efficient encoding for Weighted, because there's a common factor in each POS() calculation: (1 << 32) / total_weight
If we remove that factor, we end up with a much simpler algorithm, that's also more flexible:
Algorithm: Expanding a "Weighted" indexspec.
Each weighted indexspec also has a multiplier, which may vary
between indexspecs.
Let total_weight = SUM(indexspec.index_weights)
Verify total_weight * multiplier <= UINT64_MAX.
Let total_so_far = 0.
Let result_idx = {} (an empty mapping).
Define POS(b) = b * multiplier
For 0 <= i < LEN(indexspec.indexweights):
Let w = indexspec.indexweights[i].
Let lo = POS(total_so_far).
Let total_so_far = total_so_far + w.
Let hi = POS(total_so_far) - 1.
Append (lo, hi) => i to result_idx.
Return result_idx.
I'm not so sure about this one -- see the note below about the total value.
If multiplier is large, then the weights can be quite small.
(Small weights sacrifice some accuracy.) And if the multiplier
is 1, you effectively have a "Raw" index.
If you make those changes, you should be able to use a similar process to expand all the different index types. (After multiplying, truncating, or hashing, you either end up with a delta, or an absolute position. You can turn deltas into absolute positions, and then feed them all into the same base algorithm.)
There are also a few things I think might be bugs:
Was there meant to be a constraint that the Weighted total is UINT64_MAX? Or close to UINT64_MAX?
The actual intention is that it has to be " = UINT32_MAX", not" <= UINT32_MAX". Remember that the the client picks its next relay by specifying a random index value in range 0..UINT32_MAX, and it expects that the relay to that it is extending to _will_ have a snip that covers that range.
The fixed parameters don't make much sense otherwise.
Hmm, ok, then if we use multiplication, there are going to be some issues with the cumulative error from truncating the original division.
The error won't be too big, on average, it's N/2, worst case, it's N. (Where N is the number of indexes.) That's a tiny amount for large weights, but a huge amount for small weights. (Assuming that we continue to vote for relays with weights 1-100.)
But divisions also have some cumulative error.
Once you've revised these sections, let's review?
Only doing one division on the authorities is very appealing.
T