Hi Nick,
On 21 Mar 2020, at 05:38, Nick Mathewson nickm@torproject.org wrote:
Walking Onions: week 3 update
As you might recall, the SNIPs need to be nice and small, but they need to be completely self-contained. The ENDIVE, on the other hand, needs to be diff-friendly, and compressible. Relays need to derive all the SNIPs from the ENDIVE. The ENDIVE and SNIP formats that I worked on during last week [SNIPFMT] was designed to meet these criteria.
This week, I wrote up the algorithms to extract signed SNIPs from an ENDIVE. This is in section 4 [RECONST] of the specs in my work-in-progress repository [GITREPO] This was mainly straightforward consequence from the work I did last week, but there were a few things that turned up.
One trickier aspect of this work is that we want to move most of the decision-making to the authorities, and have the relays simply do an expansion algorithm. Updating all the relays would be slow, so relays need to be able to handle new index types and data types without fully understanding them. That means that we need to specify an algorithm that can work flexibly with many future kinds of data.
Another tricky aspect is that the data which the relays need to put into SNIPs all needs to be signed, so all of that data needs to come out the same when relays calculate it as the when the authorities calculate it. Thus, all the algorithms for building Merkle trees and indices and SNIPRouterData bodies need to be deterministic.
I think that I got all of this relatively right in [RECONST], but there are probably some mistakes hiding in there.
I'm hoping that there is a cleaner alternative to the weighted-index reconstruction algorithm -- the one that's there right now puts a lot of requirements on the input data, in order to avoid floating-point operations.
There's a shorter encoding for Raw:
for each [i, pos1, pos2] in index_ranges: w = pos2 - pos1 j = the index of pos1 among all sorted pos1s new_encoding = [i, j, w]
[i, j, w] is an efficient encoding if index_ranges is sparse compared with ENDIVERouterData, because: * j has the same cardinality as I * w is smaller than pos1 and pos2
If index_ranges is dense, there may be a more efficient encoding: add missing i with weight 0 drop j
With this encoding, you can drop a few of the constraints.
There's a faster calculation and more efficient encoding for Weighted, because there's a common factor in each POS() calculation: (1 << 32) / total_weight
If we remove that factor, we end up with a much simpler algorithm, that's also more flexible:
Algorithm: Expanding a "Weighted" indexspec.
Each weighted indexspec also has a multiplier, which may vary between indexspecs.
Let total_weight = SUM(indexspec.index_weights) Verify total_weight * multiplier <= UINT64_MAX.
Let total_so_far = 0. Let result_idx = {} (an empty mapping). Define POS(b) = b * multiplier
For 0 <= i < LEN(indexspec.indexweights): Let w = indexspec.indexweights[i]. Let lo = POS(total_so_far). Let total_so_far = total_so_far + w. Let hi = POS(total_so_far) - 1. Append (lo, hi) => i to result_idx.
Return result_idx.
If multiplier is large, then the weights can be quite small. (Small weights sacrifice some accuracy.) And if the multiplier is 1, you effectively have a "Raw" index.
If you make those changes, you should be able to use a similar process to expand all the different index types. (After multiplying, truncating, or hashing, you either end up with a delta, or an absolute position. You can turn deltas into absolute positions, and then feed them all into the same base algorithm.)
There are also a few things I think might be bugs:
Was there meant to be a constraint that the Weighted total is UINT64_MAX? Or close to UINT64_MAX? The fixed parameters don't make much sense otherwise.
I think v2 and v3 onion services assign descriptors to the next highest HSDir RSA id (or ed25519 hash). But INDEX_FROM_RING_KEYS() uses the relay input as the lowest value.
There is no next member for the last member in INDEX_FROM_RING_KEYS(), but the code asks for one. (Perhaps there are some typos here that make the code hard to understand.)
We'll need special treatment for ring wrapping. (That is, 0xFF… is not a real upper limit, it actually means, "use the lowest relay".)
It's weird to call a middle value a "suffix", but "infix" is also a bit of an unusual word.
There are also a bunch of typos, let me know when you're ready for copy-editing.
T