Hi Nick,

On 21 Mar 2020, at 05:38, Nick Mathewson <nickm@torproject.org> wrote:

Walking Onions: week 3 update

As you might recall, the SNIPs need to be nice and small, but they
need to be completely self-contained.  The ENDIVE, on the other
hand, needs to be diff-friendly, and compressible.  Relays need to
derive all the SNIPs from the ENDIVE.  The ENDIVE and SNIP formats
that I worked on during last week [SNIPFMT] was designed to meet these
criteria.

This week, I wrote up the algorithms to extract signed SNIPs from an
ENDIVE.  This is in section 4 [RECONST] of the specs in my
work-in-progress repository [GITREPO] This was mainly straightforward
consequence from the work I did last week, but there were a few
things that turned up.

One trickier aspect of this work is that we want to move most of the
decision-making to the authorities, and have the relays simply do an
expansion algorithm.  Updating all the relays would be slow, so relays
need to be able to handle new index types and data types without
fully understanding them.  That means that we need to specify an
algorithm that can work flexibly with many future kinds of data.

Another tricky aspect is that the data which the relays need to put
into SNIPs all needs to be signed, so all of that data needs to come
out the same when relays calculate it as the when the authorities
calculate it. Thus, all the algorithms for building Merkle trees and
indices and SNIPRouterData bodies need to be deterministic.

I think that I got all of this relatively right in [RECONST], but
there are probably some mistakes hiding in there.

I'm hoping that there is a cleaner alternative to the weighted-index
reconstruction algorithm -- the one that's there right now puts a
lot of requirements on the input data, in order to avoid
floating-point operations.

There's a shorter encoding for Raw:

for each [i, pos1, pos2] in index_ranges:
  w = pos2 - pos1
  j  = the index of pos1 among all sorted pos1s
  new_encoding = [i, j, w]

[i, j, w] is an efficient encoding if index_ranges is sparse compared
with ENDIVERouterData, because:
* j has the same cardinality as I
* w is smaller than pos1 and pos2

If index_ranges is dense, there may be a more efficient encoding:
  add missing i with weight 0
  drop j

With this encoding, you can drop a few of the constraints.

There's a faster calculation and more efficient encoding for
Weighted, because there's a common factor in each POS()
calculation:
(1 << 32) / total_weight

If we remove that factor, we end up with a much simpler algorithm,
that's also more flexible:

Algorithm: Expanding a "Weighted" indexspec.


Each weighted indexspec also has a multiplier, which may vary

between indexspecs.


Let total_weight = SUM(indexspec.index_weights)

Verify total_weight * multiplier <= UINT64_MAX.


Let total_so_far = 0.

Let result_idx = {} (an empty mapping).

Define POS(b) = b * multiplier


For 0 <= i < LEN(indexspec.indexweights):

   Let w = indexspec.indexweights[i].

   Let lo = POS(total_so_far).

   Let total_so_far = total_so_far + w.

   Let hi = POS(total_so_far) - 1.

   Append (lo, hi) => i to result_idx.


Return result_idx.


If multiplier is large, then the weights can be quite small.

(Small weights sacrifice some accuracy.) And if the multiplier

is 1, you effectively have a "Raw" index.


If you make those changes, you should be able to use a similar

process to expand all the different index types. (After multiplying,
truncating, or hashing, you either end up with a delta, or an
absolute position. You can turn deltas into absolute positions,
and then feed them all into the same base algorithm.)

There are also a few things I think might be bugs:

Was there meant to be a constraint that the Weighted total is

UINT64_MAX? Or close to UINT64_MAX?

The fixed parameters don't make much sense otherwise.


I think v2 and v3 onion services assign descriptors to the next
highest HSDir RSA id (or ed25519 hash). But
INDEX_FROM_RING_KEYS() uses the relay input as the lowest value.

There is no next member for the last member in
INDEX_FROM_RING_KEYS(), but the code asks for one.
(Perhaps there are some typos here that make the code hard to
understand.)

We'll need special treatment for ring wrapping. (That is, 0xFF… is not
a real upper limit, it actually means, "use the lowest relay".)

It's weird to call a middle value a "suffix", but "infix" is also a bit of an
unusual word.

There are also a bunch of typos, let me know when you're ready for
copy-editing.

T