Walking Onions status update: week 2 notes - tor-dev

13 Mar 2020


      Walking onions -- week 2 update
Hi!  On our current grant from the zcash foundation, I'm working on a
full specification for the Walking Onions design.  I'm going to try to
send out these updates once a week, in case anybody is interested.
My previous updates are linked below:
Week 1:
   formats, preliminaries, git repositories, binary diffs,
   metaformat decisions, and Merkle Tree trickery.
https://lists.torproject.org/pipermail/tor-dev/2020-March/014178.html
You might like to have a look at that update, and its references,
if this update doesn't make sense to you.
===
This week, I worked on specifying the nitty-gritty of the SNIP and
ENDIVE document formats.  I used the CBOR meta-format [CBOR] to
build them, and the CDDL specification language [CDDL] to specify
what they should contain.
As before, I've been working in a git repository at [GITREPO]; you
can see the document I've been focusing on this week at
[SNIPFMT].  (That's the thing to read if you want to send me
patches for my grammar.)
There were a few neat things to do here:
* I had to define SNIPs so that clients and relays can be
     mostly agnostic about whether we're using a merkle tree or a
     bunch of signatures.
* I had to define a binary diff format so that relays can keep
     on downloading diffs between ENDIVE documents. (Clients don't
     download ENDIVEs.)  I did a quick prototype of how to output
     this format, using Python's difflib.
* To make ENDIVE diffs as efficient as possible, it's important
     not to transmit data that changes in every ENDIVE.  To this
     end, I've specified ENDIVEs so that the most volatile parts
     (Merkle trees and index ranges) are recomputed on the relay
     side.  I still need to specify how these re-computations work,
     but I'm pretty sure I got the formats right.
     Doing this calculation should save relays a bunch of
     bandwidth each hour, but cost some implementation complexity.
     I'm going to have to come back to this choice going forward
     to see whether it's worth it.
* Some object types are naturally extensible, some aren't.  I've
     tried to err on the side of letting us expand important
     things in the future, and using maps (key->value mappings)
     for objects that are particularly important.
     In CBOR, small integers are encoded with a little less space
     than small strings.  To that end, I'm specifying the use of
     small integers for dictionary keys that need to be encoded
     briefly, and strings for non-tor and experimental extensions.
* This is a fine opportunity to re-think how we handle document
     liveness.  Right now, consensus directories have an official
     liveness interval on them, but parties that rely on
     consensuses tolerate larger variance than is specified in the
     consensus.  Instead of that approach, the usable lifetime of
     each object is now specified in the object, and is ultimately
     controlled by the authorities.  This gives the directory
     authorities more ability to work around network tolerance
     issues.
     Having large lifetime tolerances in the context of walking
     onions is a little risky: it opens us up to an attack where
     a hostile relay holds multiple ENDIVEs, and decides which one
     to use when responding to a request.  I think we can address this
     attack, however, by making sure that SNIPs have a published
     time in them, and that this time moves monotonically forward.
* As I work, I'm identifying other issues in tor that stand in
     the way of a good efficient walking onion implementation that
     will require other follow-up work.  This week I ran into a
     need for non-TAP-based v2 hidden services, and a need for a
     more efficient family encoding.  I'm keeping track of these
     in my outline file.
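As a rough illustration of the binary diff prototype mentioned in the
list above, here is a minimal sketch that uses Python's difflib to turn
two byte strings into copy/insert commands.  The command names, the
tuple layout, and the helper names make_diff and apply_diff are
assumptions made for illustration; they are not the diff format
defined in [SNIPFMT].

```python
import difflib


def make_diff(old: bytes, new: bytes):
    """Return a list of commands that rebuild `new` from `old`.

    Hypothetical command layout (not the format from the spec):
      ("copy", offset, length)  -- reuse bytes from the old document
      ("insert", data)          -- literal bytes carried in the diff
    """
    # autojunk=False keeps difflib from discarding frequent byte values
    # as "junk" on longer inputs.
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    commands = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            commands.append(("copy", i1, i2 - i1))
        elif tag in ("replace", "insert"):
            commands.append(("insert", new[j1:j2]))
        # "delete" needs no command: those old bytes are simply not copied.
    return commands


def apply_diff(old: bytes, commands) -> bytes:
    """Rebuild the new document from the old one plus the commands."""
    out = bytearray()
    for command in commands:
        if command[0] == "copy":
            _, offset, length = command
            out += old[offset:offset + length]
        else:
            out += command[1]
    return bytes(out)


old_doc = b"relay-A relay-B relay-C"
new_doc = b"relay-A relay-X relay-C"
assert apply_diff(old_doc, make_diff(old_doc, new_doc)) == new_doc
```

Only the inserted bytes travel with the diff; copied ranges are named
by offset and length, so any field that changes in every document
shows up as literal data every time.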
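The size difference between small-integer keys and string keys can be
checked by hand.  The sketch below encodes just two CBOR item types
(unsigned integers and text strings) following the initial-byte rules
in RFC 7049; it is not a full CBOR encoder, and the key name
"published" and the key number 3 are made-up examples rather than
values from the spec.

```python
def cbor_head(major_type: int, value: int) -> bytes:
    """Encode a CBOR initial byte plus any following argument bytes."""
    prefix = major_type << 5
    if value < 24:
        # Values 0..23 fit directly in the initial byte.
        return bytes([prefix | value])
    # Otherwise the argument follows in 1, 2, 4, or 8 big-endian bytes.
    for additional, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if value < 1 << (8 * size):
            return bytes([prefix | additional]) + value.to_bytes(size, "big")
    raise ValueError("value too large for CBOR")


def encode_uint(n: int) -> bytes:
    """Major type 0: unsigned integer."""
    if n < 0:
        raise ValueError("only unsigned integers are handled here")
    return cbor_head(0, n)


def encode_text(s: str) -> bytes:
    """Major type 3: UTF-8 text string, prefixed by its byte length."""
    data = s.encode("utf-8")
    return cbor_head(3, len(data)) + data


int_key = encode_uint(3)            # hypothetical integer map key
str_key = encode_text("published")  # hypothetical string map key
print(len(int_key), len(str_key))
```

Here the integer key takes 1 byte and the string key takes 10; any
integer from 0 through 23 fits in a single byte.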
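One way to picture the monotonic published-time idea from the liveness
item above is a client that remembers the newest published time it has
accepted and refuses anything older.  This is only a sketch of the
idea: the class name, the integer timestamps, and the choice to
compare against a single stored value are all assumptions, and the
text above only outlines the approach.

```python
from typing import Optional


class PublishedTimeTracker:
    """Hypothetical client-side check that published times never go back."""

    def __init__(self) -> None:
        # Newest published time accepted so far; None until the first SNIP.
        self.newest_seen: Optional[int] = None

    def accept(self, published: int) -> bool:
        """Return True and record the time unless it moves backwards."""
        if self.newest_seen is not None and published < self.newest_seen:
            # A relay answering from an older ENDIVE would land here.
            return False
        self.newest_seen = published
        return True


tracker = PublishedTimeTracker()
assert tracker.accept(1000)       # first SNIP seen
assert tracker.accept(1060)       # newer: accepted
assert not tracker.accept(1000)   # older than what we saw: rejected
```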
Fun fact: In number of bytes, the walking onions proposal is now
the 9th-longest proposal in the Tor proposal repository.  And it's
still growing!
Next week, I'm planning to specify ENDIVE reconstruction, circuit
extension, and maybe start on a specification for voting.
[CBOR] RFC 7049: "Concise Binary Object Representation (CBOR)"
    https://tools.ietf.org/html/rfc7049
[CDDL] RFC 8610: "Concise Data Definition Language (CDDL): A
    Notational Convention to Express Concise Binary Object
    Representation (CBOR) and JSON Data Structures"
    https://tools.ietf.org/html/rfc8610
[GITREPO]  https://github.com/nmathewson/walking-onions-wip
[SNIPFMT] https://github.com/nmathewson/walking-onions-wip/blob/master/specs/02-endive...