Hi Nick,
I'm interested in following along with Walking Onions, but I might drop out when the relay IPv6 work gets busy.
I'm not sure how you'd like feedback, so I'm going to try to put it in emails, or in pull requests.
(I made one comment on a git commit in walking-onions-wip, but I'm not sure if you see those, so I'll repeat it here.)
On 14 Mar 2020, at 03:52, Nick Mathewson <nickm@torproject.org> wrote:
This week, I worked on specifying the nitty-gritty of the SNIP and ENDIVE document formats. I used the CBOR meta-format [CBOR] to build them, and the CDDL specification language [CDDL] to specify what they should contain.
As before, I've been working in a git repository at [GITREPO]; you can see the document I've been focusing on this week at [SNIPFMT]. (That's the thing to read if you want to send me patches for my grammar.)
I'm not sure if you've got to exit ports yet, but here's one possible way to partition ports:
* choose large partitions so that all exits support all ports in the partition
* choose smaller categories so that most exits support most ports in the partition
* ignore small partitions; they're bad for client privacy anyway
For example, you might end up with:
* web (80 & 443)
* interactive (SSH, IRC, etc.)
* bulk (torrent, etc.)
* default exit policy
* reduced exit policy
I'm not sure if we will want separate categories for IPv4-only and dual-stack policies. We can probably ignore IPv6-only policies for the moment, but we should think about them in future.
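One way to test a candidate partition against real exit policies is to measure, for each category, what fraction of exits support every port in it. A minimal sketch, assuming exit policies have already been reduced to plain sets of accepted ports (real policies also involve address ranges and rule ordering); the category contents and helper names are hypothetical:

```python
from typing import Dict, FrozenSet, List

# Simplified model: an exit is just the set of ports it accepts.
Exit = FrozenSet[int]

def category_coverage(exits: List[Exit], category: FrozenSet[int]) -> float:
    """Fraction of exits that support every port in `category`."""
    if not exits:
        return 0.0
    full = sum(1 for ports in exits if category <= ports)
    return full / len(exits)

def supported_categories(ports: Exit, categories: Dict[str, FrozenSet[int]]) -> List[str]:
    """Names of the categories an exit could be listed under (all ports supported)."""
    return sorted(name for name, cat in categories.items() if cat <= ports)

# Hypothetical categories and exits, for illustration only.
CATEGORIES = {
    "web": frozenset({80, 443}),
    "interactive": frozenset({22, 6667}),
}

exits = [
    frozenset({80, 443, 22, 6667}),
    frozenset({80, 443}),
    frozenset({443}),
]
```

Here two of the three exits support the whole "web" category, so its coverage is 2/3; a category with low coverage is a candidate for splitting or dropping.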
There were a few neat things to do here:
I had to define SNIPs so that clients and relays can be mostly agnostic about whether we're using a Merkle tree or a bunch of signatures.
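On the Merkle side, a SNIP would carry an inclusion path that lets the verifier recompute a signed root. A minimal sketch of building such a tree, producing a path, and checking it, assuming SHA-256, 0x00/0x01 prefixes to separate leaves from interior nodes, and duplication of the last node on odd-sized levels (all hypothetical choices, not taken from the spec):

```python
import hashlib
from typing import List, Tuple

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def leaf_hash(leaf: bytes) -> bytes:
    return _h(b"\x00" + leaf)  # domain-separated leaf hash

def node_hash(left: bytes, right: bytes) -> bytes:
    return _h(b"\x01" + left + right)  # domain-separated interior hash

def build_levels(leaves: List[bytes]) -> List[List[bytes]]:
    """All tree levels, from leaf hashes up to the single root."""
    level = [leaf_hash(x) for x in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]  # duplicate the last node on odd levels
        level = [node_hash(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def inclusion_path(levels: List[List[bytes]], index: int) -> List[Tuple[bool, bytes]]:
    """Sibling hashes from leaf to root; the bool is True if the sibling is on the right."""
    path = []
    for level in levels[:-1]:
        sib = index ^ 1
        if sib >= len(level):
            sib = index  # duplicated last node is its own sibling
        path.append((sib >= index, level[sib]))
        index //= 2
    return path

def root_from_path(leaf: bytes, path: List[Tuple[bool, bytes]]) -> bytes:
    """Recompute the root a SNIP claims to belong to."""
    acc = leaf_hash(leaf)
    for sibling_on_right, sibling in path:
        acc = node_hash(acc, sibling) if sibling_on_right else node_hash(sibling, acc)
    return acc
```

A client would accept the SNIP if `root_from_path` matches a root carrying valid authority signatures; in the signature-only case the path is simply empty.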
I had to define a binary diff format so that relays can keep on downloading diffs between ENDIVE documents. (Clients don't download ENDIVEs.) I did a quick prototype of how to output this format, using Python's difflib.
Can we make the OrigBytesCmdId use start and length, rather than start and end? The length is never larger than the end offset, and is often much smaller, so it would usually encode in fewer bytes.
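For illustration, here is an independent sketch (not Nick's prototype) of turning difflib opcodes into copy-from-original and insert-literal commands, with copies expressed as (start, length). The command tuples are hypothetical stand-ins for the real command IDs:

```python
import difflib
from typing import List, Tuple, Union

# Hypothetical commands: ("orig", start, length) copies bytes from the old
# document; ("insert", data) appends literal new bytes.
Command = Union[Tuple[str, int, int], Tuple[str, bytes]]

def make_diff(old: bytes, new: bytes) -> List[Command]:
    sm = difflib.SequenceMatcher(None, old, new, autojunk=False)
    cmds: List[Command] = []
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            cmds.append(("orig", i1, i2 - i1))
        elif tag in ("replace", "insert"):
            cmds.append(("insert", new[j1:j2]))
        # "delete": the old bytes are simply not copied.
    return cmds

def apply_diff(old: bytes, cmds: List[Command]) -> bytes:
    out = bytearray()
    for cmd in cmds:
        if cmd[0] == "orig":
            _, start, length = cmd
            out += old[start:start + length]
        else:
            out += cmd[1]
    return bytes(out)
```

`autojunk=False` stops difflib from treating very common bytes as junk, which matters for long binary inputs.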
If we are doing chunk-based encoding, we could make start relative to the last position in the original file. But that would mean no back-tracking, which means we can't use some more sophisticated diff algorithms.
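The trade-off can be seen in a small sketch: if each copy's start is measured from the end of the previous copy, a diff that only moves forward yields small non-negative offsets, but one that jumps backwards needs a negative (signed) offset. With unsigned offsets, backtracking is ruled out. The helper names are hypothetical:

```python
from typing import List, Tuple

Copy = Tuple[int, int]  # (start or offset, length)

def to_relative(copies: List[Copy]) -> List[Copy]:
    """Rewrite absolute (start, length) copies as (offset, length), where
    offset is measured from the end of the previous copy in the original."""
    pos = 0
    out = []
    for start, length in copies:
        out.append((start - pos, length))
        pos = start + length
    return out

def to_absolute(rel: List[Copy]) -> List[Copy]:
    """Inverse of to_relative."""
    pos = 0
    out = []
    for offset, length in rel:
        start = pos + offset
        out.append((start, length))
        pos = start + length
    return out

def needs_backtracking(rel: List[Copy]) -> bool:
    """True if any copy starts before the end of the previous one."""
    return any(offset < 0 for offset, _ in rel)
```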
To make ENDIVE diffs as efficient as possible, it's important not to transmit data that changes in every ENDIVE. To this end, I've specified ENDIVEs so that the most volatile parts (Merkle trees and index ranges) are recomputed on the relay side. I still need to specify how these re-computations work, but I'm pretty sure I got the formats right.
Doing this calculation should save relays a bunch of bandwidth each hour, but cost some implementation complexity. I'm going to have to come back to this choice going forward to see whether it's worth it.
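Since relays recompute index ranges locally, the computation has to be deterministic so that every relay derives the same answer. A minimal sketch, assuming (hypothetically) a 2^32 index space split into contiguous ranges proportional to integer weights; integer arithmetic avoids floating-point disagreement between implementations:

```python
import bisect
from typing import List, Tuple

INDEX_SPACE = 2 ** 32  # hypothetical size of the index space

def index_ranges(weights: List[int]) -> List[Tuple[int, int]]:
    """Split [0, INDEX_SPACE) into half-open ranges, one per relay,
    with sizes proportional to the relays' weights."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("total weight must be positive")
    ranges = []
    cumulative = 0
    lo = 0
    for w in weights:
        cumulative += w
        hi = cumulative * INDEX_SPACE // total  # exact integer boundary
        ranges.append((lo, hi))
        lo = hi
    return ranges

def relay_for_index(ranges: List[Tuple[int, int]], idx: int) -> int:
    """Position of the relay whose range contains idx (zero-width ranges are skipped)."""
    return bisect.bisect_right([hi for _, hi in ranges], idx)
```

The last boundary is always exactly `INDEX_SPACE`, so the ranges cover the whole space with no gaps.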
Some object types are naturally extensible, some aren't. I've tried to err on the side of letting us expand important things in the future, and of using maps (key->value mappings) for objects that are particularly important.
In CBOR, small integers are encoded with a little less space than small strings. To that end, I'm specifying the use of small integers for dictionary keys that need to be encoded briefly, and strings for non-Tor and experimental extensions.
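To make the size difference concrete: in CBOR (RFC 7049, section 2) an unsigned integer below 24 fits entirely in the initial byte, while a text string costs a header byte plus its UTF-8 bytes. A minimal encoder for just these two types (the key name "published" is only an example):

```python
def cbor_head(major: int, value: int) -> bytes:
    """CBOR initial byte (major type in the top 3 bits) plus argument."""
    if value < 0:
        raise ValueError("argument must be unsigned")
    if value < 24:
        return bytes([(major << 5) | value])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if value < 1 << (8 * size):
            return bytes([(major << 5) | info]) + value.to_bytes(size, "big")
    raise ValueError("value too large for CBOR")

def cbor_uint(n: int) -> bytes:
    return cbor_head(0, n)  # major type 0: unsigned integer

def cbor_text(s: str) -> bytes:
    data = s.encode("utf-8")
    return cbor_head(3, len(data)) + data  # major type 3: text string
```

So a map key of `5` costs one byte, while `"published"` costs ten.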
This is a fine opportunity to re-think how we handle document liveness. Right now, consensus directories have an official liveness interval on them, but parties that rely on consensuses tolerate larger variance than is specified in the consensus. Instead of that approach, the usable lifetime of each object is now specified in the object, and is ultimately controlled by the authorities. This gives the directory authorities more ability to work around network tolerance issues.
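With the lifetime carried in the object itself, the client-side check needs no extra tolerance of its own. A minimal sketch, assuming a hypothetical lifetime of a published time plus how long before and after it the object stays usable, all chosen by the authorities:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Lifetime:
    # Hypothetical fields, in seconds; set by the authorities.
    published: int   # seconds since the epoch
    pre_valid: int   # usable this long before `published`
    post_valid: int  # usable this long after `published`

    def usable_at(self, now: int) -> bool:
        """True if `now` falls inside the object's own usable window."""
        return self.published - self.pre_valid <= now <= self.published + self.post_valid
```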
Having large lifetime tolerances in the context of walking onions is a little risky: it opens us up to an attack where a hostile relay holds multiple ENDIVEs, and decides which one to use when responding to a request. I think we can address this attack, however, by making sure that SNIPs have a published time in them, and that this time moves monotonically forward.
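One way a client might enforce the monotonic published time: remember the newest published time it has seen on any SNIP, and reject SNIPs that are older than that by more than some slack. A minimal sketch; the class name and the slack parameter are hypothetical:

```python
from typing import Optional

class PublishedTimeFilter:
    """Reject SNIPs whose published time is more than `slack` seconds
    older than the newest published time seen so far."""

    def __init__(self, slack: int) -> None:
        self.slack = slack
        self.latest: Optional[int] = None

    def accept(self, published: int) -> bool:
        if self.latest is not None and published < self.latest - self.slack:
            return False  # stale: probably served from an older ENDIVE
        if self.latest is None or published > self.latest:
            self.latest = published
        return True
```

A hostile relay can still pick among ENDIVEs inside the slack window, so the slack bounds how much choice it has.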
If the issue is having multiple valid ENDIVEs, then authorities could also put a cap on the number of concurrently valid ENDIVEs.
There are two simple schemes to implement a cap:
* set a longer interval for rebuilding all ENDIVEs (the cap is the validity interval, divided by the rebuild interval)
* refuse to sign a new SNIP for a relay that's rapidly changing (or equivalently, leave that relay out of the next ENDIVE)
Both these schemes also limit the amount of bandwidth spent on a relay whose details are changing rapidly.
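A quick check of the first scheme's bound: if a new ENDIVE is published every R seconds and each stays valid for V seconds, at most ceil(V / R) are valid at any moment. A sketch that computes the bound and confirms it by brute force over a simple hypothetical schedule:

```python
def max_concurrent_endives(validity: int, rebuild: int) -> int:
    """ceil(validity / rebuild), using integer arithmetic."""
    return (validity + rebuild - 1) // rebuild

def count_valid_at(t: int, validity: int, rebuild: int, n_endives: int) -> int:
    """ENDIVE k is published at k * rebuild and valid for
    [k * rebuild, k * rebuild + validity)."""
    return sum(
        1 for k in range(n_endives)
        if k * rebuild <= t < k * rebuild + validity
    )
```

For example, with hourly rebuilds and a 2.5-hour validity, a relay can hold at most 3 valid ENDIVEs at once.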
- As I work, I'm identifying other issues in tor that stand in the way of a good, efficient Walking Onions implementation and that will require follow-up work. This week I ran into a need for non-TAP-based v2 hidden services, and a need for a more efficient family encoding. I'm keeping track of these in my outline file.
Do "tricky restrictions" include the IP subnet restriction (avoid relays in the same IPv4 /16 and IPv6 /32)?
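For reference, the subnet restriction itself is a small check. A minimal sketch using Python's standard `ipaddress` module; treating relays as lists of address strings, and the helper names, are assumptions for illustration:

```python
import ipaddress
from typing import Iterable

def same_subnet(a: str, b: str) -> bool:
    """True if two addresses share an IPv4 /16 or an IPv6 /32."""
    ia, ib = ipaddress.ip_address(a), ipaddress.ip_address(b)
    if ia.version != ib.version:
        return False
    prefix = 16 if ia.version == 4 else 32
    net = ipaddress.ip_network(f"{ia}/{prefix}", strict=False)
    return ib in net

def relays_conflict(addrs_a: Iterable[str], addrs_b: Iterable[str]) -> bool:
    """Two relays conflict if any pair of their addresses share a subnet."""
    addrs_b = list(addrs_b)
    return any(same_subnet(x, y) for x in addrs_a for y in addrs_b)
```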
What about a heterogeneous IPv4 / IPv6 network, where IPv4-only relays can't connect to IPv6-only relays?
If we do decide to add IPv6-only relays, we'll probably add them in this order:
* IPv6-only bridges (needs dual-stack bridge guards / middles?)
* IPv6-only exits (needs dual-stack middles)
* IPv6-only guards (needs dual-stack middles)
* IPv6-only middles (needs dual-stack or IPv6-only guards and exits, removes need for dual-stack middles)
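The dependencies above come from a simple rule: each relay-to-relay link needs an address family both ends support. A minimal sketch of that rule (it models only links between relays, not client-to-guard or exit-to-destination connections):

```python
from typing import FrozenSet, List

# Address families a relay supports: {4}, {6}, or {4, 6}.
Families = FrozenSet[int]

V4: Families = frozenset({4})
V6: Families = frozenset({6})
DUAL: Families = frozenset({4, 6})

def path_connectable(path: List[Families]) -> bool:
    """True if every hop shares at least one address family with the next."""
    return all(a & b for a, b in zip(path, path[1:]))
```

For example, `[V4, DUAL, V6]` (IPv4-only guard, dual-stack middle, IPv6-only exit) is connectable, but `[V4, V6, DUAL]` is not.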
What about bridge guards? (That is, can bridges add an extra hop into circuits, to protect themselves from being discovered by middles?)
Maybe bridges could commit to their (blinded) bridge guards in their own self-signed SNIP? Or the bridge authority could distribute a bridge ENDIVE? (We might need multiple bridge authorities for redundancy.)
[CBOR] RFC 7049: "Concise Binary Object Representation (CBOR)" https://tools.ietf.org/html/rfc7049
[CDDL] RFC 8610: "Concise Data Definition Language (CDDL): A Notational Convention to Express Concise Binary Object Representation (CBOR) and JSON Data Structures" https://tools.ietf.org/html/rfc8610
[GITREPO] https://github.com/nmathewson/walking-onions-wip
[SNIPFMT] https://github.com/nmathewson/walking-onions-wip/blob/master/specs/02-endive...