Walking onions -- week 2 update
Hi! On our current grant from the zcash foundation, I'm working on a full specification for the Walking Onions design. I'm going to try to send these out thee updates once a week, in case anybody is interested.
My previous updates are linked below:
Week 1: formats, preliminaries, git repositories, binary diffs, metaformat decisions, and Merkle Tree trickery.
https://lists.torproject.org/pipermail/tor-dev/2020-March/014178.html
You might like to have a look at that update, and its references, if this update doesn't make sense to you.
===
This week, I worked specifying the nitty-gritty of the SNIP and ENDIVE document formats. I used the CBOR meta-format [CBOR] to build them, and the CDDL specification language [CDDL] to specify what they should contain.
As before, I've been working in a git repository at [GITHUB]; you can see the document I've been focusing on this week at [SNIPFMT]. (That's the thing to read if you want to send me patches for my grammar.)
There were a few neat things to do here:
* I had to define SNIPs so that clients and relays can be mostly agnostic about whether we're using a merkle tree or a bunch of signatures.
* I had to define a binary diff format so that relays can keep on downloading diffs between ENDIVE documents. (Clients don't download ENDIVEs). I did a quick prototype of how to output this format, using python's difflib.
* To make ENDIVE diffs as efficient as possible, it's important not to transmit data that changes in every ENDIVE. To this end, I've specified ENDIVEs so that the most volatile parts (Merkle trees and index ranges) are recomputed on the relay side. I still need to specify how these re-computations work, but I'm pretty sure I got the formats right.
Doing this calculation should save relays a bunch of bandwidth each hour, but cost some implementation complexity. I'm going to have to come back to this choice going forward to see whether it's worth it.
* Some object types are naturally extensible, some aren't. I've tried to err on the size of letting us expand important things in the future, and using maps (key->value mappings) for object that are particularly important.
In CBOR, small integers are encoded with a little less space than small strings. To that end, I'm specifying the use of small integers for dictionary keys that need to be encoded briefly, and strings for non-tor and experimental extensions.
* This is a fine opportunity to re-think how we handle document liveness. Right now, consensus directories have an official liveness interval on them, but parties that rely on consensuses tolerate larger variance than is specified in the consensus. Instead of that approach, the usable lifetime of each object is now specified in the object, and is ultimately controlled by the authorities. This gives the directory authorities more ability to work around network tolerance issues.
Having large lifetime tolerances in the context of walking onions is a little risky: it opens us up to an attack where a hostile relay holds multiple ENDIVEs, and decides which one to use when responding to a request. I think we can address this attack, however, by making sure that SNIPs have a published time in them, and that this time moves monotonically forward.
* As I work, I'm identifying other issues in tor that stand in the way of a good efficient walking onion implementation that will require other follow-up work. This week I ran into a need for non-TAP-based v2 hidden services, and a need for a more efficient family encoding. I'm keeping track of these in my outline file.
Fun fact: In number of bytes, the walking onions proposal is now the 9th-longest proposal in the Tor proposal repository. And it's still growing!
Next week, I'm planning to specify ENDIVE reconstruction, circuit extension, and maybe start on a specification for voting.
[CBOR] RFC 7049: "Concise Binary Object Representation (CBOR)" https://tools.ietf.org/html/rfc7049b
[CDDL] RFC 8610: "Concise Data Definition Language (CDDL): A Notational Convention to Express Concise Binary Object Representation (CBOR) and JSON Data Structures" https://tools.ietf.org/html/rfc8610
[GITREPO] https://github.com/nmathewson/walking-onions-wip
[SNIPFMT] https://github.com/nmathewson/walking-onions-wip/blob/master/specs/02-endive...
Hi Nick,
I'm interested in following along with Walking Onions, but I might drop out when the relay IPv6 work gets busy.
I'm not sure how you'd like feedback, so I'm going to try to put it in emails, or in pull requests.
(I made one comment on a git commit in walking-onions-wip, but I'm not sure if you see those, so I'll repeat it here.)
On 14 Mar 2020, at 03:52, Nick Mathewson nickm@torproject.org wrote:
This week, I worked specifying the nitty-gritty of the SNIP and ENDIVE document formats. I used the CBOR meta-format [CBOR] to build them, and the CDDL specification language [CDDL] to specify what they should contain.
As before, I've been working in a git repository at [GITHUB]; you can see the document I've been focusing on this week at [SNIPFMT]. (That's the thing to read if you want to send me patches for my grammar.)
I'm not sure if you've got to exit ports yet, but here's one possible way to partition ports: * choose large partitions so that all exits support all ports in the partition * choose smaller categories so that most exits support most ports in the partition * ignore small partitions, they're bad for client privacy anyway
For example, you might end up with: * web (80 & 443) * interactive (SSH, IRC, etc.) * bulk (torrent, etc.) * default exit policy * reduced exit policy
I'm not sure if we will want separate categories for IPv4-only and dual-stack policies. We can probably ignore IPv6-only policies for the moment, but we should think about them in future.
There were a few neat things to do here:
I had to define SNIPs so that clients and relays can be mostly agnostic about whether we're using a merkle tree or a bunch of signatures.
I had to define a binary diff format so that relays can keep on downloading diffs between ENDIVE documents. (Clients don't download ENDIVEs). I did a quick prototype of how to output this format, using python's difflib.
Can we make the OrigBytesCmdId use start and length? length may be shorter than end, and it will never be longer.
If we are doing chunk-based encoding, we could make start relative to the last position in the original file. But that would mean no back-tracking, which means we can't use some more sophisticated diff algorithms.
To make ENDIVE diffs as efficient as possible, it's important not to transmit data that changes in every ENDIVE. To this end, I've specified ENDIVEs so that the most volatile parts (Merkle trees and index ranges) are recomputed on the relay side. I still need to specify how these re-computations work, but I'm pretty sure I got the formats right.
Doing this calculation should save relays a bunch of bandwidth each hour, but cost some implementation complexity. I'm going to have to come back to this choice going forward to see whether it's worth it.
Some object types are naturally extensible, some aren't. I've tried to err on the size of letting us expand important things in the future, and using maps (key->value mappings) for object that are particularly important.
In CBOR, small integers are encoded with a little less space than small strings. To that end, I'm specifying the use of small integers for dictionary keys that need to be encoded briefly, and strings for non-tor and experimental extensions.
This is a fine opportunity to re-think how we handle document liveness. Right now, consensus directories have an official liveness interval on them, but parties that rely on consensuses tolerate larger variance than is specified in the consensus. Instead of that approach, the usable lifetime of each object is now specified in the object, and is ultimately controlled by the authorities. This gives the directory authorities more ability to work around network tolerance issues.
Having large lifetime tolerances in the context of walking onions is a little risky: it opens us up to an attack where a hostile relay holds multiple ENDIVEs, and decides which one to use when responding to a request. I think we can address this attack, however, by making sure that SNIPs have a published time in them, and that this time moves monotonically forward.
If the issue is having multiple valid ENDIVEs, then authorities could also put a cap on the number of concurrently valid ENDIVEs.
There are two simple schemes to implement a cap: * set a longer interval for rebuilding all ENDIVEs (the cap is the rebuild interval, divided by the validity interval) * refuse to sign a new SNIP for a relay that's rapidly changing (or equivalently, leave that relay out of the next ENDIVE)
Both these schemes also limit the amount of bandwidth used for a relay that's rapidly changing details.
- As I work, I'm identifying other issues in tor that stand in the way of a good efficient walking onion implementation that will require other follow-up work. This week I ran into a need for non-TAP-based v2 hidden services, and a need for a more efficient family encoding. I'm keeping track of these in my outline file.
Do "tricky restrictions" include the IP subnet restriction (avoid relays in the same IPv4 /16 and IPv6 /32) ?
What about a heterogenous IPv4 / IPv6 network, where IPv4-only relays can't connect to IPv6-only relays?
If we do decide to add IPv6-only relays, we'll probably add them in this order: * IPv6-only bridges (needs dual-stack bridge guards / middles?) * IPv6-only exits (needs dual-stack middles) * IPv6-only guards (needs dual-stack middles) * IPv6-only middles (needs dual-stack or IPv6-only guards and exits, removes need for dual-stack middles)
What about bridge guards? (That is, can bridges add an extra hop into circuits, to protect themselves from being discovered by middles?)
Maybe bridges could commit to their (blinded) bridge guards in their self-signed own snip? Or the bridge authority could distribute a bridge ENDIVE? (We might need multiple bridge authorities for redundancy.)
[CBOR] RFC 7049: "Concise Binary Object Representation (CBOR)" https://tools.ietf.org/html/rfc7049b
[CDDL] RFC 8610: "Concise Data Definition Language (CDDL): A Notational Convention to Express Concise Binary Object Representation (CBOR) and JSON Data Structures" https://tools.ietf.org/html/rfc8610
[GITREPO] https://github.com/nmathewson/walking-onions-wip
[SNIPFMT] https://github.com/nmathewson/walking-onions-wip/blob/master/specs/02-endive...
Hi Nick,
On 14 Mar 2020, at 14:44, teor teor@riseup.net wrote:
- As I work, I'm identifying other issues in tor that stand in the way of a good efficient walking onion implementation that will require other follow-up work. This week I ran into a need for non-TAP-based v2 hidden services, and a need for a more efficient family encoding. I'm keeping track of these in my outline file.
Here's another issue you might want to consider:
Currently, new relays get in the consensus as soon as: * they post their descriptors, and * a majority of authorities can contact their ORPorts.
That means clients and relays waste a whole bunch of bandwidth downloading consensus info and descriptors for relays with very low weights.
Instead, we could have two documents:
* a "potential relays" document, that's used by the authorities, bandwidth scanner, and any other relay test infrastructure (perhaps exit scanners, sybil scanners, and other tools), and
* a "useful relays" document, that contains good relays with reasonable weights.
Let's think about this kind of efficiency as part of walking onions.
We might even be able to make this change before walking onions, by: * making sbws and other tools use the ns consensus (I think most tools already use the ns consensus), and * adding a new consensus method, which requires a minimum consensus weight (or consensus weight fraction) for relays in the microdesc consensus.
Since relays and clients use the microdesc consensus, low-weight relays would disappear from the network. But they would still be in the ns consensus.
We might find some interesting bugs in tor though. We never quite got rid of all the old code that uses the ns consensus and full descriptors.
T
On Wed, Mar 18, 2020 at 11:21 AM teor teor@riseup.net wrote:
Hi Nick,
On 14 Mar 2020, at 14:44, teor teor@riseup.net wrote:
- As I work, I'm identifying other issues in tor that stand in the way of a good efficient walking onion implementation that will require other follow-up work. This week I ran into a need for non-TAP-based v2 hidden services, and a need for a more efficient family encoding. I'm keeping track of these in my outline file.
Here's another issue you might want to consider:
Currently, new relays get in the consensus as soon as:
- they post their descriptors, and
- a majority of authorities can contact their ORPorts.
That means clients and relays waste a whole bunch of bandwidth downloading consensus info and descriptors for relays with very low weights.
Instead, we could have two documents:
Thanks for this! This dovetails nicely with some of the voting design work I'm up to right now.
Hi Nick,
On 20 Mar 2020, at 23:01, Nick Mathewson nickm@freehaven.net wrote:
On Wed, Mar 18, 2020 at 11:21 AM teor teor@riseup.net wrote:
On 14 Mar 2020, at 14:44, teor teor@riseup.net wrote:
- As I work, I'm identifying other issues in tor that stand in the way of a good efficient walking onion implementation that will require other follow-up work. This week I ran into a need for non-TAP-based v2 hidden services, and a need for a more efficient family encoding. I'm keeping track of these in my outline file.
Here's another issue you might want to consider:
Currently, new relays get in the consensus as soon as:
- they post their descriptors, and
- a majority of authorities can contact their ORPorts.
That means clients and relays waste a whole bunch of bandwidth downloading consensus info and descriptors for relays with very low weights.
Instead, we could have two documents:
Thanks for this! This dovetails nicely with some of the voting design work I'm up to right now.
It would be great to have a protocol that doesn't depend on: * time synchronisation * big documents * one-shot updates * absolute consistency
We've already made vote timing a bit more robust in the tor master branch, by ignoring late votes: https://trac.torproject.org/projects/tor/ticket/4631
Here's a few other tweaks that might help:
Tor halves the consensus interval when there is no fresh consensus. But changing the interval makes tor's code much more complex. Instead, let's have a fixed consensus interval. And make it long enough for efficiency, but short enough to recover from a failed consensus.
Let's support vote diffs, as well as consensus diffs. Vote diffs don't help when posting votes. But when requesting votes, authorities can include hashes of votes they already have. That way, authorities that are under heavy load are more likely to get all the votes.
We could increase the time that authorities have to fetch votes, and make them retry fetches every few minutes.
We could do consistency checks on smaller shards, so that a consistency failure in any one document does not break the entire consensus.
We could create a shard for each supported consensus method (like we do microdescriptors). That way, a consistency failure in any one consensus method does not break the entire consensus.
We could make shards valid for a longer time, so that if the replacement shard does not reach consensus, the older one is used.
Then, the final documents are a combination of all the consistent shards, using the highest consistent consensus method. (Much like the current microdesc consensus.)
Once we've made some of those changes, then some other changes become plausible:
Let's make votes valid for exactly 2 voting periods, and use the latest available vote from each authority.
Currently, each consensus can have one of two inputs from each authority: * the current vote, or * no vote. If a majority of authorities don't vote, then the consensus fails. (And if enough bandwidth authorities don't vote, then measured bandwidths fail.)
If there are up to two valid votes during a voting period, then each consensus can have one of three inputs from each authority: * the current vote, or * the vote before the current vote, or * no vote. Having 3 possible choices is slightly worse than having 2 choices.
But with the changes above, authorities are more likely to have the latest vote from each other authority. And having similar votes will be enough for most of the shards to be consistent, for most consensus methods.
If a majority of authorities don't have any valid votes, then the consensus fails. But that's much less likely when there are two valid votes at any one time.
We could also make each authority construct its own merkle root(s), and allow the N most popular/recent roots on the network. (Or equivalently, allow roots with current signatures from M authorities.)
We could split votes into shards as well, and make authorities exchange them like they exchange relay descriptors? (When they see a reference to a new vote shard, they try to download it from all other authorities.)
We'd need extra monitoring, to make sure that diffs, authorities, shards, consensus methods, or latest votes aren't consistently broken.
Maybe there's a few more steps we could take, and then we'd have a voting protocol that doesn't require strict time synchronisation. Where updates just happen as authorities make them available, rather than all at once.
T
On Fri, Mar 20, 2020 at 12:38 PM teor teor@riseup.net wrote:
Hi Nick,
On 20 Mar 2020, at 23:01, Nick Mathewson nickm@freehaven.net wrote:
On Wed, Mar 18, 2020 at 11:21 AM teor teor@riseup.net wrote:
On 14 Mar 2020, at 14:44, teor teor@riseup.net wrote:
- As I work, I'm identifying other issues in tor that stand in the way of a good efficient walking onion implementation that will require other follow-up work. This week I ran into a need for non-TAP-based v2 hidden services, and a need for a more efficient family encoding. I'm keeping track of these in my outline file.
Here's another issue you might want to consider:
Currently, new relays get in the consensus as soon as:
- they post their descriptors, and
- a majority of authorities can contact their ORPorts.
That means clients and relays waste a whole bunch of bandwidth downloading consensus info and descriptors for relays with very low weights.
Instead, we could have two documents:
Thanks for this! This dovetails nicely with some of the voting design work I'm up to right now.
It would be great to have a protocol that doesn't depend on:
- time synchronisation
- big documents
- one-shot updates
- absolute consistency
Hi! I think that _some_ of these changes are orthogonal to the shifts we'll need for walking onions, but some of them do dovetail nicely. I'm going to keep this email as a note to myself, so that I can double-check whether any of these items are precluded by choices I'm making in the design space.
We've already made vote timing a bit more robust in the tor master branch, by ignoring late votes: https://trac.torproject.org/projects/tor/ticket/4631
Here's a few other tweaks that might help:
Tor halves the consensus interval when there is no fresh consensus. But changing the interval makes tor's code much more complex. Instead, let's have a fixed consensus interval. And make it long enough for efficiency, but short enough to recover from a failed consensus.
This makes sense; I'd forgotten that we did this.
Let's support vote diffs, as well as consensus diffs. Vote diffs don't help when posting votes. But when requesting votes, authorities can include hashes of votes they already have. That way, authorities that are under heavy load are more likely to get all the votes.
Actually, we *can* do them when posting votes; I've sketched out a design here in the work-in-progress voting spec.
We could increase the time that authorities have to fetch votes, and make them retry fetches every few minutes.
We could do consistency checks on smaller shards, so that a consistency failure in any one document does not break the entire consensus.
This seems like something we could do as a followup here? It does seem like it could get pretty complex.
We could create a shard for each supported consensus method (like we do microdescriptors). That way, a consistency failure in any one consensus method does not break the entire consensus.
Hm, interesting. This would involve generating multiple consensuses with different methods? It sounds orthogonal to me, but quite possibly a good idea. But we'd have to withhold signatures on older consensus methods until we're sure that the newer ones are failing -- and I worry about attacks where we _cause_ a newer consensus method to be inconsistent specifically so we can force an older one.
I also worry about mix-and-match attacks on the different shards.
We could make shards valid for a longer time, so that if the replacement shard does not reach consensus, the older one is used.
Then, the final documents are a combination of all the consistent shards, using the highest consistent consensus method. (Much like the current microdesc consensus.)
Once we've made some of those changes, then some other changes become plausible:
Let's make votes valid for exactly 2 voting periods, and use the latest available vote from each authority.
Hmmm. So this doesn't help us much in the case when an authority goes down -- it just means that its vote counts for an extra hour more than it would otherwise. That would buy us an hour of extra consensus time -- but it would only be a big improvement if the authority is -regularly- failing to upload a vote. For that case, I think it might make more sense to address the reliability problem directly instead of trying to work around it.
[...]
I dig the idea of dropping time synchronization if we can; let's look into this more. It seems like something we can do orthogonally to walking onions, though?
On Sat, Mar 14, 2020 at 12:44 AM teor teor@riseup.net wrote:
Hi Nick,
I'm interested in following along with Walking Onions, but I might drop out when the relay IPv6 work gets busy.
I'm not sure how you'd like feedback, so I'm going to try to put it in emails, or in pull requests.
(I made one comment on a git commit in walking-onions-wip, but I'm not sure if you see those, so I'll repeat it here.)
Thanks, this is really helpful! I missed the repository comments, and I'll probably miss some more.
On 14 Mar 2020, at 03:52, Nick Mathewson nickm@torproject.org wrote:
This week, I worked specifying the nitty-gritty of the SNIP and ENDIVE document formats. I used the CBOR meta-format [CBOR] to build them, and the CDDL specification language [CDDL] to specify what they should contain.
As before, I've been working in a git repository at [GITHUB]; you can see the document I've been focusing on this week at [SNIPFMT]. (That's the thing to read if you want to send me patches for my grammar.)
I'm not sure if you've got to exit ports yet, but here's one possible way to partition ports:
- choose large partitions so that all exits support all ports in the partition
- choose smaller categories so that most exits support most ports in the partition
- ignore small partitions, they're bad for client privacy anyway
For example, you might end up with:
- web (80 & 443)
- interactive (SSH, IRC, etc.)
- bulk (torrent, etc.)
- default exit policy
- reduced exit policy
I'm not sure if we will want separate categories for IPv4-only and dual-stack policies. We can probably ignore IPv6-only policies for the moment, but we should think about them in future.
Interesting! Yeah, something like this might work. I've added this to my notes.
Also, there's some interesting ideas about handling exit policies in the whitepaper's section 6. I don't know if
There were a few neat things to do here:
I had to define SNIPs so that clients and relays can be mostly agnostic about whether we're using a merkle tree or a bunch of signatures.
I had to define a binary diff format so that relays can keep on downloading diffs between ENDIVE documents. (Clients don't download ENDIVEs). I did a quick prototype of how to output this format, using python's difflib.
Can we make the OrigBytesCmdId use start and length? length may be shorter than end, and it will never be longer.
Good idea; done.
If we are doing chunk-based encoding, we could make start relative to the last position in the original file. But that would mean no back-tracking, which means we can't use some more sophisticated diff algorithms.
Well, we could allow signed integers. I'm making a note to look into whether this would help much.
[...]
If the issue is having multiple valid ENDIVEs, then authorities could also put a cap on the number of concurrently valid ENDIVEs.
There are two simple schemes to implement a cap:
- set a longer interval for rebuilding all ENDIVEs (the cap is the rebuild interval, divided by the validity interval)
- refuse to sign a new SNIP for a relay that's rapidly changing (or equivalently, leave that relay out of the next ENDIVE)
Both these schemes also limit the amount of bandwidth used for a relay that's rapidly changing details.
Interesting idea; I think in the case of the first one, we'd be giving up something important, but I don't know how much so. The second one might actually help with our network stability, though.
[...]
Do "tricky restrictions" include the IP subnet restriction (avoid relays in the same IPv4 /16 and IPv6 /32) ?
I'm thinking of _all_ tricky restrictions, including but not limited to IP subnets, families, user settings, and
What about a heterogenous IPv4 / IPv6 network, where IPv4-only relays can't connect to IPv6-only relays?
This one would fit more into "alternative topologies", but I think the design can handle that. (See proposal 300 section 3.9.)
The way it would work is, you put IPv4-only relays into group A, dual-stack relays in group B, and IPv6 relays into group C.
Then you give them different successor lists, so that A has successor in A and B, C has successors in B and C, and B can have all successors.
If we do decide to add IPv6-only relays, we'll probably add them in this order:
- IPv6-only bridges (needs dual-stack bridge guards / middles?)
- IPv6-only exits (needs dual-stack middles)
- IPv6-only guards (needs dual-stack middles)
- IPv6-only middles (needs dual-stack or IPv6-only guards and exits, removes need for dual-stack middles)
What about bridge guards? (That is, can bridges add an extra hop into circuits, to protect themselves from being discovered by middles?)
Yes, that should still work with the base design. I'll need to think more about how it would work in non-clique topologies, though.