Filename: 300-walking-onions.txt
Title: Walking Onions: Scaling and Saving Bandwidth
Author: Nick Mathewson
Created: 5-Feb-2019
Status: Draft
0. Status
This proposal describes a mechanism called "Walking Onions" for scaling the Tor network and reducing the amount of client bandwidth used to maintain a client's view of the Tor network.
This is a draft proposal; there are problems left to be solved and questions left to be answered. Once those parts are done, we can fill in section 4 with the final details of the design.
1. Introduction
In the current Tor network design, we assume that every client has a complete view of all the relays in the network. To achieve this, clients download consensus directories at regular intervals, and download descriptors for every relay listed in the directory.
The substitution of microdescriptors for regular descriptors (proposal 158) and the use of consensus diffs (proposal 140) have lowered the bytes that clients must dedicate to directory operations. But we still face the problem that, if we force each client to know about every relay in the network, each client's directory traffic will grow linearly with the number of relays in the network.
Another drawback in our current system is that client directory traffic is front-loaded: clients need to fetch an entire directory before they begin building circuits. This places extra delays on clients, and extra load on the network.
To anonymize the world, we will need to scale to a much larger number of relays and clients: requiring clients to know about every relay in the set simply won't scale, and requiring every new client to download a large document is also problematic.
There are obvious responses here, and some other anonymity tools have taken them. It's possible to have a client only use a fraction of the relays in a network--but doing so opens the client to _epistemic attacks_, in which the difference in clients' views of the network is used to partition their traffic. It's also possible to move the problem of selecting relays from the client to the relays themselves, and let each relay select the next relay in turn--but this choice opens the client to _route capture attacks_, in which a malicious relay selects only other malicious relays.
In this proposal, I'll describe a design for eliminating up-front client directory downloads. Clients still choose relays at random, but without ever having to hold a list of all the relays. This design does not require clients to trust relays any more than they do today, or open clients to epistemic attacks.
I hope to maintain feature parity with the current Tor design; I'll list the places in which I haven't figured out how to do so yet.
I'm naming this design "walking onions". The walking onion (Allium x proliferum) reproduces by growing tiny little bulbs at the end of a long stalk. When the stalk gets too top-heavy, it flops over, and the little bulbs start growing somewhere new.
The rest of this document will run as follows. In section 2, I'll explain the ideas behind the "walking onions" design, and how they can eliminate the need for regular directory downloads. In section 3, I'll answer a number of follow-up questions that arise, and explain how to keep various features in Tor working. Section 4 (not yet written) will elaborate all the details needed to turn this proposal into a concrete set of specification changes.
2. Overview
2.1. Recapping proposal 141
Back in Proposal 141 ("Download server descriptors on demand"), Peter Palfrader proposed an idea for eliminating ahead-of-time descriptor downloads. Instead of fetching all the descriptors in advance, a client would fetch the descriptor for each relay in its path right before extending the circuit to that relay. For example, if a client has a circuit from A->B and wants to extend the circuit to C, the client asks B for C's descriptor, and then extends the circuit to C.
(Note that the client needs to fetch the descriptor every time it extends the circuit, so that an observer can't tell whether the client already had the descriptor or not.)
There are a couple of limitations for this design:
   * It still requires clients to download a consensus.
   * It introduces an extra round-trip to each hop of the circuit extension process.
I'll show how to solve these problems in the two sections below.
2.2. An observation about the ntor handshake.
I'll start with an observation about our current circuit extension handshake, ntor: it should not actually be necessary to know a relay's onion key before extending to it.
Right now, the client sends:
      NODEID     (The relay's identity)
      KEYID      (The relay's public onion key)
      CLIENT_PK  (a Diffie-Hellman public key)

and the relay responds with:
      SERVER_PK  (a Diffie-Hellman public key)
      AUTH       (a function of the relay's private keys and *all* of the public keys)
Both parties generate shared symmetric keys from the same inputs that are used to create the AUTH value.
The important insight here is that we could easily change this handshake so that the client sends only CLIENT_PK, and receives NODEID and KEYID as part of the response.
In other words, the client needs to know the relay's onion key to _complete_ the handshake, but doesn't actually need to know the relay's onion key in order to _initiate_ the handshake.
This is the insight that will let us save a round trip: When the client goes to extend a circuit from A->B to C, it can send B a request to extend to C and retrieve C's descriptor in a single step. Specifically, the client sends only CLIENT_PK, and relay B can include C's keys as part of the EXTENDED cell.
2.3. Extending by certified index
Now I'll explain how the client can avoid having to download a list of relays entirely.
First, let's look at how a client chooses a random relay today. First, the client puts all of the relays in a list, and computes a weighted bandwidth for each one. For example, suppose the relay identities are R1, R2, R3, R4, and R5, and their bandwidth weights are 50, 40, 30, 20, and 10. The client makes a table like this:
      Relay  Weight  Range of index values
      R1     50      0..49
      R2     40      50..89
      R3     30      90..119
      R4     20      120..139
      R5     10      140..149
To choose a random relay, the client picks a random index value between 0 and 149 inclusive, and looks up the corresponding relay in the table. For example, if the client's random number is 77, it will choose R2. If its random number is 137, it chooses R4.
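The selection procedure above can be sketched in Python. This is only a toy illustration of the table-lookup idea, not Tor's actual implementation:

```python
import random

# Toy weighted-relay table from the example above.
RELAYS = [("R1", 50), ("R2", 40), ("R3", 30), ("R4", 20), ("R5", 10)]

def build_table(relays):
    """Assign each relay a contiguous range of index values."""
    table, low = [], 0
    for name, weight in relays:
        table.append((name, low, low + weight - 1))
        low += weight
    return table, low          # `low` ends up as the total weight (150 here)

def lookup(table, index):
    """Find the relay whose range contains `index`."""
    for name, low, high in table:
        if low <= index <= high:
            return name

def choose_relay(table, total):
    """Pick a relay with probability proportional to its weight."""
    return lookup(table, random.randrange(total))
```

With the example numbers, lookup(table, 77) returns "R2" and lookup(table, 137) returns "R4", matching the text.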
The key observation for the "walking onions" design is that the client doesn't actually need to construct this table itself. Instead, we will have this table be constructed by the authorities and distributed to all the relays.
Here's how it works: let's have the authorities make a new kind of consensus-like thing. We'll call it an Efficient Network Directory with Individually Verifiable Entries, or "ENDIVE" for short. This will differ from the client's index table above in two ways. First, every entry in the ENDIVE is normalized so that the bandwidth weights sum to 2^32, making the maximum index value 2^32-1:
      Relay  Normalized weight  Range of index values
      R1     0x55555546         0x00000000..0x55555545
      R2     0x44444438         0x55555546..0x9999997d
      R3     0x3333332a         0x9999997e..0xcccccca7
      R4     0x2222221c         0xcccccca8..0xeeeeeec3
      R5     0x1111113c         0xeeeeeec4..0xffffffff
Second, every entry in the ENDIVE is timestamped and signed by the authorities independently, so that when a client sees a line from the table above, it can verify that it came from an authentic ENDIVE. When a client has chosen a random index, one of these entries will prove to the client that a given relay corresponds to that index. Because of this property, we'll be calling these entries "Separable Network Index Proofs", or "SNIP"s for short.
For example, a single SNIP from the table above might consist of:
   * A range of times during which this SNIP is valid
   * R1's identity
   * R1's ntor onion key
   * R1's address
   * The index range 0x00000000..0x55555545
   * A signature of all of the above, by a number of authorities
Let's put it together. Suppose that the client has a circuit from A->B, and it wants to extend to a random relay, chosen randomly weighted by bandwidth.
1. The client picks a random index value between 0 and 2**32 - 1. It sends that index to relay B in its EXTEND cell, along with a g^x value for the ntor handshake.
Note: the client doesn't send an address or identity for the next relay, since it doesn't know what relay it has chosen! (The combination of an index and a g^x value is what I'm calling a "walking onion".)
2. Now, relay B looks up the index in its most recent ENDIVE, to learn which relay the client selected.
(For example, suppose that the client's random index value is 0x50000001. This index value falls in the range 0x00000000..0x55555545 in the table above, so relay B sees that the client has chosen R1 as its next hop.)
3. Relay B sends a CREATE cell to R1 as usual. When it gets a CREATED reply, it includes the authority-signed SNIP for R1 as part of the EXTENDED cell.
4. As part of verifying the handshake, the client verifies that the SNIP was signed by enough authorities, that its timestamp is recent enough, and that it actually corresponds to the random index that the client selected.
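Relay B's lookup in step 2 can be a simple binary search over the ENDIVE's sorted index ranges. Here is a sketch with hypothetical field names; real ENDIVE entries would carry full SNIPs rather than strings:

```python
import bisect

# Each ENDIVE entry: (low_index, high_index, snip), sorted by low_index,
# with ranges that tile the whole 32-bit space.
ENDIVE = [
    (0x00000000, 0x55555545, "SNIP for R1"),
    (0x55555546, 0x9999997d, "SNIP for R2"),
    (0x9999997e, 0xcccccca7, "SNIP for R3"),
    (0xcccccca8, 0xeeeeeec3, "SNIP for R4"),
    (0xeeeeeec4, 0xffffffff, "SNIP for R5"),
]

def lookup_snip(endive, index):
    """Return the SNIP whose [low, high] range contains `index`."""
    lows = [low for low, _, _ in endive]
    i = bisect.bisect_right(lows, index) - 1
    low, high, snip = endive[i]
    assert low <= index <= high   # ranges tile the space, so this holds
    return snip
```

With the example ENDIVE, lookup_snip(ENDIVE, 0x50000001) returns R1's SNIP, matching step 2 above.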
Notice the properties we have with this design:
- Clients can extend circuits without having a list of all the relays.
- Because the client's random index needs to match a routing entry signed by the authorities, the client is still selecting a relay randomly by weight. A hostile relay cannot choose which relay to send the client.
On a failure to extend, a relay should still report the routing entry for the relay that it couldn't connect to. As before, the client will abandon a partially constructed circuit and start a new one when an extend attempt fails.
We could achieve a reliability/security tradeoff by letting clients offer the relay a choice of two or more indices to extend to. This would help reliability, but give the relay more influence over the path. We'd need to analyze this impact.
In the next section, I'll discuss a bunch of details that we need to straighten out in order to make this design work.
3. Sorting out the details.
3.1. Will these routing entries fit in EXTEND2 and EXTENDED2 cells?
The EXTEND2 cell is probably big enough for this design. The random index that the client sends can be a new "link specifier" type, replacing the IP and identity link specifiers.
The EXTENDED2 cell is likely to need to grow here. We'll need to implement proposal 249 ("Allow CREATE cells with >505 bytes of handshake data") so that EXTEND2 and EXTENDED2 cells can be larger.
3.2. How should SNIPs be signed?
We have a few options, and I'd like to look into the possibilities here more closely.
The simplest possibility is to use **multiple signatures** on each SNIP, the way we do today for consensuses. These signatures should be made using medium-term Ed25519 keys from the authorities. At 64 bytes per signature and 9 authorities, we would need 576 bytes of signatures for each SNIP. These signatures could be batch-verified to save time at the client side. Since generating a signature takes around 20 usec on my mediocre laptop, authorities should be able to generate this many signatures fairly easily.
Another possibility is to use a **threshold signature** on each SNIP, so that the authorities collaboratively generate a short signature that the clients can verify. There are multiple threshold signature schemes that we could consider here, though I haven't yet found one that looks perfect.
Another possibility is to organize the SNIPs in a **merkle tree with a signed root**. For this design, clients could download the signed root periodically, and receive the hash-path from the signed root to the SNIP. This design might help with certificate-transparency-style designs, and it would be necessary if we ever want to move to a postquantum signature algorithm that requires large signatures.
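The client-side check in the merkle-tree variant could look like the sketch below. The hash function and leaf/node encodings here are placeholders, not a Tor specification, and the client would additionally have to verify the authorities' signature on the root (not shown):

```python
import hashlib

def H(data: bytes) -> bytes:
    """Placeholder tree hash."""
    return hashlib.sha256(data).digest()

def verify_merkle_path(snip: bytes, path, signed_root: bytes) -> bool:
    """`path` is a list of (sibling_hash, sibling_is_left) pairs, ordered
    from the leaf up to the root.  Returns True iff hashing the SNIP up
    the path reproduces the signed root."""
    node = H(snip)
    for sibling, sibling_is_left in path:
        node = H(sibling + node) if sibling_is_left else H(node + sibling)
    return node == signed_root
```

The path for one SNIP is logarithmic in the ENDIVE size, which is what makes this variant attractive when individual signatures are large.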
Another possibility (due to a conversation among Chelsea Komlo, Sajin Sasy, and Ian Goldberg), is to *use SNARKs*. (Why not? All the cool kids are doing it!) For this, we'd have the clients download a signed hash of the ENDIVE periodically, and have the authorities generate a SNARK for each SNIP, proving its presence in that document.
3.3. How can we detect authority misbehavior?
We might want to take countermeasures against the possibility that a quorum of corrupt or compromised authorities give some relays a different set of SNIPs than they give other relays.
If we incorporate a merkle tree or a hash chain in the design, we can use mechanisms similar to certificate transparency to ensure that the authorities have a consistent log of all the entries that they have ever handed out.
3.4. How many types of weighted node selection are there, and how do we handle them?
Right now, there are multiple weights that we use in Tor:
   * Weight for exit
   * Weight for guard
   * Weight for middle node
We also filter nodes for several properties, such as flags they have.
To reproduce this behavior, we should enumerate the various weights and filters that we use, and (if there are not too many) create a separate index for each. For example, the Guard index would weight every node for selection as guard, assigning 0 weight to non-Guard nodes. The Exit index would weight every node for selection as an exit, assigning 0 weight to non-Exit nodes.
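Building one index per role could look like this sketch. The flag names follow the consensus; the weighting is simplified to plain bandwidth, whereas real selection weights also apply per-position bandwidth-weight parameters:

```python
def role_weights(relays, flag):
    """Zero out relays lacking `flag`; each relay is a dict like
    {"name": ..., "bandwidth": ..., "flags": {...}}."""
    return [(r["name"], r["bandwidth"] if flag in r["flags"] else 0)
            for r in relays]

relays = [
    {"name": "R1", "bandwidth": 50, "flags": {"Guard", "Exit"}},
    {"name": "R2", "bandwidth": 40, "flags": {"Guard"}},
    {"name": "R3", "bandwidth": 30, "flags": {"Exit"}},
]

guard_index = role_weights(relays, "Guard")   # R3 gets weight 0
exit_index = role_weights(relays, "Exit")     # R2 gets weight 0
```

Each such weighted list would then be normalized and turned into its own set of index ranges, exactly as in section 2.3.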
When choosing a relay, the client would have to specify which index to use. We could either have a separate (labeled) set of SNIPs entries for each index, or we could have each SNIP have a separate (labeled) index range for each index.
REGRESSION: the client's choice of which index to use would leak the next relay's position and purpose in the circuit. We believe relays can already infer this information today, but it is not a feature we want to guarantee them.
3.5. Does this design break onion service introduce handshakes?
In rend-spec-v3.txt section 3.3.2, we specify a variant of ntor for use in INTRODUCE2 handshakes. It allows the client to send encrypted data as part of its initial ntor handshake, but requires the client to know the onion service's onion key before it sends its initial handshake.
That won't be a problem for us here, though: we still require clients to fetch onion service descriptors before contacting an onion service.
3.6. How does the onion service directory work here?
The onion service directory is implemented as a hash ring, where each relay's position in the hash ring is decided by a hash of its identity, the current date, and a shared random value that the authorities compute each day.
To implement this hash ring using walking onions, we would need to have an extra index based not on bandwidth, but on position in the hash ring. Then onion services and clients could build a circuit, then extend it one more hop specifying their desired index in the hash ring.
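Schematically, the hash-ring position computation is something like the sketch below. This is only illustrative: the real formula in rend-spec-v3 uses specific fields, fixed framing, and a defined hash, none of which are reproduced here:

```python
import hashlib

def hashring_position(relay_identity: bytes, date: bytes,
                      shared_random: bytes) -> int:
    """Illustrative position of a relay on the onion-service hash ring:
    a hash of its identity, the current date, and the daily shared
    random value, interpreted as a big integer."""
    digest = hashlib.sha3_256(relay_identity + date + shared_random).digest()
    return int.from_bytes(digest, "big")
```

An extra ENDIVE index sorted by this value (rather than by bandwidth) would let a client or onion service ask for "the entry at or after position P" when extending to a directory hop.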
We could either have a command to retrieve a trio of hashring-based routing entries by index, or to retrieve (or connect to?) the n'th item after a given hashring entry.
3.7. How can clients choose guard nodes?
We can reuse the fallback directories here. A newly bootstrapping client would connect to a fallback directory, then build a three-hop circuit, and finally extend the three-hop circuit by indexing to a random guard node. The random guard node's SNIP would contain the information that the client needs to build real circuits through that guard in the future. Because the client would be building a three-hop circuit, the fallback directory would not learn the client's guards.
(Note that even if the extend attempt fails, we should still pick the node as a possible guard based on its router entry, so that other nodes can't veto our choice of guards.)
3.8. Does the walking onions design preclude postquantum circuit handshakes?
Not at all! Both proposal 263 (ntru) and proposal 270 (newhope) work by having the client generate an ephemeral key as part of its initial handshake. The client does not need to know the relay's onion key to do this, so we can still integrate those proposals with this one.
3.9. Does the walking onions design stop us from changing the network topology?
For Tor to continue to scale, we will someday need to accept that not every relay can be simultaneously connected to every other relay. Therefore, we will need to move from our current clique topology assumption to some other topology.
There are also proposals to change node selection rules to generate routes providing better performance, or improved resistance to local adversaries.
We can, I think, implement this kind of proposal by changing the way that ENDIVEs are generated. Instead of giving every relay the same ENDIVE, the authorities would generate different ENDIVEs for different relays, depending on the probability distribution of which relay should be chosen after which in the network topology. In the extreme case, this would produce O(n) ENDIVEs and O(n^2) SNIPs. In practice, I hope that we could do better by having the network topology be non-clique, and by having many relays share the same distribution of successors.
3.10. How can clients handle exit policies?
This is an unsolved challenge. If the client tells the middle relay its target port, it leaks information inappropriately.
One possibility is to try to gather exit policies into common categories, such as "most ports supported" and "most common ports supported".
Another (inefficient) possibility is for clients to keep trying exits until they find one that works.
Another (inefficient) possibility is to require that clients who use unusual ports fall back to the old mechanism for route selection.
3.11. Can this approach support families?
This is an unsolved challenge.
One (inefficient) possibility is for clients to generate circuits and discard those that use multiple relays in the same family.
One (not quite compatible) possibility is for the authorities to sort the ENDIVE so that relays in the same family are adjacent to one another. The index-bounds part of each SNIP would also have to include the bounds of the family. This approach is not quite compatible with the status quo, because it prevents relays from belonging to more than one family.
One interesting possibility (due to Chelsea Komlo, Sajin Sasy, and Ian Goldberg) is for the middle node to take responsibility for family enforcement. In this design, the client might offer the middle node multiple options for the next relay's index, and the middle node would choose the first such relay that is neither in its family nor its predecessor's family. We'd need to look for a way to make sure that the middle node wasn't biasing the path selection.
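The middle-node check in that design could be as simple as the following sketch, where identities and family sets are hypothetical placeholders:

```python
def pick_next_hop(candidates, my_family, prev_family):
    """Middle-node rule: return the first client-offered candidate
    identity that is in neither this relay's family nor its
    predecessor's family."""
    for identity in candidates:
        if identity not in my_family and identity not in prev_family:
            return identity
    return None   # all candidates excluded; the extend attempt fails
```

As the text notes, a relay that deviates from "first acceptable candidate" could bias path selection, so this rule would need to be enforceable or verifiable by the client.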
(TODO: come up with more ideas here.)
3.12. Can walking onions support IP-based and country-based restrictions?
This is an unsolved challenge.
One (inefficient) possibility is for the user to generate paths until they find one that satisfies their restrictions. This works only if the restrictions do not exclude most paths; otherwise it becomes prohibitively wasteful.
Another (inefficient and fingerprintable) possibility is to require that clients who use complex path restrictions fall back to the old mechanism for route selection.
(TODO: come up with better ideas here.)
3.13. What scaling problems have we not solved with this design?
The walking onions design doesn't solve (on its own) the problem that the authorities need to know about every relay, and arrange to have every relay tested.
The walking onions design doesn't solve (on its own) the problem that relays need to have a list of all the relays. (But see section 3.9 above.)
3.14. Should we still have clients download a consensus when they're using walking onions?
There are some fields in the current consensus directory documents that the clients will still need, like the list of supported protocols and network parameters. A client that uses walking onions should download a new flavor of consensus document that contains only these fields, and does not list any relays. In some signature schemes, this consensus would contain a digest of the ENDIVE -- see 3.2 above.
(Note that this document would be a "consensus document" but not a "consensus directory", since it doesn't list any relays.)
4. Putting it all together
[This is the section where, in a later version of this proposal, I would specify the exact behavior and data formats to be used here. Right now, I'd say we're too early in the design phase.]
A.1. Acknowledgments
Thanks to Peter Palfrader for his original design in proposal 141, and to the designers of PIR-Tor, both of which inspired aspects of this Walking Onions design.
Thanks to Chelsea Komlo, Sajin Sasy, and Ian Goldberg for feedback on an earlier version of this design.
Thanks to David Goulet, Teor, and George Kadianakis for commentary on earlier versions of this draft.
A.2. Additional ideas
Teor notes that there are ways to try to get this idea to apply to one-pass circuit construction, something like the old onion design. We might be able to derive indices and keys from the same seeds, even. I don't see a way to do this without losing forward secrecy, but it might be worth looking at harder.
Filename: 300-walking-onions.txt Title: Walking Onions: Scaling and Saving Bandwidth Author: Nick Mathewson Created: 5-Feb-2019 Status: Draft
Status
This proposal describes a mechanism called "Walking Onions" for scaling the Tor network and reducing the amount of client bandwidth used to maintain a client's view of the Tor network.
This is a draft proposal; there are problems left to be solved and questions left to be answered. Once those parts are done, we can fill in section 4 with the final details of the design.
Introduction
In the current Tor network design, we assume that every client has a complete view of all the relays in the network. To achieve this, clients download consensus directories at regular intervals, and download descriptors for every relay listed in the directory.
The substitution of microdescriptors for regular descriptors (proposal 158) and the use of consensus diffs (proposal 140) have lowered the bytes that clients must dedicate to directory operations. But we still face the problem that, if we force each client to know about every relay in the network, each client's directory traffic will grow linearly with the number of relays in the network.
Another drawback in our current system is that client directory traffic is front-loaded: clients need to fetch an entire directory before they begin building circuits. This places extra delays on clients, and extra load on the network.
To anonymize the world, we will need to scale to a much larger number of relays and clients: requiring clients to know about every relay in the set simply won't scale, and requiring every new client to download a large document is also problematic.
There are obvious responses here, and some other anonymity tools have taken them. It's possible to have a client only use a fraction of the relays in a network--but doing so opens the client to _epistemic attacks_, in which the difference in clients' views of the network is used to partition their traffic. It's also possible to move the problem of selecting relays from the client to the relays themselves, and let each relay select the next relay in turn--but this choice opens the client to _route capture attacks_, in which a malicious relay selects only other malicious relays.
In this proposal, I'll describe a design for eliminating up-front client directory downloads. Clients still choose relays at random, but without ever having to hold a list of all the relays. This design does not require clients to trust relays any more than they do today, or open clients to epistemic attacks.
I hope to maintain feature parity with the current Tor design; I'll list the places in which I haven't figured out how to do so yet.
I'm naming this design "walking onions". The walking onion (Allium x proliferum) reproduces by growing tiny little bulbs at the end of a long stalk. When the stalk gets too top-heavy, it flops over, and the little bulbs start growing somewhere new.
The rest of this document will run as follows. In section 2, I'll explain the ideas behind the "walking onions" design, and how they can eliminate the need for regular directory downloads. In section 3, I'll answer a number of follow-up questions that arise, and explain how to keep various features in Tor working. Section 4 (not yet written) will elaborate all the details needed to turn this proposal into a concrete set of specification changes.
Overview
2.1. Recapping proposal 141
Back in Proposal 141 ("Download server descriptors on demand"), Peter Palfrader proposed an idea for eliminating ahead-of-time descriptor downloads. Instead of fetching all the descriptors in advance, a client would fetch the descriptor for each relay in its path right before extending the circuit to that relay. For example, if a client has a circuit from A->B and wants to extend the circuit to C, the client asks B for C's descriptor, and then extends the circuit to C.
(Note that the client needs to fetch the descriptor every time it extends the circuit, so that an observer can't tell whether the client already had the descriptor or not.)
There are a couple of limitations for this design: * It still requires clients to download a consensus. * It introduces a extra round-trip to each hop of the circuit extension process.
I'll show how to solve these problems in the two sections below.
2.2. An observation about the ntor handshake.
I'll start with an observation about our current circuit extension handshake, ntor: it should not actually be necessary to know a relay's onion key before extending to it.
Right now, the client sends: NODEID (The relay's identity) KEYID (The relay's public onion key) CLIENT_PK (a diffie-hellman public key)
and the relay responds with: SERVER_PK (a diffie-hellman public key) AUTH (a function of the relay's private keys and *all* of the public keys.)
Both parties generate shared symmetric keys from the same inputs that are are used to create the AUTH value.
The important insight here is that we could easily change this handshake so that the client sends only CLIENT_PK, and receives NODEID and KEYID as part of the response.
In other words, the client needs to know the relay's onion key to _complete_ the handshake, but doesn't actually need to know the relay's onion key in order to _initiate_ the handshake.
This is the insight that will let us save a round trip: When the client goes to extend a circuit from A->B to C, it can send B a request to extend to C and retrieve C's descriptor in a single step. Specifically, the client sends only CLIENT_PK, and relay B can include C's keys as part of the EXTENDED cell.
2.3. Extending by certified index
Now I'll explain how the client can avoid having to download a list of relays entirely.
First, let's look at how a client chooses a random relay today. First, the client puts all of the relays in a list, and computes a weighted bandwidth for each one. For example, suppose the relay identities are R1, R2, R3, R4, and R5, and their bandwidth weights are 50, 40, 30, 20, and 10. The client makes a table like this:
Relay Weight Range of index values R1 50 0..49 R2 40 50..89 R3 30 90..119 R4 20 120..139 R5 10 140..149
To choose a random relay, the client picks a random index value between 0 and 149 inclusive, and looks up the corresponding relay in the table. For example, if the client's random number is 77, it will choose R2. If its random number is 137, it chooses R4.
The key observation for the "walking onions" design is that the client doesn't actually need to construct this table itself. Instead, we will have this table be constructed by the authorities and distributed to all the relays.
Here's how it works: let's have the authorities make a new kind of consensus-like thing. We'll call it an Efficient Network Directory with Individually Verifiable Entries, or "ENDIVE" for short. This will differ from the client's index table above in two ways. First, every entry in the ENDIVE is normalized so that the bandwidth weights maximum index is 2^32-1:
Relay Normalized weight Range of index values R1 0x55555546 0x00000000..0x55555545 R2 0x44444438 0x55555546..0x9999997d R3 0x3333332a 0x9999997e..0xcccccca7 R4 0x2222221c 0xcccccca8..0xeeeeeec3 R5 0x1111113c 0xeeeeeec4..0xffffffff
Second, every entry in the ENDIVE is timestamped and signed by the authorities independently, so that when a client sees a line from the table above, it can verify that it came from an authentic ENDIVE. When a client has chosen a random index, one of these entries will prove to the client that a given relay corresponds to that index. Because of this property, we'll be calling these entries "Separable Network Index Proofs", or "SNIP"s for short.
For example, a single SNIP from the table above might consist of: * A range of times during which this SNIP is valid * R1's identity * R1's ntor onion key * R1's address * The index range 0x00000000..0x55555545 * A signature of all of the above, by a number of authorities
Let's put it together. Suppose that the client has a circuit from A->B, and it wants to extend to a random relay, chosen randomly weighted by bandwidth.
The client picks a random index value between 0 and 2**32 - 1. It sends that index to relay B in its EXTEND cell, along with a g^x value for the ntor handshake.
Note: the client doesn't send an address or identity for the next relay, since it doesn't know what relay it has chosen! (The combination of an index and a g^x value is what I'm calling a "walking onion".)
Now, relay B looks up the index in its most recent ENDIVE, to learn which relay the client selected.
(For example, suppose that the client's random index value is 0x50000001. This index value falls between 0x00000000 and 0x55555546 in the table above, so the relay B sees that the client has chosen R1 as its next hop.)
Relay B sends a create cell to R1 as usual. When it gets a CREATED reply, it includes the authority-signed SNIP for R1 as part of the EXTENDED cell.
As part of verifying the handshake, the client verifies that the SNIP was signed by enough authorities, that its timestamp is recent enough, and that it actually corresponds to the random index that the client selected.
Notice the properties we have with this design:
- Clients can extend circuits without having a list of all the relays. - Because the client's random index needs to match a routing entry signed by the authorities, the client is still selecting a relay randomly by weight. A hostile relay cannot choose which relay to send the client.
On a failure to extend, a relay should still report the routing entry for the relay that it couldn't connect to. As before, a client will start a new circuit if a partially constructed circuit fails.
We could achieve a reliability/security tradeoff by letting clients offer the relay a choice of two or more indices to extend to. This would help reliability, but give the relay more influence over the path. We'd need to analyze this impact.
In the next section, I'll discuss a bunch of details that we need to straighten out in order to make this design work.
3. Sorting out the details
3.1. Will these routing entries fit in EXTEND2 and EXTENDED2 cells?
The EXTEND2 cell is probably big enough for this design. The random index that the client sends can be a new "link specifier" type, replacing the IP and identity link specifiers.
The EXTENDED2 cell is likely to need to grow here. We'll need to implement proposal 249 ("Allow CREATE cells with >505 bytes of handshake data") so that EXTEND2 and EXTENDED2 cells can be larger.
3.2. How should SNIPs be signed?
We have a few options, and I'd like to look into the possibilities here more closely.
The simplest possibility is to use **multiple signatures** on each SNIP, the way we do today for consensuses. These signatures should be made using medium-term Ed25519 keys from the authorities. At a cost of 64 bytes per signature, at 9 authorities, we would need 576 bytes for each SNIP. These signatures could be batch-verified to save time at the client side. Since generating a signature takes around 20 usec on my mediocre laptop, authorities should be able to generate this many signatures fairly easily.
Another possibility is to use a **threshold signature** on each SNIP, so that the authorities collaboratively generate a short signature that the clients can verify. There are multiple threshold signature schemes that we could consider here, though I haven't yet found one that looks perfect.
Another possibility is to organize the SNIPs in a **merkle tree with a signed root**. For this design, clients could download the signed root periodically, and receive the hash-path from the signed root to the SNIP. This design might help with certificate-transparency-style designs, and it would be necessary if we ever want to move to a postquantum signature algorithm that requires large signatures.
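A minimal sketch of the merkle-tree variant, assuming SHA-256 and one common convention for odd-sized levels (both choices are assumptions, not part of the proposal): the client holds only the signed root, and checks a hash path for each SNIP it receives.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Build a Merkle root over the leaves, duplicating the last node
    when a level has odd length (one common convention)."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def verify_path(leaf, path, root):
    """Check an inclusion proof: `path` is a list of
    (sibling_hash, sibling_is_right) pairs from leaf to root."""
    node = h(leaf)
    for sibling, sibling_is_right in path:
        node = h(node + sibling) if sibling_is_right else h(sibling + node)
    return node == root

# Example: prove that the first of four SNIPs is in the signed tree.
leaves = [b"snip-R1", b"snip-R2", b"snip-R3", b"snip-R4"]
root = merkle_root(leaves)
proof = [(h(b"snip-R2"), True),
         (h(h(b"snip-R3") + h(b"snip-R4")), True)]
assert verify_path(b"snip-R1", proof, root)
```

The proof is O(log n) hashes per SNIP, so the per-extension overhead stays small even for a large ENDIVE.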
Another possibility (due to a conversation among Chelsea Komlo, Sajin Sasy, and Ian Goldberg), is to *use SNARKs*. (Why not? All the cool kids are doing it!) For this, we'd have the clients download a signed hash of the ENDIVE periodically, and have the authorities generate a SNARK for each SNIP, proving its presence in that document.
3.3. How can we detect authority misbehavior?
We might want to take countermeasures against the possibility that a quorum of corrupt or compromised authorities give some relays a different set of SNIPs than they give other relays.
If we incorporate a merkle tree or a hash chain in the design, we can use mechanisms similar to certificate transparency to ensure that the authorities have a consistent log of all the entries that they have ever handed out.
3.4. How many types of weighted node selection are there, and how do we handle them?
Right now, there are multiple weights that we use in Tor:
   * Weight for exit
   * Weight for guard
   * Weight for middle node
We also filter nodes for several properties, such as flags they have.
To reproduce this behavior, we should enumerate the various weights and filters that we use, and (if there are not too many) create a separate index for each. For example, the Guard index would weight every node for selection as guard, assigning 0 weight to non-Guard nodes. The Exit index would weight every node for selection as an exit, assigning 0 weight to non-Exit nodes.
When choosing a relay, the client would have to specify which index to use. We could either have a separate (labeled) set of SNIP entries for each index, or have each SNIP carry a separate (labeled) index range for each index.
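As an illustration of building one such index, the authorities might zero out relays that lack the relevant flag and normalize the remaining weights to fill 0..2**32-1. The data layout and the proportional rounding scheme here are assumptions:

```python
# Sketch: build the index ranges for one selection type (e.g. "Guard"),
# assigning 0 weight to relays without that flag.
def build_index(relays, flag):
    weighted = [(r["id"], r["weight"] if flag in r["flags"] else 0)
                for r in relays]
    remaining = 2**32                      # index values left to assign
    rem_weight = sum(w for _, w in weighted)
    ranges, low = {}, 0
    for rid, w in weighted:
        if w == 0:
            continue                       # non-flagged relays get no range
        span = remaining * w // rem_weight # proportional share of what's left
        remaining -= span
        rem_weight -= w
        ranges[rid] = (low, low + span - 1)
        low += span
    return ranges
```

Allocating each relay a share of the values still remaining guarantees the ranges are contiguous and that the last flagged relay's range ends exactly at 2**32 - 1.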
REGRESSION: the client's choice of which index to use would leak the next relay's position and purpose in the circuit. This is information that we believe relays can infer today, but it is not a capability we want to guarantee them.
3.5. Does this design break onion service introduce handshakes?
In rend-spec-v3.txt section 3.3.2, we specify a variant of ntor for use in INTRODUCE2 handshakes. It allows the client to send encrypted data as part of its initial ntor handshake, but requires the client to know the onion service's onion key before it sends its initial handshake.
That won't be a problem for us here, though: we still require clients to fetch onion service descriptors before contacting an onion service.
3.6. How does the onion service directory work here?
The onion service directory is implemented as a hash ring, where each relay's position in the hash ring is decided by a hash of its identity, the current date, and a shared random value that the authorities compute each day.
To implement this hash ring using walking onions, we would need to have an extra index based not on bandwidth, but on position in the hash ring. Then onion services and clients could build a circuit, then extend it one more hop specifying their desired index in the hash ring.
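A rough sketch of the hashring index computation follows; the choice of hash function, the input encoding, and the mapping into the 32-bit index space are all assumptions for illustration:

```python
import hashlib

def hashring_position(identity: bytes, date: bytes, srv: bytes) -> int:
    """Each relay's ring position comes from a hash of its identity,
    the current date, and the daily shared random value."""
    digest = hashlib.sha3_256(identity + date + srv).digest()
    return int.from_bytes(digest[:4], "big")  # map into 0..2**32-1

def hashring_index(identities, date, srv):
    """Relays sorted by ring position; a client extends to the relay
    at or after its desired position."""
    return sorted((hashring_position(i, date, srv), i) for i in identities)
```

Since every party derives the same positions from the same shared random value, relays can answer hashring-indexed extend requests without the client ever seeing the full ring.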
We could either have a command to retrieve a trio of hashring-based routing entries by index, or one to retrieve (or connect to?) the n'th item after a given hashring entry.
3.7. How can clients choose guard nodes?
We can reuse the fallback directories here. A newly bootstrapping client would connect to a fallback directory, then build a three-hop circuit, and finally extend the three-hop circuit by indexing to a random guard node. The random guard node's SNIP would contain the information that the client needs to build real circuits through that guard in the future. Because the client would be building a three-hop circuit, the fallback directory would not learn the client's guards.
(Note that even if the extend attempt fails, we should still pick the node as a possible guard based on its routing entry, so that other nodes can't veto our choice of guards.)
3.8. Does the walking onions design preclude postquantum circuit handshakes?
Not at all! Both proposal 263 (ntru) and proposal 270 (newhope) work by having the client generate an ephemeral key as part of its initial handshake. The client does not need to know the relay's onion key to do this, so we can still integrate those proposals with this one.
3.9. Does the walking onions design stop us from changing the network topology?
For Tor to continue to scale, we will someday need to accept that not every relay can be simultaneously connected to every other relay. Therefore, we will need to move from our current clique topology assumption to some other topology.
There are also proposals to change node selection rules to generate routes providing better performance, or improved resistance to local adversaries.
We can, I think, implement this kind of proposal by changing the way that ENDIVEs are generated. Instead of giving every relay the same ENDIVE, the authorities would generate different ENDIVEs for different relays, depending on the probability distribution of which relay should be chosen after which in the network topology. In the extreme case, this would produce O(n) ENDIVEs and O(n^2) SNIPs. In practice, I hope that we could do better by having the network topology be non-clique, and by having many relays share the same distribution of successors.
3.10. How can clients handle exit policies?
This is an unsolved challenge. If the client tells the middle relay its target port, it leaks information inappropriately.
One possibility is to try to gather exit policies into common categories, such as "most ports supported" and "most common ports supported".
Another (inefficient) possibility is for clients to keep trying exits until they find one that works.
Another (inefficient) possibility is to require that clients who use unusual ports fall back to the old mechanism for route selection.
3.11. Can this approach support families?
This is an unsolved challenge.
One (inefficient) possibility is for clients to generate circuits and discard those that use multiple relays in the same family.
One (not quite compatible) possibility is for the authorities to sort the ENDIVE so that relays in the same family are adjacent to one another. The index-bounds part of each SNIP would also have to include the bounds of the family. This approach is not quite compatible with the status quo, because it prevents relays from belonging to more than one family.
One interesting possibility (due to Chelsea Komlo, Sajin Sasy, and Ian Goldberg) is for the middle node to take responsibility for family enforcement. In this design, the client might offer the middle node multiple options for the next relay's index, and the middle node would choose the first such relay that is neither in its family nor its predecessor's family. We'd need to look for a way to make sure that the middle node wasn't biasing the path selection.
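The middle-node enforcement idea might look something like the following sketch, where `lookup` maps an index to a relay via the ENDIVE and the family sets are assumed to be known locally; all names here are illustrative:

```python
# Sketch of middle-node family enforcement: the client offers several
# candidate indices, and the middle node picks the first whose relay is
# in neither its own family nor its predecessor's family.
def pick_next(candidates, lookup, my_family, predecessor_family):
    for index in candidates:
        relay = lookup(index)
        if relay not in my_family and relay not in predecessor_family:
            return relay
    return None  # no acceptable candidate; client must retry
```

The open question noted above applies directly here: a malicious middle node could lie about family membership to skip candidates it dislikes, so the selection would need to be auditable somehow.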
(TODO: come up with more ideas here.)
3.12. Can walking onions support IP-based and country-based restrictions?
This is an unsolved challenge.
One (inefficient) possibility is for the user to generate paths until they find one that they like; this only works if the user's restrictions do not exclude most paths.
Another (inefficient and fingerprintable) possibility is to require that clients who use complex path restrictions fall back to the old mechanism for route selection.
(TODO: come up with better ideas here.)
3.13. What scaling problems have we not solved with this design?
The walking onions design doesn't solve (on its own) the problem that the authorities need to know about every relay, and arrange to have every relay tested.
The walking onions design doesn't solve (on its own) the problem that relays need to have a list of all the relays. (But see section 3.9 above.)
3.14. Should we still have clients download a consensus when they're using walking onions?
There are some fields in the current consensus directory documents that the clients will still need, like the list of supported protocols and network parameters. A client that uses walking onions should download a new flavor of consensus document that contains only these fields, and does not list any relays. In some signature schemes, this consensus would contain a digest of the ENDIVE -- see 3.2 above.
(Note that this document would be a "consensus document" but not a "consensus directory", since it doesn't list any relays.)
4. Putting it all together
[This is the section where, in a later version of this proposal, I would specify the exact behavior and data formats to be used here. Right now, I'd say we're too early in the design phase.]
A.1. Acknowledgments
Thanks to Peter Palfrader for his original design in proposal 141, and to the designers of PIR-Tor, both of which inspired aspects of this Walking Onions design.
Thanks to Chelsea Komlo, Sajin Sasy, and Ian Goldberg for feedback on an earlier version of this design.
Thanks to David Goulet, Teor, and George Kadianakis for commentary on earlier versions of this draft.
A.2. Additional ideas
Teor notes that there are ways to try to get this idea to apply to one-pass circuit construction, something like the old onion design. We might be able to derive indices and keys from the same seeds, even. I don't see a way to do this without losing forward secrecy, but it might be worth looking at harder.
On 05/02/2019 17:02, Nick Mathewson wrote:
Filename: 300-walking-onions.txt Title: Walking Onions: Scaling and Saving Bandwidth Author: Nick Mathewson Created: 5-Feb-2019 Status: Draft
Status
This proposal describes a mechanism called "Walking Onions" for scaling the Tor network and reducing the amount of client bandwidth used to maintain a client's view of the Tor network.
This is a draft proposal; there are problems left to be solved and questions left to be answered. Once those parts are done, we can fill in section 4 with the final details of the design.
Introduction
In the current Tor network design, we assume that every client has a complete view of all the relays in the network. To achieve this, clients download consensus directories at regular intervals, and download descriptors for every relay listed in the directory.
The substitution of microdescriptors for regular descriptors (proposal 158) and the use of consensus diffs (proposal 140) have lowered the bytes that clients must dedicate to directory operations. But we still face the problem that, if we force each client to know about every relay in the network, each client's directory traffic will grow linearly with the number of relays in the network.
Another drawback in our current system is that client directory traffic is front-loaded: clients need to fetch an entire directory before they begin building circuits. This places extra delays on clients, and extra load on the network.
To anonymize the world, we will need to scale to a much larger number of relays and clients: requiring clients to know about every relay in the set simply won't scale, and requiring every new client to download a large document is also problematic.
There are obvious responses here, and some other anonymity tools have taken them. It's possible to have a client only use a fraction of the relays in a network--but doing so opens the client to _epistemic attacks_, in which the difference in clients' views of the network is used to partition their traffic. It's also possible to move the problem of selecting relays from the client to the relays themselves, and let each relay select the next relay in turn--but this choice opens the client to _route capture attacks_, in which a malicious relay selects only other malicious relays.
In this proposal, I'll describe a design for eliminating up-front client directory downloads. Clients still choose relays at random, but without ever having to hold a list of all the relays. This design does not require clients to trust relays any more than they do today, or open clients to epistemic attacks.
I hope to maintain feature parity with the current Tor design; I'll list the places in which I haven't figured out how to do so yet.
I'm naming this design "walking onions". The walking onion (Allium x proliferum) reproduces by growing tiny little bulbs at the end of a long stalk. When the stalk gets too top-heavy, it flops over, and the little bulbs start growing somewhere new.
The rest of this document will run as follows. In section 2, I'll explain the ideas behind the "walking onions" design, and how they can eliminate the need for regular directory downloads. In section 3, I'll answer a number of follow-up questions that arise, and explain how to keep various features in Tor working. Section 4 (not yet written) will elaborate all the details needed to turn this proposal into a concrete set of specification changes.
Overview
2.1. Recapping proposal 141
Back in Proposal 141 ("Download server descriptors on demand"), Peter Palfrader proposed an idea for eliminating ahead-of-time descriptor downloads. Instead of fetching all the descriptors in advance, a client would fetch the descriptor for each relay in its path right before extending the circuit to that relay. For example, if a client has a circuit from A->B and wants to extend the circuit to C, the client asks B for C's descriptor, and then extends the circuit to C.
(Note that the client needs to fetch the descriptor every time it extends the circuit, so that an observer can't tell whether the client already had the descriptor or not.)
There are a couple of limitations for this design: * It still requires clients to download a consensus. * It introduces a extra round-trip to each hop of the circuit extension process.
I'll show how to solve these problems in the two sections below.
2.2. An observation about the ntor handshake.
I'll start with an observation about our current circuit extension handshake, ntor: it should not actually be necessary to know a relay's onion key before extending to it.
Right now, the client sends: NODEID (The relay's identity) KEYID (The relay's public onion key) CLIENT_PK (a diffie-hellman public key)
and the relay responds with: SERVER_PK (a diffie-hellman public key) AUTH (a function of the relay's private keys and *all* of the public keys.)
Both parties generate shared symmetric keys from the same inputs that are are used to create the AUTH value.
The important insight here is that we could easily change this handshake so that the client sends only CLIENT_PK, and receives NODEID and KEYID as part of the response.
In other words, the client needs to know the relay's onion key to _complete_ the handshake, but doesn't actually need to know the relay's onion key in order to _initiate_ the handshake.
This is the insight that will let us save a round trip: When the client goes to extend a circuit from A->B to C, it can send B a request to extend to C and retrieve C's descriptor in a single step. Specifically, the client sends only CLIENT_PK, and relay B can include C's keys as part of the EXTENDED cell.
2.3. Extending by certified index
Now I'll explain how the client can avoid having to download a list of relays entirely.
First, let's look at how a client chooses a random relay today. First, the client puts all of the relays in a list, and computes a weighted bandwidth for each one. For example, suppose the relay identities are R1, R2, R3, R4, and R5, and their bandwidth weights are 50, 40, 30, 20, and 10. The client makes a table like this:
Relay Weight Range of index values R1 50 0..49 R2 40 50..89 R3 30 90..119 R4 20 120..139 R5 10 140..149
To choose a random relay, the client picks a random index value between 0 and 149 inclusive, and looks up the corresponding relay in the table. For example, if the client's random number is 77, it will choose R2. If its random number is 137, it chooses R4.
The key observation for the "walking onions" design is that the client doesn't actually need to construct this table itself. Instead, we will have this table be constructed by the authorities and distributed to all the relays.
Here's how it works: let's have the authorities make a new kind of consensus-like thing. We'll call it an Efficient Network Directory with Individually Verifiable Entries, or "ENDIVE" for short. This will differ from the client's index table above in two ways. First, every entry in the ENDIVE is normalized so that the bandwidth weights maximum index is 2^32-1:
Relay Normalized weight Range of index values R1 0x55555546 0x00000000..0x55555545 R2 0x44444438 0x55555546..0x9999997d R3 0x3333332a 0x9999997e..0xcccccca7 R4 0x2222221c 0xcccccca8..0xeeeeeec3 R5 0x1111113c 0xeeeeeec4..0xffffffff
Second, every entry in the ENDIVE is timestamped and signed by the authorities independently, so that when a client sees a line from the table above, it can verify that it came from an authentic ENDIVE. When a client has chosen a random index, one of these entries will prove to the client that a given relay corresponds to that index. Because of this property, we'll be calling these entries "Separable Network Index Proofs", or "SNIP"s for short.
For example, a single SNIP from the table above might consist of: * A range of times during which this SNIP is valid * R1's identity * R1's ntor onion key * R1's address * The index range 0x00000000..0x55555545 * A signature of all of the above, by a number of authorities
Let's put it together. Suppose that the client has a circuit from A->B, and it wants to extend to a random relay, chosen randomly weighted by bandwidth.
The client picks a random index value between 0 and 2**32 - 1. It sends that index to relay B in its EXTEND cell, along with a g^x value for the ntor handshake.
Note: the client doesn't send an address or identity for the next relay, since it doesn't know what relay it has chosen! (The combination of an index and a g^x value is what I'm calling a "walking onion".)
Now, relay B looks up the index in its most recent ENDIVE, to learn which relay the client selected.
(For example, suppose that the client's random index value is 0x50000001. This index value falls between 0x00000000 and 0x55555546 in the table above, so the relay B sees that the client has chosen R1 as its next hop.)
Relay B sends a create cell to R1 as usual. When it gets a CREATED reply, it includes the authority-signed SNIP for R1 as part of the EXTENDED cell.
As part of verifying the handshake, the client verifies that the SNIP was signed by enough authorities, that its timestamp is recent enough, and that it actually corresponds to the random index that the client selected.
Notice the properties we have with this design:
- Clients can extend circuits without having a list of all the relays. - Because the client's random index needs to match a routing entry signed by the authorities, the client is still selecting a relay randomly by weight. A hostile relay cannot choose which relay to send the client.
On a failure to extend, a relay should still report the routing entry for the other relay that it couldn't connect to. As before, a client will start a new circuit if a partially constructed circuit is a partial failure.
We could achieve a reliability/security tradeoff by letting clients offer the relay a choice of two or more indices to extend to. This would help reliability, but give the relay more influence over the path. We'd need to analyze this impact.
In the next section, I'll discuss a bunch of details that we need to straighten out in order to make this design work.
- Sorting out the details.
3.1. Will these routing entries fit in EXTEND2 and EXTENDED2 cells?
The EXTEND2 cell is probably big enough for this design. The random index that the client sends can be a new "link specifier" type, replacing the IP and identity link specifiers.
The EXTENDED2 cell is likely to need to grow here. We'll need to implement proposal 249 ("Allow CREATE cells with >505 bytes of handshake data") so that EXTEND2 and EXTENDED2 cells can be larger.
3.2. How should SNIPs be signed?
We have a few options, and I'd like to look into the possibilities here more closely.
The simplest possibility is to use **multiple signatures** on each SNIP, the way we do today for consensuses. These signatures should be made using medium-term Ed25519 keys from the authorities. At a cost of 64 bytes per signature, at 9 authorities, we would need 576 bytes for each SNIP. These signatures could be batch-verified to save time at the client side. Since generating a signature takes around 20 usec on my mediocre laptop, authorities should be able to generate this many signatures fairly easily.
Another possibility is to use a **threshold signature** on each SNIP, so that the authorities collaboratively generate a short signature that the clients can verify. There are multiple threshold signature schemes that we could consider here, though I haven't yet found one that looks perfect.
Another possibility is to use organize the SNIPs in a **merkle tree with a signed root**. For this design, clients could download the signed root periodically, and receive the hash-path from the signed root to the SNIP. This design might help with certificate-transparency-style designs, and it would be necessary if we ever want to move to a postquantum signature algorithm that requires large signatures.
Another possibility (due to a conversation among Chelsea Komlo, Sajin Sasy, and Ian Goldberg), is to *use SNARKs*. (Why not? All the cool kids are doing it!) For this, we'd have the clients download a signed hash of the ENDIVE periodically, and have the authorities generate a SNARK for each SNIP, proving its presence in that document.
3.3. How can we detect authority misbehavior?
We might want to take countermeasures against the possibility that a quorum of corrupt or compromised authorities give some relays a different set of SNIPs than they give other relays.
If we incorporate a merkle tree or a hash chain in the design, we can use mechanisms similar to certificate transparency to ensure that the authorities have a consistent log of all the entries that they have ever handed out.
3.4. How many types of weighted node selection are there, and how do we handle them?
Right now, there are multiple weights that we use in Tor: * Weight for exit * Weight for guard * Weight for middle node
We also filter nodes for several properties, such as flags they have.
To reproduce this behavior, we should enumerate the various weights and filters that we use, and (if there are not too many) create a separate index for each. For example, the Guard index would weight every node for selection as guard, assigning 0 weight to non-Guard nodes. The Exit index would weight every node for selection as an exit, assigning 0 weight to non-Exit nodes.
When choosing a relay, the client would have to specify which index to use. We could either have a separate (labeled) set of SNIPs entries for each index, or we could have each SNIP have a separate (labeled) index range for each index.
REGRESSION: the client's choice of which index to use would leak the next router's position and purpose in the circuit. This information is something that we believe relays can infer now, but it's not a desired feature that they can.
3.5. Does this design break onion service introduce handshakes?
In rend-spec-v3.txt section 3.3.2, we specify a variant of ntor for use in INTRODUCE2 handshakes. It allows the client to send encrypted data as part of its initial ntor handshake, but requires the client to know the onion service's onion key before it sends its initial handshake.
That won't be a problem for us here, though: we still require clients to fetch onion service descriptors before contacting a onion service.
3.6. How does the onion service directory work here?
The onion service directory is implemented as a hash ring, where each relay's position in the hash ring is decided by a hash of its identity, the current date, and a shared random value that the authorities compute each day.
To implement this hash ring using walking onions, we would need to have an extra index based not on bandwidth, but on position in the hash ring. Then onion services and clients could build a circuit, then extend it one more hop specifying their desired index in the hash ring.
We could either have a command to retrieve a trio of hashring-based routing entries by index, or to retrieve (or connect to?) the n'th item after a given hashring entry.
3.7. How can clients choose guard nodes?
We can reuse the fallback directories here. A newly bootstrapping client would connect to a fallback directory, then build a three-hop circuit, and finally extend the three-hop circuit by indexing to a random guard node. The random guard node's SNIP would contain the information that the client needs to build real circuits through that guard in the future. Because the client would be building a three-hop circuit, the fallback directory would not learn the client's guards.
(Note that even if the extend attempt fails, we should still pick the node as a possible guard based on its router entry, so that other nodes can't veto our choice of guards.)
3.8. Does the walking onions design preclude postquantum circuit handshakes?
Not at all! Both proposal 263 (ntru) and proposal 270 (newhope) work by having the client generate an ephemeral key as part of its initial handshake. The client does not need to know the relay's onion key to do this, so we can still integrate those proposals with this one.
3.9. Does the walking onions design stop us from changing the network topology?
For Tor to continue to scale, we will someday need to accept that not every relay can be simultaneously connected to every other relay. Therefore, we will need to move from our current clique topology assumption to some other topology.
There are also proposals to change node selection rules to generate routes providing better performance, or improved resistance to local adversaries.
We can, I think, implement this kind of proposal by changing the way that ENDIVEs are generated. Instead giving every relay the same ENDIVE, the authorities would generate different ENDIVEs for different relays, depending on the probability distribution of which relay should be chosen after which in the network topology. In the extreme case, this would produce O(n) ENDIVEs and O(n^2) SNIPs. In practice, I hope that we could do better by having the network topology be non-clique, and by having many relays share the same distribution of successors.
3.10. How can clients handle exit policies?
This is an unsolved challenge. If the client tells the middle relay its target port, it leaks information inappropriately.
One possibility is to try to gather exit policies into common categories, such as "most ports supported" and "most common ports supported".
Another (inefficient) possibility is for clients to keep trying exits until they find one that works.
Another (inefficient) possibility is to require that clients who use unusual ports fall back to the old mechanism for route selection.
3.11. Can this approach support families?
This is an unsolved challenge.
One (inefficient) possibility is for clients to generate circuits and discard those that use multiple relays in the same family.
One (not quite compatible) possibility is for the authorities to sort the ENDIVE so that relays in the same family are adjacent to one another. The index-bounds part of each SNIP would also have to include the bounds of the family. This approach is not quite compatible with the status quo, because it prevents relays from belonging to more than one family.
One interesting possibility (due to Chelsea Komlo, Sajin Sasy, and Ian Goldberg) is for the middle node to take responsibility for family enforcement. In this design, the client might offer the middle node multiple options for the next relay's index, and the middle node would choose the first such relay that is neither in its family nor its predecessor's family. We'd need to look for a way to make sure that the middle node wasn't biasing the path selection.
(TODO: come up with more ideas here.)
3.12. Can walking onions support IP-based and country-based restrictions?
This is an unsolved challenge.
If the user's restrictions do not exclude most paths, one (inefficient) possibility is for the user to generate paths until they generate one that they like; this becomes impractical if the user is excluding most paths.
Another (inefficient and fingerprintable) possibility is to require that clients who use complex path restrictions fall back to the old mechanism for route selection.
(TODO: come up with better ideas here.)
3.13. What scaling problems have we not solved with this design?
The walking onions design doesn't solve (on its own) the problem that the authorities need to know about every relay, and arrange to have every relay tested.
The walking onions design doesn't solve (on its own) the problem that relays need to have a list of all the relays. (But see section 3.9 above.)
3.14. Should we still have clients download a consensus when they're using walking onions?
There are some fields in the current consensus directory documents that the clients will still need, like the list of supported protocols and network parameters. A client that uses walking onions should download a new flavor of consensus document that contains only these fields, and does not list any relays. In some signature schemes, this consensus would contain a digest of the ENDIVE -- see 3.2 above.
(Note that this document would be a "consensus document" but not a "consensus directory", since it doesn't list any relays.)
4. Putting it all together
[This is the section where, in a later version of this proposal, I would specify the exact behavior and data formats to be used here. Right now, I'd say we're too early in the design phase.]
A.1. Acknowledgments
Thanks to Peter Palfrader for his original design in proposal 141, and to the designers of PIR-Tor, both of which inspired aspects of this Walking Onions design.
Thanks to Chelsea Komlo, Sajin Sasy, and Ian Goldberg for feedback on an earlier version of this design.
Thanks to David Goulet, Teor, and George Kadianakis for commentary on earlier versions of this draft.
A.2. Additional ideas
Teor notes that there are ways to try to get this idea to apply to one-pass circuit construction, something like the old onion design. We might be able to derive indices and keys from the same seeds, even. I don't see a way to do this without losing forward secrecy, but it might be worth looking at harder.
On 2019-02-05 18:44, Michael Rogers wrote:
Argh, I'm really sorry, I thought I'd reached the end of the proposal but my questions were addressed further down. Sorry for the noise.
Hang on. The issues you mentioned are indeed addressed in the proposal, and there are also solutions given in the proposal. But it might be that your solution goes beyond the one in the proposal.
It's this part in your solution:
"[...], then send *that relay's index* in its extend cell, so the relay receiving the extend cell wouldn't know whether the index was picked randomly by a client with no special requirements, or non-randomly by a client with special requirements?"
This is also related to the concern in 3.12 of the proposal where falling back to the old mechanism for route selection is said to be fingerprintable.
I'd be curious whether your solution is really fingerprintable.
All in all, I'm excited to see this proposal! This seems like a huge step forward for clients with limited bandwidth and for the wonderful future where we have tens of thousands of relays.
All the best, Karsten
Cheers, Michael
On 05/02/2019 17:42, Michael Rogers wrote:
I'm very happy to see this proposal! Two quick questions about relay selection:
- Can a client specify that it wants an exit node whose policy allows
something unusual, e.g. exiting to a port that's not allowed by the default policy? If not, does the client need to keep picking exit nodes until it gets a SNIP with a suitable policy?
- Similarly, if a client has restrictions on the guard nodes it can
connect to (fascist firewall or IPv4/v6 restrictions, for example), does it need to keep picking guards via the directory fallback circuit until it gets a suitable one?
In both cases, perhaps a client with unusual requirements could first download the consensus, find a relay matching its requirements, then send that relay's index in its extend cell, so the relay receiving the extend cell wouldn't know whether the index was picked randomly by a client with no special requirements, or non-randomly by a client with special requirements?
I think this would allow the majority of clients to save bandwidth by not downloading the consensus, without allowing relays to distinguish the minority of clients with unusual exit/guard requirements. (The presence of the full consensus on disk would indicate that the client had unusual exit/guard requirements at some point, however.)
Cheers, Michael
On 05/02/2019 17:02, Nick Mathewson wrote:
In this proposal, I'll describe a design for eliminating up-front client directory downloads. Clients still choose relays at random, but without ever having to hold a list of all the relays. This design does not require clients to trust relays any more than they do today, or open clients to epistemic attacks.
I hope to maintain feature parity with the current Tor design; I'll list the places in which I haven't figured out how to do so yet.
I'm naming this design "walking onions". The walking onion (Allium x proliferum) reproduces by growing tiny little bulbs at the end of a long stalk. When the stalk gets too top-heavy, it flops over, and the little bulbs start growing somewhere new.
The rest of this document will run as follows. In section 2, I'll explain the ideas behind the "walking onions" design, and how they can eliminate the need for regular directory downloads. In section 3, I'll answer a number of follow-up questions that arise, and explain how to keep various features in Tor working. Section 4 (not yet written) will elaborate all the details needed to turn this proposal into a concrete set of specification changes.
2. Overview
2.1. Recapping proposal 141
Back in Proposal 141 ("Download server descriptors on demand"), Peter Palfrader proposed an idea for eliminating ahead-of-time descriptor downloads. Instead of fetching all the descriptors in advance, a client would fetch the descriptor for each relay in its path right before extending the circuit to that relay. For example, if a client has a circuit from A->B and wants to extend the circuit to C, the client asks B for C's descriptor, and then extends the circuit to C.
(Note that the client needs to fetch the descriptor every time it extends the circuit, so that an observer can't tell whether the client already had the descriptor or not.)
There are a couple of limitations for this design:
   * It still requires clients to download a consensus.
   * It introduces an extra round-trip to each hop of the circuit extension process.
I'll show how to solve these problems in the two sections below.
2.2. An observation about the ntor handshake.
I'll start with an observation about our current circuit extension handshake, ntor: it should not actually be necessary to know a relay's onion key before extending to it.
Right now, the client sends:
      NODEID     (The relay's identity)
      KEYID      (The relay's public onion key)
      CLIENT_PK  (a diffie-hellman public key)
and the relay responds with:
      SERVER_PK  (a diffie-hellman public key)
      AUTH       (a function of the relay's private keys and *all* of the public keys.)
Both parties generate shared symmetric keys from the same inputs that are used to create the AUTH value.
The important insight here is that we could easily change this handshake so that the client sends only CLIENT_PK, and receives NODEID and KEYID as part of the response.
In other words, the client needs to know the relay's onion key to _complete_ the handshake, but doesn't actually need to know the relay's onion key in order to _initiate_ the handshake.
This is the insight that will let us save a round trip: When the client goes to extend a circuit from A->B to C, it can send B a request to extend to C and retrieve C's descriptor in a single step. Specifically, the client sends only CLIENT_PK, and relay B can include C's keys as part of the EXTENDED cell.
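To illustrate the reordering, here is a sketch (hypothetical field layout, not a wire format) of what the two messages would carry under this change:

```python
from dataclasses import dataclass

# Hypothetical message shapes for the modified handshake: the client's
# extend request carries only an index and CLIENT_PK; NODEID and KEYID
# come back in the reply alongside the SNIP.
@dataclass
class WalkingOnionExtend:
    index: int        # random value in [0, 2**32 - 1]
    client_pk: bytes  # ephemeral DH public key (g^x)

@dataclass
class WalkingOnionExtended:
    snip: bytes       # authority-signed entry binding identity to index
    server_pk: bytes  # relay's ephemeral DH public key (g^y)
    auth: bytes       # MAC over all public keys, as in ntor
```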
2.3. Extending by certified index
Now I'll explain how the client can avoid having to download a list of relays entirely.
First, let's look at how a client chooses a random relay today. First, the client puts all of the relays in a list, and computes a weighted bandwidth for each one. For example, suppose the relay identities are R1, R2, R3, R4, and R5, and their bandwidth weights are 50, 40, 30, 20, and 10. The client makes a table like this:
       Relay    Weight    Range of index values
       R1       50        0..49
       R2       40        50..89
       R3       30        90..119
       R4       20        120..139
       R5       10        140..149
To choose a random relay, the client picks a random index value between 0 and 149 inclusive, and looks up the corresponding relay in the table. For example, if the client's random number is 77, it will choose R2. If its random number is 137, it chooses R4.
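The table-plus-random-index procedure can be sketched as follows (a sketch of today's client-side behavior, not Tor's actual implementation):

```python
import bisect
import random

def build_table(weights):
    """weights: list of (relay_id, weight) pairs.  Returns cumulative
    (exclusive) upper bounds and relay ids, mirroring the table above."""
    bounds, relays, total = [], [], 0
    for relay, weight in weights:
        total += weight
        bounds.append(total)   # exclusive upper bound of this relay's range
        relays.append(relay)
    return bounds, relays, total

def choose(bounds, relays, total, rng=random):
    """Pick a uniform index in [0, total) and map it to a relay."""
    index = rng.randrange(total)
    return relays[bisect.bisect_right(bounds, index)]

bounds, relays, total = build_table(
    [("R1", 50), ("R2", 40), ("R3", 30), ("R4", 20), ("R5", 10)])
```

As in the text, index 77 lands in R2's range (50..89) and 137 in R4's (120..139).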
The key observation for the "walking onions" design is that the client doesn't actually need to construct this table itself. Instead, we will have this table be constructed by the authorities and distributed to all the relays.
Here's how it works: let's have the authorities make a new kind of consensus-like thing. We'll call it an Efficient Network Directory with Individually Verifiable Entries, or "ENDIVE" for short. This will differ from the client's index table above in two ways. First, every entry in the ENDIVE is normalized so that the bandwidth weights fill the entire 32-bit index space: the maximum index value is 2^32-1:
       Relay  Normalized weight   Range of index values
       R1     0x55555546          0x00000000..0x55555545
       R2     0x44444438          0x55555546..0x9999997d
       R3     0x3333332a          0x9999997e..0xcccccca7
       R4     0x2222221c          0xcccccca8..0xeeeeeec3
       R5     0x1111113c          0xeeeeeec4..0xffffffff
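The normalization step might look like this. (A sketch only: the proposal doesn't pin down a rounding rule, so this uses floor division and hands any shortfall to the first entries so that the ranges exactly tile the 32-bit space; the authorities' actual rounding could differ slightly.)

```python
def normalize(weights, space=2**32):
    """Scale integer weights so the resulting ranges exactly tile
    [0, space-1].  Uses largest-remainder-style fixup: floor-divide,
    then distribute the leftover units to the first entries."""
    total = sum(w for _, w in weights)
    scaled = [(relay, w * space // total) for relay, w in weights]
    shortfall = space - sum(s for _, s in scaled)   # < len(weights)
    scaled = [(relay, s + (1 if i < shortfall else 0))
              for i, (relay, s) in enumerate(scaled)]
    ranges, low = {}, 0
    for relay, s in scaled:
        ranges[relay] = (low, low + s - 1)   # inclusive bounds
        low += s
    return ranges

ranges = normalize([("R1", 50), ("R2", 40), ("R3", 30), ("R4", 20), ("R5", 10)])
```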
Second, every entry in the ENDIVE is timestamped and signed by the authorities independently, so that when a client sees a line from the table above, it can verify that it came from an authentic ENDIVE. When a client has chosen a random index, one of these entries will prove to the client that a given relay corresponds to that index. Because of this property, we'll be calling these entries "Separable Network Index Proofs", or "SNIP"s for short.
For example, a single SNIP from the table above might consist of:
   * A range of times during which this SNIP is valid
   * R1's identity
   * R1's ntor onion key
   * R1's address
   * The index range 0x00000000..0x55555545
   * A signature of all of the above, by a number of authorities
Let's put it together. Suppose that the client has a circuit from A->B, and it wants to extend to a random relay, chosen randomly weighted by bandwidth.
The client picks a random index value between 0 and 2**32 - 1. It sends that index to relay B in its EXTEND cell, along with a g^x value for the ntor handshake.
Note: the client doesn't send an address or identity for the next relay, since it doesn't know what relay it has chosen! (The combination of an index and a g^x value is what I'm calling a "walking onion".)
Now, relay B looks up the index in its most recent ENDIVE, to learn which relay the client selected.
(For example, suppose that the client's random index value is 0x50000001. This index value falls in the range 0x00000000..0x55555545 in the table above, so relay B sees that the client has chosen R1 as its next hop.)
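The relay-side lookup is a simple binary search over the ENDIVE's index bounds. A sketch, using a hypothetical in-memory layout of (inclusive upper bound, SNIP) pairs taken from the table above:

```python
import bisect

def lookup(endive, index):
    """Find the SNIP whose index range contains `index`.  `endive` is a
    list of (upper_bound_inclusive, snip) pairs sorted by bound."""
    bounds = [upper for upper, _ in endive]
    return endive[bisect.bisect_left(bounds, index)][1]

endive = [(0x55555545, "SNIP-R1"), (0x9999997d, "SNIP-R2"),
          (0xcccccca7, "SNIP-R3"), (0xeeeeeec3, "SNIP-R4"),
          (0xffffffff, "SNIP-R5")]
```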
Relay B sends a create cell to R1 as usual. When it gets a CREATED reply, it includes the authority-signed SNIP for R1 as part of the EXTENDED cell.
As part of verifying the handshake, the client verifies that the SNIP was signed by enough authorities, that its timestamp is recent enough, and that it actually corresponds to the random index that the client selected.
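These client-side checks might be sketched as follows. (Field names, the quorum rule, and the signature callback are all hypothetical; see 3.2 for the open questions about the signature scheme.)

```python
import time

def snip_ok(snip, chosen_index, authority_keys, quorum, verify_sig, now=None):
    """Client-side SNIP validation sketch: fresh timestamps, the chosen
    index inside the SNIP's range, and enough authority signatures."""
    now = time.time() if now is None else now
    if not snip["valid_after"] <= now <= snip["valid_until"]:
        return False
    low, high = snip["index_range"]
    if not low <= chosen_index <= high:
        return False
    good = sum(1 for sig in snip["signatures"]
               if any(verify_sig(key, snip["body"], sig)
                      for key in authority_keys))
    return good >= quorum

# Hypothetical example SNIP covering indices 0x00000000..0x55555545:
example = {"valid_after": 0, "valid_until": 2**62,
           "index_range": (0x00000000, 0x55555545),
           "signatures": ["sig1", "sig2", "sig3"], "body": b"entry"}
```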
Notice the properties we have with this design:
   - Clients can extend circuits without having a list of all the relays.
   - Because the client's random index needs to match a routing entry signed by the authorities, the client is still selecting a relay randomly by weight. A hostile relay cannot choose which relay to send the client to.
On a failure to extend, a relay should still report the routing entry for the relay that it couldn't connect to. As before, a client will start a new circuit if a partially constructed circuit fails.
We could achieve a reliability/security tradeoff by letting clients offer the relay a choice of two or more indices to extend to. This would help reliability, but give the relay more influence over the path. We'd need to analyze this impact.
In the next section, I'll discuss a bunch of details that we need to straighten out in order to make this design work.
3. Sorting out the details.
3.1. Will these routing entries fit in EXTEND2 and EXTENDED2 cells?
The EXTEND2 cell is probably big enough for this design. The random index that the client sends can be a new "link specifier" type, replacing the IP and identity link specifiers.
The EXTENDED2 cell is likely to need to grow here. We'll need to implement proposal 249 ("Allow CREATE cells with >505 bytes of handshake data") so that EXTEND2 and EXTENDED2 cells can be larger.
3.2. How should SNIPs be signed?
We have a few options, and I'd like to look into the possibilities here more closely.
The simplest possibility is to use **multiple signatures** on each SNIP, the way we do today for consensuses. These signatures should be made using medium-term Ed25519 keys from the authorities. At a cost of 64 bytes per signature, at 9 authorities, we would need 576 bytes for each SNIP. These signatures could be batch-verified to save time at the client side. Since generating a signature takes around 20 usec on my mediocre laptop, authorities should be able to generate this many signatures fairly easily.
Another possibility is to use a **threshold signature** on each SNIP, so that the authorities collaboratively generate a short signature that the clients can verify. There are multiple threshold signature schemes that we could consider here, though I haven't yet found one that looks perfect.
Another possibility is to organize the SNIPs in a **merkle tree with a signed root**. For this design, clients could download the signed root periodically, and receive the hash-path from the signed root to the SNIP. This design might help with certificate-transparency-style designs, and it would be necessary if we ever want to move to a postquantum signature algorithm that requires large signatures.
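For the merkle-tree option, the client-side check is the usual root recomputation from a leaf and its sibling path. A sketch, assuming SHA-256 and an unspecified real encoding:

```python
import hashlib

def merkle_root(leaf, path):
    """Recompute the tree root from a leaf and a path of (side, sibling)
    pairs, as a client would when checking a SNIP against a signed root.
    'L' means the sibling sits to the left of our running hash."""
    h = hashlib.sha256(leaf).digest()
    for side, sibling in path:
        pair = sibling + h if side == "L" else h + sibling
        h = hashlib.sha256(pair).digest()
    return h

# Tiny two-leaf example tree: both leaves recompute the same root.
la = hashlib.sha256(b"snip-a").digest()
lb = hashlib.sha256(b"snip-b").digest()
root = hashlib.sha256(la + lb).digest()
```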
Another possibility (due to a conversation among Chelsea Komlo, Sajin Sasy, and Ian Goldberg), is to *use SNARKs*. (Why not? All the cool kids are doing it!) For this, we'd have the clients download a signed hash of the ENDIVE periodically, and have the authorities generate a SNARK for each SNIP, proving its presence in that document.
3.3. How can we detect authority misbehavior?
We might want to take countermeasures against the possibility that a quorum of corrupt or compromised authorities give some relays a different set of SNIPs than they give other relays.
If we incorporate a merkle tree or a hash chain in the design, we can use mechanisms similar to certificate transparency to ensure that the authorities have a consistent log of all the entries that they have ever handed out.
3.4. How many types of weighted node selection are there, and how do we handle them?
Right now, there are multiple weights that we use in Tor:
   * Weight for exit
   * Weight for guard
   * Weight for middle node
We also filter nodes for several properties, such as flags they have.
To reproduce this behavior, we should enumerate the various weights and filters that we use, and (if there are not too many) create a separate index for each. For example, the Guard index would weight every node for selection as guard, assigning 0 weight to non-Guard nodes. The Exit index would weight every node for selection as an exit, assigning 0 weight to non-Exit nodes.
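Building a per-role index might be sketched like this (hypothetical relay records; a real implementation would also apply the consensus's Wgg/Wee-style bandwidth-weight parameters rather than raw bandwidths):

```python
def role_index(relays, role):
    """Build a per-role weight list: relays lacking the role's flag get
    weight 0, so they can never be chosen from that index."""
    return [(r["id"], r["bw"] if role in r["flags"] else 0)
            for r in relays]

# Hypothetical relay records:
relays = [
    {"id": "R1", "bw": 50, "flags": {"Guard", "Exit"}},
    {"id": "R2", "bw": 40, "flags": {"Guard"}},
    {"id": "R3", "bw": 30, "flags": set()},
]
```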
When choosing a relay, the client would have to specify which index to use. We could either have a separate (labeled) set of SNIPs entries for each index, or we could have each SNIP have a separate (labeled) index range for each index.
REGRESSION: the client's choice of which index to use would leak the next router's position and purpose in the circuit. This information is something that we believe relays can already infer, but confirming it outright is not a desired feature.
3.5. Does this design break onion service introduce handshakes?
In rend-spec-v3.txt section 3.3.2, we specify a variant of ntor for use in INTRODUCE2 handshakes. It allows the client to send encrypted data as part of its initial ntor handshake, but requires the client to know the onion service's onion key before it sends its initial handshake.
That won't be a problem for us here, though: we still require clients to fetch onion service descriptors before contacting an onion service.
3.6. How does the onion service directory work here?
The onion service directory is implemented as a hash ring, where each relay's position in the hash ring is decided by a hash of its identity, the current date, and a shared random value that the authorities compute each day.
To implement this hash ring using walking onions, we would need to have an extra index based not on bandwidth, but on position in the hash ring. Then onion services and clients could build a circuit, then extend it one more hop specifying their desired index in the hash ring.
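The extra hashring-based index might be derived along these lines. (A sketch only; the real hash-ring construction and its exact inputs are specified in rend-spec-v3.txt, and this simplified version just maps the hash onto the same 32-bit index space used above.)

```python
import hashlib

def hashring_index(identity, date, shared_random, space=2**32):
    """Map a relay onto a hashring-position index from its identity,
    the current date, and the daily shared random value."""
    digest = hashlib.sha256(identity + date + shared_random).digest()
    return int.from_bytes(digest[:4], "big") % space
```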
We could either have a command to retrieve a trio of hashring-based routing entries by index, or to retrieve (or connect to?) the n'th item after a given hashring entry.
3.7. How can clients choose guard nodes?
We can reuse the fallback directories here. A newly bootstrapping client would connect to a fallback directory, then build a three-hop circuit, and finally extend the three-hop circuit by indexing to a random guard node. The random guard node's SNIP would contain the information that the client needs to build real circuits through that guard in the future. Because the client would be building a three-hop circuit, the fallback directory would not learn the client's guards.
(Note that even if the extend attempt fails, we should still pick the node as a possible guard based on its router entry, so that other nodes can't veto our choice of guards.)
3.8. Does the walking onions design preclude postquantum circuit handshakes?
Not at all! Both proposal 263 (ntru) and proposal 270 (newhope) work by having the client generate an ephemeral key as part of its initial handshake. The client does not need to know the relay's onion key to do this, so we can still integrate those proposals with this one.
3.9. Does the walking onions design stop us from changing the network topology?
For Tor to continue to scale, we will someday need to accept that not every relay can be simultaneously connected to every other relay. Therefore, we will need to move from our current clique topology assumption to some other topology.
There are also proposals to change node selection rules to generate routes providing better performance, or improved resistance to local adversaries.
Thanks for the proposal, Nick! I think it definitely moves the discussion in a good direction!
On Wed, Feb 06, 2019 at 11:18:12AM +0100, Karsten Loesing wrote:
On 2019-02-05 18:44, Michael Rogers wrote:
Argh, I'm really sorry, I thought I'd reached the end of the proposal but my questions were addressed further down. Sorry for the noise.
Hang on. The issues you mentioned are indeed addressed in the proposal, and there are also solutions given in the proposal. But it might be that your solution goes beyond the one in the proposal.
It's this part in your solution:
"[...], then send *that relay's index* in its extend cell, so the relay receiving the extend cell wouldn't know whether the index was picked randomly by a client with no special requirements, or non-randomly by a client with special requirements?"
This is also related to the concern in 3.12 of the proposal where falling back to the old mechanism for route selection is said to be fingerprintable.
I'd be curious whether your solution is really fingerprintable.
At some point, you hit the problem that if there's a really unusual port that not many relays will exit to, and a very-low-bandwidth relay does support that port, then anyone extending to that relay is pretty likely doing so in order to exit to that port. (This happens today, not just with Walking Onions.) So the question is whether the solution is *more* fingerprintable than that.
for the wonderful future where we have tens of thousands of relays.
Think bigger. :-)
[And to be clear in the proposal itself, I was not _advocating_ using SNARKs, but rather just pointing out that that's a thing one could conceivably use if you really wanted to keep the current consensus format, but still prove that this SNIP was part of it, Cinderella[0]-style.]
[0] https://www.andrew.cmu.edu/user/bparno/papers/cinderella.pdf
Another issue not addressed by the current proposal is how to handle the "not all the relays have upgraded" problem.
On 12 Feb 2019, at 04:49, Ian Goldberg iang@uwaterloo.ca wrote:
Another issue not addressed by the current proposal is how to handle the "not all the relays have upgraded" problem.
And how to handle the onion service protocol (in detail).
Here's one possible migration path: (I think it works, but we should spec these changes in detail.)
Step 0: graceful protocol upgrades on new relays
The walking onions EXTEND and CREATE cells are different from the legacy cells. (If the walking onions CREATE cell was exactly the same as the current CREATE cell, clients would only need to check the current hop's protocols.)
So, when extending, a client:
   * sends the certified index as an additional link specifier in the EXTEND cell
   * if both relays support the protocol, the EXTENDED cell contains a SNIP in addition to its legacy content. Otherwise, it is a legacy EXTENDED cell.
For v3 onion services:
   * service descriptors contain the certified index as *an additional* link specifier
   * client intro EXTENDs contain the certified index as *an additional* link specifier
   * client INTRODUCE cells contain the certified index as *an additional* link specifier
   * service rendezvous EXTEND cells contain the certified index as *an additional* link specifier
(If we implement unknown link specifier passthrough from the v3 onion service proposal, these protocol upgrades should happen automatically.)
For v3 single onion services:
   * introduction points are looked up via a 3-hop path, like guards, using a random middle index
   * rendezvous points are connected to directly, using their IP and ID link specifiers. (A 3-hop lookup is too slow for rendezvous.)
   * the rest of the protocol remains the same
(v2 onion services can't support walking onions, because their protocol isn't flexible enough.)
Step 1: require protocol upgrades on new relays
For standard paths, like Step 0, but:
   * if both relays support the protocol, send the certified index as *the only* link specifier in the EXTEND cell
For v3 onion services:
   * descriptors remain the same as step 0. (The link specifiers in the descriptor can be 24 hours old, so the client can't rely on the service's view of the intro's protocols.)
   * if both relays support the new protocol, client intro EXTEND cells contain the certified index as *the only* link specifier
   * client INTRODUCE cells remain the same as step 0. (To support single onion services, which need legacy link specifiers.)
   * if both relays support the new protocol, service rendezvous EXTEND cells contain the certified index as *the only* link specifier
For v3 single onion services, the protocol remains the same:
   * rendezvous points are connected to directly, using their IP and ID link specifiers. (A 3-hop lookup is too slow for rendezvous.)
Step 2: ignore legacy relays
Like Step 1, but:
   * check the consensus for a flag that tells us to ignore legacy relays
v3 onion services continue to include legacy ID and IP link specifiers, for older tor versions, and single onion services.
Step 3: remove legacy onion service descriptor link specifiers
Once all onion services and clients support walking onions:
   * service descriptors contain the certified index as *the only* link specifier. (Clients will have to make a multi-hop path. But that's ok, because we don't support Tor2web on v3 onion services.)
   * client INTRODUCE cells continue to include legacy IP and ID link specifiers, to support single onion services.
This upgrade process leads to some more questions:
1. How do clients discover protocol versions?
We can do one or more of:
  * ENDIVEs contain a list of supported protocol versions for each
    relay, or
  * clients ask the current hop for a copy of its descriptor, which
    contains protocol versions (see below for tradeoffs), or
  * we have a flag in the consensus that tells clients if they can
    safely ignore legacy relays
2. How do clients keep their guard information up to date?
Two options:
  * They ask the guard for its descriptor.
  * They re-extend to the guard via a 3-hop path.
Fetching the descriptor could also help us in situations where we need to discover less common information about the relay. But it's slower, and it looks different to the relay, and on the wire. So we might want to avoid it. (That said, Tor already fetches descriptors from bridges.)
3. How do we minimise Tor version distinguishers?
We should implement all the stages in a small number of Tor releases, then activate them using a consensus parameter.
We should try to implement onion service link specifier passthrough in the same release as full onion service IPv6 support, so we have fewer onion service version distinguishers.
If we implement it later, we might end up with:
  * 2-4 IPv6 distinguishers (until these versions are unsupported)
  * 1 passthrough distinguisher
  * 1-3 walking onions stage 0/1/2 distinguisher(s)
T
Hi Nick,
Thanks for posting this initial draft. I enjoyed reading more of the details, after hearing about it last week.
On February 5, 2019 5:02:50 PM UTC, Nick Mathewson nickm@torproject.org wrote:
...
- As part of verifying the handshake, the client verifies that the SNIP was signed by enough authorities, that its timestamp is recent enough, and that it actually corresponds to the random index that the client selected.
Let's make sure that we check the signature *first*, before parsing the rest of the document. (Maybe that's something we can specify when we write the detailed section 4.)
Tor's current directory parsing implementation parses the document, then checks the signature. This order makes some parsing bugs easier to trigger, because they don't require a valid set of authority signatures.
We could encourage implementers to check the signature first by putting it first in the document, or adding a signature offset field to the header. Or we could document this issue in a security considerations section, and hope all the implementers read it.
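The check-before-parse ordering can be sketched as follows. This is an illustrative sketch, not Tor's format: the "signature" here is a toy SHA-256 stand-in for a real Ed25519 verification, and the comma-separated body layout is hypothetical.

```python
import hashlib

# Toy stand-in for a real signature check: a "signature" is just
# SHA-256(key || body). In Tor this would be an Ed25519 verification.
def verify_sig(key: bytes, body: bytes, sig: bytes) -> bool:
    return hashlib.sha256(key + body).digest() == sig

def parse_snip(raw: bytes, key: bytes) -> dict:
    # Signature-first layout: fixed-length signature, then the body.
    sig, body = raw[:32], raw[32:]
    # Check the signature over the raw bytes *before* parsing anything,
    # so parsing bugs can't be triggered by unsigned input.
    if not verify_sig(key, body, sig):
        raise ValueError("bad signature; refusing to parse")
    fields = body.split(b",")          # hypothetical body format
    return {"id": fields[0], "index_low": int(fields[1], 16)}

key = b"authority-key"
body = b"R1,00000000"
raw = hashlib.sha256(key + body).digest() + body
print(parse_snip(raw, key))
```

Putting the signature at a fixed offset at the front, as above, makes the safe ordering the natural one to implement.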
T
-- teor ----------------------------------------------------------------------
Hi Nick!
On Tue, Feb 5, 2019 at 11:03 AM Nick Mathewson nickm@torproject.org wrote:
...
2.3. Extending by certified index
...
Here's how it works: let's have the authorities make a new kind of consensus-like thing. We'll call it an Efficient Network Directory with Individually Verifiable Entries, or "ENDIVE" for short. This will differ from the client's index table above in two ways. First, every entry in the ENDIVE is normalized so that the bandwidth weights fill the full 32-bit index space, making the maximum index value 2^32-1:
    Relay    Normalized weight    Range of index values
    R1       0x55555546           0x00000000..0x55555545
    R2       0x44444438           0x55555546..0x9999997d
    R3       0x3333332a           0x9999997e..0xcccccca7
    R4       0x2222221c           0xcccccca8..0xeeeeeec3
    R5       0x1111113c           0xeeeeeec4..0xffffffff
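The normalization itself is simple arithmetic. Here's a sketch; the proposal doesn't specify a rounding rule, so this uses floor-plus-largest-remainder, which tiles the index space exactly but may yield slightly different boundaries than the example table above.

```python
# Sketch of ENDIVE weight normalization. The rounding scheme is a
# guess (floor plus largest-remainder) chosen so that the resulting
# ranges exactly tile 0 .. 2**32 - 1 with no gaps or overlap.
def normalize(weights, total_space=2**32):
    total = sum(w for _, w in weights)
    scaled = [(name, w * total_space // total, (w * total_space) % total)
              for name, w in weights]
    # Hand the leftover index units to the largest remainders.
    leftover = total_space - sum(s for _, s, _ in scaled)
    order = sorted(range(len(scaled)), key=lambda i: -scaled[i][2])
    norm = [list(s[:2]) for s in scaled]
    for i in order[:leftover]:
        norm[i][1] += 1
    # Turn normalized weights into contiguous index ranges.
    ranges, lo = [], 0
    for name, w in norm:
        ranges.append((name, lo, lo + w - 1))
        lo += w
    return ranges

for name, lo, hi in normalize([("R1", 50), ("R2", 40), ("R3", 30),
                               ("R4", 20), ("R5", 10)]):
    print(f"{name}  0x{lo:08x}..0x{hi:08x}")
```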
Second, every entry in the ENDIVE is timestamped and signed by the authorities independently, so that when a client sees a line from the table above, it can verify that it came from an authentic ENDIVE. When a client has chosen a random index, one of these entries will prove to the client that a given relay corresponds to that index. Because of this property, we'll be calling these entries "Separable Network Index Proofs", or "SNIP"s for short.
For example, a single SNIP from the table above might consist of:
  * A range of times during which this SNIP is valid
  * R1's identity
  * R1's ntor onion key
  * R1's address
  * The index range 0x00000000..0x55555545
  * A signature of all of the above, by a number of authorities
Let's put it together. Suppose that the client has a circuit from A->B, and it wants to extend to a random relay, chosen randomly weighted by bandwidth.
The client picks a random index value between 0 and 2**32 - 1. It sends that index to relay B in its EXTEND cell, along with a g^x value for the ntor handshake.
Note: the client doesn't send an address or identity for the next relay, since it doesn't know what relay it has chosen! (The combination of an index and a g^x value is what I'm calling a "walking onion".)
Now, relay B looks up the index in its most recent ENDIVE, to learn which relay the client selected.
(For example, suppose that the client's random index value is 0x50000001. This index value falls within the range 0x00000000..0x55555545 in the table above, so relay B sees that the client has chosen R1 as its next hop.)
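The relay-side lookup is a binary search over the range boundaries. A sketch, using the example table's bounds (the `(upper_bound, relay)` representation is just one convenient encoding):

```python
import bisect

# Relay-side lookup sketch: the ENDIVE's index ranges from the example
# table above, stored as (upper_bound, relay) pairs sorted by bound.
ENDIVE = [
    (0x55555545, "R1"),
    (0x9999997d, "R2"),
    (0xcccccca7, "R3"),
    (0xeeeeeec3, "R4"),
    (0xffffffff, "R5"),
]
UPPER_BOUNDS = [hi for hi, _ in ENDIVE]

def lookup(index: int) -> str:
    """Map a client-chosen 32-bit index to the relay owning that range."""
    pos = bisect.bisect_left(UPPER_BOUNDS, index)
    return ENDIVE[pos][1]

print(lookup(0x50000001))   # falls in R1's range, prints R1
```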
Relay B sends a create cell to R1 as usual. When it gets a CREATED reply, it includes the authority-signed SNIP for R1 as part of the EXTENDED cell.
As part of verifying the handshake, the client verifies that the SNIP was signed by enough authorities, that its timestamp is recent enough, and that it actually corresponds to the random index that the client selected.
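The client's three checks might look like the following sketch. The field names are illustrative, not the eventual wire format, and signature verification itself is abstracted to counting signatures:

```python
import time

# Client-side SNIP checks sketch: enough authority signatures, a
# recent-enough validity window, and coverage of the chosen index.
def check_snip(snip, chosen_index, n_required_sigs, now=None):
    now = time.time() if now is None else now
    if len(snip["sigs"]) < n_required_sigs:
        raise ValueError("not signed by enough authorities")
    if not (snip["valid_after"] <= now <= snip["valid_until"]):
        raise ValueError("SNIP timestamp not recent enough")
    if not (snip["index_low"] <= chosen_index <= snip["index_high"]):
        raise ValueError("SNIP does not cover the index we chose")

snip = {"sigs": ["sig"] * 5, "valid_after": 1000, "valid_until": 2000,
        "index_low": 0x00000000, "index_high": 0x55555545}
check_snip(snip, 0x50000001, n_required_sigs=5, now=1500)  # passes
```

The third check is what stops a hostile relay from substituting a different (but validly signed) SNIP for the index the client actually chose.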
This is very cool. One thing that comes to mind reading the proposal is that you will either want some fancy sorting scheme to try to make ranges (nearly) consistent across ENDIVEs, OR will want to have the Relay commit to an ENDIVE version before the client sends an index (otherwise an adversarial relay group can increase their effective weight by a factor of # of acceptable ENDIVE versions).
Notice the properties we have with this design:
  - Clients can extend circuits without having a list of all the relays.
  - Because the client's random index needs to match a routing entry
    signed by the authorities, the client is still selecting a relay
    randomly by weight. A hostile relay cannot choose which relay the
    client will extend to.
On a failure to extend, a relay should still report the routing entry for the other relay that it couldn't connect to. As before, a client will start a new circuit if a partially constructed circuit fails.
We could achieve a reliability/security tradeoff by letting clients offer the relay a choice of two or more indices to extend to. This would help reliability, but give the relay more influence over the path. We'd need to analyze this impact.
In the next section, I'll discuss a bunch of details that we need to straighten out in order to make this design work.
3. Sorting out the details
3.1. Will these routing entries fit in EXTEND2 and EXTENDED2 cells?
The EXTEND2 cell is probably big enough for this design. The random index that the client sends can be a new "link specifier" type, replacing the IP and identity link specifiers.
The EXTENDED2 cell is likely to need to grow here. We'll need to implement proposal 249 ("Allow CREATE cells with >505 bytes of handshake data") so that EXTEND2 and EXTENDED2 cells can be larger.
3.2. How should SNIPs be signed?
We have a few options, and I'd like to look into the possibilities here more closely.
The simplest possibility is to use **multiple signatures** on each SNIP, the way we do today for consensuses. These signatures should be made using medium-term Ed25519 keys from the authorities. At a cost of 64 bytes per signature, at 9 authorities, we would need 576 bytes for each SNIP. These signatures could be batch-verified to save time at the client side. Since generating a signature takes around 20 usec on my mediocre laptop, authorities should be able to generate this many signatures fairly easily.
Another possibility is to use a **threshold signature** on each SNIP, so that the authorities collaboratively generate a short signature that the clients can verify. There are multiple threshold signature schemes that we could consider here, though I haven't yet found one that looks perfect.
Another possibility is to organize the SNIPs in a **merkle tree with a signed root**. For this design, clients could download the signed root periodically, and receive the hash-path from the signed root to the SNIP. This design might help with certificate-transparency-style designs, and it would be necessary if we ever want to move to a postquantum signature algorithm that requires large signatures.
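A minimal sketch of the signed-root idea: the authorities sign only the root, and each SNIP ships with its hash-path. (Illustrative only; a real design needs domain separation between leaf and interior hashes, which is omitted here for brevity.)

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])          # duplicate odd node
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves, index):
    """Hash-path from leaf `index` up to the root."""
    level = [h(leaf) for leaf in leaves]
    path = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        path.append((level[index ^ 1], index % 2 == 0))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return path

def verify_proof(leaf, path, root):
    acc = h(leaf)
    for sibling, leaf_is_left in path:
        acc = h(acc + sibling) if leaf_is_left else h(sibling + acc)
    return acc == root

snips = [b"SNIP-R1", b"SNIP-R2", b"SNIP-R3", b"SNIP-R4", b"SNIP-R5"]
root = merkle_root(snips)                     # this is what gets signed
proof = merkle_proof(snips, 2)
print(verify_proof(b"SNIP-R3", proof, root))  # True
```

For n SNIPs the proof is O(log n) hashes, so the per-SNIP overhead stays small even if the signature on the root is large (e.g. postquantum).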
Another possibility (due to a conversation among Chelsea Komlo, Sajin Sasy, and Ian Goldberg), is to *use SNARKs*. (Why not? All the cool kids are doing it!) For this, we'd have the clients download a signed hash of the ENDIVE periodically, and have the authorities generate a SNARK for each SNIP, proving its presence in that document.
3.3. How can we detect authority misbehavior?
We might want to take countermeasures against the possibility that a quorum of corrupt or compromised authorities give some relays a different set of SNIPs than they give other relays.
If we incorporate a merkle tree or a hash chain in the design, we can use mechanisms similar to certificate transparency to ensure that the authorities have a consistent log of all the entries that they have ever handed out.
3.4. How many types of weighted node selection are there, and how do we handle them?
Right now, there are multiple weights that we use in Tor:
  * Weight for exit
  * Weight for guard
  * Weight for middle node
We also filter nodes for several properties, such as flags they have.
To reproduce this behavior, we should enumerate the various weights and filters that we use, and (if there are not too many) create a separate index for each. For example, the Guard index would weight every node for selection as guard, assigning 0 weight to non-Guard nodes. The Exit index would weight every node for selection as an exit, assigning 0 weight to non-Exit nodes.
When choosing a relay, the client would have to specify which index to use. We could either have a separate (labeled) set of SNIPs entries for each index, or we could have each SNIP have a separate (labeled) index range for each index.
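As a sketch, building one index per purpose from per-relay flags might look like this (the data and flag names are illustrative; a real version would also rescale each index to the full 32-bit space and apply the position-weighting rules):

```python
# Sketch: one index table per selection purpose, with relays that lack
# the relevant flag given zero weight (so they own no index range).
RELAYS = [
    # (name, bandwidth, flags) -- illustrative data
    ("R1", 50, {"Guard", "Exit"}),
    ("R2", 40, {"Guard"}),
    ("R3", 30, {"Exit"}),
    ("R4", 20, set()),
]

def build_index(relays, required_flag=None):
    ranges, lo = [], 0
    for name, bw, flags in relays:
        weight = bw if (required_flag is None or required_flag in flags) else 0
        if weight:
            ranges.append((name, lo, lo + weight - 1))
            lo += weight
    return ranges

print(build_index(RELAYS, "Guard"))  # only R1 and R2 own ranges
print(build_index(RELAYS, "Exit"))   # only R1 and R3 own ranges
```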
REGRESSION: the client's choice of which index to use would leak the next router's position and purpose in the circuit. We believe relays can already infer this information, but confirming it explicitly is not desirable.
3.5. Does this design break onion service introduce handshakes?
In rend-spec-v3.txt section 3.3.2, we specify a variant of ntor for use in INTRODUCE2 handshakes. It allows the client to send encrypted data as part of its initial ntor handshake, but requires the client to know the onion service's onion key before it sends its initial handshake.
That won't be a problem for us here, though: we still require clients to fetch onion service descriptors before contacting an onion service.
3.6. How does the onion service directory work here?
The onion service directory is implemented as a hash ring, where each relay's position in the hash ring is decided by a hash of its identity, the current date, and a shared random value that the authorities compute each day.
To implement this hash ring using walking onions, we would need to have an extra index based not on bandwidth, but on position in the hash ring. Then onion services and clients could build a circuit, then extend it one more hop specifying their desired index in the hash ring.
We could either have a command to retrieve a trio of hashring-based routing entries by index, or to retrieve (or connect to?) the n'th item after a given hashring entry.
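As a rough sketch of how such a hashring index could be derived (the real v3 scheme hashes a different, fully specified set of inputs; this only illustrates turning identity, date, and shared randomness into a sortable position):

```python
import hashlib

def hsdir_ring_position(identity: bytes, date: bytes, srv: bytes) -> int:
    """Hash-ring position sketch: H(identity | date | shared-random)."""
    # An authority could turn these positions into an extra,
    # non-bandwidth-based ENDIVE index.
    digest = hashlib.sha3_256(identity + date + srv).digest()
    return int.from_bytes(digest[:4], "big")   # 32-bit ring index

relays = [b"R1-id", b"R2-id", b"R3-id", b"R4-id", b"R5-id"]
ring = sorted(relays, key=lambda r: hsdir_ring_position(r, b"2019-02-05",
                                                        b"shared-random"))
print([r.decode() for r in ring])
```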
3.7. How can clients choose guard nodes?
We can reuse the fallback directories here. A newly bootstrapping client would connect to a fallback directory, then build a three-hop circuit, and finally extend the three-hop circuit by indexing to a random guard node. The random guard node's SNIP would contain the information that the client needs to build real circuits through that guard in the future. Because the client would be building a three-hop circuit, the fallback directory would not learn the client's guards.
(Note that even if the extend attempt fails, we should still pick the node as a possible guard based on its router entry, so that other nodes can't veto our choice of guards.)
3.8. Does the walking onions design preclude postquantum circuit handshakes?
Not at all! Both proposal 263 (ntru) and proposal 270 (newhope) work by having the client generate an ephemeral key as part of its initial handshake. The client does not need to know the relay's onion key to do this, so we can still integrate those proposals with this one.
3.9. Does the walking onions design stop us from changing the network topology?
For Tor to continue to scale, we will someday need to accept that not every relay can be simultaneously connected to every other relay. Therefore, we will need to move from our current clique topology assumption to some other topology.
There are also proposals to change node selection rules to generate routes providing better performance, or improved resistance to local adversaries.
We can, I think, implement this kind of proposal by changing the way that ENDIVEs are generated. Instead of giving every relay the same ENDIVE, the authorities would generate different ENDIVEs for different relays, depending on the probability distribution of which relay should be chosen after which in the network topology. In the extreme case, this would produce O(n) ENDIVEs and O(n^2) SNIPs. In practice, I hope that we could do better by having the network topology be non-clique, and by having many relays share the same distribution of successors.
3.10. How can clients handle exit policies?
This is an unsolved challenge. If the client tells the middle relay its target port, it leaks information inappropriately.
One possibility is to try to gather exit policies into common categories, such as "most ports supported" and "most common ports supported".
Another (inefficient) possibility is for clients to keep trying exits until they find one that works.
Another (inefficient) possibility is to require that clients who use unusual ports fall back to the old mechanism for route selection.
3.11. Can this approach support families?
This is an unsolved challenge.
One (inefficient) possibility is for clients to generate circuits and discard those that use multiple relays in the same family.
One (not quite compatible) possibility is for the authorities to sort the ENDIVE so that relays in the same family are adjacent to one another. The index-bounds part of each SNIP would also have to include the bounds of the family. This approach is not quite compatible with the status quo, because it prevents relays from belonging to more than one family.
One interesting possibility (due to Chelsea Komlo, Sajin Sasy, and Ian Goldberg) is for the middle node to take responsibility for family enforcement. In this design, the client might offer the middle node multiple options for the next relay's index, and the middle node would choose the first such relay that is neither in its family nor its predecessor's family. We'd need to look for a way to make sure that the middle node wasn't biasing the path selection.
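The middle-node selection rule could be sketched like this (family data and names are illustrative). Note that "take the first acceptable candidate" is exactly the kind of order-dependent rule that would need the bias analysis mentioned above:

```python
# Sketch of the Komlo/Sasy/Goldberg idea: the client offers several
# candidate indices; the middle relay takes the first whose relay is in
# neither its own family nor its predecessor's. Data is illustrative,
# and candidates stand in for relays looked up from client indices.
FAMILIES = {"R1": {"R1", "R2"}, "R2": {"R1", "R2"},
            "R3": {"R3"}, "R4": {"R4"}}

def pick_next_hop(candidates, middle, predecessor):
    excluded = FAMILIES[middle] | FAMILIES[predecessor]
    for relay in candidates:
        if relay not in excluded:
            return relay
    return None                # client must retry with fresh indices

print(pick_next_hop(["R2", "R3", "R4"], middle="R1", predecessor="R2"))
```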
(TODO: come up with more ideas here.)
3.12. Can walking onions support IP-based and country-based restrictions?
This is an unsolved challenge.
If the user's restrictions do not exclude most paths, one (inefficient) possibility is for the user to generate paths until they generate one that they like. This idea becomes inefficient if the user is excluding most paths.
Another (inefficient and fingerprintable) possibility is to require that clients who use complex path restrictions fall back to the old mechanism for route selection.
(TODO: come up with better ideas here.)
3.13. What scaling problems have we not solved with this design?
The walking onions design doesn't solve (on its own) the problem that the authorities need to know about every relay, and arrange to have every relay tested.
The walking onions design doesn't solve (on its own) the problem that relays need to have a list of all the relays. (But see section 3.9 above.)
3.14. Should we still have clients download a consensus when they're using walking onions?
There are some fields in the current consensus directory documents that the clients will still need, like the list of supported protocols and network parameters. A client that uses walking onions should download a new flavor of consensus document that contains only these fields, and does not list any relays. In some signature schemes, this consensus would contain a digest of the ENDIVE -- see 3.2 above.
(Note that this document would be a "consensus document" but not a "consensus directory", since it doesn't list any relays.)
Putting it all together
[This is the section where, in a later version of this proposal, I would specify the exact behavior and data formats to be used here. Right now, I'd say we're too early in the design phase.]
A.1. Acknowledgments
Thanks to Peter Palfrader for his original design in proposal 141, and to the designers of PIR-Tor, both of which inspired aspects of this Walking Onions design.
Thanks to Chelsea Komlo, Sajin Sasy, and Ian Goldberg for feedback on an earlier version of this design.
Thanks to David Goulet, Teor, and George Kadianakis for commentary on earlier versions of this draft.
A.2. Additional ideas
Teor notes that there are ways to try to get this idea to apply to one-pass circuit construction, something like the old onion design. We might be able to derive indices and keys from the same seeds, even. I don't see a way to do this without losing forward secrecy, but it might be worth looking at harder.
On Wed, Feb 6, 2019 at 10:45 PM Nicholas Hopper hopper@cs.umn.edu wrote: [...]
This is very cool. One thing that comes to mind reading the proposal is that you will either want some fancy sorting scheme to try to make ranges (nearly) consistent across ENDIVEs, OR will want to have the Relay commit to an ENDIVE version before the client sends an index (otherwise an adversarial relay group can increase their effective weight by a factor of # of acceptable ENDIVE versions).
That's right. I had been thinking that we wouldn't go too wrong if we sorted by identity, since bandwidth doesn't change too much from hour to hour. Another possibility to investigate is putting out ENDIVEs less frequently than we currently put out consensus directories.
But I like your commitment idea better, if we can make it work: as part of its CREATED, a relay could promise that it would use a certain ENDIVE for any extend attempts in the next N minutes. We'd need to make sure that this was authenticated by the create/created handshake, but I think that should be doable.
cheers,
Hi Nick,
This is awesome. We at NRL discussed a very similar concept starting about a year and a half ago after going over the PIR-Tor paper in a reading group. We've left it mostly backburnered since then, though I thought we had talked about it a few times to people at the Tor Dev meetings.
Anyway, GMTA, and now we don't have to wonder when we'll get around to it, since you're doing it. The core ideas we had are in this Proposal already, I think. Here are a few thoughts that we've discussed that I didn't see mentioned.
For handling exits: Including just exit ports along with the index of relays on the list given to clients doesn't seem like much overhead.
For ASes, MyFamily, /16 separation, etc.: the first and (especially) middle relay could be expected to know the full consensus, enforce these restrictions, and (provably) indicate a violation of these policies if one occurs. Alternatively, a client building a circuit could request from a guard or middle relay the descriptors of a handful (say 5) of relays selected by index; the client can then decide among those, or start again if they all fail its policy. (Clients might also make use of previously cached descriptors. This is another potential source of leakage, which is reduced by including the relay index of an already-cached descriptor in a request if it is a candidate to be selected, since it is known to match, e.g., needed exit ports as well as family and other constraints.) More issues, but it's an idea probably worth contemplating.
aloha, Paul
On Tue, Feb 05, 2019 at 12:02:50PM -0500, Nick Mathewson wrote:
...
In this proposal, I'll describe a design for eliminating up-front client directory downloads. Clients still choose relays at random, but without ever having to hold a list of all the relays. This design does not require clients to trust relays any more than they do today, or open clients to epistemic attacks.
I hope to maintain feature parity with the current Tor design; I'll list the places in which I haven't figured out how to do so yet.
I'm naming this design "walking onions". The walking onion (Allium x proliferum) reproduces by growing tiny little bulbs at the end of a long stalk. When the stalk gets too top-heavy, it flops over, and the little bulbs start growing somewhere new.
The rest of this document will run as follows. In section 2, I'll explain the ideas behind the "walking onions" design, and how they can eliminate the need for regular directory downloads. In section 3, I'll answer a number of follow-up questions that arise, and explain how to keep various features in Tor working. Section 4 (not yet written) will elaborate all the details needed to turn this proposal into a concrete set of specification changes.
2. Overview
2.1. Recapping proposal 141
Back in Proposal 141 ("Download server descriptors on demand"), Peter Palfrader proposed an idea for eliminating ahead-of-time descriptor downloads. Instead of fetching all the descriptors in advance, a client would fetch the descriptor for each relay in its path right before extending the circuit to that relay. For example, if a client has a circuit from A->B and wants to extend the circuit to C, the client asks B for C's descriptor, and then extends the circuit to C.
(Note that the client needs to fetch the descriptor every time it extends the circuit, so that an observer can't tell whether the client already had the descriptor or not.)
There are a couple of limitations for this design:
  * It still requires clients to download a consensus.
  * It introduces an extra round-trip to each hop of the circuit
    extension process.
I'll show how to solve these problems in the two sections below.
2.2. An observation about the ntor handshake.
I'll start with an observation about our current circuit extension handshake, ntor: it should not actually be necessary to know a relay's onion key before extending to it.
Right now, the client sends:
    NODEID     (The relay's identity)
    KEYID      (The relay's public onion key)
    CLIENT_PK  (a diffie-hellman public key)
and the relay responds with:
    SERVER_PK  (a diffie-hellman public key)
    AUTH       (a function of the relay's private keys and *all* of
                the public keys)
Both parties generate shared symmetric keys from the same inputs that are used to create the AUTH value.
The important insight here is that we could easily change this handshake so that the client sends only CLIENT_PK, and receives NODEID and KEYID as part of the response.
In other words, the client needs to know the relay's onion key to _complete_ the handshake, but doesn't actually need to know the relay's onion key in order to _initiate_ the handshake.
This is the insight that will let us save a round trip: When the client goes to extend a circuit from A->B to C, it can send B a request to extend to C and retrieve C's descriptor in a single step. Specifically, the client sends only CLIENT_PK, and relay B can include C's keys as part of the EXTENDED cell.
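To make the reordering concrete, here's a toy sketch of the message flow, using small-integer Diffie-Hellman in place of ntor's curve25519 operations and a plain hash in place of ntor's key derivation. It is not secure and not the real ntor construction; it only shows that the client can send CLIENT_PK alone and learn NODEID and KEYID from the response, with both sides still deriving matching keys:

```python
import hashlib
import secrets

# Toy discrete-log DH over a prime (2**64 - 59) standing in for X25519.
P, G = 0xFFFFFFFFFFFFFFC5, 5

def keypair():
    x = secrets.randbelow(P - 2) + 1
    return x, pow(G, x, P)

def kdf(*parts):
    # Hash all the DH outputs and public values together (stand-in
    # for ntor's real key derivation).
    return hashlib.sha256(b"".join(
        p.to_bytes(8, "big") if isinstance(p, int) else p
        for p in parts)).digest()

node_id = b"relay-identity"
b, B = keypair()                 # relay's long-term onion key (KEYID = B)

# Client initiates knowing nothing about the relay: sends only CLIENT_PK.
x, X = keypair()

# Relay responds with SERVER_PK plus its NODEID and KEYID, deriving keys
# from its ephemeral and long-term secrets.
y, Y = keypair()
relay_key = kdf(pow(X, y, P), pow(X, b, P), node_id, B, X, Y)

# Client *completes* the handshake using the keys it just learned.
client_key = kdf(pow(Y, x, P), pow(B, x, P), node_id, B, X, Y)
assert client_key == relay_key
print("shared keys match")
```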
2.3. Extending by certified index
Now I'll explain how the client can avoid having to download a list of relays entirely.
Let's start by looking at how a client chooses a random relay today. First, the client puts all of the relays in a list, and computes a weighted bandwidth for each one. For example, suppose the relay identities are R1, R2, R3, R4, and R5, and their bandwidth weights are 50, 40, 30, 20, and 10. The client makes a table like this:
    Relay    Weight    Range of index values
    R1       50        0..49
    R2       40        50..89
    R3       30        90..119
    R4       20        120..139
    R5       10        140..149
To choose a random relay, the client picks a random index value between 0 and 149 inclusive, and looks up the corresponding relay in the table. For example, if the client's random number is 77, it will choose R2. If its random number is 137, it chooses R4.
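As a sketch, the table and lookup from the example above:

```python
import random

# The client-side table from the example above, built directly from
# (relay, weight) pairs.
def build_table(weights):
    table, lo = [], 0
    for name, w in weights:
        table.append((name, lo, lo + w - 1))
        lo += w
    return table, lo

TABLE, TOTAL = build_table([("R1", 50), ("R2", 40), ("R3", 30),
                            ("R4", 20), ("R5", 10)])

def choose(index):
    for name, lo, hi in TABLE:
        if lo <= index <= hi:
            return name

assert choose(77) == "R2" and choose(137) == "R4"   # the text's examples
print(choose(random.randrange(TOTAL)))
```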
The key observation for the "walking onions" design is that the client doesn't actually need to construct this table itself. Instead, we will have this table be constructed by the authorities and distributed to all the relays.
Here's how it works: let's have the authorities make a new kind of consensus-like thing. We'll call it an Efficient Network Directory with Individually Verifiable Entries, or "ENDIVE" for short. This will differ from the client's index table above in two ways. First, every entry in the ENDIVE is normalized so that the bandwidth weights maximum index is 2^32-1:
Relay Normalized weight Range of index values R1 0x55555546 0x00000000..0x55555545 R2 0x44444438 0x55555546..0x9999997d R3 0x3333332a 0x9999997e..0xcccccca7 R4 0x2222221c 0xcccccca8..0xeeeeeec3 R5 0x1111113c 0xeeeeeec4..0xffffffff
Second, every entry in the ENDIVE is timestamped and signed by the authorities independently, so that when a client sees a line from the table above, it can verify that it came from an authentic ENDIVE. When a client has chosen a random index, one of these entries will prove to the client that a given relay corresponds to that index. Because of this property, we'll be calling these entries "Separable Network Index Proofs", or "SNIP"s for short.
For example, a single SNIP from the table above might consist of: * A range of times during which this SNIP is valid * R1's identity * R1's ntor onion key * R1's address * The index range 0x00000000..0x55555545 * A signature of all of the above, by a number of authorities
Let's put it together. Suppose that the client has a circuit from A->B, and it wants to extend to a random relay, chosen randomly weighted by bandwidth.
The client picks a random index value between 0 and 2**32 - 1. It sends that index to relay B in its EXTEND cell, along with a g^x value for the ntor handshake.
Note: the client doesn't send an address or identity for the next relay, since it doesn't know what relay it has chosen! (The combination of an index and a g^x value is what I'm calling a "walking onion".)
Now, relay B looks up the index in its most recent ENDIVE, to learn which relay the client selected.
(For example, suppose that the client's random index value is 0x50000001. This index value falls between 0x00000000 and 0x55555546 in the table above, so the relay B sees that the client has chosen R1 as its next hop.)
Relay B sends a create cell to R1 as usual. When it gets a CREATED reply, it includes the authority-signed SNIP for R1 as part of the EXTENDED cell.
As part of verifying the handshake, the client verifies that the SNIP was signed by enough authorities, that its timestamp is recent enough, and that it actually corresponds to the random index that the client selected.
Notice the properties we have with this design:
- Clients can extend circuits without having a list of all the relays. - Because the client's random index needs to match a routing entry signed by the authorities, the client is still selecting a relay randomly by weight. A hostile relay cannot choose which relay to send the client.
On a failure to extend, a relay should still report the routing entry for the other relay that it couldn't connect to. As before, a client will start a new circuit if a partially constructed circuit is a partial failure.
We could achieve a reliability/security tradeoff by letting clients offer the relay a choice of two or more indices to extend to. This would help reliability, but give the relay more influence over the path. We'd need to analyze this impact.
In the next section, I'll discuss a bunch of details that we need to straighten out in order to make this design work.
- Sorting out the details.
3.1. Will these routing entries fit in EXTEND2 and EXTENDED2 cells?
The EXTEND2 cell is probably big enough for this design. The random index that the client sends can be a new "link specifier" type, replacing the IP and identity link specifiers.
The EXTENDED2 cell is likely to need to grow here. We'll need to implement proposal 249 ("Allow CREATE cells with >505 bytes of handshake data") so that EXTEND2 and EXTENDED2 cells can be larger.
3.2. How should SNIPs be signed?
We have a few options, and I'd like to look into the possibilities here more closely.
The simplest possibility is to use **multiple signatures** on each SNIP, the way we do today for consensuses. These signatures should be made using medium-term Ed25519 keys from the authorities. At a cost of 64 bytes per signature, at 9 authorities, we would need 576 bytes for each SNIP. These signatures could be batch-verified to save time at the client side. Since generating a signature takes around 20 usec on my mediocre laptop, authorities should be able to generate this many signatures fairly easily.
Another possibility is to use a **threshold signature** on each SNIP, so that the authorities collaboratively generate a short signature that the clients can verify. There are multiple threshold signature schemes that we could consider here, though I haven't yet found one that looks perfect.
Another possibility is to organize the SNIPs into a **merkle tree with a signed root**. For this design, clients could download the signed root periodically, and receive the hash-path from the signed root to the SNIP. This design might help with certificate-transparency-style designs, and it would be necessary if we ever want to move to a postquantum signature algorithm that requires large signatures.
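   A minimal sketch of the merkle-tree option: the authorities sign
   only the root, and each SNIP travels with its hash path.  The
   hash function and the leaf/node domain-separation prefixes below
   are my own choices for illustration.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves) -> bytes:
    """Build the root hash of a merkle tree over the given leaves."""
    level = [h(b"\x00" + leaf) for leaf in leaves]   # leaf-domain prefix
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])                  # duplicate odd node
        level = [h(b"\x01" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

def verify_path(leaf: bytes, path, root: bytes) -> bool:
    """Check a hash path from a SNIP (leaf) up to the signed root.
    `path` is a list of (sibling_hash, sibling_is_right) pairs."""
    node = h(b"\x00" + leaf)
    for sibling, sibling_is_right in path:
        if sibling_is_right:
            node = h(b"\x01" + node + sibling)
        else:
            node = h(b"\x01" + sibling + node)
    return node == root
```

   A client that has the signed root can then verify any single SNIP
   with a path of O(log n) hashes, instead of needing per-SNIP
   signatures.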
Another possibility (due to a conversation among Chelsea Komlo, Sajin Sasy, and Ian Goldberg), is to *use SNARKs*. (Why not? All the cool kids are doing it!) For this, we'd have the clients download a signed hash of the ENDIVE periodically, and have the authorities generate a SNARK for each SNIP, proving its presence in that document.
3.3. How can we detect authority misbehavior?
We might want to take countermeasures against the possibility that a quorum of corrupt or compromised authorities give some relays a different set of SNIPs than they give other relays.
If we incorporate a merkle tree or a hash chain in the design, we can use mechanisms similar to certificate transparency to ensure that the authorities have a consistent log of all the entries that they have ever handed out.
3.4. How many types of weighted node selection are there, and how do we handle them?
   Right now, there are multiple weights that we use in Tor:
     * Weight for exit
     * Weight for guard
     * Weight for middle node
We also filter nodes for several properties, such as flags they have.
To reproduce this behavior, we should enumerate the various weights and filters that we use, and (if there are not too many) create a separate index for each. For example, the Guard index would weight every node for selection as guard, assigning 0 weight to non-Guard nodes. The Exit index would weight every node for selection as an exit, assigning 0 weight to non-Exit nodes.
When choosing a relay, the client would have to specify which index to use. We could either have a separate (labeled) set of SNIPs entries for each index, or we could have each SNIP have a separate (labeled) index range for each index.
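   Here is a sketch of how authorities might turn per-role weights into
   disjoint index ranges inside an ENDIVE.  The relay names and weights
   are made up; a weight of 0 (e.g. a non-Exit node in the Exit index)
   simply gets an empty range and can never be selected for that role.

```python
INDEX_SPACE = 1 << 32   # assumed 32-bit index space

def build_index(weights):
    """weights: list of (relay_id, weight) pairs.
    Returns a dict mapping relay_id -> (low, high) index range."""
    total = sum(w for _, w in weights)
    ranges = {}
    acc = 0
    low = 0
    for relay, w in weights:
        acc += w
        # Cumulative rounding makes the ranges exactly tile the space.
        high = (acc * INDEX_SPACE) // total
        ranges[relay] = (low, high)
        low = high
    return ranges

# Toy Exit index: "A" is not an Exit, so it gets weight 0.
exit_index = build_index([("A", 0), ("B", 3), ("C", 1)])
```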
REGRESSION: the client's choice of which index to use would leak the next router's position and purpose in the circuit. This is information that we believe relays can already infer today, but confirming it for them is not a desirable feature.
3.5. Does this design break onion service introduce handshakes?
In rend-spec-v3.txt section 3.3.2, we specify a variant of ntor for use in INTRODUCE2 handshakes. It allows the client to send encrypted data as part of its initial ntor handshake, but requires the client to know the onion service's onion key before it sends its initial handshake.
That won't be a problem for us here, though: we still require clients to fetch onion service descriptors before contacting an onion service.
3.6. How does the onion service directory work here?
The onion service directory is implemented as a hash ring, where each relay's position in the hash ring is decided by a hash of its identity, the current date, and a shared random value that the authorities compute each day.
To implement this hash ring using walking onions, we would need to have an extra index based not on bandwidth, but on position in the hash ring. Then onion services and clients could build a circuit, then extend it one more hop specifying their desired index in the hash ring.
We could either have a command to retrieve a trio of hashring-based routing entries by index, or one to retrieve (or connect to?) the n'th item after a given hashring entry.
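   A sketch of the hashring-position index described above: each
   relay's position is a hash of its identity, the date, and the daily
   shared random value, and the extra index is just the relays sorted
   by that value.  The hash function and concatenation format here are
   illustrative assumptions.

```python
import hashlib

def hashring_position(identity: bytes, date: str, srv: bytes) -> bytes:
    """Position of a relay on the onion-service hash ring for one day."""
    return hashlib.sha3_256(identity + date.encode() + srv).digest()

# Authorities would sort relays by position to build the extra index;
# clients and onion services then extend a circuit by asking for the
# entry at (or after) their target position.
relays = [bytes([i]) * 20 for i in range(5)]   # fake 20-byte identities
srv = b"\x00" * 32                             # fake shared random value
ring = sorted(relays,
              key=lambda r: hashring_position(r, "2019-02-05", srv))
```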
3.7. How can clients choose guard nodes?
We can reuse the fallback directories here. A newly bootstrapping client would connect to a fallback directory, then build a three-hop circuit, and finally extend the three-hop circuit by indexing to a random guard node. The random guard node's SNIP would contain the information that the client needs to build real circuits through that guard in the future. Because the client would be building a three-hop circuit, the fallback directory would not learn the client's guards.
(Note that even if the extend attempt fails, we should still pick the node as a possible guard based on its router entry, so that other nodes can't veto our choice of guards.)
3.8. Does the walking onions design preclude postquantum circuit handshakes?
Not at all! Both proposal 263 (ntru) and proposal 270 (newhope) work by having the client generate an ephemeral key as part of its initial handshake. The client does not need to know the relay's onion key to do this, so we can still integrate those proposals with this one.
3.9. Does the walking onions design stop us from changing the network topology?
For Tor to continue to scale, we will someday need to accept that not every relay can be simultaneously connected to every other relay. Therefore, we will need to move from our current clique topology assumption to some other topology.
There are also proposals to change node selection rules to generate routes providing better performance, or improved resistance to local adversaries.
We can, I think, implement this kind of proposal by changing the way that ENDIVEs are generated. Instead of giving every relay the same ENDIVE, the authorities would generate different ENDIVEs for different relays, depending on the probability distribution of which relay should be chosen after which in the network topology. In the extreme case, this would produce O(n) ENDIVEs and O(n^2) SNIPs. In practice, I hope that we could do better by having the network topology be non-clique, and by having many relays share the same distribution of successors.
3.10. How can clients handle exit policies?
This is an unsolved challenge. If the client tells the middle relay its target port, it leaks information inappropriately.
One possibility is to try to gather exit policies into common categories, such as "most ports supported" and "most common ports supported".
Another (inefficient) possibility is for clients to keep trying exits until they find one that works.
Another (inefficient) possibility is to require that clients who use unusual ports fall back to the old mechanism for route selection.
3.11. Can this approach support families?
This is an unsolved challenge.
One (inefficient) possibility is for clients to generate circuits and discard those that use multiple relays in the same family.
One (not quite compatible) possibility is for the authorities to sort the ENDIVE so that relays in the same family are adjacent to one another. The index-bounds part of each SNIP would also have to include the bounds of the family. This approach is not quite compatible with the status quo, because it prevents relays from belonging to more than one family.
One interesting possibility (due to Chelsea Komlo, Sajin Sasy, and Ian Goldberg) is for the middle node to take responsibility for family enforcement. In this design, the client might offer the middle node multiple options for the next relay's index, and the middle node would choose the first such relay that is neither in its family nor its predecessor's family. We'd need to look for a way to make sure that the middle node wasn't biasing the path selection.
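   The middle-node enforcement idea above might look something like
   this sketch: the client offers candidate indices in preference
   order, and the middle relay takes the first one whose relay is
   outside both its own family and its predecessor's.  The family
   representation and lookup function are stand-ins of my own.

```python
def choose_next(candidates, my_family, predecessor_family, lookup):
    """candidates: client-supplied indices, in preference order.
    lookup: maps an index to (relay_id, family) via the ENDIVE.
    Returns the first acceptable relay, or None if all are excluded."""
    for index in candidates:
        relay, family = lookup(index)
        if family.isdisjoint(my_family) and \
           family.isdisjoint(predecessor_family):
            return relay
    return None   # all candidates excluded: client must retry

# Toy ENDIVE: relays A and B share a family.
ENDIVE = {1: ("A", {"fam1"}), 2: ("B", {"fam1"}), 3: ("C", {"fam2"})}

# Middle node in fam1: skips index 1 (same family), accepts index 3.
picked = choose_next([1, 3], {"fam1"}, set(), ENDIVE.__getitem__)
```

   Note that this is exactly where the biasing concern comes in: a
   middle node that rejects candidates dishonestly can steer the path,
   so we'd need a way to audit its choices.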
(TODO: come up with more ideas here.)
3.12. Can walking onions support IP-based and country-based restrictions?
This is an unsolved challenge.
One (inefficient) possibility is for the user to generate paths until they generate one that they like; this only works when the user's restrictions do not exclude most paths, and becomes impractical otherwise.
Another (inefficient and fingerprintable) possibility is to require that clients who use complex path restrictions fall back to the old mechanism for route selection.
(TODO: come up with better ideas here.)
3.13. What scaling problems have we not solved with this design?
The walking onions design doesn't solve (on its own) the problem that the authorities need to know about every relay, and arrange to have every relay tested.
The walking onions design doesn't solve (on its own) the problem that relays need to have a list of all the relays. (But see section 3.9 above.)
3.14. Should we still have clients download a consensus when they're using walking onions?
There are some fields in the current consensus directory documents that the clients will still need, like the list of supported protocols and network parameters. A client that uses walking onions should download a new flavor of consensus document that contains only these fields, and does not list any relays. In some signature schemes, this consensus would contain a digest of the ENDIVE -- see 3.2 above.
(Note that this document would be a "consensus document" but not a "consensus directory", since it doesn't list any relays.)
4. Putting it all together
[This is the section where, in a later version of this proposal, I would specify the exact behavior and data formats to be used here. Right now, I'd say we're too early in the design phase.]
A.1. Acknowledgments
Thanks to Peter Palfrader for his original design in proposal 141, and to the designers of PIR-Tor, both of which inspired aspects of this Walking Onions design.
Thanks to Chelsea Komlo, Sajin Sasy, and Ian Goldberg for feedback on an earlier version of this design.
Thanks to David Goulet, Teor, and George Kadianakis for commentary on earlier versions of this draft.
A.2. Additional ideas
Teor notes that there are ways to try to get this idea to apply to one-pass circuit construction, something like the old onion design. We might be able to derive indices and keys from the same seeds, even. I don't see a way to do this without losing forward secrecy, but it might be worth looking at harder.