Cecylia, Arlo, Serene, Shelikhoo, and I are writing a research paper about Snowflake. Here is a draft: https://www.bamsoftware.com/papers/snowflake/snowflake.20231003.e6e1c30d.pdf
We're writing to check a factual claim in the section about having multiple backend bridges. Basically, we wanted it to be possible for there to be multiple Snowflake bridge sites run by different groups of people, and we did not want to share the same relay identity keys across all bridge sites, because of the increased risk of the keys being exposed. Therefore every bridge site has its own relay identity. This requires the client to know the relay fingerprints in advance, and it means that the client (and not, e.g., the broker) must be the one that decides which bridge to use.
1. Is our general description (quoted below) of the design constraints as they bear on Tor correct?
2. Is §4.2 "CERTS cells" the right part of tor-spec to cite to make our point? https://gitlab.torproject.org/tpo/core/torspec/-/blob/b345ca044131b2eb18e6ae...
https://github.com/turfed/snowflake-paper/blob/e6e1c30dde6716dc5e54a32f2134f...
A Tor bridge is identified by a long-term identity public key. If, on connecting to a bridge, the client finds that the bridge's identity is not the expected one, the client will terminate the connection \cite[\S 4.2]{tor-spec}. The Tor client can configure at most one identity per bridge; there is no way to indicate (with a certificate, for example) that multiple identities should be considered equivalent. This constraint leaves two options: either all Snowflake bridges must share the same cryptographic identity, or else it must be the client that makes the choice of what bridge to use. While the former option is possible to do (by synchronizing identity keys across servers), every added bridge would increase the risk of compromising the all-important identity keys. Our vision was that different bridge sites would run in different locations with their own management teams, and that any compromise of a bridge site should affect that site only.
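To make the second option concrete, here is a minimal sketch of client-side bridge selection. It is only an illustration under stated assumptions, not Snowflake's actual code: BridgeSite, choose_bridge, build_request, the request format, and the placeholder fingerprints are all hypothetical names invented for this example.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class BridgeSite:
    """One bridge site with its own long-term relay identity (hypothetical model)."""
    name: str
    fingerprint: str  # 40 hex digits identifying the site's relay identity key


def choose_bridge(sites, rng):
    """The client, not an intermediary such as the broker, picks the bridge site."""
    if not sites:
        raise ValueError("no bridge sites configured")
    return rng.choice(sites)


def build_request(site):
    """Hypothetical request format: it carries the client's choice, so nothing
    between the client and the bridge has to decide on the client's behalf."""
    return {"fingerprint": site.fingerprint}


# Placeholder fingerprints, for illustration only.
SITES = [
    BridgeSite("site-a", "A" * 40),
    BridgeSite("site-b", "B" * 40),
]

chosen = choose_bridge(SITES, random.Random())
request = build_request(chosen)
print(chosen.name, request)
```

Because the client already knows which identity it asked for, it can pin the connection to that one fingerprint, and each site's identity key never has to leave that site.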
In my own experiments, providing an incorrect relay fingerprint leads to errors in connection_or_client_learned_peer_id:
https://gitlab.torproject.org/tpo/core/tor/-/blob/tor-0.4.7.13/src/core/or/c...
    [warn] Tried connecting to router at 192.0.2.3:80 ID=<none> RSA_ID=2B280B23E1107BB62ABFC40DDCC8824814F80A71, but RSA + ed25519 identity keys were not as expected: wanted 1111111111111111111111111111111111111111 + no ed25519 key but got 2B280B23E1107BB62ABFC40DDCC8824814F80A72 + 1zOHpg+FxqQfi/6jDLtCpHHqBTH8gjYmCKXkus1D5Ko.
    [warn] Problem bootstrapping. Stuck at 14% (handshake): Handshaking with a relay. (Unexpected identity in router certificate; IDENTITY; count 1; recommendation warn; host 1111111111111111111111111111111111111111 at 192.0.2.3:80)
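The comparison behind that warning can be sketched roughly as follows. This is a simplified illustration, not tor's implementation: it assumes the RSA fingerprint is the SHA-1 digest of the DER-encoded identity key, it ignores the ed25519 identity and all certificate validation, and the function names and the placeholder key bytes are hypothetical.

```python
import hashlib
import hmac


def rsa_fingerprint(der_public_key: bytes) -> str:
    """Hex fingerprint of an identity key: SHA-1 over its DER encoding (assumed)."""
    return hashlib.sha1(der_public_key).hexdigest().upper()


def check_peer_identity(expected_fp: str, der_public_key: bytes) -> str:
    """Return the peer's fingerprint, or raise if it is not the expected one.

    Hypothetical helper: it mirrors only the "wanted X but got Y" comparison,
    none of the surrounding handshake.
    """
    got = rsa_fingerprint(der_public_key)
    if not hmac.compare_digest(expected_fp.upper(), got):
        raise ConnectionError(
            f"identity keys were not as expected: wanted {expected_fp} but got {got}"
        )
    return got


# Placeholder bytes standing in for a DER-encoded key; not a real key.
peer_key = b"placeholder identity key bytes"

try:
    check_peer_identity("1" * 40, peer_key)
except ConnectionError as err:
    print("closing connection:", err)
```

The point of the sketch is that the expected value is a single fingerprint per bridge, so a bridge presenting any other key, however legitimate, is rejected.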
The Snowflake paper has been conditionally accepted to Usenix Security and we are now working on final revisions. As before, no response is necessary, but if you have any comments, we can try to take them into account up until about 2024-02-26. This is a current snapshot: https://www.bamsoftware.com/papers/snowflake/snowflake.20240210.7181a5cd.pdf
If possible, we'd still like confirmation of (1) whether this is a good characterization of the constraints involved when using a Tor bridge, and (2) if 4.2 is the right part of tor-spec to cite for clients disconnecting on an unexpected relay fingerprint.
https://github.com/turfed/snowflake-paper/blob/7181a5cdfe1e07cfb4ea6bc15c07d...
A Tor bridge is identified by a long-term identity public key. If, on connecting to a bridge, the client finds that the bridge's identity is not the expected one, the client will terminate the connection \cite[\S 4.2]{tor-spec}. The Tor client can configure at most one identity per bridge; there is no way to indicate (with a certificate, for example) that multiple identities should be considered equivalent. This constraint leaves two options: either all Snowflake bridges must share the same cryptographic identity, or else it must be the client that makes the choice of what bridge to use. While the former option is possible to do (by synchronizing identity keys across servers), every added bridge would increase the risk of compromising the all-important identity keys. Our vision was that different bridge sites would run in different locations with their own management teams, and that any compromise of a bridge site should affect that site only.
On Sat, Feb 10, 2024 at 12:15:05AM -0700, David Fifield wrote:
The Snowflake paper has been conditionally accepted to Usenix Security and we are now working on final revisions.
Congrats! This is great!
If possible, we'd still like confirmation of (1) whether this is a good characterization of the constraints involved when using a Tor bridge, and (2) if 4.2 is the right part of tor-spec to cite for clients disconnecting on an unexpected relay fingerprint.
https://github.com/turfed/snowflake-paper/blob/7181a5cdfe1e07cfb4ea6bc15c07d...
A Tor bridge is identified by a long-term identity public key. If, on connecting to a bridge, the client finds that the bridge's identity is not the expected one, the client will terminate the connection \cite[\S 4.2]{tor-spec}. The Tor client can configure at most one identity per bridge; there is no way to indicate (with a certificate, for example) that multiple identities should be considered equivalent. This constraint leaves two options: either all Snowflake bridges must share the same cryptographic identity, or else it must be the client that makes the choice of what bridge to use. While the former option is possible to do (by synchronizing identity keys across servers), every added bridge would increase the risk of compromising the all-important identity keys. Our vision was that different bridge sites would run in different locations with their own management teams, and that any compromise of a bridge site should affect that site only.
This all sounds reasonable to me.
The phrase "all-important identity keys" made me pause, because in the obfs4 bridgedb case, it doesn't matter so much that the client checks the identity key, (a) because it's only one hop in the circuit, and the client will still check the identities for the other relays in the circuit, and (b) because it isn't all that meaningful to verify that it really is the stranger we picked for you at random -- so long as whoever it is extends your circuit to the relay you picked, and you verify that, you're getting most of what you can get.
Whereas for the Snowflake case, checking the identity key on the bridge is much more meaningful, first because we do actually know the bridge operator and we want to make sure you reached the real one, but more importantly because the Snowflake architecture puts the 'random stranger' in exactly the position to send you somewhere else if it wants.
So yes, sounds good.
As for which part of tor-spec to cite, note that Nick and others did a big reorg of tor-spec some months ago. It looks like 4.2 is from what used to be path-spec.txt, which is about how we choose paths. If I were to pick a piece of tor-spec to show that Tor clients decline to continue if the first hop they've picked can't prove it knows its identity key, I would pick section 2.3.1, https://spec.torproject.org/tor-spec/negotiating-channels.html#negotiating and the action is in the CERTS and AUTHENTICATE cells.
(For extra fun, I don't think anything promises that "2.3.1" will still be the number of this section in the future.)
--Roger
On Thu, Feb 15, 2024 at 10:54:22AM -0500, Roger Dingledine wrote:
Thanks for checking.
Okay, that makes sense. What's now section 2.3.1 is the same "Negotiating and initializing connections/channels" as what was section 4 in the txt version we were referring to initially. So it looks like we did have the right part of the spec.
https://gitlab.torproject.org/tpo/core/torspec/-/blob/29e445bd6e9efe82367b8a...
https://gitlab.torproject.org/tpo/core/torspec/-/blob/33308845cec54bfc0096b8...
The new mdbook style makes it a little harder to refer to a specific section. Since this is the only reference to tor-spec we have, I guess what we'll do is change the bib entry to refer to the .md file of just this section, with a GitLab permalink.
Is `author = {Roger Dingledine and Nick Mathewson}` still appropriate? That was the authorship on tor-spec.txt, but the new .md doesn't have it. Would `author = {{The Tor Project}}` be better?