Filename: 242-better-families.txt
Title: Better performance and usability for the MyFamily option.
Author: Nick Mathewson
Created: 2015-02-27
Status: Open
1. Problem statement.
The current family interface allows well-behaved relays to identify that they all belong to the same 'family', and should not be used in the same circuits.
Right now, this interface works by having every family member list every other family member in its server descriptor. This winds up using O(n^2) space in microdescriptors, server descriptors, and RAM. Adding or removing a server from the family requires all the other servers to change their torrc settings.
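To make the O(n^2) claim concrete, here is a small back-of-the-envelope sketch. It is not from the proposal, and the function names are hypothetical. It compares how many family entries get published under the current scheme, where each of n members lists the other n-1, with the certificate scheme described below, where each member carries one family certificate. It also counts how many relays must change their configuration when one relay joins.

```python
from typing import Tuple


def legacy_family_entries(n: int) -> int:
    """Family entries published when each of n members lists every other member."""
    return n * (n - 1)


def cert_family_entries(n: int) -> int:
    """Family entries published when each of n members carries one family-cert."""
    return n


def descriptors_changed_on_join(n: int) -> Tuple[int, int]:
    """(legacy, cert-based) number of relays that must change config when one relay joins a family of n."""
    # Legacy: the newcomer and all n existing members must update their torrc.
    # Cert-based: only the newcomer needs a certificate from the master key.
    return n + 1, 1


if __name__ == "__main__":
    for n in (2, 10, 50):
        print(f"n={n}: legacy={legacy_family_entries(n)}, cert-based={cert_family_entries(n)}")
```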
One proposal is to eliminate the use of the Family option entirely; see ticket #6676. But if we don't, let's come up with a way to make it better. (I'm writing this down mainly to get it out of my head.)
2. Design overview.
In this design, every family has a master ed25519 key. A node is in the family iff its server descriptor includes a certificate of its ed25519 identity key with the master ed25519 key. The certificate format is as in proposal 220 section 2.1.
Note that because server descriptors are signed with the node's ed25519 signing key, this creates a bidirectional relationship where nodes can't be put in families without their consent.
3. Changes to server descriptors
We add a new entry to server descriptors: "family-cert"
This line contains a base64-encoded certificate as described above. It may appear any number of times.
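As a rough illustration, the sketch below decodes one family-cert value and extracts the certified key (the relay's ed25519 identity) and the key that signed it (the family master key). The byte layout is our reading of the proposal 220 certificate format, so treat it as an assumption: a 1-byte version, 1-byte cert type, 4-byte expiration, 1-byte key type, 32-byte certified key, 1-byte extension count, the extensions, and a 64-byte signature. Extension type 4 is assumed to carry the signing key. `parse_family_cert` and `FamilyCert` are hypothetical names, and the sketch does not verify the signature.

```python
import base64
import struct
from dataclasses import dataclass
from typing import Optional

# Assumed extension type that carries the key which signed the certificate.
EXT_SIGNED_WITH_ED25519_KEY = 4


@dataclass
class FamilyCert:
    cert_type: int
    expiration_hours: int          # assumed to be hours since the epoch
    certified_key: bytes           # the relay's ed25519 identity key
    signing_key: Optional[bytes]   # the family's master ed25519 key, if present
    signature: bytes               # NOT verified by this sketch


def parse_family_cert(b64: str) -> FamilyCert:
    """Split a base64 family-cert into its fields (hypothetical helper)."""
    raw = base64.b64decode(b64 + "=" * (-len(b64) % 4))
    if len(raw) < 7 + 32 + 1 + 64:
        raise ValueError("certificate too short")
    # VERSION, CERT_TYPE, EXPIRATION_DATE, CERT_KEY_TYPE (big-endian, no padding).
    version, cert_type, expiration, _key_type = struct.unpack_from(">BBIB", raw, 0)
    if version != 1:
        raise ValueError("unsupported certificate version")
    pos = 7
    certified_key = raw[pos:pos + 32]
    pos += 32
    n_extensions = raw[pos]
    pos += 1
    signing_key = None
    for _ in range(n_extensions):
        # Each extension: ExtLength (2 bytes), ExtType, ExtFlags, then ExtData.
        ext_len, ext_type, _ext_flags = struct.unpack_from(">HBB", raw, pos)
        pos += 4
        ext_data = raw[pos:pos + ext_len]
        pos += ext_len
        if ext_type == EXT_SIGNED_WITH_ED25519_KEY and len(ext_data) == 32:
            signing_key = ext_data
    # Whatever remains must be exactly the 64-byte signature.
    if len(raw) - pos != 64:
        raise ValueError("malformed certificate")
    return FamilyCert(cert_type, expiration, certified_key, signing_key, raw[pos:])
```

A real implementation would also verify the signature over all preceding bytes with the signing key and check the expiration. By our reading of the format, it would also reject certificates whose unrecognized extensions are flagged as affecting validation.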
4. Changes to microdescriptors
We add a new entry to microdescriptors: "family-keys"
This line contains one or more space-separated strings describing families to which the node belongs. These strings MUST be between 1 and 64 characters long, and sorted in lexical order. Clients MUST NOT depend on any particular property of these strings.
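Here is a minimal sketch of how a client might check a "family-keys" line against these rules before using it. It reads "lexical order" as plain code-point order, which is an assumption. It returns the entries as opaque strings, because clients may only compare them and must not interpret them. `parse_family_keys_line` is a hypothetical name.

```python
from typing import List


def parse_family_keys_line(line: str) -> List[str]:
    """Check a microdescriptor "family-keys" line against the section 4 rules."""
    keyword, _, rest = line.partition(" ")
    if keyword != "family-keys":
        raise ValueError("not a family-keys line")
    entries = rest.split(" ")
    # One or more entries, each 1..64 characters (an empty entry means stray spaces).
    if any(not 1 <= len(entry) <= 64 for entry in entries):
        raise ValueError("each entry must be 1 to 64 characters long")
    # "Lexical order" is read here as plain code-point order.
    if entries != sorted(entries):
        raise ValueError("entries must be sorted in lexical order")
    # Entries stay opaque: callers should only test them for equality.
    return entries
```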
5. Changes to voting algorithm
We allocate a new consensus method number for voting on these keys.
When generating microdescriptors using a suitable consensus method, the authorities include a "family-keys" line if the underlying server descriptor contains any family-cert lines. For each family-cert in the server descriptor, they add a base-64-encoded string of that family-cert's signing key.
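The authority side of this step can be sketched as follows. The input is the list of signing keys already extracted from a relay's family-cert lines. Two details are our assumptions and are not stated in the proposal: padding is stripped from the base64, and duplicate keys are collapsed into one entry. A 32-byte key then encodes to 43 characters, which stays within the 64-character limit. `make_family_keys_line` is a hypothetical name.

```python
import base64
from typing import Iterable, Optional


def make_family_keys_line(signing_keys: Iterable[bytes]) -> Optional[str]:
    """Build a "family-keys" line from the signing keys of a relay's family-certs."""
    # Assumptions: unpadded base64, and duplicate keys collapsed into one entry.
    encoded = {base64.b64encode(key).decode("ascii").rstrip("=") for key in signing_keys}
    if not encoded:
        # No family-cert lines in the server descriptor: omit the line entirely.
        return None
    for entry in encoded:
        if not 1 <= len(entry) <= 64:
            raise ValueError("encoded key violates the 1..64 character limit")
    # Section 4 requires the strings to be sorted in lexical order.
    return "family-keys " + " ".join(sorted(encoded))
```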
6. Client behavior
Clients should treat node A and node B as belonging to the same family if ANY of these is true:
* The client has server descriptors or microdescriptors for A and B, and A's descriptor lists B in its family line, and B's descriptor lists A in its family line.
* The client has a server descriptor for A and one for B, and they both contain valid family-cert lines whose certs are signed by the same family key.
* The client has microdescriptors for A and B, and they both contain some string in common on their family-keys line.
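Here is a minimal sketch of the three rules as a single predicate. `NodeView` and `same_family` are hypothetical. The sketch assumes family-cert signatures have already been verified, and that old family lines contain fingerprints only, with nicknames ignored. It also merges server-descriptor and microdescriptor knowledge into one structure purely for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class NodeView:
    """What a client knows about one relay (hypothetical structure)."""
    fingerprint: str
    # Fingerprints listed on the old "family" line (nicknames ignored here).
    legacy_family: FrozenSet[str] = frozenset()
    # Master keys from family-cert lines the client has already verified.
    verified_family_keys: FrozenSet[bytes] = frozenset()
    # Opaque strings from the microdescriptor "family-keys" line.
    micro_family_keys: FrozenSet[str] = frozenset()


def same_family(a: NodeView, b: NodeView) -> bool:
    """Apply the three rules of section 6; any one of them suffices."""
    # Rule 1: each relay's old family line lists the other.
    if b.fingerprint in a.legacy_family and a.fingerprint in b.legacy_family:
        return True
    # Rule 2: both server descriptors carry valid certs from a common family key.
    if a.verified_family_keys & b.verified_family_keys:
        return True
    # Rule 3: the microdescriptors share at least one family-keys string.
    return bool(a.micro_family_keys & b.micro_family_keys)
```

An old client effectively runs only rule 1. That is exactly the divergence between old and new clients raised in section 8.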
7. Deprecating the old family lines.
Once all clients that support only the old family line format are deprecated, servers can stop including family lines in their descriptors, and authorities can stop including them in their microdescriptors.
8. Open questions
The rules in section 6 above leave open the possibility of old clients and new clients reaching different decisions about who is in a family. We should evaluate this for anonymity implications.
It's possible that families are a bad idea entirely; see ticket #6676.
27.02.2015, 16:41 Nick Mathewson:
I had time to read it.
> [cut]
> 2. Design overview.
> In this design, every family has a master ed25519 key. A node is in the family iff its server descriptor includes a certificate of its ed25519 identity key with the master ed25519 key. The certificate format is as in proposal 220 section 2.1.
Assuming "iff" means "if and only if".
> Note that because server descriptors are signed with the node's ed25519 signing key, this creates a bidirectional relationship where nodes can't be put in families without their consent.
It would be a regression if the situation were not at least on the same level as before.
> [cut]
> 5. Changes to voting algorithm
> We allocate a new consensus method number for voting on these keys.
> When generating microdescriptors using a suitable consensus method, the authorities include a "family-keys" line if the underlying server descriptor contains any family-cert lines. For reach family-cert in the server descriptor, they add a base-64-encoded string of that family-cert's signing key.
s/For reach/For each/
> 8. Open questions
> The rules in section 6 above leave open the possibility of old clients and new clients reaching different decisions about who is in a family. We should evaluate this for anonymity implications.
> It's possible that families are a bad idea entirely; see ticket #6676.
I had trouble seeing family as beneficial in a 6-relay Tor network run by two different entities (three relays each, one honest and one malicious). If the honest operator sets family, a 3-hop circuit can include at most one honest relay, so every circuit uses at least two malicious relays. The question is how much this changes as the number of entities and relays increases.
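The 6-relay scenario can be checked by brute force. The sketch below enumerates every ordered 3-hop path of distinct relays, drops paths in which two hops share a family, and reports the fraction of remaining paths with at least two malicious relays. It assumes uniform selection and ignores bandwidth weighting, guard/exit flags and subnet rules, so it only illustrates the argument and does not model Tor's real path selection. The relay names are hypothetical.

```python
from itertools import permutations
from typing import Dict, Iterator, Tuple

HONEST = {"H1", "H2", "H3"}
MALICIOUS = {"M1", "M2", "M3"}


def allowed_paths(family_of: Dict[str, str], hops: int = 3) -> Iterator[Tuple[str, ...]]:
    """Yield ordered paths of distinct relays with no two hops in the same family."""
    for path in permutations(sorted(HONEST | MALICIOUS), hops):
        families = [family_of[r] for r in path if r in family_of]
        if len(families) == len(set(families)):
            yield path


def share_with_two_malicious(family_of: Dict[str, str]) -> float:
    """Fraction of allowed paths that contain at least two malicious relays."""
    paths = list(allowed_paths(family_of))
    bad = [p for p in paths if sum(r in MALICIOUS for r in p) >= 2]
    return len(bad) / len(paths)


if __name__ == "__main__":
    print(share_with_two_malicious({}))                                 # 0.5
    print(share_with_two_malicious({r: "honest-op" for r in HONEST}))  # 1.0
```

Without any family declaration, half of the paths already contain two malicious relays. When only the honest operator declares a family, every allowed path does.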
BTW: With some ill intent, the quote "It's possible that families are a bad idea entirely[.]" could be taken out of context by journalists.
Best Regards, Sebastian G.
Hi,
If I understand correctly, these are the factors, as things currently stand, regarding family use with respect to the *security* of Tor:
Pros:
1 - Prevents information disclosure from relying too heavily on related relays (through relay configuration or seizure of hardware).
Cons:
2 - It's not used by operators with malicious intent.
3 - Reduces diversity in choosing non-malicious relays, assuming all relays in a family have similar performance/bandwidth. If the metrics vary widely in the first place, there's a chance it won't matter.
4 - Allows family members to be identified by nickname as well as by fingerprint.
5 - Can be used arbitrarily by unrelated nodes to influence path selection.
6 - Clients already disobey family under non-deterministic circumstances (not reliably reproducible, but I have measured it).
The proposed changes, in the absence of any errata, are an improvement because they enforce a bidirectional relationship. For this reason they mitigate (4) and (5). If arbitrary nodes cannot simply join a family, the changes also have less of an impact on (3) than when the tickets were originally filed.
Towards mitigating (3), it might be worth considering the AS of the related relays. I know this increases computation cost, so it's more of a thought than even a suggestion. If the AS differs across related relays, you might consider this (honest?) operator a safe choice even without the family setting. That is, family might be better based on similarity of AS, supposing that a shared AS would make it easier to compromise usage data. Another thought is to treat each family as a single node for the purpose of computing network diversity. If diversity turns out to be low, it would be useful to disregard families, or to reconsider these so-called safe families. You might call this a discussion towards relaxing the family definition (slightly) in favor of increasing diversity, while staying within the changes of Proposal 242.
--leeroy