```
Filename: 321-happy-families.md
Title: Better performance and usability for the MyFamily option (v2)
Author: Nick Mathewson
Created: 27 May 2020
Status: Open
```
## Problem statement.
The current family mechanism allows well-behaved relays to
identify that they all belong to the same 'family', and should
not be used in the same circuits.
Right now, families work by having every family member list every
other family member in its server descriptor. This winds up using
O(n^2) space in microdescriptors and server descriptors. (For RAM,
we can de-duplicate families which sometimes helps.) Adding or
removing a server from the family requires all the other servers to
change their torrc settings.
The is growth in size is not just a theoretical problem. Family
declarations currently make up a little over 55% of the
microdescriptors in the directory--around 24% after compression.
The largest family has around 270 members. With Walking Onions, 270
members times a 160-bit hashed identifier leads to over 5 kilobytes
per SNIP, which is much greater than we'd want to use.
This is an updated version of proposal 242. It differs by clarifying
requirements and providing a more detailed migration plan.
## Design overview.
In this design, every family has a master ed25519 "family key". A node
is in the family iff its server descriptor includes a certificate of its
ed25519 identity key with the family key. The certificate
format the one in the tor-certs.txt spec; we would allocate a new
certificate type for this usage. These certificates would need to
include the signing key in the appropriate extension.
Note that because server descriptors are signed with the node's
ed25519 signing key, this creates a bidirectional relationship
between the two keys, so that nodes can't be put in families without
their consent.
## Changes to router descriptors
We add a new entry to server descriptors:
"family-cert" NL
"-----BEGIN FAMILY CERT-----" NL
cert
"-----END FAMILY CERT-----".
This entry contains a base64-encoded certificate as described
above. It may appear any number of times; authorities MAY reject
descriptors that include it more than three times.
## Changes to microdescriptors
We add a new entry to microdescriptors: `family-keys`.
This line contains one or more space-separated strings describing
families to which the node belongs. These strings MUST be sorted in
lexicographic order. Clients MUST NOT depend on any particular property
of these strings.
## Changes to voting algorithm
We allocate a new consensus method number for voting on these keys.
When generating microdescriptors using a suitable consensus method,
the authorities include a "family-keys" line if the underlying
server descriptor contains any valid family-cert lines. For each
valid family-cert in the server descriptor, they add a
base-64-encoded string of that family-cert's signing key.
> See also "deriving family lines from family-keys?" below for an
> interesting but more difficult extension mechanism that I would
> not recommend.
## Relay configuration
There are several ways that we could configure relays to let them
include family certificates in their descriptors.
The easiest would be putting the private family key on each relay,
so that the relays could generate their own certificates. This is
easy to configure, but slightly risky: if the private key is
compromised on any relay, anybody can claim membership in the
family. That isn't so very bad, however -- all the relays would
need to do in this event would be to move to a new private family
key.
A more orthodox method would be to keep the private key somewhere
offline, and using it to generate a certificate for each relay in
the family as needed. These certificates should be made with
long-enough lifetimes, and relays should warn when they are going to
expire soon.
## Changes to relay behavior
Each relay should track which other relays they have seen using the
same family-key as itself. When generating a router descriptor,
each relay should list all of these relays on the legacy 'family'
line. This keeps the "family" lines up-to-date with "family-keys"
lines for compliant relays.
Relays should continue listing relays in their family lines if they
have seen a relay with that identity using the same family-key at
any time in the last 7 days.
The presence of this line should be configured by a network
parameter, `derive-family-line`.
Relays whose family lines do not stay at least mostly in sync with
their family keys should be marked invalid by the authorities.
## Client behavior
Clients should treat node A and node B as belonging to the same
family if ANY of these is true:
* The client has descriptors for A and B, and A's descriptor lists B
in its family line, and B's descriptor lists A in its family line.
* Client A has descriptors for A and B, and they both contain the
same entry in their family-keys or family-cert.
## Migration
For some time, existing relays and clients will not support family
certificates. Because of this, we try to make sure above the
well-behaved relays will list the same entries in both places.
Once enough clients have migrated to using family
certificates, authorities SHOULD disable `derive-family-line`.
## Security
Listing families remains as voluntary in this design as in today's
Tor, though bad-relay hunters can continue to look for families that
have not adopted a family key.
A hostile relay family could list a "family" line that did not match
its "family-certs" values. However, the only reason to do so would
be in order to launch a client partitioning attack, which is
probably less valuable than the kinds of attacks that they could run
by simply not listing families at all.
## Appendix: deriving family lines from family-keys?
As an alternative, we might declare that _authorities_ should keep
family lines in sync with family-certs. Here is a design sketch of
how we might do that, but I don't think it's actually a good idea,
since it would require major changes to the data flow of the
voting system.
In this design, authorties would include a "family-keys" line in
each router section in their votes corresponding to a relay with any
family-cert. When generating final microdescriptors using this
method, the authorities would use these lines to add entries to the
microdescriptors' family lines:
1. For every relay appearing in a routerstatus's family-keys, the
relays calculate a consensus family-keys value by listing including
all those keys that are listed by a majority of those voters listing
the same router with the same descriptor. (This is the algorithm we
use for voting on other values derived from the descriptor.)
2. The authorities then compute a set of "expanded families": one
for each family key. Each "expanded family" is a set containing
every router in the consensus associated with that key in its consensus
family-keys value.
3. The authorities discard all "expanded families" of size 1 or
smaller.
4. Every router listed for the "expanded family" has every other
router added to the "family" line in its microdescriptor. (The
"family" line is then re-canonicalized according to the rules of
proposal 298 to remove its )
5. Note that the final microdescriptor consensus will include the
digest of the derived microdescriptor in step 4, rather than the
digest of the microdescriptor listed in the original votes. (This
calculation is deterministic.)
The problem with this approach is that authorities would have to s
to fetch microdescriptors they do not have in order to replace their
family lines. Currently, voting never requires an authority to
fetch a microdescriptor from another authority. If we implement
vote compression and diffs as in the Walking Onions proposal,
however, we might suppose that votes could include microdescriptors
directly.
Still, this is likely more complexity than we want for a transition
mechanism.
## Appendix: Deriving family-keys from families??
We might also imagine that authorities could infer which families
exist from the graph of family relationships, and then include
synthetic "family-keys" entries for routers that belong to the same
family.
This has two challenges: first, to compute these synthetic family
keys, the authorities would need to have the same graph of family
relationships to begin with, which once again would require them to
include the complete list of families in their votes.
Secondly, finding all the families is equivalent to finding all
maximal cliques in a graph. This problem is NP-hard in its general
case. Although polynomial solutions exist for nice well-behaved
graphs, we'd still need to worry about hostile relays including
strange family relationships in order to drive the algorithm into
its exponential cases.
## Appendix: New assigned values
We need a new assigned value for the certificate type used for
family signing keys.
We need a new consensus method for placing family-keys lines in
microdescriptors.
## Appendix: New network parameters
* `derive-family-line`: If 1, relays should derive family lines from
observed family-keys. If 0, they do not. Min: 0, Max: 1. Default: 1.