Hi, Scott!

You're right that having your relay in a family means that it is less likely
to be chosen, on the whole. The reason an attacker would include
their relay in a family is to increase the odds that, *when* it is
chosen, they can observe both ends of the path. As an attacker, you
wouldn't put all your relays in the same family: you'd put them in
different families.

As a simplified example, suppose that all relays have equal bandwidth
(say, 1), so that each is equally likely to be picked for any position.
Suppose that there are N relays in the network and the attacker
controls 2 of them.

If the attacker does not claim membership in any family, then the
probability of them seeing the first and last hop of a random circuit is
`(2/N) * (1/(N-1))`. That is, one of their relays is selected for the
first hop with probability `2/N`, and their other one is then selected
for the last hop with probability `1/(N-1)`.

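As a quick sanity check on that baseline, here is a minimal sketch that enumerates every ordered (first hop, last hop) pair in a small toy network and counts the ones where both ends belong to the attacker. The function name and the choice of relays 0 and 1 as the attacker's are purely illustrative, and the model is the simplified one above (equal bandwidth, middle hop ignored), not real Tor path selection.

```python
from fractions import Fraction
from itertools import permutations


def p_both_ends_enumerated(n, attackers=frozenset({0, 1})):
    """Probability that both the first and last hop are attacker relays.

    Toy model only: n equal-weight relays, no families, and the first and
    last hop are two distinct relays chosen uniformly at random.
    """
    hits = 0
    total = 0
    for first, last in permutations(range(n), 2):
        total += 1
        if first in attackers and last in attackers:
            hits += 1
    return Fraction(hits, total)


n = 10
enumerated = p_both_ends_enumerated(n)
closed_form = Fraction(2, n) * Fraction(1, n - 1)
print(enumerated, closed_form)
```

For small N the enumerated value and the closed form `(2/N) * (1/(N-1))` should agree exactly, since `Fraction` avoids floating-point rounding.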
Now suppose that one of their relays claims membership in a family with
F honest members, and the other claims membership in a different family
with G honest members. Then the probability that they will be the first
and last hop on a random circuit becomes:

`(1/N) * (1/(N-1-F)) + (1/N) * (1/(N-1-G))`

In other words, whenever a client picks one of the attacker's relays as
a first hop, a whole family's worth of relays will be excluded when the
client is choosing the last hop, which will in turn improve the
attacker's odds of getting both positions.

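To make the comparison concrete, the two-family expression can be evaluated with exact fractions. This is only a sketch of the simplified model in this message (equal bandwidth, two attacker relays, middle hop ignored); the function name `p_both_ends` and the sample values N=1000, F=100, G=200 are made up for illustration.

```python
from fractions import Fraction


def p_both_ends(n, f=0, g=0):
    """Probability that the attacker holds both the first and last hop.

    Toy model: n equal-weight relays, 2 of them hostile. One hostile relay
    claims a family with f honest members, the other a different family
    with g honest members. With f = g = 0 this is the no-family baseline.
    """
    if f < 0 or g < 0 or max(f, g) > n - 2:
        raise ValueError("family sizes must be between 0 and n - 2")
    # First hop is hostile relay A; A and its f family members are then
    # excluded from the last-hop choice.
    via_a = Fraction(1, n) * Fraction(1, n - 1 - f)
    # Same reasoning with hostile relay B and its g family members.
    via_b = Fraction(1, n) * Fraction(1, n - 1 - g)
    return via_a + via_b


baseline = p_both_ends(1000)
with_families = p_both_ends(1000, f=100, g=200)
print(float(with_families / baseline))  # ratio > 1: families help the attacker
```

The ratio grows as F and G grow, which is the point of the example: each honest relay listed in the family is one fewer competitor for the last hop.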
(Things would get even worse if the attacker could _define_ families or
join multiple families. Suppose that one of the attacker's nodes
declares family membership with every relay in the network except for
one other attacker-controlled node. Then, whenever that first node was
chosen, the attacker would be certain to have its other one chosen as
the exit.)

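The extreme case in that parenthetical falls out of the same arithmetic. The sketch below (again a toy model with a hypothetical helper name, not Tor's actual selection code) computes the chance that the attacker's second relay is picked last, given that the first relay was picked first and has excluded `family_size` other relays.

```python
from fractions import Fraction


def p_last_given_first(n, family_size):
    """Chance the attacker's other relay is the last hop, given that the
    attacker's first relay was chosen as the first hop.

    Toy model: n equal-weight relays; the first relay and the family_size
    relays it lists as family are all excluded from the last-hop choice.
    """
    candidates = n - 1 - family_size
    if family_size < 0 or candidates < 1:
        raise ValueError("family_size must be between 0 and n - 2")
    return Fraction(1, candidates)


n = 1000
print(p_last_given_first(n, 0))      # no family: 1/(N-1)
print(p_last_given_first(n, n - 2))  # everyone else listed: only one candidate left
```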
Now I realize that this attack is somewhat self-limiting, since it is
less helpful the larger the attacker becomes. Still, because of this
attack (and in case there are even better ones) it seems best to
authenticate family membership.