This thread continues the broader discussion of Tor circuit path selection at
https://lists.torproject.org/pipermail/tor-relays/2018-August/015994.html regarding possible correlation attacks by an autonomous system.
Current measures include:
* Preventing two relays in the same IPv4 /16 or IPv6 /32 network from being in the same Tor circuit. CIDR is helpful, but is it enough?
* The MyFamily directive. This relies on relay operators being honest, so we shouldn't rely on it as the sole indicator.
* Other things that I am not aware of?
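To make the subnet rule in the first bullet concrete, here is a minimal Python sketch. The function names same_subnet and path_has_subnet_conflict are made up for illustration, and this is not how tor itself implements the check; it only shows the /16 and /32 comparison described above.

```python
import ipaddress
from itertools import combinations

def same_subnet(addr_a, addr_b):
    """True if both addresses share an IPv4 /16 or an IPv6 /32."""
    a = ipaddress.ip_address(addr_a)
    b = ipaddress.ip_address(addr_b)
    if a.version != b.version:
        return False  # an IPv4 and an IPv6 address never match here
    prefix = 16 if a.version == 4 else 32
    # strict=False lets us build the network from a host address
    net = ipaddress.ip_network(f"{a}/{prefix}", strict=False)
    return b in net

def path_has_subnet_conflict(addresses):
    """True if any two relay addresses in a candidate path share a subnet."""
    return any(same_subnet(x, y) for x, y in combinations(addresses, 2))
```

A client-side check like this would reject a candidate path as soon as path_has_subnet_conflict returns True and pick another relay.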
That's a good summary. There are a few more details about Guards in:
Since 0.2.9, all relays in the consensus have the Running and Valid flags,
so those constraints are redundant:
A recent proposal talks about removing path restrictions altogether,
in favour of vanguards (layered guards):
Some measures worth considering include:
* Preventing two relays in the same ASN from being in a circuit.
* Maybe prevent two relays in the same ASN from being Guard and Exit, excluding the middle relay from this calculation.
ASN data needs to be authenticated and distributed to clients, or we introduce
a security dependency on an external data provider. Search the archives for
details.
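The two ASN rules suggested above could look roughly like the following Python sketch. It assumes the client already has a trustworthy AS number for each relay, which is exactly the open problem noted in the reply; the function name asn_conflict, its guard_exit_only parameter, and the choice to ignore unknown ASNs (None) are all assumptions made for illustration.

```python
def asn_conflict(path_asns, guard_exit_only=False):
    """Check a candidate path for ASN reuse.

    path_asns lists one AS number per relay, ordered guard, middle(s), exit.
    None means the ASN is unknown; unknown entries never count as a match.
    """
    if guard_exit_only:
        # Second variant: only the guard and the exit are compared.
        guard, exit_ = path_asns[0], path_asns[-1]
        return guard is not None and guard == exit_
    # First variant: no two relays in the circuit may share an ASN.
    known = [asn for asn in path_asns if asn is not None]
    return len(set(known)) < len(known)
```

The guard_exit_only variant is less restrictive: a path whose guard and middle share an ASN is rejected by the first rule but allowed by the second.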
* Bridges could be a challenge when implementing this, although it's not impossible.
Clients already impose the IPv4 /16 constraint on paths they build via bridges.
What's the challenge here?
* Looking at relays with same/similar names, heuristics maybe? It's really guesswork but hey it might work.
* Looking at relays with same/similar contact info
Changing path selection based on data that hasn't been verified introduces
additional attack vectors. I think there are previous posts on this topic, but
I'm not sure.
* Looking at relays in the same geographic regions and avoiding them
Country data is very poor quality: it's generally a mix of geographical location
of the actual relay, and legal location of the data centre company. It's also not
authenticated or validated, introducing a dependency on an external data
provider.
We need more research to determine what the threat models are, how country
data is useful, and what the default settings should be.
* Relays with the same non-standard ports - excluding 9001, 9030, 80, 443 (anything else that's super common?)
The assumption is that relays with the same ports might have the same operator.
ORPorts are verified by the directory authorities, so we could use them.
I think the logic should be:
If fewer than X% of relays have a port, consider all those relays to be in the same
family.
Here are the open questions:
What should X be? (Look at existing port distribution.)
Should we use number of relays, or consensus weight?
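As a starting point for the proposal, the rule above can be sketched in Python like this. The sketch counts relays rather than consensus weight, which is one of the open questions, and leaves X as a parameter since no value has been chosen. The function name rare_port_families and the input format (a dict of relay nickname to ORPort) are assumptions made for illustration.

```python
from collections import Counter

def rare_port_families(relay_ports, threshold_pct):
    """Group relays that share an uncommon ORPort.

    relay_ports maps a relay nickname to its ORPort.
    A port is "rare" if fewer than threshold_pct percent of relays use it.
    Returns a dict of port -> set of nicknames, for rare ports used by
    at least two relays (a family of one has no effect on path selection).
    """
    total = len(relay_ports)
    counts = Counter(relay_ports.values())
    families = {}
    for nickname, port in relay_ports.items():
        # Integer comparison avoids floating-point rounding at the boundary.
        if counts[port] * 100 < threshold_pct * total:
            families.setdefault(port, set()).add(nickname)
    return {port: members for port, members in families.items()
            if len(members) >= 2}
```

Running this over the ports in a real consensus for a range of threshold values would show how many relays each choice of X groups together, which helps answer the first open question.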
The next step is to write a proposal:
* On-device models that look at the above data to decide which relays are most likely run by the same entity. Maybe use machine learning to make an informed decision based on all factors?
Machine learning is easily manipulated, hard to update, and hard to distribute.
Recent tor proposals:
Outside the scope:
* In ASes where virtual machines are sold and physical machines are not, it's quite possible that the provider may steal relay keys. Little research exists on how you could successfully protect against such an adversary who isn't playing nice. Legislation exists in the EU (for example, GDPR) under which such activity may violate local laws. This may or may not be enough: certainly not against a government actor, but maybe against an AS acting on its own.
Physical access to the device also provides the opportunity to access data
on the device. It is also out of scope.
* An AS hosting a Tor relay who logs or watches network traffic will always be able to learn something about the circuit, but perhaps we can prevent them from learning everything about the circuit most of the time.
Tor is designed so that each relay learns some information, but no relay
learns all the information. That's why we have path constraints for similar relays:
Everyone on the list has had very insightful and helpful thoughts on this discussion so far, and I'm looking forward to more discussion of the broader issue.
T