Good evening,
A group of students at Johns Hopkins University and I have been analyzing the circuit selection algorithm for the Tor Browser's EXIT nodes. It is an exploratory project trying to discover how the EXIT nodes are selected every time the circuit changes.
So far, we have discovered that the time the browser is accessed, the calendar date, physical location, use of a bridge (or not), and the entry node do not change the pattern of the EXIT nodes. Moreover, approximately 80% of the exit nodes we observed come from 10 specific countries, even though these 10 countries only account for approximately 50% of all available exit nodes.
All of this has led us to conclude that the selection of EXIT nodes in the Tor Browser is not random. However, we would like to explore further which factors determine which relay becomes the exit node every time a circuit is changed. Is there anyone we could speak to, or who could give us further insight into how exit node selection in Tor's circuits works?
v/r
Hi,
Thanks for your interest in Tor's path selection algorithm.
Some of my colleagues are working on "vanguards", which significantly changes path selection. I think this is their latest proposal: https://gitweb.torproject.org/torspec.git/tree/proposals/292-mesh-vanguards....
I'll let them share any details they feel are helpful.
See also my specific answers inline below:
On 20 Feb 2020, at 06:20, Vianney Gomezgil Yaspik vgomezg1@jhu.edu wrote:
> A group of students at Johns Hopkins University and I have been analyzing the circuit selection algorithm for the Tor Browser's EXIT nodes. It is an exploratory project trying to discover how the EXIT nodes are selected every time the circuit changes.
> So far, we have discovered that the time the browser is accessed, the calendar date, physical location, use of a bridge (or not), and the entry node do not change the pattern of the EXIT nodes.
The set of Tor exits changes over time, so the calendar date will change Tor's path selection slightly.
Similarly, Tor clients try to avoid choosing paths whose relays are within the same network (for example, the same IPv4 /16), or are all controlled by the same operator (a declared relay family). So guard selection does have a slight impact on the chosen exit.
For more details, see tor's path spec, particularly the constraints section: https://gitweb.torproject.org/torspec.git/tree/path-spec.txt#n230
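To make the guard/exit interaction concrete, here is a minimal sketch of the kind of constraint check described in that section. It assumes only two rules: no two relays in the same IPv4 /16, and no two relays that list each other as family. The `Relay` record and the function names are hypothetical; the sketch ignores IPv6 (/32), the middle hop, and the order in which tor actually picks hops.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Relay:
    # Hypothetical, simplified relay record; real descriptors carry much more.
    fingerprint: str
    ipv4: str
    family: frozenset = frozenset()  # fingerprints this relay declares as family


def same_slash16(a: str, b: str) -> bool:
    """True if two IPv4 addresses fall in the same /16 network."""
    net = ipaddress.ip_network(f"{a}/16", strict=False)
    return ipaddress.ip_address(b) in net


def conflicts(a: Relay, b: Relay) -> bool:
    """Hypothetical check: same relay, same /16, or mutually declared family."""
    if a.fingerprint == b.fingerprint:
        return True
    if same_slash16(a.ipv4, b.ipv4):
        return True
    # Family only counts when each relay lists the other.
    return b.fingerprint in a.family and a.fingerprint in b.family


def exits_compatible_with(guard: Relay, exits: list) -> list:
    """Drop exit candidates that would conflict with the given guard."""
    return [e for e in exits if not conflicts(guard, e)]
```

For example, with a guard at 203.0.113.5, an exit at 203.0.200.1 is filtered out (same 203.0.0.0/16) while one at 198.51.100.7 remains. Different guards remove different exits, which is one way the entry node can shift the exit distribution slightly.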
> Moreover, approximately 80% of the exit nodes we observed come from 10 specific countries, even though these 10 countries only account for approximately 50% of all available exit nodes.
How are you counting exit nodes?
Tor uses the bandwidth weights in the consensus to weight its random selection of exit nodes: https://gitweb.torproject.org/torspec.git/tree/path-spec.txt#n92
These weights are limited by:
* any operator-configured bandwidth limits,
and scaled using:
* the relay's own observed bandwidth usage, and
* the capacities measured by the 6 Tor bandwidth authorities.
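Counting exits by number and counting them by selection probability can give very different per-country figures, which may account for the 80%/50% observation. The toy sketch below uses made-up nicknames, countries, and weights (not real consensus data) and a plain weight-proportional `random.choices` draw. It is a simplification: real clients also multiply each relay's weight by the position weights in the consensus (such as `Wee` and `Wed`), which this omits.

```python
import random
from collections import defaultdict

# Hypothetical exits: (nickname, country, consensus weight). Made-up numbers.
EXITS = [
    ("exitA", "DE", 9000),
    ("exitB", "DE", 6000),
    ("exitC", "US", 5000),
    ("exitD", "NL", 3000),
    ("exitE", "BR", 200),
    ("exitF", "IN", 150),
    ("exitG", "JP", 100),
    ("exitH", "ZA", 50),
]


def share_by_count(exits):
    """Fraction of exit relays in each country (every relay counts once)."""
    counts = defaultdict(int)
    for _, cc, _ in exits:
        counts[cc] += 1
    return {cc: n / len(exits) for cc, n in counts.items()}


def share_by_weight(exits):
    """Fraction of total selection weight in each country."""
    total = sum(w for _, _, w in exits)
    shares = defaultdict(float)
    for _, cc, w in exits:
        shares[cc] += w / total
    return dict(shares)


def pick_exit(exits, rng=random):
    """Weight-proportional random choice (simplified: no position weights)."""
    names = [n for n, _, _ in exits]
    weights = [w for _, _, w in exits]
    return rng.choices(names, weights=weights, k=1)[0]
```

With these made-up numbers, DE, US, and NL hold 50% of the exit relays but about 98% of the selection weight, so repeated calls to `pick_exit` return exits from those three countries almost every time, even though each individual draw is random.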
> All of this has led us to conclude that the selection of EXIT nodes in the Tor Browser is not random. However, we would like to explore further which factors determine which relay becomes the exit node every time a circuit is changed. Is there anyone we could speak to, or who could give us further insight into how exit node selection in Tor's circuits works?
Have you looked at the destination port? Tor tries to select exits whose exit policies allow the requested port.
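As a rough illustration of why the port matters, the sketch below filters exits by port first and only then makes a weighted choice among the survivors. The `Exit` record, its `accept` port ranges (loosely modelled on a microdescriptor's accepted-port summary), and both function names are hypothetical, and returning `None` when nothing matches is an assumption, not tor's actual fallback behaviour.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Exit:
    # Hypothetical, simplified exit record.
    nickname: str
    weight: int
    accept: tuple  # accepted ports as inclusive (low, high) ranges


def allows_port(exit_: Exit, port: int) -> bool:
    """True if any accepted range covers the port."""
    return any(lo <= port <= hi for lo, hi in exit_.accept)


def pick_exit_for_port(exits, port, rng=random):
    """Keep only exits that allow the port, then pick by weight among them."""
    candidates = [e for e in exits if allows_port(e, port)]
    if not candidates:
        return None  # assumption: stand-in for tor's own fallback logic
    weights = [e.weight for e in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]
```

With a large exit that only accepts ports 80 and 443 and a small exit that accepts everything, a request for port 22 can only ever use the small exit, however low its weight. So the ports you test with can change which relays (and countries) show up.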
Are you aware of preemptive circuits? https://gitweb.torproject.org/torspec.git/tree/path-spec.txt#n147
If you're mainly measuring preemptive circuits, you'll see fairly consistent behaviour. These circuits have fewer constraints, because they need to be suitable for general use.
That's probably enough to get you started; please let us know if you have more questions.
T