Hi,

Thanks for your interest in Tor's path selection algorithm.

Some of my colleagues are working on "vanguards", which
significantly changes path selection. I think this is their
latest proposal:
https://gitweb.torproject.org/torspec.git/tree/proposals/292-mesh-vanguards.txt

I'll let them share any details they feel are helpful.

See also my specific answers inline below:

On 20 Feb 2020, at 06:20, Vianney Gomezgil Yaspik <vgomezg1@jhu.edu> wrote:

A group of students at Johns Hopkins University and I have been analyzing the circuit selection algorithm for TOR's browser EXIT nodes. It is an exploratory project trying to discover how the EXIT nodes are selected every time the circuit changes.

So far, we have discovered that the time that the browser is accessed, calendar date,
physical location, use of a bridge (or not), and the entry node do not change the pattern of the EXIT nodes.

The set of Tor exits changes over time, so the calendar
date will change Tor's path selection slightly.

Similarly, Tor clients try to avoid choosing paths that
are within the same network, or all controlled by the
same operator. So guard selection does have a slight
impact on the chosen exit.

For more details, see tor's path spec, particularly the
constraints section:
https://gitweb.torproject.org/torspec.git/tree/path-spec.txt#n230
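
To make the guard/exit interaction concrete, here is a rough
Python sketch of three of those constraints (no relay twice,
no two relays in the same family, no two relays in the same
/16). It is not tor's code: the dict keys 'id', 'ip' and
'family' are made up for the example, and family handling is
simplified.

```python
import ipaddress


def same_slash16(ip_a, ip_b):
    """Return True if two IPv4 addresses share a /16 network."""
    net = ipaddress.ip_network(f"{ip_a}/16", strict=False)
    return ipaddress.ip_address(ip_b) in net


def exit_allowed_with_guard(guard, exit_relay):
    """Simplified check of whether an exit can share a path with a guard.

    Relays are dicts with hypothetical keys: 'id' (str), 'ip' (str)
    and 'family' (set of relay ids the relay declares as family).
    """
    # The same relay can't appear twice in one path.
    if guard["id"] == exit_relay["id"]:
        return False
    # Assumption: only count a family if both relays list each other.
    if guard["id"] in exit_relay["family"] and exit_relay["id"] in guard["family"]:
        return False
    # Avoid two relays in the same /16.
    if same_slash16(guard["ip"], exit_relay["ip"]):
        return False
    return True
```

With a fixed guard, any exit that fails this check can never
show up in your measurements, which is why the guard has a
small effect on the exits you see.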

Moreover, approximately 80% of all exit nodes come from 10 specific countries, even though these 10 countries only account for approximately 50% of all available exit nodes.

How are you counting exit nodes?

Tor uses the bandwidth weights in the consensus to
weight its random selection of exit nodes:
https://gitweb.torproject.org/torspec.git/tree/path-spec.txt#n92

These weights are limited by:
* any operator-configured bandwidth limits,
and scaled using:
* the relay's own observed bandwidth usage, and
* the capacities measured by the 6 Tor bandwidth
  authorities.
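
As a rough illustration (not tor's actual code), here is
what bandwidth-weighted random selection looks like in
Python. The relay names and weights are made up, and the
real algorithm has more steps (see the path-spec link
above).

```python
import random
from collections import Counter

# Hypothetical consensus weights for three exits (made-up numbers).
EXIT_WEIGHTS = {"exitA": 8000, "exitB": 1500, "exitC": 500}


def pick_exit(weights, rng=random):
    """Pick one relay with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def selection_shares(weights, trials=10000, seed=0):
    """Estimate how often each relay is picked over many trials."""
    rng = random.Random(seed)
    counts = Counter(pick_exit(weights, rng) for _ in range(trials))
    return {name: counts[name] / trials for name in weights}
```

In this toy example, one relay out of three is picked about
80% of the time, even though every pick is random. If you
count relays rather than bandwidth, a weighted random
selection will look skewed.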

All of which has led us to conclude that the selection of the EXIT nodes in the TOR browser is not random. However, we would like to further explore what factors determine which relay becomes the exit node every time a circuit is changed. Is there anyone that we could speak to or that could give us further insight as to how the selection of the exit node in TOR's circuit works?

Have you looked at the destination port?
Tor tries to select exits that will allow the requested
port.
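
Here is a small Python sketch of that kind of port check. It
assumes a simplified policy string like "accept 80,443" or
"reject 25,119", loosely modelled on the exit policy
summaries in the consensus. The function name is made up,
and tor's real exit policy handling is more involved.

```python
def port_allowed(policy_summary, port):
    """Check a port against a simplified exit policy summary.

    policy_summary is a string such as 'accept 80,443,8000-8100'
    or 'reject 25,119' (an assumed, simplified format).
    """
    keyword, _, port_list = policy_summary.partition(" ")
    listed = False
    for item in port_list.split(","):
        # Each item is a single port ("443") or a range ("8000-8100").
        low, _, high = item.partition("-")
        if int(low) <= port <= int(high or low):
            listed = True
            break
    # 'accept' lists allowed ports; 'reject' lists refused ports.
    return listed if keyword == "accept" else not listed
```

If all your test requests go to port 443, exits that reject
443 will never appear, no matter how much bandwidth they
have.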

Are you aware of preemptive circuits?
https://gitweb.torproject.org/torspec.git/tree/path-spec.txt#n147

If you're mainly measuring preemptive circuits, you'll
see fairly consistent behaviour. These circuits have
fewer constraints, because they need to be suitable
for general use.

That's probably enough to get you started. Please
let us know if you have more questions.

T