Design for an exit relay scanner: feedback appreciated


Philipp Winter

9 Oct 2013

9:44 p.m.

I am working on a Python-based exit relay scanner which should detect malicious and misbehaving exits. The design should have a reasonable balance between being fast/parallel and stressing the network as little as possible.

I came up with the following three steps:

1. Spawn a "parent" Tor process to get an up-to-date consensus.

2.1 For every selected exit relay, spawn a lightweight Tor process.

2.2 The consensus is copied from the "parent" process to the lightweight process' data directory. That way, the consensus has to be downloaded only once.

2.3 Every lightweight Tor process has the following configuration:

---
SOCKSPort auto
ControlPort 0
__DisablePredictedCircuits 1
UseEntryGuards 0
FetchServerDescriptors 0
DataDirectory <data_directory>
ExitNodes <exit_relay>
---

Entry guards are not used to distribute the load. Predicted circuits are disabled to prevent expensive creation of circuits which would not be used anyway. In addition, I am considering adding "EntryNodes" or "Bridge" to concentrate the first hop's load on machines under my control.

3. torsocks is then used to establish decoy connections over the respective exit relay. After that, the process is terminated.
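Steps 2.2 and 2.3 could be sketched as follows. This is a minimal, hypothetical sketch rather than the scanner's actual code: the helper names `prepare_child_dir` and `child_torrc` are made up, and it assumes the parent Tor keeps the consensus in `cached-consensus` and/or `cached-microdesc-consensus` inside its DataDirectory (which files exist depends on the Tor version and configuration). With `FetchServerDescriptors 0`, the child may also need the parent's cached descriptor files; that is not covered here. The returned torrc text could be written to a file and passed to tor with `-f`.

```python
import os
import shutil
import tempfile

# Assumption: the parent Tor keeps its consensus in one (or both) of these
# files inside its DataDirectory; check which ones your Tor version writes.
CACHE_FILES = ("cached-consensus", "cached-microdesc-consensus")


def prepare_child_dir(parent_dir, exit_fingerprint):
    """Create a fresh DataDirectory for one exit relay and seed it with the
    parent's cached consensus, so the child does not download it again."""
    # mkdtemp() creates the directory with mode 0700 (private to this user).
    child_dir = tempfile.mkdtemp(prefix="exit-%s-" % exit_fingerprint[:8])
    for name in CACHE_FILES:
        src = os.path.join(parent_dir, name)
        if os.path.isfile(src):
            shutil.copy2(src, os.path.join(child_dir, name))
    return child_dir


def child_torrc(data_dir, exit_fingerprint):
    """Return torrc text for a lightweight Tor process pinned to one exit."""
    options = [
        ("SOCKSPort", "auto"),
        ("ControlPort", "0"),
        ("__DisablePredictedCircuits", "1"),
        ("UseEntryGuards", "0"),
        ("FetchServerDescriptors", "0"),
        ("DataDirectory", data_dir),
        ("ExitNodes", exit_fingerprint),
    ]
    return "".join("%s %s\n" % (key, value) for key, value in options)
```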

Any thoughts on how to further improve the design or ideas for a better one?

Cheers, Philipp


Aaron

10 Oct

7:23 a.m.

On Wed, Oct 9, 2013 at 9:44 PM, Philipp Winter identity.function@gmail.com wrote:

...

I am working on a Python-based exit relay scanner which should detect malicious and misbehaving exits. The design should have a reasonable balance between being fast/parallel and stressing the network as little as possible.

I came up with the following three steps:

1. Spawn a "parent" Tor process to get an up-to-date consensus.

2.1 For every selected exit relay, spawn a lightweight Tor process.

2.2 The consensus is copied from the "parent" process to the lightweight process' data directory. That way, the consensus has to be downloaded only once.

2.3 Every lightweight Tor process has the following configuration:
---
SOCKSPort auto
ControlPort 0
__DisablePredictedCircuits 1
UseEntryGuards 0
FetchServerDescriptors 0
DataDirectory <data_directory>
ExitNodes <exit_relay>
---

Entry guards are not used to distribute the load. Predicted circuits are disabled to prevent expensive creation of circuits which would not be used anyway. In addition, I am considering adding "EntryNodes" or "Bridge" to concentrate the first hop's load on machines under my control.

3. torsocks is then used to establish decoy connections over the respective exit relay. After that, the process is terminated.

Any thoughts on how to further improve the design or ideas for a better one?

Hi Philipp,

I'm excited to hear you're interested in working on an exit scanner.

I have some thoughts regarding the design that might interest you.

I have been working on adding a "Tor Network Test Template" to ooni-probe; the basic concept is to extend the Tor controller library we use (txtorcon) to be able to build and attach circuits to specific streams, and iterate over the exits in the consensus. That is, we'll provide primitives that will allow you to specify a network interference test and tell ooni to run that test against every exit we know about (or a subset, a specific exit, or what have you). ooni-probe features a concurrency-based scheduler, so as to limit the impact on the network, and we've designed it to support plug-in rate limiting hooks if you want to do something fancier.

My intent is to provide Tor network tests as part of ooni-probe's standard set of tests, so that many individuals will measure the Tor network and automatically publish their results, and so that current and future network interference tests can be easily adapted to running on the Tor network. Future ideas include adding signing support to ooni reports so that reliable reporters can build trust, and automatically parsing submitted reports to flag relays as BadExit once a threshold number of reporters is reached.
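The threshold idea could look roughly like this. It is a minimal, hypothetical sketch that assumes reports have already been parsed and their signatures verified (both out of scope here) into `(reporter_id, relay_fingerprint)` pairs; the function name and report format are illustrative, not part of ooni-probe.

```python
from collections import defaultdict


def relays_to_flag(reports, threshold):
    """Return fingerprints reported by at least `threshold` distinct reporters.

    `reports` is an iterable of (reporter_id, relay_fingerprint) pairs.
    Counting distinct reporters (not reports) stops a single reporter from
    pushing a relay over the threshold by submitting duplicates.
    """
    reporters = defaultdict(set)
    for reporter_id, fingerprint in reports:
        reporters[fingerprint].add(reporter_id)
    return sorted(fp for fp, who in reporters.items() if len(who) >= threshold)
```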

Hope this interests you,

--Aaron

Cheers,

...

Philipp

_______________________________________________
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev

meejah

8:50 a.m.


Aaron aagbsn@extc.org writes:

...

I have been working on adding a "Tor Network Test Template" to ooni-probe; the basic concept is to extend the Tor controller library we use (txtorcon) to be able to build and attach circuits to specific streams, and iterate over the exits in the consensus.

I have an old branch called exit_scanner in txtorcon's repository that was/is a fairly bare-bones scanner. It used a single Tor slave. It's a bit old by now and should probably at least be re-based to master, but if any of it looks promising, it might be a starting point.

https://github.com/meejah/txtorcon/compare/exit_scanner

-- meejah


Philipp Winter

10 a.m.

On Thu, Oct 10, 2013 at 12:50:32PM +0400, meejah wrote:

...

...
I have been working on adding a "Tor Network Test Template" to ooni-probe; the basic concept is to extend the Tor controller library we use (txtorcon) to be able to build and attach circuits to specific streams, and iterate over the exits in the consensus.

I have an old branch called exit_scanner in txtorcon's repository that was/is a fairly bare-bones scanner. It used a single Tor slave. It's a bit old by now and should probably at least be re-based to master, but if any of it looks promising, it might be a starting point.

https://github.com/meejah/txtorcon/compare/exit_scanner

Thanks for the hint; I will have a look. In fact, I already have the code. If I could come up with a good name for it, I would even publish it ;)

Cheers, Philipp

Philipp Winter

9:57 a.m.

On Thu, Oct 10, 2013 at 07:23:11AM +0000, Aaron wrote:

...

I have been working on adding a "Tor Network Test Template" to ooni-probe; the basic concept is to extend the Tor controller library we use (txtorcon) to be able to build and attach circuits to specific streams, and iterate over the exits in the consensus. That is, we'll provide primitives that will allow you to specify a network interference test and tell ooni to run that test against every exit we know about (or a subset, a specific exit, or what have you).

I have a very similar goal. However, instead of extending my controller (I use stem), I spawn parallel Tor processes out of a process pool (based on Python's 'concurrent.futures' module). I assume your scanning would be sequential?
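A pool-based scan along these lines might look like the sketch below. It is a minimal sketch, not exitmap's code: `scan_exit` is a hypothetical placeholder for the per-exit work, and it uses `ThreadPoolExecutor` because each worker mostly waits on a child Tor process and the network. `ProcessPoolExecutor` offers the same interface but requires the submitted function to be picklable (defined at module top level). `max_workers` bounds how many lightweight Tor processes run at once.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def scan_exit(fingerprint):
    """Hypothetical placeholder for the per-exit work: spawn a lightweight
    Tor process pinned to `fingerprint`, make the decoy connections through
    it, then terminate it."""
    return "ok"


def scan_all(fingerprints, max_workers=4, scan=scan_exit):
    """Scan every exit with at most `max_workers` scans in flight at once."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(scan, fp): fp for fp in fingerprints}
        for future in as_completed(futures):
            fp = futures[future]
            try:
                results[fp] = future.result()
            except Exception as exc:  # one failing exit must not abort the scan
                results[fp] = "error: %s" % exc
    return results
```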

...

ooni-probe features a concurrency-based scheduler, so as to limit the impact on the network, and we've designed it to support plug-in rate limiting hooks if you want to do something fancier.

Rate limiting certainly sounds fancier than what I had in mind :)

Cheers, Philipp

Aaron

2:47 p.m.

On Thu, Oct 10, 2013 at 9:57 AM, Philipp Winter <identity.function@gmail.com> wrote:

...

On Thu, Oct 10, 2013 at 07:23:11AM +0000, Aaron wrote:

...

I have been working on adding a "Tor Network Test Template" to ooni-probe; the basic concept is to extend the Tor controller library we use (txtorcon) to be able to build and attach circuits to specific streams, and iterate over the exits in the consensus. That is, we'll provide primitives that will allow you to specify a network interference test and tell ooni to run that test against every exit we know about (or a subset, a specific exit, or what have you).

I have a very similar goal. However, instead of extending my controller (I use stem), I spawn parallel Tor processes out of a process pool (based on Python's 'concurrent.futures' module). I assume your scanning would be sequential?

ooni is built with Twisted (http://twistedmatrix.com/), which is a Python-based asynchronous event-driven framework, similar in concept to libevent. Our scheduler caps the number of 'in flight' measurements to whatever you specify in the ooniprobe.conf 'concurrency' options.
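ooni implements this cap with Twisted; the sketch below illustrates the same idea with the standard library's `asyncio` instead (a swapped-in technique, not ooni's code). An `asyncio.Semaphore` sized to the concurrency setting gates every measurement, so no more than that many are ever in flight. `make_measurement` is a hypothetical stand-in for a real network test.

```python
import asyncio


async def run_capped(jobs, concurrency):
    """Run the coroutine factories in `jobs`, never more than `concurrency`
    at a time. Returns the results in submission order and the peak number
    of jobs that were in flight simultaneously."""
    semaphore = asyncio.Semaphore(concurrency)
    in_flight = 0
    peak = 0

    async def guarded(job):
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await job()
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(guarded(job) for job in jobs))
    return results, peak


def make_measurement(n):
    """Hypothetical measurement that pretends to wait on the network."""
    async def measure():
        await asyncio.sleep(0.01)
        return n * n
    return measure
```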

...

...
ooni-probe features a concurrency-based scheduler, so as to limit the impact on the network, and we've designed it to support plug-in rate limiting hooks if you want to do something fancier.

Rate limiting certainly sounds fancier than what I had in mind :)

We don't do anything that fancy yet, just left the stubs in place for future use as we thought they might come in handy.

...

Cheers, Philipp

meejah

14 Oct

5:03 p.m.

Philipp Winter identity.function@gmail.com writes:

...

I assume your scanning would be sequential?

There are many circuits in-flight in the single Tor instance; many outstanding requests are possible.

-- meejah

Aaron

22 Oct

8:41 a.m.

On Mon, Oct 14, 2013 at 5:03 PM, meejah meejah@meejah.ca wrote:

...

Philipp Winter identity.function@gmail.com writes:

...
I assume your scanning would be sequential?

There are many circuits in-flight in the single Tor instance; many outstanding requests are possible.

-- meejah

In case anyone is interested, I have a branch over here: https://github.com/TheTorProject/ooni-probe/tree/feature/tor_test_template that provides a basic example of a Tor network test in ooni.

There's plenty of room for improvement; comments and patches are very welcome.

--Aaron

Philipp Winter

25 Nov

11:09 p.m.

On Thu, Oct 10, 2013 at 07:23:11AM +0000, Aaron wrote:

...

I have been working on adding a "Tor Network Test Template" to ooni-probe; the basic concept is to extend the Tor controller library we use (txtorcon) to be able to build and attach circuits to specific streams, and iterate over the exits in the consensus.

I now have similar code which is based on stem: https://github.com/NullHypothesis/exitmap

However, the problem with a parallel single-Tor-process design is that there is no easy way for scanning modules to figure out which exit relay they were attached to. The Tor controller just sees a bunch of incoming streams, but once one of these streams spots something fishy, it is difficult to figure out which exit relay is to blame.

Cheers, Philipp

Lunar

26 Nov

2:21 p.m.

Philipp Winter:

...

I now have similar code which is based on stem: https://github.com/NullHypothesis/exitmap

However, the problem with a parallel single-Tor-process design is that there is no easy way for scanning modules to figure out which exit relay they were attached to. The Tor controller just sees a bunch of incoming streams, but once one of these streams spots something fishy, it is difficult to figure out which exit relay is to blame.

I am unsure about the meaning of the `cmd` parameter in the probe method of your modules. But, can't you pass the circuit ID (or an opaque wrapper object, actually) to the scanning module, and have the scanning module pass that to the stream attacher when it has a stream ready?
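A minimal sketch of that idea, under stated assumptions: each scanning module binds its SOCKS connection to a known local source port and registers that port together with the circuit it was handed before connecting; when a new stream shows up, the attacher looks the circuit up and attaches the stream to it. Because the module already knows its circuit (and thus its exit), it can blame the right relay. `StreamAttacher` and the `StreamEvent` tuple are hypothetical stand-ins so the sketch runs without Tor. With stem, `Controller.attach_stream()` could serve as `attach`, with Tor configured to leave streams unattached; that the stream event exposes the source port as `source_port` is an assumption to check against the stem docs.

```python
import threading
from collections import namedtuple

# Stand-in for a controller STREAM event, reduced to the fields used below.
StreamEvent = namedtuple("StreamEvent", "stream_id status source_port")


class StreamAttacher:
    """Routes each new stream to the circuit its scanning module asked for.

    `attach(stream_id, circuit_id)` is whatever actually attaches a stream.
    Streams nobody registered are left alone here; a real attacher would
    need a policy for them.
    """

    def __init__(self, attach):
        self._attach = attach
        self._wanted = {}  # local source port -> circuit id
        self._lock = threading.Lock()  # event callbacks may run on another thread

    def expect(self, source_port, circuit_id):
        """Called by a scanning module right before it opens its connection."""
        with self._lock:
            self._wanted[source_port] = circuit_id

    def on_stream(self, event):
        """Event handler: attach NEW streams; return the circuit id used."""
        if event.status != "NEW":
            return None
        with self._lock:
            circuit_id = self._wanted.pop(event.source_port, None)
        if circuit_id is not None:
            self._attach(event.stream_id, circuit_id)
        return circuit_id
```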

(Feel free to ignore me if that sounds like nonsense. I only had a quick look at the code.)

-- Lunar lunar@torproject.org

Philipp Winter

9:47 p.m.

On Tue, Nov 26, 2013 at 03:21:04PM +0100, Lunar wrote:

...

Philipp Winter:

...
I now have similar code which is based on stem: https://github.com/NullHypothesis/exitmap

However, the problem with a parallel single-Tor-process design is that there is no easy way for scanning modules to figure out which exit relay they were attached to. The Tor controller just sees a bunch of incoming streams, but once one of these streams spots something fishy, it is difficult to figure out which exit relay is to blame.

I am unsure about the meaning of the `cmd` parameter in the probe method of your modules. But, can't you pass the circuit ID (or an opaque wrapper object, actually) to the scanning module, and have the scanning module pass that to the stream attacher when it has a stream ready?

"cmd" abstracts away torsocks and the handling of standalone processes. That's useful for modules which don't make use of Python libraries.

And yes, your idea should work in theory but Python's concurrent.futures doesn't make it straightforward at this point. I'll look into it. Thanks for having a look at the code.

Cheers, Philipp

Damian Johnson

2 Dec

1:09 a.m.

...

I now have similar code which is based on stem: https://github.com/NullHypothesis/exitmap

Hi Philipp, sorry about the long delay before responding. ExitMap looks great!

You might want to look into PEP8 [1], Python's de-facto style guide. It's certainly up to you which bits you do/don't like, but coming close will make your code more uniform with the rest of the Python world. PyPI has a slick pep8 script you can run over your codebase [2]. Personally I run this as part of the tests for Stem.

Would you be amenable to changes in that regard from me? This looks like a fun project, so I'm a bit tempted to sink a weekend or two into seeing if I can simplify the codebase.

Some general code review feedback follows...

======================================== exitmap / circuit.py ========================================

...

new = Circuit

Not especially pythonic, but works.

======================================== exitmap / circuitpool.py ========================================

...

while (circuitID is None):

Parentheses aren't necessary.

...

if len(self.exitRelays) == 0:

Can also be 'if not self.exitRelays:'.

...

try:
    circuitID = self.ctrl.new_circuit([const.FIRST_HOP, exitRelay],
                                      await_build=awaitBuild)
except (stem.InvalidRequest, stem.CircuitExtensionFailed,
        stem.ControllerError) as error:

Stem's InvalidRequest and CircuitExtensionFailed extend ControllerError, so this can be simplified to...

except stem.ControllerError as error:

Stem's exception hierarchy can be found on...

https://stem.torproject.org/api/control.html#exceptions-and-attribute-enums

Your _addCircuit() is solely used by _fillPool(), so personally I'd probably do this as a generator...

def _generate_circuit(self, await_build):
  """
  Pops the top relay off our exitRelays and generates a circuit through it,
  going on to the next entry if it fails.

  :param bool await_build: block until the circuit is created if **True**

  :returns: :class:`~circuit.Circuit` for the circuit we establish
  """

  while self.exitRelays:
    exit_fingerprint = self.exitRelays.pop(0)

    logger.debug("Attempting to create circuit with '%s' as exit " \
                 "relay." % exit_fingerprint)

    try:
      circuit_id = self.ctrl.new_circuit(
        [const.FIRST_HOP, exit_fingerprint],
        await_build = await_build,
      )

      logger.debug("Created circuit #%s with '%s' as exit relay." %
                   (circuit_id, exit_fingerprint))

      yield circuit.Circuit(circuit_id)
    except stem.ControllerError as exc:
      logger.warning("Could not establish circuit with '%s'. " \
                     "Skipping to next exit (error=%s)." %
                     (exit_fingerprint, exc))

  logger.warning("No more exit relay fingerprints to create circuits with.")

def _fill_pool(self, await_build = False):
  if len(self.pool) == const.CIRCUIT_POOL_SIZE:
    return

  logger.debug("Attempting to refill the circuit pool to size %d." %
               const.CIRCUIT_POOL_SIZE)

  # iterate over the generator itself; appending self._generate_circuit(...)
  # would put a generator object in the pool rather than a Circuit
  for new_circuit in self._generate_circuit(await_build):
    self.pool.append(new_circuit)

    if len(self.pool) == const.CIRCUIT_POOL_SIZE:
      break
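To try the generator-based refill in isolation, here is a self-contained sketch with a fake controller. Everything in it is hypothetical: `FakeController`, `CircuitPool`, `POOL_SIZE` and `FIRST_HOP` stand in for stem's controller and exitmap's constants, and `CircuitError` stands in for `stem.ControllerError`. It shows that failing exits are skipped and that exits not needed yet stay queued for the next refill.

```python
import logging

logger = logging.getLogger(__name__)

POOL_SIZE = 2  # hypothetical stand-in for const.CIRCUIT_POOL_SIZE
FIRST_HOP = "FIRST-HOP-FINGERPRINT"  # hypothetical stand-in for const.FIRST_HOP


class CircuitError(Exception):
    """Stand-in for stem.ControllerError."""


class FakeController:
    """Hands out increasing circuit ids and fails for exits in `broken`."""

    def __init__(self, broken=()):
        self.broken = set(broken)
        self.next_id = 1

    def new_circuit(self, path, await_build=False):
        if path[-1] in self.broken:
            raise CircuitError("circuit extension failed")
        circuit_id = str(self.next_id)
        self.next_id += 1
        return circuit_id


class CircuitPool:
    def __init__(self, ctrl, exit_relays):
        self.ctrl = ctrl
        self.exit_relays = list(exit_relays)
        self.pool = []

    def _generate_circuits(self, await_build):
        """Lazily yield (circuit_id, exit) pairs, skipping exits that fail."""
        while self.exit_relays:
            exit_fingerprint = self.exit_relays.pop(0)
            try:
                circuit_id = self.ctrl.new_circuit(
                    [FIRST_HOP, exit_fingerprint], await_build=await_build)
            except CircuitError as exc:
                logger.warning("Skipping exit %s (%s).", exit_fingerprint, exc)
                continue
            yield circuit_id, exit_fingerprint

    def fill_pool(self, await_build=False):
        """Top the pool up to POOL_SIZE; unused exits stay queued."""
        if len(self.pool) >= POOL_SIZE:
            return
        for entry in self._generate_circuits(await_build):
            self.pool.append(entry)
            if len(self.pool) >= POOL_SIZE:
                break
```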

...

# go over circuit pool once
poolLen = len(self.pool)
for idx in xrange(poolLen):
    if idx >= len(self.pool):
        return None

    logger.debug("idx=%d, poolsize=%d" % (idx, len(self.pool)))

    circuit = self.pool[idx] # that's a Circuit() object

Honestly I'm finding this class to be overcomplicated. Popping and appending items between lists is making this more confusing than it needs to be.

======================================== exitmap / command.py ========================================

Stem offers a friendlier method of calling commands. Mostly I intended it for the system module's functions, but you might find it useful here too...

https://stem.torproject.org/api/util/system.html#stem.util.system.call

======================================== exitmap / exitselector.py ========================================

...

for desc in stem.descriptor.parse_file(open(consensus)):

Why read the consensus file directly? If you have a controller then getting it via tor would be the best option. If not then fetching this directly via the authorities is generally the easiest...

https://stem.torproject.org/api/descriptor/remote.html

...

if not "Exit" in desc.flags:
    continue

Perfectly fine, though stem does offer enums...

if stem.Flag.EXIT not in desc.flags:
  continue

...

for (ip, port) in hosts:
    if not desc.exit_policy.can_exit_to(ip, port):
        continue

I don't think this'll actually work. The 'continue' only applies to the inner iteration over hosts, not to the loop over descriptors, so it won't skip the relay.
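The pitfall and one possible fix can be shown in a small runnable sketch. It uses hypothetical stand-ins so it runs without stem: `Desc` models a consensus entry and `can_exit_to` plays the role of stem's `desc.exit_policy.can_exit_to(ip, port)`. The fix uses `all()`, which stops at the first destination the relay cannot reach; a `for`/`else` loop would work as well.

```python
from collections import namedtuple

# Stand-in for a consensus entry: `allowed` plays the role of the exit policy.
Desc = namedtuple("Desc", "fingerprint allowed")


def can_exit_to(desc, ip, port):
    """Stand-in for desc.exit_policy.can_exit_to(ip, port)."""
    return (ip, port) in desc.allowed


def select_exits_buggy(descs, hosts):
    selected = []
    for desc in descs:
        for (ip, port) in hosts:
            if not can_exit_to(desc, ip, port):
                continue  # moves on to the next host, not the next relay
        selected.append(desc.fingerprint)  # so every relay ends up here
    return selected


def select_exits(descs, hosts):
    """Keep only relays whose policy allows every (ip, port) in `hosts`."""
    return [desc.fingerprint for desc in descs
            if all(can_exit_to(desc, ip, port) for (ip, port) in hosts)]
```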

======================================== exitmap / scanner.py ========================================

...

stem.connection.authenticate_none(torCtrl)

There is no point in doing this unless you *only* want it to work without authentication. If you opt for authenticate() instead then this will also work for cookie auth.

Cheers! -Damian

[1] http://www.python.org/dev/peps/pep-0008/
[2] https://pypi.python.org/pypi/pep8

Philipp Winter

3 Dec

11:38 a.m.

On Sun, Dec 01, 2013 at 05:09:55PM -0800, Damian Johnson wrote:

...

You might want to look into PEP8 [1], Python's de-facto style guide. It's certainly up to you which bits you do/don't like, but coming close will make your code more uniform with the rest of the Python world. PyPI has a slick pep8 script you can run over your codebase [2]. Personally I run this as part of the tests for Stem.

Thanks for all your remarks. That's very helpful feedback!

...

Would you be amenable to changes in that regard from me? This looks like a fun project, so I'm a bit tempted to sink a weekend or two into seeing if I can simplify the codebase.

Sure, absolutely. You should wait a couple of days before doing that, though. After Lunar's suggestions, I'm significantly restructuring the code.

Cheers, Philipp

10 Oct

11:39 a.m.

On Wednesday 09 October 2013 23:44:18 Philipp Winter wrote:

...

I am working on a Python-based exit relay scanner which should detect malicious and misbehaving exits. The design should have a reasonable balance between being fast/parallel and stressing the network as little as possible.

I came up with the following three steps:

1. Spawn a "parent" Tor process to get an up-to-date consensus.

2.1 For every selected exit relay, spawn a lightweight Tor process.

2.2 The consensus is copied from the "parent" process to the lightweight process' data directory. That way, the consensus has to be downloaded only once.

2.3 Every lightweight Tor process has the following configuration:
---
SOCKSPort auto
ControlPort 0
__DisablePredictedCircuits 1
UseEntryGuards 0
FetchServerDescriptors 0
DataDirectory <data_directory>
ExitNodes <exit_relay>
---

Entry guards are not used to distribute the load. Predicted circuits are disabled to prevent expensive creation of circuits which would not be used anyway. In addition, I am considering adding "EntryNodes" or "Bridge" to concentrate the first hop's load on machines under my control.

3. torsocks is then used to establish decoy connections over the respective exit relay. After that, the process is terminated.

Any thoughts on how to further improve the design or ideas for a better one?

There is no need to spawn multiple Tor processes if you do circuit building and stream handling on your own.

Best, Robert

Fabio Pietrosanti (naif)

11 Oct

8:12 a.m.

Why not just use OONI and a single Tor instance to do so?

I expect it will take much less effort, and you will be able to leverage existing code and existing knowledge within the Tor Project.

Fabio

On 10/9/13 11:44 PM, Philipp Winter wrote:

...

I am working on a Python-based exit relay scanner which should detect malicious and misbehaving exits. The design should have a reasonable balance between being fast/parallel and stressing the network as little as possible.


tor-dev@lists.torproject.org

14 comments

7 participants


participants (7)

Aaron
Damian Johnson
Fabio Pietrosanti (naif)
Lunar
meejah
Philipp Winter
ra