I am working on a Python-based exit relay scanner which should detect malicious and misbehaving exits. The design should have a reasonable balance between being fast/parallel and stressing the network as little as possible.
I came up with the following three steps:
1. Spawn a "parent" Tor process to get an up-to-date consensus.
2.1 For every selected exit relay, spawn a lightweight Tor process.
2.2 The consensus is copied from the "parent" process to the lightweight process' data directory. That way, the consensus has to be downloaded only once.
2.3 Every lightweight Tor process has the following configuration:
---
SOCKSPort auto
ControlPort 0
__DisablePredictedCircuits 1
UseEntryGuards 0
FetchServerDescriptors 0
DataDirectory <data_directory>
ExitNodes <exit_relay>
---
Entry guards are not used to distribute the load. Predicted circuits are disabled to prevent expensive creation of circuits which would not be used anyway. In addition, I am considering adding "EntryNodes" or "Bridge" to concentrate the first hop's load on machines under my control.
3. torsocks is then used to establish decoy connections over the respective exit relay. After that, the process is terminated.
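Steps 2.1 and 2.3 could be sketched roughly as follows. This is only a minimal illustration: the helper names build_torrc and write_torrc and the "scanner-" directory prefix are made up for this example, and copying the consensus (step 2.2) and starting Tor are deliberately left out.

```python
import os
import tempfile

# Options taken from the configuration listed in step 2.3.
BASE_OPTIONS = [
    ("SOCKSPort", "auto"),
    ("ControlPort", "0"),
    ("__DisablePredictedCircuits", "1"),
    ("UseEntryGuards", "0"),
    ("FetchServerDescriptors", "0"),
]


def build_torrc(data_directory, exit_relay):
    """Return torrc text for one lightweight Tor process (hypothetical helper)."""
    options = BASE_OPTIONS + [
        ("DataDirectory", data_directory),
        ("ExitNodes", exit_relay),
    ]
    return "".join("%s %s\n" % (key, value) for key, value in options)


def write_torrc(exit_relay, parent_dir=None):
    """Create a fresh data directory and write a torrc file into it."""
    data_directory = tempfile.mkdtemp(prefix="scanner-", dir=parent_dir)
    torrc_path = os.path.join(data_directory, "torrc")
    with open(torrc_path, "w") as handle:
        handle.write(build_torrc(data_directory, exit_relay))
    return torrc_path
```

Each exit relay gets its own data directory here, so the per-relay processes would not share state.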
Any thoughts on how to further improve the design or ideas for a better one?
Cheers, Philipp
On Wed, Oct 9, 2013 at 9:44 PM, Philipp Winter <identity.function@gmail.com> wrote:
> I am working on a Python-based exit relay scanner which should detect malicious and misbehaving exits. The design should have a reasonable balance between being fast/parallel and stressing the network as little as possible.
> I came up with the following three steps:
> 1. Spawn a "parent" Tor process to get an up-to-date consensus.
> 2.1 For every selected exit relay, spawn a lightweight Tor process.
> 2.2 The consensus is copied from the "parent" process to the lightweight process' data directory. That way, the consensus has to be downloaded only once.
> 2.3 Every lightweight Tor process has the following configuration:
> ---
> SOCKSPort auto
> ControlPort 0
> __DisablePredictedCircuits 1
> UseEntryGuards 0
> FetchServerDescriptors 0
> DataDirectory <data_directory>
> ExitNodes <exit_relay>
> ---
> Entry guards are not used to distribute the load. Predicted circuits are disabled to prevent expensive creation of circuits which would not be used anyway. In addition, I am considering adding "EntryNodes" or "Bridge" to concentrate the first hop's load on machines under my control.
> 3. torsocks is then used to establish decoy connections over the respective exit relay. After that, the process is terminated.
> Any thoughts on how to further improve the design or ideas for a better one?
Hi Philipp,
I'm excited to hear you're interested in working on an exit scanner.
I have some thoughts regarding the design that might interest you.
I have been working on adding a "Tor Network Test Template" to ooni-probe; the basic concept is to extend the Tor controller library we use (txtorcon) to be able to build and attach circuits to specific streams, and iterate over the exits in the consensus. That is, we'll provide primitives that will allow you to specify a network interference test and tell ooni to run that test against every exit we know about (or a subset, a specific exit, or what have you). ooni-probe features a concurrency-based scheduler, so as to limit the impact on the network, and we've designed it to support plug-in rate limiting hooks if you want to do something fancier.
My intent is to provide Tor network tests as part of ooni-probe's standard set of tests, so that many individuals will measure the Tor network and automatically publish their results, and so that current and future network interference tests can be easily adapted to running on the Tor network. Future ideas include adding signing support to ooni reports so that reliable reporters can build trust, and automatically parsing submitted reports to generate BadExit after a threshold of reporters is reached.
Hope this interests you,
--Aaron
> Cheers,
> Philipp
_______________________________________________
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
Aaron <aagbsn@extc.org> writes:
> I have been working on adding a "Tor Network Test Template" to ooni-probe; the basic concept is to extend the Tor controller library we use (txtorcon) to be able to build and attach circuits to specific streams, and iterate over the exits in the consensus.
I have an old branch called exit_scanner in txtorcon's repository that was/is a fairly bare-bones scanner. It used a single Tor slave. It's a bit old by now and should probably at least be re-based to master, but if any of it looks promising, it might be a starting point.
https://github.com/meejah/txtorcon/compare/exit_scanner
-- meejah
On Thu, Oct 10, 2013 at 12:50:32PM +0400, meejah wrote:
>> I have been working on adding a "Tor Network Test Template" to ooni-probe; the basic concept is to extend the Tor controller library we use (txtorcon) to be able to build and attach circuits to specific streams, and iterate over the exits in the consensus.
> I have an old branch called exit_scanner in txtorcon's repository that was/is a fairly bare-bones scanner. It used a single Tor slave. It's a bit old by now and should probably at least be re-based to master, but if any of it looks promising, it might be a starting point.
Thanks for the hint; I will have a look. In fact, I already have the code. If I could come up with a good name for it, I would even publish it ;)
Cheers, Philipp
On Thu, Oct 10, 2013 at 07:23:11AM +0000, Aaron wrote:
> I have been working on adding a "Tor Network Test Template" to ooni-probe; the basic concept is to extend the Tor controller library we use (txtorcon) to be able to build and attach circuits to specific streams, and iterate over the exits in the consensus. That is, we'll provide primitives that will allow you to specify a network interference test and tell ooni to run that test against every exit we know about (or a subset, a specific exit, or what have you).
I have a very similar goal. However, instead of extending my controller (I use stem), I spawn parallel Tor processes out of a process pool (based on Python's 'concurrent' module). I assume your scanning would be sequential?
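A pool-based scan of this kind might look roughly like the sketch below. It is not exitmap's actual code: probe_exit is a hypothetical placeholder for the per-exit work, and a ThreadPoolExecutor is used instead of a process pool to keep the example self-contained (concurrent.futures.ProcessPoolExecutor offers the same submit/result interface).

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def probe_exit(fingerprint):
    """Placeholder for the per-exit work (hypothetical stand-in).

    A real probe would spawn a Tor process pinned to this exit and run
    the decoy connections; here we only return a fixed verdict.
    """
    return fingerprint, "ok"


def scan(fingerprints, max_workers=4):
    """Probe all exits, with at most max_workers probes running at once."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(probe_exit, fpr) for fpr in fingerprints]
        # Collect verdicts in completion order, not submission order.
        for future in as_completed(futures):
            fingerprint, verdict = future.result()
            results[fingerprint] = verdict
    return results
```

The max_workers value bounds how many exits are probed in parallel, which is the knob that trades scan speed against load on the network.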
> ooni-probe features a concurrency-based scheduler, so as to limit the impact on the network, and we've designed it to support plug-in rate limiting hooks if you want to do something fancier.
Rate limiting certainly sounds fancier than what I had in mind :)
Cheers, Philipp
On Thu, Oct 10, 2013 at 9:57 AM, Philipp Winter <identity.function@gmail.com> wrote:
> On Thu, Oct 10, 2013 at 07:23:11AM +0000, Aaron wrote:
>> I have been working on adding a "Tor Network Test Template" to ooni-probe; the basic concept is to extend the Tor controller library we use (txtorcon) to be able to build and attach circuits to specific streams, and iterate over the exits in the consensus. That is, we'll provide primitives that will allow you to specify a network interference test and tell ooni to run that test against every exit we know about (or a subset, a specific exit, or what have you).
> I have a very similar goal. However, instead of extending my controller (I use stem), I spawn parallel Tor processes out of a process pool (based on Python's 'concurrent' module). I assume your scanning would be sequential?
ooni is built with Twisted (http://twistedmatrix.com/), which is a Python-based asynchronous event-driven framework, similar in concept to libevent. Our scheduler caps the number of 'in flight' measurements to whatever you specify in the ooniprobe.conf 'concurrency' option.
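The idea of capping in-flight measurements can be illustrated with a short sketch. Note that this uses the standard library's asyncio rather than Twisted, and it is not ooni-probe's scheduler; run_with_cap and fake_measurement are hypothetical names chosen for the example.

```python
import asyncio


async def run_with_cap(measurements, concurrency):
    """Run coroutine factories with at most `concurrency` in flight.

    Returns the results in input order together with the observed peak
    number of simultaneously running measurements.
    """
    semaphore = asyncio.Semaphore(concurrency)
    in_flight = 0
    peak = 0

    async def guarded(measure):
        nonlocal in_flight, peak
        async with semaphore:  # blocks while `concurrency` tasks are running
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await measure()
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(guarded(m) for m in measurements))
    return list(results), peak


async def fake_measurement():
    """Stand-in for a real network measurement."""
    await asyncio.sleep(0.01)
    return "done"
```

For instance, asyncio.run(run_with_cap([fake_measurement] * 10, concurrency=3)) runs ten measurements while never letting more than three overlap.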
>> ooni-probe features a concurrency-based scheduler, so as to limit the impact on the network, and we've designed it to support plug-in rate limiting hooks if you want to do something fancier.
> Rate limiting certainly sounds fancier than what I had in mind :)
We don't do anything that fancy yet, just left the stubs in place for future use as we thought they might come in handy.
> Cheers, Philipp
Philipp Winter <identity.function@gmail.com> writes:
> I assume your scanning would be sequential?
There are many circuits in-flight in the single Tor instance; many outstanding requests are possible.
On Mon, Oct 14, 2013 at 5:03 PM, meejah <meejah@meejah.ca> wrote:
> Philipp Winter <identity.function@gmail.com> writes:
>> I assume your scanning would be sequential?
> There are many circuits in-flight in the single Tor instance; many outstanding requests are possible.
> -- meejah
In case anyone is interested, I have a branch over here: https://github.com/TheTorProject/ooni-probe/tree/feature/tor_test_template that provides a basic example of a Tor network test in ooni.
There's plenty of room for improvement; comments and patches are very welcome.
--Aaron
On Thu, Oct 10, 2013 at 07:23:11AM +0000, Aaron wrote:
> I have been working on adding a "Tor Network Test Template" to ooni-probe; the basic concept is to extend the Tor controller library we use (txtorcon) to be able to build and attach circuits to specific streams, and iterate over the exits in the consensus.
I now have similar code which is based on stem: https://github.com/NullHypothesis/exitmap
However, the problem with a parallel single-Tor-process design is that there is no easy way for scanning modules to figure out which exit relay they were attached to. The Tor controller just sees a bunch of incoming streams, but once one of these streams spots something fishy, it is difficult to figure out which exit relay is to blame.
Cheers, Philipp
Philipp Winter:
> I now have similar code which is based on stem: https://github.com/NullHypothesis/exitmap
> However, the problem with a parallel single-Tor-process design is that there is no easy way for scanning modules to figure out which exit relay they were attached to. The Tor controller just sees a bunch of incoming streams, but once one of these streams spots something fishy, it is difficult to figure out which exit relay is to blame.
I am unsure about the meaning of the `cmd` parameter in the probe method of your modules. But, can't you pass the circuit ID (or an opaque wrapper object, actually) to the scanning module, and have the scanning module pass that to the stream attacher when it has a stream ready?
(Feel free to ignore me if that sounds like non-sense. I only had a quick look at the code.)
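The suggestion above could be sketched along these lines. Everything here is hypothetical (StreamAttacher, handle_for and attach are names invented for illustration, not exitmap's API), and how the module learns the stream ID is left open. With stem, the attach callback could plausibly be the controller's attach_stream method.

```python
import itertools


class StreamAttacher:
    """Maps opaque handles to circuits (hypothetical helper for illustration)."""

    def __init__(self, attach):
        # `attach(stream_id, circuit_id)` performs the real attachment.
        self._attach = attach
        self._circuits = {}
        self._counter = itertools.count(1)

    def handle_for(self, circuit_id, exit_fingerprint):
        """Register a circuit and return an opaque handle for a scanning module."""
        handle = next(self._counter)
        self._circuits[handle] = (circuit_id, exit_fingerprint)
        return handle

    def attach(self, handle, stream_id):
        """Attach a ready stream to the handle's circuit; return the exit used."""
        circuit_id, exit_fingerprint = self._circuits[handle]
        self._attach(stream_id, circuit_id)
        return exit_fingerprint
```

Because the module holds the handle from the start, anything suspicious it observes can be traced back to one exit fingerprint.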
On Tue, Nov 26, 2013 at 03:21:04PM +0100, Lunar wrote:
> Philipp Winter:
>> I now have similar code which is based on stem: https://github.com/NullHypothesis/exitmap
>> However, the problem with a parallel single-Tor-process design is that there is no easy way for scanning modules to figure out which exit relay they were attached to. The Tor controller just sees a bunch of incoming streams, but once one of these streams spots something fishy, it is difficult to figure out which exit relay is to blame.
> I am unsure about the meaning of the `cmd` parameter in the probe method of your modules. But, can't you pass the circuit ID (or an opaque wrapper object, actually) to the scanning module, and have the scanning module pass that to the stream attacher when it has a stream ready?
"cmd" abstracts away torsocks and handling standalone processes. That's useful for modules which don't make use of Python libraries.
And yes, your idea should work in theory but Python's concurrent.futures doesn't make it straightforward at this point. I'll look into it. Thanks for having a look at the code.
Cheers, Philipp
> I now have similar code which is based on stem: https://github.com/NullHypothesis/exitmap
Hi Philipp, sorry about the long delay before responding. ExitMap looks great!
You might want to look into PEP8 [1], Python's de-facto style guide. It's certainly up to you which bits you do/don't like, but coming close will make your code more uniform with the rest of the Python world. PyPI has a slick pep8 script you can run over your codebase [2]. Personally I run this as part of the tests for Stem.
Would you be amenable to changes in that regard from me? This looks like a fun project, so I'm a bit tempted to sink a weekend or two into seeing if I can simplify the codebase.
Some general code review feedback follows...
========================================
exitmap / circuit.py
========================================
> new = Circuit
Not especially pythonic, but works.
========================================
exitmap / circuitpool.py
========================================
> while (circuitID is None):
Parentheses aren't necessary.
> if len(self.exitRelays) == 0:
Can also be 'if not self.exitRelays:'.
> try:
>     circuitID = self.ctrl.new_circuit([const.FIRST_HOP, exitRelay],
>                                       await_build=awaitBuild)
> except (stem.InvalidRequest, stem.CircuitExtensionFailed,
>         stem.ControllerError) as error:
Stem's InvalidRequest and CircuitExtensionFailed extend ControllerError, so this can be simplified to...
  except stem.ControllerError as error:
Stem's exception hierarchy can be found on...
https://stem.torproject.org/api/control.html#exceptions-and-attribute-enums
Your _addCircuit() is solely used by _fillPool(), so personally I'd probably do this as a generator...
  def _generate_circuit(self, await_build):
    """
    Pops the top relay off our exitRelays and generates a circuit through it,
    going on to the next entry if it fails.

    :param bool await_build: block until the circuit is created if **True**

    :returns: :class:`~circuit.Circuit` for the circuit we establish
    """

    while self.exitRelays:
      exit_fingerprint = self.exitRelays.pop(0)

      logger.debug("Attempting to create circuit with '%s' as exit " \
                   "relay." % exit_fingerprint)

      try:
        circuit_id = self.ctrl.new_circuit(
          [const.FIRST_HOP, exit_fingerprint],
          await_build = await_build,
        )

        logger.debug("Created circuit #%s with '%s' as exit relay." %
                     (circuit_id, exit_fingerprint))

        yield circuit.Circuit(circuit_id)
      except stem.ControllerError as exc:
        logger.warning("Could not establish circuit with '%s'. " \
                       "Skipping to next exit (error=%s)." %
                       (exit_fingerprint, exc))

    logger.warning("No more exit relay fingerprints to create circuits with.")

  def _fill_pool(self, await_build = False):
    if len(self.pool) == const.CIRCUIT_POOL_SIZE:
      return

    logger.debug("Attempting to refill the circuit pool to size %d." %
                 const.CIRCUIT_POOL_SIZE)

    # _generate_circuit() is a generator, so draw circuits from it rather
    # than appending the generator object itself.
    for new_circuit in self._generate_circuit(await_build):
      self.pool.append(new_circuit)

      if len(self.pool) >= const.CIRCUIT_POOL_SIZE:
        break
> # go over circuit pool once
> poolLen = len(self.pool)
> for idx in xrange(poolLen):
>     if idx >= len(self.pool):
>         return None
>     logger.debug("idx=%d, poolsize=%d" % (idx, len(self.pool)))
>     circuit = self.pool[idx] # that's a Circuit() object
Honestly I'm finding this class to be overcomplicated. Popping and appending items between lists is making this more confusing than it needs to be.
========================================
exitmap / command.py
========================================
Stem offers a friendlier method of calling commands. Mostly I intended it for the system module's functions, but you might find it useful here too...
https://stem.torproject.org/api/util/system.html#stem.util.system.call
========================================
exitmap / exitselector.py
========================================
> for desc in stem.descriptor.parse_file(open(consensus)):
Why read the consensus file directly? If you have a controller then getting it via tor would be the best option. If not then fetching this directly via the authorities is generally the easiest...
https://stem.torproject.org/api/descriptor/remote.html
> if not "Exit" in desc.flags:
>     continue
Perfectly fine, though stem does offer enums...
  if stem.Flag.EXIT not in desc.flags:
    continue
> for (ip, port) in hosts:
>     if not desc.exit_policy.can_exit_to(ip, port):
>         continue
I don't think this'll actually work. The 'continue' will be for the iteration over hosts.
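One way to express the intended filter is to test all hosts at once, so that a failing host skips the descriptor rather than just the inner loop iteration. The sketch below is an assumption about the intent, not exitmap's code: exits_reaching_all, FakePolicy and FakeDescriptor are hypothetical names, and the fake classes merely mimic the exit_policy.can_exit_to(ip, port) call from the quoted snippet.

```python
def exits_reaching_all(descriptors, hosts):
    """Yield descriptors whose exit policy permits every (ip, port) in hosts."""
    for desc in descriptors:
        # all() is False as soon as one host is rejected, so the
        # descriptor as a whole is skipped.
        if all(desc.exit_policy.can_exit_to(ip, port) for ip, port in hosts):
            yield desc


class FakePolicy:
    """Stand-in for an exit policy that only looks at the port."""

    def __init__(self, allowed_ports):
        self._allowed = set(allowed_ports)

    def can_exit_to(self, address, port):
        return port in self._allowed


class FakeDescriptor:
    """Stand-in for a consensus entry with a nickname and an exit policy."""

    def __init__(self, nickname, allowed_ports):
        self.nickname = nickname
        self.exit_policy = FakePolicy(allowed_ports)
```

With these stand-ins, a relay that allows ports 80 and 443 passes a two-host check on those ports, while a relay that only allows port 80 is dropped.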
========================================
exitmap / scanner.py
========================================
> stem.connection.authenticate_none(torCtrl)
There is no point in doing this unless you *only* want it to work without authentication. If you opt for authenticate() instead then this will also work for cookie auth.
Cheers! -Damian
[1] http://www.python.org/dev/peps/pep-0008/
[2] https://pypi.python.org/pypi/pep8
On Sun, Dec 01, 2013 at 05:09:55PM -0800, Damian Johnson wrote:
> You might want to look into PEP8 [1], Python's de-facto style guide. It's certainly up to you which bits you do/don't like, but coming close will make your code more uniform with the rest of the Python world. PyPI has a slick pep8 script you can run over your codebase [2]. Personally I run this as part of the tests for Stem.
Thanks for all your remarks. That's very helpful feedback!
> Would you be amenable to changes in that regard from me? This looks like a fun project, so I'm a bit tempted to sink a weekend or two into seeing if I can simplify the codebase.
Sure, absolutely. You should wait a couple of days before doing that, though. After Lunar's suggestions, I'm significantly restructuring the code.
Cheers, Philipp
On Wednesday 09 October 2013 23:44:18 Philipp Winter wrote:
> I am working on a Python-based exit relay scanner which should detect malicious and misbehaving exits. The design should have a reasonable balance between being fast/parallel and stressing the network as little as possible.
> I came up with the following three steps:
> 1. Spawn a "parent" Tor process to get an up-to-date consensus.
> 2.1 For every selected exit relay, spawn a lightweight Tor process.
> 2.2 The consensus is copied from the "parent" process to the lightweight process' data directory. That way, the consensus has to be downloaded only once.
> 2.3 Every lightweight Tor process has the following configuration:
> ---
> SOCKSPort auto
> ControlPort 0
> __DisablePredictedCircuits 1
> UseEntryGuards 0
> FetchServerDescriptors 0
> DataDirectory <data_directory>
> ExitNodes <exit_relay>
> ---
> Entry guards are not used to distribute the load. Predicted circuits are disabled to prevent expensive creation of circuits which would not be used anyway. In addition, I am considering adding "EntryNodes" or "Bridge" to concentrate the first hop's load on machines under my control.
> 3. torsocks is then used to establish decoy connections over the respective exit relay. After that, the process is terminated.
> Any thoughts on how to further improve the design or ideas for a better one?
There is no need to spawn multiple Tor processes if you do circuit building and stream handling on your own.
Best, Robert
Why not just use OONI and a single Tor instance to do so?
I expect it will take much less effort, and you will be able to leverage existing code and existing knowledge within the Tor Project.
Fabio
On 10/9/13 11:44 PM, Philipp Winter wrote:
> I am working on a Python-based exit relay scanner which should detect malicious and misbehaving exits. The design should have a reasonable balance between being fast/parallel and stressing the network as little as possible.