Hi all,
We have been discussing lately some changes that we would like to make to the ooni-probe test deck format [1] and would like to have some feedback on what we have come up with.
For those of you not familiar with ooni-probe, a test deck is basically a way of telling it "Run this list of OONI tests with these inputs and by the way be sure you also set these options properly when doing so".
The previous deck format involved a lot of boilerplate and did not make it possible to bake inputs into the deck itself. This new format is supposed to overcome some of the limitations of the old design and we hope that a major redesign will not be needed in the near future.
The specification can be found in git [2], but to facilitate commenting I am also going to paste it into this email. All feedback is greatly appreciated.
-- BEGIN SPEC --
# Test deck specification
* version: 0.1.0
* date: 2014-08-13
* author: Alejandro López (kudrom), Arturo Filastò
# 0. Terminology
Analyst: The person who writes a deck.
Collector: A machine running the ooni-backend that collects the reports generated by the execution of ooni-probe.
Nettest: A test whose execution by ooniprobe creates a report sent later to the collector if provided.
Ooni: Open observatory of network interference.
Ooni-probe: The client side of ooni used to execute a set of nettests.
Tester: The person who executes ooni-probe.
YAML: A human-readable data serialization format.
# 1. Rationale
To ease the execution of nettests, the ooni developers came up with the idea of a container that would allow a tester to easily execute a set of configured nettests. That container was called a deck.
This way, an analyst interested in a particular behaviour would write a deck; then, she would distribute that deck to every tester interested in the ongoing analysis. Finally, the tester would execute ooni-probe with that deck to properly create and send the nettests' reports.
Unfortunately, right now neither the deck format nor ooni-probe allows that level of automation. We want to solve this by writing a new spec for the deck format that lets us implement this functionality in ooni-probe with confidence. This document is that spec.
# 2. Goals
Allow an analyst to reuse well written and tested nettests in an easy way.
Allow a tester to easily execute a battery of nettests with a complex configuration.
Protect the privacy of the tester.
# 3. The data format
Every deck is a YAML file composed of two major sections: the header and the body.
The header is a dictionary that provides all the shared and global configuration of every nettest included in the deck. Its main purpose is to reduce boilerplate by letting the analyst express common behaviour in one section instead of in every nettest execution. The ooni-probe options allowed in this section are:
1. collector: Address of the collector of test results.
2. bouncer: Address of the bouncer for test helpers.
3. annotations: Annotate the report with a key:value[, key:value] format.
4. no-collector: Disable the collector. (FLAG)
5. no-geoip: Disable the geoip support. (FLAG)
Every option in the header section is a dictionary key, except those labeled FLAG, which are members of a list called flags. For example, the following is a valid header section:
```
header:
  collector: 'http://localhost'
  annotations:
    key1: value1
    key2: value2
  flags:
    - no-collector
```
The deck header can also contain metadata associated to the test deck. The possible fields are:
1. name: The name of the test deck.
2. description: A short description of the test deck.
3. author: The author of the test deck.
4. version: A version number for the test deck.
5. requires-root: A flag to indicate that the test deck requires root to run. (FLAG)
6. requires-tor: A flag to indicate that the test deck requires tor to run. (FLAG)
The header may also be omitted.
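The header rules above can be sketched in Python as follows. This is only an illustrative sketch, not ooni-probe's actual code: `validate_header` and the constant names are hypothetical, the header is assumed to be already loaded from YAML into a dict, and rejecting unknown keys with an exception is an assumption the spec does not state.

```python
# Hypothetical sketch: these names are illustrative, not part of ooni-probe.
HEADER_OPTIONS = {"collector", "bouncer", "annotations"}
HEADER_METADATA = {"name", "description", "author", "version"}
HEADER_FLAGS = {"no-collector", "no-geoip", "requires-root", "requires-tor"}


def validate_header(header):
    """Return (options, flags) for a parsed deck header, or raise ValueError."""
    if header is None:
        # The header may be omitted entirely.
        return {}, set()
    options = {}
    flags = set()
    for key, value in header.items():
        if key == "flags":
            unknown = set(value) - HEADER_FLAGS
            if unknown:
                raise ValueError("unknown flags: %s" % sorted(unknown))
            flags = set(value)
        elif key in HEADER_OPTIONS or key in HEADER_METADATA:
            options[key] = value
        else:
            # Assumption: keys outside the allowed lists are rejected.
            raise ValueError("option not allowed in header: %s" % key)
    return options, flags
```

With the example header above, this would return the collector and annotations as options and `no-collector` as the only flag.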
The body is a list composed of one element per nettest execution. Every nettest execution is a dictionary composed of the following three keys:
1. nettest: name of the nettest to execute (MANDATORY)
2. local_options: local options of the test execution (OPTIONAL)
3. global_options: global options of the test execution (OPTIONAL)
As with the header, any option that takes no arguments can be expressed as a member of a flags list. A valid body can be:
```
body:
  - nettest: manipulation/http_request
    local_options:
      url: 'http://torproject.org'
    global_options:
      collector: 'http://localhost'
      flags:
        - no-geoip
  - nettest: manipulation/captiveportal
```
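The structure of a body entry described above could be checked along these lines. This is a hedged sketch: `validate_body` is a hypothetical helper, the body is assumed to be already parsed from YAML into a list of dicts, and treating unknown keys as an error is an assumption.

```python
# Hypothetical sketch: not part of ooni-probe.
NETTEST_KEYS = {"nettest", "local_options", "global_options"}


def validate_body(body):
    """Return the list of nettest names, or raise ValueError on a bad entry."""
    names = []
    for index, entry in enumerate(body):
        if "nettest" not in entry:
            raise ValueError("entry %d lacks the mandatory 'nettest' key" % index)
        unknown = set(entry) - NETTEST_KEYS
        if unknown:
            # Assumption: keys other than the three listed are rejected.
            raise ValueError("entry %d has unknown keys: %s" % (index, sorted(unknown)))
        names.append(entry["nettest"])
    return names
```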
All file paths must be relative and they must start with "deck/" if they are referring to files contained inside of the test deck. They must start with "http(o|s)://" if they are referring to files to be downloaded via Tor.
All other file paths must be rejected by raising an exception. This is because we do not want an analyst creating a deck to be able to pass arbitrary files on the tester's filesystem as arguments to a test.
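A minimal sketch of the path rules above, in Python. The function name `check_deck_path` is hypothetical, and rejecting ".." components that escape the "deck/" directory is an extra assumption that the spec does not spell out.

```python
import posixpath

# "http(o|s)://" in the spec expands to these two prefixes.
ALLOWED_URL_PREFIXES = ("httpo://", "https://")


def check_deck_path(path):
    """Return path if the deck rules allow it, otherwise raise ValueError."""
    if path.startswith(ALLOWED_URL_PREFIXES):
        return path
    if path.startswith("deck/"):
        # Assumption: a path such as "deck/../etc/passwd" must not escape
        # the deck directory, so it is normalized before being accepted.
        if posixpath.normpath(path).startswith("deck/"):
            return path
    raise ValueError("file path not allowed in a deck: %r" % path)
```

Under these assumptions `check_deck_path("deck/input-filename-1.txt")` is accepted, while `/etc/passwd`, `input.txt` and plain `http://` URLs raise an exception.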
# 3.1 Container format
The proposed container is tar+gzip because it is well supported in Python. The deck container will be composed of a directory named "deck" containing the deck file and the inputs.
The directory layout will be:
deck/test.deck
deck/input-filename-1.txt
deck/input-filename-2.txt
This will then be compressed using tar+gzip.
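As an illustration of the layout above, the container could be built with Python's standard `tarfile` module. This is a sketch under assumptions: the helper names are hypothetical, and the deck file is always stored as "deck/test.deck", following the example layout.

```python
import os
import tarfile


def build_deck_container(deck_file, input_files, output_path):
    """Pack a deck file and its inputs into a gzip-compressed tarball."""
    with tarfile.open(output_path, "w:gz") as tar:
        # Everything lives under a top-level "deck" directory.
        tar.add(deck_file, arcname="deck/test.deck")
        for input_file in input_files:
            tar.add(input_file, arcname="deck/" + os.path.basename(input_file))
    return output_path


def list_deck_container(container_path):
    """Return the sorted member names of a deck container."""
    with tarfile.open(container_path, "r:gz") as tar:
        return sorted(tar.getnames())
```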
# 4. Implementation details
## 4.1 Introduction
Each execution of a nettest in ooniprobe needs four main inputs.
1. The global config file
2. The global cmd options
3. The nettest/deck
4. The local options for each nettest
The first three are mandatory to run a nettest; the last one depends on the nettest.
When a single nettest is executed, all options except the first one are passed in the cmd line.
The difference between the global config options and the global cmd options is that the latter share some options with the config file and add some subcommands to ooniprobe.
When a deck is involved in the execution, the last three inputs are meant to be passed (at least partially) in the deck. In fact, the deck is no more than the last three inputs, with some subtleties that are explained in the following sections.
## 4.2 Who overwrites who
In a single-nettest execution, the global config is parsed and a global object is built. Then the cmd line is parsed and it overwrites the global configs. This is done to allow the setting of the most dynamic options in the console without the need of writing them each time to the config file.
The whole ooni-probe code base, including the nettests, is allowed to access that global object. So the global options affect the way ooni-probe behaves, but they can also affect the way the executing nettest behaves. That is why the deck must be able to specify some global options, but not all of them: some options are powerful enough to put the tester in danger. The allowed options are listed in section [3. The data format].
In a deck-nettest execution, the header section of the deck is parsed first and a global object is built. Then the config file is parsed and the object is overwritten when it applies. Finally the cmd line of the ooni-probe is parsed and the object is overwritten.
We read the header section first to prevent the analyst from overwriting sensitive options of the config file, which should only be modified by the tester.
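The precedence order for a deck-nettest execution can be sketched as a simple merge in which later sources overwrite earlier ones. The function name is hypothetical, and each source is assumed to be a flat dict holding only the options that were explicitly set.

```python
def build_global_options(deck_header, config_file, cmd_line):
    """Merge the three option sources; later ones overwrite earlier ones."""
    options = {}
    # Order matters: deck header first, then config file, then cmd line,
    # so the tester's settings always win over the analyst's.
    for source in (deck_header, config_file, cmd_line):
        options.update(source)
    return options
```

For example, if the deck header sets a collector and the tester's config file sets a different one, the tester's value is the one that ends up in the global object.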
## 4.3 Copy-on-write
To reduce the boilerplate in the deck, the file is split into a shared section for all nettests and a local one for each nettest.
The idea is to allow the writer of the deck to express common and global behaviour of all the nettests in the header of the deck and to put specific and local options in each nettest element, as was already explained in [3. The data format].
What this means for the execution of the nettests is that when a nettest overwrites a global option from the header, the change is visible only to that nettest, not to every other nettest that may execute in the same instance of ooni-probe. It follows that every change to the global options of ooni-probe made in the local section of a deck should follow a copy-on-write policy with respect to the global config object.
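One way to get this isolation in Python is an overlay built with `collections.ChainMap`, which is not a literal copy-on-write implementation but gives the same visible behaviour: reads fall through to the shared options, and writes land in a per-nettest layer. The function name is hypothetical and the nettest entry is assumed to be a parsed dict.

```python
from collections import ChainMap


def options_for_nettest(global_options, nettest_entry):
    """Return a per-nettest view of the options that never mutates the globals."""
    # Copy the overrides so the deck data itself is not modified either.
    overrides = dict(nettest_entry.get("global_options", {}))
    # Lookups check overrides first, then fall back to global_options;
    # assignments on the ChainMap only touch the first mapping (overrides).
    return ChainMap(overrides, global_options)
```

A nettest that overrides the collector sees its own value, while the next nettest in the deck still sees the shared one.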
## 4.4 The input file
Ooni-probe invokes every nettest method with the information saved in a file called the input file. This file is part of the local configuration of every test, and therefore must be provided with the deck.
So the deck should be a compressed container that includes both the deck file and every input file needed by the nettests in the deck. Otherwise, the analyst would have to send the input files to the tester separately, which is unacceptable.
The container proposed is tar+gzip because it's well supported in python.
# 5. Example deck
The complete.deck provided with each installation of ooni-probe would be:
```
header:
  name: Complete
  description: Runs all the existing ooniprobe tests
  author: 'OONI ooni-dev@lists.torproject.org'
  version: 0.1.0
  flags:
    - requires-root
    - requires-tor
body:
  - nettest: blocking/http_request
    local_options:
      input_file: 'httpo://ihiderha53f36lsd.onion/input/37e60e13536f6afe47a830bfb6b371b5cf65da66d7ad65137344679b24fdccd1'
  - nettest: blocking/dns_consistency
    local_options:
      input_file: 'httpo://ihiderha53f36lsd.onion/input/37e60e13536f6afe47a830bfb6b371b5cf65da66d7ad65137344679b24fdccd1'
  - nettest: manipulation/http_invalid_request_line
  - nettest: manipulation/http_header_field_manipulation
  - nettest: manipulation/traceroute
  - nettest: blocking/http_host
    local_options:
      input_file: 'httpo://ihiderha53f36lsd.onion/input/37e60e13536f6afe47a830bfb6b371b5cf65da66d7ad65137344679b24fdccd1'
```
-- END SPEC --
~ Art.
[1] https://trac.torproject.org/projects/tor/ticket/12823
[2] https://gitweb.torproject.org/ooni/spec.git/blob_plain/HEAD:/test-decks/td-s...