On Fri, May 31, 2013 at 8:30 PM, Jacob Appelbaum <jacob@appelbaum.net> wrote:
Greetings from India,

So I've been testing networks in Bangalore and I've noticed a few odd
quirks with using a test deck.

Here is my ooniprobe.conf:

 % cat ooniprobe.conf
# This is the configuration file for OONIProbe
# This file follows the YAML markup format: http://yaml.org/spec/1.2/spec.html
# Keep in mind that indentation matters.

basic:
    # Where OONIProbe should be writing its log file
    logfile: ooniprobe-bangalore.log
privacy:
    # Should we include the IP address of the probe in the report?
    includeip: true
    # Should we include the ASN of the probe in the report?
    includeasn: true
    # Should we include the country as reported by GeoIP in the report?
    includecountry: true
    # Should we include the city as reported by GeoIP in the report?
    includecity: true
    # Should we collect a full packet capture on the client?
    includepcap: false
reports:
    # This is a packet capture file (.pcap) to load as a test:
    pcap: Null
advanced:
    # XXX change this to point to the directory where you have stored the GeoIP
    # database file. This should be the directory in which OONI is installed
    # /path/to/ooni-probe/data/
    #geoip_data_dir: /usr/share/GeoIP/
    geoip_data_dir: /home/a/ooni-probe/data/
    debug: true
    # tor_binary: '/usr/sbin/tor'
    # For auto detection
    interface: auto
    # Or specify a specific interface
    #interface: wlan0
    # If you do not specify start_tor, you will have to have Tor running and
    # explicitly set the control port and SOCKS port
    start_tor: true
    # After how many seconds we should give up on a particular measurement
    measurement_timeout: 30
    # After how many retries we should give up on a measurement
    measurement_retries: 2
    # How many measurements to perform concurrently
    measurement_concurrency: 10
    # After how many seconds we should give up reporting
    reporting_timeout: 30
    # After how many retries to give up on reporting
    reporting_retries: 6
    # How many reports to perform concurrently
    reporting_concurrency: 10
tor:
    socks_port: 9250
    control_port: 9251
    # Specify the absolute path to the Tor bridges to use for testing
    #bridges: bridges.list
    # Specify path of the tor datadirectory.
    # This should be set to something to avoid having Tor download each time
    # the descriptors and consensus data.
    data_dir: ~/.tor/


Here is the test deck:

 % cat decks/india-full.deck
- options:
    collector: null
    help: 0
    logfile: null
    pcapfile: null
    reportfile: null
    subargs: [-t, '192.168.1.1', -f, 'inputs/india-uniq-hosts-with-alexa-top-1000.txt']
    test_file: nettests/blocking/dnsconsistency.py
- options:
    collector: httpo://nkvphnp3p6agi5qq.onion
    help: 0
    logfile: null
    pcapfile: null
    reportfile: null
    subargs: [-b, 'http://93.95.227.200']
    test_file: nettests/manipulation/http_header_field_manipulation.py
- options:
    collector: httpo://nkvphnp3p6agi5qq.onion
    help: 0
    logfile: null
    pcapfile: null
    reportfile: null
    subargs: [-b, 'http://93.95.227.200']
    test_file: nettests/manipulation/http_invalid_request_line.py
- options:
    collector: httpo://nkvphnp3p6agi5qq.onion
    help: 0
    logfile: null
    pcapfile: null
    reportfile: null
    subargs: [-b, 'http://93.95.227.200', -f, 'inputs/india-uniq-urls-with-alexa-top-1000.txt']
    test_file: nettests/manipulation/http_host.py

A few things happen when I attempt to use this deck.

Tor fails to return my IP:
2013-06-01 00:44:15+0530 [TorControlProtocol,client] [D] 100%: Done
2013-06-01 00:44:15+0530 [TorControlProtocol,client] [D] Building a TorState
2013-06-01 00:44:16+0530 [TorControlProtocol,client] Successfully
bootstrapped Tor
2013-06-01 00:44:16+0530 [TorControlProtocol,client] [D] We now have the
following circuits:
2013-06-01 00:44:16+0530 [TorControlProtocol,client] [D]  * <Circuit 1
BUILT [194.132.32.43 165.225.132.54 46.165.221.166] for GENERAL>
2013-06-01 00:44:16+0530 [TorControlProtocol,client] [D]  * <Circuit 2
EXTENDED [194.132.32.43] for GENERAL>
2013-06-01 00:44:16+0530 [TorControlProtocol,client] [D]  * <Circuit 3
EXTENDED [] for GENERAL>
2013-06-01 00:44:16+0530 [TorControlProtocol,client] [D]  * <Circuit 4
EXTENDED [] for GENERAL>
2013-06-01 00:44:16+0530 [TorControlProtocol,client] [D] Obtained our IP
address from a Tor Relay None
2013-06-01 00:44:16+0530 [TorControlProtocol,client] Unhandled Error
        Traceback (most recent call last):
        Failure: txtorcon.torcontrolprotocol.TorProtocolError: 551
Address unknown

This is a known issue: Tor cannot resolve the probe's IP before any descriptors have been fetched.

2013-06-01 00:44:16+0530 [TorControlProtocol,client] Unable to lookup
the probe IP via Tor.
2013-06-01 00:44:16+0530 [TorControlProtocol,client] [D] Cannot
determine the probe IP address with a traceroute, becase of insufficient
priviledges
2013-06-01 00:44:16+0530 [TorControlProtocol,client] Looking up your IP
address via maxmind

Does the log end here? You should see some noise about a report being created at least, because the file header was written to disk.
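
The fallback order visible in the log above (Tor first, then traceroute, then MaxMind) can be sketched roughly as follows. This is only an illustration of the pattern, not ooniprobe's actual code: lookup_probe_ip and the three via_* functions are hypothetical stand-ins, and the returned address is a documentation address rather than a real probe IP.

```python
def lookup_probe_ip(lookups):
    """Try each (name, callable) pair in order.

    Returns (name, ip, errors): the first method that produced an address,
    the address itself, and a list of (name, reason) for methods that failed.
    """
    errors = []
    for name, lookup in lookups:
        try:
            ip = lookup()
        except Exception as exc:  # one failed method must not abort the chain
            errors.append((name, repr(exc)))
            continue
        if ip:  # treat None or '' as "no answer"
            return name, ip, errors
        errors.append((name, "no address returned"))
    return None, None, errors


# Hypothetical stand-ins for the three methods seen in the log.
def via_tor():
    raise RuntimeError("551 Address unknown")


def via_traceroute():
    raise PermissionError("insufficient privileges")


def via_geoip_service():
    return "192.0.2.10"  # documentation address, not a real probe IP


method, ip, errors = lookup_probe_ip([
    ("tor", via_tor),
    ("traceroute", via_traceroute),
    ("maxmind", via_geoip_service),
])
print(method, ip)
```

Collecting the errors instead of raising keeps the reason for each failed lookup available for the debug log.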

Then things get a little strange. First, http_host.py is never executed.
Second, http_header_field_manipulation.py runs and the log file shows
everything, but the yamloo file shows only this:

% cat report-http_header_field_manipulation-2013-05-31T191417Z.yamloo
###########################################
# OONI Probe Report for http_header_field_manipulation (0.1.3)
# Sat Jun  1 00:57:40 2013
###########################################
---
options: [-b, 'http://93.95.227.200']
probe_asn: AS24560
probe_cc: IN
probe_ip: 122.167.211.176
software_name: ooniprobe
software_version: 0.0.11
start_time: 1370027657.776991
test_name: http_header_field_manipulation
test_version: 0.1.3
...

The debug log shows the headers being sent and the data being returned
with an issue at the collector:
2013-06-01 00:57:40+0530 [SOCKS5Client,client] Creating report with
OONIB Reporter. Please be patient.
2013-06-01 00:57:40+0530 [SOCKS5Client,client] This may take up to 1-2
minutes...
2013-06-01 00:57:40+0530 [SOCKS5Client,client] [D] Successfully
performed report <ooni.tasks.ReportEntry object at 0x588c190>
2013-06-01 00:57:40+0530 [SOCKS5Client,client] [D] None
2013-06-01 00:57:40+0530 [Uninitialized] [!] Failed to connect to
reporter backend
2013-06-01 00:57:40+0530 [Uninitialized] Traceback (most recent call last):
2013-06-01 00:57:40+0530 [Uninitialized]   File
"/home/io/Documents/backup/git/tor/ooni-probe/ooni/reporter.py", line
323, in createReport
2013-06-01 00:57:40+0530 [Uninitialized]     bodyProducer=bodyProducer)
2013-06-01 00:57:40+0530 [Uninitialized] ConnectError: An error occurred
while connecting: [Failure instance: Traceback (failure with no frames):
<class 'twisted.internet.error.ConnectionLost'>: Connection to the other
side was lost in a non-clean fashion: Connection lost.
2013-06-01 00:57:40+0530 [Uninitialized] ].
2013-06-01 00:57:40+0530 [Uninitialized] [!] Failed to open
<ooni.reporter.OONIBReporter object at 0x461d710> reporter, giving up...
2013-06-01 00:57:40+0530 [Uninitialized] [!] Reporter
<ooni.reporter.OONIBReporter object at 0x461d710> failed, removing from
report...
2013-06-01 00:57:40+0530 [Uninitialized] [D] Starting this task
<generator object generateMeasurements at 0x51906e0>
2013-06-01 00:57:40+0530 [Uninitialized] [D] Running <class
'nettests.manipulation.http_header_field_manipulation.HTTPHeaderFieldManipulation'>
test_put
2013-06-01 00:57:40+0530 [Uninitialized] [D] Finished test setup
2013-06-01 00:57:40+0530 [Uninitialized] [D] Performing request
http://93.95.227.200 PUT {'Accept-Language': ['en-US,en;q=0.8'],
'Accept-Encoding': ['gzip,deflate,sdch'], 'Accept':
['text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'],
'User-Agent': ['Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US;
rv:1.9.2) Gecko/20100115 Firefox/3.6'], 'Accept-Charset':
['ISO-8859-1,utf-8;q=0.7,*;q=0.3'], 'Host': ['XAxlpMzUMfI5Vvi.com']}
2013-06-01 00:57:40+0530 [Uninitialized] [D] Running <class
'nettests.manipulation.http_header_field_manipulation.HTTPHeaderFieldManipulation'>
test_get_random_capitalization
2013-06-01 00:57:40+0530 [Uninitialized] [D] Finished test setup
2013-06-01 00:57:40+0530 [Uninitialized] [D] Performing request
http://93.95.227.200 gET {'accePt-lanGuAGe': ['en-US,en;q=0.8'],
'accEpT-eNcoDING': ['gzip,deflate,sdch'], 'ACCepT':
['text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'],
'USeR-aGEnT': ['Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US;
rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7'], 'aCcEPt-chaRseT':
['ISO-8859-1,utf-8;q=0.7,*;q=0.3'], 'HoSt': ['l5tHomKVddWW1A4.com']}
2013-06-01 00:57:40+0530 [Uninitialized] [D] Running <class
'nettests.manipulation.http_header_field_manipulation.HTTPHeaderFieldManipulation'>
test_post_random_capitalization
2013-06-01 00:57:40+0530 [Uninitialized] [D] Finished test setup

In the end, I didn't have any yamloo files from the
nettests/manipulation/http_invalid_request_line.py test. Only three
files were updated and contained any data:

  report-dns_consistency-2013-05-31T191417Z.yamloo
  report-http_header_field_manipulation-2013-05-31T191417Z.yamloo
  ooniprobe-bangalore.log


I expected a few different things. One is that each test in the deck
should produce a yamloo file. If the reporting back end takes the
report, I suppose I might find it alright to not have the file, but in
the event of a failure, I really hope the data will be logged to a local
.yamloo file.

The data should always be logged to a local yamloo file. If the test fails to run, it won't write anything other than the report header (the header is written before the test is started).
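
A quick way to tell a header-only report from one that actually holds measurements is to count the YAML documents in it. The sketch below assumes that every entry after the header is a separate document introduced by a line containing only "---", as the header in the pasted report is; count_report_entries is a hypothetical helper, not part of ooniprobe.

```python
def count_report_entries(report_text):
    """Return how many YAML documents follow the header document."""
    # Each document starts with a line that is exactly '---'.
    documents = sum(
        1 for line in report_text.splitlines() if line.rstrip() == "---"
    )
    # The first document is the report header, so it is not an entry.
    return max(documents - 1, 0)


# Shortened copy of the header-only report shown earlier in this thread.
HEADER_ONLY = """\
###########################################
# OONI Probe Report for http_header_field_manipulation (0.1.3)
# Sat Jun  1 00:57:40 2013
###########################################
---
options: [-b, 'http://93.95.227.200']
probe_cc: IN
test_name: http_header_field_manipulation
test_version: 0.1.3
...
"""

print(count_report_entries(HEADER_ONLY))  # prints 0
```

A result of 0 means the test wrote its header and nothing else, which matches the failure described above.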

When I run the following deck:

 % cat decks/india.deck
- options:
    collector: httpo://nkvphnp3p6agi5qq.onion
    help: 0
    logfile: http_host_india_bangalore_justa_hotel.log
    pcapfile: null
    reportfile: http_host_india_cis.yamloo
    subargs: [-b, 'http://93.95.227.200', -f, 'inputs/india-uniq-urls-with-alexa-top-1000.txt']
    test_file: nettests/manipulation/http_host.py

I have the proper output for http_host.py:

 % head report-http_host-2013-05-31T193306Z.yamloo
###########################################
# OONI Probe Report for http_host (0.2.3)
# Sat Jun  1 01:03:06 2013
###########################################
---
options: [-b, 'http://93.95.227.200', -f,
inputs/india-uniq-urls-with-alexa-top-1000.txt]
probe_asn: AS24560
probe_cc: IN
probe_ip: 122.167.211.176
software_name: ooniprobe

% tail report-http_host-2013-05-31T193306Z.yamloo
    url: http://93.95.227.200
  response:
    body: '{"headers_dict": {"Connection": ["close"], "Host":
["zustmovies.com"]},
      "request_line": "\nGET / HTTP/1.1", "request_headers":
[["Connection", "close"],
      ["Host", "zustmovies.com"]]}'
    code: 200
    headers: []
socksproxy: null
transparent_http_proxy: false
...

Note that the yamloo file is created not as
http_host_india_cis.yamloo (the reportfile set in the deck) but as
report-http_host-2013-05-31T193306Z.yamloo...

This is a bug. I opened an issue at: https://github.com/TheTorProject/ooni-probe/issues/123

It seems that perhaps test decks are too experimental for actual use
with these issues - or did I do something horribly wrong?

They do need better testing. Another painful failure I discovered is that if a test fails explosively, the remainder of the deck will not be run. I worked around this issue with a janky shell script and just commented out the tests that had already run.
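
The same workaround can be sketched in Python: run every test as its own process, so a crash in one cannot take down the rest. This is only a sketch. run_entries and failed are hypothetical helpers, and the demo commands are stand-ins that simply exit with a given status; replace them with whatever ooniprobe invocation you normally use for a single test.

```python
import subprocess
import sys


def run_entries(commands):
    """Run each command in its own process and record its exit status."""
    results = []
    for cmd in commands:
        try:
            returncode = subprocess.call(cmd)
        except OSError:  # e.g. the executable was not found
            returncode = None
        results.append((cmd, returncode))
    return results


def failed(results):
    """Return the commands that did not exit cleanly."""
    return [cmd for cmd, returncode in results if returncode != 0]


# Stand-in commands: the second one "fails explosively" with status 3.
demo = [
    [sys.executable, "-c", "raise SystemExit(0)"],
    [sys.executable, "-c", "raise SystemExit(3)"],
    [sys.executable, "-c", "raise SystemExit(0)"],
]
results = run_entries(demo)
print([returncode for _, returncode in results])  # prints [0, 3, 0]
```

The list returned by failed(results) tells you which tests to rerun, which replaces commenting out the ones that already ran.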


Thoughts?

We had some issues with the collector being hammered to the point that it ran out of file descriptors. In general, if you know you will be testing from remote areas with poor connectivity and without much up-front notice, it would be helpful to do one of the following:

1. Set up and run a new collector on a spare machine or Amazon instance for your tests.
2. Ask someone in advance to set up a backup collector.
3. Familiarize yourself with oonib operation and troubleshooting.

Sadly, things are still a little fragile, but if you know that your tests, input lists, and collectors all run cleanly before heading into the field, you alleviate a lot of stress.
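
Part of that pre-flight check can be automated. The sketch below only verifies that the test files and input lists named in a deck exist on disk; it does not check the collector. It assumes the deck has already been loaded into Python data with the same shape as the decks in this thread, and that "-f" in subargs is followed by the path to an input list. missing_files is a hypothetical helper, not part of ooniprobe.

```python
import os


def missing_files(entries, base_dir="."):
    """Return paths referenced by deck entries that do not exist under base_dir."""
    missing = []
    for entry in entries:
        options = entry["options"]
        paths = [options["test_file"]]
        subargs = options.get("subargs") or []
        # Assumption: "-f" is followed by the path to an input list.
        for flag, value in zip(subargs, subargs[1:]):
            if flag == "-f":
                paths.append(value)
        for path in paths:
            if not os.path.isfile(os.path.join(base_dir, path)):
                missing.append(path)
    return missing


# One entry copied from decks/india.deck, written out as Python data.
DECK = [
    {
        "options": {
            "collector": "httpo://nkvphnp3p6agi5qq.onion",
            "subargs": ["-b", "http://93.95.227.200", "-f",
                        "inputs/india-uniq-urls-with-alexa-top-1000.txt"],
            "test_file": "nettests/manipulation/http_host.py",
        }
    }
]

# Run from the ooni-probe checkout; an empty list means nothing is missing.
print(missing_files(DECK))
```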

p.s. iirc you do have access to the tpo collector; is that still the case?

--Aaron


All the best,
Jacob
_______________________________________________
ooni-dev mailing list
ooni-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/ooni-dev