Hi Tyler,
Thanks for your email!
On Jan 14, 2016, at 04:47, Tyler Fisher apt.get.apps@gmail.com wrote:
Hello,
I am working on normalisation for all of the DNS based tests right now (i.e. dns_consistency, and dns_injection) and was wondering if any of you had any suggestions with regards to how we should be normalising these results.
So far, this is what I have come up with:
{'data_format_version': None,
 'input': 'www.ignored.ch',
 'options': ['-f', 'citizenlab-urls-global.txt', '-T', 'dns-server-ch.txt'],
 'probe_asn': 'AS41715',
 'probe_cc': 'CH',
 'probe_ip': '127.0.0.1',
 'report_filename': 's3://ooni-private/reports-raw/yaml/2016-01-01/dns_consistency-2015-12-31T220031Z-AS41715-probe.yamloo',
 'report_id': 'bWEWmX6oEftSSJq9yEF5oH0VPOU5VZJooX06gQENo136sSoj9MzlTBk7EjhfH1Td',
 'software_name': 'ooniprobe',
 'software_version': '1.3.2',
 'test_helpers': {'backend': '213.138.109.232:57004'},
 'test_keys': {'annotations': None,
               'backend_version': '1.1.4',
               'control_resolver': '213.138.109.232:57004',
               'errors': {'130.60.128.3': 'dns_lookup_error',
                          '130.60.128.5': 'dns_lookup_error',
                          '194.158.230.53': False,
                          '194.230.1.5': False,
                          '82.195.224.5': 'no_answer'},
               'failed': {'130.60.128.3', '130.60.128.5', '82.195.224.5'},
               'input_hashes': ['3f786850e387550fdab836ed7e6dc881de23001b'],
               'queries': [{'failure': None,
                            'hostname': 'www.ignored.ch',
                            'query_type': 'A',
                            'resolver_hostname': '213.138.109.232',
                            'resolver_port': 57004},
                           {'failure': None,
                            'hostname': 'www.ignored.ch',
                            'query_type': 'A',
                            'resolver_hostname': '212.147.10.10',
                            'resolver_port': 53}],
               'successful': {'194.158.230.53',
                              '194.230.1.5',
                              '195.186.1.111',
                              '81.221.252.10'}},
 'test_name': 'dns_consistency',
 'test_runtime': 32.54842686653137,
 'test_start_time': 1451605073.0,
 'test_version': '0.6'}
After looking into the source code for the DNS consistency test and the dnst template, I was able to determine the subject of the DNS query. However, I am not sure how to handle the addrs section, which changes depending on whether the associated DNS query has a type of A/SOA/NS (see: https://github.com/TheTorProject/ooni-probe/blob/master/ooni/templates/dnst.py#L153).
If you have any suggestions with regards to how to normalise dnst results, I've linked to the raw, and normalised reports below.
Gist: https://gist.github.com/TylerJFisher/7372f9c31c54b5207d2a
Normalisation routine: https://gist.github.com/TylerJFisher/7372f9c31c54b5207d2a#file-normalise-py
I think the way you have normalised the dns_consistency test is much better, and I think we should eventually integrate this data format directly into the ooni-probe tests themselves so that we don’t have to run any further error-prone normalisation on future reports.
I am a bit torn as to how to resolve the addrs key issue: on the one hand I like the idea of not having to dig too much into the answers array to extract the stuff I am interested in, but on the other hand it’s probably best to have things be as consistent as possible. I think the best option is probably to just merge the “addrs” and “answers” into one list and make the items of the list change depending on the type of query (there is no cleaner way around this, since the RDATA field in DNS is made this way).
I would say every item in the answers list has a “ttl” key; the rest depends on the type of query, like so:
* A = “answers”: [{“ipv4”: “xxx.xxx.xxx.xxx”}, {“ipv4”: “xxx.xxx.xxx.xxx”}]
* PTR, NS = “answers”: [{“hostname”: “xxx.yyy”}, {“hostname”: “xxx.yyy”}]
* MX = “answers”: [{“preference”: int, “hostname”: “xxx.yyy”}, {“preference”: int, “hostname”: “xxx.yyy”}]
* SOA = “answers”: [{“serial_number”: int, “refresh_interval”: int, “retry_interval”: int, “expiration_limit”: int, “minimum_ttl”: int, “hostname”: “xxx.yyy”, “responsible_name”: “xxx.yyy.zzz”}, …]
Note: For SOA queries we currently don’t collect all of the above-mentioned data in ooni-probe, but since we are going to change the data format anyway, we may as well change it in a way that is future-proof.
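To make the proposal a bit more concrete, here is a rough Python sketch of how the merged answers list could be built. The function name normalise_answers, the ANSWER_FIELDS table and the (ttl, rdata) input shape are all made up for illustration; none of this is existing ooni-probe code, and only the key names come from the list above.

```python
# Hypothetical sketch: the key names follow the proposal above, everything
# else (names, input shape) is an assumption and not ooni-probe's API.

# Type-specific keys that each answer carries in addition to "ttl".
ANSWER_FIELDS = {
    "A": ("ipv4",),
    "PTR": ("hostname",),
    "NS": ("hostname",),
    "MX": ("preference", "hostname"),
    "SOA": (
        "hostname",
        "responsible_name",
        "serial_number",
        "refresh_interval",
        "retry_interval",
        "expiration_limit",
        "minimum_ttl",
    ),
}


def normalise_answers(query_type, records):
    """Build the merged "answers" list for a single query.

    `records` is an iterable of (ttl, rdata) pairs, where rdata is a dict
    holding whichever type-specific values were actually collected.
    """
    if query_type not in ANSWER_FIELDS:
        raise ValueError("unsupported query type: %r" % (query_type,))

    answers = []
    for ttl, rdata in records:
        # Every answer has a "ttl" key, whatever the query type.
        answer = {"ttl": ttl}
        for name in ANSWER_FIELDS[query_type]:
            # Values that were not collected (e.g. most SOA fields today)
            # become None, so answers of one type always share the same keys.
            answer[name] = rdata.get(name)
        answers.append(answer)
    return answers
```

For example, normalise_answers("MX", [(60, {"preference": 10, "hostname": "mail.example.org"})]) gives a one-item list whose item has the “ttl”, “preference” and “hostname” keys. Filling missing SOA values with None is just one option; leaving those keys out until ooni-probe collects them would work too.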
Do you think this makes sense?
~ Arturo