Arturo,
As we discussed last week, it would be ideal if we could retrieve a file containing the URLs of the report files. That way, drop-off locations can change and scale horizontally without the parser needing prior knowledge of them.
Something like this:
OONI section: probe -> test -> test report -> collector -> generate a list of received reports (for retransmission in case of an intended or unintended failure to push to the publisher) -> scrub data if required for publication -> compress the (scrubbed) report -> push the compressed (scrubbed) report -> publisher (an HTTP server)
-> generate/update the URL list of reports available on the publisher
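For the last step on the publisher side, here is a minimal sketch of how the URL list could be generated. It assumes the publisher serves a local directory at a base URL with the same layout (e.g. the CN/, RU/, US/ subdirectories under https://ooni.torproject.org/reports/0.1/), that compressed reports end in ".gz", and that the list is plain text with one URL per line; the function names and the suffix are hypothetical.

```python
import os
from pathlib import Path
from urllib.parse import quote, urljoin


def build_url_list(root: str, base_url: str, suffix: str = ".gz") -> list[str]:
    """Return a sorted list of URLs for every compressed report under root.

    Assumes `root` is served at `base_url` (which must end with "/") with the
    same directory layout. The ".gz" suffix is a hypothetical naming convention.
    """
    root_path = Path(root)
    urls = []
    for path in root_path.rglob(f"*{suffix}"):
        if path.is_file():
            # Relative path with forward slashes, percent-encoded for use in a URL.
            rel = path.relative_to(root_path).as_posix()
            urls.append(urljoin(base_url, quote(rel)))
    return sorted(urls)


def write_url_list(urls: list[str], list_path: str) -> None:
    """Replace the published URL list (one URL per line) in a single step."""
    tmp_path = list_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        f.write("".join(url + "\n" for url in urls))
    # os.replace is atomic on POSIX (same filesystem), so a parser fetching
    # the list never sees a half-written file.
    os.replace(tmp_path, list_path)
```

Regenerating the whole list after each push keeps the publisher stateless; an append-only list would also work if the number of reports grows large.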
Chokepoint section: parser -> retrieve the URL list from the publisher -> process the URL list -> retrieve the file at each URL -> decompress the file -> process the report file
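On the parser side, the steps up to "process report file" could look roughly like this. This is a sketch under assumptions: the list is plain text with one URL per line, reports are gzip-compressed UTF-8 text, and `iter_reports` and `http_fetch` are hypothetical names. The fetch function is injectable so the pipeline can be tested without network access.

```python
import gzip
import urllib.request
from typing import Callable, Iterator


def parse_url_list(text: str) -> list[str]:
    """Turn the published list into URLs, skipping blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def http_fetch(url: str, timeout: float = 30.0) -> bytes:
    """Download a URL and return the raw response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def iter_reports(list_url: str,
                 fetch: Callable[[str], bytes] = http_fetch) -> Iterator[tuple[str, str]]:
    """Yield (url, decompressed report text) for every report in the URL list.

    Assumes each report is gzip-compressed UTF-8 text.
    """
    url_list = fetch(list_url).decode("utf-8")
    for url in parse_url_list(url_list):
        raw = fetch(url)
        yield url, gzip.decompress(raw).decode("utf-8")
```

A caller would then loop over `iter_reports(...)` and hand each report text to its own processing step; keeping the (url, text) pair makes it easy to record which reports have already been processed.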
The report itself seems fairly clear, but I nevertheless have some comments and questions.
Report metadata: "options: [-f, /home/uwaterloo_geossl/bridge_reachability/bridges.txt, -t, '300']" This should preferably contain only the filename, not the full path; including the path is a potential data leak.
probe_cc: RU. Does the cc refer to the country code of the probe's IP address? How is the cc generated: MaxMind or something else? If the cc is generated with an external geolocation service, that service and the date of generation should preferably be recorded.
probe_ip: 127.0.0.1. This should preferably be removed entirely before publication. Perhaps it should not be in the report at all; the ASN seems sufficient.
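The two scrubbing suggestions above (reduce paths in "options" to the filename, drop "probe_ip") could be applied in the "scrub data" step before publication. A minimal sketch, assuming the report header has already been loaded into a Python dict; `scrub_header` is a hypothetical name, and treating every option that starts with "/" as a path is a heuristic.

```python
import copy
import posixpath
from typing import Any


def scrub_header(header: dict[str, Any]) -> dict[str, Any]:
    """Return a scrubbed copy of a report header; the input is left unchanged.

    Heuristic sketch: absolute POSIX paths in `options` are reduced to their
    file name, and `probe_ip` is removed entirely.
    """
    scrubbed = copy.deepcopy(header)
    scrubbed.pop("probe_ip", None)  # assumes the ASN is sufficient for analysis
    options = scrubbed.get("options")
    if isinstance(options, list):
        scrubbed["options"] = [
            posixpath.basename(opt) if isinstance(opt, str) and opt.startswith("/") else opt
            for opt in options
        ]
    return scrubbed
```

With the options from the report above, the path becomes just bridges.txt while "-f", "-t" and "300" are kept as they are.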
Report content:
Can you supply a list of the possible values or value ranges that can be expected for the following report entries?
input: ....
success: is this true/false?
tor_progress: is this 0-100?
tor_progress_summary: does this describe the stage reached by the previous entry (tor_progress)?
tor_progress_tag: are there values other than 'null' and 'done'?
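Until the value ranges are confirmed, the parser could flag entries that fall outside the guesses made in the questions above. In this sketch every expectation is an assumption taken from those questions (success is a boolean, tor_progress is an integer from 0 to 100, tor_progress_tag is either null or 'done'), meant to be replaced once the real ranges are known; `unexpected_values` is a hypothetical name.

```python
from typing import Any

# Hypothetical expectations, taken from the open questions; to be replaced
# once the actual value ranges are confirmed. A YAML null loads as None.
KNOWN_PROGRESS_TAGS = (None, "done")


def unexpected_values(entry: dict[str, Any]) -> list[str]:
    """Return a description of each field that falls outside the guessed ranges."""
    problems = []
    if not isinstance(entry.get("success"), bool):
        problems.append(f"success: {entry.get('success')!r} is not true/false")
    progress = entry.get("tor_progress")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(progress, bool) or not isinstance(progress, int) or not 0 <= progress <= 100:
        problems.append(f"tor_progress: {progress!r} is not an integer in 0-100")
    tag = entry.get("tor_progress_tag")
    if tag not in KNOWN_PROGRESS_TAGS:
        problems.append(f"tor_progress_tag: unseen value {tag!r}")
    return problems
```

Logging the flagged entries, rather than rejecting them, would show which guesses are wrong without losing data.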
As discussed previously, let's start by processing tests for the publicly known bridges only. We should tackle how to handle the secret bridges at a later stage, with the understanding that under no circumstances should we have access to the actual addresses of those bridges.
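One way to honour that constraint is to drop everything except public bridges on the OONI side, before reports are pushed to the publisher, so the parser never receives secret bridge addresses at all. A minimal sketch, assuming each report entry's `input` field holds the bridge line exactly as it appears in a list of public bridges; both function names are hypothetical.

```python
from typing import Any, Iterable


def load_public_bridges(lines: Iterable[str]) -> set[str]:
    """Build the set of publicly known bridge lines (blank lines ignored)."""
    return {line.strip() for line in lines if line.strip()}


def public_entries(entries: Iterable[dict[str, Any]],
                   public_bridges: set[str]) -> list[dict[str, Any]]:
    """Keep only report entries whose `input` is a publicly known bridge.

    Anything not in the public list (e.g. a secret bridge) is dropped; intended
    to run before publication so those addresses never leave the OONI side.
    """
    return [e for e in entries
            if isinstance(e.get("input"), str) and e["input"].strip() in public_bridges]
```

An allow-list is used rather than a deny-list of secret bridges, since a deny-list would require the filtering code to know the secret addresses.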
- Ruben
On 07/12/2014 04:18 PM, Arturo Filastò wrote:
As promised, I have published the bridge reachability measurements on the public OONI report hosting.
You can find them here:
https://ooni.torproject.org/reports/0.1/CN/
https://ooni.torproject.org/reports/0.1/RU/
https://ooni.torproject.org/reports/0.1/US/
Keep in mind that, as I mentioned, some of the runs had measurement issues caused by incompatibilities between ooniprobe and the old Fedora version running on PlanetLab, so not all the measurements may be 100% accurate. They should still at least give you an idea of what the data format looks like and whether it contains enough information for your parsing work.
I would suggest we keep this discussion public and keep the ooni-dev list in CC.
~ Art.