Hi Arturo,
Currently, for our data processing needs, we have begun to bucket reports by date (each date corresponds to when a given report was submitted to the collector). I would like to know which of the following two options would be most convenient for you when accessing the data.
The options are:
OPTION A: Have 1 JSON stream for every day of measurements (either gzipped or plain)
e.g.:
- https://ooni.torproject.org/reports/json/2016-01-01.json
- https://ooni.torproject.org/reports/json/2016-01-02.json
- https://ooni.torproject.org/reports/json/2016-01-03.json
etc.
OPTION B: Have 1 JSON stream for every ooni-probe test run and publish them inside a directory named after the date on which they were collected
e.g.:
- https://ooni.torproject.org/reports/json/2016-01-01/20160101T204732Z-NL-AS3265-http_requests-v1-probe.json.gz
- https://ooni.torproject.org/reports/json/2016-01-01/20160101T204732Z-US-AS3265-dns_consistency-v1-probe.json.gz
etc.
Since we are internally using the daily batches for processing and analysing reports, we will probably end up going for option A unless there is an explicit request to publish them on a per-test-run basis, so don’t be shy to reply :)
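Under option A, a consumer would read one stream per day. A minimal sketch, assuming each daily file holds one JSON document per line (the stream format is not spelled out above) and that gzipped files end in `.gz`; the function name is hypothetical:

```python
import gzip
import json
from typing import Iterator


def iter_daily_stream(path: str) -> Iterator[dict]:
    """Yield one measurement per line from a daily JSON stream.

    Assumes newline-delimited JSON. Files ending in .gz are decompressed
    transparently; anything else is read as plain text.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as stream:
        for line in stream:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

Finding a single test run this way still means reading through the whole day's file, which is the trade-off raised in the reply that follows.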
I agree with David in that it will be easier to access specific ooni-probe test results using option (B) (i.e. the current solution).
What benefits did you identify when considering switching to option (A)?
A few reasons to stick with option (B) include:
- Retaining the ability to run ooni-pipeline on a subset of reports associated with a given time period by filtering by date prefix and by substrings within key names;
- Retaining the ability to distribute small units of work easily among subprocesses;
- Retaining the idempotent nature of ooni-pipeline and the luigi framework: switching from lots of small files to a single large file for a given day will invariably increase the time required to recover from failures (e.g. if a small dnst-based test fails to normalise, you'll have to renormalise everything as opposed to a single test);
- Developers will not have to download hundreds of megabytes of data in order to access a traceroute test result that is only a few kilobytes in size; and
- It's generally easier to work with small files than with big files.
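Selecting a subset of reports by date prefix and by substrings in key names, as in the first point above, can be sketched as follows. The key layout `YYYY-MM-DD/<timestamp>-<CC>-<ASN>-<test_name>-...` is taken from the option (B) examples; the function and its parameters are hypothetical and not part of ooni-pipeline:

```python
from typing import Iterable, List


def select_report_keys(
    keys: Iterable[str], date_prefix: str, substrings: Iterable[str] = ()
) -> List[str]:
    """Return the keys that start with date_prefix and contain every substring.

    A date_prefix of "2016-01" selects a whole month; "2016-01-01" a single day.
    """
    needles = list(substrings)
    return [
        key
        for key in keys
        if key.startswith(date_prefix) and all(n in key for n in needles)
    ]
```

Each selected key stays a small, independent unit of work, so the result can be handed out to separate subprocesses, which is what the second point relies on.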
Cheers, Tyler
GPG fingerprint: 8931 45DF 609B EE2E BC32 5E71 631E 6FC3 4686 F0EB (tyler@tylerfisher.org)
Hi,
Are we going to have a per-country directory for published reports? It's a useful option to download all reports from a given country.
~Vasilis
On Mar 11, 2016, at 18:34, Vasilis andz@torproject.org wrote:
Hi,
Are we going to have a per-country directory for published reports? It's a useful option to download all reports from a given country.
No, that is not going to be supported in the directory listing.
You will be able to list the reports for a given country from the API or perform the filtering yourself by inspecting the filename.
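Filtering by filename could look like the sketch below. It assumes the timestamp, country code, ASN and test name at the start of each report filename are separated by hyphens, as in the option (B) examples; the field names and helpers are illustrative, not an official OONI API:

```python
import posixpath
from typing import Iterable, List, NamedTuple, Optional


class ReportName(NamedTuple):
    timestamp: str  # e.g. "20160101T204732Z"
    country: str    # two-letter country code, e.g. "NL"
    asn: str        # e.g. "AS3265"
    test_name: str  # e.g. "http_requests" (underscores, no hyphens)
    rest: str       # remaining fields, e.g. "v1-probe.json.gz"


def parse_report_name(path: str) -> Optional[ReportName]:
    """Split a report filename (or URL) into fields; None if it doesn't match."""
    parts = posixpath.basename(path).split("-", 4)
    if len(parts) != 5:
        return None
    return ReportName(*parts)


def reports_for_country(paths: Iterable[str], country: str) -> List[str]:
    """Keep only the paths whose filename carries the given country code."""
    wanted = country.upper()
    selected = []
    for path in paths:
        name = parse_report_name(path)
        if name is not None and name.country == wanted:
            selected.append(path)
    return selected
```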
The directory structure of the new reports can be seen here: https://ooni.torproject.org/reports/
Note: since there is no space left on the torproject.org box hosting these, the HTTP requests tests are not being published there at the moment and the rsync has stopped. Once we set up the web server for hosting the report dumps, /reports will redirect to it.
~ Arturo
On 11/03/16 14:39, Arturo Filastò wrote:
On Mar 11, 2016, at 18:34, Vasilis andz@torproject.org wrote:
Are we going to have a per-country directory for published reports? It's a useful option to download all reports from a given country.
No, that is not going to be supported in the directory listing.
You will be able to list the reports for a given country from the API or perform the filtering yourself by inspecting the filename.
Perhaps we should add a daily per-country dump of all reports, so that users can download all reports from a given country?
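A daily per-country dump could be produced by concatenating one day's report files by country code. A minimal sketch, assuming each report file is gzipped newline-delimited JSON, that the country code is the second hyphen-separated field of the filename, and a hypothetical `<day>-<CC>.json.gz` naming for the output:

```python
import gzip
import os
from collections import defaultdict
from typing import Dict, List


def build_country_dumps(day_dir: str, out_dir: str) -> Dict[str, str]:
    """Concatenate one day's gzipped reports into one gzipped dump per country.

    Hypothetical layout: day_dir is named after the date (e.g. "2016-01-01")
    and holds files like "20160101T204732Z-NL-AS3265-...-probe.json.gz".
    Returns a mapping from country code to the path of its dump.
    """
    # Group report files by country code (second hyphen-separated field).
    by_country: Dict[str, List[str]] = defaultdict(list)
    for name in sorted(os.listdir(day_dir)):
        fields = name.split("-")
        if name.endswith(".json.gz") and len(fields) > 2:
            by_country[fields[1]].append(os.path.join(day_dir, name))

    os.makedirs(out_dir, exist_ok=True)
    day = os.path.basename(os.path.normpath(day_dir))
    dumps: Dict[str, str] = {}
    for country, paths in by_country.items():
        dump_path = os.path.join(out_dir, f"{day}-{country}.json.gz")
        with gzip.open(dump_path, "wb") as dump:
            for path in paths:
                with gzip.open(path, "rb") as report:
                    data = report.read()
                dump.write(data)
                # Keep one JSON document per line even if a report
                # file lacks a trailing newline.
                if data and not data.endswith(b"\n"):
                    dump.write(b"\n")
        dumps[country] = dump_path
    return dumps
```

Since the per-test files stay in place, this would add a convenience view without giving up the per-test-run layout.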
The directory structure of the new reports can be seen here: https://ooni.torproject.org/reports/
Note: since there is no space left on the torproject.org box hosting these, the HTTP requests tests are not being published there at the moment and the rsync has stopped. Once we set up the web server for hosting the report dumps, /reports will redirect to it.
The web server hosting the updated reports is ready and can be found here: http://141.20.103.26:8080
I'm waiting for the DNS A record of 'measurements.ooni.io' to be added so that I can set up a Let's Encrypt certificate.
~Vasilis
On Mar 11, 2016, at 20:15, Vasilis andz@torproject.org wrote:
The web server hosting the updated reports is ready and can be found here: http://141.20.103.26:8080
I'm waiting for the DNS A record of 'measurements.ooni.io' to be added so that I can set up a Let's Encrypt certificate.
$ dig A +short measurements.ooni.io
141.20.103.26
~ A.
Awesome! The measurements server in that form is already very useful. I wouldn't worry too much about getting country-specific views; it seems pretty trivial to crawl the current listings to find all reports for a country of interest.
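Crawling the listings could be sketched with the standard library's HTML parser. This assumes the server returns a plain HTML index page in which each report or subdirectory is an `<a href>` link (the exact markup of the measurements server is not described in the thread). Fetching is left to the caller (e.g. with `urllib.request.urlopen`), so the sketch only extracts and resolves links; the names are hypothetical:

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in an HTML directory listing."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def listing_links(base_url: str, html: str, suffix: str = "") -> List[str]:
    """Return absolute URLs of the links in a listing that end with suffix."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return [urljoin(base_url, h) for h in parser.hrefs if h.endswith(suffix)]
```

Links ending in `/` can be fed back into the same function to walk the whole tree, and the resulting report URLs filtered by the country code in their filenames.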
On Fri, Mar 11, 2016 at 1:56 PM, Arturo Filastò art@torproject.org wrote:
On Mar 11, 2016, at 20:15, Vasilis andz@torproject.org wrote:
The web server hosting the updated reports is ready and can be found here: http://141.20.103.26:8080
I'm waiting for the DNS A record of 'measurements.ooni.io' to be added so that I can set up a Let's Encrypt certificate.
$ dig A +short measurements.ooni.io
141.20.103.26
~ A.
ooni-dev mailing list ooni-dev@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/ooni-dev