On 11/03/16 18:52, Arturo Filastò wrote:
>
>> On Mar 11, 2016, at 20:15, Vasilis <andz@torproject.org> wrote:
>>
>> The web server hosting the updated reports is ready and can be
>> found here: http://141.20.103.26:8080
>>
>> I'm waiting for the DNS A record of 'measurements.ooni.io' to be
>> added so that I can set up a letsencrypt certificate.
>
>
> $ dig A +short measurements.ooni.io
> 141.20.103.26
Thanks!
Waiting for the administrator to update the ACL rules to allow traffic on port 443.
Until then the reports will be served on http://measurements.ooni.io:8080/
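In the meantime the interim setup can be sanity-checked along these lines
(a minimal Python sketch using only the standard library; the hostname,
address and port are the ones mentioned above):

import socket
import urllib.request

# The A record should now point at the report host mentioned above.
print("A record:", socket.gethostbyname("measurements.ooni.io"))  # expect 141.20.103.26

# Until port 443 is opened in the ACL, the reports are only reachable on :8080.
with urllib.request.urlopen("http://measurements.ooni.io:8080/", timeout=10) as resp:
    print("HTTP status on :8080:", resp.status)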
~Vasilis
Hi Arturo,
---
For our data processing needs we have begun to bucket reports by date
(each date corresponds to when a given report was submitted to the
collector). What I would like to know is which of the two following
options would be most convenient for you when accessing the data.
The options are:
OPTION A:
Have 1 JSON stream for every day of measurements (either gzipped or plain)
ex.
- https://ooni.torproject.org/reports/json/2016-01-01.json
- https://ooni.torproject.org/reports/json/2016-01-02.json
- https://ooni.torproject.org/reports/json/2016-01-03.json
etc.
OPTION B:
Have 1 JSON stream for every ooni-probe test run and publish each inside
a directory named after the date it was collected
ex.
- https://ooni.torproject.org/reports/json/2016-01-01/20160101T204732Z-NL-AS3265-http_requests-v1-probe.json.gz
- https://ooni.torproject.org/reports/json/2016-01-01/20160101T204732Z-US-AS3265-dns_consistency-v1-probe.json.gz
etc.
Since we are internally using the daily batches for the processing and
analysis of reports, unless there is an explicit request to publish them
on a per-test-run basis we will probably end up going with option A, so
don't be shy to reply :)
---
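For reference, consuming one daily batch under option (A) would look
roughly like the sketch below. It assumes (this is not stated above) that
the daily file is newline-delimited JSON, one measurement per line, and
that it is served gzipped:

import gzip
import json
import urllib.request

# One daily batch, as in the option (A) examples above.
URL = "https://ooni.torproject.org/reports/json/2016-01-01.json"

raw = urllib.request.urlopen(URL).read()
text = gzip.decompress(raw).decode("utf-8")  # skip this step if the file is served plain
for line in text.splitlines():
    measurement = json.loads(line)
    # filter or group measurements here
    print(measurement.get("report_id"))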
I agree with David that it will be easier to access specific
ooni-probe test results using option (B) (i.e. the current solution).
What benefits did you identify when considering the switch to option (A)?
A few reasons to stick with option (B) include:
- Retaining the ability to run ooni-pipeline on a subset of reports
associated with a given time period by filtering by date prefix and by
substrings within key names;
- Retaining the ability to distribute small units of work easily among
subprocesses (see the sketch after this list); and
- Retaining the idempotent nature of ooni-pipeline and the luigi
framework: switching from lots of small files to a single large file
for a given day will invariably increase the time required to recover
from failures (i.e. if a small dnst-based test fails to normalise,
you'll have to renormalise everything as opposed to a single test);
- Developers will not have to download hundreds of megabytes of data
in order to access a traceroute test result that is only a few
kilobytes in size; and
- It's generally easier to work with smaller files than it is to work
with big files.
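To make the point about small, independently retryable units of work
concrete, here is a rough sketch (not ooni-pipeline or luigi code; the
report names are the hypothetical per-run examples quoted above) of
per-file processing where a single failure only forces re-running that
one report:

from concurrent.futures import ProcessPoolExecutor

# Per-run report files, following the naming pattern quoted above
# (hypothetical examples).
REPORTS = [
    "2016-01-01/20160101T204732Z-NL-AS3265-http_requests-v1-probe.json.gz",
    "2016-01-01/20160101T204732Z-US-AS3265-dns_consistency-v1-probe.json.gz",
]

def normalise(path):
    # Placeholder for the real per-report normalisation step; each file is a
    # small, self-contained unit, so a failure here only requires re-running
    # this one report rather than the whole day.
    return path

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        for done in pool.map(normalise, REPORTS):
            print("normalised", done)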
Cheers,
Tyler
GPG fingerprint: 8931 45DF 609B EE2E BC32 5E71 631E 6FC3 4686 F0EB
(tyler@tylerfisher.org)
Did report_ids get regenerated when reports were converted from YAML to
JSON? I think they did but I want to make sure.
For example, I have a copy of the old YAML report
20140428T232415Z-AS1241-http_requests_test-v1-probe.yaml.gz. It has a
report_id ending in "zuj":
report_filename: 20140428T232415Z-AS1241-http_requests_test-v1-probe.yaml
report_id: 2014-04-28aqfgmdfzxjwmreodmroptzeugvanvtznhclirzuj
The new corresponding JSON file seems to be
https://ooni-public.s3.amazonaws.com/json/2014-04-28.json (requires an
access key). Its report_id instead ends in "ois":
"report_filename": "20140428T232415Z-AS1241-http_requests_test-v1-probe.yaml",
"report_id": "2014-04-28nltgjbivffrtkqsvpoaudhgsgyafkbfldqncrois"