As promised I published the bridge reachability measurements on the public ooni report hosting.
You can find them here:
https://ooni.torproject.org/reports/0.1/CN/
https://ooni.torproject.org/reports/0.1/RU/
https://ooni.torproject.org/reports/0.1/US/
Keep in mind that, as I was telling you, some of the runs had measurement issues caused by incompatibilities between ooniprobe and the old Fedora version running on PlanetLab, so not all the measurements may be 100% accurate. They should still, at least, give you an idea of what the data format looks like and whether it contains enough information for your parsing work.
I would suggest we keep this discussion public and maintain the ooni-dev list in cc.
~ Art.
Arturo,
As we discussed last week, it would be ideal if we could retrieve a file containing URLs of file locations. That way drop-off locations can vary and scale horizontally without the parser requiring prior knowledge.
Something like this:
ooni section: probe -> test -> test report -> collector -> generate received report list (for retransmission in case of intended or unintended failure to push to publisher) -> do data scrubbing if required for data publication -> compress (scrubbed) report -> push (scrubbed) compressed report -> publisher (an HTTP server) -> generate/update URL list of reports existing on the publisher
chokepoint section: parser -> retrieve URL list from publisher -> process URL list -> retrieve file in URL -> decompress file -> process report file
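The chokepoint side of this pipeline could be sketched roughly as follows. This is only an illustration under stated assumptions: the thread does not define the format of the URL list or the compression scheme, so the sketch assumes one URL per line and gzip, and the function names are hypothetical.

```python
import gzip


def parse_url_list(index_text):
    """Return report URLs from the list, skipping blank lines and '#' comments."""
    urls = []
    for line in index_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            urls.append(line)
    return urls


def decompress_report(blob):
    """Decompress one report (assumed gzip) and return its text."""
    return gzip.decompress(blob).decode("utf-8")


def process_reports(index_text, fetch):
    """Yield (url, report_text) for every URL in the list.

    `fetch` is any callable mapping a URL to compressed bytes, so the
    transport (HTTP, local mirror, test stub) stays outside the parser.
    """
    for url in parse_url_list(index_text):
        yield url, decompress_report(fetch(url))
```

To read from a real publisher, `fetch` could be a small wrapper around `urllib.request.urlopen(url).read()`. Keeping it as a parameter means the parser needs no prior knowledge of where the reports live.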
The report itself seems fairly clear, but I have some comments and questions nevertheless.
Report metadata:
"options: [-f, /home/uwaterloo_geossl/bridge_reachability/bridges.txt, -t, '300']" This should preferably contain only the filename, not the path; the full path looks like a potential data leak.
probe_cc: RU Does the cc refer to the IP's country code? How is the cc generated, MaxMind or something else? If the cc is generated based on an external geolocation service, this service and the date of generation should preferably be known.
probe_ip: 127.0.0.1 This should preferably be removed entirely before publication. Maybe it should not be there at all; the ASN seems sufficient.
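A scrubbing step along these lines might look like the sketch below. It is not ooniprobe's actual implementation. It assumes the report header has already been parsed into a dictionary with `options` and `probe_ip` keys as shown in the sample above, that paths are POSIX-style, and the function names are hypothetical.

```python
import posixpath


def scrub_options(options):
    """Reduce absolute POSIX paths in the option list to bare filenames."""
    return [
        posixpath.basename(opt) if isinstance(opt, str) and posixpath.isabs(opt) else opt
        for opt in options
    ]


def scrub_header(header):
    """Return a copy of a report header with paths shortened and probe_ip dropped."""
    clean = dict(header)
    clean["options"] = scrub_options(header.get("options", []))
    clean.pop("probe_ip", None)  # assumption: the ASN is enough for analysis
    return clean
```

For the sample header, `scrub_header` would leave `-f`, `-t` and `'300'` untouched and reduce the input file argument to `bridges.txt`.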
Report content:
Can you supply a list of possible values or value ranges that can be expected for the following report entries:
input: ....
success: is this true/false?
tor_progress: is this 0-100?
tor_progress_summary: does this refer to the stage at which the previous entry finishes?
tor_progress_tag: are there values other than 'null' and 'done'?
As discussed previously, let's start with processing tests for the publicly known bridges only. How to manage the secret bridges is something we should tackle at a later stage, with the understanding that we should under no circumstances have access to the actual addresses of those bridges.
- Ruben
On 07/12/2014 04:18 PM, Arturo Filastò wrote:
> As promised I published the bridge reachability measurements on the public ooni report hosting.
> You can find them here:
> https://ooni.torproject.org/reports/0.1/CN/
> https://ooni.torproject.org/reports/0.1/RU/
> https://ooni.torproject.org/reports/0.1/US/
> Keep in mind that, as I was telling you, some of the runs had measurement issues caused by incompatibilities between ooniprobe and the old Fedora version running on PlanetLab, so not all the measurements may be 100% accurate. They should still, at least, give you an idea of what the data format looks like and whether it contains enough information for your parsing work.
> I would suggest we keep this discussion public and maintain the ooni-dev list in cc.
> ~ Art.
On 7/23/14, 9:26 PM, Ruben Bloemgarten wrote:
> Arturo,
> As we discussed last week, it would be ideal if we could retrieve a file containing URLs of file locations. That way drop-off locations can vary and scale horizontally without the parser requiring prior knowledge.
> Something like this:
> ooni section: probe -> test -> test report -> collector -> generate received report list (for retransmission in case of intended or unintended failure to push to publisher) -> do data scrubbing if required for data publication -> compress (scrubbed) report -> push (scrubbed) compressed report -> publisher (an HTTP server) -> generate/update URL list of reports existing on the publisher
> chokepoint section: parser -> retrieve URL list from publisher -> process URL list -> retrieve file in URL -> decompress file -> process report file
This seems like a reasonable thing to do. I think the ideal way would be to integrate it into the publishing step: every time we update the data published over HTTP, we also regenerate the file containing the list of existing reports.
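As a rough sketch of what regenerating the list during the publishing step could look like: the index filename, the directory layout and the function names below are assumptions for illustration, not part of the existing publisher.

```python
from pathlib import Path

INDEX_NAME = "reports.txt"  # hypothetical name for the list of published reports


def build_report_index(root, base_url):
    """Return sorted URLs for every published file below `root`."""
    root = Path(root)
    skip = {INDEX_NAME, INDEX_NAME + ".tmp"}
    urls = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.name not in skip:
            urls.append(base_url.rstrip("/") + "/" + path.relative_to(root).as_posix())
    return urls


def write_report_index(root, base_url):
    """Rewrite the index via a temporary file so readers never see a partial list."""
    root = Path(root)
    tmp = root / (INDEX_NAME + ".tmp")
    tmp.write_text("\n".join(build_report_index(root, base_url)) + "\n")
    tmp.replace(root / INDEX_NAME)
```

Calling `write_report_index` at the end of every publish run keeps the list in step with what is actually being served.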
> The report itself seems fairly clear, but I have some comments and questions nevertheless.
> Report metadata:
> "options: [-f, /home/uwaterloo_geossl/bridge_reachability/bridges.txt, -t, '300']" This should preferably contain only the filename, not the path; the full path looks like a potential data leak.
Yes, you are correct. There is a ticket open about this issue and I think it's something we should do:
https://trac.torproject.org/projects/tor/ticket/12706
> probe_cc: RU Does the cc refer to the IP's country code? How is the cc generated, MaxMind or something else?
Yes, it is the country code of the IP address, and the data is taken from MaxMind.
> If the cc is generated based on an external geolocation service, this service and the date of generation should preferably be known.
I need to look into whether there is a way to determine the version of the database installed on a user's machine, because the CC and ASN are calculated locally by the user, not upstream.
> probe_ip: 127.0.0.1 This should preferably be removed entirely before publication. Maybe it should not be there at all; the ASN seems sufficient.
We want to keep that there for consistency, and there are cases when we are ok with also publishing the probe IP address.
> Report content:
> Can you supply a list of possible values or value ranges that can be expected for the following report entries:
> input: ....
> success: is this true/false?
> tor_progress: is this 0-100?
> tor_progress_summary: does this refer to the stage at which the previous entry finishes?
> tor_progress_tag: are there values other than 'null' and 'done'?
I just realized that the bridge_reachability test is not specified. I created a ticket for doing that: https://trac.torproject.org/projects/tor/ticket/12757#ticket
> As discussed previously, let's start with processing tests for the publicly known bridges only. How to manage the secret bridges is something we should tackle at a later stage, with the understanding that we should under no circumstances have access to the actual addresses of those bridges.
Ok, I think we should anyway also start collecting data for the private bridges so that we have it in stock for when we decide how to do that.
> - Ruben
On 07/31/2014 06:53 PM, Arturo Filastò wrote:
> On 7/23/14, 9:26 PM, Ruben Bloemgarten wrote:
>> Arturo,
>> As we discussed last week, it would be ideal if we could retrieve a file containing URLs of file locations. That way drop-off locations can vary and scale horizontally without the parser requiring prior knowledge.
>> Something like this:
>> ooni section: probe -> test -> test report -> collector -> generate received report list (for retransmission in case of intended or unintended failure to push to publisher) -> do data scrubbing if required for data publication -> compress (scrubbed) report -> push (scrubbed) compressed report -> publisher (an HTTP server) -> generate/update URL list of reports existing on the publisher
>> chokepoint section: parser -> retrieve URL list from publisher -> process URL list -> retrieve file in URL -> decompress file -> process report file
> This seems like a reasonable thing to do. I think the ideal way would be to integrate it into the publishing step: every time we update the data published over HTTP, we also regenerate the file containing the list of existing reports.
That would be perfect.
>> The report itself seems fairly clear, but I have some comments and questions nevertheless.
>> Report metadata:
>> "options: [-f, /home/uwaterloo_geossl/bridge_reachability/bridges.txt, -t, '300']" This should preferably contain only the filename, not the path; the full path looks like a potential data leak.
> Yes, you are correct. There is a ticket open about this issue and I think it's something we should do:
Great.
>> probe_cc: RU Does the cc refer to the IP's country code? How is the cc generated, MaxMind or something else?
> Yes, it is the country code of the IP address, and the data is taken from MaxMind.
>> If the cc is generated based on an external geolocation service, this service and the date of generation should preferably be known.
> I need to look into whether there is a way to determine the version of the database installed on a user's machine, because the CC and ASN are calculated locally by the user, not upstream.
I am assuming that the MaxMind dat file is used for this and that it is packaged with the ooni client. It would probably be sufficient to supply the date the dat file was taken from MaxMind plus a checksum of the file for verification. (We keep a collection of the dat files, so we could start generating checksums for comparison.)
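Recording that provenance could be as simple as the sketch below. The choice of SHA-256 and the field names are assumptions made for illustration. The thread only asks for "the sum of the file" and a retrieval date.

```python
import hashlib
import posixpath


def file_sha256(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def geoip_provenance(path, retrieved_on):
    """Describe the geolocation database used; field names are hypothetical."""
    return {
        "geoip_file": posixpath.basename(path),
        "geoip_retrieved": retrieved_on,
        "geoip_sha256": file_sha256(path),
    }
```

Anyone holding an archive of the dat files could then run `file_sha256` over each one and match the digest against the value stored in a report.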
>> probe_ip: 127.0.0.1 This should preferably be removed entirely before publication. Maybe it should not be there at all; the ASN seems sufficient.
> We want to keep that there for consistency, and there are cases when we are ok with also publishing the probe IP address.
How prone to user error is this? Can we think of a way to determine whether the IP reveal was done on purpose, and block publication or destroy the file based on that?
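One possible shape for such a check, purely as a sketch: publication is refused whenever `probe_ip` holds a real address and the report carries no explicit opt-in. The `ip_publication_ok` field is hypothetical. Nothing in the thread says ooniprobe records such a flag.

```python
import ipaddress


def ip_is_masked(value):
    """True when probe_ip is absent or a loopback placeholder such as 127.0.0.1."""
    if not value:
        return True
    try:
        return ipaddress.ip_address(value).is_loopback
    except ValueError:
        return False  # unparseable values are treated as unmasked, to be safe


def may_publish(header):
    """Allow publication only if the IP is masked or disclosure was opted into."""
    if ip_is_masked(header.get("probe_ip")):
        return True
    return header.get("ip_publication_ok") is True  # hypothetical opt-in flag
```

Reports that fail `may_publish` could be held back for review rather than destroyed outright, which leaves room to correct a mistaken flag.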
>> Report content:
>> Can you supply a list of possible values or value ranges that can be expected for the following report entries:
>> input: ....
>> success: is this true/false?
>> tor_progress: is this 0-100?
>> tor_progress_summary: does this refer to the stage at which the previous entry finishes?
>> tor_progress_tag: are there values other than 'null' and 'done'?
> I just realized that the bridge_reachability test is not specified. I created a ticket for doing that: https://trac.torproject.org/projects/tor/ticket/12757#ticket
Ok.
>> As discussed previously, let's start with processing tests for the publicly known bridges only. How to manage the secret bridges is something we should tackle at a later stage, with the understanding that we should under no circumstances have access to the actual addresses of those bridges.
> Ok, I think we should anyway also start collecting data for the private bridges so that we have it in stock for when we decide how to do that.
As long as I don't get to see them, that sounds fine :)
- Ruben
Ruben Bloemgarten transcribed 3.4K bytes:
> Arturo,
> As we discussed last week, it would be ideal if we could retrieve a file containing URLs of file locations. That way drop-off locations can vary and scale horizontally without the parser requiring prior knowledge.
> Something like this:
> ooni section: probe -> test -> test report -> collector -> generate received report list (for retransmission in case of intended or unintended failure to push to publisher) -> do data scrubbing if required for data publication -> compress (scrubbed) report -> push (scrubbed) compressed report -> publisher (an HTTP server) -> generate/update URL list of reports existing on the publisher
Why doesn't data scrubbing happen client-side?
> chokepoint section: parser -> retrieve URL list from publisher -> process URL list -> retrieve file in URL -> decompress file -> process report file
> The report itself seems fairly clear, but I have some comments and questions nevertheless.
> Report metadata:
> "options: [-f, /home/uwaterloo_geossl/bridge_reachability/bridges.txt, -t, '300']" This should preferably contain only the filename, not the path; the full path looks like a potential data leak.
> probe_cc: RU Does the cc refer to the IP's country code? How is the cc generated, MaxMind or something else? If the cc is generated based on an external geolocation service, this service and the date of generation should preferably be known.
> probe_ip: 127.0.0.1 This should preferably be removed entirely before publication. Maybe it should not be there at all; the ASN seems sufficient.
> Report content:
> Can you supply a list of possible values or value ranges that can be expected for the following report entries:
> input: ....
> success: is this true/false?
> tor_progress: is this 0-100?
> tor_progress_summary: does this refer to the stage at which the previous entry finishes?
> tor_progress_tag: are there values other than 'null' and 'done'?
> As discussed previously, let's start with processing tests for the publicly known bridges only. How to manage the secret bridges is something we should tackle at a later stage, with the understanding that we should under no circumstances have access to the actual addresses of those bridges.
How do you intend to test access to a bridge whose address you do not have? And if you only had the fingerprint of the bridge, what would be the point of that? What adversary would that defend against? Or did you all have some alternate mechanism for defending bridge addresses?
> - Ruben
ooni-dev mailing list ooni-dev@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/ooni-dev