Hi,
As part of our work incorporating ooniprobe into the blocked.org.uk scheduling system, we now have a final piece of code that relays ooniprobe results from the scheduling system back to the main OONI collectors. We've currently got about a million individual test results stored, covering 7 ISPs.
I have a few questions though:
1) Is it possible to submit results to the OONI collector over HTTP/HTTPS instead of Tor, and if so, is there a public DNS name used for the collector? For the volume of results we've got, it could be a lot more bandwidth-efficient and faster to run.
2) Is there a testing or staging collector that you'd like us to use initially? I've been testing against a collector running inside a vagrant VM so far, but wondered if there was any pre-live testing that could be done.
3) Is there a rate limit that you would want enforced? The system is able to relay probe results in real-time, but there's quite a large backlog.
Thanks,
Daniel.
On Jul 19, 2015, at 23:00, Daniel Ramsay <daniel@dretzq.org.uk> wrote:
> Hi,
> As part of our work incorporating ooniprobe into the blocked.org.uk scheduling system, we now have a final piece of code that relays ooniprobe results from the scheduling system back to the main OONI collectors. We've currently got about a million individual test results stored, covering 7 ISPs.
Hi Daniel,
Thanks for reaching out with these questions.
I will summarise what I told Richard on IRC during this week's dev gathering so that it's recorded here.
> I have a few questions though:
> - Is it possible to submit results to the OONI collector over HTTP/HTTPS instead of Tor, and if so, is there a public DNS name used for the collector? For the volume of results we've got, it could be a lot more bandwidth-efficient and faster to run.
Currently we only support reporting via a Tor Hidden Service or via vanilla HTTP. Reporting via HTTP is currently not supported by the canonical OONI collector, so in order to use it you would have to run your own collector that supports HTTP and peer with the OONI data collection pipeline to submit the results it gathers.
Since we have received many requests to support HTTPS collectors, we plan to add support for it in the near future. It should be much easier nowadays, since the Twisted API for doing HTTPS has improved as of version 14.0.
Still, I would like to preserve the property of having URLs be self-authenticating, and I designed a scheme to extend HTTPS URIs to support something similar to certificate pinning here: https://github.com/hellais/sslpin. That code is just a proof of concept and is based on an old version of Twisted, when it was harder to do cert validation. I think supporting this in recent versions of Twisted should be much easier.
This is the relevant trac ticket about adding HTTPS support to ooni-backend: https://trac.torproject.org/projects/tor/ticket/11997
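To make the self-authenticating-URL idea concrete: the actual URI syntax lives in the sslpin repo above, but purely as an illustration, one could carry an expected certificate fingerprint in the URI fragment and check it against whatever certificate the server presents. The `#pin=sha256:` syntax and helper functions below are hypothetical, not sslpin's actual format:

```python
import hashlib
from urllib.parse import urlparse, parse_qs

def extract_pin(uri):
    """Pull an expected certificate fingerprint out of a hypothetical
    self-authenticating URI such as https://host/#pin=sha256:<hex>."""
    fragment = urlparse(uri).fragment
    pins = parse_qs(fragment).get("pin", [])
    return pins[0] if pins else None

def cert_matches_pin(der_cert_bytes, pin):
    """Compare the presented certificate (DER bytes) against the pin."""
    algo, _, expected = pin.partition(":")
    if algo != "sha256":
        raise ValueError("unsupported pin algorithm: %s" % algo)
    return hashlib.sha256(der_cert_bytes).hexdigest() == expected

# Example: pin a (fake) certificate's digest into the URI, then verify it.
fake_cert = b"dummy certificate bytes"
digest = hashlib.sha256(fake_cert).hexdigest()
uri = "https://collector.example/#pin=sha256:" + digest
print(cert_matches_pin(fake_cert, extract_pin(uri)))  # True
```

In a real client, the pin check would run inside the TLS handshake's certificate-verification callback rather than after the fact.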
> - Is there a testing or staging collector that you'd like us to use initially? I've been testing against a collector running inside a vagrant VM so far, but wondered if there was any pre-live testing that could be done.
We currently don't have a testing backend, but as discussed with Richard, we shall set one up once you are ready to start deploying this, so we can stage the ORG probing infrastructure before rolling it out into production.
> - Is there a rate limit that you would want enforced? The system is able to relay probe results in real-time, but there's quite a large backlog.
Richard said that you will be testing about 7.1k URLs per day from 7 vantage points. I think this amount of data is highly manageable for the current data collection pipeline, and if it's not, we should fix the pipeline rather than impose restrictive rate limits on clients.
I would say we start without any rate limiting, and if it does become a problem when trying it out on the staging collector, we can figure out a solution.
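If client-side throttling ever did turn out to be necessary, a simple token bucket on the submission side would likely suffice. This is only a sketch; the class and the rate numbers are illustrative and not part of any OONI API:

```python
import time

class TokenBucket:
    """Allow at most `rate` submissions per second, with bursts of up
    to `capacity` submissions."""
    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def try_acquire(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=5, capacity=10)  # ~5 report submissions/second
allowed = sum(1 for _ in range(100) if bucket.try_acquire())
print(allowed)  # roughly 10: the initial burst, plus any refill during the loop
```

A relay draining a large backlog would call `try_acquire()` before each submission and sleep briefly when it returns False.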
All in all, I am very excited about these developments. Please let me know if there is anything we can do to help you move this forward.
You can always find me on #ooni on irc.oftc.net for any questions or requests for help :)
Have fun!
~ Arturo
Hi Arturo, thanks for providing these answers.
On 29/07/15 10:04, Arturo Filastò wrote:
> On Jul 19, 2015, at 23:00, Daniel Ramsay <daniel@dretzq.org.uk> wrote:
>> Hi,
>> As part of our work incorporating ooniprobe into the blocked.org.uk scheduling system, we now have a final piece of code that relays ooniprobe results from the scheduling system back to the main OONI collectors. We've currently got about a million individual test results stored, covering 7 ISPs.
> Hi Daniel,
> Thanks for reaching out with these questions.
> I will summarise what I told Richard on IRC during this week's dev gathering so that it's recorded here.
>> I have a few questions though:
>> - Is it possible to submit results to the OONI collector over HTTP/HTTPS instead of Tor, and if so, is there a public DNS name used for the collector? For the volume of results we've got, it could be a lot more bandwidth-efficient and faster to run.
> Currently we only support reporting via a Tor Hidden Service or via vanilla HTTP. Reporting via HTTP is currently not supported by the canonical OONI collector, so in order to use it you would have to run your own collector that supports HTTP and peer with the OONI data collection pipeline to submit the results it gathers.
We're already emulating a collector in the blocked.org.uk API, so perhaps we can go directly to peering with the pipeline. Is there any reference information that I can read on how to go about getting this set up (protocols, hostnames, etc.)?
> Since we have received many requests to support HTTPS collectors, we plan to add support for it in the near future. It should be much easier nowadays, since the Twisted API for doing HTTPS has improved as of version 14.0.
I did add some minimal support for HTTPS collector URLs in the patch set; it's still being worked on for upstream submission. The HTTPS support probably doesn't go as far as you'd like, though.
> Still, I would like to preserve the property of having URLs be self-authenticating, and I designed a scheme to extend HTTPS URIs to support something similar to certificate pinning here: https://github.com/hellais/sslpin. That code is just a proof of concept and is based on an old version of Twisted, when it was harder to do cert validation. I think supporting this in recent versions of Twisted should be much easier.
Newer versions of Twisted and Python will do certificate verification using the operating system's certificate store, but as you point out, that doesn't provide a way of ensuring that the only certificate that can be used is one from the official CA rather than from any of the others.
It may be possible to force a Twisted agent to use only a bundled CA certificate for verification, rather than relying on the system-installed CA list. The Python requests library supports this usage, but I'm not sure about Twisted.
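For what it's worth, the same effect is available from Python's standard-library ssl module, which both requests and Twisted build on. A context constructed this way trusts only the bundled CA file and rejects everything else; the path below is a placeholder:

```python
import ssl

def pinned_context(ca_bundle_path=None):
    """Return an SSLContext that performs hostname checking and, when a
    CA bundle path is given, trusts only that bundle instead of the
    operating system's certificate store."""
    if ca_bundle_path is None:
        # Fall back to the system trust store.
        return ssl.create_default_context()
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.check_hostname = True
    ctx.verify_mode = ssl.CERT_REQUIRED
    # Only server certificates chaining to this bundle will verify.
    ctx.load_verify_locations(cafile=ca_bundle_path)
    return ctx

# With requests, the equivalent is a one-liner:
#   requests.get(url, verify="/path/to/bundled-ca.pem")
```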
Thanks again,
Daniel.
On 09 Aug 2015, at 12:57, Daniel Ramsay <daniel@dretzq.org.uk> wrote:
> We're already emulating a collector in the blocked.org.uk API, so perhaps we can go directly to peering with the pipeline. Is there any reference information that I can read on how to go about getting this set up (protocols, hostnames, etc.)?
Currently, peering is achieved by me giving you an AWS shared secret and you running the following invoke task with a daily (or hourly) periodicity: https://github.com/TheTorProject/ooni-pipeline-ng/blob/master/tasks.py#L228
The documentation for these components is basically nonexistent, but if you are familiar with Fabric and Python software, its usage should feel quite natural.
Basically, once you have installed invoke and the required dependencies (listed in requirements.txt), you edit the invoke.yaml file (using the .example as a template) to include the AWS shared secret.
Then you can configure an hourly cronjob to run:
invoke sync_reports /PATH/TO/YOUR/REPORTS/ARCHIVE
This will leave you peered with the OONI data pipeline.
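Putting those steps together, the crontab entry might look something like the following. The repository path and log location are placeholders, and redirecting the output to a log is optional:

```shell
# m h dom mon dow  command
# Run the sync at the top of every hour from the pipeline checkout.
0 * * * *  cd /opt/ooni-pipeline-ng && invoke sync_reports /PATH/TO/YOUR/REPORTS/ARCHIVE >> /var/log/ooni-sync.log 2>&1
```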
>> Since we have received many requests to support HTTPS collectors, we plan to add support for it in the near future. It should be much easier nowadays, since the Twisted API for doing HTTPS has improved as of version 14.0.
> I did add some minimal support for HTTPS collector URLs in the patch set; it's still being worked on for upstream submission. The HTTPS support probably doesn't go as far as you'd like, though.
Oh that’s great!
I would love to check out this code and provide some feedback on it.
>> Still, I would like to preserve the property of having URLs be self-authenticating, and I designed a scheme to extend HTTPS URIs to support something similar to certificate pinning here: https://github.com/hellais/sslpin. That code is just a proof of concept and is based on an old version of Twisted, when it was harder to do cert validation. I think supporting this in recent versions of Twisted should be much easier.
> Newer versions of Twisted and Python will do certificate verification using the operating system's certificate store, but as you point out, that doesn't provide a way of ensuring that the only certificate that can be used is one from the official CA rather than from any of the others.
> It may be possible to force a Twisted agent to use only a bundled CA certificate for verification, rather than relying on the system-installed CA list. The Python requests library supports this usage, but I'm not sure about Twisted.
Yeah, I think a few hacks may be needed to implement this, though I think this requirement is quite important to meet.
~ Arturo
Thanks Arturo. I'm reading through the code at the moment.
On 28/08/15 16:58, Arturo Filastò wrote:
>> I did add some minimal support for HTTPS collector URLs in the patch set; it's still being worked on for upstream submission. The HTTPS support probably doesn't go as far as you'd like, though.
> Oh that’s great!
> I would love to check out this code and provide some feedback on it.
I've been tidying up the pull request, ready to resubmit soon!
>>> Still, I would like to preserve the property of having URLs be self-authenticating, and I designed a scheme to extend HTTPS URIs to support something similar to certificate pinning here: https://github.com/hellais/sslpin. That code is just a proof of concept and is based on an old version of Twisted, when it was harder to do cert validation. I think supporting this in recent versions of Twisted should be much easier.
>> Newer versions of Twisted and Python will do certificate verification using the operating system's certificate store, but as you point out, that doesn't provide a way of ensuring that the only certificate that can be used is one from the official CA rather than from any of the others.
>> It may be possible to force a Twisted agent to use only a bundled CA certificate for verification, rather than relying on the system-installed CA list. The Python requests library supports this usage, but I'm not sure about Twisted.
> Yeah, I think a few hacks may be needed to implement this, though I think this requirement is quite important to meet.
Not too much hacking required: it was quite straightforward to use twisted.internet.ssl.CertificateOptions to verify a server certificate against a single provided CA cert (even a self-generated one). Hostname verification is still missing, though.
I tested it with Twisted 13.2.0 (the version shipped with Ubuntu 14.04) and Python 2.7.6.
Daniel.
On 28/08/15 16:58, Arturo Filastò wrote:
> On 09 Aug 2015, at 12:57, Daniel Ramsay <daniel@dretzq.org.uk> wrote:
>> We're already emulating a collector in the blocked.org.uk API, so perhaps we can go directly to peering with the pipeline. Is there any reference information that I can read on how to go about getting this set up (protocols, hostnames, etc.)?
> Currently, peering is achieved by me giving you an AWS shared secret and you running the following invoke task with a daily (or hourly) periodicity: https://github.com/TheTorProject/ooni-pipeline-ng/blob/master/tasks.py#L228
> The documentation for these components is basically nonexistent, but if you are familiar with Fabric and Python software, its usage should feel quite natural.
> Basically, once you have installed invoke and the required dependencies (listed in requirements.txt), you edit the invoke.yaml file (using the .example as a template) to include the AWS shared secret.
> Then you can configure an hourly cronjob to run:
> invoke sync_reports /PATH/TO/YOUR/REPORTS/ARCHIVE
> This will leave you peered with the OONI data pipeline.
I've read through the scripts, and I think it's a bit more elaborate than I was looking for at this stage, though it would be a better way to work in the long term. I did run a quick test using the command-line oonireport tool (sending a small number of valid probe results), and the time to run the submission isn't bad at all. Writing an export script to extract ooniprobe results from the blocked database and using oonireport to send them via Tor is a lot more performant than I'd expected.
You mentioned setting up a staging collector a while ago; is that still a possibility? If not, I can still submit a small number of results to the main collector, which could be verified before enabling all reports to be sent.
Many thanks,
Daniel.