On Jul 19, 2015, at 23:00, Daniel Ramsay daniel@dretzq.org.uk wrote:
Hi,
As part of our work incorporating ooniprobe into the blocked.org.uk scheduling system, we now have a final piece of code that relays ooniprobe results from the scheduling system back to the main OONI collectors. We've currently got about a million individual test results stored, covering 7 ISPs.
Hi Daniel,
Thanks for reaching out with these questions.
I will summarise what I told Richard on IRC during this week's dev gathering so that it's recorded here.
I have a few questions though:
- Is it possible to submit results to the OONI collector over HTTP/HTTPS instead of Tor, and if so, is there a public DNS name for the collector? For the volume of results we've got, it could be a lot more bandwidth-efficient and faster to run.
Currently we only support reporting via a Tor hidden service or via vanilla HTTP. Plain HTTP reporting is not supported by the canonical OONI collector, though, so to use it you would have to run your own collector that supports HTTP and peers with the OONI data collection pipeline to submit the results it gathers.
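Just to make that concrete, below is a rough Python sketch of what relaying one stored result to an HTTP collector could look like. The endpoint layout (create a report, append entries, close it) and the JSON field names are my assumptions about the collector's report API, so please double check them against the ooni-backend spec; the collector URL is of course hypothetical.

    # Sketch only: relay one stored result to an HTTP collector.
    # Endpoint names and JSON fields are assumptions about the
    # collector's report API (create / append / close) -- check them
    # against the ooni-backend spec before relying on this.
    import requests

    COLLECTOR = "http://collector.example.org"  # hypothetical HTTP collector

    def relay_result(entry, report_metadata):
        # 1. open a new report on the collector
        r = requests.post(COLLECTOR + "/report", json=report_metadata)
        r.raise_for_status()
        report_id = r.json()["report_id"]

        # 2. append the measurement entry to the open report
        requests.post("%s/report/%s" % (COLLECTOR, report_id),
                      json={"content": entry}).raise_for_status()

        # 3. close the report so the pipeline can pick it up
        requests.post("%s/report/%s/close" % (COLLECTOR, report_id)).raise_for_status()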
Since we have received many requests to support HTTPS collectors, we plan to add support for it in the near future. It should be much easier nowadays, since the Twisted API for doing HTTPS has improved as of version 14.0.
Still, I would like to preserve the property of having URLs be self-authenticating, and I have designed a scheme that extends HTTPS URIs to support something similar to certificate pinning here: https://github.com/hellais/sslpin. That code is just a POC and is based on an old version of Twisted, from when it was harder to do cert validation. I think supporting this in recent versions of Twisted should be much easier.
This is the relevant trac ticket about adding HTTPS support to ooni-backend: https://trac.torproject.org/projects/tor/ticket/11997
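To give an idea of what I mean by self-authenticating URLs, here is a small generic Python sketch (not the actual sslpin scheme, which lives in the repo above): the expected certificate fingerprint is carried alongside the URL, and the client only trusts a server whose certificate hashes to that value, independently of any CA.

    # Generic illustration of the pinning idea (NOT the sslpin scheme):
    # trust comes from comparing the server certificate against a pinned
    # fingerprint rather than from CA validation.
    import hashlib
    import socket
    import ssl

    def server_cert_fingerprint(host, port=443):
        # Fetch the server certificate without CA validation, then hash it.
        ctx = ssl.create_default_context()
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
        with socket.create_connection((host, port)) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                der = tls.getpeercert(binary_form=True)
        return hashlib.sha256(der).hexdigest()

    def pin_matches(host, pinned_sha256, port=443):
        # The pinned fingerprint would be the part embedded in the URI.
        return server_cert_fingerprint(host, port) == pinned_sha256.lower()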
- Is there a testing or staging collector that you'd like us to use
initially? I've been testing against a collector running inside a vagrant VM so far, but wondered if there was any pre-live testing that could be done.
We currently don't have a testing backend, but as discussed with Richard we will set one up once you are ready to start deploying this, so that we can stage the ORG probing infrastructure before rolling it out into production.
- Is there a rate limit that you would want enforced? The system is
able to relay probe results in real-time, but there's quite a large backlog.
Richard said that you will be testing about 7.1k URLs per day from 7 vantage points, which comes to roughly 50k measurements a day at most, i.e. well under one measurement per second on average. I think this volume is easily manageable by the current data collection pipeline, and if it isn't we should fix the pipeline rather than impose restrictive rate limits on clients.
I would say we start without any rate limiting, and if it does become a problem while trying things out on the staging collector we can figure out a solution then.
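Purely as an illustration of the kind of fallback I have in mind (not something you need to implement now), a client-side throttle in the relay script would be easy to add. Here is a minimal token bucket sketch in Python; the rates and names are made up for the example.

    # Illustrative only: pace the backlog relay with a token bucket,
    # should submission rate ever become a problem for the collector.
    import time

    class TokenBucket:
        """Allow roughly `rate` submissions per second, bursting up to `capacity`."""

        def __init__(self, rate, capacity):
            self.rate = float(rate)
            self.capacity = float(capacity)
            self.tokens = float(capacity)
            self.last = time.monotonic()

        def acquire(self):
            while True:
                now = time.monotonic()
                self.tokens = min(self.capacity,
                                  self.tokens + (now - self.last) * self.rate)
                self.last = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
                time.sleep((1 - self.tokens) / self.rate)

    # e.g. while draining the backlog:
    #   bucket = TokenBucket(rate=5, capacity=20)   # ~5 reports per second
    #   for entry in backlog:
    #       bucket.acquire()
    #       relay_result(entry, report_metadata)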
All in all I am very excited about these developments. Please let me know if there is anything we can do to help you move this forward.
You can always find me on #ooni on irc.oftc.net for any questions or request for help :)
Have fun!
~ Arturo