To follow up on Nathan's excellent report, I thought I could shed some light on the status of the OONI integration with MLab NS:
1. Our work is temporarily blocked due to an operational issue that should be resolved imminently.
2. The integration that Nathan mentioned between Nagios and MLab NS is incredibly promising. As mentioned previously, MLab NS captures its information from the MLab nagios instance using a "baseList" script that runs on our monitoring server. As it functions now, MLab NS is filled with information based on the output of a baseList call that looks like:
http://nagios.measurementlab.net/baseList?show_state=1&service_name=ndt
which has output like:
ndt.iupui.mlab1.akl01.measurement-lab.org/ndt 0 1
ndt.iupui.mlab2.akl01.measurement-lab.org/ndt 0 1
ndt.iupui.mlab3.akl01.measurement-lab.org/ndt 0 1
ndt.iupui.mlab1.ams01.measurement-lab.org/ndt 0 1
ndt.iupui.mlab2.ams01.measurement-lab.org/ndt 0 1
ndt.iupui.mlab3.ams01.measurement-lab.org/ndt 0 1
ndt.iupui.mlab1.ams02.measurement-lab.org/ndt 0 1
ndt.iupui.mlab2.ams02.measurement-lab.org/ndt 0 1
...
The 0s and 1s are flags indicating whether there is a "problem" with the slice or not. I.e., they are backward.
baseList takes an additional parameter known as plugin_output. We will update MLab NS to call baseList with this additional parameter. The call will look like:
http://nagios.measurementlab.net/baseList?show_state=1&service_name=ndt&...
which has output like:
ndt.iupui.mlab1.akl01.measurement-lab.org/ndt 0 1 TCP OK - 0.136 second response time on ndt.iupui.mlab1.akl01.measurement-lab.org port 3001
ndt.iupui.mlab2.akl01.measurement-lab.org/ndt 0 1 TCP OK - 0.136 second response time on ndt.iupui.mlab2.akl01.measurement-lab.org port 3001
ndt.iupui.mlab3.akl01.measurement-lab.org/ndt 0 1 TCP OK - 0.135 second response time on ndt.iupui.mlab3.akl01.measurement-lab.org port 3001
ndt.iupui.mlab1.ams01.measurement-lab.org/ndt 0 1 TCP OK - 0.145 second response time on ndt.iupui.mlab1.ams01.measurement-lab.org port 3001
...
The extra data is the output from the plugin that monitors whether the particular service is online. In this example, we are monitoring ndt and the plugin reports whether a TCP connection is possible to port 3001 (NDT's port).
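As a rough illustration of how a consumer could split such a response into records, here is a minimal sketch. It assumes one record per line, with the first three whitespace-separated fields being the service path and the two numeric flags, and everything after that being the plugin output. The names `parse_base_list` and `BaseListEntry` are made up for this example and are not part of MLab NS; the two flags are treated as opaque integers.

```python
from typing import List, NamedTuple


class BaseListEntry(NamedTuple):
    service: str        # e.g. "ndt.iupui.mlab1.akl01.measurement-lab.org/ndt"
    first_flag: int     # first numeric column (treated as opaque here)
    second_flag: int    # second numeric column (treated as opaque here)
    plugin_output: str  # empty if plugin output was not requested


def parse_base_list(text: str) -> List[BaseListEntry]:
    """Parse baseList-style output, assuming one record per line."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split off at most three leading fields; the rest is free text.
        parts = line.split(None, 3)
        if len(parts) < 3:
            continue  # not enough fields, skip the line
        service, first, second = parts[:3]
        if not (first.isdigit() and second.isdigit()):
            continue  # flags are expected to be small integers
        output = parts[3] if len(parts) == 4 else ""
        entries.append(BaseListEntry(service, int(first), int(second), output))
    return entries
```

The same function handles both forms of the call: without the plugin output, `plugin_output` is simply the empty string.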
So, the integration point between nagios, MLab NS and OONI will look like this:
The nagios plugin written by LA/OONI will use return codes to signal whether the OONI service is running. That return value will be the 0s and 1s in baseList output. The "string" output from the plugin will be the information that needs to be captured in MLab NS and returned with OONI queries. Based on pull requests, I suspect the resulting response to a baseList call like
http://nagios.measurementlab.net/baseList?show_state=1&service_name=ooni...
will be something like
ooni.mlab.mlab1.akl01.measurement-lab.org/ndt 0 1 'collector_onion': 'testfakenotreal.onion'
ooni.mlab.mlab1.akl02.measurement-lab.org/ndt 0 1 'collector_onion': 'testfakenotreal.onion'
...
The 'collector_onion':'testfakenotreal.onion' string will make its way through MLab NS and get spit out as tool_extra from a query like:
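If the plugin's string output really does take the `'key': 'value'` shape guessed at above, turning it into structured data could look like the following sketch. The format is an assumption (comma-separated quoted pairs), and `parse_tool_extra` is a hypothetical helper name, not an existing MLab NS or OONI function.

```python
import ast
from typing import Dict


def parse_tool_extra(plugin_output: str) -> Dict[str, str]:
    """Parse "'key': 'value', ..." plugin output into a dict.

    Returns an empty dict if the text does not have that shape.
    """
    try:
        # Wrap the pairs in braces and evaluate them as a literal only;
        # ast.literal_eval never executes arbitrary code.
        value = ast.literal_eval("{" + plugin_output.strip() + "}")
    except (ValueError, SyntaxError, TypeError):
        return {}
    if not isinstance(value, dict):
        return {}
    return {str(k): str(v) for k, v in value.items()}
```

Output that is not in this shape, such as the "TCP OK" text from the ndt check, yields an empty dict rather than an error.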
http://mlab-ns.appspot.com/ooni
that gives something like:
{"city": "Washington", "url": "http://ndt.iupui.mlab1.iad01.measurement-lab.org:7123", "ip": ["216.156.197.139"], "site": "iad01", "fqdn": "ndt.iupui.mlab1.iad01.measurement-lab.org", "country": "US", "port": "3001", "tool_extra": 'testfakenotreal.onion' }
The TL;DR is that we are well-positioned to make these changes, and they should not require many (any?) fundamental changes to MLab NS or our monitoring infrastructure.
Does this seem reasonable?
3. As Nathan mentioned, their integration with MLab NS will require a query type that is able to list all available answers. I mentioned in comments to a ticket that we have something similar to what they need. However, I realize now that that approach will not work.
Fortunately, there is a better option. MLab NS already has a "thing" at
http://mlab-nstesting.appspot.com/admin/map/ipv4/all
that collects the status of all the services and places them on a map. We will modify that by parameterizing the output to allow for JSON responses, which will exactly satisfy OONI's needs.
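To make the idea of "parameterizing the output" concrete, here is a hypothetical sketch, not the actual MLab NS handler. The `render_status` function, the `fmt` parameter, and the `fqdn`/`status` record fields are all assumptions made for illustration; the non-JSON branch is only a stand-in for the existing map rendering.

```python
import json
from typing import Dict, List


def render_status(records: List[Dict[str, str]], fmt: str = "map") -> str:
    """Render service status records in the requested output format."""
    if fmt == "json":
        # Machine-readable form: the full list of records, as a client
        # that needs every available answer would want it.
        return json.dumps(records, sort_keys=True)
    # Stand-in for the existing human-facing (map) rendering.
    return "\n".join("{fqdn}: {status}".format(**r) for r in records)
```

The point of the sketch is that the data gathering stays the same and only the final rendering step depends on the requested format.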
Does this seem reasonable?
Summary:
I think that we are on the brink of making this full integration happen. We will keep everyone posted as we move forward.
Feedback welcome, obviously!
Will
On 08/01/2014 03:25 PM, Nathan Wilcox wrote:
Dear OTF, Ooni, and M-Lab,
Summary
We've hashed out a design to integrate Ooni with mlab-ns on the M-Lab deployment, and we've implemented a fully functional deployment that approximates this by simulating mlab-ns (this is attached). This completes Milestone D of our contract with OTF.
Design Goals
Our top goals for this integration are:
It does not rely on any changes to upstream Ooni. (For example, probes still use a bouncer .onion, and the backend has stock bouncers, collectors, and test helpers running.)
It can be disabled easily without redeploying the M-Lab backend. Our branch's ooni-support README.md has instructions to disable the integration, merely by editing a cron job to unset an ENABLED flag. There's no need to redeploy different versions of ooni-support.
When enabled, it allows M-Lab operations to monitor collectors and test_helpers status with the same infrastructure as all other M-Lab tools.
Future Architectural Changes
In the future, it may be nice to augment ooni / mlab-ns integration. For example, mlab-ns is designed to support different policies which may be useful to tools, such as geo-location of test_helpers.
The Simulator
This deployment architecture uses a simulator. While it is fully functional and useful for testing, it lacks security and robustness, so we want to emphasize *not to deploy this* to non-test environments.
Rationale
There are three rationales for this approach:
First, Least Authority didn't want to push through modifications to mlab-ns without first creating and testing a proof-of-concept.
Second, we didn't want to block our effort on M-Lab engineering effort, so this allows a clean division of labor.
Third, by creating and testing a working proof of concept we can help define the necessary changes to mlab-ns in a tightly scoped and concrete manner.
Security
This system is insecure because it does not use the M-Lab nagios system to gather data, and instead lets anyone paste any data they want into the simulator. Nagios integration is future work captured in this ticket:
Next Steps
Our contract with OTF proposes our next two milestones will focus on improving integration testing and unit test coverage. Our focus at that time was on test automation and documentation for diagnosing integration problems. Test automation has already been improved since that time, and we've accomplished most of the work for documentation:
https://github.com/m-lab-tools/ooni-support/issues/60
Therefore, we propose to focus on some outstanding issues which will improve mlab-ns integration while continuing not to block on, or interfere with, M-Lab operations as follows:
The primary change to mlab-ns will be to allow any tool to include arbitrary data per sliver to be gathered and distributed by mlab-ns. Ooni will use this to distribute data such as collector `.onion` addresses. The need for this change is discussed here:
This proposed change is documented in this ticket:
A secondary change is to implement `match=all` described in #47 above. It may not be necessary, so further investigation and testing are needed:
https://github.com/m-lab-tools/ooni-support/issues/56
Along with these changes to `mlab-ns`, we need trivial updates to our integration scripts to work with mlab-ns rather than the simulator:
- https://github.com/m-lab-tools/ooni-support/issues/10
- https://github.com/m-lab-tools/ooni-support/issues/11
Details & Links
Attached is a shortish overview of possible approaches to implement this integration. We've implemented a deployment with a mock mlab-ns (called mlab-ns-simulator) and the "arbitrary data" approach from the attached design document. The pull request is here:
Specific details about this pull request:
- A script for gathering necessary information from collectors and test helpers, then updating the mlab-ns-simulator.
- A script for updating a bouncer's state based on the mlab-ns-simulator.
- A cron script to update the bouncer on an hourly schedule.
- The mlab-ns-simulator itself, which approximates the production mlab-ns.
- `.init/` script changes to automatically launch the simulator and bouncer on `mlab1.nuq0t.measurement-lab.org`.
- Design documentation for mlab-ns integration (including this stepping stone architecture).
- Easy instructions to disable mlab-ns integration without any redeployment.
We also created a subset pull request that has bug fixes but no mlab-ns integration features:
Github Milestones
We split the mlab-ns-simulator deployment tasks out from the larger mlab-ns integration deployment. The mlab-ns-simulator milestone is at:
The full mlab-ns integration milestone:
As always, let us know if you have any feedback!