Dear OTF, Ooni, and M-Lab,
Summary
=======
We've hashed out a design to integrate Ooni with mlab-ns on the M-Lab deployment, and we've implemented a fully functional deployment that approximates this by simulating mlab-ns (this is attached). This completes Milestone D of our contract with OTF.
Design Goals
============
Our top goals for this integration are:
* It does not rely on any changes to upstream Ooni. (For example, probes still use a bouncer .onion, and the backend has stock bouncers, collectors, and test helpers running.)
* It can be disabled easily without redeploying the M-Lab backend. Our branch's ooni-support README.md has instructions to disable the integration, merely by editing a cron job to unset an ENABLED flag. There's no need to redeploy different versions of ooni-support.
* When enabled, it allows M-Lab operations to monitor collectors and test_helpers status with the same infrastructure as all other M-Lab tools.
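As a rough illustration of the second goal, here is a minimal Python sketch of how an update script run from cron might honor the ENABLED flag. Only the flag name comes from the README description above; reading it from the environment, and the helper names `integration_enabled` and `run_update`, are assumptions made for the example, not the actual ooni-support code.

```python
import os


def integration_enabled(environ=os.environ):
    """Return True only when ENABLED is set to a non-empty value.

    Assumption: the cron job exports ENABLED into the script's environment.
    """
    return bool(environ.get("ENABLED", "").strip())


def run_update(update, environ=os.environ):
    """Call update() only when the integration is enabled.

    `update` is a hypothetical callable standing in for the real
    bouncer/mlab-ns update step. Returns True if it was called.
    """
    if not integration_enabled(environ):
        # Integration switched off: do nothing, no redeployment needed.
        return False
    update()
    return True
```

With a guard like this, unsetting the flag in the cron entry turns every scheduled run into a no-op while the deployed code stays the same.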
Future Architectural Changes
----------------------------
In the future, it may be nice to augment ooni / mlab-ns integration. For example, mlab-ns is designed to support different policies which may be useful to tools, such as geo-location of test_helpers.
The Simulator
=============
This deployment architecture uses a simulator. While it is fully functional and useful for testing, it lacks security and robustness, so we want to emphasize that it should *not be deployed* to non-test environments.
Rationale
---------
There are three rationales for this approach:
First, Least Authority didn't want to push through modifications to mlab-ns without first creating and testing a proof-of-concept.
Second, we didn't want to block our effort on M-Lab engineering effort, so this allows a clean division of labor.
Third, by creating and testing a working proof of concept we can help define the necessary changes to mlab-ns in a tightly scoped and concrete manner.
Security
--------
This system is insecure because it does not use the M-Lab nagios system to gather data, and instead lets anyone paste any data they want into the simulator. Nagios integration is future work captured in this ticket:
* https://github.com/m-lab-tools/ooni-support/issues/10
Next Steps
==========
Our contract with OTF proposes that our next two milestones focus on improving integration testing and unit test coverage. When the contract was written, our focus was on test automation and documentation for diagnosing integration problems. Test automation has already been improved since that time, and we've accomplished most of the work for documentation:
https://github.com/m-lab-tools/ooni-support/issues/60
Therefore, we propose to focus on some outstanding issues which will improve mlab-ns integration while continuing not to block on, or interfere with, M-Lab operations as follows:
The primary change to mlab-ns will be to allow any tool to include arbitrary data per sliver to be gathered and distributed by mlab-ns. Ooni will use this to distribute data such as collector `.onion` addresses. The need for this change is discussed here:
* https://github.com/m-lab-tools/ooni-support/issues/4
This proposed change is documented in this ticket:
* https://github.com/m-lab-tools/ooni-support/issues/47
A secondary change is to implement `match=all` described in #47 above. It may not be necessary, so further investigation and testing are needed:
https://github.com/m-lab-tools/ooni-support/issues/56
Along with these changes to `mlab-ns`, we need trivial updates to our integration scripts to work with mlab-ns rather than the simulator:
* https://github.com/m-lab-tools/ooni-support/issues/10
* https://github.com/m-lab-tools/ooni-support/issues/11
Details & Links
===============
Attached is a shortish overview of possible approaches to implement this integration. We've implemented a deployment with a mock mlab-ns (called mlab-ns-simulator) and the "arbitrary data" approach from the attached design document. The pull request is here:
* https://github.com/m-lab-tools/ooni-support/pull/59
Specific details about this pull request:
* A script for gathering necessary information from collectors and test helpers, then updating the mlab-ns-simulator.
* A script for updating a bouncer's state based on the mlab-ns-simulator.
* A cron script to update the bouncer on an hourly schedule.
* The mlab-ns-simulator itself, which approximates the production mlab-ns.
* `.init/` script changes to automatically launch the simulator and bouncer on `mlab1.nuq0t.measurement-lab.org`.
* Design documentation for mlab-ns integration (including this stepping stone architecture).
* Easy instructions to disable mlab-ns integration without any redeployment.
We also created a subset pull request that has bug fixes but no mlab-ns integration features:
* https://github.com/m-lab-tools/ooni-support/pull/58
Github Milestones
-----------------
We split the mlab-ns-simulator deployment tasks out from the larger mlab-ns integration deployment. The mlab-ns-simulator milestone is at:
* https://github.com/m-lab-tools/ooni-support/issues?q=is%3Aissue+milestone%3A...
The full mlab-ns integration milestone:
* https://github.com/m-lab-tools/ooni-support/issues?q=is%3Aissue+milestone%3A...
As always, let us know if you have any feedback!
To follow-up on Nathan's excellent report, I thought I could shed some light on the status of the OONI integration with MLab NS:
1. Our work is temporarily blocked due to an operational issue that should be resolved imminently.
2. The integration that Nathan mentioned between Nagios and MLab NS is incredibly promising. As mentioned previously, MLab NS captures its information from the MLab nagios instance using a "baseList" script that runs on our monitoring server. As it functions now, MLab NS is filled with information based on the output of a baseList call that looks like:
http://nagios.measurementlab.net/baseList?show_state=1&service_name=ndt
which has output like:
    ndt.iupui.mlab1.akl01.measurement-lab.org/ndt 0 1
    ndt.iupui.mlab2.akl01.measurement-lab.org/ndt 0 1
    ndt.iupui.mlab3.akl01.measurement-lab.org/ndt 0 1
    ndt.iupui.mlab1.ams01.measurement-lab.org/ndt 0 1
    ndt.iupui.mlab2.ams01.measurement-lab.org/ndt 0 1
    ndt.iupui.mlab3.ams01.measurement-lab.org/ndt 0 1
    ndt.iupui.mlab1.ams02.measurement-lab.org/ndt 0 1
    ndt.iupui.mlab2.ams02.measurement-lab.org/ndt 0 1
    ...
The 0s and 1s are flags indicating whether there is a "problem" with the slice or not. I.e., they are backward.
baseList takes an additional parameter known as plugin_output. We will update MLab NS to call baseList with this additional parameter. The call will look like:
http://nagios.measurementlab.net/baseList?show_state=1&service_name=ndt&...
which has output like:
    ndt.iupui.mlab1.akl01.measurement-lab.org/ndt 0 1 TCP OK - 0.136 second response time on ndt.iupui.mlab1.akl01.measurement-lab.org port 3001
    ndt.iupui.mlab2.akl01.measurement-lab.org/ndt 0 1 TCP OK - 0.136 second response time on ndt.iupui.mlab2.akl01.measurement-lab.org port 3001
    ndt.iupui.mlab3.akl01.measurement-lab.org/ndt 0 1 TCP OK - 0.135 second response time on ndt.iupui.mlab3.akl01.measurement-lab.org port 3001
    ndt.iupui.mlab1.ams01.measurement-lab.org/ndt 0 1 TCP OK - 0.145 second response time on ndt.iupui.mlab1.ams01.measurement-lab.org port 3001
    ...
The extra data is the output from the plugin that monitors whether the particular service is online. In this example, we are monitoring ndt and the plugin reports whether a TCP connection is possible to port 3001 (NDT's port).
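To make the record layout concrete, here is a minimal Python sketch of how a consumer might split baseList output into fields. It assumes each record is one line of the form `fqdn/tool flag flag plugin-output`, as in the samples above; the `SliceStatus` type and `parse_baselist` function are names invented for this example, and the two flags are kept as opaque strings because their exact meaning is not spelled out here.

```python
from typing import List, NamedTuple, Tuple


class SliceStatus(NamedTuple):
    fqdn: str
    tool: str
    flags: Tuple[str, str]   # the two status columns, left uninterpreted
    plugin_output: str       # empty when plugin output was not requested


def parse_baselist(text: str) -> List[SliceStatus]:
    """Parse baseList output, one record per line (assumed layout)."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split off the first three columns; the rest is free-form text.
        parts = line.split(None, 3)
        if len(parts) < 3:
            continue  # skip lines that do not match the assumed layout
        host_tool, flag_a, flag_b = parts[:3]
        fqdn, _, tool = host_tool.partition("/")
        plugin_output = parts[3] if len(parts) == 4 else ""
        records.append(SliceStatus(fqdn, tool, (flag_a, flag_b), plugin_output))
    return records


SAMPLE = """\
ndt.iupui.mlab1.akl01.measurement-lab.org/ndt 0 1 TCP OK - 0.136 second response time on ndt.iupui.mlab1.akl01.measurement-lab.org port 3001
ndt.iupui.mlab2.akl01.measurement-lab.org/ndt 0 1
"""
```

Calling `parse_baselist(SAMPLE)` yields two records, the second with an empty `plugin_output`, so the same parser handles calls made with and without the extra parameter.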
So, the integration point between nagios, MLab NS and OONI will look like this:
The nagios plugin written by LA/OONI will use return codes to signal whether the OONI service is running. That return value will be the 0s and 1s in baseList output. The "string" output from the plugin will be the information that needs to be captured in MLab NS and returned with OONI queries. Based on pull requests, I suspect the resulting response to a baseList call like
http://nagios.measurementlab.net/baseList?show_state=1&service_name=ooni...
will be something like
    ooni.mlab.mlab1.akl01.measurement-lab.org/ndt 0 1 'collector_onion': 'testfakenotreal.onion'
    ooni.mlab.mlab1.akl02.measurement-lab.org/ndt 0 1 'collector_onion': 'testfakenotreal.onion'
    ...
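A plugin along these lines could be quite small. The sketch below is only an illustration in Python: the exit codes 0 (OK) and 2 (CRITICAL) are the standard Nagios plugin convention, but the function names, the way the collector address is obtained, and the exact output string are assumptions. How baseList turns the exit code into its flag columns is not shown.

```python
import sys

# Standard Nagios plugin exit codes.
NAGIOS_OK = 0
NAGIOS_CRITICAL = 2


def build_status(collector_onion, service_running):
    """Return (exit_code, one_line_output) for the OONI check.

    Both arguments are hypothetical inputs; a real plugin would have to
    discover them from the running collector.
    """
    if not service_running or not collector_onion:
        return NAGIOS_CRITICAL, "OONI CRITICAL - collector not reachable"
    # The single output line carries the data meant to flow on to mlab-ns.
    return NAGIOS_OK, "'collector_onion': '%s'" % collector_onion


def run_plugin(collector_onion, service_running, out=sys.stdout):
    """Print the status line and return the exit code for the caller."""
    code, line = build_status(collector_onion, service_running)
    out.write(line + "\n")
    return code

# A real plugin script would end with something like:
#     sys.exit(run_plugin(onion_address, is_running))
```

Keeping the output to a single line matters because baseList appears to emit one record per line.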
The 'collector_onion':'testfakenotreal.onion' string will make its way through MLab NS and get spit out as tool_extra from a query like:
http://mlab-ns.appspot.com/ooni
that gives something like:
{"city": "Washington", "url": "http://ndt.iupui.mlab1.iad01.measurement-lab.org:7123", "ip": ["216.156.197.139"], "site": "iad01", "fqdn": "ndt.iupui.mlab1.iad01.measurement-lab.org", "country": "US", "port": "3001", "tool_extra": "testfakenotreal.onion"}
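On the consuming side, pulling the collector address back out of such a response could look roughly like the Python sketch below. The sample body mirrors the response above, with `tool_extra` written as a valid JSON string; `collector_from_response` is a hypothetical helper, and the assumption that `tool_extra` holds a bare `.onion` address is taken from this example only.

```python
import json
from typing import Optional

# Sample body modelled on the response above (tool_extra as a JSON string).
SAMPLE_RESPONSE = (
    '{"city": "Washington", '
    '"url": "http://ndt.iupui.mlab1.iad01.measurement-lab.org:7123", '
    '"ip": ["216.156.197.139"], "site": "iad01", '
    '"fqdn": "ndt.iupui.mlab1.iad01.measurement-lab.org", '
    '"country": "US", "port": "3001", '
    '"tool_extra": "testfakenotreal.onion"}'
)


def collector_from_response(body: str) -> Optional[str]:
    """Return the collector .onion address from a response, if present."""
    record = json.loads(body)
    extra = record.get("tool_extra")
    # Be defensive: other tools may put unrelated text in tool_extra.
    if isinstance(extra, str) and extra.endswith(".onion"):
        return extra
    return None
```

The defensive check matters because `tool_extra` is shared by every tool, so a consumer cannot assume its contents are always an address.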
The TL;DR is that we are well-positioned to make these changes to MLab NS that will not require many (any?) fundamental changes to MLab NS or our monitoring infrastructure.
Does this seem reasonable?
3. As Nathan mentioned, their integration with MLab NS will require a query type that is able to list all available answers. I mentioned in comments to a ticket that we have something similar to what they need. However, I realize now that that approach will not work.
However, there is a better option. MLab NS already has a "thing" at
http://mlab-nstesting.appspot.com/admin/map/ipv4/all
that generates a map of the status of all the services and places them on a map. We will modify that by parameterizing the output to allow for json responses which will exactly satisfy OONI's needs.
Does this seem reasonable?
Summary:
I think that we are on the brink of making this full integration happen. We will keep everyone posted as we move forward.
Feedback welcome, obviously!
Will
On 08/01/2014 03:25 PM, Nathan Wilcox wrote:
On Fri, Aug 1, 2014 at 3:08 PM, Will Hawkins <hawkinsw@opentechinstitute.org> wrote:
To follow-up on Nathan's excellent report, I thought I could shed some light on the status of the OONI integration with MLab NS:
- Our work is temporarily blocked due to an operational issue that should be resolved imminently.
Good to know.
- The integration that Nathan mentioned between Nagios and MLab NS is incredibly promising. As mentioned previously, MLab NS captures its information from the MLab nagios instance using a "baseList" script that runs on our monitoring server. As it functions now, MLab NS is filled with information based on the output of a baseList call that looks like:
http://nagios.measurementlab.net/baseList?show_state=1&service_name=ndt
which has output like:
    ndt.iupui.mlab1.akl01.measurement-lab.org/ndt 0 1
    ndt.iupui.mlab2.akl01.measurement-lab.org/ndt 0 1
    ndt.iupui.mlab3.akl01.measurement-lab.org/ndt 0 1
    ndt.iupui.mlab1.ams01.measurement-lab.org/ndt 0 1
    ndt.iupui.mlab2.ams01.measurement-lab.org/ndt 0 1
    ndt.iupui.mlab3.ams01.measurement-lab.org/ndt 0 1
    ndt.iupui.mlab1.ams02.measurement-lab.org/ndt 0 1
    ndt.iupui.mlab2.ams02.measurement-lab.org/ndt 0 1
    ...
The 0s and 1s are flags indicating whether there is a "problem" with the slice or not. I.e., they are backward.
baseList takes an additional parameter known as plugin_output. We will update MLab NS to call baseList with this additional parameter. The call will look like:
http://nagios.measurementlab.net/baseList?show_state=1&service_name=ndt&...
which has output like:
    ndt.iupui.mlab1.akl01.measurement-lab.org/ndt 0 1 TCP OK - 0.136 second response time on ndt.iupui.mlab1.akl01.measurement-lab.org port 3001
    ndt.iupui.mlab2.akl01.measurement-lab.org/ndt 0 1 TCP OK - 0.136 second response time on ndt.iupui.mlab2.akl01.measurement-lab.org port 3001
    ndt.iupui.mlab3.akl01.measurement-lab.org/ndt 0 1 TCP OK - 0.135 second response time on ndt.iupui.mlab3.akl01.measurement-lab.org port 3001
    ndt.iupui.mlab1.ams01.measurement-lab.org/ndt 0 1 TCP OK - 0.145 second response time on ndt.iupui.mlab1.ams01.measurement-lab.org port 3001
    ...
The extra data is the output from the plugin that monitors whether the particular service is online. In this example, we are monitoring ndt and the plugin reports whether a TCP connection is possible to port 3001 (NDT's port).
So, the integration point between nagios, MLab NS and OONI will look like this:
The nagios plugin written by LA/OONI will use return codes to signal whether the OONI service is running. That return value will be the 0s and 1s in baseList output. The "string" output from the plugin will be the information that needs to be captured in MLab NS and returned with OONI queries. Based on pull requests, I suspect the resulting response to a baseList call like
http://nagios.measurementlab.net/baseList?show_state=1&service_name=ooni...
will be something like
    ooni.mlab.mlab1.akl01.measurement-lab.org/ndt 0 1 'collector_onion': 'testfakenotreal.onion'
    ooni.mlab.mlab1.akl02.measurement-lab.org/ndt 0 1 'collector_onion': 'testfakenotreal.onion'
    ...
This sounds close to what we're imagining. BTW- we're tracking the Ooni side of this here:
https://github.com/m-lab-tools/ooni-support/issues/10
Could you link in a reference to the nagios plugin interface in that ticket #10 to help define its closure criteria?
Is the syntax for the plugin-specific detail just anything up to the next newline? We'd probably want to encode this in JSON to ensure any newlines or other weirdness doesn't break this format. Also, I just picked JSON because I saw that appengine's ndb has a field type for that and I figured it could be a generally useful format for any tool. Another approach is to have a blob property in mlab-ns.
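To illustrate the JSON suggestion, here is a minimal Python sketch showing that a compact JSON encoding keeps the detail on one line even when a value contains a newline. The helper names `encode_detail` and `decode_detail` and the `note` field are made up for the example; nothing here is existing mlab-ns or plugin code.

```python
import json


def encode_detail(detail: dict) -> str:
    """Encode plugin detail as one line of JSON.

    json.dumps without an indent argument escapes embedded newlines
    as \\n, so the result never spans multiple lines.
    """
    return json.dumps(detail, sort_keys=True)


def decode_detail(line: str) -> dict:
    """Inverse of encode_detail, for the consuming side."""
    return json.loads(line)


# Example: a value with an embedded newline still round-trips.
detail = {"collector_onion": "testfakenotreal.onion", "note": "line one\nline two"}
line = encode_detail(detail)
```

Because the encoded form is plain text, it would fit either a JSON-typed field or an opaque blob property on the mlab-ns side.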
Note, we closed a ticket for the mlab-ns-simulator which was to "approximate" the nagios pipeline, but it's not realistic at all:
https://github.com/m-lab-tools/ooni-support/issues/48
The 'collector_onion':'testfakenotreal.onion' string will make its way through MLab NS and get spit out as tool_extra from a query like:
http://mlab-ns.appspot.com/ooni
that gives something like:
{"city": "Washington", "url": "http://ndt.iupui.mlab1.iad01.measurement-lab.org:7123", "ip": ["216.156.197.139"], "site": "iad01", "fqdn": "ndt.iupui.mlab1.iad01.measurement-lab.org", "country": "US", "port": "3001", "tool_extra": "testfakenotreal.onion"}
That sounds perfect. Is there a ticket somewhere for the link between nagios and mlab-ns? I'd like to keep an eye on that.
How about another ticket for including the "tool_extra" field into the mlab-ns datastore and returning it in queries? I sketched out what these changes might look like here:
https://github.com/m-lab-tools/ooni-support/issues/47
The TL;DR is that we are well-positioned to make these changes to MLab NS that will not require many (any?) fundamental changes to MLab NS or our monitoring infrastructure.
Does this seem reasonable?
Yep. Do you have some timeline estimate for the two changes of incorporating extra details in the nagios -> mlab-ns pipeline, and updating mlab-ns to store and return the "tool_extra" field?
- As Nathan mentioned, their integration with MLab NS will require a query type that is able to list all available answers. I mentioned in comments to a ticket that we have something similar to what they need. However, I realize now that that approach will not work.
However, there is a better option. MLab NS already has a "thing" at
http://mlab-nstesting.appspot.com/admin/map/ipv4/all
that generates a map of the status of all the services and places them on a map. We will modify that by parameterizing the output to allow for json responses which will exactly satisfy OONI's needs.
Does this seem reasonable?
Yes. Is that much work?
For the first pass deployment, Ooni's needs will be "just return everything" or even "just return a random subset that fits into one response". Later releases might want to be clever about geo-location of test_helpers or other policies.
In terms of collectors, the geo location should not matter, since they are Tor hidden services. (It's kind of funny to have a map of where these hidden services will live, something we may want to change later.)
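For what it's worth, the "random subset that fits into one response" policy could be as simple as the Python sketch below. The byte budget `max_bytes`, the record shape, and the function name `pick_subset` are all assumptions for illustration; mlab-ns may impose different limits or none at all.

```python
import json
import random


def pick_subset(records, max_bytes, rng=random):
    """Return a random subset of records whose JSON encoding fits max_bytes.

    With a large enough budget this degenerates to "just return everything".
    """
    shuffled = list(records)
    rng.shuffle(shuffled)
    chosen = []
    for rec in shuffled:
        candidate = chosen + [rec]
        # Measure the encoded size of the response we would send.
        if len(json.dumps(candidate).encode("utf-8")) > max_bytes:
            break
        chosen = candidate
    return chosen
```

Passing a seeded `random.Random` instance as `rng` makes the selection reproducible in tests, while the default uses the module-level generator.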
Summary:
I think that we are on the brink of making this full integration happen. We will keep everyone posted as we move forward.
Feedback welcome, obviously!
Will
On 08/04/2014 03:24 PM, Nathan Wilcox wrote:
On Fri, Aug 1, 2014 at 3:08 PM, Will Hawkins <hawkinsw@opentechinstitute.org> wrote:
To follow-up on Nathan's excellent report, I thought I could shed some light on the status of the OONI integration with MLab NS:
- Our work is temporarily blocked due to an operational issue that should be resolved imminently.
Good to know.
We are officially unblocked.
- The integration that Nathan mentioned between Nagios and MLab NS is incredibly promising. As mentioned previously, MLab NS captures its information from the MLab nagios instance using a "baseList" script that runs on our monitoring server. As it functions now, MLab NS is filled with information based on the output of a baseList call that looks like:
http://nagios.measurementlab.net/baseList?show_state=1&service_name=ndt
which has output like:
    ndt.iupui.mlab1.akl01.measurement-lab.org/ndt 0 1
    ndt.iupui.mlab2.akl01.measurement-lab.org/ndt 0 1
    ndt.iupui.mlab3.akl01.measurement-lab.org/ndt 0 1
    ndt.iupui.mlab1.ams01.measurement-lab.org/ndt 0 1
    ndt.iupui.mlab2.ams01.measurement-lab.org/ndt 0 1
    ndt.iupui.mlab3.ams01.measurement-lab.org/ndt 0 1
    ndt.iupui.mlab1.ams02.measurement-lab.org/ndt 0 1
    ndt.iupui.mlab2.ams02.measurement-lab.org/ndt 0 1
    ...
The 0s and 1s are flags indicating whether there is a "problem" with the slice or not. I.e., they are backward.
baseList takes an additional parameter known as plugin_output. We will update MLab NS to call baseList with this additional parameter. The call will look like:
http://nagios.measurementlab.net/baseList?show_state=1&service_name=ndt&...
which has output like:
    ndt.iupui.mlab1.akl01.measurement-lab.org/ndt 0 1 TCP OK - 0.136 second response time on ndt.iupui.mlab1.akl01.measurement-lab.org port 3001
    ndt.iupui.mlab2.akl01.measurement-lab.org/ndt 0 1 TCP OK - 0.136 second response time on ndt.iupui.mlab2.akl01.measurement-lab.org port 3001
    ndt.iupui.mlab3.akl01.measurement-lab.org/ndt 0 1 TCP OK - 0.135 second response time on ndt.iupui.mlab3.akl01.measurement-lab.org port 3001
    ndt.iupui.mlab1.ams01.measurement-lab.org/ndt 0 1 TCP OK - 0.145 second response time on ndt.iupui.mlab1.ams01.measurement-lab.org port 3001
    ...
The extra data is the output from the plugin that monitors whether the particular service is online. In this example, we are monitoring ndt and the plugin reports whether a TCP connection is possible to port 3001 (NDT's port).
So, the integration point between nagios, MLab NS and OONI will look like this:
The nagios plugin written by LA/OONI will use return codes to signal whether the OONI service is running. That return value will be the 0s and 1s in baseList output. The "string" output from the plugin will be the information that needs to be captured in MLab NS and returned with OONI queries. Based on pull requests, I suspect the resulting response to a baseList call like
http://nagios.measurementlab.net/baseList?show_state=1&service_name=ooni...
will be something like
    ooni.mlab.mlab1.akl01.measurement-lab.org/ndt 0 1 'collector_onion': 'testfakenotreal.onion'
    ooni.mlab.mlab1.akl02.measurement-lab.org/ndt 0 1 'collector_onion': 'testfakenotreal.onion'
    ...
This sounds close to what we're imagining. BTW- we're tracking the Ooni side of this here:
https://github.com/m-lab-tools/ooni-support/issues/10
Could you link in a reference to the nagios plugin interface in that ticket #10 to help define its closure criteria?
Is the syntax for the plugin-specific detail just anything up to the next newline? We'd probably want to encode this in JSON to ensure any newlines or other weirdness doesn't break this format. Also, I just picked JSON because I saw that appengine's ndb has a field type for that and I figured it could be a generally useful format for any tool. Another approach is to have a blob property in mlab-ns.
Note, we closed a ticket for the mlab-ns-simulator which was to "approximate" the nagios pipeline, but it's not realistic at all:
https://github.com/m-lab-tools/ooni-support/issues/48
The 'collector_onion':'testfakenotreal.onion' string will make its way through MLab NS and get spit out as tool_extra from a query like:
http://mlab-ns.appspot.com/ooni
that gives something like:
{"city": "Washington", "url": "http://ndt.iupui.mlab1.iad01.measurement-lab.org:7123", "ip": ["216.156.197.139"], "site": "iad01", "fqdn": "ndt.iupui.mlab1.iad01.measurement-lab.org", "country": "US", "port": "3001", "tool_extra": "testfakenotreal.onion"}
That sounds perfect. Is there a ticket somewhere for the link between nagios and mlab-ns? I'd like to keep an eye on that.
How about another ticket for including the "tool_extra" field into the mlab-ns datastore and returning it in queries? I sketched out what these changes might look like here:
https://github.com/m-lab-tools/ooni-support/issues/47
I will dig into the specific tickets and update them appropriately, but I wanted you to know that we now have "tool_extra" support in the MLab NS testing instance:
http://mlab-nstesting.appspot.com/ndt
gives
{"city": "Washington_DC", "url": "http://ndt.iupui.mlab2.iad01.measurement-lab.org:7123", "ip": ["216.156.197.152"], "fqdn": "ndt.iupui.mlab2.iad01.measurement-lab.org", "site": "iad01", "country": "US", "tool_extra": "1 TCP OK - 0.075 second response time on ndt.iupui.mlab2.iad01.measurement-lab.org port 3001"}
You can see the commit here: https://code.google.com/r/hawkinsw-sieve/source/detail?name=ooni&r=2ab4a...
The TL;DR is that we are well-positioned to make these changes to MLab NS that will not require many (any?) fundamental changes to MLab NS or our monitoring infrastructure.
Does this seem reasonable?
Yep. Do you have some timeline estimate for the two changes of incorporating extra details in the nagios -> mlab-ns pipeline, and updating mlab-ns to store and return the "tool_extra" field?
See above. Sliver tools that expose plugin output will be stored in tool_extra and returned with queries.
- As Nathan mentioned, their integration with MLab NS will require a query type that is able to list all available answers. I mentioned in comments to a ticket that we have something similar to what they need. However, I realize now that that approach will not work.
However, there is a better option. MLab NS already has a "thing" at
http://mlab-nstesting.appspot.com/admin/map/ipv4/all
that generates a map of the status of all the services and places them on a map. We will modify that by parameterizing the output to allow for json responses which will exactly satisfy OONI's needs.
Does this seem reasonable?
Yes. Is that much work?
I am moving on to this now and will keep you posted :-)
Thanks for your responses. I will keep everyone up to date as work continues!
Will
For the first pass deployment, Ooni's needs will be "just return everything" or even "just return a random subset that fits into one response". Later releases might want to be clever about geo-location of test_helpers or other policies.
In terms of collectors, the geo location should not matter, since they are Tor hidden services. (It's kind of funny to have a map of where these hidden services will live, something we may want to change later.)
Summary:
I think that we are on the brink of making this full integration happen. We will keep everyone posted as we move forward.
Feedback welcome, obviously!
Will
On 08/01/2014 03:25 PM, Nathan Wilcox wrote:
Dear OTF, Ooni, and M-Lab,
Summary
We've hashed out a design to integrate Ooni with mlab-ns on the M-Lab deployment, and we've implemented a fully functional deployment that approximates this by simulating mlab-ns (this is attached). This completes Milestone D of our contract with OTF.
Design Goals
Our top goals for this integration are:
It does not rely on any changes to upstream Ooni. (For example, probes still use a bouncer .onion, and the backend has stock bouncers, collectors, and test helpers running.)
It can be disabled easily without redeploying the M-Lab backend. Our branch's ooni-support README.md has instructions to disable the integration, merely by editing a cron job to unset an ENABLED flag. There's no need to redeploy different versions of ooni-support.
When enabled, it allows M-Lab operations to monitor collectors and test_helpers status with the same infrastructure as all other M-Lab tools.
Future Architectural Changes
In the future, it may be nice to augment ooni / mlab-ns integration. For example, mlab-ns is designed to support different policies which may be useful to tools, such as geo-location of test_helpers.
The Simulator
This deployment architecture uses a simulator. While it is fully functional and useful for testing, it lacks security and robustness, so we want to emphasize *not to deploy this* to non-test environments.
Rationale
There are three rationales for this approach:
First, Least Authority didn't want to push through modifications to mlab-ns without first creating and testing a proof-of-concept.
Second, we didn't want to block our effort on M-Lab engineering effort, so this allows a clean division of labor.
Third, by creating and testing a working proof of concept we can help define the necessary changes to mlab-ns in a tightly scoped and concrete manner.
Security
This system is insecure because it does not use the M-Lab nagios system to gather data, and instead lets anyone paste any data they want into the simulator. Nagios integration is future work captured in this ticket:
Next Steps
Our contract with OTF proposes our next two milestones will focus on improving integration testing and unit test coverage. Our focus at that time was on test automation and documentation for diagnosing integration problems. Test automation has already been improved since that time, and we've accomplished most of the work for documentation:
https://github.com/m-lab-tools/ooni-support/issues/60
Therefore, we propose to focus on some outstanding issues which will improve mlab-ns integration while continuing not to block on, or interfere with, M-Lab operations as follows:
The primary change to mlab-ns will be to allow any tool to include arbitrary data per sliver to be gathered and distributed by mlab-ns. Ooni will use this to distribute data such as collector `.onion` addresses. The need for this change is discussed here:
This proposed change is documented in this ticket:
A secondary change is to implement `match=all` described in #47 above. It may not be necessary, so further investigation and testing are needed:
https://github.com/m-lab-tools/ooni-support/issues/56
Along with these changes to `mlab-ns`, we need trivial updates to our integration scripts to work with mlab-ns rather than the simulator:
- https://github.com/m-lab-tools/ooni-support/issues/10
- https://github.com/m-lab-tools/ooni-support/issues/11
Details & Links
Attached is a shortish overview of possible approaches to implement this integration. We've implemented a deployment with a mock mlab-ns (called mlab-ns-simulator) and the "arbitrary data" approach from the attached design document. The pull request is here:
Specific details about this pull request:
- A script for gathering necessary information from collectors and testhelpers, then updating the mlab-ns-simulator.
- A script for updating a bouncer's state based on the mlab-ns-simulator.
- A cron script to update the bouncer on an hourly schedule.
- The mlab-ns-simulator itself, which approximates the production mlab-ns.
- `.init/` script changes to automatically launch the simulator and bouncer on `mlab1.nuq0t.measurement-lab.org`.
- Design documentation for mlab-ns integration (including this stepping stone architecture).
- Easy instructions to disable mlab-ns integration without any redeployment.
We also created a subset pull request that has bug fixes but no mlab-ns integration features:
Github Milestones
We split the mlab-ns-simulator deployment tasks out from the larger mlab-ns integration deployment. The mlab-ns-simulator milestone is at:
The full mlab-ns integration milestone:
As always, let us know if you have any feedback!
PS: I trimmed the CC line since we were getting into the weeds and I didn't want to bother people at RFA. If it's a good idea to have them in the loop, feel free to add them back!
On 08/04/2014 07:06 PM, Will Hawkins wrote:
On 08/04/2014 03:24 PM, Nathan Wilcox wrote:
On Fri, Aug 1, 2014 at 3:08 PM, Will Hawkins hawkinsw@opentechinstitute.org wrote:
To follow-up on Nathan's excellent report, I thought I could shed some light on the status of the OONI integration with MLab NS:
- Our work is temporarily blocked due to an operational issue that should be resolved imminently.
Good to know.
We are officially unblocked.
- The integration that Nathan mentioned between Nagios and MLab NS is
incredibly promising. As mentioned previously, MLab NS captures its information from the MLab nagios instance using a "baseList" script that runs on our monitoring server. As it functions now, MLab NS is filled with information based on the output of a baseList call that looks like:
http://nagios.measurementlab.net/baseList?show_state=1&service_name=ndt
which has output like:
ndt.iupui.mlab1.akl01.measurement-lab.org/ndt 0 1
ndt.iupui.mlab2.akl01.measurement-lab.org/ndt 0 1
ndt.iupui.mlab3.akl01.measurement-lab.org/ndt 0 1
ndt.iupui.mlab1.ams01.measurement-lab.org/ndt 0 1
ndt.iupui.mlab2.ams01.measurement-lab.org/ndt 0 1
ndt.iupui.mlab3.ams01.measurement-lab.org/ndt 0 1
ndt.iupui.mlab1.ams02.measurement-lab.org/ndt 0 1
ndt.iupui.mlab2.ams02.measurement-lab.org/ndt 0 1
...
The 0s and 1s are flags indicating whether there is a "problem" with the slice or not. I.e., they are backward.
baseList takes an additional parameter known as plugin_output. We will update MLab NS to call baseList with this additional parameter. The call will look like:
http://nagios.measurementlab.net/baseList?show_state=1&service_name=ndt&...
which has output like:
ndt.iupui.mlab1.akl01.measurement-lab.org/ndt 0 1 TCP OK - 0.136 second response time on ndt.iupui.mlab1.akl01.measurement-lab.org port 3001
ndt.iupui.mlab2.akl01.measurement-lab.org/ndt 0 1 TCP OK - 0.136 second response time on ndt.iupui.mlab2.akl01.measurement-lab.org port 3001
ndt.iupui.mlab3.akl01.measurement-lab.org/ndt 0 1 TCP OK - 0.135 second response time on ndt.iupui.mlab3.akl01.measurement-lab.org port 3001
ndt.iupui.mlab1.ams01.measurement-lab.org/ndt 0 1 TCP OK - 0.145 second response time on ndt.iupui.mlab1.ams01.measurement-lab.org port 3001
...
The extra data is the output from the plugin that monitors whether the particular service is online. In this example, we are monitoring ndt and the plugin reports whether a TCP connection is possible to port 3001 (NDT's port).
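A minimal sketch of how a consumer might parse this output, assuming one record per line in the form `<fqdn>/<service> <state> <state_type> [plugin output]`. The names `state` and `state_type`, the reading of a `0` in the first column as "no problem", and the function name `parse_baselist` are assumptions for illustration; they are not taken from the baseList or mlab-ns code.

```python
def parse_baselist(text):
    """Parse baseList output into a list of dicts, one per sliver."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line == "...":
            continue
        # Split at most three times so the plugin output keeps its spaces.
        parts = line.split(" ", 3)
        if len(parts) < 3:
            continue  # skip malformed lines rather than guess
        target, state, state_type = parts[:3]
        fqdn, _, service = target.partition("/")
        entries.append({
            "fqdn": fqdn,
            "service": service,
            # Assumption: the first flag is 0 when there is no problem.
            "online": state == "0",
            "state_type": state_type,
            # Empty when baseList was called without plugin output.
            "tool_extra": parts[3] if len(parts) > 3 else "",
        })
    return entries


sample = (
    "ndt.iupui.mlab1.akl01.measurement-lab.org/ndt 0 1 TCP OK - 0.136 second "
    "response time on ndt.iupui.mlab1.akl01.measurement-lab.org port 3001\n"
    "ndt.iupui.mlab2.akl01.measurement-lab.org/ndt 0 1\n"
)
for entry in parse_baselist(sample):
    print(entry["fqdn"], entry["online"], repr(entry["tool_extra"]))
```

Because the plugin output is treated as everything after the third field, this only works if that output never contains a newline.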
So, the integration point between nagios, MLab NS and OONI will look like this:
The nagios plugin written by LA/OONI will use return codes to signal whether the OONI service is running. That return value will be the 0s and 1s in baseList output. The "string" output from the plugin will be the information that needs to be captured in MLab NS and returned with OONI queries. Based on pull requests, I suspect the resulting response to a baseList call like
http://nagios.measurementlab.net/baseList?show_state=1&service_name=ooni...
will be something like
ooni.mlab.mlab1.akl01.measurement-lab.org/ndt 0 1 'collector_onion': 'testfakenotreal.onion'
ooni.mlab.mlab1.akl02.measurement-lab.org/ndt 0 1 'collector_onion': 'testfakenotreal.onion'
...
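The plugin side of this could be sketched roughly as follows. Nagios plugins conventionally report status through their exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and print one line of status text. The function names `check_ooni` and `main`, and the idea that the collector address is simply passed in, are hypothetical; how the real plugin discovers the address is not specified in this thread.

```python
# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_ooni(collector_onion):
    """Return (exit_code, status_line) for a hypothetical OONI health check."""
    if collector_onion and collector_onion.endswith(".onion"):
        # Mirror the key/value text shown in the example baseList output.
        return OK, "'collector_onion': '%s'" % collector_onion
    return CRITICAL, "CRITICAL - no collector .onion address found"


def main(collector_onion):
    """Print the one-line status text and return the exit code."""
    code, line = check_ooni(collector_onion)
    print(line)
    return code
```

A wrapper script would end with something like `sys.exit(main(address))` so that Nagios sees the return code.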
This sounds close to what we're imagining. BTW- we're tracking the Ooni side of this here:
https://github.com/m-lab-tools/ooni-support/issues/10
Could you link in a reference to the nagios plugin interface in that ticket #10 to help define its closure criteria?
Is the syntax for the plugin-specific detail just anything up to the next newline? We'd probably want to encode this in JSON to ensure any newlines or other weirdness doesn't break this format. Also, I just picked JSON because I saw that appengine's ndb has a field type for that and I figured it could be a generally useful format for any tool. Another approach is to have a blob property in mlab-ns.
Note, we closed a ticket for the mlab-ns-simulator which was to "approximate" the nagios pipeline, but it's not realistic at all:
https://github.com/m-lab-tools/ooni-support/issues/48
The 'collector_onion':'testfakenotreal.onion' string will make its way through MLab NS and get spit out as tool_extra from a query like:
http://mlab-ns.appspot.com/ooni
that gives something like:
{"city": "Washington", "url": "http://ndt.iupui.mlab1.iad01.measurement-lab.org:7123", "ip": ["216.156.197.139"], "site": "iad01", "fqdn": "ndt.iupui.mlab1.iad01.measurement-lab.org", "country": "US", "port": "3001", "tool_extra": "testfakenotreal.onion" }
That sounds perfect. Is there a ticket somewhere for the link between nagios and mlab-ns? I'd like to keep an eye on that.
How about another ticket for including the "tool_extra" field into the mlab-ns datastore and returning it in queries? I sketched out what these changes might look like here:
I will dig into the specific tickets and update them appropriately, but I wanted you to know that we now have "tool_extra" support in the MLab NS testing instance:
http://mlab-nstesting.appspot.com/ndt
gives
{"city": "Washington_DC", "url": "http://ndt.iupui.mlab2.iad01.measurement-lab.org:7123", "ip": ["216.156.197.152"], "fqdn": "ndt.iupui.mlab2.iad01.measurement-lab.org", "site": "iad01", "country": "US", "tool_extra": "1 TCP OK - 0.075 second response time on ndt.iupui.mlab2.iad01.measurement-lab.org port 3001"}
You can see the commit here: https://code.google.com/r/hawkinsw-sieve/source/detail?name=ooni&r=2ab4a...
The TL;DR is that we are well-positioned to make these changes to MLab NS that will not require many (any?) fundamental changes to MLab NS or our monitoring infrastructure.
Does this seem reasonable?
Yep. Do you have some timeline estimate for the two changes of incorporating extra details in the nagios -> mlab-ns pipeline, and updating mlab-ns to store and return the "tool_extra" field?
See above. Sliver tools that expose plugin output will be stored in tool_extra and returned with queries.
- As Nathan mentioned, their integration with MLab NS will require a
query type that is able to list all available answers. I mentioned in comments to a ticket that we have something similar to what they need. However, I realize now that that approach will not work.
However, there is a better option. MLab NS already has a "thing" at
http://mlab-nstesting.appspot.com/admin/map/ipv4/all
that generates a map showing the status of all the services. We will parameterize its output to allow JSON responses, which will exactly satisfy OONI's needs.
Does this seem reasonable?
Yes. Is that much work?
I am moving on to this now and will keep you posted :-)
This is implemented. You can see that
http://mlab-nstesting.appspot.com/admin/sliver_tools?format=json
produces an array of JSON objects that contain information about each slice. You can parse those objects to find the ooni slivers and then get to their tool_extra bits.
Give this a look and let me know what you think. We can tweak until we get it exactly right.
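A rough sketch of that parsing step, assuming each object in the array carries the same `fqdn` and `tool_extra` keys seen in the single-answer responses earlier in this thread, and that OONI slivers can be recognized by an `ooni.` hostname prefix. Both are assumptions; the actual field names in the `sliver_tools` output may differ, and `ooni_extras` is a hypothetical name.

```python
import json


def ooni_extras(body, prefix="ooni."):
    """Map each OONI sliver's fqdn to its tool_extra string."""
    extras = {}
    for record in json.loads(body):
        fqdn = record.get("fqdn", "")
        # Keep only OONI slivers that actually reported plugin output.
        if fqdn.startswith(prefix) and record.get("tool_extra"):
            extras[fqdn] = record["tool_extra"]
    return extras


# Made-up array in the assumed shape, for illustration only.
sample_body = json.dumps([
    {"fqdn": "ndt.iupui.mlab1.iad01.measurement-lab.org",
     "tool_extra": "TCP OK"},
    {"fqdn": "ooni.mlab.mlab1.akl01.measurement-lab.org",
     "tool_extra": "'collector_onion': 'testfakenotreal.onion'"},
    {"fqdn": "ooni.mlab.mlab1.akl02.measurement-lab.org"},
])
print(ooni_extras(sample_body))
```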
Will
Thanks for your responses. I will keep everyone up to date as work continues!
Will
For the first pass deployment, Ooni's needs will be "just return everything" or even "just return a random subset that fits into one response". Later releases might want to be clever about geo-location of test_helpers or other policies.
In terms of collectors, the geo location should not matter, since they are Tor hidden services. (It's kind of funny to have a map of where these hidden services will live, something we may want to change later.)
Summary:
I think that we are on the brink of making this full integration happen. We will keep everyone posted as we move forward.
Feedback welcome, obviously!
Will
On 08/01/2014 03:25 PM, Nathan Wilcox wrote:
Dear OTF, Ooni, and M-Lab,
Summary
We've hashed out a design to integrate Ooni with mlab-ns on the M-Lab deployment, and we've implemented a fully functional deployment that approximates this by simulating mlab-ns (this is attached). This completes Milestone D of our contract with OTF.
Design Goals
Our top goals for this integration are:
It does not rely on any changes to upstream Ooni. (For example, probes still use a bouncer .onion, and the backend has stock bouncers, collectors, and test helpers running.)
It can be disabled easily without redeploying the M-Lab backend. Our branch's ooni-support README.md has instructions to disable the integration, merely by editing a cron job to unset an ENABLED flag. There's no need to redeploy different versions of ooni-support.
When enabled, it allows M-Lab operations to monitor collectors and test_helpers status with the same infrastructure as all other M-Lab tools.
Future Architectural Changes
In the future, it may be nice to augment ooni / mlab-ns integration. For example, mlab-ns is designed to support different policies which may be useful to tools, such as geo-location of test_helpers.
The Simulator
This deployment architecture uses a simulator. While it is fully functional and useful for testing it lacks security or robustness, so we want to emphasize *not to deploy this* to non-test environments.
Rationale
There are three rationales for this approach:
First, Least Authority didn't want to push through modifications to mlab-ns without first creating and testing a proof-of-concept.
Second, we didn't want to block our effort on M-Lab engineering effort, so this allows a clean division of labor.
Third, by creating and testing a working proof of concept we can help define the necessary changes to mlab-ns in a tightly scoped and concrete manner.
Security
This system is insecure because it does not use the M-Lab nagios system to gather data, and instead lets anyone paste any data they want into the simulator. Nagios integration is future work captured in this ticket:
Next Steps
Our contract with OTF proposes our next two milestones will focus on improving integration testing and unit test coverage. Our focus at that time was on test automation and documentation for diagnosing integration problems. Test automation has already been improved since that time, and we've accomplished most of the work for documentation:
https://github.com/m-lab-tools/ooni-support/issues/60
Therefore, we propose to focus on some outstanding issues which will improve mlab-ns integration while continuing not to block on, or interfere with, M-Lab operations as follows:
The primary change to mlab-ns will be to allow any tool to include arbitrary data per slivver to be gathered and distributed by mlab-ns. Ooni will use this to distribute data such as collector `.onion` addresses. The need for this change is discussed here:
This proposed change is documented in this ticket:
A secondary change is to implement `match=all` described in #47 above. It may not be necessary, so there is further investigation and testing necessary:
https://github.com/m-lab-tools/ooni-support/issues/56
Along with these changes to `mlab-ns`, we need trivial updates to our integration scripts to work with mlab-ns rather than the simulator:
- https://github.com/m-lab-tools/ooni-support/issues/10
- https://github.com/m-lab-tools/ooni-support/issues/11
Details & Links
Attached is a shortish overview of possible approaches to implement this integration. We've implemented a deployment with a mock mlab-ns (called mlab-ns-simulator) and the "arbitrary data" approach from the attached design document. The pull request is here:
Specific details about this pull request:
- A script for gathering necessary information from collectors and
testhelpers, then updating the mlab-ns-simulator.
- A script for updating a bouncer's state based on the mlan-ns-simulator.
- A cron script to update the bouncer on an hourly schedule.
- The mlab-ns-simulator itself, which approximates the production mlab-ns.
- `.init/` script changes to automatically launch the simulator and
bouncer on `mlab1.nuq0t.measurement-lab.org`.
- Design documentation for mlab-ns integration (including this
stepping stone architecture).
- Each instructions to disable mlab-ns integration without any redeployment.
We also created a subset pull request that has bug fixes but no mlab-ns integration features:
Github Milestones
We split the mlab-ns-simulator deployment tasks out from the larger mlab-ns integration deployment. The mlab-ns-simulator milestone is at:
The full mlab-ns integration milestone:
As always, let us know if you have any feedback!
On 08/04/2014 09:37 PM, Will Hawkins wrote:
On 08/04/2014 03:24 PM, Nathan Wilcox wrote:
Is the syntax for the plugin-specific detail just anything up to the next newline? We'd probably want to encode this in JSON to ensure any newlines or other weirdness doesn't break this format. Also, I just picked JSON because I saw that appengine's ndb has a field type for that and I figured it could be a generally useful format for any tool. Another approach is to have a blob property in mlab-ns.
Hello again! Sorry for responding to these out of order.
You are exactly correct. As it stands now, the baseList code will include the plugin output up to the newline. Encoding the plugin output so that there are not embedded newlines will probably be important. The more that we can do without having to change baseList, the better. But, if it is too inconvenient, we can make some changes (e.g., replace ' '-delimiters with something a little, say, clearer).
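One way to keep the plugin output on a single line without changing baseList is to serialize it as compact JSON, since `json.dumps` escapes any newline inside a string value. This is a minimal sketch; the helper names `encode_extra` and `decode_extra` and the extra `note` key are made up for illustration.

```python
import json


def encode_extra(details):
    """Serialize a dict to one line of JSON suitable for plugin output."""
    # Without an indent argument, json.dumps emits no literal newlines;
    # newlines inside string values are written as the two characters \n.
    return json.dumps(details, sort_keys=True)


def decode_extra(line):
    """Recover the dict from a line produced by encode_extra."""
    return json.loads(line)


details = {
    "collector_onion": "testfakenotreal.onion",
    "note": "first line\nsecond line",
}
line = encode_extra(details)
print(line)
```

The consumer then decodes everything after the third baseList field with `decode_extra`.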
Given the progress on the other work, I think that we are ready to move forward on developing this plugin. What is the best way for us to work together to get this done? In the interest of complete disclosure, I have absolutely no experience writing nagios plugins, but I am happy to learn!
I think this will be the last email for the night :-)
Will
On 08/01/2014 03:25 PM, Nathan Wilcox wrote:
Dear OTF, Ooni, and M-Lab,
Summary
We've hashed out a design to integrate Ooni with mlab-ns on the M-Lab deployment, and we've implemented a fully functional deployment that approximates this by simulating mlab-ns (this is attached). This completes Milestone D of our contract with OTF.
Design Goals
Our top goals for this integration are:
It does not rely on any changes to upstream Ooni. (For example, probes still use a bouncer .onion, and the backend has stock bouncers, collectors, and test helpers running.)
It can be disabled easily without redeploying the M-Lab backend. Our branch's ooni-support README.md has instructions to disable the integration, merely by editing a cron job to unset an ENABLED flag. There's no need to redeploy different versions of ooni-support.
When enabled, it allows M-Lab operations to monitor collectors and test_helpers status with the same infrastructure as all other M-Lab tools.
Future Architectural Changes
In the future, it may be nice to augment ooni / mlab-ns integration. For example, mlab-ns is designed to support different policies which may be useful to tools, such as geo-location of test_helpers.
The Simulator
This deployment architecture uses a simulator. While it is fully functional and useful for testing it lacks security or robustness, so we want to emphasize *not to deploy this* to non-test environments.
Rationale
There are three rationales for this approach:
First, Least Authority didn't want to push through modifications to mlab-ns without first creating and testing a proof-of-concept.
Second, we didn't want to block our effort on M-Lab engineering effort, so this allows a clean division of labor.
Third, by creating and testing a working proof of concept we can help define the necessary changes to mlab-ns in a tightly scoped and concrete manner.
Security
This system is insecure because it does not use the M-Lab nagios system to gather data, and instead lets anyone paste any data they want into the simulator. Nagios integration is future work captured in this ticket:
Next Steps
Our contract with OTF proposes our next two milestones will focus on improving integration testing and unit test coverage. Our focus at that time was on test automation and documentation for diagnosing integration problems. Test automation has already been improved since that time, and we've accomplished most of the work for documentation:
https://github.com/m-lab-tools/ooni-support/issues/60
Therefore, we propose to focus on some outstanding issues which will improve mlab-ns integration while continuing not to block on, or interfere with, M-Lab operations as follows:
The primary change to mlab-ns will be to allow any tool to include arbitrary data per slivver to be gathered and distributed by mlab-ns. Ooni will use this to distribute data such as collector `.onion` addresses. The need for this change is discussed here:
This proposed change is documented in this ticket:
A secondary change is to implement `match=all` described in #47 above. It may not be necessary, so there is further investigation and testing necessary:
https://github.com/m-lab-tools/ooni-support/issues/56
Along with these changes to `mlab-ns`, we need trivial updates to our integration scripts to work with mlab-ns rather than the simulator:
- https://github.com/m-lab-tools/ooni-support/issues/10
- https://github.com/m-lab-tools/ooni-support/issues/11
Details & Links
Attached is a shortish overview of possible approaches to implement this integration. We've implemented a deployment with a mock mlab-ns (called mlab-ns-simulator) and the "arbitrary data" approach from the attached design document. The pull request is here:
Specific details about this pull request:
- A script for gathering necessary information from collectors and
testhelpers, then updating the mlab-ns-simulator.
- A script for updating a bouncer's state based on the mlan-ns-simulator.
- A cron script to update the bouncer on an hourly schedule.
- The mlab-ns-simulator itself, which approximates the production mlab-ns.
- `.init/` script changes to automatically launch the simulator and
bouncer on `mlab1.nuq0t.measurement-lab.org`.
- Design documentation for mlab-ns integration (including this
stepping stone architecture).
- Each instructions to disable mlab-ns integration without any redeployment.
We also created a subset pull request that has bug fixes but no mlab-ns integration features:
Github Milestones
We split the mlab-ns-simulator deployment tasks out from the larger mlab-ns integration deployment. The mlab-ns-simulator milestone is at:
The full mlab-ns integration milestone:
As always, let us know if you have any feedback!
On Mon, Aug 4, 2014 at 6:55 PM, Will Hawkins hawkinsw@opentechinstitute.org wrote:
On 08/04/2014 09:37 PM, Will Hawkins wrote:
PS: I trimmed the CC line since we were getting into the weeds and I didn't want to bother people at RFA. If it's a good idea to have them in the loop, feel free to add them back!
Good call.
On 08/04/2014 07:06 PM, Will Hawkins wrote:
On 08/04/2014 03:24 PM, Nathan Wilcox wrote:
On Fri, Aug 1, 2014 at 3:08 PM, Will Hawkins hawkinsw@opentechinstitute.org wrote:
To follow-up on Nathan's excellent report, I thought I could shed some light on the status of the OONI integration with MLab NS:
- Our work is temporarily blocked due to an operational issue that should be resolved imminently.
Good to know.
We are officially unblocked.
- The integration that Nathan mentioned between Nagios and MLab NS is
incredibly promising. As mentioned previously, MLab NS captures its information from the MLab nagios instance using a "baseList" script that runs on our monitoring server. As it functions now, MLab NS is filled with information based on the output of a baseList call that looks like:
http://nagios.measurementlab.net/baseList?show_state=1&service_name=ndt
which has output like:
ndt.iupui.mlab1.akl01.measurement-lab.org/ndt 0 1
ndt.iupui.mlab2.akl01.measurement-lab.org/ndt 0 1
ndt.iupui.mlab3.akl01.measurement-lab.org/ndt 0 1
ndt.iupui.mlab1.ams01.measurement-lab.org/ndt 0 1
ndt.iupui.mlab2.ams01.measurement-lab.org/ndt 0 1
ndt.iupui.mlab3.ams01.measurement-lab.org/ndt 0 1
ndt.iupui.mlab1.ams02.measurement-lab.org/ndt 0 1
ndt.iupui.mlab2.ams02.measurement-lab.org/ndt 0 1
...
The 0s and 1s are flags indicating whether there is a "problem" with the slice or not. I.e., they are backward.
baseList takes an additional parameter known as plugin_output. We will update MLab NS to call baseList with this additional parameter. The call will look like:
http://nagios.measurementlab.net/baseList?show_state=1&service_name=ndt&...
which has output like:
ndt.iupui.mlab1.akl01.measurement-lab.org/ndt 0 1 TCP OK - 0.136 second response time on ndt.iupui.mlab1.akl01.measurement-lab.org port 3001
ndt.iupui.mlab2.akl01.measurement-lab.org/ndt 0 1 TCP OK - 0.136 second response time on ndt.iupui.mlab2.akl01.measurement-lab.org port 3001
ndt.iupui.mlab3.akl01.measurement-lab.org/ndt 0 1 TCP OK - 0.135 second response time on ndt.iupui.mlab3.akl01.measurement-lab.org port 3001
ndt.iupui.mlab1.ams01.measurement-lab.org/ndt 0 1 TCP OK - 0.145 second response time on ndt.iupui.mlab1.ams01.measurement-lab.org port 3001
...
The extra data is the output from the plugin that monitors whether the particular service is online. In this example, we are monitoring ndt and the plugin reports whether a TCP connection is possible to port 3001 (NDT's port).
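To make the line format concrete, here is a minimal Python sketch of how a consumer might split baseList output into fields. It assumes one entry per line, with a service name, two flag columns, and an optional free-form plugin output running to the end of the line. The names `BaseListEntry` and `parse_baselist` are made up for illustration and are not part of mlab-ns or baseList; the flag columns are kept as raw text because their exact meaning is not pinned down here.

```python
from typing import List, NamedTuple


class BaseListEntry(NamedTuple):
    service: str        # e.g. "ndt.iupui.mlab1.akl01.measurement-lab.org/ndt"
    flag_a: str         # first flag column, kept as raw text
    flag_b: str         # second flag column, kept as raw text
    plugin_output: str  # rest of the line, "" when plugin output is absent


def parse_baselist(text: str) -> List[BaseListEntry]:
    """Parse baseList-style output, assuming one entry per line."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first three spaces only, so the plugin output
        # keeps its own internal spaces.
        parts = line.split(" ", 3)
        if len(parts) < 3:
            raise ValueError("unexpected baseList line: %r" % line)
        service, flag_a, flag_b = parts[:3]
        plugin_output = parts[3] if len(parts) == 4 else ""
        entries.append(BaseListEntry(service, flag_a, flag_b, plugin_output))
    return entries
```

Because everything after the third space is treated as plugin output, this only works if that output never contains a newline.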
So, the integration point between nagios, MLab NS and OONI will look like this:
The nagios plugin written by LA/OONI will use return codes to signal whether the OONI service is running. That return value will be the 0s and 1s in baseList output. The "string" output from the plugin will be the information that needs to be captured in MLab NS and returned with OONI queries. Based on pull requests, I suspect the resulting response to a baseList call like
http://nagios.measurementlab.net/baseList?show_state=1&service_name=ooni...
will be something like
ooni.mlab.mlab1.akl01.measurement-lab.org/ndt 0 1 'collector_onion': 'testfakenotreal.onion'
ooni.mlab.mlab1.akl02.measurement-lab.org/ndt 0 1 'collector_onion': 'testfakenotreal.onion'
...
This sounds close to what we're imagining. BTW- we're tracking the Ooni side of this here:
https://github.com/m-lab-tools/ooni-support/issues/10
Could you link in a reference to the nagios plugin interface in that ticket #10 to help define its closure criteria?
Is the syntax for the plugin-specific detail just anything up to the next newline? We'd probably want to encode this in JSON to ensure any newlines or other weirdness doesn't break this format. Also, I just picked JSON because I saw that appengine's ndb has a field type for that and I figured it could be a generally useful format for any tool. Another approach is to have a blob property in mlab-ns.
Hello again! Sorry for responding to these out of order.
You are exactly correct. As it stands now, the baseList code will include the plugin output up to the newline. Encoding the plugin output so that there are not embedded newlines will probably be important. The more that we can do without having to change baseList, the better. But, if it is too inconvenient, we can make some changes (e.g., replace ' '-delimiters with something a little, say, clearer).
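As a rough illustration of the single-line encoding idea, here is a Python sketch that serializes the tool details as JSON. The standard `json.dumps` escapes newlines inside strings, so without an `indent` argument the result stays on one line. The function names and the `note` key are hypothetical.

```python
import json


def encode_plugin_output(details: dict) -> str:
    """Serialize details as one line of JSON (no indent, newlines escaped)."""
    line = json.dumps(details, sort_keys=True)
    # Guard against any accidental raw newline in the encoded form.
    if "\n" in line or "\r" in line:
        raise ValueError("encoded plugin output must be a single line")
    return line


def decode_plugin_output(line: str) -> dict:
    """Inverse of encode_plugin_output."""
    return json.loads(line)


# Example: a value containing a newline still encodes to one line.
example = encode_plugin_output(
    {"collector_onion": "testfakenotreal.onion", "note": "line one\nline two"}
)
```

The encoded JSON still contains spaces, so a consumer that splits baseList lines on spaces would need to treat everything after the flag columns as one field.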
Given the progress on the other work, I think that we are ready to move forward on developing this plugin. What is the best way for us to work together to get this done? In the interest of complete disclosure, I have absolutely no experience writing nagios plugins, but I am happy to learn!
Excellent! I'm not familiar with nagios either. Can we find examples from other M-Lab tools?
BTW- I really want to focus on this remaining area of integration because we're close. However, our contract specifies that we'll work on improving unit tests. I'm going to propose to RFA that we work on this instead, because it's necessary, whereas unittests are arguably a non-essential quality improvement.
For this nagios integration within ooni-support specifically here are the steps I see:
1. Figure out if we can get nagios to execute a python script to gather the information, and if so:
2. Modify ./bouncer-plumbing/collector-to-mlab/getconfig.py so that instead of posting the details with urllib, it prints them to stdout the way nagios likes.
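The second step could look roughly like the Python sketch below. It relies on the general Nagios plugin convention: the plugin prints a status line to stdout and signals state through its exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). `run_check` and the `gather` callable are hypothetical stand-ins for whatever getconfig.py actually collects, and JSON is only one possible output encoding.

```python
import json
import sys

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def run_check(gather, out=sys.stdout):
    """Call gather(), print one status line, and return a Nagios exit code.

    gather is a hypothetical callable returning a dict of details
    (for example, a collector .onion address).
    """
    try:
        details = gather()
    except Exception as exc:
        # Report the failure on one line and signal CRITICAL.
        out.write(json.dumps({"error": str(exc)}) + "\n")
        return CRITICAL
    out.write(json.dumps(details, sort_keys=True) + "\n")
    return OK


# A plugin script would end with something like:
#     sys.exit(run_check(my_gather_function))
```

Returning the exit code instead of calling `sys.exit` inside the function keeps the logic easy to test without nagios.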
There's kind of a tangential issue which is that script only exists in our fork of ooni-support:
https://github.com/LeastAuthority/ooni-support/blob/combined-leastauthority-...
We've made a pull request to the upstream ooni-support's master branch, but I believe it's premature to land this (even though if it were landed, it's trivial to turn off the mlab-ns integration):
https://github.com/m-lab-tools/ooni-support/pull/59
So, a Step 0 might be to create a new "mlab-ns-integration" branch on the main ooni-support repository, land our work there, and then continue deployment on that branch until we have consensus that the integration works well.
I think this will be the last email for the night :-)
Aha, but I am on the left coast and can send later emails with less personal inconvenience! Mwuhaha!
So maybe *this* is the last one tonight?
Will
Note, we closed a ticket for the mlab-ns-simulator which was to "approximate" the nagios pipeline, but it's not realistic at all:
https://github.com/m-lab-tools/ooni-support/issues/48
The 'collector_onion':'testfakenotreal.onion' string will make its way through MLab NS and get spit out as tool_extra from a query like:
http://mlab-ns.appspot.com/ooni
that gives something like:
{"city": "Washington", "url": "http://ndt.iupui.mlab1.iad01.measurement-lab.org:7123", "ip": ["216.156.197.139"], "site": "iad01", "fqdn": "ndt.iupui.mlab1.iad01.measurement-lab.org", "country": "US", "port": "3001", "tool_extra": "testfakenotreal.onion"}
That sounds perfect. Is there a ticket somewhere for the link between nagios and mlab-ns? I'd like to keep an eye on that.
How about another ticket for including the "tool_extra" field into the mlab-ns datastore and returning it in queries? I sketched out what these changes might look like here:
I will dig into the specific tickets and update them appropriately, but I wanted you to know that we now have "tool_extra" support in the MLab NS testing instance:
http://mlab-nstesting.appspot.com/ndt
gives
{"city": "Washington_DC", "url": "http://ndt.iupui.mlab2.iad01.measurement-lab.org:7123", "ip": ["216.156.197.152"], "fqdn": "ndt.iupui.mlab2.iad01.measurement-lab.org", "site": "iad01", "country": "US", "tool_extra": "1 TCP OK - 0.075 second response time on ndt.iupui.mlab2.iad01.measurement-lab.org port 3001"}
You can see the commit here: https://code.google.com/r/hawkinsw-sieve/source/detail?name=ooni&r=2ab4a...
The TL;DR is that we are well-positioned to make these changes to MLab NS that will not require many (any?) fundamental changes to MLab NS or our monitoring infrastructure.
Does this seem reasonable?
Yep. Do you have some timeline estimate for the two changes of incorporating extra details in the nagios -> mlab-ns pipeline, and updating mlab-ns to store and return the "tool_extra" field?
See above. Sliver tools that expose plugin output will be stored in tool_extra and returned with queries.
- As Nathan mentioned, their integration with MLab NS will require a
query type that is able to list all available answers. I mentioned in comments to a ticket that we have something similar to what they need. However, I realize now that that approach will not work.
However, there is a better option. MLab NS already has a "thing" at
http://mlab-nstesting.appspot.com/admin/map/ipv4/all
that generates a map of the status of all the services and places them on a map. We will modify that by parameterizing the output to allow for json responses which will exactly satisfy OONI's needs.
Does this seem reasonable?
Yes. Is that much work?
I am moving on to this now and will keep you posted :-)
This is implemented. You can see that
http://mlab-nstesting.appspot.com/admin/sliver_tools?format=json
produces an array of JSON objects that contain information about each slice. You can parse those objects to find the ooni slivers and then get to their tool_extra bits.
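A minimal Python sketch of that parsing step follows. The exact shape of the `sliver_tools?format=json` response is not shown in this thread, so this assumes each object carries `fqdn` and `tool_extra` keys like the single-server responses quoted earlier, and that ooni slivers can be recognized by an `ooni.` prefix on the fqdn. Both are assumptions, and `ooni_tool_extras` is a made-up name.

```python
import json


def ooni_tool_extras(sliver_tools_json: str) -> dict:
    """Map each ooni sliver's fqdn to its tool_extra value.

    Assumes a JSON array of objects with "fqdn" and "tool_extra" keys.
    """
    extras = {}
    for sliver in json.loads(sliver_tools_json):
        fqdn = sliver.get("fqdn", "")
        if fqdn.startswith("ooni."):
            extras[fqdn] = sliver.get("tool_extra")
    return extras
```

A bouncer update script could feed the resulting mapping into whatever currently consumes the simulator's output.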
Give this a look and let me know what you think. We can tweak until we get it exactly right.
Will
Thanks for your responses. I will keep everyone up to date as work continues!
Will
For the first pass deployment, Ooni's needs will be "just return everything" or even "just return a random subset that fits into one response". Later releases might want to be clever about geo-location of test_helpers or other policies.
In terms of collectors, the geo location should not matter, since they are Tor hidden services. (It's kind of funny to have a map of where these hidden services will live, something we may want to change later.)
Summary:
I think that we are on the brink of making this full integration happen. We will keep everyone posted as we move forward.
Feedback welcome, obviously!
Will
On 08/01/2014 03:25 PM, Nathan Wilcox wrote:
Dear OTF, Ooni, and M-Lab,
Summary
We've hashed out a design to integrate Ooni with mlab-ns on the M-Lab deployment, and we've implemented a fully functional deployment that approximates this by simulating mlab-ns (this is attached). This completes Milestone D of our contract with OTF.
Design Goals
Our top goals for this integration are:
It does not rely on any changes to upstream Ooni. (For example, probes still use a bouncer .onion, and the backend has stock bouncers, collectors, and test helpers running.)
It can be disabled easily without redeploying the M-Lab backend. Our branch's ooni-support README.md has instructions to disable the integration, merely by editing a cron job to unset an ENABLED flag. There's no need to redeploy different versions of ooni-support.
When enabled, it allows M-Lab operations to monitor collectors and test_helpers status with the same infrastructure as all other M-Lab tools.
Future Architectural Changes
In the future, it may be nice to augment ooni / mlab-ns integration. For example, mlab-ns is designed to support different policies which may be useful to tools, such as geo-location of test_helpers.
The Simulator
This deployment architecture uses a simulator. While it is fully functional and useful for testing it lacks security or robustness, so we want to emphasize *not to deploy this* to non-test environments.
Rationale
There are three rationales for this approach:
First, Least Authority didn't want to push through modifications to mlab-ns without first creating and testing a proof-of-concept.
Second, we didn't want to block our effort on M-Lab engineering effort, so this allows a clean division of labor.
Third, by creating and testing a working proof of concept we can help define the necessary changes to mlab-ns in a tightly scoped and concrete manner.
Security
This system is insecure because it does not use the M-Lab nagios system to gather data, and instead lets anyone paste any data they want into the simulator. Nagios integration is future work captured in this ticket:
- https://github.com/m-lab-tools/ooni-support/issues/10
Next Steps
Our contract with OTF proposes our next two milestones will focus on improving integration testing and unit test coverage. Our focus at that time was on test automation and documentation for diagnosing integration problems. Test automation has already been improved since that time, and we've accomplished most of the work for documentation:
https://github.com/m-lab-tools/ooni-support/issues/60
Therefore, we propose to focus on some outstanding issues which will improve mlab-ns integration while continuing not to block on, or interfere with, M-Lab operations as follows:
The primary change to mlab-ns will be to allow any tool to include arbitrary data per sliver to be gathered and distributed by mlab-ns. Ooni will use this to distribute data such as collector `.onion` addresses. The need for this change is discussed here:
- https://github.com/m-lab-tools/ooni-support/issues/4
This proposed change is documented in this ticket:
- https://github.com/m-lab-tools/ooni-support/issues/47
A secondary change is to implement `match=all` described in #47 above. It may not be necessary, so further investigation and testing are needed:
https://github.com/m-lab-tools/ooni-support/issues/56
Along with these changes to `mlab-ns`, we need trivial updates to our integration scripts to work with mlab-ns rather than the simulator:
- https://github.com/m-lab-tools/ooni-support/issues/10
- https://github.com/m-lab-tools/ooni-support/issues/11
Details & Links
Attached is a shortish overview of possible approaches to implement this integration. We've implemented a deployment with a mock mlab-ns (called mlab-ns-simulator) and the "arbitrary data" approach from the attached design document. The pull request is here:
- https://github.com/m-lab-tools/ooni-support/pull/59
Specific details about this pull request:
- A script for gathering necessary information from collectors and testhelpers, then updating the mlab-ns-simulator.
- A script for updating a bouncer's state based on the mlab-ns-simulator.
- A cron script to update the bouncer on an hourly schedule.
- The mlab-ns-simulator itself, which approximates the production mlab-ns.
- `.init/` script changes to automatically launch the simulator and bouncer on `mlab1.nuq0t.measurement-lab.org`.
- Design documentation for mlab-ns integration (including this stepping stone architecture).
- Easy instructions to disable mlab-ns integration without any redeployment.
We also created a subset pull request that has bug fixes but no mlab-ns integration features:
- https://github.com/m-lab-tools/ooni-support/pull/58
Github Milestones
We split the mlab-ns-simulator deployment tasks out from the larger mlab-ns integration deployment. The mlab-ns-simulator milestone is at:
The full mlab-ns integration milestone:
As always, let us know if you have any feedback!
Inline!!
On 08/04/2014 10:55 PM, Nathan Wilcox wrote:
On Mon, Aug 4, 2014 at 6:55 PM, Will Hawkins hawkinsw@opentechinstitute.org wrote:
On 08/04/2014 09:37 PM, Will Hawkins wrote:
PS: I trimmed the CC line since we were getting into the weeds and I didn't want to bother people at RFA. If it's a good idea to have them in the loop, feel free to add them back!
Good call.
On 08/04/2014 07:06 PM, Will Hawkins wrote:
On 08/04/2014 03:24 PM, Nathan Wilcox wrote:
On Fri, Aug 1, 2014 at 3:08 PM, Will Hawkins hawkinsw@opentechinstitute.org wrote:
To follow-up on Nathan's excellent report, I thought I could shed some light on the status of the OONI integration with MLab NS:
- Our work is temporarily blocked due to an operational issue that should be resolved imminently.
Good to know.
We are officially unblocked.
- The integration that Nathan mentioned between Nagios and MLab NS is
incredibly promising. As mentioned previously, MLab NS captures its information from the MLab nagios instance using a "baseList" script that runs on our monitoring server. As it functions now, MLab NS is filled with information based on the output of a baseList call that looks like:
http://nagios.measurementlab.net/baseList?show_state=1&service_name=ndt
which has output like:
ndt.iupui.mlab1.akl01.measurement-lab.org/ndt 0 1
ndt.iupui.mlab2.akl01.measurement-lab.org/ndt 0 1
ndt.iupui.mlab3.akl01.measurement-lab.org/ndt 0 1
ndt.iupui.mlab1.ams01.measurement-lab.org/ndt 0 1
ndt.iupui.mlab2.ams01.measurement-lab.org/ndt 0 1
ndt.iupui.mlab3.ams01.measurement-lab.org/ndt 0 1
ndt.iupui.mlab1.ams02.measurement-lab.org/ndt 0 1
ndt.iupui.mlab2.ams02.measurement-lab.org/ndt 0 1
...
The 0s and 1s are flags indicating whether there is a "problem" with the slice or not. I.e., they are backward.
baseList takes an additional parameter known as plugin_output. We will update MLab NS to call baseList with this additional parameter. The call will look like:
http://nagios.measurementlab.net/baseList?show_state=1&service_name=ndt&...
which has output like:
ndt.iupui.mlab1.akl01.measurement-lab.org/ndt 0 1 TCP OK - 0.136 second response time on ndt.iupui.mlab1.akl01.measurement-lab.org port 3001
ndt.iupui.mlab2.akl01.measurement-lab.org/ndt 0 1 TCP OK - 0.136 second response time on ndt.iupui.mlab2.akl01.measurement-lab.org port 3001
ndt.iupui.mlab3.akl01.measurement-lab.org/ndt 0 1 TCP OK - 0.135 second response time on ndt.iupui.mlab3.akl01.measurement-lab.org port 3001
ndt.iupui.mlab1.ams01.measurement-lab.org/ndt 0 1 TCP OK - 0.145 second response time on ndt.iupui.mlab1.ams01.measurement-lab.org port 3001
...
The extra data is the output from the plugin that monitors whether the particular service is online. In this example, we are monitoring ndt and the plugin reports whether a TCP connection is possible to port 3001 (NDT's port).
So, the integration point between nagios, MLab NS and OONI will look like this:
The nagios plugin written by LA/OONI will use return codes to signal whether the OONI service is running. That return value will be the 0s and 1s in baseList output. The "string" output from the plugin will be the information that needs to be captured in MLab NS and returned with OONI queries. Based on pull requests, I suspect the resulting response to a baseList call like
http://nagios.measurementlab.net/baseList?show_state=1&service_name=ooni...
will be something like
ooni.mlab.mlab1.akl01.measurement-lab.org/ndt 0 1 'collector_onion': 'testfakenotreal.onion'
ooni.mlab.mlab1.akl02.measurement-lab.org/ndt 0 1 'collector_onion': 'testfakenotreal.onion'
...
This sounds close to what we're imagining. BTW- we're tracking the Ooni side of this here:
https://github.com/m-lab-tools/ooni-support/issues/10
Could you link in a reference to the nagios plugin interface in that ticket #10 to help define its closure criteria?
Is the syntax for the plugin-specific detail just anything up to the next newline? We'd probably want to encode this in JSON to ensure any newlines or other weirdness doesn't break this format. Also, I just picked JSON because I saw that appengine's ndb has a field type for that and I figured it could be a generally useful format for any tool. Another approach is to have a blob property in mlab-ns.
Hello again! Sorry for responding to these out of order.
You are exactly correct. As it stands now, the baseList code will include the plugin output up to the newline. Encoding the plugin output so that there are not embedded newlines will probably be important. The more that we can do without having to change baseList, the better. But, if it is too inconvenient, we can make some changes (e.g., replace ' '-delimiters with something a little, say, clearer).
Given the progress on the other work, I think that we are ready to move forward on developing this plugin. What is the best way for us to work together to get this done? In the interest of complete disclosure, I have absolutely no experience writing nagios plugins, but I am happy to learn!
Excellent! I'm not familiar with nagios either. Can we find examples from other M-Lab tools?
BTW- I really want to focus on this remaining area of integration because we're close. However, our contract specifies that we'll work on improving unit tests. I'm going to propose to RFA that we work on this instead, because it's necessary, whereas unittests are arguably a non-essential quality improvement.
For this nagios integration within ooni-support specifically here are the steps I see:
1. Figure out if we can get nagios to execute a python script to gather the information, and if so:
2. Modify ./bouncer-plumbing/collector-to-mlab/getconfig.py so that instead of posting the details with urllib, it prints them to stdout the way nagios likes.
There's kind of a tangential issue which is that script only exists in our fork of ooni-support:
https://github.com/LeastAuthority/ooni-support/blob/combined-leastauthority-...
We've made a pull request to the upstream ooni-support's master branch, but I believe it's premature to land this (even though if it were landed, it's trivial to turn off the mlab-ns integration):
https://github.com/m-lab-tools/ooni-support/pull/59
So, a Step 0 might be to create a new "mlab-ns-integration" branch on the main ooni-support repository, land our work there, and then continue deployment on that branch until we have consensus that the integration works well.
I think this will be the last email for the night :-)
Aha, but I am on the left coast and can send later emails with less personal inconvenience! Mwuhaha!
So maybe *this* is the last one tonight?
I just want you to know that I got this email before I went to bed last night but chose not to one up you :-)
(continue below)
Will
Note, we closed a ticket for the mlab-ns-simulator which was to "approximate" the nagios pipeline, but it's not realistic at all:
https://github.com/m-lab-tools/ooni-support/issues/48
The 'collector_onion':'testfakenotreal.onion' string will make its way through MLab NS and get spit out as tool_extra from a query like:
http://mlab-ns.appspot.com/ooni
that gives something like:
{"city": "Washington", "url": "http://ndt.iupui.mlab1.iad01.measurement-lab.org:7123", "ip": ["216.156.197.139"], "site": "iad01", "fqdn": "ndt.iupui.mlab1.iad01.measurement-lab.org", "country": "US", "port": "3001", "tool_extra": "testfakenotreal.onion"}
That sounds perfect. Is there a ticket somewhere for the link between nagios and mlab-ns? I'd like to keep an eye on that.
How about another ticket for including the "tool_extra" field into the mlab-ns datastore and returning it in queries? I sketched out what these changes might look like here:
I will dig into the specific tickets and update them appropriately, but I wanted you to know that we now have "tool_extra" support in the MLab NS testing instance:
http://mlab-nstesting.appspot.com/ndt
gives
{"city": "Washington_DC", "url": "http://ndt.iupui.mlab2.iad01.measurement-lab.org:7123", "ip": ["216.156.197.152"], "fqdn": "ndt.iupui.mlab2.iad01.measurement-lab.org", "site": "iad01", "country": "US", "tool_extra": "1 TCP OK - 0.075 second response time on ndt.iupui.mlab2.iad01.measurement-lab.org port 3001"}
You can see the commit here: https://code.google.com/r/hawkinsw-sieve/source/detail?name=ooni&r=2ab4a...
The TL;DR is that we are well-positioned to make these changes to MLab NS that will not require many (any?) fundamental changes to MLab NS or our monitoring infrastructure.
Does this seem reasonable?
Yep. Do you have some timeline estimate for the two changes of incorporating extra details in the nagios -> mlab-ns pipeline, and updating mlab-ns to store and return the "tool_extra" field?
See above. Sliver tools that expose plugin output will be stored in tool_extra and returned with queries.
- As Nathan mentioned, their integration with MLab NS will require a
query type that is able to list all available answers. I mentioned in comments to a ticket that we have something similar to what they need. However, I realize now that that approach will not work.
However, there is a better option. MLab NS already has a "thing" at
http://mlab-nstesting.appspot.com/admin/map/ipv4/all
that generates a map of the status of all the services and places them on a map. We will modify that by parameterizing the output to allow for json responses which will exactly satisfy OONI's needs.
Does this seem reasonable?
Yes. Is that much work?
I am moving on to this now and will keep you posted :-)
This is implemented. You can see that
http://mlab-nstesting.appspot.com/admin/sliver_tools?format=json
produces an array of JSON objects that contain information about each slice. You can parse those objects to find the ooni slivers and then get to their tool_extra bits.
Give this a look and let me know what you think. We can tweak until we get it exactly right.
Will
I changed around our implementation in response to Taylor's comments on
https://github.com/m-lab-tools/ooni-support/issues/56
The implementation is now more inline with what you suggested, Nathan, in your previous comments on https://github.com/m-lab-tools/ooni-support/issues/47
You can see the resulting implementation, by example, at
http://mlab-nstesting.appspot.com/ndt?policy=all
Besides invoking this functionality through a different URL, the semantics are slightly different. The "all" is a slight misnomer because MLab NS will return information about *all* the instances that are online. Those that nagios thinks are down will not be returned in the result. You can see that here:
http://mlab-nstesting.appspot.com/ooni?policy=all
I hope that makes our integration easier. I will post a follow-up to Taylor's response in the issue to make sure that it's tracked in both places.
Will
Thanks for your responses. I will keep everyone up to date as work continues!
Will
For the first pass deployment, Ooni's needs will be "just return everything" or even "just return a random subset that fits into one response". Later releases might want to be clever about geo-location of test_helpers or other policies.
In terms of collectors, the geo location should not matter, since they are Tor hidden services. (It's kind of funny to have a map of where these hidden services will live, something we may want to change later.)
Summary:
I think that we are on the brink of making this full integration happen. We will keep everyone posted as we move forward.
Feedback welcome, obviously!
Will
On 08/01/2014 03:25 PM, Nathan Wilcox wrote: > Dear OTF, Ooni, and M-Lab, > > Summary > ======= > > We've hashed out a design to integrate Ooni with mlab-ns on the M-Lab > deployment, and we've implemented a fully functional deployment that > approximates this by simulating mlab-ns (this is attached). This > completes Milestone D of our contract with OTF. > > Design Goals > ============ > > Our top goals for this integration are: > > It does not rely on any changes to upstream Ooni. (For example, > probes still use a bouncer .onion, and the backend has stock bouncers, > collectors, and test helpers running.) > > It can be disabled easily without redeploying the M-Lab backend. Our > branch's ooni-support README.md has instructions to disable the > integration, merely by editing a cron job to unset an ENABLED flag. > There's no need to redeploy different versions of ooni-support. > > When enabled, it allows M-Lab operations to monitor collectors and > test_helpers status with the same infrastructure as all other M-Lab > tools. > > Future Architectural Changes > ---------------------------- > > In the future, it may be nice to augment ooni / mlab-ns integration. > For example, mlab-ns is designed to support different policies which > may be useful to tools, such as geo-location of test_helpers. > > The Simulator > ============= > > This deployment architecture uses a simulator. While it is fully > functional and useful for testing it lacks security or robustness, so > we want to emphasize *not to deploy this* to non-test environments. > > Rationale > --------- > > There are three rationales for this approach: > > First, Least Authority didn't want to push through modifications to > mlab-ns without first creating and testing a proof-of-concept. > > Second, we didn't want to block our effort on M-Lab engineering > effort, so this allows a clean division of labor. 
> > Third, by creating and testing a working proof of concept we can help > define the necessary changes to mlab-ns in a tightly scoped and > concrete manner. > > Security > -------- > > This system is insecure because it does not use the M-Lab nagios > system to gather data, and instead lets anyone paste any data they > want into the simulator. Nagios integration is future work captured > in this ticket: > > * https://github.com/m-lab-tools/ooni-support/issues/10 > > > Next Steps > ========== > > Our contract with OTF proposes our next two milestones will focus on > improving integration testing and unit test coverage. Our focus at > that time was on test automation and documentation for diagnosing > integration problems. Test automation has already been improved since > that time, and we've accomplished most of the work for documentation: > > https://github.com/m-lab-tools/ooni-support/issues/60 > > Therefore, we propose to focus on some outstanding issues which will > improve mlab-ns integration while continuing not to block on, or > interfere with, M-Lab operations as follows: > > The primary change to mlab-ns will be to allow any tool to include > arbitrary data per slivver to be gathered and distributed by mlab-ns. > Ooni will use this to distribute data such as collector `.onion` > addresses. The need for this change is discussed here: > > * https://github.com/m-lab-tools/ooni-support/issues/4 > > This proposed change is documented in this ticket: > > * https://github.com/m-lab-tools/ooni-support/issues/47 > > A secondary change is to implement `match=all` described in #47 above. 
> It may not be necessary, so there is further investigation and testing > necessary: > > https://github.com/m-lab-tools/ooni-support/issues/56 > > Along with these changes to `mlab-ns`, we need trivial updates to our > integration scripts to work with mlab-ns rather than the simulator: > > * https://github.com/m-lab-tools/ooni-support/issues/10 > * https://github.com/m-lab-tools/ooni-support/issues/11 > > > Details & Links > =============== > > Attached is a shortish overview of possible approaches to implement > this integration. We've implemented a deployment with a mock mlab-ns > (called mlab-ns-simulator) and the "arbitrary data" approach from the > attached design document. The pull request is here: > > * https://github.com/m-lab-tools/ooni-support/pull/59 > > Specific details about this pull request: > > * A script for gathering necessary information from collectors and > testhelpers, then updating the mlab-ns-simulator. > * A script for updating a bouncer's state based on the mlan-ns-simulator. > * A cron script to update the bouncer on an hourly schedule. > * The mlab-ns-simulator itself, which approximates the production mlab-ns. > * `.init/` script changes to automatically launch the simulator and > bouncer on `mlab1.nuq0t.measurement-lab.org`. > * Design documentation for mlab-ns integration (including this > stepping stone architecture). > * Each instructions to disable mlab-ns integration without any redeployment. > > We also created a subset pull request that has bug fixes but no > mlab-ns integration features: > > * https://github.com/m-lab-tools/ooni-support/pull/58 > > Github Milestones > ----------------- > > We split the mlab-ns-simulator deployment tasks out from the larger > mlab-ns integration deployment. The mlab-ns-simulator milestone is > at: > > * https://github.com/m-lab-tools/ooni-support/issues?q=is%3Aissue+milestone%3A... 
> > The full mlab-ns integration milestone: > > * https://github.com/m-lab-tools/ooni-support/issues?q=is%3Aissue+milestone%3A... > > > > As always, let us know if you have any feedback! >
Given the progress on the other work, I think that we are ready to move forward on developing this plugin. What is the best way for us to work together to get this done? In the interest of complete disclosure, I have absolutely no experience writing nagios plugins, but I am happy to learn!
Does the plugin run on the slice or on the Nagios server? If it runs on the slice, then it's just a matter of modifying getconfig.py's output (in our ooni-support fork) to (1) have its output in the correct format and (2) test that the ports are accepting connections. If the plugin runs on the Nagios server, then we need a way to run the script on the slice, and the plugin should just be "run some arbitrary command on the slice", where, for Ooni, that command will be getconfig.py.
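For point (2), a port check could be as small as the Python sketch below, which simply attempts a TCP connection. The function name and the default timeout are arbitrary choices for illustration, and which host and port to probe would come from the collector and test helper configuration, which is not shown here.

```python
import socket


def port_accepts_connections(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

This only shows that something is listening on the port. It says nothing about whether the Ooni service behind it is behaving correctly.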
On Mon, Aug 4, 2014 at 7:55 PM, Will Hawkins <hawkinsw@opentechinstitute.org> wrote:
On 08/04/2014 09:37 PM, Will Hawkins wrote:
PS: I trimmed the CC line since we were getting into the weeds and I didn't want to bother people at RFA. If it's a good idea to have them in the loop, feel free to add them back!
On 08/04/2014 07:06 PM, Will Hawkins wrote:
On 08/04/2014 03:24 PM, Nathan Wilcox wrote:
On Fri, Aug 1, 2014 at 3:08 PM, Will Hawkins hawkinsw@opentechinstitute.org wrote:
To follow-up on Nathan's excellent report, I thought I could shed some light on the status of the OONI integration with MLab NS:
- Our work is temporarily blocked due to an operational issue that should be resolved imminently.
Good to know.
We are officially unblocked.
- The integration that Nathan mentioned between Nagios and MLab NS is incredibly promising. As mentioned previously, MLab NS captures its information from the MLab nagios instance using a "baseList" script that runs on our monitoring server. As it functions now, MLab NS is filled with information based on the output of a baseList call that looks like:
http://nagios.measurementlab.net/baseList?show_state=1&service_name=ndt
which has output like:
ndt.iupui.mlab1.akl01.measurement-lab.org/ndt 0 1
ndt.iupui.mlab2.akl01.measurement-lab.org/ndt 0 1
ndt.iupui.mlab3.akl01.measurement-lab.org/ndt 0 1
ndt.iupui.mlab1.ams01.measurement-lab.org/ndt 0 1
ndt.iupui.mlab2.ams01.measurement-lab.org/ndt 0 1
ndt.iupui.mlab3.ams01.measurement-lab.org/ndt 0 1
ndt.iupui.mlab1.ams02.measurement-lab.org/ndt 0 1
ndt.iupui.mlab2.ams02.measurement-lab.org/ndt 0 1
...
The 0s and 1s are flags indicating whether there is a "problem" with the slice or not. I.e., they are backward.
baseList takes an additional parameter known as plugin_output. We will update MLab NS to call baseList with this additional parameter. The call will look like:
http://nagios.measurementlab.net/baseList?show_state=1&service_name=ndt&...
which has output like:
ndt.iupui.mlab1.akl01.measurement-lab.org/ndt 0 1 TCP OK - 0.136 second response time on ndt.iupui.mlab1.akl01.measurement-lab.org port 3001
ndt.iupui.mlab2.akl01.measurement-lab.org/ndt 0 1 TCP OK - 0.136 second response time on ndt.iupui.mlab2.akl01.measurement-lab.org port 3001
ndt.iupui.mlab3.akl01.measurement-lab.org/ndt 0 1 TCP OK - 0.135 second response time on ndt.iupui.mlab3.akl01.measurement-lab.org port 3001
ndt.iupui.mlab1.ams01.measurement-lab.org/ndt 0 1 TCP OK - 0.145 second response time on ndt.iupui.mlab1.ams01.measurement-lab.org port 3001
...
The extra data is the output from the plugin that monitors whether the particular service is online. In this example, we are monitoring ndt and the plugin reports whether a TCP connection is possible to port 3001 (NDT's port).
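As a rough illustration of how a consumer might split one such baseList line, here is a small sketch. It assumes the layout seen in the sample output (an `fqdn/tool` pair, two integer flags, then free-form plugin output up to the end of the line); `BaseListEntry` and `parse_baselist_line` are hypothetical names, not part of baseList or MLab NS.

```python
from typing import NamedTuple, Tuple


class BaseListEntry(NamedTuple):
    fqdn: str
    tool: str
    flags: Tuple[int, int]  # the two integers, in the order they appear
    plugin_output: str      # empty when plugin output was not requested


def parse_baselist_line(line):
    """Parse one line of baseList output into a BaseListEntry."""
    # Split at most three times so the plugin output keeps its spaces.
    parts = line.strip().split(" ", 3)
    if len(parts) < 3:
        raise ValueError("malformed baseList line: %r" % line)
    fqdn, _, tool = parts[0].partition("/")
    output = parts[3] if len(parts) == 4 else ""
    return BaseListEntry(fqdn, tool, (int(parts[1]), int(parts[2])), output)


entry = parse_baselist_line(
    "ndt.iupui.mlab1.akl01.measurement-lab.org/ndt 0 1 TCP OK - 0.136 "
    "second response time on ndt.iupui.mlab1.akl01.measurement-lab.org port 3001"
)
print(entry.tool, entry.flags, entry.plugin_output)
```

Limiting the split to three cuts is the important part: everything after the second flag is treated as opaque plugin output, whatever spaces it contains.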
So, the integration point between nagios, MLab NS and OONI will look like this:
The nagios plugin written by LA/OONI will use return codes to signal whether the OONI service is running. That return value will be the 0s and 1s in baseList output. The "string" output from the plugin will be the information that needs to be captured in MLab NS and returned with OONI queries. Based on pull requests, I suspect the resulting response to a baseList call like
http://nagios.measurementlab.net/baseList?show_state=1&service_name=ooni...
will be something like
ooni.mlab.mlab1.akl01.measurement-lab.org/ndt 0 1 'collector_onion': 'testfakenotreal.onion'
ooni.mlab.mlab1.akl02.measurement-lab.org/ndt 0 1 'collector_onion': 'testfakenotreal.onion'
...
This sounds close to what we're imagining. BTW- we're tracking the Ooni side of this here:
https://github.com/m-lab-tools/ooni-support/issues/10
Could you link in a reference to the nagios plugin interface in that ticket #10 to help define its closure criteria?
Is the syntax for the plugin-specific detail just anything up to the next newline? We'd probably want to encode this in JSON to ensure any newlines or other weirdness doesn't break this format. Also, I just picked JSON because I saw that appengine's ndb has a field type for that and I figured it could be a generally useful format for any tool. Another approach is to have a blob property in mlab-ns.
Hello again! Sorry for responding to these out of order.
You are exactly correct. As it stands now, the baseList code will include the plugin output up to the newline. Encoding the plugin output so that there are not embedded newlines will probably be important. The more that we can do without having to change baseList, the better. But, if it is too inconvenient, we can make some changes (e.g., replace ' '-delimiters with something a little, say, clearer).
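To make the newline concern concrete, here is a minimal sketch of encoding the plugin output as single-line JSON. The function names and the `note` field are hypothetical. The point is only that Python's `json.dumps`, when called without `indent`, never emits a raw newline and escapes any newline inside a string value.

```python
import json


def encode_plugin_output(data):
    """Serialize `data` as one line of JSON with no raw newlines."""
    # Without `indent`, json.dumps writes a single line, and newlines
    # inside string values are escaped as the two characters "\n".
    return json.dumps(data, sort_keys=True, separators=(",", ":"))


def decode_plugin_output(text):
    """Inverse of encode_plugin_output."""
    return json.loads(text)


# Hypothetical payload; only collector_onion appears in this thread.
payload = {"collector_onion": "testfakenotreal.onion", "note": "line one\nline two"}
line = encode_plugin_output(payload)
print(line)
```

Because the encoded form stays on one line, baseList's "plugin output up to the newline" behaviour would pass it through unchanged, and the consumer can recover the original structure with `decode_plugin_output`.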
Given the progress on the other work, I think that we are ready to move forward on developing this plugin. What is the best way for us to work together to get this done? In the interest of complete disclosure, I have absolutely no experience writing nagios plugins, but I am happy to learn!
I think this will be the last email for the night :-)
Will
Note, we closed a ticket for the mlab-ns-simulator which was to "approximate" the nagios pipeline, but it's not realistic at all:
https://github.com/m-lab-tools/ooni-support/issues/48
The 'collector_onion':'testfakenotreal.onion' string will make its way through MLab NS and get spit out as tool_extra from a query like:
http://mlab-ns.appspot.com/ooni
that gives something like:
{"city": "Washington", "url": "http://ndt.iupui.mlab1.iad01.measurement-lab.org:7123", "ip": ["216.156.197.139"], "site": "iad01", "fqdn": "ndt.iupui.mlab1.iad01.measurement-lab.org", "country": "US", "port": "3001", "tool_extra": 'testfakenotreal.onion' }
That sounds perfect. Is there a ticket somewhere for the link between nagios and mlab-ns? I'd like to keep an eye on that.
How about another ticket for including the "tool_extra" field into the mlab-ns datastore and returning it in queries? I sketched out what these changes might look like here:
I will dig into the specific tickets and update them appropriately, but I wanted you to know that we now have "tool_extra" support in the MLab NS testing instance:
http://mlab-nstesting.appspot.com/ndt
gives
{"city": "Washington_DC", "url": "http://ndt.iupui.mlab2.iad01.measurement-lab.org:7123", "ip": ["216.156.197.152"], "fqdn": "ndt.iupui.mlab2.iad01.measurement-lab.org", "site": "iad01", "country": "US", "tool_extra": "1 TCP OK - 0.075 second response time on ndt.iupui.mlab2.iad01.measurement-lab.org port 3001"}
You can see the commit here:
https://code.google.com/r/hawkinsw-sieve/source/detail?name=ooni&r=2ab4a...
The TL;DR is that we are well-positioned to make these changes to MLab NS that will not require many (any?) fundamental changes to MLab NS or our monitoring infrastructure.
Does this seem reasonable?
Yep. Do you have some timeline estimate for the two changes of incorporating extra details in the nagios -> mlab-ns pipeline, and updating mlab-ns to store and return the "tool_extra" field?
See above. Sliver tools that expose plugin output will be stored in tool_extra and returned with queries.
- As Nathan mentioned, their integration with MLab NS will require a query type that is able to list all available answers. I mentioned in comments to a ticket that we have something similar to what they need. However, I realize now that that approach will not work.
However, there is a better option. MLab NS already has a "thing" at
http://mlab-nstesting.appspot.com/admin/map/ipv4/all
that generates a map of the status of all the services and places them on a map. We will modify that by parameterizing the output to allow for json responses which will exactly satisfy OONI's needs.
Does this seem reasonable?
Yes. Is that much work?
I am moving on to this now and will keep you posted :-)
This is implemented. You can see that
http://mlab-nstesting.appspot.com/admin/sliver_tools?format=json
produces an array of JSON objects that contain information about each slice. You can parse those objects to find the ooni slivers and then get to their tool_extra bits.
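A minimal sketch of that parsing step follows. It assumes each object in the array carries `fqdn` and `tool_extra` keys like the single-sliver responses shown earlier in this thread, and that OONI slivers can be recognised by an `ooni.` hostname prefix as in the hypothetical baseList output; the real field names should be checked against the testing instance. `ooni_tool_extras` is a hypothetical helper name.

```python
import json


def ooni_tool_extras(payload):
    """Map fqdn -> tool_extra for every entry whose fqdn starts with "ooni."."""
    slivers = json.loads(payload)
    return {
        s["fqdn"]: s.get("tool_extra", "")
        for s in slivers
        if s.get("fqdn", "").startswith("ooni.")
    }


# Hand-written sample shaped like the responses quoted in this thread.
sample = json.dumps([
    {"fqdn": "ndt.iupui.mlab2.iad01.measurement-lab.org",
     "tool_extra": "1 TCP OK"},
    {"fqdn": "ooni.mlab.mlab1.akl01.measurement-lab.org",
     "tool_extra": "'collector_onion': 'testfakenotreal.onion'"},
])
print(ooni_tool_extras(sample))
```

The bouncer update script could then use the resulting mapping instead of querying the simulator.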
Give this a look and let me know what you think. We can tweak until we get it exactly right.
Will
Thanks for your responses. I will keep everyone up to date as work continues!
Will
For the first pass deployment, Ooni's needs will be "just return everything" or even "just return a random subset that fits into one response". Later releases might want to be clever about geo-location of test_helpers or other policies.
In terms of collectors, the geo location should not matter, since they are Tor hidden services. (It's kind of funny to have a map of where these hidden services will live, something we may want to change later.)
Summary:
I think that we are on the brink of making this full integration happen.
We will keep everyone posted as we move forward.
Feedback welcome, obviously!
Will
On 08/01/2014 03:25 PM, Nathan Wilcox wrote:
Dear OTF, Ooni, and M-Lab,
Summary
We've hashed out a design to integrate Ooni with mlab-ns on the M-Lab deployment, and we've implemented a fully functional deployment that approximates this by simulating mlab-ns (this is attached). This completes Milestone D of our contract with OTF.
Design Goals
Our top goals for this integration are:
It does not rely on any changes to upstream Ooni. (For example, probes still use a bouncer .onion, and the backend has stock bouncers, collectors, and test helpers running.)
It can be disabled easily without redeploying the M-Lab backend. Our branch's ooni-support README.md has instructions to disable the integration, merely by editing a cron job to unset an ENABLED flag. There's no need to redeploy different versions of ooni-support.
When enabled, it allows M-Lab operations to monitor collectors and test_helpers status with the same infrastructure as all other M-Lab tools.
Future Architectural Changes
In the future, it may be nice to augment ooni / mlab-ns integration. For example, mlab-ns is designed to support different policies which may be useful to tools, such as geo-location of test_helpers.
The Simulator
This deployment architecture uses a simulator. While it is fully functional and useful for testing it lacks security or robustness, so we want to emphasize *not to deploy this* to non-test environments.
Rationale
There are three rationales for this approach:
First, Least Authority didn't want to push through modifications to mlab-ns without first creating and testing a proof-of-concept.
Second, we didn't want to block our effort on M-Lab engineering effort, so this allows a clean division of labor.
Third, by creating and testing a working proof of concept we can help define the necessary changes to mlab-ns in a tightly scoped and concrete manner.
Security
This system is insecure because it does not use the M-Lab nagios system to gather data, and instead lets anyone paste any data they want into the simulator. Nagios integration is future work captured in this ticket:
Next Steps
Our contract with OTF proposes our next two milestones will focus on improving integration testing and unit test coverage. Our focus at that time was on test automation and documentation for diagnosing integration problems. Test automation has already been improved since that time, and we've accomplished most of the work for documentation:
https://github.com/m-lab-tools/ooni-support/issues/60
Therefore, we propose to focus on some outstanding issues which will improve mlab-ns integration while continuing not to block on, or interfere with, M-Lab operations as follows:
The primary change to mlab-ns will be to allow any tool to include arbitrary data per sliver to be gathered and distributed by mlab-ns. Ooni will use this to distribute data such as collector `.onion` addresses. The need for this change is discussed here:
This proposed change is documented in this ticket:
A secondary change is to implement `match=all` described in #47 above. It may not be necessary, so further investigation and testing are needed:
https://github.com/m-lab-tools/ooni-support/issues/56
Along with these changes to `mlab-ns`, we need trivial updates to our integration scripts to work with mlab-ns rather than the simulator:
- https://github.com/m-lab-tools/ooni-support/issues/10
- https://github.com/m-lab-tools/ooni-support/issues/11
Details & Links
Attached is a shortish overview of possible approaches to implement this integration. We've implemented a deployment with a mock mlab-ns (called mlab-ns-simulator) and the "arbitrary data" approach from the attached design document. The pull request is here:
Specific details about this pull request:
- A script for gathering necessary information from collectors and testhelpers, then updating the mlab-ns-simulator.
- A script for updating a bouncer's state based on the mlab-ns-simulator.
- A cron script to update the bouncer on an hourly schedule.
- The mlab-ns-simulator itself, which approximates the production mlab-ns.
- `.init/` script changes to automatically launch the simulator and bouncer on `mlab1.nuq0t.measurement-lab.org`.
- Design documentation for mlab-ns integration (including this stepping stone architecture).
- Easy instructions to disable mlab-ns integration without any redeployment.
We also created a subset pull request that has bug fixes but no mlab-ns integration features:
Github Milestones
We split the mlab-ns-simulator deployment tasks out from the larger mlab-ns integration deployment. The mlab-ns-simulator milestone is at:
https://github.com/m-lab-tools/ooni-support/issues?q=is%3Aissue+milestone%3A...
The full mlab-ns integration milestone:
https://github.com/m-lab-tools/ooni-support/issues?q=is%3Aissue+milestone%3A...
As always, let us know if you have any feedback!