Given the progress on the other work, I think that we are ready to move
forward on developing this plugin. What is the best way for us to work
together to get this done? In the interest of complete disclosure, I
have absolutely no experience writing nagios plugins, but I am happy to
learn!

Does the plugin run on the slice or on the Nagios server? If it runs on the slice, then it's just a matter of modifying getconfig.py (in our ooni-support fork) so that it (1) emits its output in the correct format and (2) tests that the ports are accepting connections. If the plugin runs on the Nagios server, then we need a way to run the script on the slice, and the plugin should just be "run some arbitrary command on the slice", where, for Ooni, that command will be getconfig.py.
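
To make the first option concrete, here is a minimal sketch of what such a check could look like. It assumes the standard Nagios plugin conventions (exit status 0 means OK, 2 means CRITICAL, and the first line of stdout is the plugin output). The function names, the port list, and the config dict are placeholders for illustration, not what getconfig.py does today:

```python
import json
import socket

OK, CRITICAL = 0, 2  # standard Nagios plugin exit statuses


def port_accepts(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check(config, host, ports, probe=port_accepts):
    """Return (exit_status, one_line_output) for a Nagios-style check.

    config is whatever the slice wants to publish (hypothetical here);
    probe is injectable so the logic can be exercised without a network.
    """
    closed = [p for p in ports if not probe(host, p)]
    if closed:
        listing = ", ".join(str(p) for p in closed)
        return CRITICAL, "CRITICAL - ports not accepting connections: " + listing
    # json.dumps without indent always yields a single line.
    return OK, json.dumps(config, sort_keys=True)
```

A real plugin would print the returned line and pass the status to sys.exit(); whether that wrapper lives on the slice or on the Nagios server is exactly the open question above.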


On Mon, Aug 4, 2014 at 7:55 PM, Will Hawkins <hawkinsw@opentechinstitute.org> wrote:


On 08/04/2014 09:37 PM, Will Hawkins wrote:
> PS: I trimmed the CC line since we were getting into the weeds and I
> didn't want to bother people at RFA. If it's a good idea to have them in
> the loop, feel free to add them back!
>
> On 08/04/2014 07:06 PM, Will Hawkins wrote:
>>
>>
>> On 08/04/2014 03:24 PM, Nathan Wilcox wrote:
>>> On Fri, Aug 1, 2014 at 3:08 PM, Will Hawkins
>>> <hawkinsw@opentechinstitute.org> wrote:
>>>> To follow-up on Nathan's excellent report, I thought I could shed some
>>>> light on the status of the OONI integration with MLab NS:
>>>>
>>>> 1. Our work is temporarily blocked due to an operational issue that should
>>>> be resolved imminently.
>>>>
>>>
>>> Good to know.
>>
>> We are officially unblocked.
>>
>>>
>>>> 2. The integration that Nathan mentioned between Nagios and MLab NS is
>>>> incredibly promising. As mentioned previously, MLab NS captures its
>>>> information from the MLab nagios instance using a "baseList" script that
>>>> runs on our monitoring server. As it functions now, MLab NS is filled
>>>> with information based on the output of a baseList call that looks like:
>>>>
>>>> http://nagios.measurementlab.net/baseList?show_state=1&service_name=ndt
>>>>
>>>> which has output like:
>>>>
>>>> ndt.iupui.mlab1.akl01.measurement-lab.org/ndt 0 1
>>>> ndt.iupui.mlab2.akl01.measurement-lab.org/ndt 0 1
>>>> ndt.iupui.mlab3.akl01.measurement-lab.org/ndt 0 1
>>>> ndt.iupui.mlab1.ams01.measurement-lab.org/ndt 0 1
>>>> ndt.iupui.mlab2.ams01.measurement-lab.org/ndt 0 1
>>>> ndt.iupui.mlab3.ams01.measurement-lab.org/ndt 0 1
>>>> ndt.iupui.mlab1.ams02.measurement-lab.org/ndt 0 1
>>>> ndt.iupui.mlab2.ams02.measurement-lab.org/ndt 0 1
>>>> ...
>>>>
>>>> The 0s and 1s are flags indicating whether there is a "problem" with the
>>>> slice or not. I.e., they are backward.
>>>>
>>>> baseList takes an additional parameter known as plugin_output. We will
>>>> update MLab NS to call baseList with this additional parameter. The call
>>>> will look like:
>>>>
>>>> http://nagios.measurementlab.net/baseList?show_state=1&service_name=ndt&plugin_output=1
>>>>
>>>> which has output like:
>>>>
>>>> ndt.iupui.mlab1.akl01.measurement-lab.org/ndt 0 1 TCP OK - 0.136 second
>>>> response time on ndt.iupui.mlab1.akl01.measurement-lab.org port 3001
>>>> ndt.iupui.mlab2.akl01.measurement-lab.org/ndt 0 1 TCP OK - 0.136 second
>>>> response time on ndt.iupui.mlab2.akl01.measurement-lab.org port 3001
>>>> ndt.iupui.mlab3.akl01.measurement-lab.org/ndt 0 1 TCP OK - 0.135 second
>>>> response time on ndt.iupui.mlab3.akl01.measurement-lab.org port 3001
>>>> ndt.iupui.mlab1.ams01.measurement-lab.org/ndt 0 1 TCP OK - 0.145 second
>>>> response time on ndt.iupui.mlab1.ams01.measurement-lab.org port 3001
>>>> ...
>>>>
>>>> The extra data is the output from the plugin that monitors whether the
>>>> particular service is online. In this example, we are monitoring ndt and
>>>> the plugin reports whether a TCP connection is possible to port 3001
>>>> (NDT's port).
>>>>
>>>> So, the integration point between nagios, MLab NS and OONI will look
>>>> like this:
>>>>
>>>> The nagios plugin written by LA/OONI will use return codes to signal
>>>> whether the OONI service is running. That return value will be the 0s
>>>> and 1s in baseList output. The "string" output from the plugin will be
>>>> the information that needs to be captured in MLab NS and returned with
>>>> OONI queries. Based on pull requests, I suspect the resulting response
>>>> to a baseList call like
>>>>
>>>> http://nagios.measurementlab.net/baseList?show_state=1&service_name=ooni&plugin_output=1
>>>>
>>>> will be something like
>>>>
>>>> ooni.mlab.mlab1.akl01.measurement-lab.org/ndt 0 1 'collector_onion':
>>>> 'testfakenotreal.onion'
>>>> ooni.mlab.mlab1.akl02.measurement-lab.org/ndt 0 1 'collector_onion':
>>>> 'testfakenotreal.onion'
>>>> ...
>>>
>>> This sounds close to what we're imagining.  BTW- we're tracking the
>>> Ooni side of this here:
>>>
>>> https://github.com/m-lab-tools/ooni-support/issues/10
>>>
>>> Could you link in a reference to the nagios plugin interface in that
>>> ticket #10 to help define its closure criteria?
>>>
>>> Is the syntax for the plugin-specific detail just anything up to the
>>> next newline?  We'd probably want to encode this in JSON to ensure any
>>> newlines or other weirdness doesn't break this format.  Also, I just
>>> picked JSON because I saw that appengine's ndb has a field type for
>>> that and I figured it could be a generally useful format for any tool.
>>> Another approach is to have a blob property in mlab-ns.

Hello again! Sorry for responding to these out of order.

You are exactly correct. As it stands now, the baseList code will
include the plugin output up to the newline. Encoding the plugin output
so that there are not embedded newlines will probably be important. The
more that we can do without having to change baseList, the better. But,
if it is too inconvenient, we can make some changes (e.g., replace the
' '-delimiters with something a little, say, clearer).
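
As a rough sketch of how this could work without touching baseList: the plugin serializes its data as single-line JSON, and the consumer splits each baseList line on the first three spaces only. This assumes the line layout shown in the examples earlier in this thread (name, two flags, then plugin output); the function names and the sample values are made up for illustration:

```python
import json


def encode_plugin_output(data):
    """Serialize data to one line; json.dumps escapes embedded newlines."""
    return json.dumps(data, sort_keys=True)


def parse_baselist_line(line):
    """Split 'name flag flag output...' into its four parts.

    Only the first three spaces are treated as delimiters, so spaces
    inside the plugin output survive intact.
    """
    name, flag_a, flag_b, output = line.rstrip("\n").split(" ", 3)
    return name, int(flag_a), int(flag_b), output


# Hypothetical round trip with a value that contains a newline.
payload = {"collector_onion": "testfakenotreal.onion", "note": "a\nb"}
line = "ooni.example/ooni 0 1 " + encode_plugin_output(payload) + "\n"
name, flag_a, flag_b, output = parse_baselist_line(line)
decoded = json.loads(output)
```

The point of the design is that the space delimiters stay as they are: everything after the third space is opaque to baseList and only has to be newline-free.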

Given the progress on the other work, I think that we are ready to move
forward on developing this plugin. What is the best way for us to work
together to get this done? In the interest of complete disclosure, I
have absolutely no experience writing nagios plugins, but I am happy to
learn!

I think this will be the last email for the night :-)

Will

>>>
>>>
>>> Note, we closed a ticket for the mlab-ns-simulator which was to
>>> "approximate" the nagios pipeline, but it's not realistic at all:
>>>
>>> https://github.com/m-lab-tools/ooni-support/issues/48
>>>
>>>>
>>>> The 'collector_onion':'testfakenotreal.onion' string will make its way
>>>> through MLab NS and get spit out as tool_extra from a query like:
>>>>
>>>> http://mlab-ns.appspot.com/ooni
>>>>
>>>> that gives something like:
>>>>
>>>> {"city": "Washington", "url":
>>>> "http://ndt.iupui.mlab1.iad01.measurement-lab.org:7123", "ip":
>>>> ["216.156.197.139"], "site": "iad01", "fqdn":
>>>> "ndt.iupui.mlab1.iad01.measurement-lab.org", "country": "US", "port":
>>>> "3001", "tool_extra": "testfakenotreal.onion"}
>>>
>>> That sounds perfect.  Is there a ticket somewhere for the link between
>>> nagios and mlab-ns?  I'd like to keep an eye on that.
>>>
>>> How about another ticket for including the "tool_extra" field into the
>>> mlab-ns datastore and returning it in queries?  I sketched out what
>>> these changes might look like here:
>>>
>>> https://github.com/m-lab-tools/ooni-support/issues/47
>>>
>>>
>>
>> I will dig into the specific tickets and update them appropriately, but
>> I wanted you to know that we now have "tool_extra" support in the MLab
>> NS testing instance:
>>
>> http://mlab-nstesting.appspot.com/ndt
>>
>> gives
>>
>> {"city": "Washington_DC", "url":
>> "http://ndt.iupui.mlab2.iad01.measurement-lab.org:7123", "ip":
>> ["216.156.197.152"], "fqdn":
>> "ndt.iupui.mlab2.iad01.measurement-lab.org", "site": "iad01", "country":
>> "US", "tool_extra": "1 TCP OK - 0.075 second response time on
>> ndt.iupui.mlab2.iad01.measurement-lab.org port 3001"}
>>
>> You can see the commit here:
>> https://code.google.com/r/hawkinsw-sieve/source/detail?name=ooni&r=2ab4a584c65f7c17c42f40eb4116ae4169b92e72
>>
>>>> The TL;DR is that we are well-positioned to make these changes to MLab
>>>> NS that will not require many (any?) fundamental changes to MLab NS or
>>>> our monitoring infrastructure.
>>>>
>>>> Does this seem reasonable?
>>>
>>> Yep.  Do you have some timeline estimate for the two changes of
>>> incorporating extra details in the nagios -> mlab-ns pipeline, and
>>> updating mlab-ns to store and return the "tool_extra" field?
>>
>> See above. Sliver tools that expose plugin output will be stored in
>> tool_extra and returned with queries.
>>
>>>
>>>>
>>>> 3. As Nathan mentioned, their integration with MLab NS will require a
>>>> query type that is able to list all available answers. I mentioned in
>>>> comments to a ticket that we have something similar to what they need.
>>>> However, I realize now that that approach will not work.
>>>>
>>>> However, there is a better option. MLab NS already has a "thing" at
>>>>
>>>> http://mlab-nstesting.appspot.com/admin/map/ipv4/all
>>>>
>>>> that generates a map of the status of all the services and places them
>>>> on a map. We will modify that by parameterizing the output to allow for
>>>> json responses which will exactly satisfy OONI's needs.
>>>>
>>>> Does this seem reasonable?
>>>
>>> Yes.  Is that much work?
>>
>> I am moving on to this now and will keep you posted :-)
>
> This is implemented. You can see that
>
> http://mlab-nstesting.appspot.com/admin/sliver_tools?format=json
>
> produces an array of JSON objects that contain information about each
> slice. You can parse those objects to find the ooni slivers and then get
> to their tool_extra bits.
>
> Give this a look and let me know what you think. We can tweak until we
> get it exactly right.
>
> Will
>
>>
>> Thanks for your responses. I will keep everyone up to date as work
>> continues!
>>
>> Will
>>
>>>
>>> For the first pass deployment, Ooni's needs will be "just return
>>> everything" or even "just return a random subset that fits into one
>>> response".  Later releases might want to be clever about geo-location
>>> of test_helpers or other policies.
>>>
>>> In terms of collectors, the geo location should not matter, since they
>>> are Tor hidden services.  (It's kind of funny to have a map of where
>>> these hidden services will live, something we may want to change
>>> later.)
>>>
>>>
>>>>
>>>> Summary:
>>>>
>>>> I think that we are on the brink of making this full integration happen.
>>>> We will keep everyone posted as we move forward.
>>>>
>>>> Feedback welcome, obviously!
>>>>
>>>> Will
>>>>
>>>>
>>>>
>>>> On 08/01/2014 03:25 PM, Nathan Wilcox wrote:
>>>>> Dear OTF, Ooni, and M-Lab,
>>>>>
>>>>> Summary
>>>>> =======
>>>>>
>>>>> We've hashed out a design to integrate Ooni with mlab-ns on the M-Lab
>>>>> deployment, and we've implemented a fully functional deployment that
>>>>> approximates this by simulating mlab-ns (this is attached).  This
>>>>> completes Milestone D of our contract with OTF.
>>>>>
>>>>> Design Goals
>>>>> ============
>>>>>
>>>>> Our top goals for this integration are:
>>>>>
>>>>> It does not rely on any changes to upstream Ooni.  (For example,
>>>>> probes still use a bouncer .onion, and the backend has stock bouncers,
>>>>> collectors, and test helpers running.)
>>>>>
>>>>> It can be disabled easily without redeploying the M-Lab backend.  Our
>>>>> branch's ooni-support README.md has instructions to disable the
>>>>> integration, merely by editing a cron job to unset an ENABLED flag.
>>>>> There's no need to redeploy different versions of ooni-support.
>>>>>
>>>>> When enabled, it allows M-Lab operations to monitor collectors and
>>>>> test_helpers status with the same infrastructure as all other M-Lab
>>>>> tools.
>>>>>
>>>>> Future Architectural Changes
>>>>> ----------------------------
>>>>>
>>>>> In the future, it may be nice to augment ooni / mlab-ns integration.
>>>>> For example, mlab-ns is designed to support different policies which
>>>>> may be useful to tools, such as geo-location of test_helpers.
>>>>>
>>>>> The Simulator
>>>>> =============
>>>>>
>>>>> This deployment architecture uses a simulator.  While it is fully
>>>>> functional and useful for testing, it lacks security or robustness, so
>>>>> we want to emphasize *not to deploy this* to non-test environments.
>>>>>
>>>>> Rationale
>>>>> ---------
>>>>>
>>>>> There are three rationales for this approach:
>>>>>
>>>>> First, Least Authority didn't want to push through modifications to
>>>>> mlab-ns without first creating and testing a proof-of-concept.
>>>>>
>>>>> Second, we didn't want to block our effort on M-Lab engineering
>>>>> effort, so this allows a clean division of labor.
>>>>>
>>>>> Third, by creating and testing a working proof of concept we can help
>>>>> define the necessary changes to mlab-ns in a tightly scoped and
>>>>> concrete manner.
>>>>>
>>>>> Security
>>>>> --------
>>>>>
>>>>> This system is insecure because it does not use the M-Lab nagios
>>>>> system to gather data, and instead lets anyone paste any data they
>>>>> want into the simulator.  Nagios integration is future work captured
>>>>> in this ticket:
>>>>>
>>>>> * https://github.com/m-lab-tools/ooni-support/issues/10
>>>>>
>>>>>
>>>>> Next Steps
>>>>> ==========
>>>>>
>>>>> Our contract with OTF proposes our next two milestones will focus on
>>>>> improving integration testing and unit test coverage.  Our focus at
>>>>> that time was on test automation and documentation for diagnosing
>>>>> integration problems.  Test automation has already been improved since
>>>>> that time, and we've accomplished most of the work for documentation:
>>>>>
>>>>> https://github.com/m-lab-tools/ooni-support/issues/60
>>>>>
>>>>> Therefore, we propose to focus on some outstanding issues which will
>>>>> improve mlab-ns integration while continuing not to block on, or
>>>>> interfere with, M-Lab operations as follows:
>>>>>
>>>>> The primary change to mlab-ns will be to allow any tool to include
>>>>> arbitrary data per sliver to be gathered and distributed by mlab-ns.
>>>>> Ooni will use this to distribute data such as collector `.onion`
>>>>> addresses.  The need for this change is discussed here:
>>>>>
>>>>> * https://github.com/m-lab-tools/ooni-support/issues/4
>>>>>
>>>>> This proposed change is documented in this ticket:
>>>>>
>>>>> * https://github.com/m-lab-tools/ooni-support/issues/47
>>>>>
>>>>> A secondary change is to implement `match=all` described in #47 above.
>>>>> It may not be necessary, so further investigation and testing are
>>>>> needed:
>>>>>
>>>>> https://github.com/m-lab-tools/ooni-support/issues/56
>>>>>
>>>>> Along with these changes to `mlab-ns`, we need trivial updates to our
>>>>> integration scripts to work with mlab-ns rather than the simulator:
>>>>>
>>>>> * https://github.com/m-lab-tools/ooni-support/issues/10
>>>>> * https://github.com/m-lab-tools/ooni-support/issues/11
>>>>>
>>>>>
>>>>> Details & Links
>>>>> ===============
>>>>>
>>>>> Attached is a shortish overview of possible approaches to implement
>>>>> this integration.  We've implemented a deployment with a mock mlab-ns
>>>>> (called mlab-ns-simulator) and the "arbitrary data" approach from the
>>>>> attached design document.  The pull request is here:
>>>>>
>>>>> * https://github.com/m-lab-tools/ooni-support/pull/59
>>>>>
>>>>> Specific details about this pull request:
>>>>>
>>>>> * A script for gathering necessary information from collectors and
>>>>> testhelpers, then updating the mlab-ns-simulator.
>>>>> * A script for updating a bouncer's state based on the mlab-ns-simulator.
>>>>> * A cron script to update the bouncer on an hourly schedule.
>>>>> * The mlab-ns-simulator itself, which approximates the production mlab-ns.
>>>>> * `.init/` script changes to automatically launch the simulator and
>>>>> bouncer on `mlab1.nuq0t.measurement-lab.org`.
>>>>> * Design documentation for mlab-ns integration (including this
>>>>> stepping stone architecture).
>>>>> * Easy instructions to disable mlab-ns integration without any redeployment.
>>>>>
>>>>> We also created a subset pull request that has bug fixes but no
>>>>> mlab-ns integration features:
>>>>>
>>>>> * https://github.com/m-lab-tools/ooni-support/pull/58
>>>>>
>>>>> Github Milestones
>>>>> -----------------
>>>>>
>>>>> We split the mlab-ns-simulator deployment tasks out from the larger
>>>>> mlab-ns integration deployment.  The mlab-ns-simulator milestone is
>>>>> at:
>>>>>
>>>>> * https://github.com/m-lab-tools/ooni-support/issues?q=is%3Aissue+milestone%3A%22mlab-ns-simulator+deployment%22
>>>>>
>>>>> The full mlab-ns integration milestone:
>>>>>
>>>>> * https://github.com/m-lab-tools/ooni-support/issues?q=is%3Aissue+milestone%3A%22mlab-ns+Integration%22
>>>>>
>>>>>
>>>>>
>>>>> As always, let us know if you have any feedback!
>>>>>
>>>
>>>
>>>