* TODO create complete disaster recovery procedure for the Aggregator server
This can, for example, be a manual procedure... but in that case it must be fully documented. For instance, the docs should specify how to build a new Aggregator server given a freshly installed Debian stable server; this should involve copying some Docker scripts and running them to build Docker images that are configured properly for the different components.
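As a starting point, the rebuild procedure could be captured in a script like the following sketch. The image names and the `docker/` directory layout are assumptions, not the actual ones:

```shell
#!/bin/sh
# Sketch of a rebuild script for a freshly installed Debian stable server.
# The image names and docker/ script layout are assumptions.
set -eu

# Hypothetical list of the component images described in the sections below.
IMAGES="sync-subscriber yaml-sanitizer post-processing mongodb web-mirror"

# Print the build plan; remove the echo to actually build.
for img in $IMAGES; do
    echo "docker build -t aggregator/$img docker/$img/"
done
```

Keeping the procedure as a script under version control means the disaster recovery docs can simply say "install Docker, clone the repo, run the script".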
* Aggregator
What are the different Aggregator Docker images?
** synchronization/subscriber of publishing collectors
*** TODO create a Docker image to rsync data from the collectors and trigger removal afterwards
This task can initially use rsync... but will later be replaced by a pub/sub system.
** YAML data sanitizer
*** TODO create YAML data sanitizer Docker image
This task makes the YAML report data suitable for publication AND prepares it for database import.
** data post-processing + DB import
*** TODO create post-processing Docker image
- analyses the YAML reports
- creates database entries
- only this Docker image contains MongoDB write-access credentials
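The sanitization rules themselves are still to be defined. As an illustration only, here is a sketch that strips two hypothetical sensitive top-level keys (`client_ip`, `probe_id`) from a report before it moves on; the real sanitizer will need actual rules and proper YAML parsing:

```shell
#!/bin/sh
# Minimal sanitizer sketch: drop top-level keys that must not be published.
# The key names below are placeholders; the real rules are TBD.
set -eu

sanitize() {
    # Remove top-level "client_ip:" and "probe_id:" lines (hypothetical keys).
    grep -v -E '^(client_ip|probe_id):' "$1"
}

# Demo on an inline sample report.
cat > /tmp/report.yaml <<'EOF'
test_name: web_connectivity
client_ip: 203.0.113.7
probe_id: abc123
result: ok
EOF

sanitize /tmp/report.yaml > /tmp/report.sanitized.yaml
```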
* non-aggregator Docker images
** mongodb
This Docker image runs the MongoDB service.
1st iteration: listens on the loopback interface.
2nd iteration: move this Docker image to a different machine and listen on a publicly routed IP address. Must use an encrypted wire protocol + authentication.
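For the 1st iteration, the loopback restriction can be enforced on the Docker side by publishing the container port on 127.0.0.1 only. A sketch, printed rather than executed; the container name, port, and image tag are assumptions:

```shell
#!/bin/sh
# Sketch: 1st iteration, MongoDB reachable on loopback only.
# The host-side -p 127.0.0.1:... publish keeps the port off public interfaces.
# Container name, port, and image tag are assumptions.
set -eu

MONGO_CMD="docker run -d --name aggregator-mongodb -p 127.0.0.1:27017:27017 mongo:latest"

# Printed rather than executed, since this is only a sketch.
echo "$MONGO_CMD"
```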
** web mirror
*** TODO create Docker image of the web mirror
* stateful directory structure
** state 0
rsync data to /data/raw
# example: data/bouncer-nkvphnp3p6agi5qq/archive
# /data/collector/archive
** state 1
sanitize data and move to /data/staging
# example: none
** state 2
post-processing to prepare for DB import; move data to /data/reports
# example: none
** state 3
data that has been imported into the DB; move data to /de
# example: /data/reports
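States 0 through 2 can be exercised end to end with plain directory moves. A sketch using a throwaway root in place of /data:

```shell
#!/bin/sh
# Sketch of the state directories and how one report moves through them.
# Uses a throwaway root so it can run anywhere; the real root is /data.
set -eu

ROOT=$(mktemp -d)
mkdir -p "$ROOT/raw" "$ROOT/staging" "$ROOT/reports"

# state 0: a report arrives via rsync into raw/
echo 'test_name: web_connectivity' > "$ROOT/raw/report.yaml"

# state 1: sanitize, then move to staging/
mv "$ROOT/raw/report.yaml" "$ROOT/staging/report.yaml"

# state 2: post-process, then move to reports/, ready for DB import
mv "$ROOT/staging/report.yaml" "$ROOT/reports/report.yaml"
```

Because each state is just a directory, "where is this report in the pipeline?" reduces to "which directory is it in", which also makes crash recovery straightforward.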
* stateful directory backup strategy
The backup strategy for the above 3-part stateful directory workflow is to make backup snapshots of all three state directories.
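One way to implement the snapshots is a timestamped tarball covering the three state directories. A sketch, with paths parameterized since the real snapshot destination isn't specified in these notes:

```shell
#!/bin/sh
# Snapshot sketch: tar up the three state directories with a UTC timestamp.
# DATA defaults to a throwaway dir for this sketch; real value is /data.
# BACKUPS (the snapshot destination) is an assumption.
set -eu

DATA=${DATA:-$(mktemp -d)}
BACKUPS=${BACKUPS:-$(mktemp -d)}
mkdir -p "$DATA/raw" "$DATA/staging" "$DATA/reports"

STAMP=$(date -u +%Y%m%dT%H%M%SZ)
tar -C "$DATA" -czf "$BACKUPS/state-$STAMP.tar.gz" raw staging reports
```

Snapshotting all three directories in one archive keeps the pipeline state consistent within a snapshot, rather than capturing each directory at a different moment.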
* collector to aggregator state #1
For the moment we'll use rsync to copy the data from the Collectors to the Aggregator. After rsyncing, the data will then be removed from the Collectors.
Later iterations will involve a pub/sub system between the Collectors and Aggregators.
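The copy-then-remove step can be done in one rsync invocation: pulling with `--remove-source-files` deletes each file from the Collector once it has been transferred. A sketch, printing the command rather than running it; the host name and paths are hypothetical:

```shell
#!/bin/sh
# Sketch of the pull-and-remove transfer for state #1.
# --remove-source-files deletes each file from the sending side (the
# Collector) after it has arrived. Host and paths are hypothetical.
set -eu

COLLECTOR="collector.example.org"            # hypothetical Collector host
SRC="aggregator@$COLLECTOR:/data/archive/"   # hypothetical remote path
DST="/data/raw/$COLLECTOR/"

# Printed rather than executed, since this is only a sketch.
echo rsync -av --remove-source-files "$SRC" "$DST"
```

Note that `--remove-source-files` only removes files that transferred successfully, so an interrupted run leaves the remaining data on the Collector for the next run to pick up.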