* TODO create complete disaster recovery procedure for the Aggregator server
This can, for example, be a manual procedure... but in that case it must be fully documented. For instance, the docs should specify how to build a new Aggregator server given a freshly installed Debian stable server; this should involve copying some Docker scripts and running them to build Docker images that are configured properly for the different components.
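As a starting point, the rebuild procedure could be captured in a script like the following sketch. The image names and the `docker/` directory layout are assumptions, not the actual ones:

```shell
#!/bin/sh
# Sketch of a rebuild script for a freshly installed Debian stable server.
# The image names and docker/ script layout are assumptions.
set -eu

# Hypothetical list of the component images described in the sections below.
IMAGES="sync-subscriber yaml-sanitizer post-processing mongodb web-mirror"

# Print the build plan; remove the echo to actually build.
for img in $IMAGES; do
    echo "docker build -t aggregator/$img docker/$img/"
done
```

Keeping the procedure as a script under version control means the disaster recovery docs can simply say "install Docker, clone the repo, run the script".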
* Aggregator
What are the different Aggregator Docker images?
** synchronization/subscriber of publishing collectors
*** TODO create a Docker image to rsync data from the collectors and trigger removal afterwards
This task can initially use rsync... but will later be replaced by a pub/sub system.
** YAML data sanitizer
*** TODO create YAML data sanitizer Docker image
This task makes the YAML report data suitable for publication AND prepares it for database import.
** data post-processing + DB import
*** TODO create post-processing Docker image
- analyses the YAML reports
- creates database entries
- only this Docker image contains MongoDB write-access credentials
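The sanitization rules themselves are still to be defined. As an illustration only, here is a sketch that strips two hypothetical sensitive top-level keys (`client_ip`, `probe_id`) from a report before it moves on; the real sanitizer will need actual rules and proper YAML parsing:

```shell
#!/bin/sh
# Minimal sanitizer sketch: drop top-level keys that must not be published.
# The key names below are placeholders; the real rules are TBD.
set -eu

sanitize() {
    # Remove top-level "client_ip:" and "probe_id:" lines (hypothetical keys).
    grep -v -E '^(client_ip|probe_id):' "$1"
}

# Demo on an inline sample report.
cat > /tmp/report.yaml <<'EOF'
test_name: web_connectivity
client_ip: 203.0.113.7
probe_id: abc123
result: ok
EOF

sanitize /tmp/report.yaml > /tmp/report.sanitized.yaml
```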
* non-aggregator Docker images
** mongodb
This Docker image runs the MongoDB service.
1st iteration: listens on the loopback interface.
2nd iteration: move this Docker image to a different machine and listen on a publicly routed IP address. Must use an encrypted wire protocol + authentication.
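For the 1st iteration, the loopback restriction can be enforced on the Docker side by publishing the container port on 127.0.0.1 only. A sketch, printed rather than executed; the container name, port, and image tag are assumptions:

```shell
#!/bin/sh
# Sketch: 1st iteration, MongoDB reachable on loopback only.
# The host-side -p 127.0.0.1:... publish keeps the port off public interfaces.
# Container name, port, and image tag are assumptions.
set -eu

MONGO_CMD="docker run -d --name aggregator-mongodb -p 127.0.0.1:27017:27017 mongo:latest"

# Printed rather than executed, since this is only a sketch.
echo "$MONGO_CMD"
```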
** web mirror
*** TODO create Docker image of the web mirror
* stateful directory structure
** state 0
rsync data to /data/raw
# example: data/bouncer-nkvphnp3p6agi5qq/archive
# /data/collector/archive
** state 1
sanitize data and move to /data/staging
# example: none
** state 2
post-processing to prepare for DB import; move data to /data/reports
# example: none
** state 3
data that has been imported into the DB; move data to /de
# example: /data/reports
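States 0 through 2 can be exercised end to end with plain directory moves. A sketch using a throwaway root in place of /data:

```shell
#!/bin/sh
# Sketch of the state directories and how one report moves through them.
# Uses a throwaway root so it can run anywhere; the real root is /data.
set -eu

ROOT=$(mktemp -d)
mkdir -p "$ROOT/raw" "$ROOT/staging" "$ROOT/reports"

# state 0: a report arrives via rsync into raw/
echo 'test_name: web_connectivity' > "$ROOT/raw/report.yaml"

# state 1: sanitize, then move to staging/
mv "$ROOT/raw/report.yaml" "$ROOT/staging/report.yaml"

# state 2: post-process, then move to reports/, ready for DB import
mv "$ROOT/staging/report.yaml" "$ROOT/reports/report.yaml"
```

Because each state is just a directory, "where is this report in the pipeline?" reduces to "which directory is it in", which also makes crash recovery straightforward.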
* stateful directory backup strategy
The backup strategy for the above 3-part stateful directory workflow is to make backup snapshots of all three state directories.
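One way to implement the snapshots is a timestamped tarball covering the three state directories. A sketch, with paths parameterized since the real snapshot destination isn't specified in these notes:

```shell
#!/bin/sh
# Snapshot sketch: tar up the three state directories with a UTC timestamp.
# DATA defaults to a throwaway dir for this sketch; real value is /data.
# BACKUPS (the snapshot destination) is an assumption.
set -eu

DATA=${DATA:-$(mktemp -d)}
BACKUPS=${BACKUPS:-$(mktemp -d)}
mkdir -p "$DATA/raw" "$DATA/staging" "$DATA/reports"

STAMP=$(date -u +%Y%m%dT%H%M%SZ)
tar -C "$DATA" -czf "$BACKUPS/state-$STAMP.tar.gz" raw staging reports
```

Snapshotting all three directories in one archive keeps the pipeline state consistent within a snapshot, rather than capturing each directory at a different moment.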
* collector to aggregator state #1
For the moment we'll use rsync to copy the data from the Collectors to the Aggregator. After rsyncing, the data will then be removed from the Collectors.
Later iterations will involve a pub/sub system between the Collectors and Aggregators.
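The copy-then-remove step can be done in one rsync invocation: pulling with `--remove-source-files` deletes each file from the Collector once it has been transferred. A sketch, printing the command rather than running it; the host name and paths are hypothetical:

```shell
#!/bin/sh
# Sketch of the pull-and-remove transfer for state #1.
# --remove-source-files deletes each file from the sending side (the
# Collector) after it has arrived. Host and paths are hypothetical.
set -eu

COLLECTOR="collector.example.org"            # hypothetical Collector host
SRC="aggregator@$COLLECTOR:/data/archive/"   # hypothetical remote path
DST="/data/raw/$COLLECTOR/"

# Printed rather than executed, since this is only a sketch.
echo rsync -av --remove-source-files "$SRC" "$DST"
```

Note that `--remove-source-files` only removes files that transferred successfully, so an interrupted run leaves the remaining data on the Collector for the next run to pick up.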