Hi Damian,
On 8/19/13 5:26 AM, Damian Johnson wrote:
Hi Karsten, I've finished a replacement for the DocTor consensus monitors...
https://gitweb.torproject.org/atagar/tor-utils.git/blob/HEAD:/consensus_heal... https://gitweb.torproject.org/atagar/tor-utils.git/blob/HEAD:/data/consensus...
This is awesome! Thanks! Replying to this mail first, then adding a first code review.
Like the other checkers this is presently running hourly and sending results my way. My vote would be to have them start sending results to tor-consensus-health@ [1] instead. This will double the amount of noise on the list but it should help us flush out any issues with the scripts. Once we have confidence in it we can shut down DocTor's checks.
Yes, I agree that we should have your script start sending results to the mailing list. Want to set that up? Not sure if your mails will bounce until somebody approves your sender address, but I guess we'll find out.
A code review would also be much appreciated. If there's any portions that you find confusing then let me know.
See below.
As for the DocTor website, I'm a little surprised Peter and Sebastian didn't reply. Not sure how we'd like to proceed there...
I just asked Peter on IRC. He didn't reply before, because of tl;dr.
So, Peter is fine with all consensus-health info being in a mail and not on the website. When he looks at the website, he's mostly interested in two things which aren't yet contained in status emails:
1. Do the authorities voting on BadExit agree about relays they assign this flag to? The warning might contain the diff of relay fingerprints they voted or didn't vote BadExit on. We'll probably want to rate-limit this warning to every 6 or 12 hours.
2. Do the bandwidth authorities report roughly the same number of Measured lines, or do these numbers diverge beyond a given threshold (say, 20%)? This warning should probably be rate-limited to every 24 hours, because new bandwidth authorities take some time to measure the network.
What do you think about adding these two warnings to your script? It seems we'd make the website obsolete, at least for Peter, by doing so.
How about we start a new thread on this list discussing only the part where we're planning to kill the website and move everything to the status emails? This thread is probably as long by now that nobody besides us reads it and we could as well discuss private stuff without anybody noticing like that I still disagree with you about Pepper Jack being tasty cheese and that I can highly recommend Muenster cheese.
Thoughts? -Damian
And here's the code review. As usual, feel free to ignore comments you don't agree with:
- Can you add a license to your script?
- Your rate-limiting logic seems to work only with full hours (and days). This may lead to unexpected results if two script executions don't start at the exact minute and second of an hour. In one case a message might be suppressed, in another case it might not. That's why I defined all rate-limiting intervals as X:30 hours. Maybe simply add or subtract 30 minutes from all intervals just to be sure, or add a minutes parameter with a default of 30.
- DocTor produces two output files containing warnings, one with new warnings and one with all warnings (which is empty if there are zero new warnings). New warnings are sent to the IRC bot and all warnings are sent to the mailing list. I agree that this approach is somewhat complicated, so maybe it's sufficient to just send the new warnings to both the IRC bot and the mailing list. Unless you didn't realize there was such a distinction and think it's worth adding to your script. Up to you.
- Are old warnings ever removed from your last_notified.cfg? Not sure if it matters though.
- Is the %is in `log.info("Suppressing %s, time remaining is %is"` supposed to be a %s?
- Your rule about not downloading a vote from an authority that didn't provide a consensus before seems quite strict. In theory, we could ask another authority for that first authority's vote. Maybe the first authority just didn't want to talk to us but was happy to talk to all other authorities. We should probably learn about problems with that authority's vote (or the absence of problems) even if it doesn't talk to us. I'm aware that this will make the download logic somewhat more complex.
- Typo in `unknown_consensus_parameteres`.
- Typo in `incompatable_authorities`.
- In `certificate_expiration`, is your check that a vote has exactly one authority really necessary? Wouldn't stem complain if this were not the case?
- Speaking of stem complaining about invalid votes or consensuses: what does your script do in such a case?
- I wonder if your check in `has_expected_fingerprints` would complain if somebody set up a fake "moria1". It probably shouldn't. That's why I added IP address and port to checkAuthorityRelayIdentityKeys.
- Maybe I missed them while reading your code, but did you add checks for checkContainedVotes and checkConsensusSignatures? Both checks are quite important, because even if an authority claims to add a vote, it may not have become part of the consensus, or the authority may not have signed the consensus that it contributed a vote to before.
That's all. Thanks for working on a DocTor replacement!
All the best, Karsten