Hi Damian,
we briefly discussed Stem's new descriptor fetching module and how we could extend the existing simple monitors [0] towards a replacement of the Java consensus-health checker [1].
Moving this discussion to this list with your permission.
So, you asked what exactly the consensus-health checker a.k.a. DocTor looks for. Let me try to give you a quick overview of the different parts [2]. Going through the Java source files in an order that hopefully explains best how everything works together:
- Warning.java is an enum of all different warnings that DocTor can emit. Each warning contains a little documentation string saying what it means. If these are ambiguous, let me know, and I can probably explain them better.
- Checker.java contains the various checks that are performed on previously downloaded consensuses and votes. For example, checkMissingConsensuses goes through the (hard-coded) list of known directory authorities and emits a ConsensusDownloadTimeout warning if we couldn't download the consensus from at least one of them. As you see, there are plenty more check* methods.
- StatusFileReport.java uses the results from Checker by putting all warnings in two output files, one of them containing all warnings, the other only containing new warnings. Each warning has a severity, which can be ERROR, WARNING, or NOTICE. Also, each warning defines a time after which we consider the exact same warning string new even though the warning hasn't changed. The latter is useful to rate-limit warnings. For example, the fact that a certificate is going to expire in two months from now doesn't have to be repeated every hour.
- MetricsWebsiteReport.java is the second output of DocTor. It's the website available at [3]. The idea is that the website gives more information about warnings received on IRC or via email. It's actually a hack that this website is presented on metrics. In a rewrite, PyDoctor would have its own little webserver to present consensus-health details. Once it's in place and we shut down DocTor, I'm going to replace the website on metrics with a static page linking to PyDoctor.
- DownloadStatistics.java keeps statistics about consensus download times which are displayed on the website.
- Downloader.java is a wrapper for metrics-lib's descriptor downloader.
- Main.java puts everything together. It first downloads everything, then writes the status files containing warnings, and then generates the website output.
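As a rough illustration of the flow described in the list above (checks produce warnings with a severity, and the report keeps one list of all warnings and one of new ones), here is a minimal Python sketch. The names `Finding` and `split_new` and the example messages are made up for illustration and are not DocTor's Java API. Rate limiting is left out here.

```python
from dataclasses import dataclass

SEVERITIES = ("ERROR", "WARNING", "NOTICE")


@dataclass(frozen=True)
class Finding:
    severity: str  # one of SEVERITIES
    message: str   # the exact warning string


def split_new(findings, previously_reported):
    """Return (all_findings, new_findings).

    A finding counts as new when its exact message string has not been
    reported before (previously_reported is a set of message strings).
    """
    all_findings = list(findings)
    new_findings = [f for f in all_findings
                    if f.message not in previously_reported]
    return all_findings, new_findings


# Hypothetical example data.
findings = [
    Finding("WARNING", "consensus download from authority A timed out"),
    Finding("NOTICE", "certificate of authority B expires soon"),
]
all_f, new_f = split_new(findings, {"certificate of authority B expires soon"})
```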
So, that's what DocTor does right now. Here are two more things that would be great to have in DocTor or PyDoctor:
- Warn if directory authorities assign flags to unusually few or many relays [4]. This enhancement has the potential of generating lots of warnings, because the directory authorities currently vote *very* differently on certain flags. The result will be a lot of directory authority operator nagging. Just saying, you should be prepared for that when deploying this!
- Ignore certain known warnings [5]. This will reduce a lot of noise on the consensus-health mailing list. The less noise there is, the more people will pay attention to actually valid warnings. In theory.
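One possible shape for the first of these two checks, as a hedged Python sketch: compare each authority's relay count for a flag against the median across authorities. The function name, the input format, and the 50% tolerance are assumptions chosen for illustration. The thread does not say how "unusually few or many" should be defined.

```python
from statistics import median


def flag_count_outliers(counts, tolerance=0.5):
    """Return authorities whose count deviates strongly from the median.

    counts maps an authority nickname to the number of relays it assigned
    one particular flag to. tolerance is the allowed relative deviation
    from the median (an assumed value, not taken from DocTor).
    """
    if not counts:
        return []
    mid = median(counts.values())
    if mid == 0:
        return sorted(a for a, c in counts.items() if c != 0)
    return sorted(a for a, c in counts.items()
                  if abs(c - mid) / mid > tolerance)


# Hypothetical counts for a single flag.
outliers = flag_count_outliers({"auth1": 1000, "auth2": 1100, "auth3": 300})
```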
Hope that makes sense. Happy to provide more input or review code. Just let me know!
All the best, Karsten
[0] https://lists.torproject.org/pipermail/tor-dev/2013-July/005209.html
[1] https://www.torproject.org/getinvolved/volunteer#metrics-pyDoctor
[2] https://gitweb.torproject.org/doctor.git/tree/HEAD:/src/org/torproject/docto...
[3] https://metrics.torproject.org/consensus-health.html
Hi Karsten, just finished scanning over DocTor. The checks and email notifications look like a reasonably simple task to bite off. The website however (MetricsWebsiteReport.java) I'll need to give some more thought to. I'll let you know when I have something up and running.
Cheers! -Damian
Hi Karsten, just finished throwing together a script that does seven of the eighteen DocTor checks. The rest shouldn't be hard, just take a little elbow grease...
https://gitweb.torproject.org/atagar/tor-utils.git/commitdiff/1e49c33
As for the website, why is that part of the same codebase as the monitors? The site doesn't look to make use of the derived warnings. Is this simply a kludge since they both make use of the same descriptor data?
The website might be a good use case for Hyde (http://ringce.com/hyde). That said, this feels like it should belong in the metrics-web repository...
Cheers! -Damian
On 8/12/13 2:02 AM, Damian Johnson wrote:
Hi Karsten, just finished throwing together a script that does seven of the eighteen DocTor checks. The rest shouldn't be hard, just take a little elbow grease...
https://gitweb.torproject.org/atagar/tor-utils.git/commitdiff/1e49c33
Cool!
I quickly looked at the commit, and this is probably due to the early state of the script, but I thought I'd mention it anyway: I wondered how the single try block around get_consensuses, get_votes, and run_checks would respond to single directory authorities being unavailable, closing connections in the middle of the download, taking longer than 60 seconds, etc. I think the 60 seconds thing might be handled fine, but would the other I/O errors make the script not run any checks?
As for the website, why is that part of the same codebase as the monitors? The site doesn't look to make use of the derived warnings. Is this simply a kludge since they both make use of the same descriptor data?
The kludge is that checks and website don't share code, not that there's a website. The idea of the typical use case is that people receive a warning via email or IRC and then go to the website to learn more details.
Here's how I could imagine integrating checks and website more closely: for each type of warning, there's a separate class that knows how to do the following things:
- look at previously downloaded consensuses and/or votes to decide if there's something to warn about,
- print out a warning message if something's not okay,
- decide on a severity,
- define rate limiting of this warning message, and
- produce the HTML for the website.
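The per-warning class proposed above could look roughly like the following Python sketch. The class names, the method names, the rate-limit values, and the HTML layout are all hypothetical. This only shows how the five responsibilities might sit together in one place.

```python
from abc import ABC, abstractmethod
from datetime import timedelta
from html import escape


class Check(ABC):
    severity = "NOTICE"              # ERROR, WARNING, or NOTICE
    rate_limit = timedelta(hours=1)  # how long to suppress repeats

    @abstractmethod
    def run(self, consensuses, votes):
        """Return a list of warning strings (empty if all is fine)."""

    def to_html(self, messages):
        """Render the warnings as a small HTML list for the website."""
        items = "".join("<li>%s</li>" % escape(m) for m in messages)
        return '<ul class="%s">%s</ul>' % (self.severity.lower(), items)


class MissingConsensus(Check):
    severity = "WARNING"
    rate_limit = timedelta(hours=3)  # assumed value

    def __init__(self, known_authorities):
        self.known_authorities = known_authorities

    def run(self, consensuses, votes):
        # consensuses maps authority nickname -> downloaded document
        missing = sorted(set(self.known_authorities) - set(consensuses))
        return ["No consensus downloaded from %s" % a for a in missing]


check = MissingConsensus(["auth1", "auth2"])
messages = check.run({"auth1": object()}, {})
page = check.to_html(messages)
```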
Note that the large table at the end of the current consensus-health page is probably different, because it contains much more information than what's required to further investigate a warning message. We should not include that table in your Python rewrite. I just took it out from the metrics website to see if anybody cares. For reference, here's the archived latest consensus-health.html that contains that table:
https://people.torproject.org/~karsten/volatile/consensus-health-2013-08-12-...
The website might be a good use case for Hyde (http://ringce.com/hyde).
Plausible, yes. Can't say much about tools, but something to generate static HTML sounds like a fine choice.
That said, this feels like it should belong in the metrics-web repository...
No, we should rather move the website output to its own subdomain, e.g., doctor.tpo. It's a kludge that it's on the metrics website. It doesn't belong there, as much as ExoneraTor and relay search don't belong there.
All the best, Karsten
I quickly looked at the commit, and this is probably due to the early state of the script, but I thought I'd mention it anyway: I wondered how the single try block around get_consensuses, get_votes, and run_checks would respond to single directory authorities being unavailable, closing connections in the middle of the download, taking longer than 60 seconds, etc. I think the 60 seconds thing might be handled fine, but would the other I/O errors make the script not run any checks?
Correct, when an authority is unavailable the script only reports a single error - that outage. It didn't cross my mind that we would want to run the checks on a subset of the authorities, but that's an easy tweak to make.
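A minimal sketch of that tweak, assuming a hypothetical `fetch` callable that raises `OSError` on I/O problems: each authority gets its own try block, so one outage is recorded and the checks can still run on the documents that did arrive. This is not the code from the commit above.

```python
def download_all(authorities, fetch):
    """Fetch from each authority independently.

    fetch is a callable taking an authority name. It is assumed to raise
    OSError when the authority is unreachable or drops the connection.
    Returns (documents, failures), both keyed by authority name.
    """
    documents, failures = {}, {}
    for authority in authorities:
        try:
            documents[authority] = fetch(authority)
        except OSError as exc:
            failures[authority] = str(exc)
    return documents, failures


def fake_fetch(authority):
    # Stand-in for a real download, used only to exercise the sketch.
    if authority == "auth2":
        raise OSError("connection closed mid-download")
    return "consensus from %s" % authority


docs, failed = download_all(["auth1", "auth2", "auth3"], fake_fetch)
```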
The kludge is that checks and website don't share code, not that there's a website. The idea of the typical use case is that people receive a warning via email or IRC and then go to the website to learn more details.
I disagree. This repository contains two very distinct applications:
* monitors for issues with the votes
* a website that renders the present content of the votes
The use cases for each are associated, but bundling them together makes about as much sense as lumping vidalia and tor within the same repository.
Personally I'm a big fan of these monitors, but less so the website. I don't think it's especially useful (precious few people have cause to find a side-by-side comparison of vote attributes to be interesting, and fewer still would opt for this over reading the documents). But that said, it's not overly much code. I might toy with Hyde to generate the site after finishing the monitors but no promises. That part is not something I would want to own for the long term, though.
Note that the large table at the end of the current consensus-health page is probably different, because it contains much more information than what's required to further investigate a warning message. We should not include that table in your Python rewrite. I just took it out from the metrics website to see if anybody cares.
Ahhh, much better. With the table, page loads were painfully slow (25s) but now it's 0.8s. Much more usable.
On 8/12/13 10:51 AM, Damian Johnson wrote:
The kludge is that checks and website don't share code, not that there's a website. The idea of the typical use case is that people receive a warning via email or IRC and then go to the website to learn more details.
I disagree. This repository contains two very distinct applications:
- monitors for issues with the votes
- a website that renders the present content of the votes
If someone's only interested in a presentation of vote contents, then DocTor shouldn't be their tool. If vote contents are interesting to anyone, we should add them to Onionoo and have some Onionoo client present them. This is not what I have in mind for DocTor.
I'm only interested in providing directory authority operators with the information they need to fix problems with the voting process.
(Also note that you only mention votes above. But DocTor also looks at problems with serving consensuses, e.g., connection problems when downloading the consensus, or serving outdated consensuses.)
The use cases for each are associated, but bundling them together makes about as much sense as lumping vidalia and tor within the same repository.
That's not really true. What you don't see right now is that problems with the consensus or votes would be highlighted in the website output. For example, if either gabelmoo or moria1 or tor26 is missing a certain recommended version, that line will be printed in red. So, there's a close relation between status notifications and the website, just not in the code.
Personally I'm a big fan of these monitors, but less so the website. I don't think it's especially useful (precious few people have cause to find a side-by-side comparison of vote attributes to be interesting, and fewer still would opt for this over reading the documents). But that said, it's not overly much code. I might toy with Hyde to generate the site after finishing the monitors but no promises. That part is not something I would want to own for the long term, though.
Well, maybe let's step back then and find a solution that you're happy to own for the long term. Once your tool is online, I'm planning to shut down the current consensus-health checker including the website output, so your tool should contain all the information that people need to fix problems in the consensus.
In my view, writing the results of a DocTor run to a website and highlighting problems in red was the easiest way to provide directory authority operators with all information they need. I could also imagine adding additional information about warnings to the bottom of status notification emails. So, warnings on top and then one paragraph for each warning requiring additional information. Or maybe there are other ways to provide this additional information.
We should also include Sebastian and Peter in this discussion, because they cared about consensus-health output in the past and may have more suggestions. Cc'ed them.
Thanks, Karsten
I'm only interested in providing directory authority operators with the information they need to fix problems with the voting process.
Hi Karsten. I'm going to focus on the monitors for now and come back to discussion of a website after Sebastian and Peter have a chance to respond. That said, we totally agree on this point - the goal of this project is solely to help detect and resolve issues with authorities and consensus generation. If the website is vital to that then great (I still think it should be its own repository, but it's then definitely worth keeping). However, if we can do just as well by including additional information in the warnings then that would be even more maintainable in the long term.
My thoughts on this are probably best explained by a tangent. For years I've run a sybil checker called consensusTracker.py [1]. This checker did a bit more than just watch for sybil attacks. It also generated a pretty html report of the wax and wane in relay counts over the week among other things.
While these bells and whistles were pretty, no one cared about them and rightfully so. The sole purpose of a sybil checker is to provide us a notification saying "Oi! Potential problem here, recent relay additions are...".
The html report, while pretty, was both useless and made the code far, far more complicated than it needed to be. Finally I replaced that 601 line script with a far simpler 116 line counterpart that does just what we want and nothing more [2]. The monitor is now more reliable, maintainable, and can easily be updated in the future when necessary (something I couldn't do with the mess that was the previous script).
The lesson that I learned from this was "Start with the ending goal of a project and code toward that. Anything else will just result in feature creep." This isn't to necessarily say the DocTor website is unnecessary (only the site's intended audience like Peter can tell us that), but I definitely think we should figure out what he needs before resolving to keep it.
Cheers! -Damian
[1] https://gitweb.torproject.org/atagar/tor-utils.git/blob/e537044:/consensusTr...
[2] https://gitweb.torproject.org/atagar/tor-utils.git/commitdiff/9b7de30
Hi Karsten, I've finished a replacement for the DocTor consensus monitors...
https://gitweb.torproject.org/atagar/tor-utils.git/blob/HEAD:/consensus_heal...
https://gitweb.torproject.org/atagar/tor-utils.git/blob/HEAD:/data/consensus...
Like the other checkers this is presently running hourly and sending results my way. My vote would be to have them start sending results to tor-consensus-health@ [1] instead. This will double the amount of noise on the list but it should help us flush out any issues with the scripts. Once we have confidence in it we can shut down DocTor's checks.
A code review would also be much appreciated. If there are any portions that you find confusing then let me know. As for the DocTor website, I'm a little surprised Peter and Sebastian didn't reply. Not sure how we'd like to proceed there...
Thoughts? -Damian
[1] https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-consensus-health
Hi Damian,
On 8/19/13 5:26 AM, Damian Johnson wrote:
Hi Karsten, I've finished a replacement for the DocTor consensus monitors...
https://gitweb.torproject.org/atagar/tor-utils.git/blob/HEAD:/consensus_heal...
https://gitweb.torproject.org/atagar/tor-utils.git/blob/HEAD:/data/consensus...
This is awesome! Thanks! Replying to this mail first, then adding a first code review.
Like the other checkers this is presently running hourly and sending results my way. My vote would be to have them start sending results to tor-consensus-health@ [1] instead. This will double the amount of noise on the list but it should help us flush out any issues with the scripts. Once we have confidence in it we can shut down DocTor's checks.
Yes, I agree that we should have your script start sending results to the mailing list. Want to set that up? Not sure if your mails will bounce until somebody approves your sender address, but I guess we'll find out.
A code review would also be much appreciated. If there's any portions that you find confusing then let me know.
See below.
As for the DocTor website, I'm a little surprised Peter and Sebastian didn't reply. Not sure how we'd like to proceed there...
I just asked Peter on IRC. He didn't reply before, because of tl;dr.
So, Peter is fine with all consensus-health info being in a mail and not on the website. When he looks at the website, he's mostly interested in two things which aren't yet contained in status emails:
1. Do the authorities voting on BadExit agree about relays they assign this flag to? The warning might contain the diff of relay fingerprints they voted or didn't vote BadExit on. We'll probably want to rate-limit this warning to every 6 or 12 hours.
2. Do the bandwidth authorities report roughly the same number of Measured lines, or do these numbers diverge beyond a given threshold (say, 20%)? This warning should probably be rate-limited to every 24 hours, because new bandwidth authorities take some time to measure the network.
What do you think about adding these two warnings to your script? It seems we'd make the website obsolete, at least for Peter, by doing so.
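For the second warning, a hedged Python sketch of the threshold comparison might look like this. The function name and the input format are hypothetical, and measuring the divergence relative to the largest count is one assumption among several reasonable choices. The 20% default comes from the suggestion above.

```python
def measured_divergence(measured_counts, threshold=0.2):
    """Return bandwidth authorities that lag behind the others.

    measured_counts maps an authority nickname to the number of Measured
    lines in its vote. An authority is reported when its count is more
    than threshold (as a fraction) below the largest count.
    """
    if not measured_counts:
        return []
    top = max(measured_counts.values())
    if top == 0:
        return []
    return sorted(a for a, c in measured_counts.items()
                  if (top - c) / top > threshold)


# Hypothetical counts: bw3 reports a third fewer Measured lines.
lagging = measured_divergence({"bw1": 3000, "bw2": 2900, "bw3": 2000})
```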
How about we start a new thread on this list discussing only the part where we're planning to kill the website and move everything to the status emails? This thread is probably so long by now that nobody besides us reads it, and we could just as well discuss private stuff without anybody noticing, like that I still disagree with you about Pepper Jack being tasty cheese and that I can highly recommend Muenster cheese.
Thoughts? -Damian
And here's the code review. As usual, feel free to ignore comments you don't agree with:
- Can you add a license to your script?
- Your rate-limiting logic seems to work only with full hours (and days). This may lead to unexpected results if two script executions don't start at the exact minute and second of an hour. In one case a message might be suppressed, in another case it might not. That's why I defined all rate-limiting intervals as X:30 hours. Maybe simply add or subtract 30 minutes from all intervals just to be sure, or add a minutes parameter with a default of 30.
- DocTor produces two output files containing warnings, one with new warnings and one with all warnings (which is empty if there are zero new warnings). New warnings are sent to the IRC bot and all warnings are sent to the mailing list. I agree that this approach is somewhat complicated, so maybe it's sufficient to just send the new warnings to both the IRC bot and the mailing list. Unless you didn't realize there was such a distinction and think it's worth adding to your script. Up to you.
- Are old warnings ever removed from your last_notified.cfg? Not sure if it matters though.
- Is the %is in `log.info("Suppressing %s, time remaining is %is"` supposed to be a %s?
- Your rule about not downloading a vote from an authority that didn't provide a consensus before seems quite strict. In theory, we could ask another authority for that first authority's vote. Maybe the first authority just didn't want to talk to us but was happy to talk to all other authorities. We should probably learn about problems with that authority's vote (or the absence of problems) even if it doesn't talk to us. I'm aware that this will make the download logic somewhat more complex.
- Typo in `unknown_consensus_parameteres`.
- Typo in `incompatable_authorities`.
- In `certificate_expiration`, is your check that a vote has exactly one authority really necessary? Wouldn't stem complain if this were not the case?
- Speaking of stem complaining about invalid votes or consensuses: what does your script do in such a case?
- I wonder if your check in `has_expected_fingerprints` would complain if somebody set up a fake "moria1". It probably shouldn't. That's why I added IP address and port to checkAuthorityRelayIdentityKeys.
- Maybe I missed them while reading your code, but did you add checks for checkContainedVotes and checkConsensusSignatures? Both checks are quite important, because even if an authority claims to add a vote, it may not have become part of the consensus, or the authority may not have signed the consensus that it contributed a vote to before.
That's all. Thanks for working on a DocTor replacement!
All the best, Karsten
Thanks, Karsten, for the code review! I decided to take a long weekend so was able to address all of it...
Yes, I agree that we should have your script start sending results to the mailing list. Want to set that up? Not sure if your mails will bounce until somebody approves your sender address, but I guess we'll find out.
Nope, not getting through. Asking for my addresses to be whitelisted....
https://trac.torproject.org/projects/tor/ticket/9537
- Do the authorities voting on BadExit agree about relays they assign this flag to? The warning might contain the diff of relay fingerprints they voted or didn't vote BadExit on. We'll probably want to rate-limit this warning to every 6 or 12 hours.
Done...
https://gitweb.torproject.org/atagar/tor-utils.git/commitdiff/e6d378a
Btw, it looks like they presently *are* out of sync...
NOTICE: Authorities disagree about the BadExit flag for 7E6A3AA70A156167E7AE543E50EF54321EC80AF0 (with flag: Faravahar, without flag: tor26, moria1)
NOTICE: Authorities disagree about the BadExit flag for ADF62D3A1305F0B5404D41EEDADA68ECD294FC60 (with flag: Faravahar, without flag: tor26, moria1)
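Notices of this form could be produced along the following lines. This is a simplified Python sketch with a hypothetical function name and input format, not the code from the commit above. Authorities that do not vote on BadExit at all must be left out of the input, otherwise they would show up as "without flag" for every relay.

```python
def badexit_disagreements(votes):
    """Return one notice string per fingerprint the authorities dispute.

    votes maps an authority nickname to the set of relay fingerprints it
    assigned the BadExit flag to. Only authorities that actually vote on
    BadExit should be included.
    """
    notices = []
    all_flagged = set().union(*votes.values())
    for fingerprint in sorted(all_flagged):
        with_flag = sorted(a for a, fps in votes.items()
                           if fingerprint in fps)
        without_flag = sorted(a for a, fps in votes.items()
                              if fingerprint not in fps)
        if without_flag:
            notices.append(
                "Authorities disagree about the BadExit flag for %s "
                "(with flag: %s, without flag: %s)"
                % (fingerprint, ", ".join(with_flag), ", ".join(without_flag)))
    return notices


# Hypothetical votes with shortened fingerprints.
notices = badexit_disagreements({"auth1": {"AAAA", "BBBB"}, "auth2": {"AAAA"}})
```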
- Do the bandwidth authorities report roughly the same number of Measured lines, or do these numbers diverge beyond a given threshold (say, 20%)? This warning should probably be rate-limited to every 24 hours, because new bandwidth authorities take some time to measure the network.
Also done...
https://gitweb.torproject.org/atagar/tor-utils.git/commitdiff/975b0ae
How about we start a new thread on this list discussing only the part where we're planning to kill the website and move everything to the status emails?
Personally I don't see value in forking this thread, but I don't care strongly either.
... I still disagree with you about Pepper Jack being tasty cheese and that I can highly recommend Muenster cheese.
Blasphemy!!!
- Can you add a license to your script?
Done, opting for 3-clause BSD like DocTor...
https://gitweb.torproject.org/atagar/tor-utils.git/commitdiff/93b699f
- Your rate-limiting logic seems to work only with full hours (and days). This may lead to unexpected results if two script executions don't start at the exact minute and second of an hour. In one case a message might be suppressed, in another case it might not. That's why I defined all rate-limiting intervals as X:30 hours. Maybe simply add or subtract 30 minutes from all intervals just to be sure, or add a minutes parameter with a default of 30.
Good catch. Fixed...
https://gitweb.torproject.org/atagar/tor-utils.git/commitdiff/ce68dab
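To illustrate the problem with full-hour intervals, here is a small Python sketch. The function name and the choice to subtract the 30 minutes are assumptions, and this is not the code from the commit above. With an hourly run that starts a few seconds earlier than the previous one, a one-hour interval still suppresses the warning, while the interval with 30 minutes of slack lets it through.

```python
from datetime import datetime, timedelta


def is_suppressed(last_notified, now, hours, slack_minutes=30):
    """True if a warning last sent at last_notified is still suppressed.

    The slack shortens the interval so that small differences in the
    start time of two runs do not change the outcome.
    """
    interval = timedelta(hours=hours) - timedelta(minutes=slack_minutes)
    return now - last_notified < interval


sent = datetime(2013, 8, 19, 12, 0, 40)
next_run = datetime(2013, 8, 19, 13, 0, 5)  # 59m25s later, not a full hour

without_slack = is_suppressed(sent, next_run, hours=1, slack_minutes=0)
with_slack = is_suppressed(sent, next_run, hours=1)
```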
- DocTor produces two output files containing warnings, one with new warnings and one with all warnings (which is empty if there are zero new warnings). New warnings are sent to the IRC bot and all warnings are sent to the mailing list. I agree that this approach is somewhat complicated, so maybe it's sufficient to just send the new warnings to both the IRC bot and the mailing list. Unless you didn't realize there was such a distinction and think it's worth adding to your script. Up to you.
I've ignored the IRC bot notifications since I'm both unfamiliar with it and presently lack a mechanism to provide it with notifications.
I take it the bot periodically reads files from disk and then dumps the contents into an IRC channel? Who maintains the bot and might we switch it to another notification mechanism? Maybe it should read tor-consensus-health@ instead?
Alternatively I can change my send() function to do whatever the bot maintainer wants, though dumping files to disk seems a bit odd to me.
- Are old warnings ever removed from your last_notified.cfg? Not sure if it matters though.
Nope. When the suppression expires and the issue goes into alarm again it replaces the value.
- Is the %is in `log.info("Suppressing %s, time remaining is %is"` supposed to be a %s?
Nope. It's logging an integer value with an extra 's' for seconds (ex, "45s"). Logging seconds here is pretty less-than-useful so swapped it to hours.
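A quick Python illustration of how that format string is read: `%i` is the integer conversion and the `s` that follows is a literal character. The warning name `MISSING_VOTES` is a made-up placeholder.

```python
remaining = 45

# "%i" consumes the integer, the trailing "s" is printed as-is.
line = "Suppressing %s, time remaining is %is" % ("MISSING_VOTES", remaining)
```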
- Your rule about not downloading a vote from an authority that didn't provide a consensus before seems quite strict. In theory, we could ask another authority for that first authority's vote. Maybe the first authority just didn't want to talk to us but was happy to talk to all other authorities. We should probably learn about problems with that authority's vote (or the absence of problems) even if it doesn't talk to us. I'm aware that this will make the download logic somewhat more complex.
Actually, this makes it a little cleaner. The 'only download vote if we got a consensus' thing was to avoid redundant notices when an authority goes down (ie. both "unable to download consensus from X" and "unable to download vote from X"). With this fallback logic I don't really need to do this...
https://gitweb.torproject.org/atagar/tor-utils.git/commitdiff/5e200e9
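The fallback idea could be sketched in Python as follows. `fetch_from` is a hypothetical callable standing in for the real download, assumed to raise `OSError` on failure, and this is not the code from the commit above.

```python
def fetch_vote(target, authorities, fetch_from):
    """Try to obtain target's vote, asking target first, then the others.

    fetch_from(source, target) is assumed to return target's vote as
    served by source, or to raise OSError if source cannot provide it.
    Returns (vote, source), or (None, None) if nobody could provide it.
    """
    sources = [target] + [a for a in authorities if a != target]
    for source in sources:
        try:
            return fetch_from(source, target), source
        except OSError:
            continue
    return None, None


def fake_fetch_from(source, target):
    # Stand-in download: auth1 refuses to talk to us, the others answer.
    if source == "auth1":
        raise OSError("auth1 refuses our connection")
    return "vote of %s" % target


vote, source = fetch_vote("auth1", ["auth1", "auth2", "auth3"], fake_fetch_from)
```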
... on a side note think I'll move the authority fingerprint and v3ident into stem later. Other scripts will likely want them too.
- Typo in `unknown_consensus_parameteres`.
- Typo in `incompatable_authorities`.
Nice catches. Fixed.
- In `certificate_expiration`, is your check that a vote has exactly one authority really necessary? Wouldn't stem complain if this were not the case?
Nope, when reading the dir-spec it didn't cross my mind to assert that about votes. Now that you mention it though I agree that this belongs in stem - done...
https://gitweb.torproject.org/stem.git/commitdiff/4863c22
https://gitweb.torproject.org/atagar/tor-utils.git/commitdiff/9fab9f8
- Speaking of stem complaining about invalid votes or consensuses: what does your script do in such a case?
Invalid documents are treated in a similar fashion to documents that fail to download, except that the notice states the validation issue.
- I wonder if your check in `has_expected_fingerprints` would complain if somebody set up a fake "moria1". It probably shouldn't. That's why I added IP address and port to checkAuthorityRelayIdentityKeys.
Ahhh, I was wondering why DocTor did that. Good point, addressing this by also looking for the Named flag.
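For comparison, the address-and-port approach that DocTor is described as using could be sketched like this in Python. This illustrates the DocTor idea rather than the Named-flag fix mentioned above. The data layout, the names, and the example address are hypothetical.

```python
# Hypothetical table of known authorities (documentation address range).
KNOWN = {
    "auth1": {"fingerprint": "AAAA", "address": "192.0.2.1", "or_port": 9001},
}


def fingerprint_issues(relays, known=KNOWN):
    """Report authorities whose relay identity changed.

    relays is an iterable of dicts with nickname, fingerprint, address,
    and or_port. A relay that merely reuses an authority's nickname from
    a different address or port is ignored rather than reported.
    """
    issues = []
    for relay in relays:
        expected = known.get(relay["nickname"])
        if expected is None:
            continue
        same_endpoint = ((relay["address"], relay["or_port"])
                         == (expected["address"], expected["or_port"]))
        if same_endpoint and relay["fingerprint"] != expected["fingerprint"]:
            issues.append("%s has an unexpected fingerprint %s"
                          % (relay["nickname"], relay["fingerprint"]))
    return issues


impostor = {"nickname": "auth1", "fingerprint": "ZZZZ",
            "address": "198.51.100.7", "or_port": 443}
rekeyed = {"nickname": "auth1", "fingerprint": "BBBB",
           "address": "192.0.2.1", "or_port": 9001}
issues = fingerprint_issues([impostor, rekeyed])
```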
- Maybe I missed them while reading your code, but did you add checks for checkContainedVotes and checkConsensusSignatures? Both checks are quite important, because even if an authority claims to add a vote, it may not have become part of the consensus, or the authority may not have signed the consensus that it contributed a vote to before.
Huh, wonder how I missed those. Added...
https://gitweb.torproject.org/atagar/tor-utils.git/commitdiff/0982060
https://gitweb.torproject.org/atagar/tor-utils.git/commitdiff/c97d71e
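In spirit, the two checks compare the set of known authorities against the authorities whose votes made it into the consensus and against those that signed it. A simplified Python sketch with hypothetical names and plain sets as input (the real checks read these from the parsed consensus):

```python
def consensus_gaps(known_authorities, vote_sources, signers):
    """Return authorities missing from the votes or from the signatures.

    vote_sources: authorities the consensus lists as having contributed
    a vote. signers: authorities whose signature is on the consensus.
    """
    known = set(known_authorities)
    return {
        "missing_votes": sorted(known - set(vote_sources)),
        "missing_signatures": sorted(known - set(signers)),
    }


# Hypothetical consensus: auth3's vote is absent, auth2 did not sign.
gaps = consensus_gaps(["auth1", "auth2", "auth3"],
                      ["auth1", "auth2"],
                      ["auth1", "auth3"])
```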
Cheers! -Damian
PS. A couple other things that need to be addressed at some point are:
* Running this on a TPO host rather than my desktop, and maybe also use a TPO address to send the emails too.
* Moving these scripts to a real repository rather than my 'tor-utils' user repo. Not sure though if we should keep the name 'DocTor' or opt for something more descriptive like 'DescriptorMonitors'.
On 8/20/13 4:41 AM, Damian Johnson wrote:
Thanks, Karsten, for the code review! I decided to take a long weekend so was able to address all of it...
Hi Damian,
neat, thanks for your quick response!
Replying inline, but leaving out the parts where I'd only write "okay".
Yes, I agree that we should have your script start sending results to the mailing list. Want to set that up? Not sure if your mails will bounce until somebody approves your sender address, but I guess we'll find out.
Nope, not getting through. Asking for my addresses to be whitelisted....
Looks like this is already solved.
- Do the authorities voting on BadExit agree about relays they assign this flag to? The warning might contain the diff of relay fingerprints they voted or didn't vote BadExit on. We'll probably want to rate-limit this warning to every 6 or 12 hours.
Done...
https://gitweb.torproject.org/atagar/tor-utils.git/commitdiff/e6d378a
Btw, it looks like they presently *are* out of sync...
NOTICE: Authorities disagree about the BadExit flag for 7E6A3AA70A156167E7AE543E50EF54321EC80AF0 (with flag: Faravahar, without flag: tor26, moria1)
NOTICE: Authorities disagree about the BadExit flag for ADF62D3A1305F0B5404D41EEDADA68ECD294FC60 (with flag: Faravahar, without flag: tor26, moria1)
Are you sure? AFAIK, Faravahar never voted on BadExit. Which valid-after time was this?
How about we start a new thread on this list discussing only the part where we're planning to kill the website and move everything to the status emails?
Personally I don't see value in forking this thread, but I don't care strongly either.
It seems I got some input from Peter on IRC, and Sebastian promised some feedback via email to you. Guess that's a fine start. We'll probably have a new discussion once we run both DocTors in parallel or when we finally switch over.
- DocTor produces two output files containing warnings, one with new warnings and one with all warnings (which is empty if there are zero new warnings). New warnings are sent to the IRC bot and all warnings are sent to the mailing list. I agree that this approach is somewhat complicated, so maybe it's sufficient to just send the new warnings to both the IRC bot and the mailing list. Unless you didn't realize there was such a distinction and think it's worth adding to your script. Up to you.
I've ignored the IRC bot notifications since I'm both unfamiliar with it and presently lack a mechanism to provide it with notifications.
I take it as if the bot periodically reads files from disk then dumps the contents into an IRC channel? Who maintains the bot and might we switch it to another notification mechanism? Maybe it should read tor-consensus-health@ instead?
Alternatively I can change my send() function to do whatever the bot maintainer wants, though dumping files to disk seems a bit odd to me.
Peter runs the IRC bot. The interface on which it accepts input is simply an email address. You could send it the same content that you send to the mailing list, or a different output.
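Since both the mailing list and the bot take email, one notification function can serve both. A minimal Python sketch using the standard library's `email.message.EmailMessage`. The addresses and the subject line are placeholders, and actually sending the message (for example with `smtplib.SMTP(...).send_message(msg)`) is left out.

```python
from email.message import EmailMessage


def build_notification(sender, recipients, warnings):
    """Build one plain-text message carrying all warnings."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = "Consensus issues"  # placeholder subject
    msg.set_content("\n".join(warnings))
    return msg


# Placeholder addresses, not the real list or bot addresses.
msg = build_notification(
    "monitor@example.org",
    ["list@example.org", "bot@example.org"],
    ["NOTICE: example warning"],
)
```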
PS. A couple other things that need to be addressed at some point are:
- Running this on a TPO host rather than my desktop, and maybe also use a TPO address to send the emails too.
Sure, we can do that. The easiest way would be that I run your script on yatei. Should I do that, similar to how I run DocTor on it?
We can later give you access to yatei (which you'll also need for co-co-maintaining metrics services), or we could run the script on a new, tiny VM.
- Moving these scripts to a real repository rather than my 'tor-utils' user repo. Not sure though if we should keep the name 'DocTor' or opt for something more descriptive like 'DescriptorMonitors'.
I prefer repository names that aren't too descriptive, because if we ever want to extend or narrow scope of a tool, the name might confuse new people. That's what happened to metrics-db which doesn't even use a database anymore since I split off the metrics-web part.
How about we add your script to the DocTor repository? Again, to be quick, you could simply send me a patch produced with `git format-patch` or tell me a repository to pull from.
And if we want to do this right, we should give you a personal doctor.git and push rights to the official doctor.git.
Of course, if your preference is to start a new repository, I'm fine with that, too.
Thanks for spending your long weekend on this!
All the best, Karsten
Nope, not getting through. Asking for my addresses to be whitelisted....
Looks like this is already solved.
Ooops, evidently not. Reopened it.
Done...
https://gitweb.torproject.org/atagar/tor-utils.git/commitdiff/e6d378a
Btw, it looks like they presently *are* out of sync...
NOTICE: Authorities disagree about the BadExit flag for 7E6A3AA70A156167E7AE543E50EF54321EC80AF0 (with flag: Faravahar, without flag: tor26, moria1)
NOTICE: Authorities disagree about the BadExit flag for ADF62D3A1305F0B5404D41EEDADA68ECD294FC60 (with flag: Faravahar, without flag: tor26, moria1)
Are you sure? AFAIK, Faravahar never voted on BadExit. Which valid-after time was this?
Ack! Had a bug...
https://gitweb.torproject.org/atagar/tor-utils.git/commitdiff/9b8a883
tor26 and moria1 actually did disagree about those fingerprints, but Faravahar isn't involved. Odd, I thought we had three authorities voting on the BadExit flag.
Peter runs the IRC bot. The interface on which it accepts input is simply an email address. You could send it the same content that you send to the mailing list, or a different output.
Sounds good. If he tells me the address I'll send notifications to it.
Sure, we can do that. The easiest way would be that I run your script on yatei. Should I do that, similar to how I run DocTor on it?
Could, the only gotcha is that it's presently sending through my auxiliary gmail account so we'll need to swap it to metrics@yatei.torproject.org (or something else) first. How are you sending emails?
We can later give you access to yatei (which you'll also need for co-co-maintaining metrics services), or we could run the script on a new, tiny VM.
My vote would be for another VM since this is relatively unrelated to the present metrics infrastructure.
I prefer repository names that aren't too descriptive, because if we ever want to extend or narrow the scope of a tool, the name might confuse new people. That's what happened to metrics-db, which doesn't even use a database anymore since I split off the metrics-web part.
Good point.
How about we add your script to the DocTor repository? Again, to be quick, you could simply send me a patch produced with `git format-patch` or tell me a repository to pull from.
And if we want to do this right, we should give you a personal doctor.git and push rights to the official doctor.git.
Sounds good! Would you mind filing the ticket to grant me push permission for doctor.git and create a user/atagar/doctor.git repo? I'll then push my tor-utils.git history into a new branch in doctor.git.
Cheers! -Damian
On 8/20/13 6:21 PM, Damian Johnson wrote:
Done...
https://gitweb.torproject.org/atagar/tor-utils.git/commitdiff/e6d378a
Btw, it looks like they presently *are* out of sync...
NOTICE: Authorities disagree about the BadExit flag for 7E6A3AA70A156167E7AE543E50EF54321EC80AF0 (with flag: Faravahar, without flag: tor26, moria1)
NOTICE: Authorities disagree about the BadExit flag for ADF62D3A1305F0B5404D41EEDADA68ECD294FC60 (with flag: Faravahar, without flag: tor26, moria1)
Are you sure? AFAIK, Faravahar never voted on BadExit. Which valid-after time was this?
Ack! Had a bug...
https://gitweb.torproject.org/atagar/tor-utils.git/commitdiff/9b8a883
tor26 and moria1 actually did disagree about those fingerprints, but Faravahar isn't involved. Odd, I thought we had three authorities voting on the BadExit flag.
Interesting that tor26 and moria1 disagreed. Just confirmed this for the votes from 01:00:00 UTC today. Looks like this warning is going to be quite useful. Great!
I think turtles was voting on BadExit but stopped doing so a short while ago.
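For reference, the kind of check discussed above could be sketched roughly as follows. This is not the code from the tor-utils commits linked earlier. The `votes` layout (authority nickname mapped to fingerprint mapped to a set of flags) and the function name `badexit_disagreements` are made up for illustration, and the rule "an authority votes on BadExit if it assigns the flag to at least one relay" is an assumption. A vote's known-flags line would probably be a more direct signal.

```python
def badexit_disagreements(votes):
    """Return NOTICE strings for relays whose BadExit flag is disputed.

    votes: {authority_nickname: {fingerprint: set_of_flags}}
    (hypothetical layout, not a Stem data structure)
    """
    # Assumption: an authority only counts as voting on BadExit if it
    # assigns the flag to at least one relay. Others are skipped so they
    # never show up on either side of a disagreement.
    voters = {
        authority: relays
        for authority, relays in votes.items()
        if any("BadExit" in flags for flags in relays.values())
    }

    notices = []
    fingerprints = sorted({fp for relays in voters.values() for fp in relays})

    for fp in fingerprints:
        with_flag = [a for a, relays in voters.items()
                     if fp in relays and "BadExit" in relays[fp]]
        without_flag = [a for a, relays in voters.items()
                        if fp in relays and "BadExit" not in relays[fp]]

        # Only report when voting authorities are split on this relay.
        if with_flag and without_flag:
            notices.append(
                "NOTICE: Authorities disagree about the BadExit flag for %s "
                "(with flag: %s, without flag: %s)"
                % (fp, ", ".join(with_flag), ", ".join(without_flag))
            )

    return notices
```

With this rule, an authority that never sets BadExit (as Faravahar appears to be) is left out of the comparison entirely rather than being listed on one side of it.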
Peter runs the IRC bot. The interface on which it accepts input is simply an email address. You could send it the same content that you send to the mailing list, or a different output.
Sounds good. If he tells me the address I'll send notifications to it.
Okay. Please talk to him about details here.
Sure, we can do that. The easiest way would be that I run your script on yatei. Should I do that, similar to how I run DocTor on it?
Could, the only gotcha is that it's presently sending through my auxiliary gmail account so we'll need to swap it to metrics@yatei.torproject.org (or something else) first. How are you sending emails?
`cat out/status/all-warnings | mail -E -s 'Consensus issues' tor-consensus-health@lists.torproject.org`
Don't ask me what the mail setup on yatei looks like. "It just works."
We can later give you access to yatei (which you'll also need for co-co-maintaining metrics services), or we could run the script on a new, tiny VM.
My vote would be for another VM since this is relatively unrelated to the present metrics infrastructure.
Here's a draft of a Trac ticket I was just about to file. Figured it's better to agree on requirements first before asking Peter to create a VM that is too small/large. Please file it or tell me to do so:
Summary: Can we have a tiny VM for DocTor?
Description: """ Damian rewrote the consensus-health checker which currently runs on yatei. We'd like to deploy his new tool on a new VM, unrelated to yatei. Can we have a tiny VM for this, say, with 256 MiB RAM and 1 GiB free disk space? It should have a doctor user that atagar and karsten can `sudo -u` to. Thanks! """
Component: Tor Sysadmin Team
I prefer repository names that aren't too descriptive, because if we ever want to extend or narrow the scope of a tool, the name might confuse new people. That's what happened to metrics-db, which doesn't even use a database anymore since I split off the metrics-web part.
Good point.
How about we add your script to the DocTor repository? Again, to be quick, you could simply send me a patch produced with `git format-patch` or tell me a repository to pull from.
And if we want to do this right, we should give you a personal doctor.git and push rights to the official doctor.git.
Sounds good! Would you mind filing the ticket to grant me push permission for doctor.git and create a user/atagar/doctor.git repo? I'll then push my tor-utils.git history into a new branch in doctor.git.
Added as #9545.
All the best, Karsten
Could, the only gotcha is that it's presently sending through my auxiliary gmail account so we'll need to swap it to metrics@yatei.torproject.org (or something else) first. How are you sending emails?
`cat out/status/all-warnings | mail -E -s 'Consensus issues' tor-consensus-health@lists.torproject.org`
Don't ask me what the mail setup on yatei looks like. "It just works."
Great. I'll look into using sendmail or whatever the VM has once we have one.
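A rough Python equivalent of the `mail` one-liner, in case it helps once the VM exists. This is only a sketch under assumptions: it presumes a sendmail-compatible binary at /usr/sbin/sendmail, the sender address is a placeholder, and the function names `build_message` and `send` are made up here. Skipping empty bodies is meant to mirror what `-E` appears to do in the command above.

```python
import subprocess
from email.message import EmailMessage


def build_message(body, sender, recipient, subject="Consensus issues"):
    """Build the notification email, or return None if there is nothing to say."""
    # Like `mail -E`: an empty report should not produce an email.
    if not body.strip():
        return None

    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send(msg, sendmail_path="/usr/sbin/sendmail"):
    """Hand the message to the local MTA (path is an assumption about the host)."""
    # -t reads the recipients from the message headers,
    # -oi keeps a line with a single "." from ending the input early.
    subprocess.run([sendmail_path, "-t", "-oi"], input=msg.as_bytes(), check=True)
```

Usage would be something like reading out/status/all-warnings, passing its contents to `build_message` with tor-consensus-health@lists.torproject.org as the recipient, and calling `send` only if the result is not None.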
Here's a draft of a Trac ticket I was just about to file. Figured it's better to agree on requirements first before asking Peter to create a VM that is too small/large. Please file it or tell me to do so
Sounds good, make it so!