Hi all,
Not content to let you have all the fun, I decided to run my own Tor network!
Kidding ;) But the Directory Authorities, the crappy experiment leading up to Black Hat, and the promise that one can recreate the Tor Network in the event of some catastrophe interest me enough that I decided to investigate it. I'm aware of Chutney and Shadow, but I wanted it to feel as authentic as possible, so I forwent those and just ran full-featured, independent tor daemons. I explicitly wanted to avoid setting TestingTorNetwork. I did have to edit a few other parameters, but very few. [0]
I plan on doing a blog post, giving a HOWTO, but I thought I'd write about my experience so far. I've found a number of interesting issues that arise in the bootstrapping of a non-TestingTorNetwork, mostly around reachability testing.
-----
One of the first things I ran into was a problem where I could not get any routers to upload descriptors. Before uploading a descriptor, an OR tests its own reachability by building a circuit back to itself - a check that is bypassed with AssumeReachable or TestingTorNetwork. This works fine for Chutney and Shadow, as they reach into the OR and set AssumeReachable. But if the Tor Network were to be rebooted... most nodes out there would _not_ have AssumeReachable set, and they would not be able to perform self-testing with a consensus consisting of just Directory Authorities. I think nodes left running would be okay, but nodes restarted would be stuck in a startup loop. I imagine what would actually happen is Noisebridge and TorServers and a few other close friends would set the flag, they would get into the consensus, and then the rest of the network would start coming back... (Or possibly a few nodes could anticipate this problem ahead of time, and set it now.)
What I had to do was make one of my Directory Authorities an exit - this let the other nodes start building circuits through the authorities and upload descriptors. Maybe an OR should have logic that if it has a valid consensus with no Exit nodes, it should assume it's reachable and send a descriptor - and then let the Directory Authorities perform reachability tests for whether or not to include it? From the POV of an intentional DoS - an OR doesn't have to obey the reachability test of course, so no change there. It could potentially lead to an unintentional DoS where all several thousand routers start slamming the DirAuths as soon as a usable-but-blank consensus is found... but AFAIK routers probe for a consensus based on semi-random timing anyway, so that may mitigate that?
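To illustrate, here's roughly the torrc I'd expect such a 'close friend' relay to use during a reboot. AssumeReachable and DirAuthority are real options, but the nicknames, addresses, ports, and fingerprints below are placeholders from my test setup:

  # Sketch only: skip self-testing and publish a descriptor immediately.
  AssumeReachable 1
  # Point at whichever authorities survived (all values here are made up):
  DirAuthority auth1 orport=15001 v3ident=<auth1 v3 key ID> 10.0.0.1:15030 <auth1 identity fingerprint>
  DirAuthority auth2 orport=15001 v3ident=<auth2 v3 key ID> 10.0.0.2:15030 <auth2 identity fingerprint>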
-----
Another problem I ran into was that nodes couldn't conduct reachability tests when I had exits that were only using the Reduced Exit Policy - because it doesn't list the ORPort/DirPort! (I was using nonstandard ports actually, but indeed the reduced exit policy does not include 9001 or 9030.) Looking at the current consensus, there are 40 exits that exit to all ports, and 400-something exits that use the ReducedExitPolicy. It seems like 9001 and 9030 should probably be added to that for reachability tests?
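For concreteness, the change I'm imagining is just two accept lines added ahead of the final reject (shown with the default ports; my network's were nonstandard):

  # Sketch: let reduced-exit relays carry reachability-test streams.
  ExitPolicy accept *:9001   # default ORPort
  ExitPolicy accept *:9030   # default DirPort
  # ... the rest of the Reduced Exit Policy as usual ...
  ExitPolicy reject *:*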
-----
Continuing in this thread, another problem I hit was that (I believe) nodes expect the 'Stable' flag when conducting certain reachability tests. I'm not 100% certain - it may not prevent the relay from uploading a descriptor, but it seems like if no acceptable exit node is Stable - some reachability tests will be stuck. I see these sorts of errors when there is no stable Exit node (the node generating the errors is in fact a Stable Exit though, so it clearly uploaded its descriptor and keeps running):

Oct 13 14:49:46.000 [warn] Making tunnel to dirserver failed.
Oct 13 14:49:46.000 [warn] We just marked ourself as down. Are your external addresses reachable?
Oct 13 14:50:47.000 [notice] No Tor server allows exit to [scrubbed]:25030. Rejecting.
Since ORPort/DirPort are not in the ReducedExitPolicy, this (may?) restrict the number of nodes available for conducting a reachability test. I think the Stable flag is calculated off the average age of the network though, so the only time this would cause a big problem is when the network (DirAuths) has been running for a little while and a full exit node hasn't yet been added - the network would have to wait longer for the full exit node to earn the Stable flag.
-----
Getting a BWAuth running was... nontrivial. Some of the things I found:

- SQLAlchemy 0.7.x is no longer supported. 0.9.x does not work, nor 0.8.x. 0.7.10 does.
- Several quasi-bugs with the code/documentation (the earliest three commits here: https://github.com/tomrittervg/torflow/commits/tomedits)
- The bandwidth scanner actively breaks in certain situations of divide-by-zero (https://github.com/tomrittervg/torflow/commit/053dfc17c0411dac0f6c4e43954f90...)
- The scanner will be perpetually stuck if you're sitting on the same /16 and you don't perform the equivalent of EnforceDistinctSubnets 0 [1]
Ultimately, while I successfully produced a bandwidth file [2], I wasn't convinced it was meaningful. There is a tremendous amount of code complexity buried beneath the statement 'Scan the nodes and see how fast they are', and a tremendous amount of informational complexity behind 'Weight the nodes so users can pick a good stream'.
-----
I tested what it would look like if an imposter DirAuth started trying to participate in the consensus. It generated the warning you would expect:
Oct 14 00:04:31.000 [warn] Got a vote from an authority (nickname authimposter, address W.X.Y.Z) with authority key ID Z. This key ID is not recognized. Known v3 key IDs are: A, B, C, D
But it also generated a warning you would not expect, and sent me down a rabbit hole for a while:
Oct 10 21:44:56.000 [debug] directory_handle_command_post(): Received POST command.
Oct 10 21:44:56.000 [debug] directory_handle_command_post(): rewritten url as '"/tor/post/consensus-signature"'.
Oct 10 21:44:56.000 [notice] Got a signature from W.X.Y.Z. Adding it to the pending consensus.
Oct 10 21:44:56.000 [info] dirvote_add_signatures_to_pending_consensus(): Have 1 signatures for adding to ns consensus.
Oct 10 21:44:56.000 [info] dirvote_add_signatures_to_pending_consensus(): Added -1 signatures to consensus.
Oct 10 21:44:56.000 [info] dirvote_add_signatures_to_pending_consensus(): Have 1 signatures for adding to microdesc consensus.
Oct 10 21:44:56.000 [info] dirvote_add_signatures_to_pending_consensus(): Added -1 signatures to consensus.
Oct 10 21:44:56.000 [warn] Unable to store signatures posted by W.X.Y.Z: Mismatched digest.
Over on the imposter:
Oct 14 00:19:32.000 [warn] http status 400 ("Mismatched digest.") response after uploading signatures to dirserver 'W.X.Y.Z:15030'. Please correct.
The imposter DirAuth is sending up a signature for a consensus that is not the same consensus that the rest of the DirAuths computed. Specifically, the imposter DirAuth lists itself as a dir-source and the signature covers this line. (Everything else matches because the imposter has been outvoted and respects that.)
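To make that concrete: per dir-spec, the consensus carries one dir-source line per voting authority, and the imposter's version of the document has an extra one for itself. Something like this sketch (all values invented):

  dir-source auth1 <v3 key ID> 10.0.0.1 10.0.0.1 15030 15001
  dir-source auth2 <v3 key ID> 10.0.0.2 10.0.0.2 15030 15001
  dir-source authimposter <v3 key ID> W.X.Y.Z W.X.Y.Z 25030 25001   <- only in the imposter's copy

so the digest its signature covers can never match the real consensus.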
I guess the lesson is, if you see the "Mismatched digest" warning in conjunction with the unrecognized key ID - it's just one issue, not two.
-----
The notion and problems of an imposter DirAuth also come up when considering how the network behaves while adding a DirAuth. I started with 4 authorities, then started a fifth (auth5). Not interesting - it behaved just as in the imposter scenario.
I then added auth5 to a single DirAuth (auth1) as a trusted DirAuth. This resulted in a consensus with 3 signatures, as auth1 did not sign the consensus. On auth1 I got warn messages:

A consensus needs 3 good signatures from recognized authorities for us to accept it. This one has 2 (auth1 auth5). 3 (auth2 auth3 auth4) of the authorities we know didn't sign it.
I then added auth5 to a second DirAuth (auth2) as a trusted DirAuth. This resulted in a consensus for auth1, auth2, and auth5 - but auth3 and auth4 did not sign it or produce a consensus. Because the consensus was only signed by 2 of the 4 Auths (i.e., not a majority) - it was rejected by the relays (which did not list auth5). At this point something interesting and unexpected happened:
The other 2 DirAuths (not knowing about auth5) did not have a consensus. This tricked dirvote_recalculate_timing into thinking we should use the TestingV3AuthInitialVotingInterval parameters, so they got out of sync with the other 3 DirAuths (that did know about auth5). That if/else statement seems very odd, and the parameters seem odd as well. First off, I'm not clear what the parameters are intended to represent. The man page says:
TestingV3AuthInitialVotingInterval N minutes|hours
    Like V3AuthVotingInterval, but for initial voting interval before the first consensus has been created. Changing this requires that TestingTorNetwork is set. (Default: 30 minutes)

TestingV3AuthInitialVoteDelay N minutes|hours
    Like TestingV3AuthInitialVoteDelay, but for initial voting interval before the first consensus has been created. Changing this requires that TestingTorNetwork is set. (Default: 5 minutes)

TestingV3AuthInitialDistDelay N minutes|hours
    Like TestingV3AuthInitialDistDelay, but for initial voting interval before the first consensus has been created. Changing this requires that TestingTorNetwork is set. (Default: 5 minutes)
Notice that the first says "Like V3AuthVotingInterval", but the other two just repeat their name? And how there _is no_ V3AuthInitialVotingInterval? And that you can't modify these parameters without turning on TestingTorNetwork (despite the fact that they will be used even without TestingTorNetwork)? And also, unrelated to the naming, these parameters are a fallback case for when we don't have a consensus, but if they're not kept in sync with V3AuthVotingInterval and their kin - the DirAuth can wind up completely out of sync and be unable to recover (except by luck).
It seems like these parameters should be renamed to V3AuthInitialXXX, keep their existing defaults, lose the requirement on TestingTorNetwork, and be documented as needing to be divisors of the corresponding V3AuthXXX parameters, so that a DirAuth that has tripped into them is able to recover.
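As a sketch of what I mean, using the hypothetical post-rename option names (these don't exist today):

  # Hypothetical torrc for a DirAuth; the Initial intervals divide evenly
  # into the steady-state ones, so an authority that falls back to them
  # eventually lands on a real voting boundary and can rejoin the schedule.
  V3AuthVotingInterval 10 minutes
  V3AuthInitialVotingInterval 5 minutes    # divisor of V3AuthVotingInterval
  V3AuthVoteDelay 30 seconds
  V3AuthInitialVoteDelay 30 seconds
  V3AuthDistDelay 30 seconds
  V3AuthInitialDistDelay 30 seconds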
I have a number of other situations I want to test around adding, subtracting, and manipulating traffic to a DirAuth to see if there are other strange situations that can arise.
-----
Other notes:

- I was annoyed by TestingAuthDirTimeToLearnReachability several times (as I refused to turn on TestingTorNetwork) - I wanted to override it. I thought maybe that should be an option, but ultimately convinced myself that in the event of a network reboot, the 30 minutes would likely still be needed.
- The Directory Authority information is a bit out of date. Specifically, I was most confused by V1 vs V2 vs V3 Directories. I am not sure if the actual network's DirAuths set V1AuthoritativeDirectory or V2AuthoritativeDirectory - but I eventually convinced myself that only V3AuthoritativeDirectory was needed.
- It seems like an Authority will not vote for itself as an HSDir or Stable... but I couldn't find precisely where that was in the code. (It makes sense to not vote itself Stable, but I'm not sure why HSDir...)
- The networkstatus-bridges file is not included in the tor man page.
- I feel like the log message "Consensus includes unrecognized authority" (currently info) is worthy of being upgraded to notice.
- While debugging, I feel this patch would be helpful. [3]
- I've had my eye on Proposal 164 for a bit, so I'm keeping that in mind.
- I wanted the https://consensus-health.torproject.org/ page for my network, but didn't want to run the java code, so I ported it to python. This project is growing, and right now I've been editing consensus_health_checker.py as well. https://github.com/tomrittervg/doctor/commits/python-website I have a few more TODOs for it (like download statistics), but it's coming along.
-----
Finally, something I wanted to ask after was the idea of a node (an OR, not a client) belonging to two or more Tor networks. From the POV of the node operator, I would see it as a node would add some config lines (maybe 'AdditionalDirServer' to add to, rather than redefining, the default DirServers), and it would upload its descriptors to those as well, fetch a consensus from all AdditionalDirServers, and allow connections from and to nodes in either. I'm still reading through the code to see which areas would be particularly confusing in the context of multiple consensuses, but I thought I'd throw it out there.
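As a strawman, the torrc for such a node might look like this (AdditionalDirServer is invented, and the syntax is borrowed from DirAuthority; everything here is hypothetical):

  # Hypothetical: participate in the default network as usual, and ALSO
  # publish to / fetch from a second network's authorities.
  AdditionalDirServer netB-auth1 orport=15001 v3ident=<key ID> 10.1.0.1:15030 <fingerprint>
  AdditionalDirServer netB-auth2 orport=15001 v3ident=<key ID> 10.1.0.2:15030 <fingerprint>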
-tom
-----
[0]

AuthoritativeDirectory 1
V3AuthoritativeDirectory 1
VersioningAuthoritativeDirectory 1
RecommendedClientVersions [stuff]
RecommendedServerVersions [stuff]
ConsensusParams [stuff]
AuthDirMaxServersPerAddr 0
AuthDirMaxServersPerAuthAddr 0
V3AuthVotingInterval 5 minutes
V3AuthVoteDelay 30 seconds
V3AuthDistDelay 30 seconds
V3AuthNIntervalsValid 3
MinUptimeHidServDirectoryV2 1 hour
-----
[1]

diff --git a/NetworkScanners/BwAuthority/bwauthority_child.py b/NetworkScanners/BwAuthority/bwauthority_child.py
index 28b89c2..e07718f 100755
--- a/NetworkScanners/BwAuthority/bwauthority_child.py
+++ b/NetworkScanners/BwAuthority/bwauthority_child.py
@@ -60,7 +60,7 @@ __selmgr = PathSupport.SelectionManager(
       percent_fast=100,
       percent_skip=0,
       min_bw=1024,
-      use_all_exits=False,
+      use_all_exits=True,
       uniform=True,
       use_exit=None,
       use_guards=False,
-----
[2]

node_id=$C447A9E99C66A96E775A5EF7A8B0DF96C414D0FE bw=37 nick=relay4 measured_at=1413257649 updated_at=1413257649 pid_error=0.0681488657221 pid_error_sum=0 pid_bw=59064 pid_delta=0 circ_fail=0.0
node_id=$70145044B8C20F46F991B7A38D9F27D157B1CB9D bw=37 nick=relay5 measured_at=1413257649 updated_at=1413257649 pid_error=0.0583026021009 pid_error_sum=0 pid_bw=59603 pid_delta=0 circ_fail=0.0
node_id=$7FAC1066DCCC0C62984B8E579C5AABBBAE8146B2 bw=37 nick=exit2 measured_at=1413257649 updated_at=1413257649 pid_error=0.0307144741235 pid_error_sum=0 pid_bw=55938 pid_delta=0 circ_fail=0.0
node_id=$9838F41EB01BA62B7AA67BDA942AC4DC3B2B0F98 bw=37 nick=exit3 measured_at=1413257649 updated_at=1413257649 pid_error=0.0124944051714 pid_error_sum=0 pid_bw=55986 pid_delta=0 circ_fail=0.0
node_id=$49090AC6DB52AD8FFF95AF1EC1E898126A9E5CA6 bw=37 nick=relay3 measured_at=1413257649 updated_at=1413257649 pid_error=0.0030073731241 pid_error_sum=0 pid_bw=56489 pid_delta=0 circ_fail=0.0
node_id=$F5C43BB6AD2256730197533596930A8DD7BEC367 bw=37 nick=exit1 measured_at=1413257649 updated_at=1413257649 pid_error=-0.0032777385693 pid_error_sum=0 pid_bw=55114 pid_delta=0 circ_fail=0.0
node_id=$3D53FF771CC3CB9DE0A55C33E5E8DA4238C96AB5 bw=37 nick=relay2 measured_at=1413257649 updated_at=1413257649 pid_error=-0.0418210520821 pid_error_sum=0 pid_bw=51021 pid_delta=0 circ_fail=0.0
-----
[3]

diff --git a/src/or/networkstatus.c b/src/or/networkstatus.c
index 890da0a..4d72add 100644
--- a/src/or/networkstatus.c
+++ b/src/or/networkstatus.c
@@ -1442,6 +1442,8 @@ networkstatus_note_certs_arrived(void)
                                   waiting_body,
                                   networkstatus_get_flavor_name(i),
                                   NSSET_WAS_WAITING_FOR_CERTS)) {
+        log_info(LD_DIR, "After fetching certificates, we were able to "
+                 "accept the consensus.");
         tor_free(waiting_body);
       }
     }
18:21 < nickm> tjr: are you still around?
18:26 < tjr> nickm: Yup
18:27 < nickm> So, speaking generally as a reaction: Yeah! Bootstrapping Tor from zero should work better and be easier. If you want to push us that way, we'll get there. If not, there are other ways you can help us get there.
18:28 < nickm> My first suggestion would be: make a master ticket on trac, then open a bunch of child tickets.
18:28 < nickm> (Or open a bunch of tickets with the same keyword)
18:28 < nickm> and then let's solve this foolishness and bring about the new golden age^W^W^Wrestorable tor network
18:29 < nickm> Also: Cool! Thanks for doing these tests!
18:30 < mrphs> GeKo: yeah but eh, brade's comment :/ I should probbaly reply there
18:30 < tjr> haha awesome. I will go the master ticket route, and attach some easy tickets with suggestions/patches and harder tickets that may have patches eventually
18:30 < nickm> wrt a dirauth accepted by some but not all dirauths: This is explicitly not handled by the dirauth design. But it would be cool if our response were better in that case.
18:31 < tjr> nickm: Ah okay, that probably explains some stuff
18:32 < tjr> When you add/subtract one, do all the DirAuths have a flag day?
18:32 < nickm> basically yeah.
18:32 < nickm> It would be nice to make that a more tolerant flag day...
18:32 < tjr> The voting interval on Prod is an hour, so if you time it right, the Running flag issue won't arise, but otherwise it seems risky
18:32 < nickm> but the problem of how to have everybody who has participated in a vote agree on the outcome of the vote when they can't agree on who the voters are... is not a solved problem today
18:34 < nickm> Joining two tor networks is a cool idea.
18:34 < nickm> I don't think it's supported though
18:34 < nickm> Please though, spam my inbox with a huge pile of trac emails!
18:34 < tjr> RE: agreeing on consensus but not voters. Yea, definetly I'm pretty nervous about mucking around in that area - I'm going to have to think about it quite a bit and do a lot of simulations
18:35 < tjr> I'm sure it's not supported, I'm just not entirely sure how much logic inside an OR would get deathly confused by having to support it.
18:35 < tjr> Mostly I'm wondering about parameters being on for one and off for the other
18:35 < nickm> Think also about the basic results in byzantine fault tolerance. With >1/3 parties corrupt, no consensus can be reached with any protocol.
18:36 < nickm> btw, okay with you if I copy-and-paste this conversation to tor-dev in response to your email? :)
18:36 < tjr> Of course
Hi Tom!
Neat stuff. Let me try to point you in useful directions.
On Wed, Oct 15, 2014 at 08:39:12PM -0500, Tom Ritter wrote:
> One of the first things I ran into was a problem where I could not get any routers to upload descriptors. [...] I imagine what would actually happen is Noisebridge and TorServers and a few other close friends would set the flag, they would get into the consensus, and then the rest of the network would start coming back...
Yep -- that seems like an adequate plan. Given that the Tor network has been running for the last 12 or 13 years with exactly zero downtime, and we have a plausible way of easily getting it back going if we need to, I'm not worried.
> What I had to do was make one of my Directory Authorities an exit - this let the other nodes start building circuits through the authorities and upload descriptors.
This part seems surprising to me -- directory authorities always publish their dirport whether they've found it reachable or not, and relays publish their descriptors directly to the dirport of each directory authority (not through the Tor network).
So maybe there's a bug that you aren't describing, or maybe you are misunderstanding what you saw?
See also https://trac.torproject.org/projects/tor/ticket/11973
> Another problem I ran into was that nodes couldn't conduct reachability tests when I had exits that were only using the Reduced Exit Policy - because it doesn't list the ORPort/DirPort! (I was using nonstandard ports actually, but indeed the reduced exit policy does not include 9001 or 9030.) Looking at the current consensus, there are 40 exits that exit to all ports, and 400-something exits that use the ReducedExitPolicy. It seems like 9001 and 9030 should probably be added to that for reachability tests?
The reachability tests for the ORPort involve extending the circuit to the ORPort -- which doesn't use an exit stream. So your relays should have been able to find themselves reachable, and published a descriptor, even with no exit relays in the network.
But I think you're right that they would have opted to list their dirport as 0, since they would not have been able to verify that it's reachable. And that in turn would have caused clients to skip over them and ask their questions to the directory authorities, since they're the only ones advertising (with a non-zero dirport) that they know how to answer directory questions.
So it would work, but it would be non-ideal from a scalability perspective.
And once https://trac.torproject.org/projects/tor/ticket/12538 is resolved it will work more smoothly anyway.
> Continuing in this thread, another problem I hit was that (I believe) nodes expect the 'Stable' flag when conducting certain reachability tests. I'm not 100% certain - it may not prevent the relay from uploading a descriptor, but it seems like if no acceptable exit node is Stable - some reachability tests will be stuck. I see these sorts of errors when there is no stable Exit node (the node generating the errors is in fact a Stable Exit though, so it clearly uploaded its descriptor and keeps running):
In consider_testing_reachability() we call
circuit_launch_by_extend_info(CIRCUIT_PURPOSE_TESTING, ei, CIRCLAUNCH_NEED_CAPACITY|CIRCLAUNCH_IS_INTERNAL);
So the ORPort reachability test doesn't require the Stable flag.
The DirPort reachability test just launches a new stream that attaches to circuits like normal, so whether it prefers the Stable flag will be a function of whether the destination DirPort is in the LongLivedPorts set -- usually not I think.
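For reference, the default LongLivedPorts set is (if I'm remembering the man page right) all interactive application ports, nothing directory-flavored:

  LongLivedPorts 21, 22, 706, 1863, 5050, 5190, 5222, 5223, 6523, 6667, 6697, 8300

so a DirPort like your 25030 would only get the Stable preference if somebody added it there explicitly.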
> Oct 13 14:49:46.000 [warn] Making tunnel to dirserver failed.
> Oct 13 14:49:46.000 [warn] We just marked ourself as down. Are your external addresses reachable?
> Oct 13 14:50:47.000 [notice] No Tor server allows exit to [scrubbed]:25030. Rejecting.
That sure looks like a failed dirport reachability test. Nothing necessarily to do with the Stable flag.
> Getting a BWAuth running was... nontrivial.
> [...]
> There is a tremendous amount of code complexity buried beneath the statement 'Scan the nodes and see how fast they are', and a tremendous amount of informational complexity behind 'Weight the nodes so users can pick a good stream'.
Yeah, no kidding. And worse, its voodoo is no longer correctly tuned for the current network. And it's not robust to intentional lying attacks either. See e.g. the paragraph at the very end of https://lists.torproject.org/pipermail/tor-reports/2014-October/000675.html
> dirvote_add_signatures_to_pending_consensus(): Added -1 signatures to consensus.
This one looks like a simple (harmless) bug. The code is
  r = networkstatus_add_detached_signatures(pc->consensus, sigs, source,
                                            severity, msg_out);
  log_info(LD_DIR,"Added %d signatures to consensus.", r);
and it shouldn't be logging that if r is < 0.
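The fix is presumably just to gate the log line on the return value, something like:

  r = networkstatus_add_detached_signatures(pc->consensus, sigs, source,
                                            severity, msg_out);
  if (r >= 0)
    log_info(LD_DIR, "Added %d signatures to consensus.", r);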
> I then added auth5 to a second DirAuth (auth2) as a trusted DirAuth. This resulted in a consensus for auth1, auth2, and auth5 - but auth3 and auth4 did not sign it or produce a consensus. Because the consensus was only signed by 2 of the 4 Auths (i.e., not a majority) - it was rejected by the relays (which did not list auth5).
Right -- when you change the set of directory authorities, you need to get a sufficient clump of them to change all at once. This coordination has been a real hassle as we grow the number of directory authorities, and it's one of the main reasons we don't have more currently.
> At this point something interesting and unexpected happened:
>
> The other 2 DirAuths (not knowing about auth5) did not have a consensus. This tricked dirvote_recalculate_timing into thinking we should use the TestingV3AuthInitialVotingInterval parameters, so they got out of sync with the other 3 DirAuths (that did know about auth5). That if/else statement seems very odd, and the parameters seem odd as well. First off, I'm not clear what the parameters are intended to represent. The man page says:
>
> TestingV3AuthInitialVotingInterval N minutes|hours
>     Like V3AuthVotingInterval, but for initial voting interval before the first consensus has been created. Changing this requires that TestingTorNetwork is set. (Default: 30 minutes)
>
> TestingV3AuthInitialVoteDelay N minutes|hours
>     Like TestingV3AuthInitialVoteDelay, but for initial voting interval before the first consensus has been created. Changing this requires that TestingTorNetwork is set. (Default: 5 minutes)
>
> TestingV3AuthInitialDistDelay N minutes|hours
>     Like TestingV3AuthInitialDistDelay, but for initial voting interval before the first consensus has been created. Changing this requires that TestingTorNetwork is set. (Default: 5 minutes)
Basically, if you didn't make a consensus, you try to make one every half hour rather than every hour, on the theory that the network should recover faster if a lot of authorities are in this boat.
> Notice that the first says "Like V3AuthVotingInterval", but the other two just repeat their name?
This was fixed in git commit c03cfc05, and I think the fix went into Tor 0.2.4.13-alpha. What ancient version is your man page from?
> And how there _is no_ V3AuthInitialVotingInterval? And that you can't modify these parameters without turning on TestingTorNetwork (despite the fact that they will be used even without TestingTorNetwork)? And also, unrelated to the naming, these parameters are a fallback case for when we don't have a consensus, but if they're not kept in sync with V3AuthVotingInterval and their kin - the DirAuth can wind up completely out of sync and be unable to recover (except by luck).
Yeah, don't mess with them unless you know what you're doing.
As for the confusing names, you're totally right: https://trac.torproject.org/projects/tor/ticket/11967
> Other notes:
>
> - I was annoyed by TestingAuthDirTimeToLearnReachability several times (as I refused to turn on TestingTorNetwork) - I wanted to override it. I thought maybe that should be an option, but ultimately convinced myself that in the event of a network reboot, the 30 minutes would likely still be needed.
Right. If you want to crank down this '30 minutes' value while actually relying on the reachability tests, you will also need to crank up the fraction of the network that gets tested on each call to dirserv_test_reachability() -- i.e. REACHABILITY_MODULO_PER_TEST and REACHABILITY_TEST_INTERVAL.
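(For scale: if I'm remembering the defaults right, REACHABILITY_TEST_INTERVAL is 10 seconds and REACHABILITY_MODULO_PER_TEST is 128, so a full pass over the relay identity space takes 128 * 10 = 1280 seconds, a bit over 21 minutes -- which is why 30 minutes is about the floor for TestingAuthDirTimeToLearnReachability with the stock code.)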
> - The Directory Authority information is a bit out of date. Specifically, I was most confused by V1 vs V2 vs V3 Directories. I am not sure if the actual network's DirAuths set V1AuthoritativeDirectory or V2AuthoritativeDirectory - but I eventually convinced myself that only V3AuthoritativeDirectory was needed.
Correct. Can you submit a ticket to fix this, wherever you found it? Assuming it wasn't from your ancient man page that is? :)
> - It seems like an Authority will not vote for itself as an HSDir or Stable... but I couldn't find precisely where that was in the code. (It makes sense to not vote itself Stable, but I'm not sure why HSDir...)
I think this is a bug. Mostly a harmless one in practice, but it might be surprising in a tiny test network.
> - The networkstatus-bridges file is not included in the tor man page.
Yep. Please file a ticket.
> - I feel like the log message "Consensus includes unrecognized authority" (currently info) is worthy of being upgraded to notice.
I don't think this is wise -- it is fine to have a consensus that has been signed by a newer authority than you know about, so long as it has enough signatures from ones you do know about.
If we made this a notice, then every time we added a new authority, all the users running stable would see scary-sounding log messages and report them to us over and over.
> - I wanted the https://consensus-health.torproject.org/ page for my network, but didn't want to run the java code, so I ported it to python. This project is growing, and right now I've been editing consensus_health_checker.py as well. https://github.com/tomrittervg/doctor/commits/python-website I have a few more TODOs for it (like download statistics), but it's coming along.
Neat! Karsten has been wanting to get rid of the consensus-health page for a while now. Maybe you want to run the replacement?
> Finally, something I wanted to ask after was the idea of a node (an OR, not a client) belonging to two or more Tor networks. From the POV of the node operator, I would see it as a node would add some config lines (maybe 'AdditionalDirServer' to add to, rather than redefining, the default DirServers), and it would upload its descriptors to those as well, fetch a consensus from all AdditionalDirServers, and allow connections from and to nodes in either. I'm still reading through the code to see which areas would be particularly confusing in the context of multiple consensuses, but I thought I'd throw it out there.
This idea should work in theory. In fact, back when Ironkey was running their own Tor network, I joked periodically about just dumping the cached-descriptors file from their network into moria1's cached-descriptors file. I think that by itself would have been sufficient to add all of those relays into our Tor network.
We're slowly accumulating situations where we want all the relays to know about all the relays (e.g. RefuseUnknownExits), but I don't think the world ends when it isn't quite true.
Thanks! --Roger
On 22 October 2014 05:48, Roger Dingledine <arma@mit.edu> wrote:
>> What I had to do was make one of my Directory Authorities an exit - this let the other nodes start building circuits through the authorities and upload descriptors.
> This part seems surprising to me -- directory authorities always publish their dirport whether they've found it reachable or not, and relays publish their descriptors directly to the dirport of each directory authority (not through the Tor network).
>
> So maybe there's a bug that you aren't describing, or maybe you are misunderstanding what you saw?
>
> See also https://trac.torproject.org/projects/tor/ticket/11973
>> Another problem I ran into was that nodes couldn't conduct reachability tests when I had exits that were only using the Reduced Exit Policy - because it doesn't list the ORPort/DirPort! (I was using nonstandard ports actually, but indeed the reduced exit policy does not include 9001 or 9030.) Looking at the current consensus, there are 40 exits that exit to all ports, and 400-something exits that use the ReducedExitPolicy. It seems like 9001 and 9030 should probably be added to that for reachability tests?
> The reachability tests for the ORPort involve extending the circuit to the ORPort -- which doesn't use an exit stream. So your relays should have been able to find themselves reachable, and published a descriptor, even with no exit relays in the network.
I think I traced down the source of the behavior I saw. In brief, I don't think reachability tests happen when there are no Exit nodes because of a quirk in the bootstrapping process, where we never think we have a minimum of directory information:
Nov 09 22:10:26.000 [notice] I learned some more directory information, but not enough to build a circuit: We need more descriptors: we have 5/5, and can only build 0% of likely paths. (We have 100% of guards bw, 100% of midpoint bw, and 0% of exit bw.)
In long form: https://trac.torproject.org/projects/tor/ticket/13718
>> Continuing in this thread, another problem I hit was that (I believe) nodes expect the 'Stable' flag when conducting certain reachability tests. I'm not 100% certain - it may not prevent the relay from uploading a descriptor, but it seems like if no acceptable exit node is Stable - some reachability tests will be stuck. I see these sorts of errors when there is no stable Exit node (the node generating the errors is in fact a Stable Exit though, so it clearly uploaded its descriptor and keeps running):
> In consider_testing_reachability() we call
>
>   circuit_launch_by_extend_info(CIRCUIT_PURPOSE_TESTING, ei, CIRCLAUNCH_NEED_CAPACITY|CIRCLAUNCH_IS_INTERNAL);
>
> So the ORPort reachability test doesn't require the Stable flag.
You're right, reachability doesn't depend on Stable, sorry.
>> I then added auth5 to a second DirAuth (auth2) as a trusted DirAuth. This resulted in a consensus for auth1, auth2, and auth5 - but auth3 and auth4 did not sign it or produce a consensus. Because the consensus was only signed by 2 of the 4 Auths (i.e., not a majority) - it was rejected by the relays (which did not list auth5).
> Right -- when you change the set of directory authorities, you need to get a sufficient clump of them to change all at once. This coordination has been a real hassle as we grow the number of directory authorities, and it's one of the main reasons we don't have more currently.
I'm going to try thinking more about this problem.
> This was fixed in git commit c03cfc05, and I think the fix went into Tor 0.2.4.13-alpha. What ancient version is your man page from?
/looks sheepish I was using http://linux.die.net/man/1/tor because it's very quick to pull up :-p
>> And how there _is no_ V3AuthInitialVotingInterval? And that you can't modify these parameters without turning on TestingTorNetwork (despite the fact that they will be used even without TestingTorNetwork)? And also, unrelated to the naming, these parameters are a fallback case for when we don't have a consensus, but if they're not kept in sync with V3AuthVotingInterval and their kin - the DirAuth can wind up completely out of sync and be unable to recover (except by luck).
> Yeah, don't mess with them unless you know what you're doing.
>
> As for the confusing names, you're totally right: https://trac.torproject.org/projects/tor/ticket/11967
Ahha.
>> - The Directory Authority information is a bit out of date. Specifically, I was most confused by V1 vs V2 vs V3 Directories. I am not sure if the actual network's DirAuths set V1AuthoritativeDirectory or V2AuthoritativeDirectory - but I eventually convinced myself that only V3AuthoritativeDirectory was needed.
> Correct. Can you submit a ticket to fix this, wherever you found it? Assuming it wasn't from your ancient man page that is? :)
It was.
>> - The networkstatus-bridges file is not included in the tor man page.
> Yep. Please file a ticket.
https://trac.torproject.org/projects/tor/ticket/13713
>> - I feel like the log message "Consensus includes unrecognized authority" (currently info) is worthy of being upgraded to notice.
> I don't think this is wise -- it is fine to have a consensus that has been signed by a newer authority than you know about, so long as it has enough signatures from ones you do know about.
>
> If we made this a notice, then every time we added a new authority, all the users running stable would see scary-sounding log messages and report them to us over and over.
That's fair.
>> - I wanted the https://consensus-health.torproject.org/ page for my network, but didn't want to run the java code, so I ported it to python. This project is growing, and right now I've been editing consensus_health_checker.py as well. https://github.com/tomrittervg/doctor/commits/python-website I have a few more TODOs for it (like download statistics), but it's coming along.
> Neat! Karsten has been wanting to get rid of the consensus-health page for a while now. Maybe you want to run the replacement?
Yes, I think that is going to happen: https://trac.torproject.org/projects/tor/ticket/13637
>> Finally, something I wanted to ask after was the idea of a node (an OR, not a client) belonging to two or more Tor networks. From the POV of the node operator, I would see it as a node would add some config lines (maybe 'AdditionalDirServer' to add to, rather than redefining, the default DirServers), and it would upload its descriptors to those as well, fetch a consensus from all AdditionalDirServers, and allow connections from and to nodes in either. I'm still reading through the code to see which areas would be particularly confusing in the context of multiple consensuses, but I thought I'd throw it out there.
> This idea should work in theory. In fact, back when Ironkey was running their own Tor network, I joked periodically about just dumping the cached-descriptors file from their network into moria1's cached-descriptors file. I think that by itself would have been sufficient to add all of those relays into our Tor network.
Curious. I will try running this idea down a bit more.
> We're slowly accumulating situations where we want all the relays to know about all the relays (e.g. RefuseUnknownExits), but I don't think the world ends when it isn't quite true.
Sure, but that only matters if you're trying to bridge Tor networks without cooperation - a Tor node that wants to sit on two networks wouldn't have a problem knowing about all the nodes in each. And a Tor client using network A or B would route only through that network. I didn't imagine them as interleaving, I imagined them as separate, with some relay operators opting to move traffic for both.
-tom