On 8 May 2016, at 01:52, Xiaofan Li xli2@andrew.cmu.edu wrote:
About the issue, I've checked out the 0.2.8 commit and tested on that. The problem is still there, so I looked deeper into it. I've run it many times, and it seems that once I start restricting the path, it becomes nondeterministic whether the bootstrap will succeed. I think it might have something to do with the cache-microdesc-consensus file fetched by that client. Just to recap, I'm running a network with 11 nodes (2 relays) and 2 clients that have path restrictions.
As you said in your next email, this is meant to be "11 nodes (2 authorities)".
Please use an odd number of authorities, such as 3. If you have an even number of authorities, they can't break ties, and this can cause a consensus not to form, or perhaps to lose nodes.
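To illustrate why an even number of voters is fragile, here is a minimal sketch of a strict-majority rule. It assumes that a decision needs agreement from more than half of the authorities. The function names are hypothetical, and this is not Tor's actual voting code.

```python
def signatures_needed(n_authorities: int) -> int:
    """Smallest count that is strictly more than half of n_authorities."""
    return n_authorities // 2 + 1


def tolerated_failures(n_authorities: int) -> int:
    """How many authorities can be missing while a strict majority remains."""
    return n_authorities - signatures_needed(n_authorities)


def majority_vote(votes):
    """Return True or False for the winning side, or None when the vote is tied."""
    yes = sum(1 for v in votes if v)
    no = len(votes) - yes
    if yes > no:
        return True
    if no > yes:
        return False
    return None  # tie: no strict majority either way


if __name__ == "__main__":
    for n in (2, 3, 4, 5):
        print(n, signatures_needed(n), tolerated_failures(n))
    print(majority_vote([True, False]))        # two voters that disagree
    print(majority_vote([True, False, True]))  # three voters
```

Under this assumption, 2 authorities tolerate no failures and can tie 1 to 1, while 3 authorities tolerate one failure and cannot tie.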
My observations are:
• Each client will have a cache-microdesc-consensus file with 4 relays in it. Relays 0, 1 and 2 will always be there, and the last one changes each time I start the network.
• When all 3 nodes on the restricted path are in the cache-microdesc-consensus file, the bootstrap will succeed quickly. For example, if my path is restricted to R2->R3->R1, since 0, 1 and 2 are always present in the consensus, whenever R3 is there, the bootstrap will work.
• When one of the nodes is not in the consensus, the bootstrap will be stuck and never reach 100%. Depending on which node of the path is not included in the consensus, the error message varies. In the above example, if R3 is not in the consensus, we will fail to connect to hop 1 (assuming 0-based logging).
• I waited for a long time (~30min) and nothing would improve: the consensus does not contain more nodes, and bootstrap would still be stuck.
I think the root of the problem might be the consensus having too few nodes. Is it normal for a cache-microdesc-consensus file to only have 4 nodes in an 11-node network? Should I look into the code that generates the consensus?
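The check described in these observations (is every relay on the restricted path present in the cached consensus?) can be sketched as follows. This assumes that each relay entry in the consensus begins with an "r" line whose second field is the nickname. The sample text, nicknames and identity strings are hypothetical placeholders, not real consensus data.

```python
# Hypothetical, shortened consensus excerpt: only the "r" lines matter here.
SAMPLE_CONSENSUS = """\
network-status-version 3 microdesc
r R0 AAAAAAAAAAAAAAAAAAAAAAAAAAA 2016-05-08 01:00:00 10.0.0.10 5000 0
s Fast Running Valid
r R1 BBBBBBBBBBBBBBBBBBBBBBBBBBB 2016-05-08 01:00:00 10.0.0.11 5000 0
s Fast Running Valid
r R2 CCCCCCCCCCCCCCCCCCCCCCCCCCC 2016-05-08 01:00:00 10.0.0.12 5000 0
s Fast Running Valid
r R5 DDDDDDDDDDDDDDDDDDDDDDDDDDD 2016-05-08 01:00:00 10.0.0.15 5000 0
s Fast Running Valid
"""


def relay_nicknames(consensus_text):
    """Collect the nickname (second field) from every 'r' line."""
    names = []
    for line in consensus_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "r":
            names.append(parts[1])
    return names


def missing_from_consensus(path, consensus_text):
    """Return the path hops, in order, that the consensus does not list."""
    present = set(relay_nicknames(consensus_text))
    return [name for name in path if name not in present]


if __name__ == "__main__":
    print(relay_nicknames(SAMPLE_CONSENSUS))
    print(missing_from_consensus(["R2", "R3", "R1"], SAMPLE_CONSENSUS))
```

To run this against a real client, read the cache-microdesc-consensus file from that client's data directory and pass its contents in place of the sample text.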
If you can't get all 11 relays in your consensus, you have a network configuration issue between those relays and the authorities, not a Tor code issue.
The routerlist_t I mentioned is in routerlist.c, line 124:
124 /** Global list of all of the routers that we know about. */
125 static routerlist_t *routerlist = NULL;
But now I think this probably just stores the same info as the cache-microdesc-consensus file, right?
Yes.
Hmm, then it's likely a configuration issue with your network.
Shouldn't chutney also fail if it is a configuration issue? Or are you saying it's a configuration issue with my underlying network topology?
It's one or the other. I really can't tell based on the information you've given. I'm just guessing.
The only thing different in the torrc files for the chutney run and the Emulab run is "Sandbox 1" and "RunAsDaemon 1" but I don't think they cause any issue?
They could, if your configuration asks them to access files that are blocked by the sandbox.
But it's far more likely that some of the relays are configured with the wrong addresses and ports (either in the torrc or in the OS), or aren't actually connected to your network properly at lower layers, such as TCP or IP or ethernet.
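One way to rule out the lower-layer problems is to test plain TCP reachability of each relay's address and port from the authority hosts. The sketch below is a rough illustration: the helper names are hypothetical, and the demo uses a local listening socket as a stand-in for a relay, since the real addresses and ports come from your own torrc files.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


def unreachable_relays(relays, timeout=3.0):
    """relays maps a label to (host, port); return the labels that fail."""
    return [
        name
        for name, (host, port) in relays.items()
        if not can_connect(host, port, timeout)
    ]


if __name__ == "__main__":
    # Stand-in for a relay: a listener on an OS-assigned loopback port.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]
    print(unreachable_relays({"stand-in": ("127.0.0.1", port)}))
    listener.close()
```

A successful TCP connection only shows that something is listening at that address and port. It does not prove that the listener is the relay you expect or that its torrc advertises the same address.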
Tim
Tim Wilson-Brown (teor)
teor2345 at gmail dot com PGP 968F094B ricochet:ekmygaiu4rzgsk6n