Tim: 
Thank you for your understanding! 
Our Tor baseline version is Tor 0.2.8.0-alpha-dev at commit hash 42dea56363c24960e85344749644f6502f625463 (Jan 26, 2016).
I've also included a sample torrc file used by a relay. I hope it's useful.

I generally use a fresh directory with chutney every time, so I haven't run into this bug.
I'd encourage you to use a fresh config with chutney every time until it's fixed.

I hope this issue is not related to the amount of traffic going through the network (it is possible that the underlying QUIC system crashes under stress), because we did change the network topology between tests. I will run a normal version of Tor today to try to catch this bug. If it cannot be reproduced with normal Tor, I'll ask for the ticket to be closed; otherwise, I'll contribute more evidence on this issue.

Are you using the 'TestingDirAuthVoteGuard *' option in your authorities' configs?

Yes. For other flags, see the torrc file below. The authority and client torrc are similar. 
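
To make the option in question concrete, here is a minimal sketch of the relevant authority lines. It is illustrative only and is not copied from our torrc; it assumes a chutney-style test network, where TestingTorNetwork is enabled. As far as I understand the tor manual, TestingDirAuthVoteGuard only has an effect when TestingTorNetwork is set.

```
# Illustrative fragment only; not our full authority torrc.
# Enable test-network behaviour (required for the Testing* options below).
TestingTorNetwork 1
# Have the authority vote the Guard flag for every relay.
TestingDirAuthVoteGuard *
```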

In 0.2.6.2-alpha (commit 22a1e9cac), I added a fix for this last issue when TestingTorNetwork is 1.
Given the line number you're using below, it looks like you're using a version of tor without this fix.
If you're using a private network, I'm guessing you have the default 3 entry guards.

I've checked, and this commit is in our code. With this commit, will Tor still exclude 3 nodes whenever it builds a circuit?

So please use at least 8 working nodes in your network.

This is my plan for today as well. I asked the question before increasing the number of nodes because, a long time ago, we used to test with chutney on networks/basic-min, which has even fewer nodes, and I didn't see this problem.
Just to be clear, why is the "excluding nodes" behavior nondeterministic? In our experiment, a circuit is occasionally built completely (rarely, but it does happen), while at other times all the nodes in the network are excluded. Is this related to Tor falling back to using all routers when it can't find enough "good" routers?
 
I also can't see how the issue you describe relates to the commit you linked to: 62fb209d837f3f5510075ef8bdb6e231ebdfa9bc.
If it still concerns you, can you check you have the right commit, or explain further?

I was simply tracing back through the code to where nodes are excluded from the usable router list, and found this comment. I was looking for any code that would indicate I have a wrong torrc flag (for example, one that excludes some nodes by mistake). Now that I know I should use 8 nodes, yes, this commit is less relevant.

Tim

Tim Wilson-Brown (teor)

teor2345 at gmail dot com
PGP 968F094B
ricochet:ekmygaiu4rzgsk6n