Reinaldo Junior <rjunior@thoughtworks.com> writes:
- number of guards we tried until the first successful circuit
- time until the first successful circuit is built
A successful circuit is one for which the algorithm succeeded in finding a guard AND we succeeded in connecting to it.
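To pin down the two metrics above, here is a minimal sketch that computes them from a log of guard attempts. The Attempt record and first_success_stats function are hypothetical, not guardsim's data model. The sketch assumes the log is sorted by simulated time, that the run starts at t=0, and that "guards tried" counts every attempt (a guard retried twice counts twice).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Attempt:
    """Hypothetical record of one guard attempt (not guardsim's data model)."""
    time: float       # simulated seconds since the start of the run
    guard: str        # guard identifier
    picked: bool      # the guard selection algorithm returned this guard
    connected: bool   # the connection to that guard succeeded


def first_success_stats(attempts: List[Attempt]) -> Optional[Tuple[int, float]]:
    """Return (guards tried, time) up to and including the first successful circuit.

    A circuit counts as successful only if the algorithm picked a guard AND we
    connected to it. Returns None if no attempt succeeded.
    """
    for tried, attempt in enumerate(attempts, start=1):
        if attempt.picked and attempt.connected:
            return tried, attempt.time
    return None
```

Returning None rather than a sentinel keeps "never succeeded" runs distinguishable when aggregating over many simulations.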
In general, we are interested in security and performance. For security, we are trying to minimize our exposure to the network. For performance, we want to minimize our downtime when our current guard becomes unreachable or after our network comes back up.
Here are some concrete statistics that we could gather in the simulator:
Security statistics:
- Number of unique guards we connected to during the course of the simulation.
We have this as "exposure after 30 hours".
- Time spent connected to lower-priority guards while a primary guard was online.
- Time spent connected to lower-priority guards while a higher-priority guard was online and the network was up.
We don't have these yet. Also, I'm not sure how we should detect network conditions: we could either try to infer them from the algorithm's behavior or look at which network scenario is active at the moment.
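As a sketch of the second option (reading the network state from the active scenario instead of guessing it from the algorithm), the three security statistics can be computed from a piecewise-constant timeline. Segment, security_stats and N_PRIMARY = 3 are hypothetical. The sketch assumes the guardlist does not change during the run, so a guard can be identified by its position (0-indexed here, 0 being the top of the list).

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

N_PRIMARY = 3  # hypothetical: positions 0..2 on the guardlist are the primary guards


@dataclass(frozen=True)
class Segment:
    """Hypothetical slice of the timeline during which nothing changes."""
    start: float               # simulated seconds
    end: float
    guard_pos: Optional[int]   # guardlist position we are connected to, or None
    online: FrozenSet[int]     # positions of guards whose relays are up
    network_up: bool           # ground truth taken from the network scenario


def security_stats(segments: List[Segment]) -> dict:
    unique_guards = {s.guard_pos for s in segments if s.guard_pos is not None}
    below_primary = 0.0        # on a non-primary guard while a primary guard was online
    below_higher = 0.0         # on a guard while a higher-priority guard was online
    below_higher_net_up = 0.0  # same as above, only counting time the network was up
    for s in segments:
        if s.guard_pos is None:
            continue
        duration = s.end - s.start
        if s.guard_pos >= N_PRIMARY and any(p < N_PRIMARY for p in s.online):
            below_primary += duration
        if any(p < s.guard_pos for p in s.online):
            below_higher += duration
            if s.network_up:
                below_higher_net_up += duration
    return {
        "unique_guards": len(unique_guards),
        "below_primary": below_primary,
        "below_higher": below_higher,
        "below_higher_net_up": below_higher_net_up,
    }
```

Taking network_up from the scenario is simple and exact in a simulator, at the cost of coupling the metrics code to the scenario classes.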
With the above statistics I'm trying to find out how well our guard picking algorithms cope with unreliable networks. For example:
- Alice goes to a coffee shop with a FascistFirewall. Her guard (position #2 on her guardlist) is on port 9001, so it stops working. Tor runs the guard picking algorithm and finds a new guard on port 80 that works but is at position #6 on her guardlist.
Now imagine that the primary guard in position #1 was on port 80, so it _could_ actually work behind a FascistFirewall; but because the guard picking algorithm goes down the list, Alice ended up with guard #6. This is suboptimal behavior. An optimal guard algorithm would switch directly to the guard in position #1.
- Alice travels a lot, and over the day she works on her laptop without Internet 70% of the time. Even while she is offline, Tor is active (because who shuts down the system tor?), so Tor keeps cycling through guards continuously. At some point she reaches a coffee shop, goes online, and successfully makes a circuit to a guard G. Depending on the guard picking algorithm, G might be at position #1, #6 or #12 on the guard list. In either of the latter two cases, a good guard algorithm will realize that it did not connect to a high-priority guard and will somehow go back to #1 (for example, prop259 has the 3-second primary guards trigger for this). By gathering the two statistics suggested above, we learn how well a guard picking algorithm copes with such scenarios.
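The coffee-shop case can be reproduced with a toy model comparing "continue down the list" against "re-scan from the top". This is a hypothetical simplification, not guardsim's or prop259's algorithm: positions are 1-indexed as in the text, guard #1 is assumed to be up again, and the firewall is assumed to allow only ports 80 and 443.

```python
from typing import List, Optional

# Assumption for this sketch: the coffee-shop firewall only allows these ports.
ALLOWED_PORTS = {80, 443}


def reachable(port: int) -> bool:
    return port in ALLOWED_PORTS


def pick_downwards(ports: List[int], failed_pos: int) -> Optional[int]:
    """Continue down the list from the guard that just failed (1-indexed)."""
    for pos in range(failed_pos + 1, len(ports) + 1):
        if reachable(ports[pos - 1]):
            return pos
    return None


def pick_from_top(ports: List[int]) -> Optional[int]:
    """Re-scan from position #1, so a reachable higher-priority guard wins."""
    for pos in range(1, len(ports) + 1):
        if reachable(ports[pos - 1]):
            return pos
    return None


# Guard #2 (port 9001) is Alice's current guard; #1 and #6 listen on port 80.
guardlist_ports = [80, 9001, 9001, 9001, 9001, 80]
print(pick_downwards(guardlist_ports, failed_pos=2))  # 6
print(pick_from_top(guardlist_ports))                 # 1
```

The gap between the two results (#6 vs #1) is exactly what the "time spent connected to lower-priority guards" statistic is meant to capture.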
Do you have any scenarios like the above in guardsim? I think particularly the travelling Alice scenario will be very useful for stress testing algorithms. Note that it's different from the FlakyNetwork scenario in tornet.py, because FlakyNetwork just returns "circuit failed" based on some independent probability for each circuit, whereas in the TravellingAlice scenario we want to always return "circuit failed" _for some time_ before we start returning "circuit success" again.
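The difference between the two scenarios is independent failures versus failures correlated in time. Below is a minimal sketch of both behind a common circuit_succeeds(now) interface; FlakyNetwork here is written from the description above, not from tornet.py's code, and TravellingAlice, period and offline_fraction are hypothetical names.

```python
import random


class FlakyNetwork:
    """Independent failures: every circuit fails with probability p_fail."""

    def __init__(self, p_fail: float, rng: random.Random) -> None:
        self.p_fail = p_fail
        self.rng = rng

    def circuit_succeeds(self, now: float) -> bool:
        # `now` is unused; it is kept so both scenarios share one interface.
        return self.rng.random() >= self.p_fail


class TravellingAlice:
    """Hypothetical scenario with failures correlated in time.

    Each period starts with an offline window covering offline_fraction of it:
    every circuit fails during that window and succeeds afterwards.
    """

    def __init__(self, period: float = 24 * 3600, offline_fraction: float = 0.7) -> None:
        self.period = period
        self.offline_fraction = offline_fraction

    def network_up(self, now: float) -> bool:
        return (now % self.period) >= self.offline_fraction * self.period

    def circuit_succeeds(self, now: float) -> bool:
        return self.network_up(now)
```

If circuits are attempted uniformly over time, FlakyNetwork(0.7, ...) and TravellingAlice(offline_fraction=0.7) fail about equally often in the long run, but only the latter produces the long runs of consecutive failures that push an algorithm far down the guardlist.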
The way you should "detect" these network conditions in your codebase depends on the architecture of the simulator. I might have some time tomorrow to take a look at the code and suggest some approaches. Would that be helpful?
I'm only going to touch on this subject for now because of lack of time. Will reply at more length tomorrow...
Performance statistics:
- Time spent cycling through guards.
- Time spent cycling through guards while the network is up.
Since time is stopped while we're choosing guards, we have to come up with a different metric for this. It also requires detecting the network condition.
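One possible alternative metric, sketched under the assumption that the simulator can observe every guard attempt: count attempts instead of measuring elapsed time, and convert to a duration only with an explicitly assumed per-attempt cost. CyclingCounter and cost_per_attempt are hypothetical; the network_up flag would come from whichever detection approach is chosen.

```python
from dataclasses import dataclass


@dataclass
class CyclingCounter:
    """Hypothetical proxy for "time spent cycling through guards".

    The simulated clock does not advance while guards are chosen, so count the
    attempts instead and, if a duration is wanted, multiply by an assumed cost.
    """
    cost_per_attempt: float = 1.0  # assumed seconds per attempt, not a measured value
    attempts: int = 0
    attempts_net_up: int = 0

    def record(self, network_up: bool) -> None:
        """Call once per guard attempt made while cycling."""
        self.attempts += 1
        if network_up:
            self.attempts_net_up += 1

    def estimated_time(self, net_up_only: bool = False) -> float:
        n = self.attempts_net_up if net_up_only else self.attempts
        return n * self.cost_per_attempt
```

Keeping the raw counts separate from the assumed cost means algorithms can still be compared fairly even if the cost estimate is later revised.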
- Time spent in dystopic mode.
- Time spent in dystopic mode while the network was utopic.
These should be easy as long as we have defined how to detect the network type.
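Once the network type is defined, the last metric reduces to intersecting two sets of time intervals. A minimal sketch, assuming both the dystopic-mode periods and the utopic-network periods are recorded as sorted, non-overlapping [start, end) intervals in simulated seconds; total_length and overlap_length are hypothetical names.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # [start, end) in simulated seconds


def total_length(intervals: List[Interval]) -> float:
    return sum(end - start for start, end in intervals)


def overlap_length(a: List[Interval], b: List[Interval]) -> float:
    """Total length of the intersection of two sorted, non-overlapping interval lists."""
    i = j = 0
    acc = 0.0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if hi > lo:
            acc += hi - lo
        # Advance whichever interval ends first; it cannot overlap anything later.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return acc


dystopic_mode = [(10.0, 30.0), (50.0, 60.0)]
utopic_network = [(0.0, 20.0), (25.0, 55.0)]
print(total_length(dystopic_mode))                    # 30.0
print(overlap_length(dystopic_mode, utopic_network))  # 20.0
```

The two-pointer sweep runs in linear time over both lists, which matters little for one run but helps when aggregating ~500 simulations.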
Is it possible to collect those statistics? I'm curious to learn how the current guard algorithm compares to the new prop259 in those respects.
We have tooling to generate graphs with success rate and exposure taken from a round of ~500 simulations. I can send them to you when they finish running ;)
What other stats do you think are important here?
We have discussed counting how many network connections we make over time. For now, we have been comparing success rate and exposure.
I guess we can add these stats, we just need to come up with an approach to determine the network condition.
All the code is in https://github.com/twstrike/tor_guardsim (branch develop).