Hi David,
On 29/10/2019 14:52, David Goulet wrote:
> Long story short, a couple of weeks ago we almost merged a new behavior on the service side with #31561 that would have ditched an intro point if its circuit timed out instead of retrying it. (Today, a service always retries its intro points up to 3 times on any type of circuit failure.)
Thanks for not merging this yet. :-)
> The primary original argument for retrying is based on the mobile use case. If a .onion is running on a cellphone and the network happens to be bad all of a sudden, the service is better off re-establishing the intro circuits, which would let the client's retry attempts finally succeed after a bit instead of the client having to re-fetch a descriptor and go to the new intro points.
> Thus, in theory, it is mostly a reachability argument.
> One question that can arise from this is: Will the client be able to reconnect using the old intro points by the time the service has re-established its intro circuits?
> In other words, does the retry behavior of the *client* allow enough time for the service to stabilize for the mobile use case? I'm curious to learn from people with experience with this!
For what it's worth, we used to run into the following problem with Briar:
* Device X tries to connect to device Y's hidden service
* X has a cached descriptor for Y's HS
* Since the time when X cached the descriptor, Y has lost its guard connection, so it's built new intro circuits to new intro points
* After multiple connection attempts, X gives up on the intro points in the cached descriptor and fetches a new descriptor
* This causes a delay in X connecting to Y
A typical mobile device loses its guard connection frequently - not necessarily because it loses internet access, but because it switches between wifi and mobile data. So the scenario above was very common.
Before the HS behaviour was changed to reuse the old intro points, we had to maintain a patch against Tor to add a controller command for flushing a cached HS descriptor before trying to connect. This essentially made the client's descriptor cache redundant, so it was a slight loss of efficiency, but better than trying a bunch of stale intro points and then fetching a new descriptor anyway.
If you're considering switching back to the old behaviour, I'd like to discuss whether we could make one of the following changes to continue supporting the mobile HS use case:
1. Add a controller command for flushing an HS descriptor
2. Add a controller command for notifying Tor that we lost/gained internet access, or switched between wifi and mobile data, so Tor knows that (a) its guard connection may be dead, and (b) its intro circuits may be dead, but not due to an attack by the intro points, so it can safely reuse the intro points
3. If intro circuits are closed due to DisableNetwork changing from 0 to 1, remember this and reuse the intro points when the network is re-enabled
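To make the idea behind options 2 and 3 a bit more concrete, here is a rough Python sketch of the bookkeeping I have in mind. It is not how Tor is implemented: `IntroPointTracker`, `CloseReason` and `MAX_INTRO_RETRIES` are names I made up for illustration, and the limit of 3 is just the retry count mentioned above. The point is only that a circuit closed by a local connectivity change would not count against the intro point, while any other failure would.

```python
from dataclasses import dataclass, field
from enum import Enum

# Hypothetical constant: mirrors the "up to 3 times" retry limit from the thread.
MAX_INTRO_RETRIES = 3


class CloseReason(Enum):
    """Hypothetical reasons why an intro circuit was closed."""
    NETWORK_DISABLED = "network_disabled"  # local connectivity change
    CIRCUIT_FAILURE = "circuit_failure"    # timeout or any other failure


@dataclass
class IntroPointTracker:
    """Toy model of a service deciding whether to reuse an intro point."""
    retries: dict = field(default_factory=dict)

    def add(self, intro_point: str) -> None:
        self.retries[intro_point] = 0

    def on_circuit_closed(self, intro_point: str, reason: CloseReason) -> bool:
        """Return True if the intro point should be reused, False if dropped."""
        if intro_point not in self.retries:
            return False
        if reason is CloseReason.NETWORK_DISABLED:
            # The intro point is not to blame, so the retry budget is untouched.
            return True
        self.retries[intro_point] += 1
        if self.retries[intro_point] > MAX_INTRO_RETRIES:
            # Retry budget exhausted: give up on this intro point.
            del self.retries[intro_point]
            return False
        return True
```

With this policy, any number of wifi/mobile switches leaves the intro point in place, whereas the fourth genuine failure (after three retries) drops it.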
Android notifies apps of connectivity changes, so Briar could easily pass this information on to Tor via a new controller command or by setting DisableNetwork. (The general problem of detecting whether our internet connectivity is broken for some definition of broken remains hard, but fortunately we don't need to solve that to handle the common cases of switching between wifi and mobile data, and losing mobile signal, which the OS can tell us about.)
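As a minimal sketch of the DisableNetwork route, this is roughly what passing a connectivity change to Tor over the control port could look like, written in Python for brevity rather than as Briar's actual code. `SETCONF`, `AUTHENTICATE` and the `DisableNetwork` option exist today; the function names, the control port 9051 and the use of password authentication without special characters are assumptions of this example.

```python
import socket


def disable_network_command(network_available: bool) -> bytes:
    """Build the control-port line that sets DisableNetwork to match connectivity."""
    value = 0 if network_available else 1
    return f"SETCONF DisableNetwork={value}\r\n".encode("ascii")


def notify_tor(network_available: bool, host: str = "127.0.0.1",
               port: int = 9051, password: str = "") -> None:
    """Authenticate to the control port and update DisableNetwork.

    Assumes password authentication with a password that needs no escaping;
    a setup using cookie authentication would send a different AUTHENTICATE line.
    """
    with socket.create_connection((host, port), timeout=10) as sock:
        reader = sock.makefile("rb")
        sock.sendall(f'AUTHENTICATE "{password}"\r\n'.encode("ascii"))
        if not reader.readline().startswith(b"250"):
            raise RuntimeError("control port authentication failed")
        sock.sendall(disable_network_command(network_available))
        if not reader.readline().startswith(b"250"):
            raise RuntimeError("SETCONF DisableNetwork was rejected")
```

The app's connectivity callback would call something like `notify_tor(False)` when the OS reports that the network is lost and `notify_tor(True)` when it comes back.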
My one-sided two cents. ;-)
Cheers,
Michael