Hello list,
we've recently been thinking about how to expose onion-service-related errors to Tor Browser so that we can give more useful error pages to users. We currently return "Unable to connect" error pages for any kind of onion service error, and I think we can do better.
This is a thread to think about the errors we want to expose, what they should look like, and what options we should give to users when they occur. Relevant master tickets are #30022, #30025 and #30000.
We decided (in #14389) that Tor will export these errors through the SOCKS port, and the relevant spec is proposal 304 [0].
As part of #30090 antonela started making a table of potential errors. I'm gonna use that in this thread and also add a few more.
Let's go:
= Client-level errors =
These are errors on the user side of things:
=== 1) Typo error on address ===
This can be detected by Tor using the checksum, or because the address is too long or too short.
TODO: We will need to add a new error code to prop304. Not sure if the error code should distinguish between checksum fail or length fail.
There is no recovery here since the address is busted. The user needs to find the right one.
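To make the checksum-vs-length question concrete, here is a rough sketch of how a v3 address can be classified. It assumes the v3 encoding from rend-spec-v3 (base32 of pubkey, 2-byte checksum and version byte, with the checksum being the first two bytes of SHA3-256 over ".onion checksum" | pubkey | version). The function names and the returned status strings are made up for illustration and are not proposed prop304 values; "bad_encoding" and "bad_version" are extra cases I added, not something discussed above.

```python
import base64
import hashlib

ONION_SUFFIX = ".onion"
V3_ADDRESS_LEN = 56   # base32 characters encoding 32 + 2 + 1 bytes
V3_VERSION = b"\x03"


def v3_checksum(pubkey, version=V3_VERSION):
    """First two bytes of SHA3-256(".onion checksum" | pubkey | version)."""
    return hashlib.sha3_256(b".onion checksum" + pubkey + version).digest()[:2]


def encode_v3(pubkey, checksum=None):
    """Build an address from a 32-byte key (helper for the examples only)."""
    if checksum is None:
        checksum = v3_checksum(pubkey)
    raw = pubkey + checksum + V3_VERSION
    return base64.b32encode(raw).decode("ascii").lower() + ONION_SUFFIX


def classify_address(address):
    """Return a hypothetical status string for a v3 onion address."""
    host = address.strip().lower()
    if host.endswith(ONION_SUFFIX):
        host = host[:-len(ONION_SUFFIX)]
    label = host.split(".")[-1]  # ignore any subdomain labels
    if len(label) != V3_ADDRESS_LEN:
        return "bad_length"
    try:
        raw = base64.b32decode(label, casefold=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return "bad_encoding"
    pubkey, checksum, version = raw[:32], raw[32:34], raw[34:35]
    if version != V3_VERSION:
        return "bad_version"
    if checksum != v3_checksum(pubkey, version):
        return "bad_checksum"
    return "ok"
```

Note that this only checks the address format. It does not check that the 32 bytes are a valid ed25519 key, and it says nothing about whether the service exists.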
=== 2) Missing Client Authorization ===
This is prop304's 'F4' error (see #30382), and it means that we can't decrypt the descriptor because it requires client auth, but we don't have it configured.
The recovery here is the whole point of #30237 where we make a dialog for the user to insert their client auth credentials.
=== 3) Wrong Client Authorization ===
This is prop304's 'F5' error, and it means that the client auth credentials configured for this onion are wrong.
The user recovery here is unclear but it might be that they need to change their client auth credentials. IMO, we should not try to make the perfect UX here, and we should just go with something super simple.
= Service-level errors =
These are errors on the onion service side:
=== 4) Service Descriptor Can Not be Found ===
This is prop304's 'F0' error, and it means that we could not find the descriptor of the service on the directory servers. This means that the service is not up right now (or, less likely, that some bug has happened somewhere).
The user recovery here is unclear. The user can try to reconnect in case the service came up in the meantime, but this is not very likely within a short period of time.
Perhaps we can give the user the option to reconnect every 10 seconds or so? Does this make sense from a UX PoV?
Again, this is equivalent to a "Remote host is down" error and we should treat it as such.
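For the "reconnect every 10 seconds" idea, something like the following sketch is what I have in mind. The 10 second interval comes from the question above; the cap on attempts and all the names here are assumptions of mine, added so that a dead service does not get retried forever.

```python
import time


def retry_connect(connect, interval=10.0, max_attempts=6, sleep=time.sleep):
    """Call connect() until it returns True or max_attempts is reached.

    connect is any callable returning True on success. The sleep
    parameter is injectable so the loop can be exercised without
    actually waiting.
    """
    for attempt in range(1, max_attempts + 1):
        if connect():
            return True
        if attempt < max_attempts:
            sleep(interval)  # wait before the next attempt
    return False
```

Whether the retry is automatic or triggered by a button is a UX decision; the loop is the same either way.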
= Network-level errors =
These are errors caused by the network (directory servers, intro points, rendezvous points) or even the service itself. It's kinda unclear given all the hops involved.
=== 5) Onion Service Descriptor Is Invalid ===
This is prop304's 'F1' error and it means that we got a descriptor back from the directory but it's corrupted.
This is very unlikely to happen since directory servers do not keep corrupted descriptors, so it usually means that some bug happened somewhere (or that the directory is bad or confused).
In terms of recovery and error page, this is kinda an "Oops. Internal error." situation: it is rare and weird, and hence we don't know what the best recovery option is. We can give the option to reconnect but it's likely not gonna help much.
Again this should never really appear, so let's not stress too much over it.
=== 6) Onion Service Introduction Failed ===
This is prop304's 'F2' error and it means that for some reason the introduction did not complete. This could be because the onion service is not up anymore, or it could be because the network is screwed in some way (e.g. the service is DoSed).
The recovery here might be some 'reconnect' button which could be helpful in case of a DoS situation, but it would not help much if the service is not up anymore.
=== 7) Onion Service Rendezvous Failed ===
This is prop304's 'F3' error and it means that the rendezvous did not complete. This usually means that the service is having a bad time, and is either DoSed or it generally cannot cope.
The recovery here is again some 'reconnect' button, since if we did the introduction successfully, the service is up, and reconnecting might work at some point.
This one and (6) are closely related and perhaps they can be handled identically, since exposing terms like "intro" and "rend" to users will not be nice. Still, we might want to expose a technical error value somewhere for debugging purposes when users come to us.
=======================================================================
I think the above set of errors will satisfy all our needs. In particular:
- #30022 (typos ticket) needs error (1) from above.
- #30025 (client errors ticket) needs errors (4), (5), (6), (7) from above.
- #30000 (client auth) needs errors (2) and (3) from above.
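To summarize the mapping between the prop304 codes and the numbered errors above, here is a small lookup sketch. The codes and meanings are the ones listed in this mail; the category and recovery strings, the OnionError type and the lookup_reply() helper are illustrative names only. Error (1) is left out because it has no prop304 code yet. The reply parsing assumes the standard SOCKS5 reply layout from RFC 1928, where the second byte is the REP field.

```python
from typing import NamedTuple, Optional


class OnionError(NamedTuple):
    number: int     # numbering used in this mail
    name: str
    category: str   # "client", "service" or "network"
    recovery: str   # rough recovery idea, not a final UX decision


PROP304_ERRORS = {
    0xF0: OnionError(4, "descriptor not found", "service", "reconnect"),
    0xF1: OnionError(5, "descriptor invalid", "network", "reconnect"),
    0xF2: OnionError(6, "introduction failed", "network", "reconnect"),
    0xF3: OnionError(7, "rendezvous failed", "network", "reconnect"),
    0xF4: OnionError(2, "missing client authorization", "client",
                     "prompt for credentials"),
    0xF5: OnionError(3, "wrong client authorization", "client",
                     "change credentials"),
}


def lookup_reply(reply: bytes) -> Optional[OnionError]:
    """Map a raw SOCKS5 reply to an onion error, or None if it is not one."""
    # SOCKS5 reply layout: VER, REP, RSV, ATYP, BND.ADDR, BND.PORT
    if len(reply) < 2 or reply[0] != 0x05:
        raise ValueError("not a SOCKS5 reply")
    return PROP304_ERRORS.get(reply[1])
```

Since (6) and (7) share the same recovery, the error page could key off the recovery field while still logging the raw code for debugging.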
In terms of the error page, I'm not sure what it should look like. Perhaps along with the error description and the recovery path, we should provide some education about onion services to the users?
In terms of unsafe paths, I don't see any of these errors being dangerous in terms of causing security issues if you attempt to reconnect or anything. The Tor protocol takes care of this in the layer below. The worst thing you can do is slightly damage the network with too many reconnects, but I think that's OK since people who use Tor Browser are legitimate users and not DoS attackers.
===
Hope this was useful!
[0]: https://github.com/torproject/torspec/blob/master/proposals/304-socks5-exten...