Hi everyone,
I’m building a messaging app based on Tor v3 onion services and I’m wondering what kind of uptime expectations we should set with users and other stakeholders.
Is there historical data on the uptime of onion service functionality as a whole? That is, not the uptime of a particular onion service, but something like: given that the user’s access to Tor is not being limited by their ISP, and given that the onion service itself is fully operational, can a Tor user reach the onion service?
Some more concrete versions of this question are:
1. For what percentage of time over a given period (say, the past 3 years) were there no known network-wide problems affecting onion services?
2. What percentage of a user’s attempts to connect to an onion service succeed, assuming the user’s network is not successfully censored?
3. Is there an incident log somewhere of problems that affected onion services network-wide, including how long these problems persisted? (I don’t see any onion service outage notes in https://metrics.torproject.org/news.html, though I seem to remember there was an issue a few months back.)
I see there’s uptime data for various relays, but I’m not sure how to translate it into meaningful answers to the questions above. Are there any good answers to these questions out there in the wild? Even approximate answers or lower bounds for uptime would be super helpful!
Thanks!!! Holmes
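For the first question, once a list of network-wide incidents with start and end times exists, "percentage of time with no known problems" is a plain interval computation. Below is a minimal Python sketch. It assumes incidents are given as (start, end) datetime pairs. The function name is hypothetical and no real incident data is included.

```python
from datetime import datetime
from typing import List, Tuple

Interval = Tuple[datetime, datetime]


def uptime_fraction(window_start: datetime, window_end: datetime,
                    incidents: List[Interval]) -> float:
    """Fraction of [window_start, window_end) not covered by any incident.

    Incidents are clipped to the window and overlapping ones are merged,
    so downtime is never double-counted.
    """
    total = (window_end - window_start).total_seconds()
    if total <= 0:
        raise ValueError("window_end must be after window_start")
    # Clip to the window and drop incidents that fall entirely outside it.
    clipped = sorted(
        (max(s, window_start), min(e, window_end))
        for s, e in incidents
        if min(e, window_end) > max(s, window_start)
    )
    downtime = 0.0
    cur_start = cur_end = None
    for s, e in clipped:
        if cur_end is not None and s <= cur_end:
            cur_end = max(cur_end, e)  # overlaps the current run: extend it
            continue
        if cur_end is not None:
            downtime += (cur_end - cur_start).total_seconds()
        cur_start, cur_end = s, e  # start a new run of downtime
    if cur_end is not None:
        downtime += (cur_end - cur_start).total_seconds()
    return 1.0 - downtime / total
```

For example, two overlapping made-up incidents that together cover 1.5 days of a 10-day window give an uptime fraction of 0.85.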
And I just saw today's blog post about the new status page. Congrats on launching this! Someone is reading my mind :)
But is there any good source for historical data on incidents, or on the questions in my original message?
Also, is there currently any monitoring in place such that someone on the Tor team gets a phone call or SMS alert if onion services seem to be globally down? Several projects I’ve been a part of over the years have benefited immensely from this, using tools like PagerDuty, so I’m curious whether Tor has something like this for onion services.
Great work and I’m curious to learn more!!
—Holmes
On May 5, 2021, at 3:27 PM, Holmes Wilson h@zbay.llc wrote:
Holmes Wilson h@zbay.llc writes:
And I just saw today's blog post about the new status page. Congrats on launching this! Someone is reading my mind :)
Hello Holmes,
glad you like the status page! It's indeed great!
But is there any good source for historical data on incidents, or on the questions in my original message?
I don't think we have historical data about v3 onion service downtime, unfortunately, apart from the January event mentioned on status.torproject.org.
We *have* developed tools to monitor the health of v3 onion services in an attempt to weed out reachability issues [0], but we mainly used the tool to find specific bugs, not as a downtime scanner. That is, we never ran truly long-term experiments with it, or hooked it up to a global dashboard.
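The tool referenced in [0] is not reproduced here. As a rough illustration of what a single reachability check can look like, here is a minimal Python sketch that makes one connection attempt to an onion service through a local Tor SOCKS port, using a raw SOCKS5 CONNECT (RFC 1928) with the hostname passed unresolved so Tor handles the .onion lookup. It assumes a running tor with its default SocksPort 9050 (Tor Browser usually listens on 9150 instead). The function names and the 120-second timeout are hypothetical choices, and "reachable" only means Tor reported the stream as successfully connected.

```python
import socket
import struct
import time
from typing import Tuple


def socks5_connect_request(host: str, port: int) -> bytes:
    """Build a SOCKS5 CONNECT request that hands the hostname to the proxy
    unresolved (ATYP 0x03), so Tor itself handles the .onion name."""
    name = host.encode("ascii")
    if not 0 < len(name) <= 255:
        raise ValueError("hostname length must be 1..255 bytes for SOCKS5")
    # VER=5, CMD=1 (CONNECT), RSV=0, ATYP=3 (domain name), LEN, NAME, PORT
    return b"\x05\x01\x00\x03" + bytes([len(name)]) + name + struct.pack("!H", port)


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, or raise ConnectionError if the proxy hangs up."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("SOCKS proxy closed the connection")
        buf += chunk
    return buf


def probe_onion(host: str, port: int, socks_host: str = "127.0.0.1",
                socks_port: int = 9050, timeout: float = 120.0) -> Tuple[bool, float]:
    """Make one connection attempt via Tor; return (reachable, seconds_elapsed)."""
    start = time.monotonic()
    try:
        with socket.create_connection((socks_host, socks_port), timeout=timeout) as s:
            s.settimeout(timeout)
            s.sendall(b"\x05\x01\x00")  # greeting: one auth method, "no auth"
            if _recv_exact(s, 2) != b"\x05\x00":
                return False, time.monotonic() - start
            s.sendall(socks5_connect_request(host, port))
            ver, rep = _recv_exact(s, 2)  # first two reply bytes: VER, REP
            return (ver == 5 and rep == 0), time.monotonic() - start
    except OSError:  # refused, timed out, proxy hung up, ...
        return False, time.monotonic() - start
```

Logging each (timestamp, reachable, elapsed) result over a long period is what would turn one-off checks like this into the kind of long-term downtime data that does not exist yet.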
Also, is there currently any monitoring in place such that someone on the Tor team gets a phone call or SMS alert if onion services seem to be globally down? Several projects I’ve been a part of over the years have benefited immensely from this, using tools like PagerDuty, so I’m curious whether Tor has something like this for onion services.
We are currently not aware of any unresolved reachability issues with v3 onion services.
Fortunately, the onion community is pretty active so we usually become aware of such issues pretty quickly if they appear on a widespread scale. That said, I don't think getting alerted more quickly (via an SMS) would be a bad idea.
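If someone did want SMS- or PagerDuty-style alerting for this, the core is a rule that turns a stream of probe results into a page. Below is a hypothetical Python sketch. It keeps the last `window` results and calls a caller-supplied `notify` function once when the failure rate reaches `threshold`, then re-arms after the rate drops back below it. The window size, the threshold, and the `notify` hook (where an SMS gateway or PagerDuty integration would go) are all assumptions and do not describe anything Tor currently runs. To tell a network-wide outage apart from one broken service, one would presumably also want to probe several independent onion services.

```python
from collections import deque
from typing import Callable, Deque


class FailureRateAlarm:
    """Call `notify(rate)` once when the failure rate over the last `window`
    probes reaches `threshold`; re-arm when it falls back below."""

    def __init__(self, window: int, threshold: float,
                 notify: Callable[[float], None]) -> None:
        self.results: Deque[bool] = deque(maxlen=window)
        self.threshold = threshold
        self.notify = notify
        self.firing = False

    def record(self, reachable: bool) -> None:
        self.results.append(reachable)
        if len(self.results) < self.results.maxlen:
            return  # not enough data to judge yet
        failure_rate = self.results.count(False) / len(self.results)
        if failure_rate >= self.threshold and not self.firing:
            self.firing = True  # page once, not on every failed probe
            self.notify(failure_rate)
        elif failure_rate < self.threshold:
            self.firing = False
```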
Regarding your question:
- What percentage of attempts by a user to connect to an onion service are successful, assuming no successful censorship of the user’s network?
I would say 100% of attempts modulo unknown reachability bugs. Even if the client picks a bad path, and a circuit gets broken, the Tor client should be smart enough to rebuild the circuit and retry the onion connection.
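The retry argument above can be made concrete with a back-of-the-envelope model. Assume (and this is only an assumption) that each attempt succeeds independently with probability p. Then the chance that at least one of k attempts succeeds is 1 - (1 - p)^k. A network-wide bug breaks that independence because every retry fails together, which is why measured rates from real probes matter more than the formula. Here is a minimal Python sketch with hypothetical helper names:

```python
from typing import Sequence


def success_with_retries(p_attempt: float, attempts: int) -> float:
    """Probability that at least one of `attempts` tries succeeds, ASSUMING
    each try succeeds independently with probability `p_attempt`."""
    if not 0.0 <= p_attempt <= 1.0 or attempts < 1:
        raise ValueError("need 0 <= p_attempt <= 1 and attempts >= 1")
    return 1.0 - (1.0 - p_attempt) ** attempts


def observed_rate(outcomes: Sequence[bool]) -> float:
    """Empirical success rate from recorded probe outcomes (NaN if empty)."""
    return sum(outcomes) / len(outcomes) if outcomes else float("nan")
```

Under the independence assumption, a 90% per-circuit success rate with three attempts gives about 99.9% per connection.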
As always, if you encounter reachability issues, please get in touch with us (and please provide some logs) so that we can look into this more deeply.
[0]: https://gitlab.torproject.org/tpo/core/tor/-/issues/28841
tor-dev mailing list tor-dev@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
On Wed, May 05, 2021 at 03:27:23PM -0400, Holmes Wilson wrote:
- Is there some incident log somewhere of problems that affected onion services network wide that includes how long these problems persisted for? (I don't see any onion service outage notes in this document, though I seem to remember there was an issue a few months back? https://metrics.torproject.org/news.html)
There's a small number of onion-related events at https://gitlab.torproject.org/tpo/metrics/timeline (search for "onion".)
Metrics news.html takes its input from the above timeline, but news.html is currently out of sync, since before the January incident: https://gitlab.torproject.org/tpo/metrics/timeline/-/issues/4