Hi teor,
Sorry for the delayed response. Should I start working on the code now, or should I wait for feedback from other developers?
Also, for Tor developers other than teor (or even including teor), what's your opinion on Prop299? Is it ready for implementation, or are revisions needed?
Thank You,
Neel
===
February 5, 2019 11:40 PM, "teor" teor@riseup.net wrote:
Hi Neel,
Thanks for your initial draft code, and this proposal.
On February 6, 2019 12:26:40 AM UTC, Neel Chauhan neel@neelc.org wrote:
Hi tor-dev@ mailing list,
First off, thank you to Nick for making this an official proposal and thank you again for marking it as open. I really appreciate this. Also, thank you teor for aiding me on my first proposal.
My proposal is available on torspec here: https://gitweb.torproject.org/torspec.git/tree/proposals/299-ip-failure-coun...
Now that my proposal "Preferring IPv4 or IPv6 based on IP Version Failure Count" is Open, I would really appreciate your opinions on it. Is it good or bad? Could it be improved?
I think this proposal is good for an experimental option. We could develop and merge the code, but not turn it on by default. Then we could do some testing to tune the design.
Here's one thing we must fix before we start implementing this proposal:
We don't store connection statistics on Tor clients right now. This proposal would make us store these statistics.
The connection count from 3 sessions (Tor launches? Days?) ago doesn't tell us much about the current network state. But storing it is really bad for user privacy.
So how often should we forget?
Remember: many users only wait 10 seconds for a web browser page to load. Most Tor Browser users are more patient, but they still give up after 30 seconds or a few minutes. So we don't want to leave them hanging for long after a network change.
Here's a quick way to forget old connections, while retaining an approximate history: regularly divide both counters by two. We can check if we want to do the division when we add a new failure.
If the division takes us down to zero, we can re-initialise (see 1.).
Or maybe we should just store the last summarised failure point value (SFPV) in the state file?
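The halving idea above could look roughly like the sketch below. It is only an illustration in Python, not Tor code: the class name, the `halve_interval` value, and the choice to halve once per elapsed interval are all assumptions. The return value of `note_failure` tells the caller when the division took both counters down to zero, so it can re-initialise.

```python
import time


class FailureCounters:
    """Hypothetical per-IP-version failure counters that forget over time."""

    def __init__(self, halve_interval=600.0, now=None):
        self.ipv4 = 0
        self.ipv6 = 0
        self.halve_interval = halve_interval  # seconds; an assumed value
        self.last_halved = time.monotonic() if now is None else now

    def _maybe_halve(self, now):
        """Halve both counters once per elapsed interval.

        Returns True if the halving emptied both counters.
        """
        emptied = False
        while now - self.last_halved >= self.halve_interval:
            self.ipv4 //= 2
            self.ipv6 //= 2
            self.last_halved += self.halve_interval
            if self.ipv4 == 0 and self.ipv6 == 0:
                # Nothing left to forget; skip the remaining intervals.
                self.last_halved = now
                emptied = True
                break
        return emptied

    def note_failure(self, is_ipv6, now=None):
        """Record one failure; the decay check happens here, not on a timer."""
        now = time.monotonic() if now is None else now
        emptied = self._maybe_halve(now)
        if is_ipv6:
            self.ipv6 += 1
        else:
            self.ipv4 += 1
        return emptied
```

Checking for decay only when a failure is added means no extra timer is needed, at the cost that the stored counters can be stale while nothing is failing.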
Here are some questions we need to answer before we turn it on by default:
- Are N/8 fractions a good choice?
While bootstrapping, Tor makes up to 7 connections in the first 30 seconds. But if a connection hangs, Tor only allows 3 concurrent connections (see 3.). So N/8 is probably too low?
Maybe we should consider a larger fraction (for example, N/4). But there's a design tradeoff here:
- failing 1/4 connections wastes bandwidth, but web browsers with happy eyeballs fail up to 50%, so it can't be that bad
- failing 1/4 connections may trigger path bias warnings (see 3.)
- trying 1/4 of each IP version makes starting up and changing networks faster for users
- trying 1/4 of each IP version limits our ability to load-balance across IP families
Let's try N/4, and see how it goes? Even if we guess wrong, we still want Tor to work.
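To make the N/8 versus N/4 tradeoff concrete, here is a small Python sketch. It assumes the preference for IPv6 is simply the share of recorded failures that were IPv4 failures, clamped so that each IP version is always tried at least `floor` of the time. That formula and the function name are assumptions for illustration, and the sketch skips any rounding to eighths.

```python
from fractions import Fraction


def ipv6_preference(ipv4_failures, ipv6_failures, floor=Fraction(1, 4)):
    """Probability of trying IPv6 first, clamped to [floor, 1 - floor]."""
    total = ipv4_failures + ipv6_failures
    if total == 0:
        # No history yet: treat both IP versions equally.
        return Fraction(1, 2)
    # More IPv4 failures means a stronger preference for IPv6.
    raw = Fraction(ipv4_failures, total)
    return min(max(raw, floor), 1 - floor)
```

With `floor=Fraction(1, 8)` the result stays between 1/8 and 7/8. With the default of 1/4 it stays between 1/4 and 3/4, so the less preferred IP version gets tried more often.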
- What is the starting SFPV?
I suggest that we use the number of IPv4 and IPv6-capable entry nodes to calculate the initial SFPV.
For standard clients:
- during initial bootstrap: count fallback directory mirrors
- once the initial consensus is received: count guards in the consensus
For bridge clients:
- count configured bridges.
That way, new clients are automatically load-balanced across IPv4 and IPv6. (We shouldn't add the actual number of guards to the counters: that would swamp the first few thousand connection failures.)
If we have a recent connection history, we don't need to update the counters when the consensus or bridge config changes. But if we are hibernating or dormant, we should use the entry nodes to seed the SFPV.
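One way to seed from entry node counts without swamping later failures is to keep only the ratio and scale it down to a small total. The Python sketch below does that. The function name and the default `scale` of 8 are assumptions, and how the resulting weights map onto the proposal's counters is left to the caller.

```python
def initial_weights(ipv4_capable, ipv6_capable, scale=8):
    """Return (ipv4_weight, ipv6_weight), summing to `scale`.

    The weights keep the proportion of the node counts, but not their size,
    so a few real connection results can still move the preference.
    """
    total = ipv4_capable + ipv6_capable
    if total == 0:
        # No usable entry nodes known: split evenly.
        return (scale // 2, scale - scale // 2)
    # Integer round-half-up of scale * ipv6_capable / total.
    ipv6_weight = (2 * scale * ipv6_capable + total) // (2 * total)
    return (scale - ipv6_weight, ipv6_weight)
```

For example, with made-up counts of 6000 IPv4-capable and 2000 IPv6-capable guards, `initial_weights(6000, 2000)` gives `(6, 2)`. A bridge client with 3 IPv4-only bridges gets `(8, 0)`.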
- When switching between IPv4-only and IPv6-only networks, the circuit failure rate could start as high as 87.5% (7/8), then approach 12.5% (1/8). Depending on the historical number of connections, the failure rate could stay at 7/8 for quite some time.
Switching to N/4 makes the range 3/4 to 1/4, which is good right after switching, but bad long-term.
What does the pathbias code do when this many failures happen? We could make pathbias smarter: for example, it could ignore or scale down "no route", or add 1/4 to its threshold when ClientAutoIPv6ORPort is in use.
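As a rough feel for "quite some time", the Python sketch below counts how many new failures it takes before the preference leaves its cap after a network switch. It assumes the preference for one IP version is the other version's share of recorded failures, capped at `1 - floor`, and that the old history is not decayed in the meantime. Both are assumptions, and the function name is hypothetical.

```python
from fractions import Fraction
import math


def failures_until_unclamped(history, floor):
    """Smallest n such that history / (history + n) < 1 - floor.

    `history` is the number of old failures that still favour the IP
    version which has just stopped working.
    """
    threshold = Fraction(history) * floor / (1 - floor)
    return math.floor(threshold) + 1
```

With 7000 old failures and a floor of 1/8, `failures_until_unclamped(7000, Fraction(1, 8))` is 1001: about a thousand failed attempts before the rate even starts to drop from 7/8. Forgetting old history quickly shortens this a lot.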
- What happens on a network which drops IPv4 or IPv6 packets?
Tor only makes 3 simultaneous connections, and if they are all the same protocol, Tor will wait 2 minutes for the connections to time out.
The Happy Eyeballs RFC (https://tools.ietf.org/html/rfc8305) avoids this issue by making concurrent IPv4 and IPv6 connections.
Tor could make concurrent connections, or just keep the sequential connection code.
Either way, we should increase the connection limit. But increasing the connection limit increases the DoS risk. We can limit the risk in two ways:
- make the limit higher for pending connections, but keep the connection limit at 3 for connections that have successfully opened TCP (before they do an expensive TLS handshake)
- make a separate connection limit for IPv4 and IPv6
I don't know which change is easier. Perhaps both would be useful.
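The two limits above could be combined, roughly as in the Python sketch below. It is only an illustration: the class, its method names, and the default limits are assumptions, not anything that exists in Tor. Pending (not yet opened) connections are limited per IP family, and a separate limit of 3 applies once TCP has opened and the TLS handshake is about to start.

```python
class ConnectionLimiter:
    """Hypothetical limiter with per-family pending limits and a TLS limit."""

    def __init__(self, max_pending_per_family=3, max_handshaking=3):
        self.max_pending_per_family = max_pending_per_family
        self.max_handshaking = max_handshaking
        self.pending = {"ipv4": 0, "ipv6": 0}
        self.handshaking = 0

    def try_start(self, family):
        """Return True if a new TCP connection attempt may be launched."""
        if self.pending[family] >= self.max_pending_per_family:
            return False
        self.pending[family] += 1
        return True

    def tcp_failed(self, family):
        """A pending connection failed or timed out before TCP opened."""
        self.pending[family] -= 1

    def tcp_opened(self, family):
        """TCP opened; return True if the TLS handshake may start now."""
        self.pending[family] -= 1
        if self.handshaking >= self.max_handshaking:
            # The caller would queue or close this connection.
            return False
        self.handshaking += 1
        return True

    def handshake_done(self):
        """A TLS handshake finished, successfully or not."""
        self.handshaking -= 1
```

Because the pending limits are per family, three hanging IPv4 connections no longer stop Tor from trying IPv6, while the number of expensive handshakes stays capped at 3.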
- Do we want to count successful connections?
If we know that an IP version works, we should use it. And if both work, we should use both, with the right load balancing (see 1.).
I hope that gives you some things to think about as we write, review, merge, and test this code.
T
-- teor