On Wed, Jan 29, 2020 at 9:04 AM teor <teor@riseup.net> wrote:
Hello again! This looks like another fine proposal. I'm leaving comments inline, and clipping sections that I'm not commenting on.
Filename: 312-relay-auto-ipv6-addr.txt
Title: Tor Relays Automatically Find Their IPv6 Address
Author: teor
Created: 28-January-2020
Status: Draft
Ticket: #33073
Abstract
We propose that Tor relays (and bridges) should automatically find their IPv6 address, and use it to publish an IPv6 ORPort. For some relays to find their IPv6 address, they may need to fetch some directory documents from directory authorities over IPv6. (For anonymity reasons, bridges are unable to fetch directory documents over IPv6, until clients start to do so.)
Introduction
Tor relays (and bridges) currently find their IPv4 address, and use it as their ORPort and DirPort address when publishing their descriptor. But relays and bridges do not automatically find their IPv6 address.
At the beginning of this document, we should be a bit more clear about which address specifically we're trying to find. If we wanted _some_ address, or if NAT and firewalls didn't exist, we could just open a socket, call getsockname(), and be done with it. What we are looking for specifically is an address that we can advertise to the rest of the world in our server descriptor. [I know you know this, but we should say so.]
[...]
Finding Relay IPv6 Addresses
We propose that tor relays (and bridges) automatically find their IPv6 address, and use it to publish an IPv6 ORPort.
For some relays to find their IPv6 address, they may need to fetch some directory documents from directory authorities over IPv6. (For anonymity reasons, bridges are unable to fetch directory documents over IPv6, until clients start to do so.)
3.1. Current Relay IPv4 Address Implementation
Currently, all relays (and bridges) must have an IPv4 address. IPv6 addresses are optional for relays.
Tor currently tries to find relay IPv4 addresses in this order:
  1. the Address torrc option
  2. the address of the hostname (resolved using DNS, if needed)
  3. a local interface address (by making a self-connected socket, if needed)
  4. an address reported by a directory server (using X-Your-Address-Is)
Any server, or only an authority? Over any connection, or only an authenticated one?
[...]
3.2. Finding Relay IPv6 Addresses
We propose that relays (and bridges) try to find their IPv6 address. For consistency, we also propose to change the address resolution order for IPv4 addresses.
We use the following general principles to choose the order of IP address methods:
  * Explicit is better than Implicit,
  * Local Information is better than a Remote Dependency,
  * Trusted is better than Untrusted, and
  * Reliable is better than Unreliable.
Within these constraints, we try to find the simplest working design.
We should make sure to be clear about the impact of using an untrusted source. Anybody who can fool a relay about its IP can effectively MITM that relay's incoming connections (traffic patterns only), so using a non-trusted source can be risky for anonymity.
[...]
(Each of these address resolution steps is described in more detail, in its own subsection.)
While making these changes, we want to preserve tor's existing behaviour:
  * resolve Address using the local resolver, if needed,
  * ignore private addresses on public tor networks, and
  * when there are multiple valid addresses, choose the first or latest address, as appropriate.
Instead of "first or latest" I suggest "first-listed or most recently received" here, to help non-native speakers.
3.2.1. Make the Address torrc Option Support IPv6
[...]
It is an error to configure an Address option with a private IPv4 or IPv6 address, or with a hostname that does not resolve to any publicly routable IPv4 or IPv6 addresses.
We should say "on a public network" here -- private addresses are fine on private networks.
Also, this seems to mean that if the relay's DNS resolver goes down, the relay should give an error and exit, even if it was already running. That seems undesirable.
[...]
3.2.2. Use the Advertised ORPort IPv4 and IPv6 Addresses
Next, we propose that relays (and bridges) use the first advertised ORPort IPv4 and IPv6 addresses, as configured in their torrc.
The ORPort address may be a hostname. If it is, tor should try to use it to resolve an IPv4 and IPv6 address, and open ORPorts on the first available IPv4 and IPv6 address. Tor should respect the IPv4Only and IPv6Only port flags, if specified. (Tor currently resolves IPv4 addresses in ORPort lines. It may not look for an IPv6 address.)
Relays (and bridges) currently use the first advertised ORPort IPv6 address as their IPv6 address. We propose to use the first advertised IPv4 ORPort address in a similar way, for consistency.
Therefore, this change may affect existing relay IPv4 addresses. We expect that a small number of relays may change IPv4 address, from a guessed IPv4 address, to their first advertised IPv4 ORPort address.
In rare cases, relays may have been using non-advertised ORPorts for their addresses. This change may also change their addresses.
We propose ignoring private configured ORPort addresses on public tor networks. (Binding to private ORPort addresses is supported, even on public tor networks, for relays that use NAT to reach the Internet.) If an ORPort address is private, address resolution should go to the next step.
3.2.3. Use the Advertised DirPort IPv4 Address
Next, we propose that relays use the first advertised DirPort IPv4 address, as configured in their torrc.
I think that we could omit this method; it seems unlikely to me that anybody is going to configure an advertised DirPort address but not an advertised ORPort address. In the long run, I think we want DirPorts to disappear entirely as part of our official protocol.
3.2.4. Use Local Interface IPv6 Address
Next, we propose that relays (and bridges) use publicly routable addresses from the OS interface addresses or routing table, as their IPv4 and IPv6 addresses.
Tor has local interface address resolution functions, which support most major OSes. Tor uses these functions to guess its IPv4 address. We propose using them to also guess tor's IPv6 address.
We also propose modifying the address resolution order, so interface addresses are used before the local hostname. This decision is based on our principles: interface addresses are local, trusted, and reliable; hostname lookups may be remote, untrusted, and unreliable.
Some developer documentation also recommends using interface addresses, rather than resolving the host's own hostname. For example, on recent versions of macOS, the man pages tell developers to use interface addresses (getifaddrs) rather than look up the host's own hostname (gethostname and getaddrinfo). Unfortunately, these man pages don't seem to be available online, except for short quotes (see [getaddrinfo man page] for the relevant quote).
If the local interface addresses are unavailable, tor opens a self-connected UDP socket to a publicly routable address, but doesn't actually send any packets. Instead, it uses the socket APIs to discover the interface address for the socket.
I don't understand in which sense this socket is "self-connected" -- maybe "unused" or something? Also I'd suggest that Tor should use an authority's IP address for this purpose. Currently, we use 18.0.0.1, which tends to confuse people who are looking at their firewall's warnings.
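To make sure we are talking about the same trick, here is a minimal Python sketch of what I understand the socket step to be: connect() on a UDP socket makes the OS pick a route and a source address without sending a packet, and getsockname() then reports that source address. The function name and the port number are my own placeholders, not anything from tor. The probe addresses in the usage comment are 18.0.0.1 (what we use today) and a documentation-range IPv6 address standing in for whatever we end up choosing.

```python
import socket


def guess_local_address(family, probe_addr, probe_port=9):
    """Return the local source address the OS would use to reach probe_addr.

    Hypothetical helper for illustration. Returns None if the address
    family is unsupported or there is no route to probe_addr.
    """
    try:
        with socket.socket(family, socket.SOCK_DGRAM) as sock:
            # On a UDP socket, connect() only selects a route and a source
            # address. No packet is sent.
            sock.connect((probe_addr, probe_port))
            return sock.getsockname()[0]
    except OSError:
        return None


# Usage (results depend on the host's routing table):
#   guess_local_address(socket.AF_INET, "18.0.0.1")
#   guess_local_address(socket.AF_INET6, "2001:db8::1")
```

Note that the address this returns can still be a private one (for example behind NAT), so it would need to go through the same private-address check as the interface addresses.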
Tor already ignores private IPv4 interface addresses on public relays. (Binding to private DirPort addresses is supported, for networks that use NAT.) We propose to also ignore private IPv6 interface addresses. If all IPv4 or IPv6 interface addresses are private, address resolution should go to the next step.
3.2.5. Use Own Hostname IPv6 Addresses
Next, we propose that relays (and bridges) get their local hostname, look up its addresses, and use them as its IPv4 and IPv6 addresses.
We propose to use the same underlying lookup functions to look up the IPv4 and IPv6 addresses for:
  * the Address torrc option (see section 3.2.1), and
  * the local hostname.
However, OS APIs typically only return a single hostname.
Even though the hostname lookup may use remote DNS, we propose to use it on directory authorities, to maintain compatibility with current configurations. Even if it is remote, we expect the configured DNS to be somewhat trusted by the operator.
Do you mean to say "directory authorities" here? I don't understand that part.
The hostname lookup should ignore private addresses on public relays. If multiple IPv4 or IPv6 addresses are returned, the first public address from each family should be used. If all IPv4 or IPv6 hostname addresses are private, address resolution should go to the next step.
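The "first public address from each family" rule recurs in several of these steps, so here is a small Python sketch of how I read it. It is only an illustration: first_public_address is a made-up name, and Python's is_global test is a stand-in for tor's own idea of a private or internal address, which may not match it exactly.

```python
import ipaddress


def first_public_address(candidates, version):
    """Return the first globally routable address of the given IP version.

    candidates: address strings in the order the lookup returned them.
    version: 4 or 6.
    Returns None if every candidate is private, malformed, or belongs to
    the other family, in which case the caller moves on to the next
    resolution step.
    """
    for text in candidates:
        try:
            addr = ipaddress.ip_address(text)
        except ValueError:
            continue  # not an IP literal, skip it
        if addr.version == version and addr.is_global:
            return str(addr)
    return None


# Each family is resolved independently from the same candidate list.
lookup = ["10.0.0.1", "fe80::1", "8.8.8.8", "2001:4860:4860::8888"]
ipv4 = first_public_address(lookup, 4)
ipv6 = first_public_address(lookup, 6)
```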
[...]
3.2.6. Use Directory Header IPv6 Addresses
Finally, we propose that relays get their IPv4 and IPv6 addresses from the X-Your-Address-Is HTTP header in tor directory documents. To support this change, we propose that relays start fetching directory documents over IPv4 and IPv6.
Can we specify use of NETINFO cells additionally or instead? Unlike DirPort connections, ORPort connections are authenticated, so we know who is telling us what our address is.
We propose that bridges continue to only fetch directory documents over IPv4, because they try to imitate clients. (Most clients only fetch directory documents over IPv4; a few clients are configured to only fetch over IPv6.) When client behaviour changes to use both IPv4 and IPv6 for directory fetches, bridge behaviour can also change to match. (See section 3.4.1 and [Proposal 306: Client Auto IPv6 Connections].)
We propose that directory authorities should ignore addresses in directory headers. Allowing other authorities (or relays?) to change a directory authority's published IP address may lead to security issues. Instead, if interface and hostname lookups fail, tor should stop address resolution, and return a permanent error. (And issue a log to the operator, see below.)
I suggest that we simplify the whole directory authority logic and say that authorities must have configured Address lines, or nothing.
[...]
3.3. Consequential Tor Client Changes
We do not propose any required client address resolution changes at this time.
However, clients will use the updated address resolution functions to detect when they are on a new connection, and therefore need to rotate their TLS keys.
Do clients have meaningful TLS keys any more, now that they have dropped client support for the v1 link protocol?
(This is just a side question -- clients should still have a working ip_address_changed() function.)
[...]
3.5. Optional Efficiency and Reliability Changes
We propose some optional changes for efficiency and reliability, and describe their impact.
Some of these changes may be more appropriate in future releases, or along with other proposed features.
3.5.1. Only Use Authenticated Directory Header IPv4 and IPv6 Addresses
We propose this optional change, to improve relay address accuracy and reliability.
I am +1 here, with a proviso that we should be able to use NETINFO cells.
[...]
3.5.5. Add IPv6 Support to AuthDirMaxServersPerAddr
We propose this optional change, to improve the health of the network, by rejecting too many relays on the same IPv6 address.
Modify get_possible_sybil_list() so it takes an address family argument, and returns a list of IPv4 or IPv6 sybils.
Use the modified get_possible_sybil_list() to exclude relays from the authority's vote, if there are more than AuthDirMaxServersPerAddr on the same IPv4 or IPv6 address.
Since these relay exclusions happen at voting time, they do not require a new consensus method.
Since it's trivial for one host to have a staggering number of IPv6 addresses, should this specify a /80 or /96 or something as being sybil-like?
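To illustrate what I mean by counting per prefix, here is a rough Python sketch. It is not a description of get_possible_sybil_list(): the function name, the (nickname, address) input shape, and the prefix length are all assumptions for the example, and which prefix length is right is exactly the open question.

```python
import ipaddress
from collections import defaultdict


def possible_sybils_by_prefix(relays, prefix_len, max_per_prefix):
    """Group relays by IPv6 prefix and return the groups over the limit.

    relays: iterable of (nickname, IPv6 address string) pairs.
    prefix_len: prefix length to group on, e.g. 96 or 80 (an assumption).
    max_per_prefix: plays the role of AuthDirMaxServersPerAddr.
    Returns a dict mapping each over-limit prefix to its relay nicknames.
    """
    groups = defaultdict(list)
    for nickname, text in relays:
        # strict=False masks off the host bits, so every address in the
        # same prefix maps to the same network object.
        network = ipaddress.ip_network(f"{text}/{prefix_len}", strict=False)
        groups[network].append(nickname)
    return {
        str(network): names
        for network, names in groups.items()
        if len(names) > max_per_prefix
    }


relays = [
    ("a", "2001:db8::1"),
    ("b", "2001:db8::2"),
    ("c", "2001:db8::3"),
    ("d", "2001:db8:1::1"),
]
flagged = possible_sybils_by_prefix(relays, prefix_len=96, max_per_prefix=2)
```

With an exact-address comparison, none of the relays in this example would be flagged, since each one has a distinct address.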
[...]
3.5.7. Add IPv6 Support Using gethostbyname2()
I agree that this change should be unnecessary; I'd suggest that we not do it and just require getaddrinfo() for meaningful IPv6 resolution.
Alternatively, we could use libevent's DNS.
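For comparison, this is roughly what a single getaddrinfo()-based lookup gives us, sketched in Python rather than C. One call returns results for both families, which is why I don't think a separate gethostbyname2() path buys us anything. The helper name is made up for the example.

```python
import socket


def first_address_per_family(host):
    """Resolve host once and keep the first address for each IP family.

    Hypothetical helper for illustration. Returns a dict that may contain
    socket.AF_INET and/or socket.AF_INET6 keys, or an empty dict if the
    lookup fails.
    """
    try:
        infos = socket.getaddrinfo(host, None, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return {}
    found = {}
    for family, _type, _proto, _canonname, sockaddr in infos:
        if family in (socket.AF_INET, socket.AF_INET6):
            # setdefault keeps the first result listed for each family.
            found.setdefault(family, sockaddr[0])
    return found
```

The results would still need the private-address filtering described in the earlier sections before being used as a relay address.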
3.5.8. Change Relay OutboundBindAddress Defaults
We propose this optional change, to improve the reliability of IP address-based filters in tor.
For example, the tor network treats relay IP addresses differently when:
  * resisting denial of service, and
  * selecting canonical, long-term connections.
(See [Ticket 33018: Dir auths using an unsustainable 400+ mbit/s] for the initial motivation for this change: resisting significant bandwidth load on directory authorities.)
Now that tor knows its own addresses, we propose that relays (and bridges) set their IPv4 and IPv6 OutboundBindAddress to these discovered addresses, by default. If binding fails, tor should fall back to an unbound socket.
I think this change might be unnecessary, but it shouldn't hurt. I'd suggest not prioritizing it very high.
[...]
In general, the plan above looks solid.
I have a suggestion before we get into the implementation, though: I think we should, for each check, make sure that we write down _when_ it happens, what makes it happen, and where we store the result. That is, some of these checks are things we need to launch (like looking up our own hostname), whereas others will happen passively pretty often (like connecting to a directory authority). Of the ones that we need to launch, some will happen only when other methods have failed, whereas some will happen on startup. Some are things that can time out, whereas others aren't. Writing this all down will make sure that we aren't making our state machine more complex than it needs to be.
IMO, we should record the status of all possible IP lookup methods, with "not yet tried" being a possible status: it will help us keep our implementation and our logging simple -- or at least, as simple as can be.
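Here is a very rough Python sketch of the kind of bookkeeping I have in mind, just to make the idea concrete. Every name in it is hypothetical, including the method labels and the set of statuses. The real thing would live in C and would also need to record when each method runs and what triggers it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class LookupStatus(Enum):
    NOT_YET_TRIED = "not yet tried"
    IN_PROGRESS = "in progress"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


# Hypothetical labels, listed in the proposal's resolution order.
METHODS = ("Address", "ORPort", "interface", "hostname", "directory header")


@dataclass
class LookupResult:
    status: LookupStatus = LookupStatus.NOT_YET_TRIED
    address: Optional[str] = None


class AddressLookupTable:
    """One entry per (method, IP version); all start as NOT_YET_TRIED."""

    def __init__(self, methods=METHODS):
        self.methods = tuple(methods)
        self.results = {(m, v): LookupResult() for m in self.methods for v in (4, 6)}

    def record(self, method, version, status, address=None):
        self.results[(method, version)] = LookupResult(status, address)

    def best(self, version):
        """Return (method, address) for the highest-priority success, or None."""
        for method in self.methods:
            result = self.results[(method, version)]
            if result.status is LookupStatus.SUCCEEDED:
                return method, result.address
        return None

    def summary(self, version):
        """One line per method, suitable for a log message."""
        return [
            f"IPv{version} {m}: {self.results[(m, version)].status.value}"
            for m in self.methods
        ]


table = AddressLookupTable()
table.record("hostname", 6, LookupStatus.SUCCEEDED, "2001:db8::5")
table.record("interface", 6, LookupStatus.FAILED)
```

With every method's state in one place, choosing an address is a walk down the priority order, and the log message explaining the choice is just a dump of the table.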
cheers,