Hello there,
recently we've been busy specifying various important improvements to entry guard security. For instance see proposals 250, 241 and ticket #16861.
Unfortunately, the current guard codebase is dusty and full of problems (see #12466, #12450). We believe that refactoring and cleaning up the entry guard code is essential before we proceed to more advanced security improvements.
We've been working on new algorithms and data structures for guard nodes as part of ticket #12595.
In this mail I include some pseudocode for this new algorithm with the hope that it will act as a draft for implementing these changes. You can find the pseudocode here:
https://gitweb.torproject.org/user/asn/tor.git/tree/src/or/guardlist.c?h=bug...
A short description of the algorithm is included on top, and then various methods and functions are prototyped underneath to make the logic more concrete.
Apart from the comments and XXXs on the code, here are some more thoughts on this work:
- This new design focuses on protecting against path bias attacks, by slightly damaging our reachability.
Specifically, the old design is better at recovering in filtered networks, because it will keep on adding new nodes till one succeeds. In this new design, we will not try more than 80 relays at a time. So if none of them passes the filtered network, bad luck: no Tor.
While this failure mode should not happen much, it's bad news for users behind FascistFirewalls which are actually quite frequent. A quick fix here would be to always add an 80/443 guard on our list, however as it stands only 30% of the guards are 80/443 guards, so this has bad anonymity consequences.
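To make the cap concrete, here is a minimal sketch of the check I have in mind; all the names (guard_t, num_guards_attempted(), and so on) are illustrative, not the ones from the branch:

  /* Illustrative sketch of the "at most 80 attempted guards" rule.
   * These names are made up for the example; see the branch for the
   * real pseudocode. */
  #define GUARDS_ATTEMPTED_THRESHOLD 80

  typedef struct guard_t {
    unsigned attempted : 1; /* we tried to connect at least once */
  } guard_t;

  static int
  num_guards_attempted(const guard_t *guards, int n_guards)
  {
    int n = 0;
    for (int i = 0; i < n_guards; i++)
      n += guards[i].attempted;
    return n;
  }

  /* If we have already attempted the threshold number of relays
   * without success, stop expanding the list instead of probing the
   * whole consensus. */
  static int
  may_add_new_guard(const guard_t *guards, int n_guards)
  {
    return num_guards_attempted(guards, n_guards) < GUARDS_ATTEMPTED_THRESHOLD;
  }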
- To improve our algorithm and make it more robust we need to understand further what kind of path bias attacks are relevant here. The adversary here is a network adversary (like a gateway) that can block our connections to certain guards. What nasty attacks can this adversary do?
If we can't find bad attacks here, then maybe we should stop worrying about those path bias attacks so much.
For example, a threat here with the old guard logic is that if we used this evil gateway just for 10 minutes (in an airport), the adversary could launch a path bias attack and force us to connect to her guard node. Then even after we left that airport, we would still stick to the evil guard node, which is bad.
Also, an adversary that manages to own our guard using path bias attacks then has further possibilities for biasing the rest of the circuit. What can this adversary do?
- Notice that the pseudocode contains no logic about bridges. I'm not sure how bridges should be handled here.
- I tried to keep the dirguard logic very simple, hoping that we can eventually forget about dirguards entirely when #12538 is done.
The main dirguard feature is that we assume that populate_live_entry_guards() and add_an_entry_guard() will return dirguards when the circuit is a directory circuit.
Maybe we should consider introducing the "primary dirguard" concept as well. And maybe also add some logic where Tor will move on to the next dirguard if it failed to receive a document from the current dirguard.
- I used the ATTEMPTED_THRESHOLD concept of prop241, but did not use the NET_THRESHOLD and CONNECTED_THRESHOLD ideas.
I removed NET_THRESHOLD because I increased the value of ATTEMPTED_THRESHOLD to the point that it can also be used as a network down indicator.
Also, I was not sure what CONNECTED_THRESHOLD was useful for, and there were certain engineering issues with it (Like, if that threshold is hit, we need a logic that will *only retry the successfully connected guards*, and not all guards).
- There is no log message warning the user of path bias attacks or bad network or anything. That's because there is no way to figure out what's the problem, and issuing an alarming log message here would confuse and panic the user.
If we want to inform the user anyway: if the user is *actively* trying to visit a destination and we've been cycling through our guard list for ages, maybe we should then issue a log message telling the user that something is wrong with the network.
- In general, I tried to keep the number of heuristics and kludges to the minimum to keep the logic simple. Unfortunately, it seems that without a "network down" indicator (#16120) there is no way to avoid edge cases and false positives here.
We should try to fix all problems here that can occur frequently or have security consequences, but there will always be scenarios where Tor will end up thinking there is no network while it's actually on a filternet. For this reason, we should give plenty of testing to this feature before we ship it to real users!
- Finally, all the constants & parameters in the pseudocode are subject to change. I tried to motivate some of them, but others are just arbitrary.
Feedback is very welcome and please let me know of any issues with security or reachability that you find! Or of how the pseudocode should be altered to make it more useful for implementors.
Cheers!
Hello,
"To improve our algorithm and make it more robust we need to
understand further what kind of path bias attacks are relevant here...What nasty attacks can this adversary do?"
A gateway adversary which can filter the network can use guards to fingerprint you. This requires connecting to tor directly through the gateway mentioned.
By watching which guards are attempted, the adversary learns the client's preexisting list of guards. If these guards are then filtered, and the client changes location and then reconnects, the client will be identifiable by the (filtered) guards it attempts to use. It's not a mitigation to only use one guard here.
"If we can't find bad attacks here, then maybe we should stop
worrying about those path bias attacks so much."
Some types of attack are unavoidable; detecting unobfuscated tor use, for example, even in the case of including at least one 80/443 guard. The only concern I have is that trying no more than 80 relays at a time still leaves a lot of room for fingerprinting (if those relays persist between bootstraps). This also raises the question of false positives, and of proving that a gateway adversary is the source of interference. A simplification might be to have the client fail faster (than 80) and be forced to try bridges.
"...a threat here with the old guard logic, is that if we used this
evil gateway just for 10 minutes (in an airport), the adversary could launch a path bias attack and force us to connect to her guard node."
A mitigation for the airport analogy posed might be to base network location on routing characteristics. I know portable NLA is hard. If the location-characteristic route changes in some way (like a change from airport to coffee shop), then consider the location as having changed, and also consider dropping the airport guard.
"Also, an adversary that manages to own our guard using path bias
attacks, then has further possibilities for biasing the rest of the circuit. What can this adversary do?"
With modified tor software, and having already compromised the gateway, an adversary can, over time, game the entire path selection. The adversary has at least 10 minutes in which a tor client will prefer certain nodes for certain types of traffic. After compromising the guard, the adversary gateway can then direct their attention to the rest of the path. They need to avoid failing too much or another guard gets chosen. So they fail the occasional node selection, and end up rebiasing the chance of selecting a bad relay/exit in the adversary's favor. How much failure is required? Doesn't that depend on the parameters maintained by a tor client?
"Also, I was not sure what CONNECTED_THRESHOLD was useful for, and
there were certain engineering issues with it (Like, if that threshold is hit, we need a logic that will *only retry the successfully connected guards*, and not all guards)."
If you try *all* guards, you make the fingerprinting mentioned above easier. Although even if you only retry the successfully connected guards, you still run into this problem.
"There is no log message warning the user of path bias attacks or
bad network or anything. That's because there is no way to figure out what's the problem, and issuing an alarming log message here would confuse and panic the user."
Maybe this could be simplified to a log entry to indicate guard rotation, or that a guard which was previously up is now unlisted or down. Couldn't the old guard be tested by using the first non-failing circuit that follows? If it's connectable using this circuit you can say something is amiss. Regards --leeroy
Hi,
On 8/20/2015 2:28 PM, George Kadianakis wrote:
Hello there,
recently we've been busy specifying various important improvements to entry guard security. For instance see proposals 250, 241 and ticket #16861.
Unfortunately, the current guard codebase is dusty and full of problems (see #12466, #12450). We believe that refactoring and cleaning up the entry guard code is essential before we proceed to more advanced security improvements.
We've been working on new algorithms and data structures for guard nodes as part of ticket #12595.
In this mail I include some pseudocode for this new algorithm with the hope that it will act as a draft for implementing these changes. You can find the pseudocode here:
https://gitweb.torproject.org/user/asn/tor.git/tree/src/or/guardlist.c?h=bug...
A short description of the algorithm is included on top, and then
various methods and functions are prototyped underneath to make the logic more concrete.
Apart from the comments and XXXs on the code, here are some more thoughts on this work:
- This new design focuses on protecting against path bias attacks,
by slightly damaging our reachability.
Specifically, the old design is better at recovering in filtered networks, because it will keep on adding new nodes till one succeeds. In this new design, we will not try more than 80 relays at a time. So if none of them passes the filtered network, bad luck: no Tor.
This number looks good to me. Could you make it dynamic, so that in the future we don't have to change this code? Being optimistic here about Tor's scale in the future. E.g. calculate: GUARDS_ATTEMPTED_THRESHOLD == 'total no of Guards in a consensus' * 0.05, and update it in our 'State' every time we receive a valid new consensus document which changes it. These should be slight updates, like maybe 78, maybe 82, etc. If the result of the above calculation is not a whole number, round down (e.g. if the result is 81.6, set the limit to 81).
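Roughly like this, as a sketch (the function name is made up, just to show the calculation):

  /* Recompute the threshold whenever a new valid consensus arrives:
   * 5% of the Guards in the consensus, rounded down. E.g. 1632
   * Guards in the consensus -> 1632 * 0.05 = 81.6 -> threshold 81. */
  static int
  compute_guards_attempted_threshold(int n_guards_in_consensus)
  {
    /* The cast truncates toward zero, which rounds down here. */
    return (int)(n_guards_in_consensus * 0.05);
  }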
While this failure mode should not happen much, it's bad news for users behind FascistFirewalls which are actually quite frequent. A quick fix here would be to always add an 80/443 guard on our list, however as it stands only 30% of the guards are 80/443 guards, so this has bad anonymity consequences.
Bad idea for anonymity, and also not a very good idea with regard to load balancing (80/443 Guards might get hammered more). We do have a torrc option for this; in such a case the user should enable it so Tor will only look for 80/443 Guards, or use bridges.
- To improve our algorithm and make it more robust we need to
understand further what kind of path bias attacks are relevant here. The adversary here is a network adversary (like a gateway) that can block our connections to certain guards. What nasty attacks can this adversary do?
If we can't find bad attacks here, then maybe we should stop worrying about those path bias attacks so much.
For example, a threat here with the old guard logic is that if we used this evil gateway just for 10 minutes (in an airport), the adversary could launch a path bias attack and force us to connect to her guard node. Then even after we left that airport, we would still stick to the evil guard node, which is bad.
That is why we have some primary guards which we retry for some time, and do not remove from the list if we cannot connect to them once or twice. Our network could be down, or the Guard's network could be down, etc.
Also, an adversary that manages to own our guard using path bias attacks then has further possibilities for biasing the rest of the circuit. What can this adversary do?
Would it make sense for Tor to change Guard if it fails more than n circuits in a given time? If the attacker owns our guard and wants to path bias attack the rest of the circuit, then since the client is the one who selects the path, this will cause a lot of circuit failures on the client side - we should use this as a metric to detect this possibility and defend against it.
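Something along these lines, as a sketch; the names and the cutoffs are made up and would need real analysis:

  /* Sketch: if too many circuits through the current Guard fail
   * after the first hop succeeded, suspect path bias and consider
   * changing Guard. The 20-circuit minimum and 0.7 ratio are
   * arbitrary placeholders. */
  static int
  guard_looks_path_biased(int n_circuits_built, int n_circuits_failed)
  {
    if (n_circuits_built < 20)
      return 0; /* too few samples to judge */
    return (double)n_circuits_failed / n_circuits_built > 0.7;
  }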
- Notice that the pseudocode contains no logic about bridges. I'm
not sure how bridges should be handled here.
Prop#188 is very important for bridges; not sure what algorithm we could use here, since bridges are designed to be a little bit hard to get in unlimited quantities, and are manually fetched and added to Tor.
- I tried to keep the dirguard logic very simple, hoping that we
can eventually forget about dirguards entirely when #12538 is done.
Indeed, this is not so important, particularly because a DirGuard is way less dangerous than an Entry Guard. Just select 3 main DirGuards and add more to the list until we get a valid consensus document (which we verify ourselves anyway). After that, retry the 3 main DirGuards for a while longer and eventually replace them with the DirGuards we were able to connect to. I suggest retrying a DirGuard 5 times, once every 20 minutes, before replacing it in the primary DirGuard table.
We can remove this code when #12538 is done.
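As a sketch (invented names; the 3-entry table and the counters are only indicative):

  /* Keep 3 primary DirGuards; on a failed directory fetch move on to
   * the next one, and give up on a DirGuard after 5 failed retries
   * (one retry every 20 minutes, per the schedule above). */
  #define N_PRIMARY_DIRGUARDS 3
  #define DIRGUARD_RETRY_LIMIT 5

  typedef struct dirguard_t {
    int n_failures; /* consecutive failed directory fetches */
  } dirguard_t;

  /* Return the index of the next primary DirGuard to try, or -1 if
   * all of them have exhausted their retries and should be replaced
   * with DirGuards we were able to connect to. */
  static int
  next_dirguard(const dirguard_t table[N_PRIMARY_DIRGUARDS])
  {
    for (int i = 0; i < N_PRIMARY_DIRGUARDS; i++) {
      if (table[i].n_failures < DIRGUARD_RETRY_LIMIT)
        return i;
    }
    return -1;
  }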
The main dirguard feature is that we assume that populate_live_entry_guards() and add_an_entry_guard() will return dirguards when the circuit is a directory circuit.
Maybe we should consider introducing the "primary dirguard" concept as well. And maybe also add some logic where Tor will move on to the next dirguard if it failed to receive a document from the current dirguard.
- I used the ATTEMPTED_THRESHOLD concept of prop241, but did not
use the NET_THRESHOLD and CONNECTED_THRESHOLD ideas.
I removed NET_THRESHOLD because I increased the value of ATTEMPTED_THRESHOLD to the point that it can also be used as a network down indicator.
lgtm.
Also, I was not sure what CONNECTED_THRESHOLD was useful for, and there were certain engineering issues with it (Like, if that threshold is hit, we need a logic that will *only retry the successfully connected guards*, and not all guards).
- There is no log message warning the user of path bias attacks or
bad network or anything. That's because there is no way to figure out what's the problem, and issuing an alarming log message here would confuse and panic the user.
Good.
If we want to inform the user anyhow, maybe if the user is *actively* trying to visit a destination, and we've been cycling through our guard list for ages, maybe we should then issue a log message telling the user that something is wrong with the network.
If we are under an attack which tries to force us into using a certain Guard, we need to exit after we try everything above, and log a message that there's something wrong with the network and Tor cannot establish circuits.
- In general, I tried to keep the number of heuristics and kludges
to the minimum to keep the logic simple. Unfortunately, it seems that without a "network down" indicator (#16120) there is no way to avoid edge cases and false positives here.
It's hard to tell the difference between the network being down (for real) and a gateway that has a consensus document and drops packets sent to all (or almost all) Guards. Nothing to do but follow our protocol, eliminate all the options, and exit with a log message. If restarted, start again but keep the same selected primary guards and other state data, and follow the algorithm again; maybe the network is fixed. Exit again if not.
We should try to fix all problems here that can occur frequently or have security consequences, but there will always be scenarios where Tor will end up thinking there is no network while it's actually on a filternet. For this reason, we should give plenty of testing to this feature before we ship it to real users!
As I said above, trying to detect a network-down state will make things a lot more complicated for us, with little benefit, since this can be trivially gamed. We are not an operating system; we don't care if the network is down for real (for all destinations). If the network is down for Tor (we cannot establish connections to any Guard, or to most Guards [path bias attack]), then for us network down for Tor == network down for real, period.
What is the difference, from Tor's perspective, between having no link on the internet interface and having a link which only forwards packets to xxx.xxx.xxx.xxx and yyy.yyy.yyy.yyy?
The consensus document (and the relays in the network) is public info. This is just a limitation here, but not the end of the world.
- Finally, all the constants & parameters in the pseudocode are
subject to change. I tried to motivate some of them, but others are just arbitrary.
Feedback is very welcome and please let me know of any issues with security or reachability that you find! Or of how the pseudocode should be altered to make it more useful for implementors.
Cheers!
Also, we should choose a reasonable number of retry attempts, at reasonable time intervals, for the Guards in primary_guard_set, for the following reasons:
a) The network is not hostile and allows access just fine, but:
- the user walked out of the signal coverage area of a wi-fi hotspot and left Tor running; or
- the network is just down due to ISP-related problems outside the control of the user; or
- the monthly traffic limit was hit and the connection was frozen.
We shouldn't change Guards here, and also shouldn't count failed circuits as a path bias attack.
b) The user changed network / location and is subject to a different gateway with other rules.
It's easier to cover these with a reasonable number of retries at reasonable time intervals, as opposed to trying to find a way to get the network status from the OS, etc.
We should retry each Guard at least 10 times, once every 20 minutes, before giving up and changing our table. If we know that sometime in the past (during a GUARD_ROTATION period) we were able to connect to a Guard, double or triple the retry amount (???) -- these numbers need adjustments.
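As a sketch of the retry rule (the struct and names are invented; only the numbers come from the suggestion above):

  #include <time.h>

  #define RETRY_INTERVAL (20 * 60) /* seconds */
  #define RETRY_LIMIT 10

  typedef struct guard_retry_t {
    time_t last_attempt;  /* when we last tried this Guard */
    int n_retries;        /* attempts since the last success */
    int connected_before; /* nonzero if we ever reached this Guard */
  } guard_retry_t;

  /* Retry a never-connected Guard up to 10 times, one attempt every
   * 20 minutes; give Guards we have connected to before a doubled
   * budget. Both counters reset on a successful connection. */
  static int
  guard_may_retry(const guard_retry_t *g, time_t now)
  {
    int limit = g->connected_before ? 2 * RETRY_LIMIT : RETRY_LIMIT;
    if (g->n_retries >= limit)
      return 0;
    return now - g->last_attempt >= RETRY_INTERVAL;
  }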
"a) The network is not hostile and allows access just fine, but..."
This came up before, didn't it. Nick mentioned that the question of `network down` isn't the easiest question to answer portably. Supposing such a network could have its properties (like route) enumerated, this might provide another solution to (a). If the network changes in some measurable way without also being immediately preceded by a bootstrap (the route shows the next hop unreachable), then consider the network down and schedule attempts to reestablish communication.
A key problem will be distinguishing this type of network from a hostile network where access is just cut off, most likely to force a bootstrap or guard-rotation-like activity. A warning here could indicate that the network was bootstrapped, but that for some past interval(s) the network appeared down (and ask whether the client should check the firewall or gateway, or try a bridge). A client should probably be `aware` of some level of network access; otherwise all solutions are naive.
"b) ..."
Retrying guards is the crux of the problem. If you blindly retry guards, even to prevent rotation, you eventually reach a hard place where this will backfire badly, even if it works sometimes. Although I don't think the client should rely on the OS (which may be compromised).
--leeroy
Hi,
Thanks for the input!
On 8/20/2015 4:59 PM, l.m wrote:
"b) ..."
Retrying guards is the crux of the problem. If you blindly retry guards, even to prevent rotation, you eventually reach a hard place where this will backfire badly, even if it works sometimes. Although I don't think the client should rely on the OS (which may be compromised).
--leeroy
I agree, that is why I said a reasonable number of retries at reasonable time intervals. Not blindly retrying, but also not rotating guards every time a user walks out of the signal coverage area of a wi-fi hotspot.
Can you suggest a retry amount and time interval? I think 10 times, once every 20 minutes, for the Guards we selected but never connected to, and double or even triple that for the Guards we remember we were once able to connect to, is reasonable. After we successfully connect to a Guard (again or for the first time) we reset the timestamp and the retry-attempts counter.
Thanks for the input!
Hey, no problem. Thank you for working on this too.
Can you suggest a retry amount and time interval?
If the adversary is at the gateway and can do filtering, they pretty much want some rotation, whatever the reason may be (to make you choose a guard you've already chosen, or some other guard which may be adversarial). In the case you describe, I would minimize the retry count and maximize the interval size. Sorry for being unspecific. The reason is that if your client has selected a guard but not connected, the adversary may become aware of this selection. They can then fingerprint your use when you change locations, using the guards. It acts as a foothold to launch further attacks. It means that even if the adversary doesn't initially succeed against you, they can always resume their efforts later.
A simplification might be to have the client explicitly state location changes, more than just detecting IP changes like tor does. When you start TBB you make a choice, or you set it in the torrc. Easier than network awareness for tor. You're either at a trusted location or an unknown one. If the location is trusted, then less skepticism is needed when forced to choose a new guard, or when deciding whether you should retry connecting, and how often.
If the location is otherwise, you expect some degree of third-party interference is likely. You expect that rotation is unlikely to be benign (you may already have a compromised guard). The guard which was just chosen should be treated with skepticism, network interruption and outage are likely suspicious, and any friendly guards can be used to identify you if you change location while this gateway is still used. Here you find the airport example. Are long-lived guards and the default path selection implementation as secure here? Some analysis is in order. Maybe short-lived (and not persistent) guards, and tuned path selection, are as good at an untrusted location as long-lived guards at a trusted location? The whole question of whether the entry guard concept can work effectively in an untrusted location is being raised here.
It might be better to just default-drop guards between untrusted locations, while persisting guards at explicitly trusted locations.
Some symptoms of this adversary are: being unable to bootstrap from dirauths; guards which were working becoming unlisted/down; client behaviors symptomatic of censorship which persist between locations unless guards are dropped; traffic flows beginning to favor a particular guard over time, after multiple rotations; multiple guards becoming unreachable at the same time when another guard is chosen. --leeroy
On 21 Aug 2015, at 00:07, s7r s7r@sky-ip.org wrote:
Can you suggest a retry amount and time interval? I think 10 times, once every 20 minutes, for the Guards we selected but never connected to, and double or even triple that for the Guards we remember we were once able to connect to, is reasonable.
These values need to be low enough that buggy clients / buggy networks don't DoS guards. Consider scenarios where the client never receives any replies, or never succeeds in the handshake, or never remembers any replies, but is still sending connection requests.
I think 10 connections in 20 minutes is in the right range here.
Also, if we're redesigning the guard code, do we want to take the opportunity to implement exponential random backoff for guard connections? (Exponential random backoff is a common strategy for avoiding network overload, based on the intuition that each retry is less likely to succeed, so we should retry after increasing intervals, and randomise the retry time, so every client doesn't retry at once.)
We've talked about using exponential backoff for client bootstrap connections to the directory authorities, or perhaps for failed tor connections in general. We've had issues in the past with buggy or obsolete clients retrying connections at a rapid pace, placing significant load on the authorities.
I'm not sure if exponential random backoff would be useful for failed guard connections, but I wanted to raise the idea during the redesign.
What this would look like in practice (a straw-man example):
If we want to connect a maximum of 10 times in 20 minutes, using exponential random backoff, we'd retry after approximately: 1, 2, 4, 8, 16, 32, 64, 128, 256, and 512 seconds.
Then we pick a random time in each interval 0-1, 1-2, 2-4, 4-8, … to actually do the reconnections.
This is a total of 10 connections over 511-1023 seconds, or 8.5 - 17 minutes, with an average of 12.75 minutes. We could tweak the average to 20 minutes by using intervals of: 2, 3, 6, 12, 24, 48, 96, 192, 384, and 768 seconds (average 19.13 minutes)
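In code, the straw-man schedule might look like this (illustrative only):

  #include <stdio.h>
  #include <stdlib.h>

  /* Straw-man exponential random backoff: retry k (k = 1..10) waits
   * a uniformly random time within the intervals 0-1, 1-2, 2-4, ...,
   * 256-512 seconds, for a total of 511-1023 seconds. */
  int
  main(void)
  {
    double lower = 0, upper = 1, total = 0;
    for (int k = 1; k <= 10; k++) {
      double wait = lower + (upper - lower) * rand() / ((double)RAND_MAX + 1);
      total += wait;
      printf("retry %2d after %6.1f s (total %7.1f s)\n", k, wait, total);
      lower = upper; /* next interval: 1-2, 2-4, 4-8, ... */
      upper *= 2;
    }
    return 0;
  }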
Tim
Tim Wilson-Brown (teor)
teor2345 at gmail dot com pgp 0xABFED1AC https://gist.github.com/teor2345/d033b8ce0a99adbc89c5
teor at blah dot im OTR D5BE4EC2 255D7585 F3874930 DB130265 7C9EBBC7
s7r s7r@sky-ip.org writes:
Hi,
Hello, thanks for the feedback!
I pushed some small updates to my branch based on your comments. You can check them out here: https://gitweb.torproject.org/user/asn/tor.git/tree/src/or/guardlist.c?h=bug...
On 8/20/2015 2:28 PM, George Kadianakis wrote:
Hello there,
recently we've been busy specifying various important improvements to entry guard security. For instance see proposals 250, 241 and ticket #16861.
Unfortunately, the current guard codebase is dusty and full of problems (see #12466, #12450). We believe that refactoring and cleaning up the entry guard code is essential before we proceed to more advanced security improvements.
We've been working on new algorithms and data structures for guard nodes as part of ticket #12595.
In this mail I include some pseudocode for this new algorithm with the hope that it will act as a draft for implementing these changes. You can find the pseudocode here:
https://gitweb.torproject.org/user/asn/tor.git/tree/src/or/guardlist.c?h=bug...
A short description of the algorithm is included on top, and then
various methods and functions are prototyped underneath to make the logic more concrete.
Apart from the comments and XXXs on the code, here are some more thoughts on this work:
- This new design focuses on protecting against path bias attacks,
by slightly damaging our reachability.
Specifically, the old design is better at recovering in filtered networks, because it will keep on adding new nodes till one succeeds. In this new design, we will not try more than 80 relays at a time. So if none of them passes the filtered network, bad luck: no Tor.
This number looks good to me. Could you make it dynamic, so that in the future we don't have to change this code? Being optimistic here about Tor's scale in the future. E.g. calculate: GUARDS_ATTEMPTED_THRESHOLD == 'total no of Guards in a consensus' * 0.05, and update it in our 'State' every time we receive a valid new consensus document which changes it. These should be slight updates, like maybe 78, maybe 82, etc. If the result of the above calculation is not a whole number, round down (e.g. if the result is 81.6, set the limit to 81).
I added a comment to make both GUARDS_ATTEMPTED_THRESHOLD and PRIMARY_GUARDS consensus parameters, so that we can change them if we ever understand this problem better.
While this failure mode should not happen much, it's bad news for users behind FascistFirewalls which are actually quite frequent. A quick fix here would be to always add an 80/443 guard on our list, however as it stands only 30% of the guards are 80/443 guards, so this has bad anonymity consequences.
Bad idea for anonymity, and also not a very good idea with regard to load balancing (80/443 Guards might get hammered more). We do have a torrc option for this; in such a case the user should enable it so Tor will only look for 80/443 Guards, or use bridges.
I agree. However, most people don't know about the FascistFirewall torrc option.
- To improve our algorithm and make it more robust we need to
understand further what kind of path bias attacks are relevant here. The adversary here is a network adversary (like a gateway) that can block our connections to certain guards. What nasty attacks can this adversary do?
If we can't find bad attacks here, then maybe we should stop worrying about those path bias attacks so much.
For example, a threat here with the old guard logic is that if we used this evil gateway just for 10 minutes (in an airport), the adversary could launch a path bias attack and force us to connect to her guard node. Then even after we left that airport, we would still stick to the evil guard node, which is bad.
That is why we have some primary guards which we retry for some time, and do not remove from the list if we cannot connect to them once or twice. Our network could be down, or the Guard's network could be down, etc.
Based on your comments, I changed the reset timer for retrying primary guards down to 5 minutes. I could also get behind the exponential backoff idea.
Also, an adversary that manages to own our guard using path bias attacks then has further possibilities for biasing the rest of the circuit. What can this adversary do?
Would it make sense for Tor to change Guard if it fails more than n circuits in a given time? If the attacker owns our guard and wants to path bias attack the rest of the circuit, then since the client is the one who selects the path, this will cause a lot of circuit failures on the client side - we should use this as a metric to detect this possibility and defend against it.
Yes, maybe. Not sure if the path bias code currently does this.
I will consider this as an orthogonal problem for now.
Hi,
As some of you may be aware, the mailing list for censorship events was recently put on hold indefinitely. This appears to be due to the detector producing too many false positives in its current implementation. It also raises the question of the purpose of such a mailing list. Who are the stakeholders? What do they gain from an improvement?
I've read some of the documentation about this. As far as I can tell, at a minimum an `improvement` in the event detector would be to:
- reduce false positives
- distinguish between tor network reachability, and tor network interference
- enable/promote client participation through the submission of results from an ephemeral test (itself having the property of being provably correct and valid)
In order to be of use to the researchers it needs greater analysis capability. Is it enough to say censorship is detected? By this point the analysis is less interesting--because the discourse which itself led to the tor use is probably evident (or it becomes harder to find). On the other hand, if a researcher is aware of some emerging trend, they may predict the censorship event by predicting the use of tor. This may also be of use in the analysis of other events.
- should detect more than just censorship
- accept input from researchers
From the tech reports it looks like Philipp has a plan for an
implementation of the tests noted above. It's only the format of the results submission which is unknown.
- provide client test results to tor project developers
- make decision related data available
Regards --leeroy
On Thu, Aug 20, 2015 at 09:09:23AM -0400, l.m wrote:
Hi,
As some of you may be aware, the mailing list for censorship events was recently put on hold indefinitely. This appears to be due to the detector producing too many false positives in its current implementation. It also raises the question of the purpose of such a mailing list. Who are the stakeholders? What do they gain from an improvement?
I've read some of the documentation about this. As far as I can tell, at a minimum an `improvement` in the event detector would be to:
- reduce false positives
- distinguish between tor network reachability, and tor network
interference
- enable/promote client participation through the submission of results from an ephemeral test (itself having the property of being provably correct and valid)
In order to be of use to the researchers it needs greater analysis capability. Is it enough to say censorship is detected? By this point the analysis is less interesting--because the discourse which itself led to the tor use is probably evident (or it becomes harder to find). On the other hand, if a researcher is aware of some emerging trend, they may predict the censorship event by predicting the use of tor. This may also be of use in the analysis of other events.
- should detect more than just censorship
- accept input from researchers
From the tech reports it looks like Philipp has a plan for an implementation of the tests noted above. It's only the format of the results submission which is unknown.
- provide client test results to tor project developers
- make decision related data available
Regards --leeroy
Hi,
These are well identified issues. We've been working here on a way to improve the current filtering detection approach, and several of the points above are things that we're actively hoping to work into our approach. Differentiating 'filtering' from 'other events that affect Tor usage' is tricky, and will most likely have to rely on other measurements from outside Tor. We're currently looking at ways to construct models of 'normal' behaviour in a way that incorporates multiple sources of data.
We have a paper up on arXiv that might be of interest. I'd be interested to be in touch with anyone who's actively working on this. (We have code, and would be very happy to work on getting it into production.) I've shared the paper with a few people directly, but not here on the list.
arXiv link: http://arxiv.org/abs/1507.05819
We were looking at any anomalies, not only pure Tor-based filtering events. For the broader analysis, significant shifts in Tor usage are very interesting. It's therefore useful to detect a range of unusual behaviours occurring around Tor, and have a set of criteria within that to allow differentiating 'hard' filtering events from softer anomalies occurring due to other factors.
Joss
Hi Joss,
Thank you for the fine paper. I look forward to reading it. Karsten would be keen on it too (and maybe also your offer) if you haven't already forwarded it to them. My interest in fixing it is (mostly) recreational. I have some thoughts on how to proceed, but I'm not a representative of tor project.
Regards --leeroy
Hi all,
For all my sins I wrote parts of the algorithm that is at fault here.
I also echo and confirm all the problems mentioned. One thing that would greatly help tune such systems is a database of known censored periods from different jurisdictions. The issue is that "anomalies" occur all the time -- and tor is presumably only interested in "interesting anomalies" that relate to attacks.
Now I know more about this field, and am happy to work with others to improve the state of the detector if there is interest.
George
Hi George,
You sell yourself short. It was a good first attempt. Now I should clarify. The last time I spoke to Karsten about this, they indicated that the measurement team has other priorities (not obvious from the outdated roadmap). Karsten quoted an approximation of a year or more before a replacement is expected.
I'm just an anon to them so I cannot change these things. I hope that clarifies your question of interest.
On the other hand, my interest in the censorship detector started as an improvement to metrics-lib and onionoo. In its basic form the fork takes the data, recognizes patterns using applied linguistics, and performs some actions. Getting the data for analysis of censorship is in some ways a simplification. However, progress will be slower than you might like, because the effort here will be split between this and the fork of metrics-lib.
I really do appreciate your interest (and that of Joss) so I'd like to keep this discussion going.
In the paper by Joss Wright et al., events besides just censorship were found to be of use as an indicator of an environment where censoring services leads to an increase in tor use. This sounds like the database you mention. If such a database included events like China's attack on GitHub, or Turkey blocking Twitter, or various other social-political indicators, this would make for a concrete improvement from the perspective of public-research stakeholders. I was also inspired by a recent paper that showed how linguistics can be applied to sample the social-political discourse to predict events. In the absence of data for a country and service, if social indicators show dissatisfaction with a policy to block the service, you can consider this an entry to the database. Over time this sampling would lead to differing discourses which could be used not just to predict anomalies but to help identify why people use tor, and what motivates the censor. The only downside here is I'm not fluent in multiple spoken languages, so there may be some loss of context if the data source is chosen arbitrarily.
When it comes to distinguishing reachability and interference, a client may try to use tor at a laundry center in an otherwise `democratic` and `free` country. This location is independently controlled by the owner, and if they decide to block tor, that's ok. That shouldn't be included. This type of event is unlikely to influence results terribly anyway. I do wish OONI Project could help more here.
That just leaves the tor project developer stakeholder. I think I will leave this stakeholder to its own devices. It's questionable to ask someone who's being censored to run any test without some assurance of their safety.
That's all from me for now. Danke --leeroy
George Kadianakis transcribed 5.2K bytes:
This new design focuses on protecting against path bias attacks, by slightly damaging our reachability.
Specifically, the old design is better at recovering in filtered networks, because it will keep on adding new nodes till one succeeds. In this new design, we will not try more than 80 relays at a time. So if none of them passes the filtered network, bad luck: no Tor.
While this failure mode should not happen much, it's bad news for users behind FascistFirewalls which are actually quite frequent. A quick fix here would be to always add an 80/443 guard on our list, however as it stands only 30% of the guards are 80/443 guards, so this has bad anonymity consequences.
What if we were to try to use meek as a sort of "are we actually on/offline" check? Or, sometime in the future (say six months or so), when BridgeDB is using meek's domain fronting ideas, [0] we could use the BridgeDB domain front to check if we're actually online (and also, potentially, request a bridge for good measure, in case the network is just filtered).
- Notice that the pseudocode contains no logic about bridges. I'm not sure how bridges should be handled here.
My 2¢: I think the bridge code should be kept separate from the entry guard code.
While it's understandable that we've lumped them together in the past because, functionally, clients would use one or the other, bridges are quite different. Even more so now that bridges will soon be using entry guards. [1] (Also, it makes reviewing the bridge code kind of a pain when it's haphazardly crammed into like ten other modules.)
What if we did something like:
int
get_entry_guard(circuit_t *circ)
{
  if (get_options()->UseBridges) {
    go_do_the_bridgey_thing_in_the_bridge_module();
    return 0;
  }
  if (guard_list.n_guards_attempted_lately > GUARDS_ATTEMPTED_THRESHOLD) {
    […]
Or check from wherever get_entry_guards() is being called… but perhaps the latter would be more error prone if a programmer forgets to check that they should be using bridges instead (also, code duplication).
[0]: https://bugs.torproject.org/16650 [1]: https://gitweb.torproject.org/user/isis/tor.git/log/?h=bug7144
isis isis@torproject.org writes:
George Kadianakis transcribed 5.2K bytes:
This new design focuses on protecting against path bias attacks, by slightly damaging our reachability.
Specifically, the old design is better at recovering in filtered networks, because it will keep on adding new nodes till one succeeds. In this new design, we will not try more than 80 relays at a time. So if none of them passes the filtered network, bad luck: no Tor.
While this failure mode should not happen much, it's bad news for users behind FascistFirewalls which are actually quite frequent. A quick fix here would be to always add an 80/443 guard on our list, however as it stands only 30% of the guards are 80/443 guards, so this has bad anonymity consequences.
What if we were to try to use meek as a sort of "are we actually on/offline" check? Or, sometime in the future (say six months or so), when BridgeDB is using meek's domain fronting ideas, [0] we could use the BridgeDB domain front to check if we're actually online (and also, potentially, request a bridge for good measure, in case the network is just filtered).
Isn't that a bit like using cloudflare as our "online/offline" oracle? For this reason, I have mixed feelings about this idea.
A related (but terrible) idea would be to have a StatusAuthority, which clients can connect to when they want to learn if they are online or offline. Still terribly centralized though, and it also has security implications.
(FWIW, I generally really like the meek/BridgeDB integration idea.)
- Notice that the pseudocode contains no logic about bridges. I'm not sure how bridges should be handled here.
My 2¢: I think the bridge code should be kept separate to the entry guard code.
While it's understandable that we've lumped them together in the past because, functionally, clients would use one or the other, bridges are quite different. Even more so now that bridges will soon be using entry guards. [1] (Also, it makes reviewing the bridge code kind of a pain when it's haphazardly crammed into like ten other modules.)
What if we did something like:
int
get_entry_guard(circuit_t *circ)
{
  if (get_options()->UseBridges) {
    go_do_the_bridgey_thing_in_the_bridge_module();
    return 0;
  }
  if (guard_list.n_guards_attempted_lately > GUARDS_ATTEMPTED_THRESHOLD) {
    […]
Or check from wherever get_entry_guards() is being called… but perhaps the latter would be more error prone if a programmer forgets to check that they should be using bridges instead (also, code duplication).
Agreed.
I wonder what go_do_the_bridgey_thing_in_the_bridge_module() should be doing.
Hi,
I'm curious what analysis has been done against a gateway adversary, in particular dealing with the effectiveness of entry guards against such an adversary. There's a part of me that thinks it doesn't work at all for this case, only because I've been studying such an adversary at the AS level, and what I see over time is disturbing. Any pointers to related material?
thanks --leeroy
Hi Leeroy,
On Fri, Aug 21, 2015 at 08:09:13AM -0400, l.m wrote:
Hi,
I'm curious what analysis has been done against a gateway adversary, in particular dealing with the effectiveness of entry guards against such an adversary. There's a part of me that thinks it doesn't work at all for this case, only because I've been studying such an adversary at the AS level, and what I see over time is disturbing. Any pointers to related material?
You may find the following useful. http://www.nrl.navy.mil/itd/chacs/biblio/users-get-routed-traffic-correlatio...
Analysis there is now a few years old, but this is the first attempt to try to fully consider the sort of question I think you are asking. This was one of the prompts for the move from three guards to one, as described in https://www.petsymposium.org/2014/papers/Dingledine.pdf
There is subsequent related published work on measurement and analysis of AS and similar adversaries, e.g., http://www.degruyter.com/view/j/popets.2015.2015.issue-2/popets-2015-0021/po...
Also subsequent work on managing assignment of guards in a practical and secure manner (although this paper pretty much assumes only relay adversaries). http://www.degruyter.com/view/j/popets.2015.2015.issue-2/popets-2015-0017/po...
This also remains an active area, both for analysis and for AS-aware route selection. (I haven't put in any pointers to papers on the latter.)
HTH, Paul
George Kadianakis:
Hello there,
recently we've been busy specifying various important improvements to entry guard security. For instance see proposals 250, 241 and ticket #16861.
Unfortunately, the current guard codebase is dusty and full of problems (see #12466, #12450). We believe that refactoring and cleaning up the entry guard code is essential before we proceed to more advanced security improvements.
We've been working on new algorithms and data structures for guard nodes as part of ticket #12595.
In this mail I include some pseudocode for this new algorithm with the hope that it will act as a draft for implementing these changes. You can find the pseudocode here:
https://gitweb.torproject.org/user/asn/tor.git/tree/src/or/guardlist.c?h=bug...
Have you considered how these data structures might change for Prop 247?
I like Prop 247, and think it is a significant step in the right direction for the HS-specific problem (though I'd like to think a bit more about how it would compare to a more general virtual circuit mechanism that could also protect clients even for instances of application-specific Guard discovery).
More generally, I'm worried that this is yet another case where upstream deliverables that were written over a year ago are going to cause us to arrive at a sub-optimal solution that we're forced to deploy only to rip out later. Hurray deliverables!
In terms of specific suggestions: I find that code conceptually makes more sense when it at least pretends to be object oriented. You might want to consider a single guardset object that holds all of the Guard configuration state, including the main guard list, vanguards, path bias info, etc. This object would then be passed in as the first argument to all guard-related functions, especially if we're talking about overhauling it anyway.
All guard-related functions would then be prefixed with guardset_ as well, to make it clear that you could find them in guardset.c. (Also note I deliberately used guardset_ instead of guardlist_ due to Prop 247).
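To illustrate the naming convention (this is only a strawman, and path_bias_info_t in particular is a made-up placeholder type), the object and its entry points might look like:

  /* Strawman: one object owning all guard state, passed as the first
   * argument to every guardset_* function. */
  typedef struct guardset_t {
    smartlist_t *guards;           /* the main guard list */
    smartlist_t *vanguards;        /* prop247-style inner-layer guards */
    path_bias_info_t *path_bias;   /* made-up type for path bias state */
    time_t last_circuit_success;   /* last time a circuit succeeded */
  } guardset_t;

  const node_t *guardset_pick_entry(guardset_t *gs, const circuit_t *circ);
  void guardset_note_success(guardset_t *gs, const node_t *guard);
  void guardset_note_failure(guardset_t *gs, const node_t *guard);
  void guardset_free(guardset_t *gs);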
A short description of the algorithm is included on top, and then various methods and functions are prototyped underneath to make the logic more concrete.
Apart from the comments and XXXs on the code, here are some more thoughts on this work:
This new design focuses on protecting against path bias attacks, by slightly damaging our reachability.
Specifically, the old design is better at recovering in filtered networks, because it will keep on adding new nodes till one succeeds. In this new design, we will not try more than 80 relays per time. So if none of them passes the filtered network, bad luck no Tor.
While this failure mode should not happen much, it's bad news for users behind FascistFirewalls which are actually quite frequent. A quick fix here would be to always add an 80/443 guard on our list, however as it stands only 30% of the guards are 80/443 guards, so this has bad anonymity consequences.
I had mixed feelings about not trying to handle this better in the past, but given that such users can use the default Tor Browser bridges (and likely effectively have to, since bootstrapping takes tens of minutes anyway in this case) it may not be worth micro-optimizing for?
- To improve our algorithm and make it more robust we need to understand further what kind of path bias attacks are relevant here. The adversary here is a network adversary (like a gateway) that can block our connections to certain guards. What nasty attacks can this adversary do?
I see this attack vector as adding two capabilities:
1. The adversary can induce you to connect to only its chosen guards to find you later on other networks through fingerprinting (as others have mentioned).
2. The adversary gets to perform attacks that would otherwise be impossible without the Guard node's identity key (which is required in order to unwrap TLS). There are lots of these attacks: XOR-based tagging of the cipherstream to force you to use specific exits, other instances of per-circuit failure, HS circuit fingerprinting, circuit-level traffic analysis, etc.
- In general, I tried to keep the number of heuristics and kludges to a minimum to keep the logic simple. Unfortunately, it seems that without a "network down" indicator (#16120) there is no way to avoid edge cases and false positives here.
If you can't find a more reliable OS-specific indicator, you might look at circuit_build_times_network_is_live() and where it is called. I think the channel code also sets indicators in similar places.
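As a strawman for what the guard code could consume (the names and the ten-minute cutoff below are invented; the liveness timestamp would have to be fed from wherever circuit_build_times_network_is_live() gets triggered today):

  #include <time.h>

  /* Invented sketch: call the network down if nothing has signalled
   * liveness recently.  The cutoff is an arbitrary guess. */
  #define NETWORK_DEAD_AFTER (10*60)  /* seconds */

  static int
  network_seems_down(time_t now, time_t last_liveness_signal)
  {
    return (now - last_liveness_signal) > NETWORK_DEAD_AFTER;
  }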
Mike Perry <mikeperry@torproject.org> writes:
George Kadianakis:
Hello there,
recently we've been busy specifying various important improvements to entry guard security. For instance see proposals 250, 241 and ticket #16861.
Unfortunately, the current guard codebase is dusty and full of problems (see #12466, #12450). We believe that refactoring and cleaning up the entry guard code is essential before we proceed to more advanced security improvements.
We've been working on new algorithms and data structures for guard nodes as part of ticket #12595.
In this mail I include some pseudocode for this new algorithm with the hope that it will act as a draft for implementing these changes. You can find the pseucode here:
https://gitweb.torproject.org/user/asn/tor.git/tree/src/or/guardlist.c?h=bug...
Have you considered how these data structures might change for Prop 247?
I like Prop 247, and think it is a significant step in the right direction for the HS-specific problem (though I'd like to think a bit more about how it would compare to a more general virtual circuit mechanism that could also protect clients even for instances of application-specific Guard discovery).
More generally, I'm worried that this is yet another case where upstream deliverables that were written over a year ago are going to cause us to arrive at a sub-optimal solution that we're forced to deploy only to rip out later. Hurray deliverables!
In terms of specific suggestions: I find that code conceptually makes more sense when it at least pretends to be object oriented. You might want to consider a single guardset object that holds all of the Guard configuration state, including the main guard list, vanguards, path bias info, etc. This object would then be passed in as the first argument to all guard-related functions, especially if we're talking about overhauling it anyway.
I agree about looking at this from an OOP perspective.
I think the main class that is introduced here is the guardset (or guardlist). The second class is the guard itself, which is basically a node_t in the code.
A guardlist contains many guards, and knows how to manipulate them correctly.
The way I was thinking that this would work along with prop247, is that each layer of guards will be a separate guardset. So when you pick your first-layer guard, you pick it from the first-layer guardset and when you pick your second-layer guards, you pick them from the second-layer guardset.
Now, if we want to do guard buckets for the third layer in a way that each second-layer guard has a dedicated guardset of third-layer guards, we will need to split the third-layer guards into N guardsets, and assign each of these guardsets to one of the N second-layer guards.
This way, when you make a circuit through second-layer guard k, you will only use the third-layer guardset that corresponds to k.
There are probably more subtleties to this design, but this is how I imagine these two proposals interfacing, in basic terms. A toy sketch of the layering follows.
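Here is a self-contained toy of that layering, just to make the bucketing concrete; every name is invented, and N is fixed at 4 purely as an example:

  #include <stddef.h>

  #define N_SECOND_LAYER 4  /* example value, not a proposal */

  typedef struct guard_t { const char *fingerprint; } guard_t;

  typedef struct guardset_t {
    guard_t *guards;
    size_t n_guards;
  } guardset_t;

  typedef struct guard_layers_t {
    guardset_t first_layer;
    guardset_t second_layer;
    /* One dedicated third-layer bucket per second-layer guard. */
    guardset_t third_layer[N_SECOND_LAYER];
  } guard_layers_t;

  /* A circuit through second-layer guard k only ever draws its
   * third-layer guard from bucket k. */
  static guardset_t *
  third_layer_for(guard_layers_t *layers, size_t k)
  {
    return &layers->third_layer[k];
  }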
All guard-related functions would then be prefixed with guardset_ as well, to make it clear that you could find them in guardset.c. (Also note I deliberately used guardset_ instead of guardlist_ due to Prop 247).
A short description of the algorithm is included on top, and then various methods and functions are prototyped underneath to make the logic more concrete.
Apart from the comments and XXXs on the code, here are some more thoughts on this work:
This new design focuses on protecting against path bias attacks, by slightly damaging our reachability.
Specifically, the old design is better at recovering in filtered networks, because it will keep on adding new nodes till one succeeds. In this new design, we will not try more than 80 relays per time. So if none of them passes the filtered network, bad luck no Tor.
While this failure mode should not happen much, it's bad news for users behind FascistFirewalls which are actually quite frequent. A quick fix here would be to always add an 80/443 guard on our list, however as it stands only 30% of the guards are 80/443 guards, so this has bad anonymity consequences.
I had mixed feelings about not trying to handle this better in the past, but given that such users can use the default Tor Browser bridges (and likely effectively have to, since bootstrapping takes tens of minutes anyway in this case) it may not be worth micro-optimizing for?
As always, it's a tradeoff between security (against path bias attacks) and connectivity (how quickly, if ever, you manage to connect to Tor).
At this point, it might make sense to default to security, but make it easy for filtered people to get connectivity. So, like, if we fail to bootstrap for a while, maybe it makes sense for tor-launcher to present a "connect me anyway!" option to the user (which will use bridges or turn on FascistFirewall).
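Purely as illustration (the names and the five-minute cutoff are made up), the trigger could be as simple as:

  #include <time.h>

  /* Made-up sketch of when tor-launcher might offer the fallback. */
  #define BOOTSTRAP_STALL_CUTOFF (5*60)  /* seconds; arbitrary */

  static int
  should_offer_connect_anyway(int bootstrapped,
                              time_t bootstrap_started, time_t now)
  {
    /* If we still have not bootstrapped after the cutoff, prompt the
     * user to fall back to bridges or to FascistFirewall. */
    return !bootstrapped &&
           (now - bootstrap_started) > BOOTSTRAP_STALL_CUTOFF;
  }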
- To improve our algorithm and make it more robust we need to understand further what kind of path bias attacks are relevant here. The adversary here is a network adversary (like a gateway) that can block our connections to certain guards. What nasty attacks can this adversary do?
I see this attack vector as adding two capabilities:
- The adversary can induce you to connect to only its chosen guards to find you later on other networks through fingerprinting (as others have mentioned).
- The adversary gets to perform attacks that would otherwise be impossible without the Guard node's identity key (which is required in order to unwrap TLS). There are lots of these attacks: XOR-based tagging of the cipherstream to force you to use specific exits, other instances of per-circuit failure, HS circuit fingerprinting, circuit-level traffic analysis, etc.
- In general, I tried to keep the number of heuristics and kludges to a minimum to keep the logic simple. Unfortunately, it seems that without a "network down" indicator (#16120) there is no way to avoid edge cases and false positives here.
If you can't find a more reliable OS-specific indicator, you might look at circuit_build_times_network_is_live() and where it is called. I think the channel code also sets indicators in similar places.