Hi List,
Please have a look at this proposal.
Filename: Check-Maxmind-GeoIP-DB-before-distributing.txt
Title: Check Maxmind GeoIP-DB before distributing
Ticket(s): #26240
Author: Jaskaran Singh
Created: June 2018
Status: Open
0. Motivation and Overview
We currently rely on the GeoIP database of Maxmind, a company registered in the US. This runs against the principle that one should not rely entirely on a single service or piece of software for all needs, and it also has serious security consequences.
Trusting Maxmind's GeoIP database is dangerous, because it can enable attacks on the network. We propose that the database be checked for integrity before it is distributed to users. The integrity check can be assigned to the Directory Authorities (or other trusted systems), which would be responsible for running it using a script.
We should also let the user choose whether she wants to use Maxmind's database, another database of her choice, or no GeoIP database at all.
1. Threat Model
We assume an adversary that can introduce false information into the Maxmind GeoIP database, either through its influence over the company or by other means. The adversary also has enough resources to mount a Sybil attack on the network.
2. Attacks on the Network
2.1 Sybil attack under the radar
The Tor network is constantly monitored for suspicious spikes in the number of nodes, since such a spike may indicate an upcoming or ongoing Sybil attack. A powerful adversary can coerce Maxmind to map specific IP address blocks to different countries. If the adversary's new nodes then appear spread across many countries instead of clustered in one, the people and scripts monitoring the network may see nothing suspicious, and the adversary stays under the radar.
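To make the argument concrete, here is a minimal Python sketch assuming a monitor that only watches per-country growth in relay counts. The monitor, its threshold, and all addresses and country assignments are hypothetical. Twenty new relays that really sit in one country trip the alarm under an honest database, but the same relays spread over ten countries by a tampered database do not.

```python
from collections import Counter


def per_country_counts(relay_ips, geoip):
    """Count relays per country code; IPs missing from the DB count as '??'."""
    return Counter(geoip.get(ip, "??") for ip in relay_ips)


def suspicious_countries(before, after, geoip, max_new_per_country):
    """Countries whose relay count grew by more than max_new_per_country.

    Hypothetical monitor: it only looks at per-country growth.
    """
    old = per_country_counts(before, geoip)
    new = per_country_counts(after, geoip)
    return sorted(cc for cc in new if new[cc] - old[cc] > max_new_per_country)


# Toy data: documentation-only IP ranges, made-up country assignments.
existing = [f"198.51.100.{i}" for i in range(10)]
sybils = [f"203.0.113.{i}" for i in range(20)]
after = existing + sybils
spread = ["de", "fr", "nl", "se", "us", "ca", "jp", "br", "za", "au"]

honest_db = {ip: "us" for ip in existing}
honest_db.update({ip: "de" for ip in sybils})  # all Sybils really in one country

tampered_db = {ip: "us" for ip in existing}
tampered_db.update({ip: spread[i % len(spread)] for i, ip in enumerate(sybils)})

print(suspicious_countries(existing, after, honest_db, 5))    # ['de']
print(suspicious_countries(existing, after, tampered_db, 5))  # []
```

The sketch models per-country monitoring only; a monitor of total relay counts is outside its scope.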
2.2 False location indication for a shady node
Many users do not want the exits of their circuits to be located in certain countries where communication is under surveillance, and a powerful adversary knows this too. Users typically add a line to their configuration that excludes nodes in those countries from their circuits. To get around this, the adversary can coerce Maxmind to alter its database so that particular IP addresses map to countries the user considers havens of free speech.
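Tor's ExcludeExitNodes option accepts country codes written in braces (for example {cc}), which is the kind of configuration line described above. The Python sketch below is a simplified, hypothetical model of country-based exit filtering, not Tor's implementation; the codes xa, xb and xc are placeholders and the addresses come from a documentation range. It shows that once the database remaps the adversary's exit, the user's exclusion silently stops applying to it.

```python
def allowed_exits(exits, geoip, excluded_countries):
    """Exits whose GeoIP country is not excluded.

    Hypothetical model of country-based exit filtering; IPs missing from the
    database map to '??' and are kept here, which is a simplification.
    """
    return [ip for ip in exits if geoip.get(ip, "??") not in excluded_countries]


exits = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
excluded = {"xa"}  # placeholder code for a country the user wants to avoid

true_db = {"192.0.2.1": "xb", "192.0.2.2": "xa", "192.0.2.3": "xc"}
# The adversary's exit 192.0.2.2 really sits in "xa" but is remapped to "xb".
tampered_db = {**true_db, "192.0.2.2": "xb"}

print(allowed_exits(exits, true_db, excluded))      # ['192.0.2.1', '192.0.2.3']
print(allowed_exits(exits, tampered_db, excluded))  # all three exits
```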
3. Design of the Solution
We should check the Maxmind database against its own previous versions. In addition, we should stop relying on a built-in GeoIP database for every purpose, while still allowing users to plug in their own databases through an interface we implement. The latter could be introduced as a ./configure option for when the user is highly distrustful of Maxmind and wants to use a service she trusts, or does not want to use a GeoIP database at all. Both solutions are explained below.
3.1 Checking for integrity
Step 1: The Dir Authorities (or other trusted computers) fetch the latest Maxmind GeoIP database along with its previous versions.
Step 2: The location of each Tor node in the latest version is compared with its location in the previous versions.
Step 3: Each Dir Authority performs the above two steps independently of the others and keeps a count of the nodes whose location changed. A significant number of changes is viewed with suspicion, since it may be the preparation of a Sybil attack by the adversary; in that case, the new changes to the database can be discarded. Even a change in a single node's location is concerning, but it is not easy to attribute such a change to malice, because locations sometimes change for genuine reasons.
Step 4: The vetted database is then distributed to users.
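Steps 2 and 3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: GeoIP versions are modeled as plain IP-to-country dictionaries, only the most recent previous version is compared, max_changes is a hypothetical threshold standing in for "significant number", and the majority rule in authorities_accept is an assumption, since the proposal does not say how the authorities' independent results are combined.

```python
def location_changes(relay_ips, old_db, new_db):
    """(ip, old_cc, new_cc) for every relay whose country differs between versions."""
    changes = []
    for ip in relay_ips:
        old_cc = old_db.get(ip, "??")
        new_cc = new_db.get(ip, "??")
        if old_cc != new_cc:
            changes.append((ip, old_cc, new_cc))
    return changes


def vet_update(relay_ips, old_db, new_db, max_changes):
    """Steps 2-3 for one authority: keep old_db if too many relays moved.

    max_changes is a hypothetical threshold; the proposal gives no number.
    """
    changes = location_changes(relay_ips, old_db, new_db)
    accepted = new_db if len(changes) <= max_changes else old_db
    return accepted, changes


def authorities_accept(verdicts):
    """Assumed combination rule: accept if a strict majority of authorities do."""
    return sum(verdicts) * 2 > len(verdicts)


# Toy example: four relays, one benign move versus three suspicious moves.
relays = [f"192.0.2.{i}" for i in range(1, 5)]
old_db = {ip: "de" for ip in relays}
benign_db = {**old_db, "192.0.2.1": "at"}
bad_db = {**old_db, **{ip: "is" for ip in relays[:3]}}

db, changes = vet_update(relays, old_db, benign_db, max_changes=1)
print(db is benign_db, len(changes))  # True 1
db, changes = vet_update(relays, old_db, bad_db, max_changes=1)
print(db is old_db, len(changes))     # True 3
print(authorities_accept([True, True, False]))  # True
```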
3.2 Doing away with GeoIP location altogether
GeoIP databases are occasionally unreliable, and we can safely do without them. We can provide a ./configure option that lets users plug in their own trusted service. If the user does not have access to a database of her choice, she can simply choose Maxmind, or use no database at all. This would remove our dependence on a single database and diversify our usage.
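A pluggable backend of the kind described in Sections 3 and 3.2 might look like the Python sketch below. All class and function names are hypothetical, the range-table format is an assumed in-memory representation rather than any real database file format, and only IPv4 is handled. The "none" choice covers users who want no GeoIP database at all.

```python
import bisect
import ipaddress
from typing import Iterable, List, Optional, Protocol, Tuple


class GeoIPProvider(Protocol):
    def country(self, ip: str) -> Optional[str]:
        """Two-letter country code for ip, or None if unknown."""
        ...


class NoGeoIP:
    """Provider for users who opt out of GeoIP lookups entirely."""

    def country(self, ip: str) -> Optional[str]:
        return None


class RangeGeoIP:
    """Provider backed by user-supplied, non-overlapping IPv4 ranges."""

    def __init__(self, ranges: Iterable[Tuple[str, str, str]]) -> None:
        self._rows: List[Tuple[int, int, str]] = sorted(
            (int(ipaddress.IPv4Address(first)), int(ipaddress.IPv4Address(last)), cc)
            for first, last, cc in ranges
        )
        self._starts = [row[0] for row in self._rows]

    def country(self, ip: str) -> Optional[str]:
        n = int(ipaddress.IPv4Address(ip))
        i = bisect.bisect_right(self._starts, n) - 1  # last range starting <= n
        if i >= 0 and n <= self._rows[i][1]:
            return self._rows[i][2]
        return None


def make_provider(choice: str, ranges=()) -> GeoIPProvider:
    """Hypothetical stand-in for a build/config option selecting the backend."""
    if choice == "none":
        return NoGeoIP()
    if choice == "ranges":
        return RangeGeoIP(ranges)
    raise ValueError(f"unknown GeoIP provider: {choice!r}")


db = make_provider("ranges", [("192.0.2.0", "192.0.2.255", "xa"),
                              ("198.51.100.0", "198.51.100.255", "xb")])
print(db.country("192.0.2.10"))                     # xa
print(db.country("203.0.113.5"))                    # None
print(make_provider("none").country("192.0.2.10"))  # None
```

Sorting the ranges once and using binary search keeps each lookup logarithmic in the table size, whichever database the user plugs in.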
4. Licensing issues
Maxmind has a fairly liberal license for its database, summarized below:

Maxmind: CC BY-SA 4.0
 * Copy and redistribute the material in any medium or format
 * Remix, transform, and build upon the material for any purpose, even commercially
5. Dealing with false positives
Maxmind determines the geolocation of an IP address using WHOIS records, reverse DNS, and similar sources. It claims 99.5% precision at the country level. The remaining 0.5% are most likely IP addresses for which neither WHOIS records nor reverse DNS are set up.
A very large percentage of Tor nodes run in datacenters, which usually have all their records set up, so it is highly unlikely that an IP address belonging to a datacenter is mapped to the wrong location.
Hence, false positives should be very few, and they can safely be ignored after a simple manual or scripted investigation.
Hi,
On 30/06/18 12:53, Jaskaran Singh wrote:
> 0. Motivation and Overview
> We're using Maxmind's (company registered in the US) GeoIP Database, which is not just antithetical to the philosophy that one should not totally rely on a service/software for all needs, but has some serious security repercussions too.
I would love to see a full list of all the places where we currently use this database and what the security consequences could be.
Tickets relevant to this discussion that you may want to read have the keyword "metrics-geoip" in Trac.
You may also be interested in Karsten's comment on #22203, where we discuss downloading signed GeoIP files from the dirauths instead of shipping them in the distribution.
Thanks,
Iain.
Hi,
On 30.06.2018 13:53, Jaskaran Singh wrote:
> 5. Dealing with false positives
> Maxmind calculates geolocation of an IP addr using WHOIS records, Reverse DNS etc. It claims to have precision rate of 99.5% on country level. The other 0.5% is more likely to be those IP addresses for which neither WHOIS record nor Reverse DNS are setup.
> A very large percentage of Tor Nodes are run from datacenters, which usually have all their records set up. It's highly unlikely for an IP address belonging to a datacenter to be mapped to a wrong location.
> Hence, false positives would be very few, and can be safely ignored after a simple manual/scripted investigation.
A while ago, we measured Tor relay locations using ICMP RTT measurements from multiple server instances located in Europe, North America, Asia, and Oceania. Using the minimum RTT for each connection*, we applied multilateration to estimate the location of each relay. Although this approach is noisy because of varying network conditions and routes, it still gives a good estimate of a relay's actual position.
We compared our estimated ICMP relay locations with the GeoIP information:
 - our test set consisted of a full consensus
 - we conducted the measurements within 5 days and repeated reference experiments a month later to test the stability of the results
 - we sent 500 pings per relay from 8 remote servers and repeated the measurements multiple times
 - we used the minimum RTT as input for the multilateration
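The measurement pipeline above can be sketched in Python. This is not the authors' code: the message does not specify the solver, so this sketch assumes least-squares multilateration over a coarse-to-fine latitude/longitude grid, a propagation speed of about 200 km per millisecond (roughly two thirds of the speed of light, typical for fiber), and hypothetical vantage-point coordinates and RTT samples. As in the described method, only the minimum RTT per vantage point is used.

```python
import math

EARTH_RADIUS_KM = 6371.0
FIBER_KM_PER_MS = 200.0  # assumption: signals travel at roughly 2/3 c in fiber


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def rtt_to_km(min_rtt_ms):
    """Distance implied by a minimum RTT: half the round trip at fiber speed."""
    return min_rtt_ms / 2 * FIBER_KM_PER_MS


def frange(start, stop, step):
    """Inclusive list of floats from start to stop."""
    count = int(round((stop - start) / step))
    return [start + i * step for i in range(count + 1)]


def estimate_location(vantages, ping_samples_ms):
    """Least-squares multilateration by coarse-to-fine grid search.

    vantages: [(lat, lon), ...]; ping_samples_ms: one list of RTTs per vantage.
    Longitude wrap-around at +/-180 is ignored in the refinement step.
    """
    dists = [rtt_to_km(min(samples)) for samples in ping_samples_ms]

    def cost(lat, lon):
        return sum((great_circle_km(lat, lon, vlat, vlon) - d) ** 2
                   for (vlat, vlon), d in zip(vantages, dists))

    def best(lats, lons):
        return min((cost(lat, lon), lat, lon) for lat in lats for lon in lons)

    _, lat, lon = best(frange(-90, 90, 2.0), frange(-180, 178, 2.0))
    _, lat, lon = best(frange(max(-90, lat - 2), min(90, lat + 2), 0.1),
                       frange(lon - 2, lon + 2, 0.1))
    return lat, lon


# Synthetic check: RTTs generated from a known position plus extra delay.
vantages = [(50.1, 8.7), (40.7, -74.0), (1.35, 103.8), (-33.9, 151.2)]
relay_pos = (48.9, 2.35)
samples = [[2 * great_circle_km(*relay_pos, *v) / FIBER_KM_PER_MS + extra
            for extra in (0.0, 0.4, 1.3)] for v in vantages]
est = estimate_location(vantages, samples)
print(est, round(great_circle_km(*relay_pos, *est), 1), "km off")
```

With noise-free synthetic RTTs the estimate lands close to the true position; real RTTs include routing detours and queuing delay, which inflate every distance and make the estimate noisier.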
Results can be summarized as follows:
 - the median location error is around 440 km
 - 287 outliers are more than 2654 km away from the position that GeoIP suggested, which represents ~4.6% of the tested relays
 - the 75th-percentile location error exceeds 1000 km
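The summary statistics above can be computed from per-relay location errors, i.e. the distance between the RTT-based estimate and the GeoIP position. In this Python sketch the 2654 km outlier threshold comes from the results above, while the five error values are made up purely to show the computation. A 75th-percentile error above 1000 km means that roughly a quarter or more of the tested relays are over 1000 km from their GeoIP position.

```python
import statistics


def summarize_errors(errors_km, outlier_km=2654.0):
    """Summary of per-relay location errors (estimate vs. GeoIP), in km."""
    _, _, p75 = statistics.quantiles(errors_km, n=4)  # third quartile cut point
    outliers = sum(1 for e in errors_km if e > outlier_km)
    return {
        "median_km": statistics.median(errors_km),
        "p75_km": p75,
        "outliers": outliers,
        "outlier_share": outliers / len(errors_km),
    }


# Made-up errors for five relays, only to show the computation.
print(summarize_errors([100.0, 200.0, 300.0, 400.0, 3000.0]))
```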
We are currently repeating the experiments with 16 instead of 8 servers and working on the evaluation to improve the location estimate.
We cannot take these results as ground truth, and for the majority of relays GeoIP already reports the actual country and continent. Nevertheless, this is a good way to add an independent verification step. The location error of the outliers shows that some nodes actually run on a different continent than GeoIP suggests, which is an important security issue for users who want to avoid a certain country. The same applies to the 75th percentile, where the error is large enough to change the country information for a significant set of relays.
We can conclude that, yes, a large percentage of Tor nodes have OK records. But the number of false positives is not that low and, in my opinion, cannot be ignored. Besides adding an independent verification step, for which I suggest timing measurements and multilateration, location errors that lead to a different country code should be treated as updates (or the respective nodes should be flagged).
*The rationale is that no transmission can be faster than a physical limit (the signal propagation speed), so the minimum RTT is the closest we can get to that limit.
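The footnote's bound can be written out as below, assuming signals propagate at about two thirds of the speed of light c, as in optical fiber. Extra delays such as queuing or routing detours only increase the RTT, so they loosen the bound but never invalidate it.

```latex
d \;\le\; \frac{v \cdot \mathrm{RTT}_{\min}}{2},
\qquad v \approx \tfrac{2}{3}\,c \approx 2 \times 10^{5}\ \mathrm{km/s}

\mathrm{RTT}_{\min} = 10\ \mathrm{ms}
\;\Rightarrow\;
d \le \frac{(2 \times 10^{5}\ \mathrm{km/s})(0.010\ \mathrm{s})}{2} = 1000\ \mathrm{km}
```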
Cheers,
Katharina
Thanks for your work. You may also want to consider vantage points in Africa, South America, Canada, Russia, etc., and in locations in the interior of all these regions, so that relays reached within a given RTT are less likely to be across an ocean or another border than they would be from a vantage point at some edge IX or cable landing station. Cable maps may help.