It's occurred to me that we have yet to provide any official recommendations with respect to best practices for operating Tor relays.
This post is my attempt to define a reasonable threat model and use it to develop some recommendations. I'm posting it here first to subject it to review by the relay operator community. After operators have had a chance to comment, the plan is to relocate the document to the Tor Wiki and/or a blog post.
As always, to focus our thoughts, we start with the adversary's goals.
Adversary Goals
There is a significant difference between adversaries that can see inside of router-to-router TLS vs those that cannot. I believe this capability distinction governs the adversary goals in terms of compromising relays as opposed to merely externally observing them.
Adversaries that can unwrap router TLS can perform every attack that an actual node can perform, at any location between the user and the node, and/or between the node and other nodes.
In particular, adversaries that can see inside router TLS can perform tagging attacks (see https://lists.torproject.org/pipermail/tor-dev/2012-March/003361.html) as well as perform circuit-specific active and passive timing analysis.
These attacks can be quite severe. An adversary that is able to obtain Guard identity keys is free to perform a tagging attack anywhere on the Internet. In other words, if the adversary is interested in monitoring a particular user, the adversary need only obtain the identity keys for that user's 3 guard nodes; from that point on, the adversary can transparently monitor everything that user does, using tagging to bias the user's paths so that they connect only to surveilled exit nodes whose identity keys have also been compromised.
Based on this distinction, it seems that some simple best practices can increase the costs for an adversary that wishes to compromise Tor traffic.
Let's now consider how the adversary goes about compromising router TLS.
Attack Vectors
There are two high-level vectors towards seeing inside node-to-node TLS (which uses ephemeral keys that are rotated daily and authenticated via the node's identity key). Both high-level vectors therefore revolve around node identity key theft.
Attack Vector #1: One-Time Key Theft
The one-time adversary is interested in performing a grab of keys and then operating transparently upstream afterward. This adversary will take the form of a coercive request at a datacenter/ISP to extract node identity key material and, from then on, operate externally as a transparent upstream MITM, creating fake ephemeral TLS keys authenticated with the stolen identity key. Tor nodes that encounter this adversary will likely see it in the form of unexplained reboots/mysterious downtime, which are inevitable in the lifespan of any Tor node.
Attack Vector #2: Advanced Persistent Threat Key Theft
If one-time methods fail or are beyond reach, the adversary has to resort to persistent machine compromise to retain access to node key material.
The APT attacker can use the same vector as #1 or perhaps an external vector such as daemon compromise, but they then must also plant a backdoor that would do something like trawl through the RAM of a machine, sniff out the keys (perhaps even grabbing the ephemeral TLS keys directly), and transmit them offsite for collection.
This is a significantly more expensive position for the adversary to maintain, because the backdoor may be noticed during a thorough forensic investigation (perhaps of an unrelated incident), and it may inadvertently trigger firewall warnings or other common least-privilege defense alarms.
Unfortunately, it is also a more expensive attack to defend against, because it requires extensive auditing and assurance mechanisms on the part of the relay operator.
Defenses
It seems clear that the above indicates that at minimum relays should protect against one-time key compromise. Some further thought shows that it is possible to make the APT adversary's task harder as well, albeit with significantly more effort.
Let's deal with defending against each vector in turn.
Prevent Vector #1 (One-Time Key Theft): Deploy Ephemeral Identity Keys
The simplest way to defend against the adversary who attempts to extract relay keys through a reboot is to take advantage of the fact that even node identity keys can be ephemeral, and do not need to persist long term (certainly not past a reboot). This can be achieved with a boot script that wipes your keys (they live in /var/lib/tor/keys) at startup, or by using a ramdisk: http://www.cyberciti.biz/faq/howto-create-linux-ram-disk-filesystem/
Of the two, the ramdisk option is superior, since it will prevent the adversary from easily re-using your old keys after you begin using the new ones.
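For reference, the boot-script wipe option can be sketched as follows. KEYDIR and the exact filenames are assumptions based on a typical Linux install; adjust for your system:

```shell
# Sketch of the boot-time wipe option: remove persisted key material
# before tor starts, forcing it to generate fresh keys on startup.
# KEYDIR is an assumption; adjust to where your keys actually live.
KEYDIR="${KEYDIR:-/var/lib/tor/keys}"

wipe_keys() {
  # Tor regenerates these files on its next startup.
  rm -f "$KEYDIR"/secret_id_key "$KEYDIR"/secret_onion_key*
}
```

Calling wipe_keys from an early boot script (before the tor daemon starts) ensures an adversary who grabs the disk or reboots the box gets keys that are no longer in use.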
Additionally, ssh server key theft is another one-time vector that can be used to quickly bootstrap into node key theft. For this reason, node admins should always use ssh key auth for tor node administration accounts, since it prevents ssh server key theft from implying continuous server compromise: http://www.gremwell.com/ssh-mitm-public-key-authentication
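A minimal sshd_config fragment for this might look like the following (these are standard OpenSSH option names, but verify against your sshd version):

```
# /etc/ssh/sshd_config (fragment): disable password logins so that a
# stolen ssh host key cannot be leveraged into credential capture
# via a MITM of the ssh session.
PasswordAuthentication no
ChallengeResponseAuthentication no
PubkeyAuthentication yes
```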
Issues With Ephemeral Identity Keys
There are a few issues with deploying ephemeral identity keys.
Issues With Ephemeral Identity Keys: Client guard node loss
The primary issue with ephemeral identity keys is client Guard node loss. If your relay obtains the Guard flag, you should endeavor to keep it. If you have planned maintenance and controlled reboots, you should copy your identity keys to a safe location prior to reboot so that clients aren't forced to rotate their guards prematurely due to unnecessary rekeying.
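A sketch of the save/restore step around a planned reboot (KEYDIR and BACKUP are assumptions; point them at your actual key directory and a safe location):

```shell
# Sketch: preserve identity keys across a *planned* reboot so the relay
# keeps its Guard flag. KEYDIR/BACKUP are assumed paths; adjust them.
KEYDIR="${KEYDIR:-/var/lib/tor/keys}"
BACKUP="${BACKUP:-/root/tor-key-backup}"

save_keys()    { mkdir -p "$BACKUP" && cp -a "$KEYDIR/." "$BACKUP/"; }
restore_keys() { mkdir -p "$KEYDIR" && cp -a "$BACKUP/." "$KEYDIR/"; }
```

Run save_keys before the controlled reboot and restore_keys before starting tor again; after an *unexplained* reboot, skip the restore and let tor rekey.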
Issues With Ephemeral Identity Keys: MyFamily
The next issue is that identity key rotation makes the use of MyFamily very complicated. For large families, it's nearly impossible to update the MyFamily line of each node instance after every unexpected reboot.
We've filed a bug to see if we can find a more convenient way to provide the same feature, but it's not clear that MyFamily is worth maintaining: https://trac.torproject.org/projects/tor/ticket/5565
It is very likely that the security benefit from maintaining MyFamily pales in comparison to the gain from deploying ephemeral identity keys, and MyFamily should be abandoned entirely.
Issues With Ephemeral Identity Keys: Tor Weather
The final issue with ephemeral identity keys is that node monitoring mechanisms such as Tor Weather become difficult to use in the face of rotating keys. We've filed this bug to improve Weather's subscription mechanisms: https://trac.torproject.org/projects/tor/ticket/5564
Preventing Vector #2: Isolation Hardening and Readonly Runtime
Once one-time key theft has been dealt with, you can begin to consider how to deal with the Advanced Persistent Threat.
The effort required to defend against this adversary is considerable, and it is not expected that all operators will devote the effort to do so.
To limit scope, we are not going to deal with the daemon compromise vector; for that see your OS least-privilege mechanisms (such as SElinux, AppArmor, Grsec RBAC, Seatbelt, etc). Instead, we will deal with how you can attempt to protect your identity keys once an adversary already has root access.
If you are serious about defending against this adversary, the first thing you will want to do is disable access to the 'ptrace' system call from userland, which allows easy key theft using debugging tools. Note that all current built-in kernel mechanisms to do this still allow root users to use ptrace on arbitrary processes. In order to disable ptrace for root users, you need to load a kernel module to patch the syscall table to remove access to the syscall itself. Two options for this are:
https://gist.github.com/1216637
http://people.baicom.com/~agramajo/misc/no-ptrace.c
Once access to the ptrace system call is removed, you need to disable module loading to prevent it from being restored. On Linux, this is accomplished via 'sysctl kernel.modules_disabled=1'. You should perform this operation as early in the boot process as possible. One technique that works on Redhat-based systems is to place a shell script in /etc/rc.modules to load the modules you need for operation, insert the ptrace module, and then issue the sysctl to disable further module loading. Redhat-derivatives launch /etc/rc.modules first thing at the top of /etc/rc.sysinit.
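On a Redhat-derived system, the boot-script fragment might look something like this (the module names and the path to the ptrace-disabling module are placeholders for illustration; substitute whatever your hardware and setup require):

```shell
#!/bin/sh
# /etc/rc.modules (sketch): load the modules the machine needs, install
# the ptrace-disabling module, then lock out further module loading for
# the remainder of this boot.
modprobe e1000              # placeholder: whatever your hardware needs
insmod /root/no-ptrace.ko   # the syscall-table patch module (assumed path)
sysctl -w kernel.modules_disabled=1
```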
After that comes ensuring runtime integrity. There are several ways to achieve this, but most are easily subverted by an attacker with direct access to the hardware. The most robust approach seems to be to create a small encrypted loopback filesystem that contains all of the libraries required to run the 'tor' process as well as all of the requisite configuration files. Requisite libraries can be determined via 'ldd /usr/bin/tor'. The encrypted loopback filesystem doesn't need to be more than ~25M in size, but you will also need an auxiliary var loopback that needs to be a hundred megs or so.
Here are the commands for creating the root loopback filesystem:
dd if=/dev/urandom of=./tor-root.img bs=1k count=25k
losetup /dev/loop1 ./tor-root.img
cryptsetup luksFormat /dev/loop1
cryptsetup luksOpen /dev/loop1 tor-root
mkfs.ext4 /dev/mapper/tor-root
Once this encrypted loop is created such that it can run your relay's Tor processes, you should take the sha1sum of the file and store it offsite.
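A sketch of the record/verify step (IMG and the .sha1 filename are assumptions; the .sha1 file is what you store offsite):

```shell
# Sketch: record a sha1 of the readonly image at creation time (keep
# the .sha1 offsite) and re-check it after any suspicious reboot.
# IMG is an assumption; point it at your actual image file.
IMG="${IMG:-./tor-root.img}"

record_checksum() { sha1sum "$IMG" > "$IMG.sha1"; }
verify_checksum() { sha1sum -c "$IMG.sha1"; }  # nonzero exit on mismatch
```

Any modification to the image, even a single byte, will cause verify_checksum to fail.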
When you use this loopback, you will mount it readonly, and mount an unencrypted var directory inside of it, and a ramdisk for your keys inside of that:
dd if=/dev/urandom of=./tor-var.img bs=1k count=200k
losetup /dev/loop2 ./tor-var.img
mkfs.ext4 /dev/loop2
mount /dev/mapper/tor-root /mnt/tor-root -o ro
mount /dev/loop2 /mnt/tor-root/var
mkfs -q /dev/ram1 128
mount /dev/ram1 /mnt/tor-root/var/lib/tor/keys
cd /mnt/tor-root/
chroot . start_tor.sh
Once you start your tor process(es), you will want to copy your identity key offsite, and then remove it. Tor does not need it to remain on disk after startup, and removing it ensures that an attacker must deploy a kernel exploit to obtain it from memory. While you should not re-use the identity key after unexplained reboots, you may want to retain a copy for planned reboots and tor maintenance.
scp /mnt/tor-root/var/lib/tor/keys/secret_id_key offsite_backup:/mnt/usb/tor_key
rm /mnt/tor-root/var/lib/tor/keys/secret_id_key
Upon suspicious reboots, you can verify the integrity of your tor image by simply calculating the sha1sum (perhaps copying the image offsite first). You do not need to do anything special with the var loopback.
These steps should prevent even adversaries who compromise the root account on your system (by rebooting it, for example) from obtaining your identity keys directly, forcing them to resort to kernel exploits and memory gymnastics in order to do so.
Don't forget to periodically update the libraries stored on your loopback root using a trusted offsite source, as they won't receive security updates from your distribution.
One alternative to make your loopback fs creation, tor startup, and maintenance process simpler is to statically compile your image's tor binary on an offsite, trusted computer. If you do this, you should no longer need to bother with chrooting your tor processes or copying libraries around. However, it still does not save you from the need to recompile that binary whenever there is a security update to the underlying libraries, and it may come at a cost of exploit resistance due to the loss of per-library ASLR.
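A rough sketch of such a static build on the offsite machine follows. The configure flags and output path varied between Tor releases, so treat these as assumptions and check ./configure --help for your version:

```shell
# Build a mostly-static tor binary offsite, then copy it into the
# readonly image. Flags and paths are assumptions; verify them against
# your Tor release and library install locations.
./configure --enable-static-tor \
    --with-libevent-dir=/usr/local \
    --with-openssl-dir=/usr/local \
    --with-zlib-dir=/usr/local
make
scp src/or/tor relay_host:/mnt/tor-root/usr/bin/tor
```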
Ok, that's it. What do people think? Personally, I think that if we can require a kernel exploit and/or weird memory gymnastics for key compromise, that would be a *huge* improvement. Do the above recommendations actually accomplish that?
If so, should we work on providing scripts to make the loopback filesystem creation process easier, and/or provide loopback images themselves?
Even if the APT defenses end up not working out, I would sleep a lot better at night if most relays deployed just the defenses against one-time key theft... Thoughts on that?
Mike Perry mikeperry@torproject.org wrote:
It's occurred to me that we have yet to provide any official recommendations with respect to best practices for operating Tor relays.
While you seem to be focused on improving the security, in my opinion a best practices document should also mention which practices will guarantee that the relay gets a bad exit flag if they become known.
Including parts of or linking to: https://trac.torproject.org/projects/tor/wiki/doc/badRelays probably wouldn't hurt.
I only became aware of the page recently myself. I was pleasantly surprised, as I previously was under the impression that running an intercepting HTTP cache on an exit relay "to save traffic" was still considered acceptable.
Attack Vector #2: Advanced Persistent Threat Key Theft
I'm confused by the use of the buzzword APT. In my experience it's commonly used to describe an imaginary and nearly omnipotent attacker who has a more or less unlimited budget and isn't limited to actually using the network to achieve its goal.
This definition is great if the main goal is spreading FUD to increase one's budget or to get votes, but it makes coming up with effective defences kind of hard.
This might explain why the definition is often used by government agencies, political parties and snake-oil vendors, but it doesn't explain why you would want to use it in a technical document.
I assume "your" APT is somehow less capable than "my" APT, but without knowing your definition I can't really tell if the proposed defenses are effective against it.
Adding your definition to the document would help, but personally I would prefer it if the term APT wouldn't be used at all.
If one-time methods fail or are beyond reach, the adversary has to resort to persistent machine compromise to retain access to node key material.
The APT attacker can use the same vector as #1 or perhaps an external vector such as daemon compromise, but they then must also plant a backdoor that would do something like trawl through the RAM of a machine, sniff out the keys (perhaps even grabbing the ephemeral TLS keys directly), and transmit them offsite for collection.
This is a significantly more expensive position for the adversary to maintain, because the backdoor may be noticed during a thorough forensic investigation (perhaps of an unrelated incident), and it may inadvertently trigger firewall warnings or other common least-privilege defense alarms.
I think this attack would be a lot more expensive than motivating the right Debian developers to compromise a significant part of the interesting Tor relays the next time they get updated.
This attack would not only be harder to defend against, it also sounds cool if we call it the apt(8)-based APT attack.
"My" APT could do that, but I assume "yours" can't?
Unfortunately, it is also a more expensive attack to defend against, because it requires extensive auditing and assurance mechanisms on the part of the relay operator.
I wonder how many relay operators are even motivated to protect against a highly capable attacker. For example "my" APT is unlikely to be after me and a compromise of my (or any other) relays is unlikely to significantly affect me personally.
While I'm not intentionally using an insecure system configuration and I don't expect my relays to be less secure than the average relay, I certainly expect them to be less secure than the system I'm using to write this message.
Defenses
It seems clear that the above indicates that at minimum relays should protect against one-time key compromise. Some further thought shows that it is possible to make the APT adversary's task harder as well, albeit with significantly more effort.
That's not clear to me at all. Are you saying that a relay operator who doesn't want to follow the "minimum" best practices document (once it exists) shouldn't be running a relay (or at least be embarrassed)?
Once you start your tor process(es), you will want to copy your identity key offsite, and then remove it. Tor does not need it to remain on disk after startup, and removing it ensures that an attacker must deploy a kernel exploit to obtain it from memory. While you should not re-use the identity key after unexplained reboots, you may want to retain a copy for planned reboots and tor maintenance.
How often can a relay regenerate the identity key without becoming a burden to the network?
I reused the identity keys after unexplained reboots in the past as I assumed the cost of a new key (unknown to me) would be higher than the cost of a compromise (unknown) multiplied by the likelihood of the occurrence (also unknown to me, but estimated to be rather low compared to other possible reboot causes).
In cases where a reboot is assumed to have been caused by a system compromise, I wouldn't consider merely regenerating the key without re-installing the whole system from known-good media "best practice" anyway.
Ok, that's it. What do people think? Personally, I think that if we can require a kernel exploit and/or weird memory gymnastics for key compromise, that would be a *huge* improvement. Do the above recommendations actually accomplish that?
Are "weird memory gymnastics" really that much more effort than getting the relevant keys through ptrace directly?
I suspect getting the keys through either mechanism might be trivial compared to getting the infrastructure in place to use the keys for a non-theoretical attack that is cost-effective.
I think your proposed measures might be useful for a relay operator with a compatible system who is interested in spending more time on his relay's security than he already is.
It's not clear to me, though, that they improve the security of the Tor network significantly enough to be worth requiring them or even calling them best practices (which could demotivate operators who can't or don't want to implement them).
Trying to require the steps or shaming operators into following them might reduce the number of relay operators (or limit their growth) significantly enough to make the attacks you seem to be concerned about cheaper ...
Having said that, I don't see anything wrong with putting your suggestions in a section that starts with a paragraph like:
| Here are a couple of things you could do to improve your
| relay's security some more. Whether or not you consider
| them worthwhile is up to you and if you decide against some
| or all of them or if they don't work on your system, your
| relay is still appreciated.
Even if the APT defenses end up not working out, I would sleep a lot better at night if most relays deployed just the defenses against one-time key theft... Thoughts on that?
I'm not too worried about the APT.
Fabian
Thus spake Fabian Keil (freebsd-listen@fabiankeil.de):
Attack Vector #2: Advanced Persistent Threat Key Theft
I assume "your" APT is somehow less capable than "my" APT, but without knowing your definition I can't really tell if the proposed defenses are effective against it.
Adding your definition to the document would help, but personally I would prefer it if the term APT wouldn't be used at all.
I thought I did in the very next paragraphs?
If one-time methods fail or are beyond reach, the adversary has to resort to persistent machine compromise to retain access to node key material.
The APT attacker can use the same vector as #1 or perhaps an external vector such as daemon compromise, but they then must also plant a backdoor that would do something like trawl through the RAM of a machine, sniff out the keys (perhaps even grabbing the ephemeral TLS keys directly), and transmit them offsite for collection.
This is a significantly more expensive position for the adversary to maintain, because the backdoor may be noticed during a thorough forensic investigation (perhaps of an unrelated incident), and it may inadvertently trigger firewall warnings or other common least-privilege defense alarms.
I think this attack would be a lot more expensive than motivating the right Debian developers to compromise a significant part of the interesting Tor relays the next time they get updated.
This attack would not only be harder to defend against, it also sounds cool if we call it the apt(8)-based APT attack.
"My" APT could do that, but I assume "yours" can't?
Hrmm. More accurately, your "apt APT" is not an attack against Tor, it's an attack against Debian. I classify that as out of scope. Similarly, attacks against Intel are also out of scope (even though they are quite possible and are even more terrifying). It's simply not our job to defend against them.
Defenses
It seems clear that the above indicates that at minimum relays should protect against one-time key compromise. Some further thought shows that it is possible to make the APT adversary's task harder as well, albeit with significantly more effort.
That's not clear to me at all. Are you saying that a relay operator who doesn't want to follow the "minimum" best practices document (once it exists) shouldn't be running a relay (or at least be embarrassed)?
Note the "should". My claim is that it clearly follows from the motivations of the attacker that ephemeral keys are the minimum defense one could take against one-time key theft. They are the simplest thing you can do, and they do secure against that class of attacker.
Once you start your tor process(es), you will want to copy your identity key offsite, and then remove it. Tor does not need it to remain on disk after startup, and removing it ensures that an attacker must deploy a kernel exploit to obtain it from memory. While you should not re-use the identity key after unexplained reboots, you may want to retain a copy for planned reboots and tor maintenance.
How often can a relay regenerate the identity key without becoming a burden to the network?
I reused the identity keys after unexplained reboots in the past as I assumed the cost of a new key (unknown to me) would be higher than the cost of a compromise (unknown) multiplied by the likelihood of the occurrence (also unknown to me, but estimated to be rather low compared to other possible reboot causes).
In cases where a reboot is assumed to have been caused by a system compromise, I wouldn't consider merely regenerating the key without re-installing the whole system from known-good media "best practice" anyway.
You're failing to see the distinction made between adversaries, which was the entire point of the motivating section of the document. Rekeying *will* thwart some adversaries.
Ok, that's it. What do people think? Personally, I think that if we can require a kernel exploit and/or weird memory gymnastics for key compromise, that would be a *huge* improvement. Do the above recommendations actually accomplish that?
Are "weird memory gymnastics" really that much more effort than getting the relevant keys through ptrace directly?
If they require a kernel exploit to perform, absolutely. If there are memory tricks root can perform without a kernel exploit, we should see if we can enumerate them so as to develop countermeasures.
I suspect getting the keys through either mechanism might be trivial compared to getting the infrastructure in place to use the keys for a non-theoretical attack that is cost-effective.
The infrastructure is already there for other reasons. See for example, the CALEA broadband intercept enhancements of 2007 in the USA. Those can absolutely be used to target specific Tor users and completely transparently deanonymize their Tor traffic today, with one-time key theft (via NSL subpoena) of Guard node keys.
I think your proposed measures might be useful for a relay operator with a compatible system who is interested in spending more time on his relay's security than he already is.
It's not clear to me, though, that they improve the security of the Tor network significantly enough to be worth requiring them or even calling them best practices (which could demotivate operators who can't or don't want to implement them).
Did I fail to motivate the defenses? In what way can we establish "more realistic" best practice defenses that are grounded in real attack scenarios and ordered by attack cost vs defense cost? I thought I had accomplished that...
Trying to require the steps or shaming operators into following them might reduce the number of relay operators (or limit their growth) significantly enough to make the attacks you seem to be concerned about cheaper ...
Having said that, I don't see anything wrong with putting your suggestions in a section that starts with a paragraph like:
| Here are a couple of things you could do to improve your
| relay's security some more. Whether or not you consider
| them worthwhile is up to you and if you decide against some
| or all of them or if they don't work on your system, your
| relay is still appreciated.
Ok, yes, I have no intention of making anything mandatory. It's not really possible anyway, and heterogeneity probably trumps it.
For the paragraphs I've trimmed, assume I more or less agree with your statements.
Thus spake Mike Perry (mikeperry@torproject.org):
You're failing to see the distinction made between adversaries, which was the entire point of the motivating section of the document. Rekeying *will* thwart some adversaries.
I suspect getting the keys through either mechanism might be trivial compared to getting the infrastructure in place to use the keys for a non-theoretical attack that is cost-effective.
The infrastructure is already there for other reasons. See for example, the CALEA broadband intercept enhancements of 2007 in the USA. Those can absolutely be used to target specific Tor users and completely transparently deanonymize their Tor traffic today, with one-time key theft (via NSL subpoena) of Guard node keys.
Btw, before the above causes someone to jot "Enemy Combatant" down in a file somewhere, I just want to clarify that I believe "lawful intercept" is a total sham, dangerously weakening critical infrastructure for little gain. Once deployed (too late!), it can and will be exploited by a wide variety of actors (too late!).
Also, replace "NSL subpoena" with "any variety of intimidating thugs with guns (and/or money)". They're pretty much the same level of "due process" IMO.
Further, I think we can expect many/most relay operators to run straight to the EFF/ACLU/FBI in the event of coercion (destination depends on adversary). However, I do *not* believe we can expect the same from arbitrary datacenter admins. Hence, I feel that one-time key theft is a valid and realistic adversary, given current weaknesses in the Tor protocol and client software.
Mike Perry mikeperry@torproject.org wrote:
Thus spake Fabian Keil (freebsd-listen@fabiankeil.de):
Attack Vector #2: Advanced Persistent Threat Key Theft
I assume "your" APT is somehow less capable than "my" APT, but without knowing your definition I can't really tell if the proposed defenses are effective against it.
Adding your definition to the document would help, but personally I would prefer it if the term APT wouldn't be used at all.
I thought I did in the very next paragraphs?
Never mind. I assumed it was just an example for one of several attacks the APT might do.
If one-time methods fail or are beyond reach, the adversary has to resort to persistent machine compromise to retain access to node key material.
The APT attacker can use the same vector as #1 or perhaps an external vector such as daemon compromise, but they then must also plant a backdoor that would do something like trawl through the RAM of a machine, sniff out the keys (perhaps even grabbing the ephemeral TLS keys directly), and transmit them offsite for collection.
This is a significantly more expensive position for the adversary to maintain, because the backdoor may be noticed during a thorough forensic investigation (perhaps of an unrelated incident), and it may inadvertently trigger firewall warnings or other common least-privilege defense alarms.
I think this attack would be a lot more expensive than motivating the right Debian developers to compromise a significant part of the interesting Tor relays the next time they get updated.
This attack would not only be harder to defend against, it also sounds cool if we call it the apt(8)-based APT attack.
"My" APT could do that, but I assume "yours" can't?
Hrmm. More accurately, your "apt APT" is not an attack against Tor, it's an attack against Debian. I classify that as out of scope. Similarly, attacks against Intel are also out of scope (even though they are quite possible and are even more terrifying). It's simply not our job to defend against them.
I agree that defending against this is out of scope for the Tor project. If it's done with the intention of compromising Tor relays, I'd still count it as an attack against Tor, though.
Once you start your tor process(es), you will want to copy your identity key offsite, and then remove it. Tor does not need it to remain on disk after startup, and removing it ensures that an attacker must deploy a kernel exploit to obtain it from memory. While you should not re-use the identity key after unexplained reboots, you may want to retain a copy for planned reboots and tor maintenance.
How often can a relay regenerate the identity key without becoming a burden to the network?
I reused the identity keys after unexplained reboots in the past as I assumed the cost of a new key (unknown to me) would be higher than the cost of a compromise (unknown) multiplied by the likelihood of the occurrence (also unknown to me, but estimated to be rather low compared to other possible reboot causes).
In cases where a reboot is assumed to have been caused by a system compromise, I wouldn't consider merely regenerating the key without re-installing the whole system from known-good media "best practice" anyway.
You're failing to see the distinction made between adversaries, which was the entire point of the motivating section of the document. Rekeying *will* thwart some adversaries.
I'm not arguing that rekeying is useless. I just think that for most Tor relays reboots are usually not the result of a compromise, and the lack of reboots doesn't prove anything either (I'm aware that you weren't implying this).
For a relay operator concerned about key theft, rekeying after a certain amount of time, even if there's no sign of a compromise, seems to make more sense to me.
Ok, that's it. What do people think? Personally, I think that if we can require a kernel exploit and/or weird memory gymnastics for key compromise, that would be a *huge* improvement. Do the above recommendations actually accomplish that?
Are "weird memory gymnastics" really that much more effort than getting the relevant keys through ptrace directly?
If they require a kernel exploit to perform, absolutely. If there are memory tricks root can perform without a kernel exploit, we should see if we can enumerate them so as to develop countermeasures.
My assumption was that a root user could get the key (or reenable ptrace) through /dev/mem without relying on kernel exploits.
I suspect getting the keys through either mechanism might be trivial compared to getting the infrastructure in place to use the keys for a non-theoretical attack that is cost-effective.
The infrastructure is already there for other reasons. See for example, the CALEA broadband intercept enhancements of 2007 in the USA. Those can absolutely be used to target specific Tor users and completely transparently deanonymize their Tor traffic today, with one-time key theft (via NSL subpoena) of Guard node keys.
CALEA might provide access to the traffic, but the attacker still has to analyze it. I'm not saying that is impossible or inconceivably hard, but I'd expect it to be a lot more complicated than getting the keys from a system the attacker already has root access to.
I think your proposed measures might be useful for a relay operator with a compatible system who is interested in spending more time on his relay's security than he already is.
It's not clear to me, though, that they improve the security of the Tor network significantly enough to be worth requiring them or even calling them best practices (which could demotivate operators who can't or don't want to implement them).
Did I fail to motivate the defenses? In what way can we establish "more realistic" best practice defenses that are grounded in real attack scenarios and ordered by attack cost vs defense cost? I thought I had accomplished that...
I have no idea.
Fabian
Thus spake Fabian Keil (freebsd-listen@fabiankeil.de):
You're failing to see the distinction made between adversaries, which was the entire point of the motivating section of the document. Rekeying *will* thwart some adversaries.
I'm not arguing that rekeying is useless. I just think that for most Tor relays, reboots are usually not the result of a compromise, and the lack of reboots doesn't prove anything either (I'm aware that you weren't implying this).
For a relay operator concerned about key theft, rekeying after a certain amount of time, even if there's no sign of a compromise, seems to make more sense to me.
They seem orthogonal, but yes, I should mention periodic rekeying if your relay machine's uptime exceeds something like 6 months? 1 year? RSA-1024 probably doesn't have a very long shelf-life as-is...
Are "weird memory gymnastics" really that much more effort than getting the relevant keys through ptrace directly?
If they require a kernel exploit to perform, absolutely. If there are memory tricks root can perform without a kernel exploit, we should see if we can enumerate them so as to develop countermeasures.
My assumption was that a root user could get the key (or reenable ptrace) through /dev/mem without relying on kernel exploits.
Apparently /dev/mem only allows access to low physical memory (<1M) and BIOS regions on most distros due to CONFIG_STRICT_DEVMEM, so it's not a sure shot by any means.
After reading a few mailing list archives about kernel.modules_disabled, it looks like there is a contingent of kernel developers who are arguing for "layered security" over "perfect security", and they are working to enumerate and close holes that elevate root directly to ring0. Even if the LKML people occasionally refuse to take their patches for old unixbeard dogmatic reasons, it looks like they are still being picked up by RHEL/CentOS and Ubuntu.
But, this reminds me that I might need to add an "Auditing Recommendations" section to the APT. Technically, the truly paranoid should also keep pristine copies of their initrd, kernel, modules, and init itself, and verify/replace them in the event of sketchy activity. But the question of how to actually verify/replace these files while using an untrusted kernel is another matter... A few ways come to mind, but if we specify just One True Way, obviously custom rootkits could still be written to cloak against it...
In my mind it's OK if some methods fail, because it's all about taking away full certainty of success and certainty of undiscoverability from the adversary. That alone will change their incentives to use the attack.
I suspect getting the keys through either mechanism might be trivial compared to getting the infrastructure in place to use the keys for a non-theoretical attack that is cost-effective.
The infrastructure is already there for other reasons. See for example, the CALEA broadband intercept enhancements of 2007 in the USA. Those can absolutely be used to target specific Tor users and completely transparently deanonymize their Tor traffic today, with one-time key theft (via NSL subpoena) of Guard node keys.
CALEA might provide access to the traffic, but the attacker still has to analyze it. I'm not saying that is impossible or inconceivably hard, but I'd expect it to be a lot more complicated than getting the keys from a system the attacker already has root access to.
Wrt analysis, for tagging there is none needed. It's a fire-and-forget method that can be deployed as an extension module to existing intercept solutions. You just use a modified stunnel (or similar TLS proxy) to embed a unique identifier into the circuits of clients you're interested in, read it out again at compromised exits, and log the traffic along with its unique ID. If a tagged circuit is created to a non-compromised exit, that exit kills it for you instantly, before streams even get attached (for example, during the client bootstrap process or during predictive circuit building), and the client happily and transparently retries until it creates a successful circuit to a compromised exit.
Yes, there are things we can do to defend against these attacks in the client. See https://trac.torproject.org/projects/tor/ticket/5456 for some of those. But I think we should also take this opportunity to think a little deeper about protecting and rotating relay keys in the first place.
On Sun, Apr 29, 2012 at 1:59 PM, Mike Perry mikeperry@torproject.org wrote:
[snipped]
After reading a few mailing list archives about kernel.modules_disabled, it looks like there is a contingent of kernel developers who are arguing for "layered security" over "perfect security", and they are working to enumerate and close holes that elevate root directly to ring0. Even if the LKML people occasionally refuse to take their patches for old unixbeard dogmatic reasons, it looks like they are still being picked up by RHEL/CentOS and Ubuntu.
But, this reminds me that I might need to add an "Auditing Recommendations" section to the APT. Technically, the truly paranoid should also keep pristine copies of their initrd, kernel, modules, and init itself, and verify/replace them in the event of sketchy activity. But the question of how to actually verify/replace these files while using an untrusted kernel is another matter... A few ways come to mind, but if we specify just One True Way, obviously custom rootkits could still be written to cloak against it...
What do you feel about promoting grsec?
[snipped]
-- Mike Perry
tor-relays mailing list tor-relays@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-relays
Thanks, Kasimir
-- Kasimir Gabert