Hi,
I'm running Tor on a router and was wondering why the Tor daemon uses so much memory. Did a pmap:
pmap `pidof tor`
And got the following result:
1703: /usr/sbin/tor --PidFile /var/run/tor.pid
00400000   1024K r-x--  /usr/sbin/tor
0050f000      4K r----  /usr/sbin/tor
00510000     20K rw---  /usr/sbin/tor
00515000     12K rwx--    [ anon ]
00ae9000     36K rwx--    [ anon ]
00af2000  17288K rwx--    [ anon ]
7713a000   7140K r----  /tmp/lib/tor/cached-microdescs
77833000    516K rw---    [ anon ]
778f5000    348K r-x--  /lib/libuClibc-0.9.33.2.so
7794c000     60K -----    [ anon ]
7795b000      4K r----  /lib/libuClibc-0.9.33.2.so
7795c000      4K rw---  /lib/libuClibc-0.9.33.2.so
7795d000     20K rw---    [ anon ]
77962000     80K r-x--  /lib/libgcc_s.so.1
77976000     60K -----    [ anon ]
77985000      4K rw---  /lib/libgcc_s.so.1
77986000     12K r-x--  /lib/libdl-0.9.33.2.so
77989000     60K -----    [ anon ]
77998000      4K r----  /lib/libdl-0.9.33.2.so
77999000      4K rw---  /lib/libdl-0.9.33.2.so
7799a000     76K r-x--  /lib/libpthread-0.9.33.2.so
779ad000     60K -----    [ anon ]
779bc000      4K r----  /lib/libpthread-0.9.33.2.so
779bd000      4K rw---  /lib/libpthread-0.9.33.2.so
779be000      8K rw---    [ anon ]
779c0000   1300K r-x--  /usr/lib/libcrypto.so.1.0.0
77b05000     64K -----    [ anon ]
77b15000     72K rw---  /usr/lib/libcrypto.so.1.0.0
77b27000      4K rw---    [ anon ]
77b28000    292K r-x--  /usr/lib/libssl.so.1.0.0
77b71000     60K -----    [ anon ]
77b80000     20K rw---  /usr/lib/libssl.so.1.0.0
77b85000    176K r-x--  /usr/lib/libevent-2.0.so.5.1.9
77bb1000     64K -----    [ anon ]
77bc1000      4K rw---  /usr/lib/libevent-2.0.so.5.1.9
77bc2000     88K r-x--  /lib/libm-0.9.33.2.so
77bd8000     60K -----    [ anon ]
77be7000      4K rw---  /lib/libm-0.9.33.2.so
77be8000     56K r-x--  /usr/lib/libz.so.1.2.8
77bf6000     60K -----    [ anon ]
77c05000      4K rw---  /usr/lib/libz.so.1.2.8
77c06000     28K r-x--  /lib/ld-uClibc-0.9.33.2.so
77c1a000      8K rw---    [ anon ]
77c1c000      4K r----  /lib/ld-uClibc-0.9.33.2.so
77c1d000      4K rw---  /lib/ld-uClibc-0.9.33.2.so
7f9a0000    132K rw---    [ stack ]
7fff7000      4K r-x--    [ anon ]
 total    29360K
As you can see there is a large 17288K block which turns out to be a heap (of course). When I dumped the block and looked inside I found it was full of router data. Looks like it is mostly an in-memory database of the router list.
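For anyone who wants to repeat this check on their own device, a short script can rank the mappings by size. This is a minimal sketch that assumes the column layout shown above (start address, size with a `K` suffix, permission flags, mapping name) and a trailing `total` line; `parse_pmap` and `largest` are hypothetical helpers. pmap's default listing shows the size of each mapping, which is not necessarily how much of it is resident in RAM.

```python
import re
from typing import List, Optional, Tuple

# A mapping line: hex start address, size in KiB, permission flags, name.
# Assumes the column layout of the pmap output quoted in this thread.
MAPPING_RE = re.compile(r"^([0-9a-f]+)\s+(\d+)K\s+([rwxsp-]{5})\s+(.+)$")
TOTAL_RE = re.compile(r"^total\s+(\d+)K$")

Mapping = Tuple[int, str, str]  # (size_kib, perms, name)


def parse_pmap(text: str) -> Tuple[List[Mapping], Optional[int]]:
    """Split pmap output into mapping tuples and the reported total (KiB)."""
    mappings: List[Mapping] = []
    total: Optional[int] = None
    for raw in text.splitlines():
        line = raw.strip()
        m = MAPPING_RE.match(line)
        if m:
            _addr, size, perms, name = m.groups()
            mappings.append((int(size), perms, name.strip()))
            continue
        t = TOTAL_RE.match(line)
        if t:
            total = int(t.group(1))
    return mappings, total


def largest(mappings: List[Mapping], n: int = 3) -> List[Mapping]:
    """Return the n largest mappings, biggest first."""
    return sorted(mappings, key=lambda m: m[0], reverse=True)[:n]


SAMPLE = """\
1703: /usr/sbin/tor --PidFile /var/run/tor.pid
00400000   1024K r-x--  /usr/sbin/tor
00af2000  17288K rwx--    [ anon ]
7713a000   7140K r----  /tmp/lib/tor/cached-microdescs
779c0000   1300K r-x--  /usr/lib/libcrypto.so.1.0.0
 total    29360K
"""

if __name__ == "__main__":
    maps, total = parse_pmap(SAMPLE)
    for size, perms, name in largest(maps):
        share = size / total if total else 0.0
        print(f"{size:>7}K {perms} {name}  ({share:.0%} of total)")
```

On a live system the same functions can be fed the output of `pmap` for the Tor process instead of `SAMPLE`.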
This worries me. If in the future the router list grows, my router (and many other routers running Tor) can run out of memory. For me, it looks a little bit strange to have an in-memory database of the router list. Is there a reason for having this data in memory? And, can something be done about it?
On 20 May 2016, at 06:03, Rob van der Hoeven robvanderhoeven@ziggo.nl wrote:
> Hi,
> I'm running Tor on a router and was wondering why the Tor daemon uses so much memory.
To clarify, do you mean "running a Tor client on a home Internet router"? What version of Tor?
> Did a pmap:
> pmap `pidof tor`
> And got the following result:
> 1703: /usr/sbin/tor --PidFile /var/run/tor.pid
> 00400000   1024K r-x--  /usr/sbin/tor
> (snip < 1MB)
> 00af2000  17288K rwx--    [ anon ]
> 7713a000   7140K r----  /tmp/lib/tor/cached-microdescs
> (snip < 1MB)
> 779c0000   1300K r-x--  /usr/lib/libcrypto.so.1.0.0
> (snip < 1MB)
>  total    29360K
> As you can see there is a large 17288K block which turns out to be a heap (of course). When I dumped the block and looked inside I found it was full of router data. Looks like it is mostly an in-memory database of the router list.
Yes, that heap likely contains the parsed versions of cached-microdescs and of the consensus, as well as cell queues and many other in-memory data structures.
> This worries me. If in the future the router list grows, my router (and many other routers running Tor) can run out of memory. For me, it looks a little bit strange to have an in-memory database of the router list. Is there a reason for having this data in memory?
Tor selects relays at random when it builds paths. It also uses the relay list for other operations like finding hidden service directories. It's faster to select relays from a parsed data structure in memory, particularly when each relay needs to be checked (for bandwidth, or address, or any number of attributes).
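As a rough illustration of why a parsed, in-memory list helps, here is a simplified sketch of bandwidth-weighted relay selection with a per-relay filter. It is not Tor's actual algorithm: real path selection also applies position-dependent bandwidth weights from the consensus and family/subnet restrictions. The `Relay` record and `pick_relay` helper are hypothetical. Note that every pick walks the whole list, which is the cost discussed below.

```python
import random
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional


@dataclass(frozen=True)
class Relay:
    """Hypothetical, heavily simplified relay record."""
    fingerprint: str
    address: str
    bandwidth: int            # consensus weight, arbitrary units
    flags: FrozenSet[str]


def pick_relay(relays: List[Relay],
               accept: Callable[[Relay], bool],
               rng: random.Random) -> Optional[Relay]:
    """Pick one relay with probability proportional to bandwidth among
    those passing `accept`. Every call walks the whole list."""
    candidates = [r for r in relays if r.bandwidth > 0 and accept(r)]
    total = sum(r.bandwidth for r in candidates)
    if total == 0:
        return None
    point = rng.random() * total      # uniform in [0, total)
    running = 0
    for relay in candidates:
        running += relay.bandwidth
        if point < running:
            return relay
    return candidates[-1]             # only reachable through float rounding


RELAYS = [  # documentation-range addresses, made-up weights
    Relay("A" * 40, "192.0.2.1", 5000, frozenset({"Fast", "Guard"})),
    Relay("B" * 40, "192.0.2.2", 100, frozenset({"Fast"})),
    Relay("C" * 40, "198.51.100.3", 2000, frozenset({"Fast", "Exit"})),
]

if __name__ == "__main__":
    rng = random.Random(2016)
    exit_relay = pick_relay(RELAYS, lambda r: "Exit" in r.flags, rng)
    print(exit_relay.fingerprint[:8])
```

The `accept` callback is where checks such as flags, address, or bandwidth would go; each one is cheap only because the relay data is already parsed in memory.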
This memory usage is also an issue for mobile devices, embedded devices, and restricted environments (such as VPN network extensions) on any device.
> And, can something be done about it?
Every so often, tor increases the minimum bandwidth required to be a relay. This reduces the size of the consensus, and the number of descriptors.
Over the longer term, there are a few design alternatives:
1. store the relay information on disk until needed, or
2. download and use fewer relays, or
3. keep less information in memory for each relay.
1. We could store the information in a database or flat file on disk, and access it as needed. But that could be excruciatingly slow, or result in difficult-to-predict performance. Perhaps a database engine could better cache frequently-used attributes or totals. But the current design still iterates through the entire list every time it selects a relay. We'd need to think carefully about changing this.
2. The issue with each tor client using fewer relays is that it becomes easy to identify individual tor clients. Perhaps we could split the list in two, and it would split clients into two groups. Maybe that's not too bad. But it might also enable other attacks.
I have seen designs described where tor can retrieve parts of the list of relays, while being able to prove they're part of the network-wide list. But that doesn't really help here, because you're still using fewer relays, and therefore easily distinguishable from other clients.
3. We already keep less information in memory using microdescriptors, which most tor clients use by default. If there is any unnecessary information in the consensus or in microdescriptors, we'd be happy to remove it.
I think our change to ed25519 keys might do this over the longer term, but in the interim, it means an increase in memory usage.
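Alternative 1 can be sketched with SQLite (Python's built-in `sqlite3` module) to show where the cost moves: a weighted pick still has to sum and then scan the matching rows, now through a database instead of an in-memory list. The schema and the `open_store` / `pick_from_db` helpers are hypothetical; an in-memory database is the default only so the example runs anywhere, and passing a file path keeps the table on disk.

```python
import random
import sqlite3
from typing import Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical relay table; pass a file path to keep it on disk."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS relay ("
        " fingerprint TEXT PRIMARY KEY,"
        " bandwidth INTEGER NOT NULL,"
        " is_exit INTEGER NOT NULL)"
    )
    return conn


def pick_from_db(conn: sqlite3.Connection, require_exit: bool,
                 rng: random.Random) -> Optional[str]:
    """Bandwidth-weighted pick read straight from the table.

    The SUM reads every matching row, and the walk below may read all of
    them again, so each pick costs a scan instead of an in-memory loop."""
    where = " WHERE bandwidth > 0 AND (? = 0 OR is_exit = 1)"
    params = (1 if require_exit else 0,)
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(bandwidth), 0) FROM relay" + where, params
    ).fetchone()
    if total == 0:
        return None
    point = rng.random() * total
    running = 0
    last = None
    for fingerprint, bandwidth in conn.execute(
            "SELECT fingerprint, bandwidth FROM relay" + where
            + " ORDER BY fingerprint", params):
        running += bandwidth
        last = fingerprint
        if point < running:
            return fingerprint
    return last  # only reachable through float rounding


if __name__ == "__main__":
    db = open_store()
    db.executemany("INSERT INTO relay VALUES (?, ?, ?)",
                   [("A" * 40, 5000, 0), ("B" * 40, 100, 0), ("C" * 40, 2000, 1)])
    print(pick_from_db(db, require_exit=True, rng=random.Random(7)))
```

A database could cache the totals, but as long as selection iterates over every candidate, each circuit still pays for a scan of the stored list.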
Please feel free to let us know if this is a pressing issue for you, and we'll see what we can do.
Tim
Tim Wilson-Brown (teor)
teor2345 at gmail dot com PGP 968F094B ricochet:ekmygaiu4rzgsk6n
>> I'm running Tor on a router and was wondering why the Tor daemon uses so much memory.
> To clarify, do you mean "running a Tor client on a home Internet router"? What version of Tor?
I'm running version 0.2.5.12 (git-99d0579ff5e0349f). The router I use is a GL-AR150 with 64MB RAM and 16MB flash memory. More specs: http://www.gl-inet.com/ar-specifications/
>> Did a pmap:
>> pmap `pidof tor`
>> And got the following result:
>> 1703: /usr/sbin/tor --PidFile /var/run/tor.pid
>> 00400000   1024K r-x--  /usr/sbin/tor
>> (snip < 1MB)
>> 00af2000  17288K rwx--    [ anon ]
>> 7713a000   7140K r----  /tmp/lib/tor/cached-microdescs
>> (snip < 1MB)
>> 779c0000   1300K r-x--  /usr/lib/libcrypto.so.1.0.0
>> (snip < 1MB)
>>  total    29360K
>> As you can see there is a large 17288K block which turns out to be a heap (of course). When I dumped the block and looked inside I found it was full of router data. Looks like it is mostly an in-memory database of the router list.
> Yes, that heap likely contains the parsed versions of cached-microdescs and of the consensus, as well as cell queues and many other in-memory data structures.
Given how little flash memory my device has, I realized that /tmp/lib/tor/cached-microdescs could not be stored in flash. It's on tmpfs, so it is also stored in RAM.
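A rough RAM budget from the numbers in this thread: the 17288K heap plus the 7140K cached-microdescs file on tmpfs, against the GL-AR150's 64MB (read here as 65536 KiB). On Linux a read-only mapping of a tmpfs file normally shares the file's pages rather than copying them, so the file is counted once. pmap reports mapped sizes, not resident pages, so this sketch gives an upper-bound estimate; the constant and function names are illustrative only.

```python
# Figures taken from the pmap output and device specs quoted in this thread.
HEAP_KIB = 17288            # anonymous rwx mapping (the heap)
MICRODESC_FILE_KIB = 7140   # /tmp/lib/tor/cached-microdescs, on tmpfs
DEVICE_RAM_KIB = 64 * 1024  # GL-AR150: 64MB RAM, read as 64 MiB


def ram_share(*sizes_kib: int, ram_kib: int = DEVICE_RAM_KIB) -> float:
    """Fraction of device RAM covered by the given sizes (an upper bound,
    since pmap sizes are mapped sizes, not resident sizes)."""
    return sum(sizes_kib) / ram_kib


if __name__ == "__main__":
    share = ram_share(HEAP_KIB, MICRODESC_FILE_KIB)
    # prints: 24428K of 65536K = 37.3%
    print(f"{HEAP_KIB + MICRODESC_FILE_KIB}K of {DEVICE_RAM_KIB}K = {share:.1%}")
```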
>> This worries me. If in the future the router list grows, my router (and many other routers running Tor) can run out of memory. For me, it looks a little bit strange to have an in-memory database of the router list. Is there a reason for having this data in memory?
> Tor selects relays at random when it builds paths. It also uses the relay list for other operations like finding hidden service directories. It's faster to select relays from a parsed data structure in memory, particularly when each relay needs to be checked (for bandwidth, or address, or any number of attributes).
> This memory usage is also an issue for mobile devices, embedded devices, and restricted environments (such as VPN network extensions) on any device.
>> And, can something be done about it?
> Every so often, tor increases the minimum bandwidth required to be a relay. This reduces the size of the consensus, and the number of descriptors.
That would certainly limit the risk of running out of memory.
> Over the longer term, there are a few design alternatives:
> 1. store the relay information on disk until needed, or
> 2. download and use fewer relays, or
> 3. keep less information in memory for each relay.
> 1. We could store the information in a database or flat file on disk, and access it as needed. But that could be excruciatingly slow, or result in difficult-to-predict performance. Perhaps a database engine could better cache frequently-used attributes or totals. But the current design still iterates through the entire list every time it selects a relay. We'd need to think carefully about changing this.
With only 16MB of flash, storing the data in flash is not an option. But most routers have a USB port that can take a memory stick.
> 2. The issue with each tor client using fewer relays is that it becomes easy to identify individual tor clients. Perhaps we could split the list in two, and it would split clients into two groups. Maybe that's not too bad. But it might also enable other attacks.
> I have seen designs described where tor can retrieve parts of the list of relays, while being able to prove they're part of the network-wide list. But that doesn't really help here, because you're still using fewer relays, and therefore easily distinguishable from other clients.
I like this solution. Maybe a client can download all descriptors, but only store a fixed number of (randomly selected) routers? This could be a configuration option, something like: maxDescriptorStorageCount.
The interesting question is: How does the number of stored descriptors affect the traceability of the client?
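One way to start on that question is a deliberately simplified model (an assumption for illustration, not an analysis from this thread): every client keeps a uniformly random subset of k out of n relays, and an observer learns m distinct relays that one client used. The chance that some other client's subset also contains all m relays is C(n-m, k-m)/C(n, k), which equals the product of (k-i)/(n-i) for i = 0..m-1. The helpers below are hypothetical; `keep_random_subset` stands in for the proposed maxDescriptorStorageCount option.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def keep_random_subset(descriptors: Sequence[T], max_count: int,
                       rng: random.Random) -> List[T]:
    """Keep at most max_count descriptors, chosen uniformly at random.
    Plays the role of the hypothetical maxDescriptorStorageCount option."""
    if max_count >= len(descriptors):
        return list(descriptors)
    return rng.sample(list(descriptors), max_count)


def p_subset_contains(n: int, k: int, m: int) -> float:
    """Probability that a uniformly random k-of-n subset contains m given
    relays: C(n-m, k-m) / C(n, k) = prod_{i<m} (k-i)/(n-i)."""
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    if m > k:
        return 0.0
    p = 1.0
    for i in range(m):
        p *= (k - i) / (n - i)
    return p


def expected_matching_clients(clients: int, n: int, k: int, m: int) -> float:
    """Expected number of other clients whose stored subset also holds
    all m observed relays, under the same uniform model."""
    return clients * p_subset_contains(n, k, m)


if __name__ == "__main__":
    # Made-up sizes, only to show how fast the matching group shrinks as k drops.
    for k in (1000, 500, 100):
        print(k, round(expected_matching_clients(1_000_000, n=1000, k=k, m=3)))
```

With k = n every client matches; as k shrinks, the number of clients consistent with the same observation falls roughly like (k/n)^m, which is the distinguishability concern teor raises. Real attacks could exploit more than this model captures.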
> 3. We already keep less information in memory using microdescriptors, which most tor clients use by default. If there is any unnecessary information in the consensus or in microdescriptors, we'd be happy to remove it.
> I think our change to ed25519 keys might do this over the longer term, but in the interim, it means an increase in memory usage.
> Please feel free to let us know if this is a pressing issue for you, and we'll see what we can do.
At the moment it is not a pressing issue for me; everything works fine. But if the router list keeps growing, it will become a problem and will break Tor on router hardware.
Not many people run Tor on their routers, so I don't think it's worth putting much effort into a solution. I *really* like running Tor on the router, so if memory usage becomes a problem I will buy a router with more memory (or a Raspberry Pi 3).
Thank you for your detailed answer!
Note: For those who are interested, I wrote two articles about Tor on the router:
On Fri, 20 May 2016 12:03:35 +0200 Rob van der Hoeven robvanderhoeven@ziggo.nl wrote:
> This worries me. If in the future the router list grows, my router (and many other routers running Tor) can run out of memory. For me, it looks a little bit strange to have an in-memory database of the router list. Is there a reason for having this data in memory? And, can something be done about it?
What's strange about it? The client does the path selection. To build a circuit, the client must know the public keys/ip/port for the entire path and the exit policy.
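To make the exit-policy part concrete: a microdescriptor carries only a short policy summary line such as `p accept 80,443` or `p reject 1-65535` (ports and port ranges, no addresses), based on our reading of dir-spec. The sketch below shows how a client could check a candidate exit against a destination port using that summary syntax; `parse_policy_summary` and `allows_port` are hypothetical names, and this is not Tor's implementation.

```python
from typing import List, Tuple


def parse_policy_summary(line: str) -> Tuple[bool, List[Tuple[int, int]]]:
    """Parse a summary like 'accept 80,443,6660-6669' (an optional leading
    'p ' is skipped). Returns (is_accept, [(low, high), ...])."""
    parts = line.split()
    if parts and parts[0] == "p":
        parts = parts[1:]
    if len(parts) != 2 or parts[0] not in ("accept", "reject"):
        raise ValueError(f"unrecognised policy summary: {line!r}")
    ranges = []
    for item in parts[1].split(","):
        low, _, high = item.partition("-")
        lo_port = int(low)
        hi_port = int(high) if high else lo_port
        if not 1 <= lo_port <= hi_port <= 65535:
            raise ValueError(f"bad port range: {item!r}")
        ranges.append((lo_port, hi_port))
    return parts[0] == "accept", ranges


def allows_port(summary: str, port: int) -> bool:
    """True if an exit with this summary would (probably) accept connections
    to `port`. Summaries ignore addresses, so the answer is approximate."""
    is_accept, ranges = parse_policy_summary(summary)
    listed = any(lo <= port <= hi for lo, hi in ranges)
    return listed if is_accept else not listed


if __name__ == "__main__":
    print(allows_port("p accept 80,443", 443), allows_port("p reject 1-65535", 80))
```

Because the summary is port-only, a client can filter exits cheaply, but it still needs this line (plus keys, address, and port) for every relay it might pick.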
A few things could be done:
* Figure out the necessary cryptographic trickery to allow client-driven path selection without the full microdescriptor list, à la TvdW's recent-ish blog post.
* Work off the microdescriptors saved to non-volatile storage.
  Intuitively this seems like a bad idea because:
  * This is a lot of code for a niche use case.
  * Similar concerns apply to "the absolute minimum amount of flash that the manufacturer thinks they can get away with" being too small to hold the microdescriptor list.
  * Most embedded devices probably don't want to be writing out the microdescriptor list to non-volatile storage either, because flash is garbage.
* Carry on keeping the working set in RAM under the assumption that manufacturers will ship more RAM in their routers as time goes on.
Regards,
On 20 May 2016, at 11:59, Yawning Angel yawning@schwanenlied.me wrote:
> What's strange about it? The client does the path selection. To build a circuit, the client must know the public keys/ip/port for the entire path and the exit policy.
Clients could get away with only knowing the key fingerprints for relays in their paths, except for their Guards, which are the only relays they connect to directly. (This might mean a protocol redesign, because I think we send IP and port as well as fingerprint at the moment.)
There are probably other fields we could drop in the common case, if we really needed to.
But do we really need to?
Tim
On Fri, 20 May 2016 12:03:59 -0400 Tim Wilson-Brown - teor teor2345@gmail.com wrote:
> On 20 May 2016, at 11:59, Yawning Angel yawning@schwanenlied.me wrote:
>> What's strange about it? The client does the path selection. To build a circuit, the client must know the public keys/ip/port for the entire path and the exit policy.
> Clients could get away with only knowing the key fingerprints for relays in their paths, except for their Guards, which are the only relays they connect to directly. (This might mean a protocol redesign, because I think we send IP and port as well as fingerprint at the moment.)
There's a reason why the EXTEND2 cells contain an IP/port, and also why nodes don't enforce "traffic was from/is to something in the consensus".
The current design requires exactly what I stated (everything a client needs to craft an `EXTEND2` cell with an ntor payload).
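To see why the address and port are baked in, here is a sketch of an `EXTEND2` body laid out as we read the EXTEND2 description in tor-spec: a count of link specifiers (each a type byte, a length byte and a value; type 0 is an IPv4 address plus ORPort, type 2 is the 20-byte legacy RSA identity digest), then the handshake type (2 for ntor), handshake length and handshake data. For ntor, the client data is the 20-byte node ID, the 32-byte ntor key ID and the client's 32-byte ephemeral public key. The function names and placeholder bytes are hypothetical; no real keys or cryptography are involved.

```python
import ipaddress
import struct

LS_IPV4 = 0            # link specifier: IPv4 address + ORPort (4 + 2 bytes)
LS_LEGACY_ID = 2       # link specifier: 20-byte RSA identity digest
HTYPE_NTOR = 0x0002    # handshake type for ntor


def link_specifier(lstype: int, value: bytes) -> bytes:
    """LSTYPE (1 byte) | LSLEN (1 byte) | LSPEC."""
    return struct.pack("!BB", lstype, len(value)) + value


def extend2_body(ipv4: str, orport: int, node_id: bytes,
                 ntor_key_id: bytes, client_pk: bytes) -> bytes:
    """Sketch of an EXTEND2 body with an ntor handshake (no crypto here)."""
    if len(node_id) != 20 or len(ntor_key_id) != 32 or len(client_pk) != 32:
        raise ValueError("unexpected key/identity length")
    addr = ipaddress.IPv4Address(ipv4).packed + struct.pack("!H", orport)
    specs = [link_specifier(LS_IPV4, addr),
             link_specifier(LS_LEGACY_ID, node_id)]
    # ntor client handshake: NODEID | KEYID | CLIENT_PK (20 + 32 + 32 bytes)
    hdata = node_id + ntor_key_id + client_pk
    return (struct.pack("!B", len(specs)) + b"".join(specs)
            + struct.pack("!HH", HTYPE_NTOR, len(hdata)) + hdata)


if __name__ == "__main__":
    body = extend2_body("192.0.2.10", 9001, b"\x11" * 20, b"\x22" * 32, b"\x33" * 32)
    print(len(body), body[:10].hex())
```

Every field here (address, ORPort, identity digest, ntor key) has to come from the client's copy of the consensus and microdescriptors, which is why the client currently needs that data for every relay it might extend to.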
> But do we really need to?
No. The person is complaining about something with 16 MiB of non-volatile storage anyway.
In general I would be against clever crypto based approaches to limit the amount of data the client downloads, just because "client knows everything and does path selection" is easy to reason about/analyze/implement. Maybe in the extreme long term this will make sense.
Regards,
>> But do we really need to?
> No. The person is complaining about something with 16 MiB of non-volatile storage anyway.
I'm not complaining. I just care about Tor on the router. Memory usage is a concern, and I was wondering if something can be done about it *before* it becomes a problem.
Note: the 16 MB of non-volatile storage can be extended by using a USB memory stick.