The objective is to run a single Tor relay that is served by many daemons on one multicore server. I hope someone can tell me whether this kind of configuration could be problematic for the Tor network before I test it in a real environment.
Many Tor daemons, numbered X from 0..n, are bound to different ports, and all advertise the same ORPort/DirPort. IPVS does the load balancing at kernel level.
Tor Config:
SocksPort 0
Nickname superrelay
Address superrelay.net
ORPort 443 NoListen
ORPort 127.0.0.1:100X NoAdvertise
DirPort 80 NoListen
DirPort 127.0.0.1:200X NoAdvertise
DataDirectory /var/lib/tor/childX
All daemons have the same SocksPort, Nickname, Address, and MyFamily. Each daemon points to a different DataDirectory, but the 'keys' directory is identical in all of them. All daemons advertise ORPort 443 and DirPort 80, but bind to other ports.
IPVS configuration:
# Clear
ipvsadm -C
# Add a virtual server (-A) for ORPort.
ipvsadm -A -t superrelay.net:443 -s sh
# Add a virtual server (-A) for DirPort.
ipvsadm -A -t superrelay.net:80 -s sh
# Add all real servers/daemons (-a) for ORPort
ipvsadm -a -t superrelay.net:443 -r localhost:1000 -m
ipvsadm -a -t superrelay.net:443 -r localhost:1001 -m
...
ipvsadm -a -t superrelay.net:443 -r localhost:100X -m
# Add all real servers/daemons (-a) for DirPort
ipvsadm -a -t superrelay.net:80 -r localhost:2000 -m
ipvsadm -a -t superrelay.net:80 -r localhost:2001 -m
...
ipvsadm -a -t superrelay.net:80 -r localhost:200X -m
Note the scheduling method: sh (Source Hashing) assigns jobs to servers by looking up a statically assigned hash table keyed by source IP address. Any given source IP will always be bound to the same daemon.
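To illustrate the property described above (the same source IP always lands on the same daemon), here is a minimal Python sketch. It is not the hash function IPVS actually uses; the modulo mapping, the function name pick_backend, the example address, and the four-backend list are all assumptions made for illustration.

```python
import ipaddress
from typing import Sequence


def pick_backend(source_ip: str, backends: Sequence[str]) -> str:
    """Deterministically map a source IP to one backend.

    Assumption: a plain modulo over the integer form of the address.
    The real IPVS 'sh' scheduler uses its own hash table.
    """
    key = int(ipaddress.ip_address(source_ip))
    return backends[key % len(backends)]


# Hypothetical backends mirroring the 127.0.0.1:100X pattern from the torrc.
backends = [f"127.0.0.1:{1000 + x}" for x in range(4)]

first = pick_backend("198.51.100.7", backends)
again = pick_backend("198.51.100.7", backends)
print(first, again)
```

Because the mapping depends only on the source address and the backend list, repeated connections from one client reach the same daemon for as long as the list is unchanged.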
I think the 'Advertised Bandwidth' shown on Onionoo/Atlas may be inaccurate.
Can I experiment with a relay like this, or would it be problematic or cause unexpected behaviour for the Tor network? Thanks for any feedback.
Clodo:
The objective is to run a single Tor relay that is served by many daemons on one multicore server. I hope someone can tell me whether this kind of configuration could be problematic for the Tor network before I test it in a real environment.
There can only be a single tor instance at a given IP:ORPort, because tor clients expect a specific tor relay at that location (the public key as defined in the consensus).
You can simply run 2 tor instances per public IP using different ORPorts.
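As a rough illustration of that suggestion, the Python sketch below renders a minimal torrc for each of two independent instances on one IP. The nicknames, the port numbers 443 and 9001, and the /var/lib/tor/instanceN paths are hypothetical. Each instance gets its own DataDirectory and therefore its own keys, so clients find a distinct relay identity at each IP:ORPort.

```python
def render_torrc(index: int, or_port: int) -> str:
    """Return minimal torrc text for one independent relay instance.

    Assumption: nickname and DataDirectory naming are illustrative only.
    """
    lines = [
        "SocksPort 0",
        f"Nickname examplerelay{index}",
        f"ORPort {or_port}",
        # A separate DataDirectory means separate identity and onion keys.
        f"DataDirectory /var/lib/tor/instance{index}",
    ]
    return "\n".join(lines) + "\n"


# Two instances on the same public IP, distinguished by ORPort.
or_ports = [443, 9001]
configs = [render_torrc(i, port) for i, port in enumerate(or_ports)]

for text in configs:
    print(text)
```

A real deployment would also set options such as MyFamily and ContactInfo, which are left out here to keep the sketch short.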
On 8 Jul 2017, at 08:36, nusenu nusenu-lists@riseup.net wrote:
Clodo:
The objective is to run a single Tor relay that is served by many daemons on one multicore server. I hope someone can tell me whether this kind of configuration could be problematic for the Tor network before I test it in a real environment.
There can only be a single tor instance at a given IP:ORPort, because tor clients expect a specific tor relay at that location (the public key as defined in the consensus).
These things will break:
* if multiple tor daemons update the same onion keys at the same time, the key files may get corrupted or the cross-certification may not refer to the keys being used. This would break all Tor instances for any circuits after a week or a month (depending on the tor version).
* your relays will place additional load on the directory authorities by uploading multiple identical descriptors
* if these descriptors ever get out of sync, they will replace each other, causing unpredictable behaviour
Because clients expect to access the same process with the same identity:
* your relay will not be usable as an HSDir
* your relay will not be usable as an Introduction Point
* your relay will not be usable as a Rendezvous Point
You can simply run 2 tor instances per public IP using different ORPorts.
Tor uses multithreaded crypto already: depending on the speed of your processor, you can get up to 400 Mbps per instance (250 Mbps is typical).
You can also get a second IPv4 address, and run 2 Tor daemons on that IP address as well.
T
-- Tim Wilson-Brown (teor)
teor2345 at gmail dot com
PGP C855 6CED 5D90 A0C5 29F6 4D43 450C BA7F 968F 094B
ricochet:ekmygaiu4rzgsk6n
xmpp: teor at torproject dot org
Tor uses multithreaded crypto already: depending on the speed of your processor, you can get up to 400 Mbps per instance (250 Mbps is typical).
Here I see a pending project: https://trac.torproject.org/projects/tor/ticket/1749 and the plans for it: https://trac.torproject.org/projects/tor/wiki/org/projects/Tor/Multithreaded... have been stalled for years because of the complex implementation (and because the current possibility of running multiple daemons lowers the priority, of course).
Maybe Tor uses multiple threads for some activities, but from what I understand it is not a real, full multithreaded implementation. Otherwise, why would so many people with high-capacity 1 Gbit/s servers run multiple daemons on the same server?
If I have a 10 Gbit/s unmetered port with a 24-core CPU, I'm forced to run multiple daemons on multiple IPs to use it fully. And the world sees lots of relays (within the same Family, of course). This seems like the wrong approach to me:
- Overhead for Onionoo/Atlas. I know someone who runs over 30 relays on 3 or 4 physical machines (so Onionoo/Atlas has 30 relays to track, collect stats for, and so on).
- Tor relay volunteers need to obtain many IP addresses to run high-capacity servers.
- Configuration issues, for example running multiple daemons with systemd.
So I'm wondering whether a better, easier solution can exist. I just want to be constructive; I'm an open-source tools developer in my spare time.
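The scale of the problem can be estimated with some back-of-the-envelope arithmetic using only figures from this thread: a 10 Gbit/s port, 250 Mbps typical and 400 Mbps best-case throughput per instance, and 2 instances per public IP. The Python sketch below is a rough estimate only; it ignores CPU and memory limits and how relay traffic is counted, and the function names are made up for illustration.

```python
import math


def instances_needed(link_mbps: float, per_instance_mbps: float) -> int:
    """Number of tor instances needed to fill the link at a given per-instance rate."""
    return math.ceil(link_mbps / per_instance_mbps)


def ips_needed(instances: int, per_ip_limit: int = 2) -> int:
    """Number of public IPs needed when each IP may host per_ip_limit instances."""
    return math.ceil(instances / per_ip_limit)


link_mbps = 10_000  # the 10 Gbit/s port from the example above

typical_instances = instances_needed(link_mbps, 250)
best_case_instances = instances_needed(link_mbps, 400)

print("typical:  ", typical_instances, "instances,", ips_needed(typical_instances), "IPs")
print("best case:", best_case_instances, "instances,", ips_needed(best_case_instances), "IPs")
```

Under these assumptions the typical case needs 40 instances on 20 IPs and the best case needs 25 instances on 13 IPs, both more than the 24 cores mentioned above.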
These things will break:
- if multiple tor daemons update the same onion keys at the same time, the key files may get corrupted or the cross-certification may not refer to the keys being used. This would break all Tor instances for any circuits after a week or a month (depending on the tor version).
- your relays will place additional load on the directory authorities by uploading multiple identical descriptors
- if these descriptors ever get out of sync, they will replace each other, causing unpredictable behaviour
Because clients expect to access the same process with the same identity:
- your relay will not be usable as an HSDir
- your relay will not be usable as an Introduction Point
- your relay will not be usable as a Rendezvous Point
Honestly, I don't know these kinds of details well, which is the reason for this discussion with people like you. Maybe it's possible to simply develop or patch in specific options to make running a high-capacity server easy. That would probably be easier than the work linked above on parallelizing cell crypto. For example, "uploading multiple identical descriptors" might simply be avoided with an option that identifies the 'master' daemon, so that the other daemons skip the descriptor upload phase. Or there could be some kind of synchronization channel between the daemons, which currently do not communicate with each other.
Anyway, thanks for your feedback; I will study the details you gave. Ciao! Fabrizio
On 9 Jul 2017, at 01:36, Clodo clodo@clodo.it wrote:
Tor uses multithreaded crypto already: depending on the speed of your processor, you can get up to 400 Mbps per instance (250 Mbps is typical).
Here I see a pending project: https://trac.torproject.org/projects/tor/ticket/1749 and the plans for it: https://trac.torproject.org/projects/tor/wiki/org/projects/Tor/Multithreaded... have been stalled for years because of the complex implementation (and because the current possibility of running multiple daemons lowers the priority, of course).
Maybe Tor uses multiple threads for some activities, but from what I understand it is not a real, full multithreaded implementation. Otherwise, why would so many people with high-capacity 1 Gbit/s servers run multiple daemons on the same server?
If I have a 10 Gbit/s unmetered port with a 24-core CPU, I'm forced to run multiple daemons on multiple IPs to use it fully. And the world sees lots of relays (within the same Family, of course). This seems like the wrong approach to me:
- Overhead for Onionoo/Atlas. I know someone who runs over 30 relays on 3 or 4 physical machines (so Onionoo/Atlas has 30 relays to track, collect stats for, and so on).
Yes, I have a similar issue where I have 8 relays across 2 machines.
- Tor relay volunteers need to obtain many IP addresses to run high-capacity servers.
Yes, this can be a problem.
- Configuration issues, for example running multiple daemons with systemd.
tor-instance-create is your friend here, at least on Debian and Ubuntu. It works really well.
Or you can use a tool like ansible-relayor for multiple servers.
So I'm wondering whether a better, easier solution can exist. I just want to be constructive; I'm an open-source tools developer in my spare time.
We would welcome development help with the multithreaded crypto. But it's a complicated part of the code, and a large patch, so it might be a good idea to start with something small first, to learn our processes and coding standards.
Here's some background information:
https://trac.torproject.org/projects/tor/wiki/org/teams/NetworkTeam#Becoming...
https://gitweb.torproject.org/tor.git/tree/doc/HACKING/README.1st.md
https://gitweb.torproject.org/user/nickm/torguts.git/tree/
And here are some smaller tickets, you'll want the ones under "Core Tor/Tor": (They might not all be "easy".)
https://trac.torproject.org/projects/tor/report/30
These things will break:
- if multiple tor daemons update the same onion keys at the same time, the key files may get corrupted or the cross-certification may not refer to the keys being used. This would break all Tor instances for any circuits after a week or a month (depending on the tor version).
- your relays will place additional load on the directory authorities by uploading multiple identical descriptors
- if these descriptors ever get out of sync, they will replace each other, causing unpredictable behaviour
Because clients expect to access the same process with the same identity:
- your relay will not be usable as an HSDir
- your relay will not be usable as an Introduction Point
- your relay will not be usable as a Rendezvous Point
Honestly, I don't know these kinds of details well, which is the reason for this discussion with people like you. Maybe it's possible to simply develop or patch in specific options to make running a high-capacity server easy. That would probably be easier than the work linked above on parallelizing cell crypto. For example, "uploading multiple identical descriptors" might simply be avoided with an option that identifies the 'master' daemon, so that the other daemons skip the descriptor upload phase.
PublishServerDescriptor is an existing option that does this.
Or there could be some kind of synchronization channel between the daemons, which currently do not communicate with each other.
I think this would be a very complicated patch, with some security, reliability, and performance drawbacks. It would also be hard to test.
For example, in the rendezvous point case, you would have to pass high-volume circuit traffic across this link.
On 9 Jul 2017, at 02:10, Roman Mamedov rm@romanrm.net wrote:
On Sat, 8 Jul 2017 09:54:20 +1000 teor teor2345@gmail.com wrote:
Tor uses multithreaded crypto already: depending on the speed of your processor, you can get up to 400 Mbps per instance (250 Mbps is typical).
In practice I don't remember seeing much more than 120-130% CPU use per process, and even that, only in brief peaks. Maybe crypto is not actually the bottleneck, but some other non-parallel operation instead.
Speaking of CPU use, is there any roadmap to phase out TAP mode circuits? IIRC those are very CPU-expensive compared to NTor. Even though TAP counts are now only 10-20% of NTor counts, could they actually be responsible for something like 50%+ of total CPU usage?
ntor handshakes have been preferred since 0.2.4.17-rc (September 2013). We made them mandatory in 0.2.9.3-alpha (September 2016).
But the legacy hidden service protocol still requires TAP, so it can only be phased out when there are no longer any legacy hidden services on the network. I think that would be January 1, 2020 at the earliest, because we promised to support 0.2.9 until then:
https://trac.torproject.org/projects/tor/wiki/org/teams/NetworkTeam/CoreTorR...
Unless, of course, we choose to disable legacy hidden services earlier than that.
T
On Sat, 8 Jul 2017 09:54:20 +1000 teor teor2345@gmail.com wrote:
Tor uses multithreaded crypto already: depending on the speed of your processor, you can get up to 400 Mbps per instance (250 Mbps is typical).
In practice I don't remember seeing much more than 120-130% CPU use per process, and even that, only in brief peaks. Maybe crypto is not actually the bottleneck, but some other non-parallel operation instead.
Speaking of CPU use, is there any roadmap to phase out TAP mode circuits? IIRC those are very CPU-expensive compared to NTor. Even though TAP counts are now only 10-20% of NTor counts, could they actually be responsible for something like 50%+ of total CPU usage?
You can also get a second IPv4 address, and run 2 Tor daemons on that IP address as well.
This is not always feasible and carries additional expense even if it is.
Another idea that I proposed some time ago is raising the relay-per-IP limit from 2 to 4. There are almost no 1 or 2-core CPUs anymore, and 4-core CPUs (for 4 Tor processes) are extremely common. Especially considering the ARM architecture, where it's really common now to see 4-core CPUs, with each core being relatively weak on its own.
tor-relays@lists.torproject.org