For a few months I've been tracking this ticket:
https://trac.torproject.org/projects/tor/ticket/6676
It concerns the state of family support. I've been working on a project that could be used to expand the number of running relays, and I've been trying to find the best way to coordinate this so that it is both obvious who the operator is (which can be done with contact info) and easy for users to avoid building circuits within these related nodes.
In the vein of "playing nicely" with the network, my concern is that when running large-scale infrastructure one needs to minimize the number of moving pieces. Ideally (in the best-case scenario) this would allow me to supply a static family identifier en masse, minimizing the need for managed configurations.
In the worst-case scenario (an entity trying to launch a Sybil attack), the administrator would not even attempt to populate this, so as to appear as separate nodes in the network.
Do folks have suggestions on the best way to "play nice" here?
--redbeard
FWIW, I use MyFamily for what I am assuming Brian uses it for as well, multiple containers across various hosts.
On Fri, Mar 4, 2016 at 10:34 AM, Brian "redbeard" Harrington redbeard@coreos.com wrote:
tor-dev mailing list tor-dev@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
So you want to have a proper MyFamily configuration across a very high number of relays without reloading them all every time you add a new relay? Why are you worried about these reloads?
The only way I can think of is to preemptively create relay keys.
Let's say you are about to deploy 100 relays within the next week. First you would create 100 relay keys and collect all their fingerprints to form the MyFamily string. Then you could use that static string, and no reload is required as long as you do not run more than 100 relays.
Depending on how many "idle/spare" fingerprints are in your MyFamily string, this might also cost the network unnecessary overhead.
So adding fingerprints to MyFamily on demand is probably nicer than creating huge descriptors with spare fingerprints just because you do not want to reload your tor instances.
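As a rough illustration of the "collect all fingerprints to form the MyFamily string" step, here is a minimal Python sketch. It assumes you already have the fingerprints of the pre-generated relay keys as 40-hex-digit strings; the helper name `build_myfamily_line` is made up for this example and is not part of tor.

```python
import re

# A relay fingerprint is 40 hexadecimal digits (a SHA-1 digest of the identity key).
FINGERPRINT_RE = re.compile(r"[0-9A-F]{40}")


def build_myfamily_line(fingerprints):
    """Return a single torrc MyFamily line for the given fingerprints.

    Hypothetical helper for illustration only. Duplicates are dropped and
    the output is sorted so the same input set always yields the same line.
    """
    cleaned = set()
    for fp in fingerprints:
        # Accept "$ABCD..." and space-grouped "ABCD EF01 ..." spellings.
        fp = fp.strip().lstrip("$").replace(" ", "").upper()
        if not FINGERPRINT_RE.fullmatch(fp):
            raise ValueError(f"not a 40-hex-digit fingerprint: {fp!r}")
        cleaned.add(fp)
    return "MyFamily " + ",".join("$" + fp for fp in sorted(cleaned))


if __name__ == "__main__":
    # Placeholder fingerprints, not real relays.
    print(build_myfamily_line(["a" * 40, "$" + "B" * 40]))
```

The same line would then be placed in the torrc of every relay in the family, which is what makes the string "static" for as long as no fingerprint outside the pre-generated set is needed.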
@"minimizing the need for managed configurations"
If you run "large scale infrastructure" you usually want to have "managed configurations". No one wants to manage many servers manually.
Also: please consider AS and geographic diversity when adding a significant amount of Tor bandwidth, and maybe set yourself an upper bound on how big you want to grow.
To speak to this a bit further: both Jessica and I work at places that build out ephemeral infrastructure, and you're absolutely correct. There is a bit of nuance in what each of us means by "managed configurations". In my case machines have a lifecycle. They come and they go, but if you need to "update" a machine you don't use a tool like Ansible, Puppet, Chef, etc. to change the active running state of a host. This isn't to say that they're managed manually. This is truly treating them like the promise of OpenStack of having unicorns and having robots. You completely re-deploy those robots. This is true for bare metal as well as cloud providers.
While this presents a challenge due to the level of trust afforded to a node the longer it has been run, I'm looking to walk before running here.
Thinking of it in terms of an affinity group, each member can attest that they're part of the group, but this is more so that members of other federations know the scope of interaction.
"Lets say you are about to deploy 100 relays within the next week." - Take this an order of magnitude greater and we're on the right track for the scale I have in mind. It is a regular occurrence for our users to deploy 500 to 5000 nodes at a time. This is not the scale that everyone uses, obviously, but in that case generating 1000 relay keys and coordinating that key distribution dance across the same number of nodes (more than likely in highly distributed environments) seems to raise more questions than it answers (securing the keys for those nodes, securely distributing them, etc.). Compounded with your concern about the network cost of "spare" nodes, I would say this turns it into a no-go, as the whole point was to be able to deploy these nodes in the most productive, network-friendly way possible.
--redbeard
On Fri, Mar 4, 2016 at 4:48 PM nusenu nusenu@openmailbox.org wrote:
On 5 Mar 2016, at 22:31, Brian redbeard Harrington redbeard@coreos.com wrote:
> "Lets say you are about to deploy 100 relays within the next week." - Take this an order of magnitude greater and we're on the right track with the correct scale. It is a regular occurrence for our users to deploy 500 to 5000 nodes at a time. This is not the scale that everyone uses, obviously, but in that case generating 1000 relay keys and coordinating that key distribution dance across the same number of nodes (more than likely in highly distributed environments) seems to bring more questions than it answers (securing the keys for those nodes, securely distributing them, etc). When compounding you concern about the network cost for "spare" nodes, I would say this turns it into a no-go as the whole point was to be able to deploy these nodes in the most productive, network friendly mechanism possible.
If you deploy 1000 relays in a week, people may become quite concerned for the health and security of the network, even if your MyFamily field is consistent.
(And there's likely some limit on MyFamily or on descriptor size that would stop you listing 1000 fingerprints.)
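To put a rough number on the descriptor-size concern, the back-of-the-envelope Python sketch below estimates how large a MyFamily line becomes as the family grows. It assumes each entry is written as `$` plus 40 hex digits with a comma between entries; the function names are invented for this example, and the real descriptor encoding and any enforced limit may differ.

```python
def myfamily_line_bytes(n_fingerprints):
    """Bytes in one torrc line 'MyFamily $<40 hex>,$<40 hex>,...'.

    Assumption for illustration: every entry is '$' plus 40 hex digits,
    entries are separated by single commas, no trailing newline counted.
    """
    if n_fingerprints <= 0:
        return 0
    per_entry = 1 + 40                   # "$" plus 40 hex digits
    separators = n_fingerprints - 1      # commas between entries
    return len("MyFamily ") + n_fingerprints * per_entry + separators


def total_family_bytes(n_relays):
    """Family data summed over the whole family, since each of the
    n relays would list all n fingerprints."""
    return n_relays * myfamily_line_bytes(n_relays)


if __name__ == "__main__":
    for n in (100, 1000, 5000):
        print(n, myfamily_line_bytes(n), total_family_bytes(n))
```

Under these assumptions a 1000-relay family comes to about 42 KB of family data per relay, and about 42 MB summed over all 1000 relays, because the cost grows with the square of the family size.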
Tim
Tim Wilson-Brown (teor)
teor2345 at gmail dot com PGP 968F094B
teor at blah dot im OTR CAD08081 9755866D 89E2A06F E3558B7F B5A9D14F
Hi,
Maybe this is better taken to tor-relays.
On 03/05/2016 10:31 PM, Brian "redbeard" Harrington wrote:
> "Lets say you are about to deploy 100 relays within the next week." - Take this an order of magnitude greater and we're on the right track with the correct scale. It is a regular occurrence for our users to deploy 500 to 5000 nodes at a time.
Interesting. What is the use case for doing that? And why would you want to apply the same strategy to Tor relays? There are about 7000 relays in total, with over 1000 of them (almost 40% of the capacity) at only three ASes.
https://metrics.torproject.org/relayflags.html
https://compass.torproject.org/