https://trac.torproject.org/projects/tor/wiki/doc/ResearchEthics
Any number of problems and obstacles to legitimate research areas exist with this...
"Examples of unacceptable research activity
It is not acceptable to run an HSDir, harvest onion addresses, and do a Web crawl of those onion services. Don't set up exit relays to sniff, or tamper with exit traffic."
Assuming such bans, how is one supposed to legitimately research and report on the real makeup of onion or exit space? These are significant unanswered questions regarding Tor use.
Concentrate not on bans, which will be ignored by both legit and illegit researchers anyway, but on proper design for data handling and minimization, particularly being sensitive to the Tor users and operators involved lest that data become compromised at any stage before final anonymization and wiping.
> https://trac.torproject.org/projects/tor/wiki/doc/ResearchEthics
> Any number of problems and obstacles to legitimate research areas exist with this…
I would be interested in hearing about any others you have besides the one you bring up below.
> "It is not acceptable to run an HSDir, harvest onion addresses, and do a Web crawl of those onion services. Don't set up exit relays to sniff, or tamper with exit traffic."
> Assuming such bans, how is one supposed to legitimately research and report on the real makeup of onion or exit space? These are significant unanswered questions regarding Tor use.
I happen to agree with requesting that nobody harvest onion addresses on HSDirs. Tor will in fact soon make this impossible by changing the descriptors to hide the onion address. In the meantime, relays currently observed doing this are being kicked out of the network by the DirAuths. The reason is that people believe Tor users should be able to run a hidden service without it becoming known to anybody else. This does limit the information we can gather about the onion space, but such privacy is exactly what Tor is all about.
> Concentrate not on bans, which will be ignored by both legit and illegit researchers anyway, but on proper design for data handling and minimization, particularly being sensitive to the Tor users and operators involved lest that data become compromised at any stage before final anonymization and wiping.
There are many types of activity that are “banned”, as you say, and I doubt you disagree with all of them. For example, one should not gather data about all the client IPs observed and when they were using Tor. No, we can’t tell if anybody is doing this. That doesn’t mean that Tor can’t request that it never be done. And “legitimate” researchers will absolutely follow community standards. First, because most of them aren’t jerks. Second, because conference program committees and journal editorial boards can and do reject papers for unethical behavior.
Best, Aaron
Forgive the lack of inlining.
I've been meaning to respond to this for a while. For what it's worth, I completely disagree that outright "banning" of certain data collection is the right answer here. There should be a standard of "let's weigh the risks vs. the benefits and make a decision" for all cases. In most cases, there are ways to perform data collection over Tor (even when trying to understand the makeup of hidden services) that do not compromise privacy or security; e.g., the harvest reports only the "class of the .onion site" and not the actual site itself. This answers the question the researcher is interested in without compromising or revealing the .onion address directly.
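One way the class-only reporting described above might look in code is the minimal Python sketch below. It assumes a hypothetical upstream classifier has already paired each observed onion address with a category label. The function name `aggregate_by_class`, the class labels, and the placeholder addresses are illustrative assumptions, not Tor tooling, and the input list itself would still need the careful handling discussed in this thread.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def aggregate_by_class(observations: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Reduce (onion_address, site_class) pairs to per-class counts.

    This function keeps no reference to the addresses; the returned
    mapping contains class labels and counts only.
    """
    counts: Counter = Counter()
    for _address, site_class in observations:
        # Only the (hypothetical) class label is tallied; the address is ignored.
        counts[site_class] += 1
    return dict(counts)


if __name__ == "__main__":
    # Placeholder addresses and class labels, for illustration only.
    sample = [
        ("aaaa.onion", "forum"),
        ("bbbb.onion", "market"),
        ("cccc.onion", "forum"),
    ]
    print(aggregate_by_class(sample))  # {'forum': 2, 'market': 1}
```

The point of the design is that nothing identifying a particular site appears in the output. Very small counts could still narrow things down, so a real study might also suppress or coarsen rare classes.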
On Sun, Oct 4, 2015 at 3:37 AM, Aaron Johnson aaron.m.johnson@nrl.navy.mil wrote:
_______________________________________________
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
Hello Rishab,
> I've been meaning to respond to this for a while.
Thanks for your thoughts.
> For what it's worth, I completely disagree that outright "banning" of certain data collection is the right answer here. There should be a standard of "let's weigh the risks vs. the benefits and make a decision" for all cases. In most cases, there are ways to perform data collection over Tor (even when trying to understand the makeup of hidden services) that do not compromise privacy or security; e.g., the harvest reports only the "class of the .onion site" and not the actual site itself. This answers the question the researcher is interested in without compromising or revealing the .onion address directly.
I do agree that all cases should be judged in terms of costs and benefits. The idea of that list is to provide specific activities for which the costs are judged not to outweigh the benefits. In this case, the activity is not “collect information about the descriptors you see as an HSDir and then report aggregate statistics”; it is “collect information about the descriptors you see as an HSDir and then connect to those onion addresses that you observe to try and do a Web crawl of them and scrape their content”. The latter is judged to be unacceptable because Tor wants to provide onion-service operators with the ability to run an onion service privately, and definitely without having to deal with crawlers or other snooping parties.
I actually think a list with specific examples is far more useful than a set of abstract criteria that can easily be interpreted to be consistent with the goals of the interpreter.
Best, Aaron
On 9 Oct 2015, at 01:21, Aaron Johnson aaron.m.johnson@nrl.navy.mil wrote:
> I do agree that all cases should be judged in terms of costs and benefits. The idea of that list is to provide specific activities for which the costs are judged not to outweigh the benefits. In this case, the activity is not “collect information about the descriptors you see as an HSDir and then report aggregate statistics”; it is “collect information about the descriptors you see as an HSDir and then connect to those onion addresses that you observe to try and do a Web crawl of them and scrape their content”. The latter is judged to be unacceptable because Tor wants to provide onion-service operators with the ability to run an onion service privately, and definitely without having to deal with crawlers or other snooping parties.
I also wonder about the risk presented by such a concentration of .onion site addresses (or .onion site requests, if the addresses are never recorded anywhere). If an adversary accesses the researcher’s list, or is observing the researcher’s connection, or is observing the .onion site’s connection, how much does this increase the risk of discovering the site?
For example, if a site’s threat mitigation involves allowing only a certain (small) number of accesses before changing its address, crawlers could impose an unacceptable burden on the site’s operator and legitimate users.
Tim
Tim Wilson-Brown (teor)
teor2345 at gmail dot com PGP 968F094B
teor at blah dot im OTR CAD08081 9755866D 89E2A06F E3558B7F B5A9D14F
> The idea of that list is to provide specific activities for which the costs are judged not to outweigh the benefits.
Sorry, that should have been “for which the costs are judged *to* outweigh the benefits”.
Also, I should have mentioned that even appearing on that list wouldn't necessarily be the final word. There could very well be benefits that weren’t properly appreciated. However, seeing that your plans are on the list will hopefully help you realize that they are potentially dangerous to users, that key members of the Tor community are likely to oppose them unless you make an effort to change minds beforehand, and that Tor network operators may already be blacklisting relays observed participating in such activity. I think that would be a very helpful kind of communication between Tor and researchers, one that doesn’t exist today.
Best, Aaron
It seems the networking research community has already formed an ethics review board: https://www.ethicalresearch.org/efp/netsec/. Nick Feamster and Philipp Winter are on the board.
Maybe Tor could recommend this board to researchers who wish to use Tor, instead of forming its own review group?
Aaron
On Oct 8, 2015, at 10:33 AM, Aaron Johnson aaron.m.johnson@nrl.navy.mil wrote:
On 20 Oct 2015, at 01:30, Aaron Johnson aaron.m.johnson@nrl.navy.mil wrote:
> It seems the networking research community has already formed an ethics review board: https://www.ethicalresearch.org/efp/netsec/. Nick Feamster and Philipp Winter are on the board.
> Maybe Tor could recommend this board to researchers who wish to use Tor, instead of forming its own review group?
The panel encourages its panellists to "use existing codes of conduct".
Regardless of whether we decide to refer researchers to the panel, if we share the Tor Ethical Guidelines with the panel, any Tor-related research they do review will be reviewed in that context.
Tim