Forgive the lack of inlining.
I've been meaning to respond to this for a while. For what it's worth, I completely disagree that outright "banning" certain kinds of data collection is the right answer here. The standard should be to weigh the risks against the benefits and make a decision case by case. In most cases, there are ways to collect data over Tor (even when trying to understand the makeup of hidden services) that do not compromise privacy or security. For example, the harvest could report only the "class" of each .onion site and not the site itself. This answers the question the researcher is interested in without revealing the .onion address directly.
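
A minimal sketch of the "report only the class" idea, in Python. Everything named here is hypothetical: `summarize_by_class`, `toy_classify`, and the class labels are illustration only, not an existing Tor tool or API, and how a site would actually be classified is left open.

```python
from collections import Counter


def summarize_by_class(observed_addresses, classify):
    """Return per-class counts only.

    Addresses are passed to `classify` one at a time and are never
    stored in, or returned as part of, the result.
    """
    counts = Counter()
    for address in observed_addresses:
        counts[classify(address)] += 1
    return dict(counts)


def toy_classify(address):
    # Hypothetical stand-in classifier (assumption): a real study would
    # need its own, ethically reviewed way to assign a class.
    return "forum" if address.startswith("a") else "other"


report = summarize_by_class(
    ["abc.onion", "xyz.onion", "aaa.onion"], toy_classify
)
```

Only the aggregated `report` would leave the measurement point. Whether coarse counts like this are safe to publish still depends on the risk/benefit weighing above, e.g. a class with a single member can still point at one service.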
On Sun, Oct 4, 2015 at 3:37 AM, Aaron Johnson <aaron.m.johnson@nrl.navy.mil> wrote:
>> https://trac.torproject.org/projects/tor/wiki/doc/ResearchEthics
>>
>> Any number of problems and obstacles to legitimate research areas exist with this…
>
> I would be interested in any others that you have other than the one you bring up below.
>
>> "It is not acceptable to run an HSDir, harvest onion addresses, and do a Web crawl of those onion services. Don't set up exit relays to sniff, or tamper with exit traffic."
>>
>> Assuming such bans, how is one supposed to legitimately research and report on the real makeup of onion or exit space? These are significant unanswered questions regarding Tor use.
>
> I happen to agree with requesting that nobody do harvesting of onion addresses on HSDirs. Tor will in fact soon make this impossible by changing the descriptors to hide the onion address. In the meantime, relays that are currently observed to do this are being kicked out of the network by the DirAuths. The reason is that people believe that Tor users should be able to run a hidden service without it becoming known to anybody else. This does limit information we can gather about the onion space, but such privacy is exactly what Tor is all about.
>
>> Concentrate not on bans, which will be ignored by both legit and illegit researchers anyways, but on proper design for data handling and minimization, particularly being sensitive to the Tor users and operators involved lest that data become compromised at any stage before final anonymization and wiping.
>
> There are many types of activity that are “banned”, as you say, and I doubt you disagree with it all. For example, one should not gather data about all the client IPs observed and when they were using Tor. No, we can’t tell if anybody is doing this. That doesn’t mean that Tor can’t request that it never be done. And “legitimate” researchers will absolutely follow community standards. First, because most of them aren’t jerks. Second, because conference program committees and journal editorial boards can and do reject papers for unethical behavior.
>
> Best,
> Aaron

_______________________________________________
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev