It feels like you're engaging in rules lawyering, trying to find a policy
or statement that will let you do what you want to do.
Apologies---that's unintended. I simply seek a policy or statement that clarifies this issue one way or the other. If the community wants to explicitly ban .onion search engines, that is their right. I personally consider such a ban immensely unwise, but I would be satisfied with a clarification either way. Right now the issue sits in a strange limbo that seemingly no one is willing to resolve (aside from yourself; thanks, BTW).
====================================
Please engage with people's concerns instead.
I'm happy to calmly discuss people's concerns about onion.link and tor2web privacy, but I insist on clarifying the relatively easy robots.txt issue first. Talking about Virgil-specifics, or whether Virgil is a tolerable person, is currently a distraction: if we conclude that robots.txt is fully sufficient, and thus that .onion content is by default "public data", then the whether-Virgil-is-tolerable discussion changes drastically. If robots.txt is deemed a sufficient standard, then it's worth moving on to a longer discussion in which I hope to clarify the judgement calls I've made.
When I've seen people talk about "crawling .onion sites", the issue that has received the most focus is the harvesting of .onion addresses by running a malicious HSDir. We do things to prevent this behaviour, including blacklisting HSDirs. This behaviour is clearly unethical, there is a community consensus about it, and we invest resources in preventing it.
Sure. No complaints here.
As for accessing .onion sites via an automated process or non-anonymous proxy (e.g. Tor2web), that's something we're still talking about. There are significant issues around client anonymity, server anonymity, and access to sensitive data. We might decide we want to actively prevent it. We might decide we don't want to put any effort into supporting it in future.
There's also the issue of searching these sites. Perhaps some kinds of search are ok, but others are too powerful (like regular expressions, which many search sites avoid). Again, this is something we're discussing.
This is me imploring, begging, that we have that discussion on search engines, regexes, etc. I've yet to find any argument for position (A) (the positions are laid out below), which as far as I can tell is the one currently enshrined in the ethics guidelines. This is me asking for either an argument for position (A) or a clarification that robots.txt is fine.
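For concreteness, here is roughly what "respect robots.txt" means operationally for an .onion search engine: crawl by default, but skip any site whose operator opts out. This is only a minimal Python sketch under that assumption; the crawler name and gateway hostname below are made-up placeholders, not onion.link's actual configuration.

    # Minimal sketch of the "default is yes, but you can always opt out" rule.
    # The user-agent and hostname are illustrative placeholders only.
    from urllib import robotparser

    USER_AGENT = "ExampleOnionSearchBot"              # hypothetical crawler name
    SITE = "http://exampleonionsitexyz.onion.link"    # hypothetical gateway URL

    rp = robotparser.RobotFileParser()
    rp.set_url(SITE + "/robots.txt")
    rp.read()   # fetch and parse the operator's robots.txt, if any

    def may_index(url: str) -> bool:
        """True unless the operator opted out, e.g. with:

            User-agent: *
            Disallow: /
        """
        return rp.can_fetch(USER_AGENT, url)

    if may_index(SITE + "/some/page"):
        pass   # fetch and index the page
    else:
        pass   # operator said no; neither crawl nor index

Under position (B), an operator who never publishes a robots.txt is treated as crawlable; under position (A), the absence of a robots.txt tells you nothing, because consent was never given explicitly.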
=====
Even though I said I didn't want to get into tor2web until the robots.txt issue is largely addressed, I'm going to discuss it briefly, just as an olive branch.
I don't know if I'd trust you to be in a position where you see client requests. I'm not sure I'd even trust you to run a Guard node, and Tor2web admins see far more than a Guard node does.
This is interesting, because I actually consider a Guard node to have more private information than a Tor2web node. I claim two things:
(1) Whereas people use TBB for *things that matter* and have an expectation of privacy, tor2web users are interested in convenience and have little expectation of privacy. I see negligible difference between what onion.link does and what Twitter does when it rewrites URLs to go through t.co so it can record the clicks. (A sketch of this kind of gateway rewriting follows these two claims.)
To put it another way, I do not consider Tor2web users to be "Tor users".
(2) Using the same logic as (1), I would argue Tor2web sees *less* private information than a Tor guard node. A guard node is half of the map to users who have explicitly said, "I wish my traffic to be unlinkable". Violating this would obviously be an "attack on Tor users". Offering logs from a guard node would be, zomg, a violation of the expectation of privacy and damage to the network. I am 110% on board here. I wholly support banning anyone from the community who sells logs from TBB users.
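To make the gateway comparison in (1) concrete: a tor2web-style gateway maps each .onion hostname onto a subdomain of its own clearnet domain and fetches the page over Tor on the visitor's behalf, which is exactly why it, like t.co, sits in a position to see (and could log) every request. Below is a rough Python sketch of that mapping; it follows the general tor2web naming convention and is illustrative only, not a description of onion.link's internals.

    # Rough sketch of tor2web-style hostname rewriting
    # (<addr>.onion -> <addr>.<gateway-domain>). Illustrative only.
    from urllib.parse import urlsplit, urlunsplit

    GATEWAY = "onion.link"   # the gateway's own clearnet domain

    def to_gateway_url(onion_url: str) -> str:
        """Map http://<addr>.onion/<path> onto the gateway's namespace."""
        parts = urlsplit(onion_url)
        host = parts.hostname or ""
        if not host.endswith(".onion"):
            raise ValueError("not a .onion URL")
        # The visitor's browser speaks ordinary HTTPS to the gateway; the
        # gateway then talks Tor to the hidden service. The middlebox sees
        # the full request, just as t.co sees the click.
        gateway_host = host[: -len(".onion")] + "." + GATEWAY
        return urlunsplit(("https", gateway_host, parts.path, parts.query, parts.fragment))

    print(to_gateway_url("http://exampleonionsitexyz.onion/some/page"))
    # -> https://exampleonionsitexyz.onion.link/some/page

The point of claim (1) is that someone who chooses this convenience path has, like a t.co user, already accepted a visible middlebox; someone who installs TBB has not.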
-----
As an aside:
You might want to enable automatic redirects from http://onion.link to https://onion.link.
Already done. I also recently enabled DNSSEC, because some European ISPs were doing DNS poisoning and I wanted to stop that.
Normally I'd be concerned you use Google Analytics rather than a local analytics solution.
I've removed the Google Analytics. It'll go out in the next weekly release.
===========
The other issues you cited are worth discussing, and I welcome those discussions. But I want to resolve the comparatively easy robots.txt discussion first. I was asked to wait a month, and I did so. Can we now have that discussion? Or does it have to be postponed another month? To kickstart the discussion, here are the three views I've heard:
(A) isis et al: robots.txt is insufficient --- "Consent is not the absence of saying 'no' — it is explicitly saying 'yes'."
(B) onionlink/ahmia/notevil/grams: we respect robots.txt --- "Default is yes, but you can always opt-out."
(C) onionstats/memex: we ignore robots.txt --- "Don't care even if you opt-out." (see https://onionscan.org/reports/may2016.html)
Isis did a good job arguing for (A) by claiming that (B) and (C) are "blatant and disgusting workaround[s] to the trust and expectations which onion service operators place in the network." https://lists.torproject.org/pipermail/tor-project/2016-May/000356.html
This is me arguing for (B):
https://lists.torproject.org/pipermail/tor-project/2016-May/000411.html
I have no link arguing for (C).
I am imploring that we have a discussion arguing for (A), (B), (C), or (D) something else. Thus far we've gotten an argument for (A) from Isis and an argument for (B) from Juha.
-V