> It feels like you're engaging in rules lawyering, trying to find a policy or statement that will let you do what you want to do.
Apologies, that's unintended. I simply seek a policy or statement that clarifies this issue one way or the other. If the community wants to explicitly ban .onion search engines, that is their right. I personally consider such a ban to be immensely unwise, but I would be satisfied with a clarification either way. Right now it's in a funny limbo that seemingly no one is willing to resolve (aside from yourself; thanks, BTW).
====================================
> Please engage with people's concerns instead.
I'm happy to calmly discuss people's concerns about onion.link and tor2web privacy, but I insist on clarifying the relatively easy robots.txt issue first. Talking about Virgil-specifics or whether Virgil-is-a-tolerable-person is currently a distraction, because if we conclude that robots.txt is fully sufficient, and thus .onion content is by default "public data", then the whether-Virgil-is-tolerable discussion changes drastically. If robots.txt is deemed a sufficient standard, then it's worth going forward with a longer discussion where I hope to clarify the judgement calls I've made.
> When I've seen people talk about "crawling .onion sites", the issue that has
> received the most focus is the harvesting of .onion addresses by running a
> malicious HSDir. We do things to prevent this behaviour, including
> blacklisting HSDirs. This behaviour is clearly unethical, there is a
> community consensus about it, and we invest resources in preventing it.
Sure. No complaints here.
> As for accessing .onion sites via an automated process or non-anonymous proxy
> (e.g. Tor2web), that's something we're still talking about. There are
> significant issues around client anonymity, server anonymity, and access to
> sensitive data. We might decide we want to actively prevent it. We might
> decide we don't want to put any effort into supporting it in future.
> There's also the issue of searching these sites. Perhaps some kinds of search
> are ok, but others are too powerful (like regular expressions, which many
> search sites avoid). Again, this is something we're discussing.
This is me imploring, begging, to have that discussion on search engines, regexes, etc. I've yet to find any argument for position (A), which as far as I can tell is the position currently enshrined in the ethics guidelines. This is me asking for either an argument for position (A), or a clarification that robots.txt is fine.
=====
Even though I said I didn't want to get into tor2web until the robots.txt issue is largely addressed, I'm going to discuss it briefly as an olive branch.
> I don't know if I'd trust you to be in a position where you see client
> requests.
> I'm not sure I'd even trust you to run a Guard node, and Tor2web admins see
> far more than a Guard node does.
This is interesting, because I actually consider a Guard node to have more private information than a Tor2web node.
I claim two things:
(1) Whereas people use TBB for *things that matter* and have an expectation of privacy, I claim that tor2web users are interested in convenience and have little expectation of privacy. I see negligible difference between what onion.link does and what Twitter does when they rewrite URLs to go to t.co so they can record the clicks.
To put it another way, I do not consider Tor2web users to be "Tor users".
(2) Using the same logic as (1), I would argue Tor2web sees *less* private information than a Tor guard node. A guard node is half of the map to users who have explicitly said, "I wish my traffic to be unlinkable". Violating this would obviously be an "attack on Tor users". Offering logs for a guard node would be zomg a violation of the expectation of privacy and damaging to the network. I am 110% on board here. I wholly support banning anyone from the community who sells logs from TBB users.
-----
As an aside:
Already do it. I also recently enabled DNSSEC because some European ISPs were doing DNS poisoning and I wanted to stop them from doing that.
> Normally I'd be concerned you use Google Analytics rather than a local
> analytics solution.
I've removed the Google Analytics. It'll go out in the next weekly release.
===========
The other issues you cited are worth discussing, and I welcome discussing them. But I want to resolve the comparatively easy robots.txt discussion first. I was asked to wait a month, and I did so. Can we now have that discussion? Or does it have to be postponed another month? To kickstart the discussion, I gave the three views I've heard:
> (A) isis et al: robots.txt is insufficient
I am imploring people to argue for (A), (B), (C), or (D) other. Thus far we've gotten an argument for (A) from Isis and an argument for (B) from Juha.
-V