Hey Virgil,
While I know you and I have talked about this in private recently, it seems like a good time to table this discussion for a couple of weeks. Considering everything else that's going on, this might not be the ideal time for everyone to contribute to the discussion.
<3
Griffin
Virgil Griffith wrote:
Here's yet another data point indicating the policy on crawling .onion
needs to be clarified. The new and popular OnionStats tool doesn't
even respect /robots.txt, see:
https://onionscan.org/reports/may2016.html
So now we have *three* different positions among respected members of
the Tor community.
(1) isis et al: robots.txt is insufficient
--- "Consent is not the absence of saying 'no' — it is explicitly
saying 'yes'."
(2) onionlink/ahmia/notevil/grams: we respect robots.txt
--- "Default is yes, but you can always opt-out."
(3) onionstats/memex: we ignore robots.txt
--- "Don't care even if you opt-out."
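To make the difference between the three positions concrete, here is a minimal Python sketch of the decision each one implies for a crawler. It is an illustration only, not code from any of the projects named above: the function name, the policy labels, the `opted_in` flag and the user-agent string are all hypothetical. Only `urllib.robotparser` is a real standard-library module.

```python
from urllib.robotparser import RobotFileParser


def may_index(robots_lines, url, policy, agent="example-onion-crawler", opted_in=False):
    """Decide whether a crawler would index `url` under one of the three positions.

    `policy` is a hypothetical label: "opt-in" for (1), "respect" for (2),
    "ignore" for (3). `robots_lines` is the site's robots.txt split into lines.
    """
    if policy == "opt-in":
        # Position (1): index only after an explicit "yes" from the operator.
        return opted_in
    if policy == "ignore":
        # Position (3): robots.txt is never consulted.
        return True
    if policy == "respect":
        # Position (2): default is yes, but an opt-out in robots.txt is honoured.
        parser = RobotFileParser()
        parser.parse(robots_lines)
        return parser.can_fetch(agent, url)
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    opt_out = ["User-agent: *", "Disallow: /"]
    for policy in ("opt-in", "respect", "ignore"):
        print(policy, may_index(opt_out, "http://example.onion/page", policy))
```

With a robots.txt that disallows everything, only the "ignore" policy still indexes the page. How an explicit opt-in would actually be signalled is left open here, as it is in the thread.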
-V
On Wed, Jun 8, 2016 at 1:34 AM, Virgil Griffith <i@virgil.gr> wrote:
Hello all.
I wrote on this topic earlier at:
https://lists.torproject.org/pipermail/tor-project/2016-May/000411.html
This is me again asking for clarification. I chose this issue
because it is the most self-contained of the various ones raised by
isis et al, and it seemed wise to clarify this before opening up a
new one. If someone from Tor management writes me that social
reasons prohibit search engines from being addressed at this time, I
will drop it.
Given the lack of prior reaction as well as ahmia.fi [1] getting
funded for GSoC (ahmia has followed /robots.txt from day zero), I
tentatively conclude that crawling .onion is non-controversial,
i.e., "Per Tor community standards, search engines obeying
robots.txt are a-okay. Equivalently, indexing .onion content is
treated the same as indexing any other part of the web."
But, to motivate discussion as well as give any concerned parties an
opportunity to be heard, I have republished the onion2bitcoin as well
as the bitcoin2onion lists, anonymizing only the final 4 characters of
the .onion address instead of the final 8.
-- http://virgil.gr/wp-content/uploads/2016/06/onion2btc.html
-- http://virgil.gr/wp-content/uploads/2016/06/btc2onion.html
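For readers unsure what "anonymizing the final 4 characters" amounts to, here is a minimal Python sketch of that kind of masking. The thread does not say how the published lists were actually produced, so the function name, the mask character and the sample address are all assumptions.

```python
def mask_onion(address, hidden=4, mask_char="x"):
    """Replace the last `hidden` characters of the .onion label with `mask_char`.

    Hypothetical helper: the masking style of the published lists is not
    described in the thread.
    """
    label, dot, suffix = address.rpartition(".")
    if not dot or suffix != "onion":
        raise ValueError("expected an address ending in .onion")
    # Clamp so that a too-large or negative `hidden` cannot misbehave.
    hidden = max(0, min(hidden, len(label)))
    return label[:len(label) - hidden] + mask_char * hidden + ".onion"


if __name__ == "__main__":
    sample = "abcdefghijklmnop.onion"  # made-up 16-character label
    print(mask_onion(sample, hidden=8))
    print(mask_onion(sample, hidden=4))
```

Assuming 16-character base32 labels, hiding 4 characters leaves 32^4 = 1,048,576 candidate addresses per entry, against 32^8 (about 1.1 trillion) when 8 are hidden. That is why the 4-character version protects far less.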
-V
On Tue, May 31, 2016 at 10:05 PM, Virgil Griffith <i@virgil.gr>
wrote:
This seems like something people would have opinions on. Anyone?
-V
On Monday, 30 May 2016, Virgil Griffith <i@virgil.gr> wrote:
Hello all.
I am preparing a longer response to the issues Isis et al mentioned.
Most are interrelated, but this one is not. And I wanted to get
clarification on it.
Isis expressed a concern about making a list of bitcoin addresses
from .onion, citing, "Consent is not the absence of saying 'no' —
it is explicitly saying 'yes'."
For what it's worth, ahmia.fi [1] actually supports regex searching
right out of the box. In fact, a single line of JSON spits out all
known bitcoin addresses ahmia knows about.
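The actual JSON query is not reproduced in this thread, so the following is only a rough Python sketch of the idea of pulling address-like strings out of page text with a regex. The pattern covers legacy Base58 addresses only and does no checksum validation; the names `BTC_ADDRESS` and `find_btc_addresses` are hypothetical and have nothing to do with Ahmia's code.

```python
import re

# Assumption: legacy Base58 addresses only. They start with 1 or 3, run to
# 26-35 characters in total, and use an alphabet without 0, O, I and l.
# No checksum is verified, so false positives are possible.
BTC_ADDRESS = re.compile(r"\b[13][1-9A-HJ-NP-Za-km-z]{25,34}\b")


def find_btc_addresses(text):
    """Return the unique address-like strings in `text`, in order of first appearance."""
    seen = []
    for match in BTC_ADDRESS.findall(text):
        if match not in seen:
            seen.append(match)
    return seen


if __name__ == "__main__":
    page = "Donations: 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa (thanks!)"
    print(find_btc_addresses(page))
```

Run over an index of crawled pages, a pattern like this gives the .onion -> BTC mapping; inverting the result gives BTC -> .onion.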
For example, here's an anonymized list going .onion -> BTC which I
mined from Ahmia,
* http://virgil.gr/wp-content/uploads/2016/05/btc-on-dot-onion.html
[6MB]
And here's the same information going BTC -> .onion
* http://virgil.gr/wp-content/uploads/2016/05/btc2domains.v2.txt
[2MB]
If you want to check the results you can ask Juha for the JSON query
to do this.
Let's go out on a limb and assume that regexes are okay. Is the issue
then .onion search-engines? I understand Isis's preference for
there to always be affirmative consent but does that mean that until
such a standard exists all search engines from onion.link, ahmia.fi
[1], MEMEX, NotEvil, and Grams are violating official Tor community
policy?
----
Here's how I currently see this. I put on my amateur legal hat and
say, "Well, the Internet/world-wide-web is considered a public
space. Onion-sites are like the web, but with masked speakers."
* https://www.hks.harvard.edu/m-rcbg/research/j.camp_acm.computer_internet.as.public.space.pdf
* http://aims.muohio.edu/2011/02/01/is-the-internet-a-public-space/
Ergo, I would argue that, by default, content on .onion is public
the same way everything else on the web is. If you don't want to be
"indexed", for physical spaces you go in-doors, or for the web you
put up a login. As an aside, the web-standard is actually *kinder*
than physical public spaces because on the web one can have an
unobtrusive /robots.txt saying, "please don't index me". Which is
a great thing.
Whereas some would say Tor users are "anonymous", others would
instead say anything and everything on Tor is "private". I believe this
needs to be clarified. I once proposed to Roger that he delineate
the sub-types of privacy in the same way Stallman delineated his
"Four Freedoms". Roger replied that he preferred using the broad
catch-all term "Privacy". These confusions may be a consequence of using
a broad catch-all term. Interpreting broadly, Isis is correct.
However, this conclusion has a lot of unpleasant ramifications.
Comments appreciated,
-V
P.S. Mildly related, I saw this today involving DARPA and Tor.
http://thehackernews.com/2016/05/darpa-trace-hacker.html
"""
The aim of Enhanced Attribution program is to track personas
continuously and create “algorithms for developing predictive
behavioral profiles.”
"""
I hope you all are aware this flows directly from MEMEX. Right?
This, and MEMEX, seems a much more appropriate target for outrage.
A lot of this work that numerous community members have worked on
gives even me pause.
------
[1] http://ahmia.fi
_______________________________________________
tor-project mailing list
tor-project@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-project
--
There are 10 kinds of people in the world: those who understand binary, those who don't, and people who didn't expect a base 3 joke.