Hello all.
I am preparing a longer response to the issues Isis et al. raised. Most are interrelated, but this one is not, and I wanted to get clarification on it.
Isis expressed a concern about making a list of bitcoin addresses from .onion, citing, "Consent is not the absence of saying 'no' — it is explicitly saying 'yes'."
For what it's worth, ahmia.fi actually supports regex searching right out of the box. In fact, a single JSON query returns every bitcoin address ahmia knows about.
For example, here's an anonymized list going .onion -> BTC which I mined from Ahmia: * http://virgil.gr/wp-content/uploads/2016/05/btc-on-dot-onion.html [6MB]
And here's the same information going BTC -> .onion: * http://virgil.gr/wp-content/uploads/2016/05/btc2domains.v2.txt [2MB]
If you want to check the results you can ask Juha for the JSON query to do this.
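For illustration, the address-extraction step is just a regular expression run over crawled page text. Here is a minimal sketch of that technique (this is not ahmia's actual query; the function name and sample text are mine):

```python
import re

# Candidate Base58Check bitcoin addresses: start with 1 or 3, are
# 25-34 characters long, and exclude the ambiguous characters 0, O, I, l.
BTC_RE = re.compile(r'\b[13][a-km-zA-HJ-NP-Z1-9]{25,34}\b')

def extract_btc_addresses(text):
    """Return the unique candidate bitcoin addresses found in text."""
    return sorted(set(BTC_RE.findall(text)))

sample = "Donations: 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa (the genesis address)"
print(extract_btc_addresses(sample))
```

Run over an index of .onion pages, this is all it takes to produce the lists above.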
Let's go out on a limb and assume that regexes are okay. Is the issue then .onion search engines? I understand Isis's preference for affirmative consent, but does that mean that, until such a standard exists, all search engines — onion.link, ahmia.fi, MEMEX, NotEvil, and Grams — are violating official Tor community policy?
Here's how I currently see this. I put on my amateur legal hat and say, "Well, the Internet/world-wide-web is considered a public space. Onion sites are like the web, but with masked speakers."
* https://www.hks.harvard.edu/m-rcbg/research/j.camp_acm.computer_internet.as....
* http://aims.muohio.edu/2011/02/01/is-the-internet-a-public-space/
Ergo, I would argue that, by default, content on .onion is public the same way everything else on the web is. If you don't want to be "indexed": for physical spaces you go indoors; for the web you put up a login. As an aside, the web standard is actually *kinder* than physical public spaces, because on the web one can have an unobtrusive /robots.txt saying, "please don't index me". Which is a great thing.
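For concreteness, here is a minimal sketch of how a well-behaved crawler honors that /robots.txt opt-out, using Python's standard-library robot parser (illustrative only; the site and bot names are hypothetical):

```python
from urllib.robotparser import RobotFileParser

# A hypothetical onion site's /robots.txt: one path hidden from
# everyone, and one specific crawler asked to stay out entirely.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: ExampleOnionBot
Disallow: /
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# A polite crawler checks before fetching each URL.
print(rp.can_fetch("ExampleOnionBot", "http://example.onion/page.html"))  # bot opted out entirely
print(rp.can_fetch("SomeOtherBot", "http://example.onion/private/x"))     # path disallowed
print(rp.can_fetch("SomeOtherBot", "http://example.onion/index.html"))    # allowed
```

The whole dispute below is about whether this default-allow, opt-out mechanism is sufficient consent.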
Whereas some would say Tor users are "anonymous", others would instead say anything and everything Tor is "private". I believe this needs to be clarified. I once proposed to Roger that he delineate the sub-types of privacy in the same way Stallman delineated his "Four Freedoms". Roger replied that he preferred the broad catch-all term "Privacy". These confusions may be the cost of using a broad catch-all term. Interpreting broadly, Isis is correct. However, that conclusion has a lot of unpleasant ramifications.
Comments appreciated, -V
P.S. Mildly related, I saw this today involving DARPA and Tor: http://thehackernews.com/2016/05/darpa-trace-hacker.html
""" The aim of Enhanced Attribution program is to track personas continuously and create “algorithms for developing predictive behavioral profiles.” """
I hope you are all aware this flows directly from MEMEX. Right? This, and MEMEX itself, seem much more appropriate targets for outrage. A lot of this work, which numerous community members have contributed to, gives even me pause.
This seems like something people would have opinions on. Anyone?
-V
Hello all.
I wrote on this topic earlier at:
https://lists.torproject.org/pipermail/tor-project/2016-May/000411.html
This is me asking again for clarification. I chose this issue because it is the most self-contained of the various ones raised by isis et al., and it seemed wise to settle it before opening up a new one. If someone from Tor management writes me that social reasons prevent search engines from being addressed at this time, I will drop it.
Given the lack of prior reaction, as well as ahmia.fi getting funded for GSoC (ahmia has followed /robots.txt from day zero), I tentatively conclude that crawling .onion is non-controversial, i.e., "Per Tor community standards, search engines obeying robots.txt are a-okay; indexing .onion content is treated the same as any other part of the web."
But, to motivate discussion and give any concerned parties an opportunity to be heard, I have republished the onion2bitcoin and bitcoin2onion lists, anonymizing only the final 4 characters of each .onion address instead of the final 8.
-- http://virgil.gr/wp-content/uploads/2016/06/onion2btc.html
-- http://virgil.gr/wp-content/uploads/2016/06/btc2onion.html
-V
Here's yet another data point indicating the policy on crawling .onion needs to be clarified. The new and popular OnionStats tool doesn't even respect /robots.txt; see: https://onionscan.org/reports/may2016.html
So now we have *three* different positions among respected members of the Tor community.
(1) isis et al: robots.txt is insufficient --- "Consent is not the absence of saying 'no' — it is explicitly saying 'yes'."
(2) onionlink/ahmia/notevil/grams: we respect robots.txt --- "Default is yes, but you can always opt-out."
(3) onionstats/memex: we ignore robots.txt --- "Don't care even if you opt-out."
-V
Hey Virgil,
While I know you and I have talked about this in private recently, it seems like a good time to table this discussion for a couple of weeks. Considering everything else that's going on, this might not be the ideal time for everyone to contribute to the discussion.
<3 Griffin
tor-project mailing list tor-project@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-project
Okay. Can do. Tabling this for a month.
-V
Hello all. Back in June Griffin asked for this conversation to be temporarily tabled, and it's been a month!
Let us discuss robots.txt and the crawling of .onion. Right now we have *three* different positions among respected members of the Tor community:
(A) isis et al: robots.txt is insufficient --- "Consent is not the absence of saying 'no' — it is explicitly saying 'yes'."
(B) onionlink/ahmia/notevil/grams: we respect robots.txt --- "Default is yes, but you can always opt-out."
(C) onionstats/memex: we ignore robots.txt --- "Don't care even if you opt-out." (see https://onionscan.org/reports/may2016.html)
Isis argued for (A), writing that (B) and (C) are "blatant and disgusting workaround[s] to the trust and expectations which onion service operators place in the network." https://lists.torproject.org/pipermail/tor-project/2016-May/000356.html
This is me arguing for (B): https://lists.torproject.org/pipermail/tor-project/2016-May/000411.html
I have no link arguing for (C).
I have tried to get this conversation moving before. To prod it forward this time, I have republished the onion2bitcoin and bitcoin2onion lists, anonymizing only the final 4 characters of each .onion address instead of the final 8. Under (A), compiling these lists is deeply heretical. Under either (B) or (C), .onion content is public by default (presumably running regexes over it is fine), and compiling such data is perfectly acceptable.
-- http://virgil.gr/wp-content/uploads/2016/06/onion2btc.html
-- http://virgil.gr/wp-content/uploads/2016/06/btc2onion.html
Let's discuss!
-V
On 7 Jul 2016, at 14:40, Virgil Griffith i@virgil.gr wrote:
Hello all. Back in June Griffin asked for this conversation to be temporarily tabled, and it's been a month!
Let us discuss robots.txt and crawling of .onion. Right now we have *three* camps! They are:
Please define "crawling of .onion". I don't know enough about the details of what you're doing to have a strong opinion.
How do you make your list of .onion addresses to crawl?
* by running an HSDir?
* using Tor2web request logs?
* using .onion addresses found via a search engine?
* using .onion addresses found on HTML pages on other .onion sites?
* through some other method?
How do you access and index the web content on those .onion sites? How often do you access the site? How many pages deep do you go on the site? Do you follow links to other .onion sites?
How do you make sure that Tor2web users are anonymised (as possible) when accessing hidden services?
Please stop releasing logs. It could easily be seen as a provocative act. And it's not a good way to encourage people to talk to you. One possible consequence is that individuals or groups decide it's poor behaviour, and therefore refuse to deal with you.
Tim
Tim Wilson-Brown (teor)
teor2345 at gmail dot com PGP C855 6CED 5D90 A0C5 29F6 4D43 450C BA7F 968F 094B ricochet:ekmygaiu4rzgsk6n
Please define "crawling of .onion". I don't know enough about the details of what you're doing to have a strong opinion.
I mean search engines crawling HTML pages on .onion. Like doing: https://www.google.com/search?q=site%3Aonion.to
ahmia.fi does do crawling. I leave further discussion to them. OnionLink actually does *zero* crawling. I leave it to Google et al.
When Google crawls me, it uses:
* .onion addresses found via a search engine.
* .onion addresses found on HTML pages on other .onion sites.
None of the rest. Nothing with HSDirs, etc. The *only* HSDir thing that has ever existed is caching NXDOMAIN responses from HSDirs to reduce the load Tor2web places on the Tor network. This was solely to *be kind to the operators*. However, as the caching caused some uproar, I've stopped caching NXDOMAINs and have returned to unnecessarily burdening the Tor network.
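For anyone curious what that caching amounted to, here is a rough sketch of the idea (illustrative only; this is not Tor2web's actual code, and the class name and TTL are mine):

```python
import time

class NXDomainCache:
    """Remember recent 'no such hidden service' answers so that repeated
    requests for dead .onion names don't hit the HSDirs again."""

    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self._misses = {}  # onion address -> time the negative answer arrived

    def record_miss(self, onion):
        self._misses[onion] = time.monotonic()

    def is_known_missing(self, onion):
        t = self._misses.get(onion)
        if t is None:
            return False
        if time.monotonic() - t > self.ttl:
            del self._misses[onion]  # entry expired; re-query the HSDirs
            return False
        return True

cache = NXDomainCache(ttl_seconds=3600)
cache.record_miss("nosuchservicexyz.onion")
print(cache.is_known_missing("nosuchservicexyz.onion"))
```

Purely a load-reduction measure: it stores only which lookups failed, never any content.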
How do you access and index the web content on those .onion sites?
The accessing is just plain Tor2web HTTP requests. They announce themselves with the HTTP header `x-tor2web: true`. Google does the indexing.
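That header also means an onion-service operator who dislikes Tor2web can detect and refuse these requests server-side. A hypothetical sketch, assuming only the `x-tor2web: true` header described above:

```python
def is_tor2web_request(headers):
    """Return True if a request appears to have come through a Tor2web
    proxy, judging by the header Tor2web nodes add to their requests."""
    # HTTP header names are case-insensitive, so normalize before comparing.
    normalized = {k.lower(): v for k, v in headers.items()}
    return normalized.get("x-tor2web", "").strip().lower() == "true"

# An onion service preferring native Tor visitors could refuse these:
if is_tor2web_request({"X-Tor2web": "true", "Host": "example.onion"}):
    print("serve a 'please visit us over Tor directly' page instead")
```

This is the opt-out on the operator side: a complement to robots.txt for sites that don't want proxied clearnet visitors at all.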
How often do you access the site?
Looking at analytics from Googlebot accessing Onionlink, every 7-21 days.
How many pages deep do you go on the site?
Don't know. I suppose Google goes as deep as possible.
Do you follow links to other .onion sites?
Yes, in accordance with each .onion site's /robots.txt policy.
How do you make sure that Tor2web users are anonymised (as possible) when accessing hidden services?
I make a good-faith effort not to wantonly reveal personally identifying information. But in short, it's hard. I urge people to think of Tor2web nodes as closer to Twitter, which records what links you click. I wholly support having the "where does Tor2web stand on user privacy" discussion (hopefully we could even make some improvements!), but it is orthogonal to the "robots.txt on .onion" discussion. Let's settle the robots.txt issue, and then we can return to Tor2web user privacy.
Please stop releasing logs. It could easily be seen as a provocative act.
Yeah, I understand. This is my third or fourth attempt to discuss this, and I was intentionally being a little pokey. I have no intention of actually compromising anonymity.
-V
On Thu, Jul 7, 2016 at 12:54 PM, Tim Wilson-Brown - teor <teor2345@gmail.com
wrote:
On 7 Jul 2016, at 14:40, Virgil Griffith i@virgil.gr wrote:
Hello all. Back in June Griffin asked for this conversation to be
temporarily tabled, and it's been a month!
Let us discuss robots.txt and crawling of .onion. Right now we have
*three* camps! They are:
Please define "crawling of .onion". I don't know enough about the details of what you're doing to have a strong opinion.
How do you make your list of .onion addresses to crawl?
- by running a HSDir?
- using Tor2web request logs?
- using .onion addresses found via a search engine?
- using .onion addresses found on HTML pages on other .onion sites?
- through some other method?
How do you access and index the web content on those .onion sites? How often do you access the site? How many pages deep do you go on the site? Do you follow links to other .onion sites?
How do you make sure that Tor2web users are anonymised (as possible) when accessing hidden services?
So now we have *three* different positions among respected members of
the Tor community.
(A) isis et al: robots.txt is insufficient --- "Consent is not the absence of saying 'no' — it is explicitly saying
'yes'."
(B) onionlink/ahmia/notevil/grams: we respect robots.txt --- "Default is yes, but you can always opt-out."
(C) onionstats/memex: we ignore robots.txt --- "Don't care even if you opt-out." (see
https://onionscan.org/reports/may2016.html)
Isis did a good job arguing for (A), claiming that (B) and (C) are "blatant and disgusting workaround[s] to the trust and expectations which onion service operators place in the network." https://lists.torproject.org/pipermail/tor-project/2016-May/000356.html
This is me arguing for (B):
https://lists.torproject.org/pipermail/tor-project/2016-May/000411.html
I have no link arguing for (C).
I had tried to get this conversation moving before. So, to poke this discussion forward this time, I have republished the onion2bitcoin as well as the bitcoin2onion data, anonymizing only the final 4 characters of each .onion address instead of the final 8. Under (A), compiling this list is deeply heretical. Under either (B) or (C), .onion content is by default public (so presumably running regexes is fine), and compiling such data is a perfectly fine thing to do.
-- http://virgil.gr/wp-content/uploads/2016/06/onion2btc.html -- http://virgil.gr/wp-content/uploads/2016/06/btc2onion.html
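For clarity, the anonymization applied to these lists amounts to masking the tail of each 16-character (v2) .onion label. A minimal sketch of the idea (illustrative, not the exact script used):

```python
def mask_onion(address, masked=4):
    """Replace the final `masked` characters of an .onion label with 'x'.
    Assumes a v2-style address like 'abcdefghijklmnop.onion'."""
    label, dot, rest = address.partition(".onion")
    return label[:-masked] + "x" * masked + dot + rest

# e.g. mask_onion("abcdefghijklmnop.onion") -> "abcdefghijklxxxx.onion"
```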
Please stop releasing logs. It could easily be seen as a provocative act. And it's not a good way to encourage people to talk to you. One possible consequence is that individuals or groups decide it's poor behaviour, and therefore refuse to deal with you.
Tim
Tim Wilson-Brown (teor)
teor2345 at gmail dot com PGP C855 6CED 5D90 A0C5 29F6 4D43 450C BA7F 968F 094B ricochet:ekmygaiu4rzgsk6n
tor-project mailing list tor-project@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-project
On 7 Jul 2016, at 15:24, Virgil Griffith i@virgil.gr wrote:
How do you make sure that Tor2web users are anonymised (as possible) when accessing hidden services?
I make a good faith effort not to wantonly reveal personally identifying information. But in short, it's hard. I urge people to think of tor2web nodes as closer to Twitter where they record what links you click. I wholly support having the "where is Tor2web in regards to user privacy" discussion (hopefully could even make some improvements to it!), but it is orthogonal to the "robots.txt on .onion" discussion. Let's address the robots.txt issue and then we can return to Tor2web user-privacy.
Well, as a separate issue, you might want to remove the client IP address (X-Forwarded-For) from HTTP headers your caching proxies send to hidden services. And work out if any of the other headers are sensitive.
On 7 Jul 2016, at 14:40, Virgil Griffith i@virgil.gr wrote:
So now we have *three* different positions among respected members of the Tor community.
(A) isis et al: robots.txt is insufficient --- "Consent is not the absence of saying 'no' — it is explicitly saying 'yes'."
(B) onionlink/ahmia/notevil/grams: we respect robots.txt --- "Default is yes, but you can always opt-out."
Is the opt-out permanent, or does your server re-check every time it connects? I can imagine there being issues with either model - one involves storing a list, the other, regular connections.
(C) onionstats/memex: we ignore robots.txt --- "Don't care even if you opt-out." (see https://onionscan.org/reports/may2016.html)
Isis did a good job arguing for (A) by claiming that representing (B) and (C) are "blatant and disgusting workaround[s] to the trust and expectations which onion service operators place in the network." https://lists.torproject.org/pipermail/tor-project/2016-May/000356.html
This is me arguing for (B): https://lists.torproject.org/pipermail/tor-project/2016-May/000411.html
I have no link arguing for (C).
I am disappointed that we have a Tor2web design where Tor2web needs to connect to a hidden service first, then check if it has given permission for Tor2web to connect to it. I am also disappointed that this only works for HTTP onions on the default port 80.
I would like to see a much better design for this.
I am also concerned about threat models where a single unwanted connection, or a number of unwanted connections, are security factors. For example: Imagine there is an (unknown) attack which can determine 1 bit of the 1024-bit RSA key per hidden service connection. (Some known attacks on broken crypto systems are like this, as are some side-channels.) Or imagine there is an attack which can determine 1 bit of the IPv4 address per connection.
For security, a hidden service operator decides to only allow 10 connections before rolling over their hidden service to a new key and server.
There are at least 10 connections to known .onion addresses every week, because there are at least 10 Tor2web or memex or onionstats instances on the web. Therefore, every week, the operator must roll over their hidden service, and arrange to notify users of the new address in a secure fashion. Alternatively, they must keep the address secret, even from the HSDir hash ring, which is not possible.
Is there an alternative to position (A) that supports threat models like this?
I believe that a technical solution to this threat model is hidden service client authentication (and the next-generation hidden service protocol, when available). However, there is also the possibility of exerting social pressure to prevent people from running servers that continually connect to tor hidden services.
Tim
Tim Wilson-Brown (teor)
teor2345 at gmail dot com PGP C855 6CED 5D90 A0C5 29F6 4D43 450C BA7F 968F 094B ricochet:ekmygaiu4rzgsk6n
you might want to remove the client IP address (X-Forwarded-For) from
HTTP headers
Agreed! And yes, we already remove X-Forwarded-For. https://github.com/globaleaks/Tor2web/blob/master/tor2web/t2w.py#L701
I recall that at the very, very beginning we had a Python proxy library automatically adding X-Forwarded-For, but once we realized it was doing that we corrected it. FWIW, it was actually Aaron who wrote that code ;)
AFAIK Tor2web hasn't leaked any privacy-invading headers for some time. If any are discovered, they will be fixed ASAP.
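The header hygiene described here reduces to filtering a blocklist before the proxy forwards a request. A simplified illustration (the real logic lives in the t2w.py file linked above; the header names beyond X-Forwarded-For and X-Tor2web are assumptions):

```python
# Illustrative sketch: strip privacy-sensitive request headers before a
# Tor2web proxy forwards a request to a hidden service, and announce the
# proxy per convention. Not the actual t2w.py implementation.
SENSITIVE_HEADERS = {"x-forwarded-for", "forwarded", "via", "cookie"}

def sanitize_headers(headers):
    clean = {k: v for k, v in headers.items()
             if k.lower() not in SENSITIVE_HEADERS}
    clean["X-Tor2web"] = "true"  # the header Tor2web sends, per this thread
    return clean
```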
Is the opt-out permanent, or does your server re-check every time it
connects?
I can imagine there being issues with either model - one involves storing
a list, the other, regular connections.
I don't know; this is Google/Bing's department. Do we have someone on-list familiar enough with either? If I were to guess the Googley/Bingy way of doing this, I'd imagine they store the list, and when crawling the site again they do a HEAD request to see whether /robots.txt has changed. If it has changed, they overwrite their stored list.
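That guessed store-then-recheck behaviour can be sketched as follows; all the helper names are hypothetical:

```python
# Sketch: keep the last robots.txt validator (e.g. an ETag from a HEAD
# request) and only re-download the file when the validator changes.
_robots_state = {}  # host -> (validator, parsed_rules)

def get_robots(host, head, fetch, parse):
    """head(host) -> validator string; fetch(host) -> robots.txt body;
    parse(body) -> whatever rule structure the crawler stores."""
    validator = head(host)
    cached = _robots_state.get(host)
    if cached is not None and cached[0] == validator:
        return cached[1]  # unchanged: reuse the stored rules
    rules = parse(fetch(host))  # changed (or new): overwrite stored list
    _robots_state[host] = (validator, rules)
    return rules
```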
I am disappointed that we have a Tor2web design where Tor2web needs to
connect to a hidden service first, then check if it has given permission for Tor2web to connect to it.
/robots.txt isn't permission to "connect to"; it's permission to crawl/index. I'm aware of no standard, within or outside of Tor, for saying whether node A has permission to connect to node B. If such a standard exists, even an unofficial one, I'm down for spending some weekends implementing it.
I am also disappointed that this only works for HTTP onions on the
default port 80.
I agree completely. But if the issue is operator privacy, isn't it even *better* that tor2web only works for port 80? As an aside, there is tor2tcp at: https://cryptoparty.at/tor2tcp
I am also concerned about threat models where a single unwanted
connection, or a number of unwanted connections, are security factors.
For example: Imagine there is an (unknown) attack which can determine 1 bit of the
1024-bit RSA key per hidden service connection.
(Some known attacks on broken crypto systems are like this, as are some
side-channels.)
Or imagine there is an attack which can determine 1 bit of the IPv4
address per connection.
Is there an alternative to position (A) that supports threat models like
this?
I don't have a good solution to this. As stated above, I'm aware of no protocol for saying "Please don't connect to me." The security person in me is a little skeptical of how useful it would be: if someone wanted to make many connections to learn a private key, I presume she wouldn't be obeying said requests. However, if someone doesn't want to be connected to, I would happily abide by such a standard once it exists.
there is also the possibility of exerting social pressure to prevent
people from running servers that continually connect to tor hidden services.
The closest things I know of for social pressure are:
(1) Liberal caching headers in the HTTP response, so pages can be cached by the browser and any intermediary caches for up to a week:
```
Cache-Control: max-age=604800
```
(2) Long crawl-delays in /robots.txt, asking crawlers to wait a day between fetches:
```
User-agent: *
Crawl-delay: 86400
```
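A well-behaved crawler can honour a robots.txt like the one above using only Python's standard library:

```python
import urllib.robotparser

# Parse a robots.txt with a long crawl-delay and check both directives.
robots_txt = """\
User-agent: *
Crawl-delay: 86400
Disallow: /private/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

assert rp.crawl_delay("*") == 86400        # wait a day between fetches
assert not rp.can_fetch("*", "/private/x")  # and skip disallowed paths
assert rp.can_fetch("*", "/index.html")
```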
I believe that a technical solution to this threat model is hidden
service client authentication (and the next-generation hidden service protocol, when available).
Agreed.
-V
Hi,
Virgil made several good points about onion search engines.
1) Anonymous vs. hidden
Whereas some would say Tor users are "anonymous", others would instead
say any and everything Tor is "private". I believe this needs to be clarified.
I am publishing a paper about my onion service experiment: I deployed 100 onion servers and followed the TCP traffic to these services. They were accessed by multiple different scanners (curl, wget, browsers, scrapers, ssh). This means that some people do HSDir harvesting and scan onions.
2) Search engines can efficiently map content
For what it's worth, ahmia.fi actually supports regex searching right
out of the box. In fact, a single line of JSON spits out all known bitcoin addresses ahmia knows about.
At the moment I have no public documentation on how to use the regex search, but Ahmia supports this feature. Is this good or not? I know that Google has disabled these kinds of features because of privacy issues.
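For context, here is the kind of regex such a search presumably uses: legacy Base58 bitcoin addresses start with 1 or 3, run 26-35 characters, and exclude the ambiguous characters 0, O, I, and l. This is an assumed, naive pattern; it matches candidates only, and a real pipeline would also validate the Base58Check checksum:

```python
import re

# Naive candidate matcher for legacy Base58 bitcoin addresses.
BTC_RE = re.compile(r"\b[13][a-km-zA-HJ-NP-Z1-9]{25,34}\b")

text = "Donate: 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa (genesis-block address)"
matches = BTC_RE.findall(text)
# -> ['1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa']
```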
3) Is a web-site a public place?
Here's how I currently see this. I put on my amateur legal hat and
say, "Well, the Internet/world-wide-web is considered a public space. Onion-sites are like the web, but with masked speakers."
Good point! I think you are right.
Best, Juha
Does anyone want to vouch for view (A)? Note that view (A) is currently enshrined in the ethics guidelines. The following are currently in conflict with (A):
* the largest tor2web nodes
* MEMEX and other government programs
* beloved metrics applications like OnionStats
-V
On 21 Jul 2016, at 14:23, Virgil Griffith i@virgil.gr wrote:
Does anyone want to vouch for view (A) ? Note that view (A) is currently enshrined in the ethics guidelines.
I think you've misinterpreted the ethics guidelines here. "crawling" means running an HSDir to discover .onion addresses that would otherwise be private. It doesn't (necessarily) mean accessing web pages on .onion sites using an automated process.
Tim
The following are currently in conflict with (A):
- the largest tor2web nodes
- MEMEX and other government programs
- beloved metrics applications like OnionStats
-V
On Tuesday, 19 July 2016, Nurmi, Juha juha.nurmi@ahmia.fi wrote: Hi,
Virgil raised several good points about onion search engines.
- Anonymous vs. hidden
Whereas some would say Tor users are "anonymous", others would instead say any and everything Tor is "private". I believe this needs to be clarified.
I am publishing a paper about my onion service experiment: I deployed 100 onion services and monitored the TCP traffic to them. As a result, they were accessed by multiple different scanners (curl, wget, browsers, scrapers, ssh). This means that some people do HSDir harvesting and then scan the onions they find.
- Search engines can efficiently map content
For what it's worth, ahmia.fi actually supports regex searching right out of the box. In fact, a single line of JSON spits out all known bitcoin addresses ahmia knows about.
At the moment I have no public documentation on how to use the regex search, but Ahmia supports this feature. Is this good or not? I know that Google has disabled these kinds of features because of privacy issues.
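For concreteness, here is roughly what such a one-regex extraction looks like in Python. The pattern matches legacy base58 Bitcoin addresses (starting with 1 or 3) and is an illustrative guess at the kind of query a search engine could run over indexed page text, not Ahmia's actual implementation:

```python
import re

# Illustrative pattern: legacy base58 Bitcoin addresses. Base58 excludes
# the easily-confused characters 0, O, I, and l.
BTC_ADDRESS = re.compile(r"\b[13][a-km-zA-HJ-NP-Z1-9]{25,34}\b")

def extract_btc_addresses(text):
    """Return the unique candidate Bitcoin addresses found in text."""
    return sorted(set(BTC_ADDRESS.findall(text)))

page = "Donate: 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa (or email us)"
print(extract_btc_addresses(page))
```

Run over a crawl corpus, this yields exactly the .onion -> BTC mapping under discussion, which is why the "is indexing consent" question matters here.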
- Is a web-site a public place?
Here's how I currently see this. I put on my amateur legal hat and say, "Well, the Internet/world-wide-web is considered a public space. Onion-sites are like the web, but with masked speakers."
Good point! I think you are right.
Best, Juha
On Thu, Jul 7, 2016 at 9:28 AM, Virgil Griffith i@virgil.gr wrote:
you might want to remove the client IP address (X-Forwarded-For) from HTTP headers
Agreed! And yes we already remove x-forwarded-for. https://github.com/globaleaks/Tor2web/blob/master/tor2web/t2w.py#L701
I recall that at the very, very beginning we had a Python proxy library automatically adding x-forwarded-for, but once we realized it was doing that we corrected it. FWIW, it was actually Aaron who wrote that code ;)
AFAIK Tor2web hasn't leaked any privacy-invading headers for some time. If any are discovered, they will be fixed ASAP.
Is the opt-out permanent, or does your server re-check every time it connects? I can imagine there being issues with either model - one involves storing a list, the other, regular connections.
I don't know. This is Google/Bing's department. Do we have someone on the list familiar enough with either? If I were to guess the Googley/Bingy way of doing this, I'd imagine they store the list, and when crawling the site again they do a HEAD request to see whether /robots.txt has changed; if it has, they overwrite their stored list.
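A minimal sketch of that store-and-recheck model, assuming standard HTTP validators (ETag / If-None-Match) rather than a HEAD request; the class and method names here are illustrative, not any real crawler's API:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RobotsCache:
    """Stored robots.txt rules plus the validator from the last fetch."""
    rules: str
    etag: Optional[str]

    def conditional_headers(self):
        """Headers for a cheap re-check instead of a full GET."""
        return {"If-None-Match": self.etag} if self.etag else {}

    def refresh(self, status, body, etag):
        """Apply a conditional-fetch result: a 304 keeps the cache."""
        if status == 304:
            return self  # unchanged: keep the stored rules
        return RobotsCache(rules=body, etag=etag)

cache = RobotsCache(rules="User-agent: *\nDisallow: /private/", etag='"abc1"')
print(cache.conditional_headers())      # sent with the re-check
cache = cache.refresh(304, None, None)  # server says: not modified
print(cache.rules.splitlines()[1])
```

Either model (validators or periodic HEAD requests) makes the opt-out effectively permanent while it remains published, at the cost of the regular connections Tim is worried about.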
I am disappointed that we have a Tor2web design where Tor2web needs to connect to a hidden service first, then check if it has given permission for Tor2web to connect to it.
/robots.txt isn't permission to "connect to"; it's permission to crawl/index. I'm aware of no standard, within or outside of Tor, for saying whether node A has permission to connect to node B. If such a standard exists, even an unofficial one, I'm down for spending some weekends implementing it.
I am also disappointed that this only works for HTTP onions on the default port 80.
I agree completely. But if the issue is operator privacy, isn't it even *better* that tor2web only works for port 80? As an aside, there is tor2tcp at: https://cryptoparty.at/tor2tcp
I am also concerned about threat models where a single unwanted connection, or a number of unwanted connections, are security factors. For example: Imagine there is an (unknown) attack which can determine 1 bit of the 1024-bit RSA key per hidden service connection. (Some known attacks on broken crypto systems are like this, as are some side-channels.) Or imagine there is an attack which can determine 1 bit of the IPv4 address per connection. Is there an alternative to position (A) that supports threat models like this?
I don't have a good solution to this. As stated above, I'm aware of no protocol for saying "Please don't connect to me." The security person in me is a little skeptical of how useful it would be---if someone wanted to make many connections to learn a private key, I presume she won't be obeying said requests. However, if someone doesn't want to be connected to, I would happily abide by such a standard once it exists.
there is also the possibility of exerting social pressure to prevent people from running servers that continually connect to tor hidden services.
The closest things I know of for social pressure are:
(1) Liberal caching headers in the HTTP response:
Cache-Control: max-age=604800  # can be cached by the browser and any intermediary caches for up to 1 week
(2) In /robots.txt putting long crawl-delays:
User-Agent: *
Crawl-delay: 86400  # wait 1 day between each fetch
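For what it's worth, a well-behaved crawler can honour that crawl-delay using Python's standard-library robots.txt parser, which also enforces the Disallow rules:

```python
import urllib.robotparser

# Parse the same kind of robots.txt an onion site might publish.
rp = urllib.robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Crawl-delay: 86400",
    "Disallow: /private/",
])

print(rp.can_fetch("*", "http://example.onion/index.html"))  # True
print(rp.can_fetch("*", "http://example.onion/private/x"))   # False
print(rp.crawl_delay("*"))                                   # 86400
```

Of course, this only restrains crawlers that choose to check, which is exactly the gap between positions (B) and (C).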
I believe that a technical solution to this threat model is hidden service client authentication (and the next-generation hidden service protocol, when available).
Agreed.
-V
On Thu, Jul 7, 2016 at 1:44 PM, Tim Wilson-Brown - teor teor2345@gmail.com wrote:
On 7 Jul 2016, at 15:24, Virgil Griffith i@virgil.gr wrote:
How do you make sure that Tor2web users are anonymised (as possible) when accessing hidden services?
I make a good-faith effort not to wantonly reveal personally identifying information. But in short, it's hard. I urge people to think of tor2web nodes as closer to Twitter, which records which links you click. I wholly support having the "where is Tor2web with regard to user privacy" discussion (and hopefully we could even make some improvements!), but it is orthogonal to the "robots.txt on .onion" discussion. Let's address the robots.txt issue and then we can return to Tor2web user privacy.
Well, as a separate issue, you might want to remove the client IP address (X-Forwarded-For) from HTTP headers your caching proxies send to hidden services. And work out if any of the other headers are sensitive.
On 7 Jul 2016, at 14:40, Virgil Griffith i@virgil.gr wrote:
So now we have *three* different positions among respected members of the Tor community.
(A) isis et al: robots.txt is insufficient --- "Consent is not the absence of saying 'no' — it is explicitly saying 'yes'."
(B) onionlink/ahmia/notevil/grams: we respect robots.txt --- "Default is yes, but you can always opt-out."
Is the opt-out permanent, or does your server re-check every time it connects? I can imagine there being issues with either model - one involves storing a list, the other, regular connections.
(C) onionstats/memex: we ignore robots.txt --- "Don't care even if you opt-out." (see https://onionscan.org/reports/may2016.html)
Isis did a good job arguing for (A) by claiming that (B) and (C) are "blatant and disgusting workaround[s] to the trust and expectations which onion service operators place in the network." https://lists.torproject.org/pipermail/tor-project/2016-May/000356.html
This is me arguing for (B): https://lists.torproject.org/pipermail/tor-project/2016-May/000411.html
I have no link arguing for (C).
I am disappointed that we have a Tor2web design where Tor2web needs to connect to a hidden service first, then check if it has given permission for Tor2web to connect to it. I am also disappointed that this only works for HTTP onions on the default port 80.
I would like to see a much better design for this.
I am also concerned about threat models where a single unwanted connection, or a number of unwanted connections, are security factors. For example: Imagine there is an (unknown) attack which can determine 1 bit of the 1024-bit RSA key per hidden service connection. (Some known attacks on broken crypto systems are like this, as are some side-channels.) Or imagine there is an attack which can determine 1 bit of the IPv4 address per connection.
For security, a hidden service operator decides to only allow 10 connections before rolling over their hidden service to a new key and server.
There are at least 10 connections to known .onion addresses every week, because there are at least 10 Tor2web or memex or onionstats instances on the web. Therefore, every week, the operator must roll over their hidden service, and arrange to notify users of the new address in a secure fashion. Alternatively, they must keep the address secret, even from the HSDir hash ring, which is not possible.
Is there an alternative to position (A) that supports threat models like this?
I believe that a technical solution to this threat model is hidden service client authentication (and the next-generation hidden service protocol, when available). However, there is also the possibility of exerting social pressure to prevent people from running servers that continually connect to tor hidden services.
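A back-of-envelope version of this threat model, with the assumed numbers made explicit (the leak rate and connection budget come from the hypothetical attack described above, not from any known attack):

```python
# Assumed parameters of the hypothetical attack and operator policy.
key_bits = 1024       # size of the hidden service RSA key
bits_per_conn = 1     # assumed leak: 1 bit of the key per connection
budget = 10           # connections the operator tolerates per key
conns_per_week = 10   # e.g. one visit each from ~10 public crawlers

conns_to_break = key_bits // bits_per_conn   # connections to recover the key
weeks_per_rollover = budget / conns_per_week # how often the key must change
print(conns_to_break, weeks_per_rollover)
```

So even though full key recovery would take 1024 connections, the operator's 10-connection budget forces a rollover every single week, which is the operational burden Tim is pointing at.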
Tim
Tim Wilson-Brown (teor)
teor2345 at gmail dot com PGP C855 6CED 5D90 A0C5 29F6 4D43 450C BA7F 968F 094B ricochet:ekmygaiu4rzgsk6n
tor-project mailing list tor-project@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-project
Tim Wilson-Brown (teor)
teor2345 at gmail dot com PGP C855 6CED 5D90 A0C5 29F6 4D43 450C BA7F 968F 094B OTR 8F39BCAC 9C9DDF9A DF5FAE48 1D7D99D4 3B406880 ricochet:ekmygaiu4rzgsk6n
I think you've misinterpreted the ethics guidelines here. "crawling" means running an HSDir to discover .onion addresses that would otherwise be private. It doesn't (necessarily) mean accessing web pages on .onion sites using an automated process.
If so, this is news to me, and I would be delighted to hear it.
Can we get a confirmation then that /robots.txt is a totally cool standard?
-V
On 21 Jul 2016, at 18:10, Virgil Griffith i@virgil.gr wrote:
I think you've misinterpreted the ethics guidelines here. "crawling" means running a HSDir to discover .onion addresses that would otherwise be private. It doesn't (necessarily) mean accessing web pages on .onion sites using an automated process.
If so, this is news to me, and I would be delighted to hear it.
Can we get a confirmation then that /robots.txt is a totally cool standard?
That's not what I said, Virgil. There are a number of different opinions here. No interpretation of the ethics guidelines changes that. It feels like you're engaging in rules lawyering, trying to find a policy or statement that will let you do what you want to do. Please engage with people's concerns instead.
For the sake of clarifying what I meant, here is my analysis:
When I've seen people talk about "crawling .onion sites", the issue that has received the most focus is the harvesting of .onion addresses by running a malicious HSDir. We do things to prevent this behaviour, including blacklisting HSDirs. This behaviour is clearly unethical; there is community consensus about it, and we invest resources in preventing it.
As for accessing .onion sites via an automated process or non-anonymous proxy (e.g. Tor2web), that's something we're still talking about. There are significant issues around client anonymity, server anonymity, and access to sensitive data. We might decide we want to actively prevent it. We might decide we don't want to put any effort into supporting it in future.
There's also the issue of searching these sites. Perhaps some kinds of search are ok, but others are too powerful (like regular expressions, which many search sites avoid). Again, this is something we're discussing.
And there's a final issue here: your previous and current behaviour. You've attempted to monetise Tor2web client information by selling it to investigative agencies, including providing public samples. You've tried to push the discussion (towards supporting you?) by acting in ways that could potentially harm users. You've repeatedly released potentially sensitive client logs (some had minimal redactions). People have asked you to stop. But you still keep on releasing logs and lists of scraped data in your emails.
You just can't seem to stop showing off the kinds of data that you have access to. You've illustrated exactly the sort of things that an unscrupulous Tor2web operator can do. (Even if you were doing it for demonstration purposes, the ethical way to demonstrate is with test data, not live client requests from actual people.)
I don't know if I'd trust you to be in a position where you see client requests. I'm not sure I'd even trust you to run a Guard node, and Tor2web admins see far more than a Guard node does.
-----
As an aside:
You might want to enable automatic redirects from http://onion.link to https://onion.link. I know some people object to automatic HTTPS redirects. But I think in this case it's an important protection for clients.
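A minimal sketch of the suggested redirect policy, written as a plain function so the behaviour is easy to see; in practice this would live in the web server or proxy configuration rather than application code, and the function name is illustrative:

```python
def https_redirect(host, path):
    """Return a (status, headers) pair 301-redirecting HTTP to HTTPS."""
    return 301, {
        "Location": f"https://{host}{path}",
        # HSTS asks browsers to skip the insecure hop on future visits.
        "Strict-Transport-Security": "max-age=31536000",
    }

status, headers = https_redirect("onion.link", "/some/path")
print(status, headers["Location"])
```

Pairing the redirect with an HSTS header addresses the usual objection to automatic redirects: the first insecure request still happens once, but not on repeat visits.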
Normally I'd be concerned that you use Google Analytics rather than a local analytics solution. But since you're loading an embedded Google search box, Google gets all the client data anyway, including search queries and client IP addresses. You could use a privacy-preserving search site, or proxy the requests to Google to hide client IP addresses.
Your headers look fine, but onion.link leaves the connection open for a long time if the hidden service receives a request, keeps the connection open, but doesn't respond. I wonder if this is a bug in Tor2web, a bug in onion.link, or desired behaviour. In any case, it's a denial-of-service risk. (It's not that serious, because each open client connection requires an open hidden service connection. And I wonder if it would time out eventually.)
Tim
-V
On Thu, Jul 21, 2016 at 12:31 PM, Tim Wilson-Brown - teor teor2345@gmail.com wrote:
On 21 Jul 2016, at 14:23, Virgil Griffith i@virgil.gr wrote:
Does anyone want to vouch for view (A) ? Note that view (A) is currently enshrined in the ethics guidelines.
I think you've misinterpreted the ethics guidelines here. "crawling" means running a HSDir to discover .onion addresses that would otherwise be private. It doesn't (necessarily) mean accessing web pages on .onion sites using an automated process.
Tim
The following are currently in conflict with (A):
- the largest tor2web nodes
- MEMEX and other government programs
- beloved metrics applications like OnionStats
-V
On Tuesday, 19 July 2016, Nurmi, Juha juha.nurmi@ahmia.fi wrote: Hi,
Virgil pointed out several good points with onion search engines.
- Anonymous vs. hidden
Whereas some would say Tor users are "anonymous", others would instead say any and everything Tor is "private". I believe this needs to be clarified.
I am publishing a paper about my onion service experiment: I deployed 100 onion servers and followed TCP traffic to these services. As a result, they got accessed by multiple different scanners (curl, wget, browser, scrapers, ssh). This means that some people do HSDir harvesting and scan onions.
- Search engines can efficiently map content
For what it's worth, ahmia.fi actually supports regex searching right out of the box. In fact, a single line of JSON spits out all known bitcoin addresses ahmia knows about.
At the moment I have no public documentation how to use regex search but Ahmia supports this feature. Is this good or not? I know that Google has disabled these kind of features because privacy issues.
- Is a web-site a public place?
Here's how I currently see this. I put on my amateur legal hat and say, "Well, the Internet/world-wide-web is considered a public space. Onion-sites are like the web, but with masked speakers."
Good point! I think you are right.
Best, Juha
On Thu, Jul 7, 2016 at 9:28 AM, Virgil Griffith i@virgil.gr wrote:
you might want to remove the client IP address (X-Forwarded-For) from HTTP headers
Agreed! And yes we already remove x-forwarded-for. https://github.com/globaleaks/Tor2web/blob/master/tor2web/t2w.py#L701
I recall that the very, very beginning we had a python proxy library automatically adding x-forwarded-for, but once we realized it was doing that we corrected it. FWIW, it was actually Aaron who wrote that code ;)
AFAIK Tor2web hasn't leaked any privacy-invading headers for sometime. If ones are discovered they would be fixed ASAP.
Is the opt-out permanent, or does your server re-check every time it connects? I can imagine there being issues with either model - one involves storing a list, the other, regular connections.
I don't know. This is Google/Bing's department. Do we have someone on list familiar enough with either? If I were to guess the Googley/Bingy-way of doing this, I'd imagine them storing the list, and then when crawling the site again they'd do a HEAD request to see if the /robots.txt has changed. And if the /robots.txt has changed, to overwrite their stored list.
I am disappointed that we have a Tor2web design where Tor2web needs to connect to a hidden service first, then check if it has given permission for Tor2web to connect to it.
/robots.txt isn't a permission to "connect to", it's a permission to crawl/index. I'm aware of no standard within or outside of Tor to say whether node A has permission to connect to node B. If such a standard or even unofficial exists I'm down for spending some weekends implementing it.
I am also disappointed that this only works for HTTP onions on the default port 80.
I agree completely. But if the issue is operator privacy, isn't it even *better* that tor2web only works for port 80? As an aside, there is tor2tcp at: https://cryptoparty.at/tor2tcp
I am also concerned about threat models where a single unwanted connection, or a number of unwanted connections, are security factors. For example: Imagine there is an (unknown) attack which can determine 1 bit of the 1024-bit RSA key per hidden service connection. (Some known attacks on broken crypto systems are like this, as are some side-channels.) Or imagine there is an attack which can determine 1 bit of the IPv4 address per connection. Is there an alternative to position (A) that supports threat models like this?
I don't have a good solution to this. As stated above, I'm aware of no protocol for saying "Please don't connect to me." The security person in me is a little skeptical how useful it would be---if someone wanted to make many connections to learn a private key, I presume she won't be obeying said requests. However, if someone doesn't want to be connected to, upon such a standard existing I would happily abide by it.
there is also the possibility of exerting social pressure to prevent people from running servers that continually connect to tor hidden services.
The closest things I know of for social pressure are:
(1) Liberal caching headers in the HTTP response:
max-age=604800 #can be cached by browser and any intermediary caches for up to 1 week
(2) In /robots.txt putting long crawl-delays:
User-Agent: * Crawl-delay: 86400 #wait 1 day between each fetch.
I believe that a technical solution to this threat model is hidden service client authentication (and the next-generation hidden service protocol, when available).
Agreed.
-V
On Thu, Jul 7, 2016 at 1:44 PM, Tim Wilson-Brown - teor teor2345@gmail.com wrote:
On 7 Jul 2016, at 15:24, Virgil Griffith i@virgil.gr wrote:
How do you make sure that Tor2web users are anonymised (as possible) when accessing hidden services?
I make a good faith effort not to wantonly reveal personally identifying information. But in short, it's hard. I urge people to think of tor2web nodes as closer to Twitter where they record what links you click. I wholly support having the "where is Tor2web in regards to user privacy" discussion (hopefully could even make some improvements to it!), but it is orthogonal to the "robots.txt on .onion" discussion. Let's address the robots.txt issue and then we can return to Tor2web user-privacy.
Well, as a separate issue, you might want to remove the client IP address (X-Forwarded-For) from HTTP headers your caching proxies send to hidden services. And work out if any of the other headers are sensitive.
On 7 Jul 2016, at 14:40, Virgil Griffith i@virgil.gr wrote:
So now we have *three* different positions among respected members of the Tor community.
(A) isis et al: robots.txt is insufficient --- "Consent is not the absence of saying 'no' — it is explicitly saying 'yes'."
(B) onionlink/ahmia/notevil/grams: we respect robots.txt --- "Default is yes, but you can always opt-out."
Is the opt-out permanent, or does your server re-check every time it connects? I can imagine there being issues with either model - one involves storing a list, the other, regular connections.
(C) onionstats/memex: we ignore robots.txt --- "Don't care even if you opt-out." (see https://onionscan.org/reports/may2016.html)
Isis did a good job arguing for (A), claiming that (B) and (C) are "blatant and disgusting workaround[s] to the trust and expectations which onion service operators place in the network." https://lists.torproject.org/pipermail/tor-project/2016-May/000356.html
This is me arguing for (B): https://lists.torproject.org/pipermail/tor-project/2016-May/000411.html
I have no link arguing for (C).
I am disappointed that we have a Tor2web design where Tor2web needs to connect to a hidden service first, then check if it has given permission for Tor2web to connect to it. I am also disappointed that this only works for HTTP onions on the default port 80.
I would like to see a much better design for this.
I am also concerned about threat models where a single unwanted connection, or a number of unwanted connections, are security factors. For example: Imagine there is an (unknown) attack which can determine 1 bit of the 1024-bit RSA key per hidden service connection. (Some known attacks on broken crypto systems are like this, as are some side-channels.) Or imagine there is an attack which can determine 1 bit of the IPv4 address per connection.
For security, a hidden service operator decides to only allow 10 connections before rolling over their hidden service to a new key and server.
There are at least 10 connections to known .onion addresses every week, because there are at least 10 Tor2web or memex or onionstats instances on the web. Therefore, every week, the operator must roll over their hidden service and arrange to notify users of the new address in a secure fashion. Alternatively, they must keep the address secret, even from the HSDir hash ring, which is not possible.
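The arithmetic behind this hypothetical is worth making explicit. All the numbers below come from the scenario just described (the 1-bit-per-connection attack is imagined, not a known attack):

```python
KEY_BITS = 1024            # legacy hidden-service RSA key size
BITS_LEAKED_PER_CONN = 1   # hypothetical attack from the scenario above
CONNECTION_BUDGET = 10     # operator's self-imposed connections per key
UNSOLICITED_PER_WEEK = 10  # crawler/proxy instances connecting weekly

# Connections an attacker would need to recover the whole key.
print(KEY_BITS // BITS_LEAKED_PER_CONN)  # 1024

# Weeks of unsolicited traffic that add up to a full key leak if the
# operator never rolled over.
print(KEY_BITS // UNSOLICITED_PER_WEEK)  # 102

# But the cautious operator rolls over once the budget is spent, which
# unsolicited traffic alone forces every single week.
print(UNSOLICITED_PER_WEEK // CONNECTION_BUDGET)  # 1 rollover per week
```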
Is there an alternative to position (A) that supports threat models like this?
I believe that a technical solution to this threat model is hidden service client authentication (and the next-generation hidden service protocol, when available). However, there is also the possibility of exerting social pressure to prevent people from running servers that continually connect to tor hidden services.
Tim
Tim Wilson-Brown (teor)
teor2345 at gmail dot com PGP C855 6CED 5D90 A0C5 29F6 4D43 450C BA7F 968F 094B ricochet:ekmygaiu4rzgsk6n
tor-project mailing list tor-project@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-project
It feels like you're engaging in rules lawyering, trying to find a policy or statement that will let you do what you want to do.
Apologies---that's unintended. I simply seek a policy or statement that clarifies this issue one way or the other. If the community wants to explicitly ban .onion search engines, that is their right. I personally consider such a ban to be immensely unwise, but I would be satisfied with a clarification either way. Right now it's in a funny limbo that seemingly no one is willing to resolve (aside from yourself---thanks, BTW).
====================================
Please engage with people's concerns instead.
I'm happy to calmly discuss people's concerns about onion.link and tor2web privacy, but I insist on clarifying the relatively easy robots.txt issue first. Talking about Virgil-specifics, or whether Virgil is a tolerable person, is currently a distraction. If we conclude that robots.txt is fully sufficient, and thus .onion content is by default "public data", then the whether-Virgil-is-tolerable discussion changes drastically. If robots.txt is deemed a sufficient standard, then it's worth going forward on a longer discussion where I hope to clarify the judgement calls I've made.
When I've seen people talk about "crawling .onion sites", the issue that has received the most focus is the harvesting of .onion addresses by running a malicious HSDir. We do things to prevent this behaviour, including blacklisting HSDirs. This behaviour is clearly unethical, there is a community consensus about it, and we invest resources in preventing it.
Sure. No complaints here.
As for accessing .onion sites via an automated process or non-anonymous proxy (e.g. Tor2web), that's something we're still talking about. There are significant issues around client anonymity, server anonymity, and access to sensitive data. We might decide we want to actively prevent it. We might decide we don't want to put any effort into supporting it in future.
There's also the issue of searching these sites. Perhaps some kinds of search are ok, but others are too powerful (like regular expressions, which many search sites avoid). Again, this is something we're discussing.
This is me imploring, begging, to have that discussion on search engines, regexes, etc. I've yet to find any argument for position (A), which as far as I can tell is the position currently enshrined in the ethics guidelines. This is me asking for either an argument for position (A), or a clarification that robots.txt is fine.
===== Even though I said I didn't want to get into tor2web until the robots.txt issue is largely addressed, I'm going to discuss it briefly just as an olive branch.
I don't know if I'd trust you to be in a position where you see client requests. I'm not sure I'd even trust you to run a Guard node, and Tor2web admins see far more than a Guard node does.
This is interesting. Because I actually consider a Guard node to have more private information than a Tor2web node. I claim two things:
(1) Whereas people use TBB for *things that matter* and have an expectation of privacy, I claim that tor2web users are interested in convenience and have little expectation of privacy. I see negligible difference between what onion.link does and what Twitter does when it rewrites URLs to go through t.co so it can record the clicks.
To put it another way, I do not consider Tor2web users to be "Tor users".
(2) Using the same logic as (1), I would argue Tor2web sees *less* private information than a Tor guard node. A guard node is half of the map to users who have explicitly said, "I wish my traffic to be unlinkable". Violating this would obviously be an "attack on Tor users". Offering logs from a guard node would be a flagrant violation of the expectation of privacy and a damage to the network. I am 110% on board here. I wholly support banning anyone from the community who sells logs from TBB users.
-----
As an aside:
You might want to enable automatic redirects from http://onion.link to https://onion.link.
Already done. I also recently enabled DNSSEC because some European ISPs were doing DNS poisoning and I wanted to stop them.
Normally I'd be concerned that you use Google Analytics rather than a local analytics solution.
I've removed the Google Analytics. It'll go out in the next weekly release.
===========
The other issues you cited are worth discussing, and I welcome having those discussions. But I want to resolve the comparatively easy robots.txt discussion first. I was asked to wait a month, and I did so. Can we now have that discussion? Or does it have to be postponed another month? To kickstart the discussion, here are the three views I've heard:
(A) isis et al: robots.txt is insufficient --- "Consent is not the absence of saying 'no' — it is explicitly saying 'yes'."
(B) onionlink/ahmia/notevil/grams: we respect robots.txt --- "Default is yes, but you can always opt-out."
(C) onionstats/memex: we ignore robots.txt --- "Don't care even if you opt-out." (see https://onionscan.org/reports/may2016.html)
Isis did a good job arguing for (A), claiming that (B) and (C) are "blatant and disgusting workaround[s] to the trust and expectations which onion service operators place in the network." https://lists.torproject.org/pipermail/tor-project/2016-May/000356.html
This is me arguing for (B): https://lists.torproject.org/pipermail/tor-project/2016-May/000411.html
I have no link arguing for (C).
I am imploring for there to be discussion arguing for (A), (B), (C), or (D): other. Thus far we've gotten an argument for (A) from Isis and an argument for (B) from Juha.
-V
On 25 Jul 2016, at 19:25, Virgil Griffith i@virgil.gr wrote:
I don't know if I'd trust you to be in a position where you see client requests. I'm not sure I'd even trust you to run a Guard node, and Tor2web admins see far more than a Guard node does.
This is interesting. Because I actually consider a Guard node to have more private information than a Tor2web node. I claim two things:
(1) Whereas people use TBB for *things that matter* and have an expectation of privacy, I claim that tor2web users are interested in convenience and have little expectation of privacy. I see negligible difference between what onion.link does and what Twitter does when it rewrites URLs to go through t.co so it can record the clicks.
To put it another way, I do not consider Tor2web users to be "Tor users".
I disagree with you, and therefore think that keeping detailed logs is unethical, particularly for commercial or capability demonstration purposes. And when the name of the service is "Tor2web", it's hard to dissociate it from Tor.
And I would put it to you that the ethics guidelines, and various other community standards, aim to protect user privacy in general, not just for Tor Browser users, and not just when users expect privacy.
If you want a different standard, where we're allowed to keep identifiable information about some users of some tools accessing them via some methods, then you really need to make a strong argument for it. Otherwise, the overarching principle applies.
(2) Using the same logic as (1), I would argue Tor2web sees *less* private information than a Tor guard node. A guard node is half of the map to users who have explicitly said, "I wish my traffic to be unlinkable". Violating this would obviously be an "attack on Tor users". Offering logs from a guard node would be a flagrant violation of the expectation of privacy and a damage to the network. I am 110% on board here. I wholly support banning anyone from the community who sells logs from TBB users.
Guard nodes don't see what sites users are accessing. Tor2web nodes do. So it's possible to create logs with user IP addresses and the onion sites they've accessed (as you've demonstrated). A guard can't do that.
As an aside:
You might want to enable automatic redirects from http://onion.link to https://onion.link.
Already done. I also recently enabled DNSSEC because some European ISPs were doing DNS poisoning and I wanted to stop them.
It didn't work for me when I tried it before sending my last email. Now it does. Thanks!
Normally I'd be concerned that you use Google Analytics rather than a local analytics solution.
I've removed the Google Analytics. It'll go out in the next weekly release.
Thanks again, but the search is still Google, so user IPs and onion sites not only go to onion.link, but also Google.
===========
The other issues you cited are worth discussing, and I welcome having those discussions. But I want to resolve the comparatively easy robots.txt discussion first. I was asked to wait a month, and I did so. Can we now have that discussion? Or does it have to be postponed another month? To kickstart the discussion, here are the three views I've heard:
(A) isis et al: robots.txt is insufficient --- "Consent is not the absence of saying 'no' — it is explicitly saying 'yes'."
(B) onionlink/ahmia/notevil/grams: we respect robots.txt --- "Default is yes, but you can always opt-out."
(C) onionstats/memex: we ignore robots.txt --- "Don't care even if you opt-out." (see https://onionscan.org/reports/may2016.html)
Isis did a good job arguing for (A), claiming that (B) and (C) are "blatant and disgusting workaround[s] to the trust and expectations which onion service operators place in the network." https://lists.torproject.org/pipermail/tor-project/2016-May/000356.html
This is me arguing for (B): https://lists.torproject.org/pipermail/tor-project/2016-May/000411.html
I have no link arguing for (C).
I am imploring for there to be discussion arguing for (A), (B), (C), or (D): other. Thus far we've gotten an argument for (A) from Isis and an argument for (B) from Juha.
You seem to be trying very hard to make this conversation happen on your schedule. But maybe it's going to take time and thought and even research and experiments for this conversation to develop. Perhaps you'll have to live with the uncertainty for a while.
I'm not going to repeat what I said previously about client authentication, but I do have something new to add: some recent US legal judgements require explicit permission to access every website on the wider Internet: without permission, it's illegal to access any website. So that's one reason to be wary of using explicit permission to access as our standard - we'd likely oppose it when applied to non-onion websites.
Then again, maybe our expectations of the wider Internet and .onion sites are different, and should be different.
Tim
Tim Wilson-Brown (teor)
teor2345 at gmail dot com PGP C855 6CED 5D90 A0C5 29F6 4D43 450C BA7F 968F 094B OTR 8F39BCAC 9C9DDF9A DF5FAE48 1D7D99D4 3B406880 ricochet:ekmygaiu4rzgsk6n
I had hoped to discuss robots.txt instead of Tor2web, but so be it.
I disagree with you, and therefore think that keeping detailed logs is unethical, particularly for commercial or capability demonstration purposes.
I would prefer not to log, and that was the original design. But when your servers start pushing 700+ hits/sec, it gets hard to sustain without some sort of revenue model. And because onion.link is such a lawsuit magnet, granting agencies typically don't want to touch it (which I understand). I considered charging for the service, but if only paid users could see the content, that would defeat the goal of being a global "whistleblowing platform". So that left the various free models. Among the free models, ads and logs are the tried-and-true methods, so that's what I've been experimenting with. I'm fine being considered the moral equivalent of a non-profit Twitter which makes a good faith effort to minimize exposure, yet still tracks user behavior.
And when the name of the service is "Tor2web", it's hard to dissociate it from Tor.
That's totally reasonable. I think this is actually part of the reason tor2web.org is talking about merely hosting code and letting the implementations brand themselves appropriately.
And I would put it to you that the ethics guidelines, and various other community standards, aim to protect user privacy in general, not just for Tor Browser users, and not just when users expect privacy.
Well, that's a claim. And one that certainly settles the issue. In short, I am content with the lesser condition of a world where people can opt out of tracking. I am ethically satisfied as long as that opt-out is easily available. One concern with your approach is that it puts Tor ethically in opposition to every large free online service in the world, including many that Tor Project uses.
If you want a different standard, where we're allowed to keep identifiable information about some users of some tools accessing them via some methods, then you really need to make a strong argument for it. Otherwise, the overarching principle applies.
In the worst case, I think "privacy all the time" is impractical on the modern Internet. As for Tor itself, I don't think it should keep identifiable information, but that's different from excommunicating those who work in organizations that do. This standard would expel many existing productive members of the Tor community.
Guard nodes don't see what sites users are accessing. Tor2web nodes do. So it's possible to create logs with user IP addresses and the onion sites they've accessed (as you've demonstrated). A guard can't do that.
Same position as before. I consider guard node traffic to be vastly more private than tor2web traffic, because people using TBB have expressed a desire to be private. Onion.link is about convenient access. If you want privacy while using that convenient access, use TBB---problem solved.
Thanks again, but the search is still Google, so user IPs and onion sites not only go to onion.link, but also to Google.
Open to changing that. After the robots.txt discussion.
You seem to be trying very hard to make this conversation happen on your schedule. But maybe it's going to take time and thought and even research and experiments for this conversation to develop. Perhaps you'll have to live with the uncertainty for a while.
Fair enough. I've waited since the Berlin meeting last year for this discussion. And bluntly---is it *really* that hard? Celebrated Tor products already *directly depend* on the answer being either (B) or (C). Given that several products already depend on it, is rejecting (A) really that hard?
I'm not going to repeat what I said previously about client authentication, but I do have something new to add: some recent US legal judgements require explicit permission to access every website on the wider Internet: without permission, it's illegal to access any website. So that's one reason to be wary of using explicit permission to access as our standard - we'd likely oppose it when applied to non-onion websites.
I'd oppose it as well.
-V
And when the name of the service is "Tor2web", it's hard to dissociate it from Tor.
I thought about this claim that the word "Tor" in "Tor2web" has connotations of high privacy. And (obviously) Tor2web doesn't have those privacy guarantees.
My first reaction to this was: What part of the "2web" didn't you understand? We're the Tor, you're the web. That's why it's "Tor2web". I do not know how to express this setup any less ambiguously.
But moving beyond that initial reaction, if using the word "Tor" here is a social sticking point, okay sure---let's change it! How about something suitably banal like "Onionproxy"? Tor2web is obviously bigger than me and we'll have to discuss the renaming. But I think we'll find something acceptable that doesn't contain the word "Tor". Does this constitute a step forward?
-V
We have a verdict on the renaming of Tor2web: http://lists.ghserv.net/pipermail/tor2web-talk/2016-July/000166.html
Although we dislike changing a name we've had for six proud years, we recognize the concern. Let it never be said we are disagreeable.
In the spirit of harmony, we hereby take the name "OnionAccess".
We request confirmation of acceptability of said name.
-V
On Tue, Jul 26, 2016 at 3:33 AM, Virgil Griffith i@virgil.gr wrote:
And when the name of the service is "Tor2web", it's hard to dissociate it from Tor.
I thought about this claim that the word "Tor" in "Tor2web" has connotations of high privacy. And (obviously) Tor2web doesn't have those privacy guarantees.
My first reaction to this was: What part of the "2web" didn't you understand? We're the Tor, you're the web. That's why it's "Tor2web". I do not know how to express this setup any less ambiguously.
But moving beyond that initial reaction, if using the word "Tor" here is a social sticking point, okay sure---let's change it! How about something suitably banal like "Onionproxy"? Tor2web is obviously bigger than me and we'll have to discuss the renaming. But I think we'll find something acceptable that doesn't contain the word "Tor". Does this constitute a step forward?
-V
On 7/27/16 7:17 PM, Virgil Griffith wrote:
We have a verdict on the renaming of Tor2web: http://lists.ghserv.net/pipermail/tor2web-talk/2016-July/000166.html
Although we dislike changing a name we've had for six proud years, we recognize the concern. Let it never be said we are disagreeable.
In the spirit of harmony, we hereby take the name "OnionAccess".
We request confirmation of acceptability of said name.
It's doable; we may take the occasion to revamp the tor2web website (still a bit outdated) with the new name, along with the project GitHub.
The software is pretty stable and has many deployments, not only for general-public onion access but also as an easy-to-deploy HTTPS-to-onion proxy for single websites.
We shall assess the time and effort needed to handle the rebranding; we need to check if/how it's possible to change the package name too, or whether to work only in terms of public presence and project name.
Tor2web is a long-standing project that we keep alive, slowly but actively developed, and it's a real challenge for anyone who wants to play with it, as it faces the frictions between the "internet world" and "onion land" (and I got personally blacklisted as "persona non grata" by multiple hosting companies due to that) ;-)
"internet world" and the "onion land" (and i got personally blacklisted as "person non grata" by multiple hosting company due to that) ;-)
Same thing happened to me =P
I ended up getting my own IP range, which helped a lot with hosting companies. You're welcome to use mine at 103.198.0.0/24! But I bet RIPE would give you your own /24 if you explained to them (in person) what your needs were.
-V
On Thu, Jul 28, 2016 at 10:54 PM, Fabio Pietrosanti (naif) - lists < lists@infosecurity.ch> wrote:
Tor2web is a long-standing project that we keep alive, slowly but actively developed, and it's a real challenge for anyone who wants to play with it, as it faces the frictions between the "internet world" and "onion land" (and I got personally blacklisted as "persona non grata" by multiple hosting companies due to that) ;-)