On 2017-12-31 at 10:48:52 +0000, Yawning Angel <yawning@schwanenlied.me> wrote:
> This is pointless because internationalized domain names are standardized around Punycode encoding (Unicode<->ASCII), and said standard is supported by applications that support IDN queries.
> I am firmly against this change, and I'm not particularly thrilled by the thought of homograph attacks either.
Happy New Year, Yawning; and apologies for the delayed reply. I thought I’d best work up some code for an object demonstration of why I urge the importance of UTF-8 (and also embedded spaces, which I forgot to mention explicitly).
Here is an 8-word mnemonic phrase encoding for Wikileaks (http://wlupld3ptjvsgwqw.onion/), in 8 different languages or writing systems:
real element glow tennis pluck museum hair shuffle
洁 爱 唱 仰 泪 吴 乎 怒
潔 愛 唱 仰 淚 吳 乎 怒
parole distance fautif sombre notoire loyal flairer ratisser
retina erba idillio suonare potassio opposto india scuderia
にもつ けろけろ しちりん ほめる とかす たんまつ しゃうん はんしゃ
잠자리 반죽 상품 큰딸 이불 열차 선풍기 중반
pie dulce gimnasio tabla oscuro molde guerra repetir
Imagine an activist whispering this address in someone’s ear, in that person’s native tongue!
Respectively, those mnemonics are in English, Chinese (Simplified), Chinese (Traditional), French, Italian, Japanese, Korean, and Spanish. Those are not my selections; they are the languages for which wordlists are currently available in the standard I am adapting. Here is a hint on how to produce these phrases: https://github.com/nym-zone/easyseed/commit/ba77be1b1a1f0c6af50ceba5c89f4ade...
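For readers who want the gist without following the link, here is a rough Python sketch of one way such a phrase could be derived. It assumes that a 16-character .onion label is 80 bits of base32, that one checksum byte pads it to 88 bits, and that the result is cut into eight 11-bit indices into a 2048-word list. The checksum choice (first byte of SHA-256, in the style of seed-phrase schemes such as BIP39) and the function names `onion_to_indices` and `indices_to_onion` are illustrative assumptions, not necessarily what easyseed does. The wordlist itself is left to the caller.

```python
import base64
import hashlib


def onion_to_indices(onion):
    """Map a 16-character .onion name to eight 11-bit wordlist indices."""
    label = onion.lower()
    if label.endswith(".onion"):
        label = label[: -len(".onion")]
    raw = base64.b32decode(label.upper())  # 16 base32 chars -> 10 bytes
    if len(raw) != 10:
        raise ValueError("expected an 80-bit (16-character) onion label")
    # ASSUMPTION: one checksum byte, taken from SHA-256 of the raw bytes.
    checksum = hashlib.sha256(raw).digest()[:1]
    bits = int.from_bytes(raw + checksum, "big")  # 88 bits in total
    # Cut the 88 bits into eight 11-bit groups, most significant first.
    return [(bits >> (11 * (7 - i))) & 0x7FF for i in range(8)]


def indices_to_onion(indices):
    """Inverse of onion_to_indices; raises ValueError on a bad checksum."""
    if len(indices) != 8 or any(not 0 <= i < 2048 for i in indices):
        raise ValueError("expected eight indices in the range 0..2047")
    bits = 0
    for index in indices:
        bits = (bits << 11) | index
    data = bits.to_bytes(11, "big")
    raw, checksum = data[:10], data[10:]
    if hashlib.sha256(raw).digest()[:1] != checksum:
        raise ValueError("checksum mismatch")
    return base64.b32encode(raw).decode("ascii").lower() + ".onion"
```

Given any 2048-entry list `wordlist` for the language of choice, the phrase would then be `" ".join(wordlist[i] for i in onion_to_indices(name))`. Because each index is just a number, the same address maps to a phrase in every language for which a list exists.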
As for Punycode vs. UTF-8:
Homograph attacks are not “solved” by Punycode any more than they would be fixed by base64ing all addresses. Punycode is not a security feature; to the contrary! CVE-2013-7424, CVE-2015-8948, CVE-2016-6261, CVE-2016-6262, CVE-2017-14062.... Need I say more?
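To make the point concrete, here is a small Python sketch using the standard library's `punycode` codec. The two labels are my own illustrative choice: one is plain ASCII, the other swaps in a Cyrillic "а" (U+0430) that most fonts render identically. Punycode turns the second into ASCII and back again without loss, which is all it does; the lookalike is still a different string that a human cannot tell apart when it is displayed in Unicode.

```python
latin = "apple"          # all ASCII
mixed = "\u0430pple"     # first letter is CYRILLIC SMALL LETTER A

# The two labels look alike on screen but are different strings.
looks_alike_but_differs = latin != mixed

# Punycode is a reversible re-encoding into ASCII, nothing more.
encoded = mixed.encode("punycode")
decoded = encoded.decode("punycode")

print(looks_alike_but_differs)   # the strings are not equal
print(encoded)                   # ASCII-only form of the lookalike label
print(decoded == mixed)          # decoding restores the lookalike exactly
```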
With some care, I can write a perfectly secure UTF-8 handler (forbidding non-shortest forms, with a proper U+FFFD replacement algorithm, etc.), whereas I have never seen a Punycode decoder which gives me confidence in its behaviour under all possible inputs. I assiduously avoid interacting with the bloat and pitfalls of IDNA and Punycode, insofar as I can. By contrast, UTF-8 has been happily in use on Unix/Plan9 systems for a quarter-century.
I know that, as you say, applications which handle a string as a “domain” will Punycode it before Tor even sees it. But my thinking from the beginning was not in terms of DNS names. One of my constructive criticisms of prop-279 is that it makes that assumption.
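As a rough illustration of what I mean by a careful handler, here is a minimal Python sketch of a UTF-8 decoder that rejects non-shortest (overlong) forms, surrogates, and out-of-range values. The function name `decode_utf8` is hypothetical, and the replacement policy is a simplifying assumption: on any malformed sequence it emits one U+FFFD and resynchronises one byte later, which is one permissible policy but not the only one. It is a sketch, not production code; Python's own decoder already does this job.

```python
def decode_utf8(data):
    """Decode bytes as UTF-8, replacing malformed input with U+FFFD."""
    out = []
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        if b0 < 0x80:                       # plain ASCII
            out.append(chr(b0))
            i += 1
            continue
        # Lead byte -> (continuation count, payload bits, minimum code point).
        if 0xC2 <= b0 <= 0xDF:
            need, cp, lowest = 1, b0 & 0x1F, 0x80
        elif 0xE0 <= b0 <= 0xEF:
            need, cp, lowest = 2, b0 & 0x0F, 0x800
        elif 0xF0 <= b0 <= 0xF4:
            need, cp, lowest = 3, b0 & 0x07, 0x10000
        else:
            # 0x80..0xC1 and 0xF5..0xFF can never start a valid sequence.
            out.append("\ufffd")
            i += 1
            continue
        tail = data[i + 1 : i + 1 + need]
        ok = len(tail) == need and all(0x80 <= t <= 0xBF for t in tail)
        if ok:
            for t in tail:
                cp = (cp << 6) | (t & 0x3F)
            # Forbid overlong forms, surrogates, and values past U+10FFFF.
            ok = lowest <= cp <= 0x10FFFF and not 0xD800 <= cp <= 0xDFFF
        if ok:
            out.append(chr(cp))
            i += 1 + need
        else:
            out.append("\ufffd")            # ASSUMPTION: resync one byte on
            i += 1
    return "".join(out)
```

For instance, the overlong encoding of "/" as the bytes C0 AF decodes to two U+FFFD characters rather than a slash, while well-formed text such as a mnemonic phrase with embedded spaces passes through unchanged.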
The proper question is not, “How do we make more flexible pseudo-DNS lookups?”, but rather more generally: **How can we turn the pseudorandom binary data from .onion names into forms friendlier to humans?** If the Name System API could be in some way modified to admit better answers in the long term, then it would be my pleasure to help achieve that.
Now since I know that Alec Muffett is reading this thread, here are mnemonics in the same languages for facebookcorewwwi.onion:
chimney capital common neither demand certain hen athlete
身 热 界 巨 置 证 假 然
身 熱 界 巨 置 證 假 然
caméra boussole chasseur mairie crayon butiner fougère annuel
casuale buffone collare osare derivare capello intuito apatico
かいさつ おこす かんそう ちせい ぐうせい おもたい しゅらば いはつ
노력 기획 답변 예방 매장 남자 세월 고급
calor brazo centro mover crema cabeza helio antojo
Dare to dream outside the quasi-DNS box about how .onion addresses can be represented!