On 2018-01-01 at 22:36:53 +0000, Taylor R Campbell campbell+tor-dev@mumble.net wrote:
Date: Sun, 31 Dec 2017 11:46:28 +0000 From: Alec Muffett alec.muffett@gmail.com
Or, indeed, you could leave out the hyphens and do the same; the Prop224 Onion address is 59 characters, leaving a budget of 63-59==4 characters or 20 bits; we could put these at the end, in the space marked "@@@@":
https://www4acth47i6kxnvkewtm6q7ib2s3ufpo5sqbsnzjpbi7utijcltosqemad@@@@.onio...
Actually, the label part is 56 characters, not 59 characters. rend-spec-v3.txt, § 6 [ONIONADDRESS]. See also § 1.2 [NAMING] (“The result is a 56-character domain name”; nit: that should be “label”). Using the first example address therefrom:
$ bech32 -e pg6mmjiyjmcrsslvykfwnntlaru7p5svn6y2ymmju6nubxndf4pscryd.onion
onion10x7vvfgcfvz3jjt4c29kddntq35l0aj4d7c6cvvf57d5phdr9u0jz3crm5jhsx
Of course, 56 + 6 = ...
$ echo -n \
    0x7vvfgcfvz3jjt4c29kddntq35l0aj4d7c6cvvf57d5phdr9u0jz3crm5jhsx \
    | wc -c
62
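The data part of that output can be cross-checked without the `bech32` tool, whose internals are not shown here. The following is a minimal Python sketch, assuming only that the onion label is standard base32 and that the Bech32 data characters use the BIP 173 character set; the function name `onion_label_to_bech32_data` is made up for illustration. It does not compute the six checksum characters.

```python
import base64

# Bech32 character set as given in BIP 173.
BECH32_CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"

def onion_label_to_bech32_data(label):
    """Re-express a base32 v3 onion label in the Bech32 alphabet (no checksum)."""
    raw = base64.b32decode(label.upper())   # 56 base32 chars -> 35 octets
    nbits = len(raw) * 8                    # 280 bits for a v3 label
    if nbits % 5 != 0:
        raise ValueError("this sketch only handles payloads that need no padding")
    value = int.from_bytes(raw, "big")
    # Walk the bit string from the most significant end, 5 bits at a time.
    quintets = [(value >> shift) & 31 for shift in range(nbits - 5, -1, -5)]
    return "".join(BECH32_CHARSET[q] for q in quintets)

label = "pg6mmjiyjmcrsslvykfwnntlaru7p5svn6y2ymmju6nubxndf4pscryd"
data = onion_label_to_bech32_data(label)
print(data)            # the first 56 characters after "onion1" in the output above
print(len(data) + 6)   # 62, once the 6 checksum characters are appended
```

Because both encodings carry 5 bits per character and 280 is a multiple of 5, the conversion here is a character-for-character substitution.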
N.b. that this still includes the two octets of truncated SHA3-256, wrapped inside a format with 30 bits of error-correcting BCH code. Decoding and re-encoding the name to drop the SHA3 bits would cut the payload from 280 to 264 bits (35 to 33 octets), which could be represented in 53+6=59 Bech32 characters with the BCH ECC.
I also question whether the onion version needs a whole octet. In the specific application of Bech32 to Bitcoin, the “witness version” (version of encoded tx auth program) is restricted to 0–16, inclusive; and the Bech32 coding is done with one character, which I will call a “quintet” (5 bits), for the version, followed by the encoding of the 8-bit octets of the witness program.[0] If the .onion version were restricted to 0–15 so as to fit in 4 bits, then only 260 bits = 52 quintets would be needed to express the version plus the 256-bit master identity key. How many .onion address versions are expected in, say, the next 20–30 years? Adding a 6-char BCH code, the total label length would be 58 quintet characters.
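For reference, the three label lengths discussed so far follow from one formula: round the payload bit count up to whole quintets, then add the 6-quintet checksum. A small Python sketch of that arithmetic follows; the layout names are only descriptive labels of mine, not anything from the spec.

```python
CHECKSUM_CHARS = 6  # Bech32 BCH checksum: 6 quintets = 30 bits

def bech32_label_length(payload_bits):
    """Characters needed after the separator: data quintets plus checksum."""
    data_chars = -(-payload_bits // 5)  # ceiling division by 5
    return data_chars + CHECKSUM_CHARS

layouts = {
    "key + truncated SHA3 + version octet": 256 + 16 + 8,  # 280 bits
    "key + version octet": 256 + 8,                        # 264 bits
    "4-bit version + key": 4 + 256,                        # 260 bits
}

for name, bits in layouts.items():
    print(f"{name}: {bits} bits -> {bech32_label_length(bits)} characters")
# key + truncated SHA3 + version octet: 280 bits -> 62 characters
# key + version octet: 264 bits -> 59 characters
# 4-bit version + key: 260 bits -> 58 characters
```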
At these lengths, I think every character of pseudorandom data which can be reasonably shaved off is a significant win for wetware UX.
[0] Note that Bech32 encoding rules do not require that the encoded bit length be a multiple of 5. The standard prescribes the simple rule that strings of octets be zero-padded to a multiple of 5 bits when encoding, and decoded to octets with up to 4 trailing 0 bits discarded. https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki
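The padding rule in the footnote can be sketched as follows. This is my own minimal Python rendering of the pad-on-encode, discard-on-decode behaviour described there, not a copy of any reference implementation; the function names are made up for illustration.

```python
def octets_to_quintets(data):
    """Regroup 8-bit octets into 5-bit values, zero-padding the final group."""
    acc, bits, out = 0, 0, []
    for byte in data:
        acc = (acc << 8) | byte
        bits += 8
        while bits >= 5:
            bits -= 5
            out.append((acc >> bits) & 31)
    if bits:  # 1 to 4 leftover bits: pad on the right with zeros
        out.append((acc << (5 - bits)) & 31)
    return out

def quintets_to_octets(quintets):
    """Regroup 5-bit values into octets, discarding up to 4 trailing zero bits."""
    acc, bits, out = 0, 0, bytearray()
    for q in quintets:
        acc = (acc << 5) | q
        bits += 5
        while bits >= 8:
            bits -= 8
            out.append((acc >> bits) & 0xFF)
    # Whatever is left must be shorter than one quintet and all zero.
    if bits >= 5 or (acc & ((1 << bits) - 1)):
        raise ValueError("invalid padding")
    return bytes(out)

payload = bytes(range(33))             # 33 octets = 264 bits
quintets = octets_to_quintets(payload)
print(len(quintets))                   # 53 (265 bits, of which 1 is padding)
print(quintets_to_octets(quintets) == payload)  # True
```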
Existing checksum in v3 addresses aside, what would prevent using a second DNS label for a longer checksum if you wanted a bigger budget?
The labels are limited to 63 octets, but the whole name can be up to 255 (including label length bytes).
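Those two limits can be checked mechanically. Here is a rough Python sketch, assuming ASCII labels and counting one length byte per label plus the terminating root byte; the placeholder labels are not real addresses.

```python
MAX_LABEL_OCTETS = 63
MAX_NAME_OCTETS = 255

def dns_wire_length(name):
    """Length of a domain name in DNS wire format, validating the size limits."""
    labels = name.rstrip(".").split(".")
    total = 1  # the zero-length root label that terminates the name
    for label in labels:
        size = len(label.encode("ascii"))
        if not 1 <= size <= MAX_LABEL_OCTETS:
            raise ValueError(f"label of {size} octets is out of range")
        total += 1 + size  # one length byte, then the label itself
    if total > MAX_NAME_OCTETS:
        raise ValueError(f"name of {total} octets exceeds {MAX_NAME_OCTETS}")
    return total

# Placeholder labels: a 63-character extra label in front of a 58-character one.
name = "c" * 63 + "." + "a" * 58 + ".onion"
print(dns_wire_length(name))  # 130
```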
I expect that the user burden of a greater length of pseudorandom gibberish would outweigh any possible UX benefit of adding more checksum data. A 6-quintet BCH code already provides error correction, guarantees detection of errors affecting not more than 4 characters, and has a <10^-9 probability of failing to detect a greater number of errors. Is better than that really needed?
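To make the detection guarantee concrete, here is a Python sketch of the Bech32 checksum, written from my reading of BIP 173, followed by a randomized spot check that corrupts 1 to 4 characters of a 58-quintet string. The "onion" human-readable part and the random payload are assumptions for illustration only, and a passing spot check demonstrates the property on samples rather than proving it.

```python
import random

CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
GEN = (0x3B6A57B2, 0x26508E6D, 0x1EA119FA, 0x3D4233DD, 0x2A1462B3)

def polymod(values):
    """BCH checksum state over a sequence of 5-bit values (BIP 173)."""
    chk = 1
    for v in values:
        top = chk >> 25
        chk = ((chk & 0x1FFFFFF) << 5) ^ v
        for i in range(5):
            if (top >> i) & 1:
                chk ^= GEN[i]
    return chk

def hrp_expand(hrp):
    return [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]

def create_checksum(hrp, data):
    pm = polymod(hrp_expand(hrp) + data + [0] * 6) ^ 1
    return [(pm >> (5 * (5 - i))) & 31 for i in range(6)]

def verify(hrp, data_with_checksum):
    return polymod(hrp_expand(hrp) + data_with_checksum) == 1

def encode(hrp, data):
    return hrp + "1" + "".join(CHARSET[d] for d in data + create_checksum(hrp, data))

rng = random.Random(0)
hrp = "onion"                                    # assumed human-readable part
data = [rng.randrange(32) for _ in range(52)]    # stand-in for a 260-bit payload
codeword = data + create_checksum(hrp, data)     # 58 quintets in total

undetected = 0
for _ in range(2000):
    corrupted = list(codeword)
    for pos in rng.sample(range(len(corrupted)), rng.randint(1, 4)):
        corrupted[pos] ^= rng.randint(1, 31)     # always changes the character
    if verify(hrp, corrupted):
        undetected += 1

print(verify(hrp, codeword), undetected)
```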
Upon the same cryptographic self-validation principle which .onion applies in the first place, I have also considered such possibilities as encoding a TLS public key fingerprint in subdomain labels. The fingerprint could be automatically verified by the connecting TLS client against the same data it itself provides via SNI. This could alleviate the current need to get CAB Forum to approve some form of DV for .onion certificates. However, the results must be considered absolutely impracticable for human transcription. The usage model would rely exclusively on bookmarks, copy-paste, etc.
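A very rough Python sketch of that idea, purely hypothetical: I assume here that the fingerprint is SHA-256 over the server's DER-encoded public key, written as lowercase unpadded base32 in the leftmost label. None of those choices is specified anywhere; they only serve to show the check a client could make.

```python
import base64
import hashlib

def fingerprint_label(public_key_der):
    """Hypothetical label: unpadded lowercase base32 of SHA-256 over the key."""
    digest = hashlib.sha256(public_key_der).digest()
    return base64.b32encode(digest).decode("ascii").rstrip("=").lower()

def sni_matches_key(sni_name, public_key_der):
    """Compare the leftmost label of the SNI name with the presented key."""
    first_label = sni_name.split(".", 1)[0].lower()
    return first_label == fingerprint_label(public_key_der)

# Placeholder bytes standing in for a real DER-encoded public key.
key = b"placeholder public key bytes"
name = fingerprint_label(key) + "." + "a" * 56 + ".onion"

print(len(fingerprint_label(key)))        # 52 characters for a 256-bit digest
print(sni_matches_key(name, key))         # True
print(sni_matches_key(name, b"another"))  # False
```

The resulting name is well inside the DNS limits, but at more than 100 characters of gibberish it illustrates why transcription by hand is out of the question.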