Hi,
Right now, .onion URLs are not human readable. Nor are they easy for humans to recognise or recall.
The way information is presented to humans can greatly influence how we recognise, recall and process it.
The ideal situation is for the user to recognise a piece of information. It takes less cognitive processing to do this. This is not possible here as they are required to type the URL into the browser.
In this case the user will be required to recall the address, as someone reads it out to them, or they read it off a device screen. Recall is more difficult for humans to carry out.
Alec’s idea below is essentially the “chunking” of .onion addresses. It is an often used technique when it comes to many human-computer interactions that require humans to process information.
Research shows that humans’ “short-term memory” has different characteristics to our “long-term memory”. [1, 2]
This short-term memory (also called working memory) provides temporary storage of information and has a limit (often compared with a channel's capacity).
This working memory capacity can be increased by information chunking: the information is recoded into chunks, each of which contains a number of bits. “Bit” in this context means an item of information.
In our context it would be equivalent to a letter or number in the URL.
Miller’s paper [1] puts the so-called “magic number” at seven +/-2 items held in working memory at once.
While the definition of a chunk is not strict, a suitable chunk size for .onion addresses could be found through experimentation and by drawing on already published research. (I haven’t found any yet, but that’s not to say it isn’t out there.) The lower end of Miller’s range, 5, would be a good chunk size to start with.
The human is able to see patterns, categories, groupings and use this to increase her/his working memory temporarily.
As someone mentioned already, this aids humans in recalling telephone numbers: 08057-282-9320 is easier than 080572829320.
In this case I have chunked the number around 08057, because for me it is a manageably sized chunk, and around the pattern “282” in the middle.
Chunking may also assist the user in recognising they have transferred the information correctly.
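As a rough illustration of the grouping described above, here is a minimal Python sketch. The function name `chunk` and its defaults (groups of 5, hyphen separator) are assumptions made for illustration only, not part of any proposal.

```python
def chunk(text, size=5, sep="-"):
    """Split text into groups of `size` characters joined by `sep`."""
    return sep.join(text[i:i + size] for i in range(0, len(text), size))


print(chunk("a1uik0w1gmfq3i5ievxd"))            # a1uik-0w1gm-fq3i5-ievxd
print(chunk("080572829320", size=4, sep=" "))   # 0805 7282 9320
```

Both the chunk size and the separator are parameters here precisely because they are the things that would need testing with users.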
It would be interesting to observe in a usability test how a user would transfer a .onion address from, say, a chat session to their browser.
If the address were chunked, it could be expected to help them see the difference between:
a1uik-0w1gm-fq3i5-ievxd-m9ceu-27e88-g6o7p-e0rff-dw9jm-ntwkd-sd5sp.onion
the real URL and
shdue-duqld-7p3i5-ievxd-m6oeu-27388-g607q-e0rff-dw9jm-ntwkd-srfcg.onion
a malicious URL.
It would be wonderful to see some user research into this. In terms of UI, could the Tor Browser warn the user: “This looks similar to, but is not exactly the same as, a URL you have stored. Do you really want to follow the link?”
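To make the comparison and the warning idea a little more concrete, here is a hedged Python sketch. It is not how Tor Browser works; the function names, the 0.8 similarity threshold and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def differing_chunks(a, b, sep="-"):
    """Return 1-based positions of chunks that differ between two addresses.

    Only positions present in both addresses are compared.
    """
    chunks_a = a.rsplit(".onion", 1)[0].split(sep)
    chunks_b = b.rsplit(".onion", 1)[0].split(sep)
    return [i for i, (x, y) in enumerate(zip(chunks_a, chunks_b), start=1)
            if x != y]


def check_against_stored(candidate, stored, threshold=0.8):
    """Classify candidate as 'match', 'similar' (warn the user) or 'unknown'."""
    if candidate in stored:
        return "match"
    for known in stored:
        # Hypothetical similarity test; the threshold is an arbitrary guess.
        if SequenceMatcher(None, candidate, known).ratio() >= threshold:
            return "similar"
    return "unknown"


real = "a1uik-0w1gm-fq3i5-ievxd-m9ceu-27e88-g6o7p-e0rff-dw9jm-ntwkd-sd5sp.onion"
fake = "shdue-duqld-7p3i5-ievxd-m6oeu-27388-g607q-e0rff-dw9jm-ntwkd-srfcg.onion"
print(differing_chunks(real, fake))  # [1, 2, 3, 5, 6, 7, 11]
```

A "similar" result is the case where the warning dialog would be shown. Where to set the threshold, and whether users heed such a warning at all, is exactly what user research would have to establish.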
There is, however, a balance to be struck with information chunking. A user’s working memory can be overloaded by presenting them with too much information.
This may be an issue here as the proposed .onion address is 52 letters?
The ideal situation would be for the information to eventually transfer from the user’s working memory to their long-term memory, so they can type the address whenever they want to; in other words, the address would be memorised.
I suspect this will not be achievable in most instances, as “storing” information in long-term memory requires repeated exposure to it. I would not expect the user to willingly type a 23-word-long URL (256 bits, 11 bits in a word?) into their browser repeatedly.
This could easily be overcome by the user bookmarking the URL in the Tor Browser for future use, once it has been entered correctly.
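The arithmetic behind those figures can be checked with a short Python sketch. It assumes a 256-bit address, a 2048-word list (2**11, hence 11 bits per word) and a 32-symbol alphabet (5 bits per character); none of these are confirmed details of the proposal.

```python
def symbols_needed(total_bits, bits_per_symbol):
    """Smallest number of symbols that can carry total_bits (ceiling division)."""
    return -(-total_bits // bits_per_symbol)


print(symbols_needed(256, 11))  # 24 words from a 2048-word list
print(symbols_needed(256, 5))   # 52 characters from a 32-symbol alphabet
```

Under these assumptions 23 words carry only 253 bits, so a full 256 bits would take 24 words, and 52 characters is the minimum for the character form.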
This would be an improvement over the current situation.
The second topic would be to work out what character would be best used to visually represent that chunking.
For telephone numbers this can be a space, hyphen, period, forward-slash. It would be best to find out from users (or previous research) what would help here.
Again, experimentation could point to characters that would help here. Please consider carrying out some research into this.
The above will improve the situation, but we would still be requiring the user to recall multiple random strings of characters (even with a checksum).
While I don’t know about the possibility of this from a technical point of view, an improvement on this (from the point of view of the usability of .onion addresses) *may* be to produce pronounceable word .onion addresses. [3, 4]
So instead of:
a1uik-0w1gm-fq3i5-ievxd-m9ceu-27e88-g6o7p-e0rff-dw9jm-ntwkd-sd5sp.onion
it would be:
correct-battery-horse-staple-chair-banana-table-river-pizza.onion
or using an already established dictionary:
fat-gin-keg-log-oak-pit-pup-darn-fury-knee-mark-year-wand-tram-it.onion
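As a toy illustration of how raw address bytes could map to words and back, here is a Python sketch. The 16-word list (4 bits per word) and the names `to_words` and `from_words` are hypothetical; a usable scheme would need a much larger, carefully chosen dictionary and is not specified anywhere in this thread.

```python
# Hypothetical 16-word list, so each word carries 4 bits (one nibble).
WORDS = ["fat", "gin", "keg", "log", "oak", "pit", "pup", "darn",
         "fury", "knee", "mark", "year", "wand", "tram", "chair", "river"]


def to_words(raw):
    """Encode bytes as hyphen-separated words, two words per byte."""
    words = []
    for byte in raw:
        words.append(WORDS[byte >> 4])     # high nibble
        words.append(WORDS[byte & 0x0F])   # low nibble
    return "-".join(words) + ".onion"


def from_words(address):
    """Decode an address produced by to_words back into bytes."""
    names = address.rsplit(".onion", 1)[0].split("-")
    nibbles = [WORDS.index(name) for name in names]
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))


print(to_words(bytes([0x01, 0xAF])))  # fat-gin-mark-river.onion
```

With only 4 bits per word a 256-bit address would need 64 words, which shows why the size of the dictionary matters so much for this approach.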
If the goal is to improve the usability of .onion addresses, I would support Alec’s suggestion of chunking addresses.
I would extend it by suggesting the addresses be in some way human readable.
I would also support carrying out user research to come up with approaches users will understand. I will happily help if there is willingness.
Thanks, Bernard
[1] The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information, http://ruccs.rutgers.edu/faculty/pylyshyn/Proseminar06/Miller_MagicNumber7c....
[2] https://www.interaction-design.org/encyclopedia/chunking.html
[3] https://xkcd.com/936/
[4] https://cups.cs.cmu.edu/soups/2012/proceedings/a7_Shay.pdf
On 8 Aug 2015, at 14:19, Alec Muffett alecm@fb.com wrote:
Gah, I am evidently having a bad day with e-mail, so I am going to send a typo correction with this and then go do something else instead.
Corrections in caps, below.
— Alec Muffett Security Infrastructure Facebook Engineering London
On Aug 8, 2015, at 2:14 PM, Alec Muffett alecm@fb.com wrote:
Please let a thousand discovery mechanisms bloom - including peer-to-peer directories and tweeted URLs.
But, what they boil down to, please let *that* be human-readable, too. The more I THINK about it, the more I like:
a1uik-0w1gm-fq3i5-ievxd-m9ceu-27e88-g6o7p-e0rff-dw9jm-ntwkd-sdxxx.onion
…where the final “xxx” is a 15-bit truncated secure hash of the rest of the original raw address bitstring.
That way people looking to quickly compare addresses can check the first QUINTET, and the last, and sample a few of the inner ones (“…people compare glyphs not words…” / “there’s IEVXD and there’s E0RFF, I like that one, it’s like Eeyore in Winnie-The-Pooh, and 0WLGM reminds me of Owls”) and be reasonably satisfied and reasonably secure.
And the XXX can be checked by the browser, which can tell the user that they’ve goofed-up cut/paste/typing-it-in. And then they bookmark it once it loads.
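A minimal Python sketch of the checksum idea described above follows. The message only says "15-bit truncated secure hash", so the choice of SHA-256, the decision to keep the top 15 bits, the base32-style alphabet and the function names are all assumptions for illustration.

```python
import hashlib

# Assumed 32-symbol alphabet (RFC 4648 base32, lower-cased); the alphabet
# used in the example address above is illustrative and may differ.
ALPHABET = "abcdefghijklmnopqrstuvwxyz234567"


def checksum_suffix(raw):
    """Return 3 characters encoding the top 15 bits of SHA-256(raw)."""
    digest = hashlib.sha256(raw).digest()
    top15 = int.from_bytes(digest[:2], "big") >> 1  # keep 15 of 16 bits
    return "".join(ALPHABET[(top15 >> shift) & 0x1F] for shift in (10, 5, 0))


def suffix_matches(raw, suffix):
    """True if the typed suffix agrees with the checksum of the raw bytes."""
    return checksum_suffix(raw) == suffix.lower()


raw = bytes(range(32))            # stand-in for the raw address bitstring
suffix = checksum_suffix(raw)
print(suffix_matches(raw, suffix))  # True
```

A browser could run a check like `suffix_matches` before attempting to connect. Note that 15 bits only catches accidental typos (a random error slips through about 1 time in 32768); it does nothing against a deliberately crafted look-alike address.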
------------- Bernard Tyers If you want to contact me please do: http://contactme.ei8fdb.org