Quoting teor (2018-01-10 00:19:54)
These are called "Unicode Scalar Values". https://www.unicode.org/glossary/#unicode_scalar_value
Let's reference that.
"Unicode Scalar Value" includes U+0000, which I think we probably want to exclude.
* each encoded with the shortest possible encoding.
* without any BOM
Are there other restrictions we should make? If so, how should we phrase them?
These seem fine, and they are not tied to a particular Unicode version.
But I don't know enough about Unicode to know if there is anything else we should specify.
Skimming through https://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt, I think it might be good to additionally forbid the noncharacter code points listed at the end: U+nFFFE and U+nFFFF for n = 0x0..0x10 (the last two code points of each of the 17 planes), and U+FDD0 through U+FDEF.
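
For what it's worth, here is a rough sketch of what the combined checks could look like. It is only an illustration, not proposed spec text: the function names are made up, and it assumes that "without any BOM" means rejecting a leading U+FEFF only. It relies on Python's strict UTF-8 decoder, which already rejects overlong encodings, surrogates, and values above U+10FFFF.

```python
def is_noncharacter(cp: int) -> bool:
    """True for U+FDD0..U+FDEF and for U+nFFFE / U+nFFFF in any plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_acceptable_utf8(data: bytes) -> bool:
    """Hypothetical validator for the restrictions discussed above."""
    try:
        # Strict decoding rejects non-shortest forms, surrogates,
        # and anything beyond U+10FFFF.
        text = data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False

    # Assumption: "without any BOM" means no U+FEFF at the very start.
    if text.startswith("\ufeff"):
        return False

    for ch in text:
        cp = ord(ch)
        if cp == 0:  # exclude U+0000
            return False
        if is_noncharacter(cp):
            return False
    return True
```

With this sketch, `b"\xc0\x80"` (an overlong encoding of U+0000) and `b"\xed\xa0\x80"` (an encoded surrogate) are rejected by the decoder itself, while the NUL, BOM, and noncharacter cases need the explicit checks.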