On Fri 2/12/2005, Chris Thomas wrote:
If you want additional detail about Unicode encoding geekery:
http://www.tbray.org/ongoing/When/200x/2003/04/26/UTF ...
UTF-16 is probably what most people thought most programmers would use for Unicode; this is reflected in the fact that the native character type in both Java and C# is a sixteen-bit quantity. Of course, it doesn't really represent a Unicode character, exactly (although it does most of the time); it represents a UTF-16 code unit.
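A minimal Java sketch (class name is just for illustration) makes the distinction visible: a character outside the Basic Multilingual Plane occupies two chars.

    public class CodeUnits {
        public static void main(String[] args) {
            // U+1D11E MUSICAL SYMBOL G CLEF lies outside the BMP,
            // so UTF-16 encodes it as a surrogate pair of two code units.
            String clef = new String(Character.toChars(0x1D11E));
            System.out.println(clef.length());                          // 2 code units
            System.out.println(clef.codePointCount(0, clef.length()));  // 1 character
        }
    }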
UTF-16 is about the most efficient way possible of representing Asian character strings, each character (in the Basic Multilingual Plane, at least) nestling snugly into two bytes of storage. For ASCII characters, of course, you end up using two bytes to represent what would actually fit into one.
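Those sizes are easy to check; a quick Java sketch (class name again illustrative) comparing UTF-16 and UTF-8 byte counts:

    import java.nio.charset.StandardCharsets;

    public class Sizes {
        public static void main(String[] args) {
            String cjk = "\u4e2d";   // U+4E2D, a common CJK ideograph
            String ascii = "A";
            // A BMP Asian character: 2 bytes in UTF-16 versus 3 in UTF-8.
            System.out.println(cjk.getBytes(StandardCharsets.UTF_16BE).length);   // 2
            System.out.println(cjk.getBytes(StandardCharsets.UTF_8).length);      // 3
            // An ASCII character: 2 bytes in UTF-16 where 1 would do.
            System.out.println(ascii.getBytes(StandardCharsets.UTF_16BE).length); // 2
            System.out.println(ascii.getBytes(StandardCharsets.UTF_8).length);    // 1
        }
    }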
...
except that there are two different UTF-16 byte orders (big-endian and little-endian), and that a bunch of preexisting software treats the 0 byte as the end of a string...
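Both points show up in a small Java sketch (class and helper names are just illustrative) that hex-dumps the same one-character string in each UTF-16 flavor:

    import java.nio.charset.StandardCharsets;

    public class Endian {
        public static void main(String[] args) {
            String s = "A";
            // Same text, two byte orders; note the zero byte either way,
            // which NUL-terminated C string handling reads as end-of-string.
            printHex(s.getBytes(StandardCharsets.UTF_16BE)); // 00 41
            printHex(s.getBytes(StandardCharsets.UTF_16LE)); // 41 00
            // The unmarked "UTF-16" charset writes a byte order mark first.
            printHex(s.getBytes(StandardCharsets.UTF_16));   // fe ff 00 41
        }

        static void printHex(byte[] bytes) {
            StringBuilder sb = new StringBuilder();
            for (byte b : bytes) sb.append(String.format("%02x ", b & 0xff));
            System.out.println(sb.toString().trim());
        }
    }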