Unicode encodings (was Re: [TxMt] Tidy and XML)
Erwan David
erwan at rail.eu.org
Fri Dec 2 14:51:38 UTC 2005
On Fri 2/12/2005, Chris Thomas wrote:
> If you want additional detail about Unicode encoding geekery:
>
> http://www.tbray.org/ongoing/When/200x/2003/04/26/UTF
> ...
> >UTF-16 is probably what most people thought most programmers would
> >use for Unicode; this is reflected in the fact that the native
> >character type in both Java and C# is a sixteen-bit quantity. Of
> >course, it doesn't really represent a Unicode character, exactly
> >(although it does most times), it represents a UTF-16 codepoint.
> >
> >UTF-16 is about the most efficient way possible of representing
> >Asian character strings, each character nestling snugly into two
> >bytes of storage. For ASCII characters, of course, you end up using
> >two bytes to represent what would actually fit into one.
> ...
Except that there are two different UTF-16 byte orders (big-endian and
little-endian), and a lot of preexisting software treats a 0 byte as the
end of a string...
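
To make both points concrete, here is a minimal Python sketch. The
helper c_strlen is a hypothetical stand-in for how C-style code finds
the end of a string; it is not part of any library.

```python
# The same character encoded in the two UTF-16 byte orders.
text = "A"
be = text.encode("utf-16-be")  # big-endian, no byte order mark
le = text.encode("utf-16-le")  # little-endian, no byte order mark
utf8 = text.encode("utf-8")


def c_strlen(buf: bytes) -> int:
    """Hypothetical helper: count bytes before the first 0 byte,
    the way C-style string handling would."""
    idx = buf.find(b"\x00")
    return len(buf) if idx == -1 else idx


print(repr(be), c_strlen(be))      # the 0 byte comes first
print(repr(le), c_strlen(le))      # the 0 byte comes second
print(repr(utf8), c_strlen(utf8))  # no 0 byte at all
```

Under these assumptions, the big-endian form looks like an empty string
to such code, and the little-endian form is cut off after its first
byte.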
--
Erwan David