Unicode encodings (was Re: [TxMt] Tidy and XML)
Chris Thomas
chris at cjack.com
Fri Dec 2 14:42:25 UTC 2005
If you want additional detail about Unicode encoding geekery:
http://www.tbray.org/ongoing/When/200x/2003/04/26/UTF
...
> UTF-16 is probably what most people thought most programmers would
> use for Unicode; this is reflected in the fact that the native
> character type in both Java and C# is a sixteen-bit quantity. Of
> course, it doesn't really represent a Unicode character, exactly
> (although it does most times), it represents a UTF-16 codepoint.
>
> UTF-16 is about the most efficient way possible of representing
> Asian character strings, each character nestling snugly into two
> bytes of storage. For ASCII characters, of course, you end up using
> two bytes to represent what would actually fit into one.
...
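To make the size trade-off in the excerpt concrete, here is a minimal Python sketch that counts code points, 16-bit units, and encoded bytes for a few strings. The helper name `measure` and the sample strings are my own illustration, not something from the article. Note that the Unicode standard calls the 16-bit values "code units" rather than code points, and the last sample shows why the distinction matters: a character outside the Basic Multilingual Plane needs two of them (a surrogate pair).

```python
def measure(text: str) -> dict:
    """Return basic size figures for `text` (hypothetical helper, for illustration)."""
    utf16 = text.encode("utf-16-le")  # little-endian, no byte order mark
    return {
        "code_points": len(text),            # Python str is indexed by code point
        "utf16_code_units": len(utf16) // 2,  # each UTF-16 code unit is 2 bytes
        "utf8_bytes": len(text.encode("utf-8")),
        "utf16_bytes": len(utf16),
    }


samples = {
    "ASCII": "hello",
    "Japanese (BMP)": "\u65e5\u672c\u8a9e",   # three CJK characters
    "Musical G clef": "\U0001D11E",           # U+1D11E, outside the BMP
}

for label, text in samples.items():
    print(label, measure(text))
```

For the ASCII sample UTF-16 takes twice the bytes of UTF-8, for the Japanese sample it takes two thirds as many, and for the G clef one character occupies two 16-bit units, which is the case where a Java or C# char does not hold a whole character.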
Chris