Unicode encodings (was Re: [TxMt] Tidy and XML)

Chris Thomas chris at cjack.com
Fri Dec 2 14:42:25 UTC 2005


If you want additional detail about Unicode encoding geekery:

http://www.tbray.org/ongoing/When/200x/2003/04/26/UTF
...
> UTF-16 is probably what most people thought most programmers would  
> use for Unicode; this is reflected in the fact that the native  
> character type in both Java and C# is a sixteen-bit quantity. Of  
> course, it doesn't really represent a Unicode character, exactly  
> (although it does most times), it represents a UTF-16 codepoint.
>
> UTF-16 is about the most efficient way possible of representing  
> Asian character strings, each character nestling snugly into two  
> bytes of storage. For ASCII characters, of course, you end up using  
> two bytes to represent what would actually fit into one.
...
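
To make the size trade-off in the quote concrete, here is a minimal Python sketch (not from the linked article; the helper names encoded_sizes and utf16_code_units and the sample strings are made up for illustration). It counts bytes per encoding and also shows the case the quote hints at with "most times": a character outside the Basic Multilingual Plane needs two sixteen-bit units in UTF-16.

```python
# Illustrative sketch: compare storage cost of UTF-8 and UTF-16.
# Helper names and sample strings are hypothetical, chosen for this example.

def encoded_sizes(text):
    """Return (code points, UTF-8 bytes, UTF-16 bytes) for text."""
    # "utf-16-le" is used so that no byte order mark is counted.
    return (
        len(text),
        len(text.encode("utf-8")),
        len(text.encode("utf-16-le")),
    )

def utf16_code_units(text):
    """Return the number of sixteen-bit units UTF-16 needs for text."""
    return len(text.encode("utf-16-le")) // 2

samples = {
    "ascii": "hello",
    "cjk": "\u65e5\u672c\u8a9e",   # three common Japanese characters
    "astral": "\U0001D11E",        # MUSICAL SYMBOL G CLEF, outside the BMP
}

for name, text in samples.items():
    points, utf8_bytes, utf16_bytes = encoded_sizes(text)
    print(name, points, utf8_bytes, utf16_bytes, utf16_code_units(text))
```

The ASCII sample doubles in size under UTF-16, the CJK sample shrinks from three bytes per character to two, and the last sample is a single character that occupies two sixteen-bit units, which is why a Java or C# char is not always a whole Unicode character.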

Chris


