[TxMt] NEED Japanese Text Encoding! (pretty please?)

Patrice Neff mailinglistst at patrice.ch
Tue Jul 26 04:26:27 UTC 2005


On 25.07.2005 at 05:32, Sean Schertell wrote:

>> UTF-8 sucks for Japanese and Chinese texts mainly due to space  
>> reasons. If anything makes sense, then it is UTF-16, which  
>> Textmate also supports.
>
> Could you explain what you mean by "space reasons"?

Due to the way UTF-8 works, it uses 1 byte for US-ASCII characters,  
but up to four bytes depending on the Unicode code point. Many  
alphabets can be encoded with two bytes per character (especially  
the European ones, but also Hebrew or Arabic). Chinese and Japanese  
characters will require three or four bytes.
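
To make that concrete, here is a small Python sketch that prints how many
bytes UTF-8 needs for a few sample characters. The sample characters and
the helper name utf8_len are just my picks for illustration, nothing
TextMate-specific.

```python
# Count the bytes UTF-8 needs for a few sample characters.
# The samples and the helper name utf8_len are illustrative choices.
samples = {
    "A": "Latin capital A (US-ASCII)",
    "\u00e9": "e with acute accent",
    "\u05d0": "Hebrew alef",
    "\u0639": "Arabic ain",
    "\u65e5": "CJK ideograph inside the BMP",
    "\U0002000b": "CJK ideograph outside the BMP",
}


def utf8_len(ch):
    """Return the number of bytes in the UTF-8 encoding of ch."""
    return len(ch.encode("utf-8"))


for ch, desc in samples.items():
    print(f"U+{ord(ch):04X} {desc}: {utf8_len(ch)} byte(s)")
```

The ASCII letter comes out at one byte, the accented, Hebrew and Arabic
letters at two, the common CJK ideograph at three, and the one outside
the Basic Multilingual Plane at four.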

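UTF-16, on the other hand, encodes every character in the Basic  
Multilingual Plane in two bytes, and that covers practically all  
Chinese and Japanese characters in everyday use (the rarer characters  
outside it take four bytes in both encodings). So that's why for  
Chinese or Japanese texts you will waste space when using UTF-8  
compared to UTF-16, while with English and most European languages  
you will save a lot of space using UTF-8 compared to UTF-16. And the  
latter was IMHO one of the main reasons for developing UTF-8.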
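
If you want to see the difference for yourself, a rough Python sketch
like the following compares the encoded size of a short Japanese string
and a short English one. The two sample strings and the function name
encoded_sizes are made up for this example, and I use "utf-16-le" so
that the two-byte byte order mark is not counted.

```python
# Compare the encoded size of the same text in UTF-8 and UTF-16.
# Sample strings and the name encoded_sizes are illustrative only.
# "utf-16-le" is used so that no byte order mark is added to the count.


def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for the given text."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


japanese = "\u65e5\u672c\u8a9e"  # three kanji: "nihongo" (Japanese language)
english = "Hello"

for label, text in (("Japanese", japanese), ("English", english)):
    utf8_bytes, utf16_bytes = encoded_sizes(text)
    print(f"{label}: UTF-8 = {utf8_bytes} bytes, UTF-16 = {utf16_bytes} bytes")
```

For the three kanji this gives 9 bytes in UTF-8 against 6 in UTF-16,
while "Hello" takes 5 bytes in UTF-8 against 10 in UTF-16. Real files
usually mix in ASCII (markup, spaces, digits), so the actual ratio
depends on the text.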

You can read some more about it at http://en.wikipedia.org/wiki/UTF-8.

Patrice


