On 25.07.2005 at 05:32, Sean Schertell wrote:
UTF-8 sucks for Japanese and Chinese texts, mainly due to space reasons. If anything makes sense, then it is UTF-16, which TextMate also supports.
Could you explain what you mean by "space reasons"?
Due to the way UTF-8 works, it uses 1 byte for US-ASCII characters, but up to four bytes for other characters, depending on the Unicode code point. Many alphabets can be encoded with two bytes per character (especially the European ones, but also Hebrew or Arabic). Chinese and Japanese characters require three bytes, or four for the rarer ones.
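To make the byte counts concrete, here is a small Python sketch. The sample characters are my own picks for illustration and are not from the original discussion; the helper name utf8_len is likewise just made up for this example.

```python
# Sample characters chosen only for illustration (assumption, not from the post).
# Escapes are used so the file itself stays plain ASCII.
samples = {
    "A": "US-ASCII letter",
    "\u00e9": "Latin small letter e with acute",
    "\u05d0": "Hebrew letter alef",
    "\u65e5": "CJK ideograph (Japanese/Chinese 'day, sun')",
    "\U00020000": "CJK ideograph outside the Basic Multilingual Plane",
}

def utf8_len(ch):
    """Return how many bytes one character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))

for ch, desc in samples.items():
    print(f"U+{ord(ch):04X} {desc}: {utf8_len(ch)} byte(s)")
```

This should report 1, 2, 2, 3 and 4 bytes respectively.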
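UTF-16, on the other hand, encodes nearly everything in two bytes, including practically all Chinese and Japanese characters in common use; only characters outside the Basic Multilingual Plane take four bytes. So that's why for Chinese or Japanese texts you will waste space when using UTF-8 compared to UTF-16 (three bytes per character instead of two), while with English and most European languages you will save a lot of space using UTF-8 compared to UTF-16. And the latter was IMHO one of the main reasons for developing UTF-8.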
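If you want to see the difference yourself, a rough Python sketch follows. The two sample strings and the function name encoded_sizes are assumptions of mine, picked only for illustration; the sizes are measured without a byte order mark.

```python
def encoded_sizes(text):
    """Return (UTF-8 size, UTF-16 size) in bytes, both without a byte order mark."""
    # "utf-16-le" is used instead of "utf-16" so Python does not prepend a 2-byte BOM.
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

# Hypothetical sample texts, not taken from the original post.
english = "Hello, world"
japanese = "\u3053\u3093\u306b\u3061\u306f\u4e16\u754c"  # "konnichiwa sekai", 7 characters

for label, text in (("English", english), ("Japanese", japanese)):
    utf8_size, utf16_size = encoded_sizes(text)
    print(f"{label}: {len(text)} chars, UTF-8 {utf8_size} bytes, UTF-16 {utf16_size} bytes")
```

For these samples, the English string takes 12 bytes in UTF-8 versus 24 in UTF-16, while the Japanese string takes 21 bytes in UTF-8 versus 14 in UTF-16. Real documents usually mix in ASCII (markup, spaces, digits), so the actual ratio depends on the text.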
You can read some more about it at http://en.wikipedia.org/wiki/UTF-8.
Patrice