[TxMt] NEED Japanese Text Encoding! (pretty please?)

Allan Odgaard allan at macromates.com
Tue Jul 26 05:11:19 UTC 2005


On 26/07/2005, at 6.26, Patrice Neff wrote:

> [...] while with English and most European languages you will save  
> a lot of space using UTF-8 compared to UTF-16. And the latter was  
> IMHO one of the main reasons for developing UTF-8.

Well, at best you'll save 50%, whereas enabling gzip as transfer
compression will likely save you >75% :)
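
A rough way to check both figures, as a minimal sketch: the sample text is made up, zlib stands in for gzip transfer compression (both use DEFLATE), and the compression saving depends entirely on the input. A highly repetitive sample like this one overstates it.

```python
import zlib

# Hypothetical sample: pure ASCII English text, repeated to mimic a longer page.
text = "The quick brown fox jumps over the lazy dog. " * 200

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")  # little-endian, no byte order mark

# For pure ASCII input, UTF-8 needs exactly half the bytes of UTF-16.
saving_vs_utf16 = 1 - len(utf8) / len(utf16)

# Compress the UTF-8 bytes with DEFLATE and measure the saving.
compressed = zlib.compress(utf8)
saving_from_compression = 1 - len(compressed) / len(utf8)

print(f"UTF-8 vs UTF-16 saving: {saving_vs_utf16:.0%}")
print(f"compression saving:     {saving_from_compression:.0%}")
```

The 50% figure is the best case: it only holds when every character is ASCII. Text with many characters outside the ASCII range narrows the gap, and for some scripts UTF-8 ends up larger than UTF-16.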

The motivation for UTF-8 is that ASCII characters are encoded exactly
as they would be in a plain ASCII document.
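
This property is easy to verify. A minimal sketch, where the sample strings are made up for illustration:

```python
# Hypothetical sample: a line of source code using only ASCII characters.
ascii_text = "int main() { return 0; }"

# Encoding pure ASCII text as UTF-8 yields the same bytes as encoding it as ASCII.
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("ascii")

# Non-ASCII characters (here U+00E6, U+00F8, U+00E5) become multi-byte
# sequences in which every byte has the high bit set (>= 0x80), so none of
# those bytes can be mistaken for an ASCII character.
non_ascii_bytes = "\u00e6\u00f8\u00e5".encode("utf-8")
all_high = all(b >= 0x80 for b in non_ascii_bytes)

print(same_bytes, all_high)
```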

This means that a lot of existing software doesn't need to be updated
to actually handle UTF-8 (as long as it is 8-bit clean). For
example, I use UTF-8 for my source code even though my compiler isn't
UTF-8 aware, which means I can use non-ASCII characters in strings and
comments. Some compilers/interpreters (e.g. PHP) will also allow
user-defined variable names to be in UTF-8 (while still only knowing
about the ASCII tokens).

So UTF-8 exists because a lot of software is made to work with 8-bit
sequences (not 16-bit, as UTF-16 would have called for), and some
software will look for tokens encoded as ASCII in these 8-bit sequences.
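
To illustrate the token point, here is a minimal sketch of a byte-oriented scanner that only knows ASCII. The source line and the `quote_offsets` helper are hypothetical; the bullet character U+2022 is chosen because its UTF-16 encoding happens to contain the byte 0x22, the ASCII double quote.

```python
# Hypothetical source line: a string literal containing a bullet (U+2022).
line = 'puts "\u2022 item";'

utf8 = line.encode("utf-8")
utf16 = line.encode("utf-16-le")  # little-endian, no byte order mark

def quote_offsets(data):
    """Return the offsets of every byte equal to the ASCII double quote (0x22)."""
    return [i for i, b in enumerate(data) if b == 0x22]

# In UTF-8 only the two real quotes match: the bullet is E2 80 A2, all >= 0x80.
print(quote_offsets(utf8))   # [5, 14]

# In UTF-16 the bullet is encoded as the bytes 22 20, so the scanner sees a
# third, spurious quote at offset 12.
print(quote_offsets(utf16))  # [10, 12, 24]
```

Software that scans bytes for ASCII tokens therefore keeps working on UTF-8 input without changes, while the same scanner misreads UTF-16.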

UTF-8 is a brilliant way to give this software access to the full
Unicode range.




