[TxMt] NEED Japanese Text Encoding! (pretty please?)
Allan Odgaard
allan at macromates.com
Tue Jul 26 05:11:19 UTC 2005
On 26/07/2005, at 6.26, Patrice Neff wrote:
> [...] while with English and most European languages you will save
> a lot of space using UTF-8 compared to UTF-16. And the latter was
> IMHO one of the main reasons for developing UTF-8.
Well, at best you'll save 50%, whereas enabling gzip as transfer
compression will likely save you >75% :)
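A quick way to see both numbers for yourself is to encode some text both ways and gzip it. This is a minimal sketch using only Python's standard library; `encoded_sizes` and the sample text are illustrative assumptions. The sample is deliberately repetitive, so it will compress far better than ordinary prose does.

```python
import gzip


def encoded_sizes(text: str) -> dict[str, int]:
    """Byte counts of `text` as UTF-8 and UTF-16 (no BOM), raw and gzipped."""
    utf8 = text.encode("utf-8")
    utf16 = text.encode("utf-16-le")  # little-endian, so no byte order mark
    return {
        "utf-8": len(utf8),
        "utf-16": len(utf16),
        "utf-8+gzip": len(gzip.compress(utf8)),
        "utf-16+gzip": len(gzip.compress(utf16)),
    }


# Hypothetical, highly repetitive ASCII-only sample text.
sample = "UTF-8 encodes ASCII characters as single bytes. " * 200
sizes = encoded_sizes(sample)
print(sizes)
```

For pure ASCII input the UTF-16 size is exactly twice the UTF-8 size (the "at best 50%" case); for text that is mostly CJK, UTF-8 is actually the larger of the two.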
The motivation for UTF-8 is that ASCII characters are encoded as they
would have been, had it been a plain ASCII document.
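To make the ASCII compatibility concrete, here is a minimal sketch of the UTF-8 encoding rules as a hypothetical `encode_utf8` helper, checked against Python's built-in codec. Code points below 0x80 come out as the identical single byte; everything else becomes 2 to 4 bytes, all of which have the high bit set.

```python
def encode_utf8(code_point: int) -> bytes:
    """Encode one Unicode scalar value as UTF-8 (illustrative, not optimized)."""
    if code_point < 0 or code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {code_point:#x}")
    if code_point < 0x80:
        # ASCII range: one byte, identical to plain ASCII.
        return bytes([code_point])
    if code_point < 0x800:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6), 0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


# Compare against Python's own UTF-8 codec.
for ch in "Aé本😀":
    assert encode_utf8(ord(ch)) == ch.encode("utf-8")
```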
This means that a lot of existing software doesn't need to be updated
to handle UTF-8 (as long as it is 8-bit clean). For
example, I use UTF-8 for my source code even though my compiler isn't
UTF-8 aware; this means I can use non-ASCII characters in strings and
comments. Some compilers/interpreters (e.g. PHP) will also allow
user-defined variable names to be in UTF-8 (while still only knowing
about the ASCII tokens).
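The compiler case can be illustrated with a byte-level scanner that, like an 8-bit clean but UTF-8-unaware tool, only knows about ASCII tokens. This is a minimal sketch; `extract_string_literals` is a hypothetical helper and not how PHP or any particular compiler works. It handles UTF-8 input correctly because every byte of a multi-byte UTF-8 sequence is >= 0x80, so none can be mistaken for `"` (0x22) or `\` (0x5C).

```python
def extract_string_literals(source: bytes) -> list[bytes]:
    """Return the raw bytes between pairs of ASCII double quotes.

    Only the ASCII bytes '"' (0x22) and '\\' (0x5C) are interpreted;
    every other byte, including UTF-8 lead and continuation bytes,
    is passed through untouched.
    """
    literals: list[bytes] = []
    current = bytearray()
    in_string = False
    escaped = False
    for b in source:
        if not in_string:
            if b == 0x22:  # opening quote
                in_string = True
                current.clear()
            continue
        if escaped:
            current.append(b)  # byte after a backslash is kept verbatim
            escaped = False
        elif b == 0x5C:  # backslash starts an escape
            current.append(b)
            escaped = True
        elif b == 0x22:  # closing quote
            literals.append(bytes(current))
            in_string = False
        else:
            current.append(b)
    return literals


# Hypothetical PHP-like source containing non-ASCII strings and comments.
source = '$greeting = "こんにちは"; // 挨拶\n$name = "Allan";'.encode("utf-8")
decoded = [lit.decode("utf-8") for lit in extract_string_literals(source)]
print(decoded)
```

The scanner never decodes anything, yet the literals it returns are still valid UTF-8, which is exactly the property that lets older 8-bit tools pass UTF-8 through.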
So UTF-8 exists because a lot of software is made to work with 8-bit
sequences (not 16-bit, as UTF-16 would have called for), and some
software will look for tokens encoded as ASCII in these 8-bit sequences.
UTF-8 is a brilliant way to give this software access to the full
Unicode range.
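For contrast, here is a rough sketch of what happens when the same kind of byte-oriented tool is fed UTF-16 instead. `split_on_ascii_comma` and `c_strlen` are hypothetical stand-ins for 8-bit software (a CSV-style field splitter and a C-style NUL-terminated string length). In UTF-16LE, ASCII text contains NUL bytes, and a character such as 本 (U+672C) is stored as the bytes 0x2C 0x67, the first of which is an ASCII comma.

```python
def split_on_ascii_comma(data: bytes) -> list[bytes]:
    """Simulate an 8-bit tool that splits fields on the ASCII byte ',' (0x2C)."""
    return data.split(b",")


def c_strlen(data: bytes) -> int:
    """Simulate C's strlen(): count bytes up to the first NUL."""
    nul = data.find(b"\x00")
    return len(data) if nul == -1 else nul


text = "日本,語"  # two fields separated by one ASCII comma
utf8_fields = split_on_ascii_comma(text.encode("utf-8"))
utf16_fields = split_on_ascii_comma(text.encode("utf-16-le"))

# UTF-8: the comma only matches the real separator.
print([f.decode("utf-8") for f in utf8_fields])
# UTF-16LE: the low byte of 本 also matches, so the data is cut mid-character.
print(utf16_fields)

# ASCII text in UTF-16LE is full of NUL bytes, which C strings treat as the end.
print(c_strlen("Hello".encode("utf-8")), c_strlen("Hello".encode("utf-16-le")))
```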