Re: [TxMt] NEED Japanese Text Encoding! (pretty please?)

26 Jul 2005


On 26/07/2005, at 6.26, Patrice Neff wrote:
...
> [...] while with English and most European languages you will save
> a lot of space using UTF-8 compared to UTF-16. And the latter was
> IMHO one of the main reasons for developing UTF-8.
Well, at best you'll save 50%, whereas enabling gzip as transfer
compression will likely save you >75% :)
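To put rough numbers on that, here is a minimal Python sketch that compares the raw and gzipped sizes of the same text in both encodings. The `encoded_sizes` helper and the sample sentence are made up for illustration, and the sample is a repeated sentence, which compresses unusually well, so real documents will give different ratios.

```python
import gzip

def encoded_sizes(text: str) -> dict:
    """Return byte counts for text in UTF-8 and UTF-16, raw and gzipped."""
    utf8 = text.encode("utf-8")
    utf16 = text.encode("utf-16-le")  # little-endian, no byte order mark
    return {
        "utf-8": len(utf8),
        "utf-16": len(utf16),
        "utf-8 + gzip": len(gzip.compress(utf8)),
        "utf-16 + gzip": len(gzip.compress(utf16)),
    }

# Illustrative ASCII-only sample; repetition makes it very compressible.
sample = ("The motivation for UTF-8 is that ASCII characters are encoded "
          "as they would have been in a plain ASCII document. ") * 20

for name, size in encoded_sizes(sample).items():
    print(f"{name:>14}: {size} bytes")
```

For pure ASCII input the UTF-8 size is exactly half the UTF-16 size, which is the 50% best case; for text that is mostly CJK the comparison can go the other way.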
The motivation for UTF-8 is that ASCII characters are encoded exactly
as they would be in a plain ASCII document.
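This is easy to check from Python. The sketch below uses two hypothetical helpers, `same_as_ascii` and `utf8_hex`, to show that ASCII text produces identical bytes in both encodings, while non-ASCII characters become multi-byte sequences.

```python
def same_as_ascii(text: str) -> bool:
    """True if the UTF-8 bytes equal the ASCII bytes.

    Raises UnicodeEncodeError if text contains non-ASCII characters.
    """
    return text.encode("ascii") == text.encode("utf-8")

def utf8_hex(ch: str) -> str:
    """Return the UTF-8 bytes of ch as space-separated hex (Python 3.8+)."""
    return ch.encode("utf-8").hex(" ")

print(same_as_ascii("plain ASCII text"))  # True

for ch in "aé日":
    print(ch, utf8_hex(ch))
# a 61
# é c3 a9
# 日 e6 97 a5
```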
This means that a lot of existing software doesn't need to be updated
to actually handle UTF-8 (as long as it is 8-bit clean). For example,
I use UTF-8 for my source code even though my compiler isn't UTF-8
aware, which means I can use non-ASCII in strings and comments. Some
compilers/interpreters (e.g. PHP) will also allow user-defined
variable names to be in UTF-8 (while still only knowing about the
ASCII tokens).
So UTF-8 exists because a lot of software is made to work with 8-bit
sequences (not 16-bit, as UTF-16 would have called for), and some
software will look for tokens encoded as ASCII in these 8-bit sequences.
UTF-8 is a brilliant way to give this software access to the full
Unicode range.
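As a small illustration of the point about ASCII tokens, here is a Python sketch of byte-oriented code that knows nothing about UTF-8 and only looks for the ASCII comma. The `split_fields` function and the sample line are invented for this example; the reason it works is that every byte of a multi-byte UTF-8 sequence has the high bit set, so it can never be mistaken for an ASCII character.

```python
def split_fields(raw: bytes) -> list:
    """Split on the ASCII comma byte (0x2C) with no knowledge of UTF-8."""
    return raw.split(b",")

# Illustrative input: comma-separated fields containing non-ASCII text.
line = "名前,東京,café".encode("utf-8")

fields = [field.decode("utf-8") for field in split_fields(line)]
print(fields)  # ['名前', '東京', 'café']

# All bytes that make up non-ASCII characters are >= 0x80, so an
# ASCII token search cannot match in the middle of a character.
non_ascii_bytes = "名前東京é".encode("utf-8")
print(all(b >= 0x80 for b in non_ascii_bytes))  # True

# UTF-16, by contrast, is not byte-compatible with ASCII: even "A"
# gains a zero byte, which 8-bit software often treats as a terminator.
print("A".encode("utf-16-le"))  # b'A\x00'
```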
