On Fri 2/12/2005, Chris Thomas wrote:
If you want additional detail about Unicode encoding geekery:
http://www.tbray.org/ongoing/When/200x/2003/04/26/UTF ...
UTF-16 is probably what most people thought most programmers would use for Unicode; this is reflected in the fact that the native character type in both Java and C# is a sixteen-bit quantity. Of course, it doesn't really represent a Unicode character, exactly (although it does most of the time); it represents a UTF-16 code unit.
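A minimal Java sketch (class name is just for illustration) makes the distinction visible: a character outside the Basic Multilingual Plane occupies two chars.

    public class CodeUnits {
        public static void main(String[] args) {
            // U+1D11E MUSICAL SYMBOL G CLEF lies outside the BMP,
            // so UTF-16 encodes it as a surrogate pair of two code units.
            String clef = new String(Character.toChars(0x1D11E));
            System.out.println(clef.length());                          // 2 code units
            System.out.println(clef.codePointCount(0, clef.length()));  // 1 character
        }
    }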
UTF-16 is about the most efficient way possible of representing Asian character strings, each character (in the Basic Multilingual Plane, at least) nestling snugly into two bytes of storage. For ASCII characters, of course, you end up using two bytes to represent what would actually fit into one.
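Those sizes are easy to check; a quick Java sketch (class name again illustrative) comparing UTF-16 and UTF-8 byte counts:

    import java.nio.charset.StandardCharsets;

    public class Sizes {
        public static void main(String[] args) {
            String cjk = "\u4e2d";   // U+4E2D, a common CJK ideograph
            String ascii = "A";
            // A BMP Asian character: 2 bytes in UTF-16 versus 3 in UTF-8.
            System.out.println(cjk.getBytes(StandardCharsets.UTF_16BE).length);   // 2
            System.out.println(cjk.getBytes(StandardCharsets.UTF_8).length);      // 3
            // An ASCII character: 2 bytes in UTF-16 where 1 would do.
            System.out.println(ascii.getBytes(StandardCharsets.UTF_16BE).length); // 2
            System.out.println(ascii.getBytes(StandardCharsets.UTF_8).length);    // 1
        }
    }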
...
except that there are two different UTF-16 byte orders (big-endian and little-endian), and that a bunch of preexisting software treats the 0 byte as the end of a string...
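Both points show up in a small Java sketch (class and helper names are just illustrative) that hex-dumps the same one-character string in each UTF-16 flavor:

    import java.nio.charset.StandardCharsets;

    public class Endian {
        public static void main(String[] args) {
            String s = "A";
            // Same text, two byte orders; note the zero byte either way,
            // which NUL-terminated C string handling reads as end-of-string.
            printHex(s.getBytes(StandardCharsets.UTF_16BE)); // 00 41
            printHex(s.getBytes(StandardCharsets.UTF_16LE)); // 41 00
            // The unmarked "UTF-16" charset writes a byte order mark first.
            printHex(s.getBytes(StandardCharsets.UTF_16));   // fe ff 00 41
        }

        static void printHex(byte[] bytes) {
            StringBuilder sb = new StringBuilder();
            for (byte b : bytes) sb.append(String.format("%02x ", b & 0xff));
            System.out.println(sb.toString().trim());
        }
    }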