On 30.06.2008, at 13:04, Vincent Noel wrote:
On Mon, Jun 30, 2008 at 12:49, Hans-Joerg Bibiko bibiko@eva.mpg.de wrote:
The only definite way to determine the encoding is to parse the ENTIRE file, or at least to parse up to the first byte sequence that identifies the encoding unambiguously. Or one relies on the (discouraged) UTF-8 BOM (byte order mark at the beginning of a file).
Ok... So I guess the real bug is that Quick Look and other utilities decide to fall back to MacRoman instead of UTF-8.
This would be one possibility, but the whole issue is much more complicated. E.g. it is not possible to distinguish between the encodings of the ISO-8859 family if the text is stored in one of them: every byte sequence is valid in each of them, but the same byte represents a different glyph depending on the encoding. Even for UTF-8 it is very complex. For instance, the UTF-8 byte sequence C3 A4 (ä) is also valid ISO-8859-1, where it decodes as "Ã¤" [and it is conceivable that this is what the author actually meant].
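The ambiguity can be reproduced in a few lines. This is just an illustrative sketch (Python, chosen for brevity, is not part of the original thread):

```python
# The two bytes C3 A4 are the UTF-8 encoding of "ä" (U+00E4),
# but they are also a perfectly valid ISO-8859-1 byte sequence.
data = bytes([0xC3, 0xA4])

print(data.decode("utf-8"))       # one character: ä
print(data.decode("iso-8859-1"))  # two characters: Ã¤ -- equally "valid"

# The reverse is not symmetric: a lone Latin-1 "ä" (byte E4) is an
# incomplete multi-byte sequence in UTF-8 and fails to decode.
try:
    bytes([0xE4]).decode("utf-8")
except UnicodeDecodeError:
    print("E4 alone is not valid UTF-8")
```

This is why a heuristic detector can prove a file is *not* UTF-8, but can never prove that a byte stream *is* ISO-8859-1 rather than one of its siblings.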
My general suggestion to Apple would be to introduce a dedicated file attribute, e.g. 'encoding'. Each application could then store the correct text encoding in that file attribute.
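Something very close to this already exists in the form of extended attributes. A minimal sketch of the idea, assuming a Linux filesystem that supports the `user.` xattr namespace (on macOS the analogous mechanism would be an `xattr` such as `com.apple.TextEncoding`; the attribute name `user.encoding` here is purely hypothetical):

```python
import errno
import os

def set_encoding_attr(path: str, encoding: str) -> None:
    # Record the text encoding in an extended attribute alongside the file.
    # os.setxattr is Linux-only; macOS would use the xattr tool/module instead.
    os.setxattr(path, b"user.encoding", encoding.encode("ascii"))

def get_encoding_attr(path: str):
    # Return the stored encoding, or None if no attribute was set
    # (or the filesystem does not support extended attributes).
    try:
        return os.getxattr(path, b"user.encoding").decode("ascii")
    except OSError as e:
        if e.errno in (errno.ENODATA, errno.ENOTSUP):
            return None
        raise
```

With such an attribute in place, Quick Look and friends would not need to guess: they could read the declared encoding first and only fall back to heuristics when it is absent.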
--Hans