[TxMt] Re: unicode issue with QuickLook on Leopard

Hans-Jörg Bibiko bibiko at eva.mpg.de
Mon Jun 30 14:04:58 UTC 2008


On 30.06.2008, at 13:04, Vincent Noel wrote:

> On Mon, Jun 30, 2008 at 12:49, Hans-Joerg Bibiko  
> <bibiko at eva.mpg.de> wrote:
>> The only definite way to get the encoding is to parse the ENTIRE file,
>> or to parse up to the first byte sequence that determines the encoding
>> unambiguously. Or one uses the obsolete UTF-8 BOM (byte order mark at
>> the beginning of a file).
>
> Ok... So I guess the real bug is that Quicklook and other utilities
> decide to fall back on MacRoman instead of utf8.
>

This would be one possibility. But the whole issue is much more  
complicated.
E.g. it is not possible to tell which encoding was used if the text
is stored in one of the ISO-8859 variants. Each byte sequence would be
valid, but the same byte represents a different character depending on
the encoding.
Even for UTF-8 it is very complex. For instance, the byte sequence
C3 A4 is "ä" in UTF-8, but it is also valid ISO-8859-1, where it reads
"Ã¤" [and it could be that this is what the author actually meant].
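
The ambiguity can be checked directly. Here is a minimal Python sketch (not how QuickLook or TextMate actually work). The helper names decode_candidates and guess_encoding are made up for illustration, and the heuristic is just the "BOM first, otherwise validate the whole file as UTF-8" idea from the quoted text:

```python
import codecs


def decode_candidates(data, encodings):
    """Decode the same bytes under each encoding; None marks a failed decode."""
    results = {}
    for name in encodings:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


def guess_encoding(data):
    """Hypothetical heuristic: look for a UTF-8 BOM, else validate ALL bytes as UTF-8."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    try:
        data.decode("utf-8")  # has to scan the entire input
        return "utf-8"        # valid UTF-8, but still only a guess
    except UnicodeDecodeError:
        return None           # some 8-bit encoding; the bytes cannot say which


# The two bytes C3 A4 are valid in all three encodings, with different text.
sample = b"\xc3\xa4"
readings = decode_candidates(sample, ["utf-8", "iso-8859-1", "mac_roman"])

# The single byte E4 is invalid UTF-8 but valid in every ISO-8859 variant tried.
single = decode_candidates(
    b"\xe4", ["utf-8", "iso-8859-1", "iso-8859-5", "iso-8859-7"]
)

for name, text in {**readings, **single}.items():
    print(name, repr(text))
```

Note that guess_encoding returns "utf-8" for C3 A4 even though the same bytes are perfectly valid ISO-8859-1 and MacRoman. The check can only rule UTF-8 out, never prove it.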

My general suggestion to Apple would be to introduce a dedicated file
attribute 'encoding'. That way each application could store the
correct text encoding in that attribute of the file.

--Hans

