[TxMt] Re: unicode issue with QuickLook on Leopard

Hans-Jörg Bibiko bibiko at eva.mpg.de
Mon Jun 30 14:04:58 UTC 2008

On 30.06.2008, at 13:04, Vincent Noel wrote:

> On Mon, Jun 30, 2008 at 12:49, Hans-Joerg Bibiko  
> <bibiko at eva.mpg.de> wrote:
>> The only definite way to get the encoding is to parse the ENTIRE file
>> or parse up to the first byte sequence that determines the encoding
>> unambiguously. Or one uses the obsolete UTF-8 BOM (byte order mark at
>> the beginning of a file).
> OK... So I guess the real bug is that QuickLook and other utilities
> fall back on MacRoman instead of UTF-8.
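The order of checks described in the quote can be sketched roughly as follows. This is only an illustration, not what QuickLook actually does: it assumes Python's standard codecs, uses MacRoman as the fallback because that is the behaviour discussed here, and the function name `guess_encoding` is hypothetical.

```python
import codecs

def guess_encoding(data: bytes) -> str:
    """Hypothetical helper: pick a decoder for raw file bytes."""
    # 1. A UTF-8 BOM (EF BB BF) at the very start settles the question at once.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # 2. Otherwise the ENTIRE buffer has to be checked: a file can be pure
    #    ASCII (valid in almost every encoding) for a long stretch and only
    #    reveal a non-UTF-8 byte near the end.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # 3. Not valid UTF-8: fall back to a legacy single-byte encoding.
        #    MacRoman is assumed here, mirroring the fallback in this thread.
        return "mac_roman"
```

For example, `guess_encoding(b"\xc3\xa4")` returns `"utf-8"`, while a lone `b"\x8a"` is not valid UTF-8 and falls through to `"mac_roman"`.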

This would be one possibility. But the whole issue is much more
complex. E.g. it is not possible to tell the text encoding apart if the
text is stored in one of ISO-8859-1..12: every byte sequence would be
valid, but a given byte represents a different glyph in each encoding.
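A small illustration of this ambiguity in Python, assuming three ISO-8859 parts chosen for the example (Latin-1, Cyrillic and Greek). Here the same byte decodes without error under each one but gives a different character every time:

```python
# One byte, three ISO-8859 parts: every decode succeeds, so validity alone
# says nothing about which encoding the writer actually used.
data = b"\xe4"
readings = {
    enc: data.decode(enc)
    for enc in ("iso-8859-1", "iso-8859-5", "iso-8859-7")
}
for enc, text in readings.items():
    print(enc, text)  # Latin-1 gives 'ä', Cyrillic 'ф', Greek 'δ'
```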
Even for UTF-8 it is very complex. For instance, the UTF-8 byte
sequence C3 A4 (ä) is also valid ISO-8859-1 (Ã¤) [and it is conceivable
that this reading is the intended one].
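The C3 A4 case can be checked directly in Python. Both decodes succeed, so a program reading the file cannot rule out either interpretation just by testing for errors:

```python
data = bytes([0xC3, 0xA4])
as_utf8 = data.decode("utf-8")      # one character:  'ä'  (U+00E4)
as_latin1 = data.decode("latin-1")  # two characters: 'Ã¤' (U+00C3 U+00A4)
print(as_utf8, as_latin1)
```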

My general suggestion to Apple would be to introduce a unique file
attribute 'encoding'. Each application could then store the correct
text encoding of a file in that attribute.
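A rough sketch of how a reader might use such an attribute if it existed. The `declared` argument stands in for the value of the proposed 'encoding' attribute. How that value would be stored and read back is left open, and the function name `decode_file` is hypothetical. When a label is present there is nothing left to guess. Without one, the program is back to the UTF-8-then-MacRoman heuristic discussed above.

```python
from typing import Optional

def decode_file(data: bytes, declared: Optional[str]) -> str:
    """Hypothetical reader: trust a declared encoding, otherwise guess."""
    # 'declared' models the proposed per-file 'encoding' attribute.
    if declared:
        return data.decode(declared)
    # No attribute available: guess, preferring UTF-8 when it is valid.
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode("mac_roman")
```

With a label, `decode_file(b"\xc3\xa4", "latin-1")` yields the two characters the writer intended instead of being silently read as UTF-8 'ä'.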

