[TxMt] Re: unicode issue with QuickLook on Leopard
bibiko at eva.mpg.de
Mon Jun 30 14:04:58 UTC 2008
On 30.06.2008, at 13:04, Vincent Noel wrote:
> On Mon, Jun 30, 2008 at 12:49, Hans-Joerg Bibiko
> <bibiko at eva.mpg.de> wrote:
>> The only definite way to get the encoding is to parse the ENTIRE file,
>> or to parse up to the first byte sequence that determines the encoding
>> unambiguously. Or one uses the obsolete UTF-8 BOM (byte order mark at
>> the beginning of a file).
> Ok... So I guess the real bug is that Quicklook and other utilities
> decide to fall back on MacRoman instead of utf8.
This would be one possibility. But the whole issue is much more complex.
E.g. it is not possible to tell which encoding was used if the text
is stored in one of ISO-8859-1..12. Practically every byte sequence is valid
in all of them, but the same byte represents a different glyph in each encoding.
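To illustrate, here is a short Python sketch that decodes one and the same byte, 0xE4, with several ISO-8859 parts. Every decode succeeds, so nothing in the bytes themselves tells a program which reading the writer meant. The byte and the list of encodings are arbitrary choices made only for this illustration.

```python
data = b"\xe4"  # a single byte, as it might appear in a file of unknown encoding

# ISO-8859 parts that all assign some character to 0xE4 (chosen for illustration).
candidates = ["iso-8859-1", "iso-8859-2", "iso-8859-5", "iso-8859-7", "iso-8859-9"]

# None of these decodes raises: the byte is "valid" in every one of them.
readings = {enc: data.decode(enc) for enc in candidates}

for enc, text in readings.items():
    print(f"{enc}: {text}")  # ä, ä, ф, δ, ä respectively
```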
Even for UTF-8 it is very complex. For instance, the UTF-8 byte
sequence C3 A4 (ä) is also valid ISO-8859-1, where it reads as "Ã¤" (and
it is conceivable that this is what was meant).
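A minimal Python sketch of the heuristic discussed in this thread: check for a UTF-8 BOM, otherwise try to decode the ENTIRE buffer as strict UTF-8, otherwise fall back to a legacy single-byte encoding. The function name `guess_encoding` is hypothetical, and MacRoman is used as the fallback only because that is the behaviour reported for QuickLook. It also shows why the result cannot be definitive: the bytes C3 A4 pass as UTF-8 even if the writer actually meant ISO-8859-1.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Hypothetical heuristic: BOM, then strict UTF-8, then a legacy fallback."""
    # 1. A UTF-8 BOM at the start of the data settles the question.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # 2. If the whole buffer decodes as strict UTF-8, assume UTF-8.
    try:
        data.decode("utf-8")  # strict mode raises on any invalid sequence
        return "utf-8"
    except UnicodeDecodeError:
        # 3. Otherwise guess a single-byte encoding (MacRoman, as QuickLook does).
        return "mac_roman"


# Valid UTF-8 is accepted as UTF-8 ...
print(guess_encoding("ä".encode("utf-8")))      # utf-8
# ... although the same bytes C3 A4 are equally valid ISO-8859-1:
print(b"\xc3\xa4".decode("latin-1"))             # Ã¤
# ISO-8859-1 text that is not valid UTF-8 falls through to the fallback.
print(guess_encoding("café".encode("latin-1")))  # mac_roman
```

Swapping MacRoman for another single-byte fallback does not change the picture: steps 2 and 3 remain guesses, and only step 1 (or metadata stored with the file) is unambiguous.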
My general suggestion to Apple would be to introduce a unique
file attribute 'encoding'. That way each application could store the
correct text encoding in that attribute of the file.