[TxMt] Re: unicode issue with QuickLook on Leopard
Hans-Jörg Bibiko
bibiko at eva.mpg.de
Mon Jun 30 14:04:58 UTC 2008
On 30.06.2008, at 13:04, Vincent Noel wrote:
> On Mon, Jun 30, 2008 at 12:49, Hans-Joerg Bibiko
> <bibiko at eva.mpg.de> wrote:
>> The only definite way to get the encoding is to parse the ENTIRE file,
>> or to parse up to the first byte sequence that determines the
>> encoding unambiguously. Or one uses the obsolete UTF-8 BOM (byte
>> order mark at the beginning of a file).
>
> Ok... So I guess the real bug is that Quicklook and other utilities
> decide to fall back on MacRoman instead of utf8.
>
This would be one possibility. But the whole issue is much more
complicated.
E.g. it is not possible to tell the text encoding apart if the text
is stored in one of the ISO-8859-x encodings. Nearly every byte
sequence would be valid in each of them, but the same byte represents
a different character depending on the encoding.
Even for UTF-8 it is very complex. For instance the UTF-8 byte
sequence C3 A4 (ä) is also valid ISO-8859-1 (Ã¤) [and it could be
that this is really the text that was meant].
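
To make the ambiguity concrete, here is a minimal Python sketch. It
decodes the same two bytes under several encodings and then shows the
kind of fallback heuristic discussed above. The helper name
guess_encoding and its default fallback are assumptions made for
illustration only; this is not how QuickLook or TextMate actually
decide.

```python
import codecs

# The two bytes C3 A4, i.e. "ä" encoded as UTF-8.
data = "ä".encode("utf-8")

# The same bytes decode without error under many 8-bit encodings,
# each time producing different characters.
for name in ("utf-8", "iso-8859-1", "iso-8859-5", "iso-8859-7", "mac_roman"):
    print(name, repr(data.decode(name)))


def guess_encoding(raw: bytes, fallback: str = "iso-8859-1") -> str:
    """Hypothetical helper: returns a best guess, never a certainty."""
    # A UTF-8 BOM at the start is an explicit hint.
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    try:
        # Strict decoding raises on the first invalid byte sequence,
        # so the whole input has to be scanned to accept it as UTF-8.
        raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8. Which 8-bit encoding it is cannot be
        # determined from the bytes alone, so this is only a default.
        return fallback
    return "utf-8"
```

Note that a result of "utf-8" only means the bytes are valid UTF-8.
As the C3 A4 example shows, they may still have been written as
ISO-8859-1, and no heuristic can rule that out from the bytes alone.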
My general suggestion to Apple would be to introduce a unique file
attribute 'encoding'. That way each application could store the
correct text encoding in that attribute.
--Hans