[TxMt] Re: unicode issue with QuickLook on Leopard
Hans-Joerg Bibiko
bibiko at eva.mpg.de
Mon Jun 30 10:49:06 UTC 2008
On 30 Jun 2008, at 12:33, Vincent Noel wrote:
> It's
> especially weird that it seems somebody noticed the problem, and
> decided to fix it using extended attributes when more standard tools
> (e.g. the 'file' command) are perfectly able to identify utf8 without
> non-standard trickery...
Using 'file' is a good idea, BUT it only examines the first (I don't
know how many) bytes of a file. E.g. if you have a rather large UTF-8
file containing 'normal' ASCII and the last character is, say, a ü,
'file' will output "test.txt: ASCII text, with very long lines" or
similar.
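To illustrate why a limited look-ahead misses a late ü, here is a minimal Python sketch of a prefix-only heuristic. The function name `classify_prefix` and the 4096-byte window are hypothetical; this is not how 'file' is actually implemented, only a model of the failure mode.

```python
import codecs

def classify_prefix(data: bytes, limit: int) -> str:
    """Label data by inspecting only its first `limit` bytes (hypothetical heuristic)."""
    head = data[:limit]
    if head.isascii():
        return "ASCII text"
    # Incremental decoder with final=False tolerates a multi-byte
    # sequence cut off by the window, but still rejects invalid bytes.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(head, final=False)
        return "UTF-8 text"
    except UnicodeDecodeError:
        return "data"

# 100,000 ASCII characters followed by a single ü at the very end.
sample = ("a" * 100_000 + "ü").encode("utf-8")
print(classify_prefix(sample, 4096))         # ASCII text  (wrong)
print(classify_prefix(sample, len(sample)))  # UTF-8 text  (right)
```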
The only definite way to get the encoding is to parse the ENTIRE file,
or at least to parse up to the first byte sequence that determines the
encoding unambiguously. Or one uses the obsolete UTF-8 BOM (byte order
mark at the beginning of a file).
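A rough Python sketch of these approaches, assuming the file is already read into memory: check for the BOM first, otherwise scan to the first non-ASCII byte and validate the UTF-8 sequence starting there. The names `sniff_encoding` and `utf8_seq_len` and the return labels are made up for illustration. Strictly speaking, an invalid sequence rules UTF-8 out for certain, while a valid one only makes it very likely (e.g. the Latin-1 pair "Ã¼" is the same bytes as UTF-8 "ü").

```python
import codecs

def utf8_seq_len(lead: int) -> int:
    """Expected UTF-8 sequence length for a lead byte (0 = not a valid lead)."""
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0

def sniff_encoding(data: bytes) -> str:
    # A UTF-8 BOM at the start settles the question immediately.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8 (BOM)"
    # Scan for the first byte outside the ASCII range.
    first = next((i for i, b in enumerate(data) if b >= 0x80), None)
    if first is None:
        return "ascii"  # pure ASCII, which is also valid UTF-8
    n = utf8_seq_len(data[first])
    if n == 0:
        return "not utf-8"
    try:
        # Python's strict decoder also rejects overlong forms, surrogates,
        # and a sequence truncated by end of file.
        data[first:first + n].decode("utf-8")
    except UnicodeDecodeError:
        return "not utf-8"
    return "utf-8"

print(sniff_encoding(("a" * 100_000 + "ü").encode("utf-8")))    # utf-8
print(sniff_encoding(("a" * 100_000 + "ü").encode("latin-1")))  # not utf-8
```

Unlike the prefix-only check, this walks as far as needed to reach the first decisive byte, however late in the file it appears.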
--Hans
More information about the textmate mailing list