[TxMt] Re: unicode issue with QuickLook on Leopard

Hans-Joerg Bibiko bibiko at eva.mpg.de
Mon Jun 30 10:49:06 UTC 2008


On 30 Jun 2008, at 12:33, Vincent Noel wrote:

> It's especially weird that it seems somebody noticed the problem, and
> decided to fix it using extended attributes when more standard tools
> (e.g. the 'file' command) are perfectly able to identify utf8 without
> non-standard trickery...


Using 'file' is a good idea, BUT it only examines the first (I don't
know how many) bytes of a file. I.e. if you have a rather large UTF-8
file that is plain ASCII except for the very last character, e.g. a
ü, 'file' will output: "test.txt: ASCII text, with very long lines" or
similar.

The only definite way to get the encoding is to parse the ENTIRE file,
or to parse up to the first byte sequence that determines the encoding
unambiguously. Alternatively, one can rely on the discouraged UTF-8 BOM
(byte order mark at the beginning of a file).
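
To illustrate, here is a rough Python sketch of the "parse the entire
file" approach. The function name classify_encoding and the 4096-byte
prefix in the usage note are my own assumptions for illustration; I am
not claiming this is what 'file' or QuickLook actually does. Note also
that a successful UTF-8 decode only shows the bytes are *valid* UTF-8,
which makes that encoding very likely but not strictly proven.

```python
import codecs


def classify_encoding(data: bytes) -> str:
    """Classify bytes as 'ascii', 'utf-8', 'utf-8-bom', or 'unknown'.

    Hypothetical helper: it inspects ALL of `data`, not just a prefix.
    """
    has_bom = data.startswith(codecs.BOM_UTF8)

    # Pure 7-bit content is valid ASCII (and therefore also valid UTF-8).
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass

    # Strict decoding fails on the first malformed byte sequence.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "unknown"

    return "utf-8-bom" if has_bom else "utf-8"
```

With data = b"a" * 100000 + "ü".encode("utf-8"), checking only
data[:4096] gives "ascii", whereas checking all of data gives "utf-8".
That is the same kind of mismatch as described above for large files.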

--Hans

