[TxMt] Re: unicode issue with QuickLook on Leopard

30 Jun 2008


On 30 Jun 2008, at 12:33, Vincent Noel wrote:
> ...
> It's especially weird that it seems somebody noticed the problem, and
> decided to fix it using extended attributes when more standard tools
> (e.g. the 'file' command) are perfectly able to identify utf8 without
> non-standard trickery...

Using 'file' is a good idea, BUT it only looks at the first (I don't
know how many) bytes of a file. I.e. if you have a rather large
UTF-8 file that is plain ASCII except for the last character, e.g. a
ü, 'file' will output "test.txt: ASCII text, with very long lines" or
similar.
The only definite way to get the encoding is to parse the ENTIRE file,
or to parse up to the first byte sequence that determines the encoding
unambiguously. Alternatively, one can rely on the obsolete UTF-8 BOM
(byte order mark at the beginning of a file).
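
As a rough illustration of the "parse the ENTIRE file" approach, here is a
minimal Python sketch. The function names classify_bytes and classify_file
are made up for this example, and the check only distinguishes a UTF-8 BOM,
plain ASCII, valid UTF-8 and "anything else"; it does not try to guess
other encodings, and it is not a description of how 'file' itself works.

```python
import codecs


def classify_bytes(data: bytes) -> str:
    """Classify data by scanning all of it (hypothetical helper).

    Returns 'utf-8-bom', 'ascii', 'utf-8' or 'unknown'.
    """
    # A leading UTF-8 BOM is taken as a hint; the rest is not validated here.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-bom"
    try:
        # Strict decoding raises on the first invalid byte sequence.
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "unknown"
    # Valid UTF-8: it is plain ASCII only if no byte is >= 0x80.
    return "ascii" if data.isascii() else "utf-8"


def classify_file(path: str) -> str:
    """Read the whole file in binary mode and classify its contents."""
    with open(path, "rb") as f:
        return classify_bytes(f.read())


# A large ASCII file whose very last character is a 'ü'.
sample = b"a" * 1_000_000 + "ü".encode("utf-8")
print(classify_bytes(sample))         # utf-8
print(classify_bytes(sample[:4096]))  # ascii (a prefix-only check misses the ü)
```

The second call shows why looking only at the beginning of a file can
misreport it as ASCII: the single non-ASCII character sits at the very end.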
--Hans
