On 3. Jul 2007, at 14:48, Maxime Boissonneault wrote:
> I understand the problem now. However, how come TextMate can detect what encoding was used for a file, and the scanner cannot?
TextMate can *not* detect the encoding of your files. If you use UTF-n encoding, there is a 99.9999…% chance that it will get it right, but with *any* other encoding, TM’s guess is based on a frequency table and on how well your file matches that distribution when interpreted in the various encodings.
This, however, gives rather mixed results.
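A rough Python sketch of that two-stage strategy follows. It is not TextMate’s code: the frequency table and the list of candidate encodings are hypothetical toy values, and among the UTF-n encodings it only covers UTF-8, assuming detection there amounts to checking whether the bytes form valid UTF-8.

```python
# Hypothetical two-stage encoding guess; an illustration, not TextMate's code.

# Toy character-frequency weights (hypothetical), standing in for real
# per-language statistics.
FREQUENT = {"e": 12, "t": 9, "a": 8, "o": 8, "n": 7,
            "é": 2, "è": 1, "à": 1, "ü": 1, "ö": 1}
# Legacy encodings to try when the data is not valid UTF-8 (arbitrary choice).
CANDIDATES = ["mac_roman", "latin_1", "cp1252"]


def score(text: str) -> int:
    """Sum the weights of the characters in text; unknown characters add 0."""
    return sum(FREQUENT.get(ch, 0) for ch in text.lower())


def guess_encoding(data: bytes) -> str:
    # Stage 1: valid UTF-8 is a strong signal, since non-ASCII text in a
    # legacy encoding rarely forms valid UTF-8 byte sequences by accident.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    # Stage 2: decode with each candidate and keep the one whose result
    # best matches the frequency table. Close scores mean a shaky guess.
    best, best_score = CANDIDATES[0], -1
    for enc in CANDIDATES:
        s = score(data.decode(enc, errors="replace"))
        if s > best_score:
            best, best_score = enc, s
    return best


print(guess_encoding("naïve café".encode("utf-8")))  # utf-8
print(guess_encoding("café".encode("latin-1")))      # latin_1 (cp1252 ties)
```

Note the tie between latin_1 and cp1252 for the second input: when candidate encodings agree on most bytes, the score cannot tell them apart, which is one reason such guesses give mixed results.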
From Tiger onward, we can store the encoding as metadata, but even ignoring the complexity of handling encodings in every single command that may read or write text, there are too many cases where the text is not tied to a file per se, or even comes from multiple files.
Take the diff actions, for example (including those tied to a version control system): the output is the differences between two files, so perhaps we can assume it is in the encoding of the two input files (hoping they share one)? Wrong… the diff output also contains file names, and these have an encoding too (fortunately it’s UTF-8).
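To make the diff example concrete, here is a small Python sketch. The file name, the contents, and the choice of Latin-1 as the legacy encoding are all hypothetical; the point is that once UTF-8 file names and Latin-1 contents share one byte stream, no single encoding decodes all of it correctly.

```python
# Hypothetical diff output: names in UTF-8, contents in Latin-1 (assumed).
header = "--- a/Résumé.txt\n+++ b/Résumé.txt\n".encode("utf-8")
body = "-Grüße\n+Gruesse\n".encode("latin-1")
diff_output = header + body


def decodes_as(data: bytes, encoding: str) -> bool:
    """True if data is valid in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# As UTF-8 the file names would be right, but the Latin-1 "ü" (0xFC) is invalid.
print(decodes_as(diff_output, "utf-8"))  # False
# As Latin-1 everything "decodes", but the UTF-8 file name turns into mojibake.
print(diff_output.decode("latin-1").splitlines()[0])  # --- a/RÃ©sumÃ©.txt
```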
So anything other than mandating UTF-8 will make things break, and there is no technical solution to this problem. Sure, you can make things work “good enough” for some users without going 100% UTF-8, and you can perhaps fix some of the things that break when you are not using UTF-8, but you can never fix it all. So IMO it’s really not worth trying to support more than UTF-8; UTF-8 is the solution to the encoding problems of the past.
> UTF-8 can cause problems with BibTeX (http://www.unix-ag.uni-kl.de/~fischer/kbibtex/encoding.html)
Ironically, that page goes on to say you should convert your BibTeX files to UTF-8 ;)
As I understand it, the problem arises when generating alphabetic reference labels, i.e. [ODGAARD03] instead of the plain [1] style, and the error quoted is from the UTF-8 package about a malformed UTF-8 sequence. So presumably the error only occurs when the generated label contains non-ASCII characters. I am a little puzzled by the quoted error, though: it really sounds like the error you would get if your LaTeX file is UTF-8 but your BibTeX file is not, the reference has non-ASCII characters, those characters end up in the label, and the UTF-8 package then chokes on the malformed UTF-8 sequence.
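The puzzle above may also have a simpler cause. Suppose (an assumption, not something verified against BibTeX’s source) that the label is built by an 8-bit program that counts bytes rather than characters. Then even with every file in UTF-8, taking a short prefix of a name can cut a multi-byte character in half. The names, prefix lengths, and helper functions below are made up for illustration.

```python
# Hypothetical model of an 8-bit label generator that truncates by BYTES.
# Whether BibTeX's label code actually behaves like this is an assumption.


def byte_prefix(name: str, n: int) -> bytes:
    """First n bytes of the UTF-8 encoding of name (may split a character)."""
    return name.encode("utf-8")[:n]


def is_valid_utf8(data: bytes) -> bool:
    """True if data is a complete, well-formed UTF-8 sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


for name in ["Odgaard", "Ødgaard"]:
    label = byte_prefix(name, 1)  # e.g. one "letter" per author
    print(name, label, is_valid_utf8(label))
# Odgaard b'O' True
# Ødgaard b'\xc3' False   <- lone lead byte: a malformed UTF-8 sequence
```

If this model holds, the error would appear only for names whose cut point lands inside a non-ASCII character, which would also fit the observation that it depends on the generated label text.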
If anyone has a sample project that shows the problem, please send it my way.