[TxMt] LaTeX bundle problem

Allan Odgaard throw-away-1 at macromates.com
Tue Jul 3 14:37:12 UTC 2007


On 3. Jul 2007, at 14:48, Maxime Boissonneault wrote:

> I understand the problem now. However, how come TextMate can detect  
> what encoding was used for a file, and the scanner can not ?

TextMate can *not* detect the encoding of your files. If you use a
UTF-n encoding, there is a 99.9999…% chance that it will get it right,
but for *any* other encoding, TM’s guess is based on a frequency
table and how well your file matches that distribution when
interpreted in each of the candidate encodings.

This, however, gives rather mixed results.
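
To illustrate why UTF-8 is the easy case, here is a minimal Python sketch (not TextMate's actual code; the helper name and sample text are made up for illustration). Well-formed UTF-8 can be checked mechanically, whereas the same legacy-encoded bytes decode without error under several 8-bit encodings, which is why a guess has to fall back on something like a frequency table.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if data is a well-formed UTF-8 byte sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical sample text containing one non-ASCII character.
text = "Søren Kierkegaard"
utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")

# The Latin-1 bytes are not valid UTF-8, so that case is easy to rule out.
print(looks_like_utf8(utf8_bytes), looks_like_utf8(latin1_bytes))

# The same bytes decode without error under several legacy encodings,
# each giving a different reading, so validity alone cannot pick one.
candidates = ["latin-1", "cp1252", "mac_roman", "cp437"]
readings = {name: latin1_bytes.decode(name) for name in candidates}
for name, reading in readings.items():
    print(f"{name:10} {reading}")
```

Note that pure ASCII input is also valid UTF-8, so the check only tells you something once non-ASCII bytes are present.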

With Tiger and onward, we can store the encoding as metadata, but
even ignoring the complexity of handling encodings for every single  
command that may read or write text, there are too many cases where  
the text is not tied to a file per se, or may even come from multiple  
files.

For example, take the diff actions (including those tied to a version
control system). The output here is the differences between two
files, so maybe we can assume this output is in the encoding of the
two input sources (hoping those share an encoding)? Wrong… the diff
output also contains file names, and these also have an encoding
(fortunately it’s UTF-8).
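
A rough Python sketch of the diff case, assuming a UTF-8 file name and Latin-1 file contents (the file name, the contents and the simplified diff layout are all invented for the example). Once both end up in one byte stream, no single encoding reads the whole thing back correctly:

```python
# File names are UTF-8 (as noted above); the file contents are assumed
# to be Latin-1 for this example.
file_name = "résumé.txt".encode("utf-8")
old_line = "café".encode("latin-1")
new_line = "cafés".encode("latin-1")

# Simplified stand-in for diff output: a header line plus changed lines.
diff_output = b"--- " + file_name + b"\n-" + old_line + b"\n+" + new_line + b"\n"


def decodes_cleanly(data: bytes, encoding: str) -> bool:
    """Return True if data decodes under encoding without an error."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# Treating the output as UTF-8 fails on the Latin-1 content lines.
print(decodes_cleanly(diff_output, "utf-8"))

# Treating it as Latin-1 never fails, but the file name turns to mojibake.
as_latin1 = diff_output.decode("latin-1")
print(as_latin1)
```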

So anything other than mandating UTF-8 will make things break, and
there is no technical solution to this problem. Sure, you can have
things work “good enough” for some w/o going 100% UTF-8, and you can
maybe fix some of the stuff that breaks when you are not using UTF-8,
but you can never fix it all. So IMO it’s really not worth trying to
support more than UTF-8. UTF-8 is the solution to the encoding
problems of the past.

> UTF8 can cause problems with Bibtex
> (http://www.unix-ag.uni-kl.de/~fischer/kbibtex/encoding.html)

Ironically that page goes on to say you should convert your BibTeX  
files to UTF-8 ;)

As I understand it, the problem is when generating alphabetic
references, i.e. [ODGAARD03] instead of the plain [1] style. The
error quoted is from the UTF-8 package about a malformed UTF-8
sequence, so presumably the error would only occur when the generated
reference text has non-ASCII in it. But I am a little puzzled about
the error quoted: it really sounds like the error you would get if
your LaTeX file is UTF-8, you use a non-UTF-8 BibTeX file, and you
have non-ASCII characters in the reference. These characters then end
up in the generated reference, and the UTF-8 package chokes on the
malformed UTF-8 sequence.
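
If you want to check whether a BibTeX file is the culprit, something along these lines would find the first byte that is not valid UTF-8. This is only a sketch: `first_invalid_utf8` is a made-up helper and the entry is invented sample data, not taken from the page linked above.

```python
def first_invalid_utf8(data: bytes):
    """Return the offset of the first byte that breaks UTF-8, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


# Hypothetical entry whose author name contains a non-ASCII character.
entry = "@book{mueller03,\n  author = {Müller, A.},\n  year = {2003}\n}\n"

bib_utf8 = entry.encode("utf-8")      # what a UTF-8 LaTeX file expects
bib_latin1 = entry.encode("latin-1")  # a non-UTF-8 BibTeX file

print(first_invalid_utf8(bib_utf8))
print(first_invalid_utf8(bib_latin1))
```

For a real file you would read it in binary mode, e.g. `open(path, "rb").read()`, and pass the bytes to the helper.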

If anyone has a sample project that shows the problem, please send it  
my way.



