[TxMt] LaTeX bundle problem
Allan Odgaard
throw-away-1 at macromates.com
Tue Jul 3 14:37:12 UTC 2007
On 3. Jul 2007, at 14:48, Maxime Boissonneault wrote:
> I understand the problem now. However, how come TextMate can detect
> what encoding was used for a file, and the scanner can not ?
TextMate can *not* detect the encoding of your files. If you use a
UTF-n encoding, there is a 99.9999…% chance that it will get it right,
but for *any* other encoding, TM’s guess is based on a frequency
table and how well your file corresponds to this distribution when
interpreted in the various encodings.
This, however, gives rather mixed results.
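To illustrate why UTF-8 is the easy case, here is a minimal Python
sketch (not TextMate's actual code; the function names and the list of
candidate encodings are made up for illustration). Strict UTF-8
decoding rejects almost any non-ASCII text saved in a legacy encoding,
whereas the same bytes decode without error in several 8-bit
encodings, so for those only a statistical guess is possible.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False


def plausible_legacy_decodings(data: bytes) -> dict:
    """Decode the same bytes with a few 8-bit encodings.

    The candidate list is an arbitrary illustration, not a frequency table.
    """
    candidates = ["latin-1", "mac_roman", "cp1252"]
    results = {}
    for name in candidates:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            pass  # this encoding has no mapping for some byte in data
    return results


utf8_bytes = "café".encode("utf-8")
latin1_bytes = "café".encode("latin-1")

print(looks_like_utf8(utf8_bytes))    # well-formed UTF-8
print(looks_like_utf8(latin1_bytes))  # a lone 0xE9 is not valid UTF-8
# Every candidate decodes the Latin-1 bytes "successfully", but they
# disagree about which character the last byte is.
print(plausible_legacy_decodings(latin1_bytes))
```

Nothing in the bytes themselves says which of the legacy readings is
the intended one, which is where the frequency table comes in.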
With Tiger and onward, we can store the encoding as metadata, but
even ignoring the complexity of handling encodings for every single
command that may read or write text, there are too many cases where
the text is not tied to a file per se, or may even come from multiple
files.
For example, take the diff actions (including those tied to a version
control system). The output here is the differences between two
files, so maybe we can assume this output is in the encoding of the
two input sources (hoping those share an encoding)? Wrong… the diff
output also contains file names, and these also have an encoding
(fortunately it’s UTF-8).
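A small Python sketch of the diff case, under the assumption that the
file contents are Latin-1 while the file names are UTF-8. The file
name and text are invented, and the "diff" is assembled by hand rather
than produced by a real diff tool.

```python
# Hypothetical file name and contents, chosen only for illustration.
file_name = "résumé.txt"
old_line = "naïve"
new_line = "naïveté"

# File names are encoded as UTF-8, file contents as Latin-1.
header = f"--- {file_name}\n+++ {file_name}\n".encode("utf-8")
body = f"-{old_line}\n+{new_line}\n".encode("latin-1")
diff_output = header + body


def decodes_cleanly(data: bytes, encoding: str) -> bool:
    """Return True if data decodes in the given encoding without error."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# Treating the whole output as UTF-8 fails on the Latin-1 body.
utf8_ok = decodes_cleanly(diff_output, "utf-8")

# Treating it as Latin-1 never raises, but garbles the file names.
as_latin1 = diff_output.decode("latin-1")

print(utf8_ok)
print(as_latin1)
```

Neither single encoding gives a correct reading of the combined
output; only when the contents are UTF-8 as well does it come out
right.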
So anything other than mandating UTF-8 will make things break, and
there is no technical solution to this problem. Sure, you can have
things work “good enough” for some without going 100% UTF-8, and you
can maybe fix some of the stuff that breaks when you are not using
UTF-8, but you can never fix it all, so IMO it’s really not worth
trying to support more than UTF-8. UTF-8 is the solution to the
encoding problems of the past.
> UTF8 can cause problems with Bibtex
> (http://www.unix-ag.uni-kl.de/~fischer/kbibtex/encoding.html)
Ironically that page goes on to say you should convert your BibTeX
files to UTF-8 ;)
As I understand it, the problem arises when generating alphabetic
references, i.e. [ODGAARD03] instead of the plain [1] style, and the
error quoted is from the UTF-8 package about a malformed UTF-8
sequence. So presumably the error would only occur when the generated
reference text has non-ASCII in it. I am a little puzzled about the
error quoted, though: it really sounds like the error you would get
if your LaTeX file is UTF-8, you use a non-UTF-8 BibTeX file, you
have non-ASCII characters in the BibTeX entry, and these characters
end up in the generated reference, at which point the UTF-8 package
chokes on the malformed UTF-8 sequence.
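When chasing this kind of error it can help to find out where the
offending bytes are. The following is a rough Python sketch, assuming
the suspicion above is right (Latin-1 reference text ending up in a
UTF-8 document). The sample strings and the function name are
hypothetical and not taken from any real project.

```python
from typing import Optional


def first_malformed_utf8_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


# Hypothetical stand-ins for UTF-8 LaTeX source and a Latin-1 reference label.
latex_part = "Se også \\cite{key} ".encode("utf-8")
label_part = "[BØR03]".encode("latin-1")
combined = latex_part + label_part

offset = first_malformed_utf8_offset(combined)
print(offset)                 # offset of the Latin-1 "Ø" byte
print(hex(combined[offset]))  # the byte that UTF-8 decoding rejects
```

Running the same check on the .bib file and on the generated .bbl
file should show which of them contains the non-UTF-8 bytes.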
If anyone has a sample project that shows the problem, please send it
my way.