I understand the problem now. However, how come TextMate can detect what encoding was used for a file, and the scanner cannot?
TextMate can *not* detect the encoding of your files. If you use a UTF-n encoding, there is a 99.9999…% chance that it will get it right, but with *any* other encoding, TM’s guess is based on a frequency table and on how well your file matches that distribution when interpreted in the various encodings.
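To illustrate why UTF-8 is nearly certain while everything else is a guess, here is a minimal Python sketch of that kind of heuristic. It is not TextMate's actual code: the candidate list (`latin-1`, `mac_roman`), the toy `COMMON_NON_ASCII` "frequency table" and the helper names are assumptions made up for illustration, and only UTF-8 (not UTF-16/32) is checked.

```python
from typing import Sequence

# Hypothetical toy "frequency table": non-ASCII letters common in Western
# European text. A real detector would use measured character frequencies.
COMMON_NON_ASCII = set("äöüÄÖÜßéèêëàâçñíóúÉ")


def plausibility(text: str) -> int:
    """Count decoded characters that look like ordinary accented letters."""
    return sum(1 for ch in text if ch in COMMON_NON_ASCII)


def guess_encoding(data: bytes,
                   candidates: Sequence[str] = ("latin-1", "mac_roman")) -> str:
    # Pure ASCII reads the same in all of these encodings: nothing to guess.
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    # UTF-8 multi-byte sequences follow strict rules, so non-ASCII text in
    # another encoding almost never decodes as valid UTF-8 by accident.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    # Almost any byte sequence is "valid" in an 8-bit encoding, so the best
    # we can do is pick the candidate whose decoding looks most plausible.
    # max() keeps the first candidate on ties.
    return max(candidates,
               key=lambda enc: plausibility(data.decode(enc, errors="replace")))
```

The same word "Müller" saved as Latin-1 (`b"M\xfcller"`) or as Mac Roman (`b"M\x9fller"`) is only told apart by which decoding yields a "likely" letter, which is exactly the weak signal a frequency table provides.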
The problem with TextMate is that it essentially forces people to use UTF8. I have yet to find a way to teach TextMate that my default encoding is Latin1 (even though this is the default encoding which I have set in the prefs): as long as a TeX file doesn't contain any special characters, it will automatically assume it is a UTF8 file (ignoring my preference and, if it exists, the metadata connected to that file).
So anything other than mandating UTF-8 will make things break, and there is no technical solution to this problem. Sure, you can have things work “good enough” for some without going 100% UTF-8, and you can maybe fix some of the stuff that breaks when you are not using UTF-8, but you can never fix it all. So IMO it’s really not worth trying to support more than UTF-8; UTF-8 is the solution to the encoding problems of the past.
However, going UTF8 is sometimes just not an option. I frequently exchange files with people who work on Windows, Linux or Solaris, and the standard encoding they use is usually Latin1. Yes, there are ways to use UTF8 on other operating systems, but have you ever tried to convince someone who still writes his papers in Plain TeX to switch to UTF8?
Instead of blindly arguing for people to convert to UTF8 (which is what I would use if I got to choose), you should accept that people (= customers) want to and sometimes need to work with other encodings as well.
I'm still longing for an “encoding per project” option which TextMate would stick to no matter what. And also an error message telling me that I cannot save my .tex file in Latin1 because there are some (invisible) characters that prevent it from doing so (right now, it just reverts to UTF8 without telling me).
Max
UC Berkeley Department of Physics
The problem with TextMate is that it essentially forces people to use UTF8. I have yet to find a way to teach TextMate that my default encoding is Latin1 (even though this is the default encoding which I have set in the prefs): as long as a TeX file doesn't contain any special characters, it will automatically assume it is a UTF8 file (ignoring my preference and, if it exists, the metadata connected to that file).
Good point.
I'm still longing for an “encoding per project” option which TextMate would stick to no matter what. And also an error message telling me that I cannot save my .tex file in Latin1 because there are some (invisible) characters that prevent it from doing so (right now, it just reverts to UTF8 without telling me).
I second that. There should be an “encoding per project”, and any file in that project would be treated as encoded in that encoding. This way, the choice of encoding is up to the user, and you don't have to try to figure out what encoding a file is in.
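A per-project encoding could, in principle, be as simple as a lookup from project folder to encoding. The Python sketch below is purely hypothetical: `PROJECT_ENCODINGS`, its paths and `encoding_for` are invented for illustration and are not a TextMate setting or API. It only shows the idea that the file's location, not a guess, decides the encoding.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional, Tuple

# Hypothetical per-project settings: project root -> encoding for every file in it.
PROJECT_ENCODINGS: Dict[str, str] = {
    "/projects/paper": "latin-1",
    "/projects/paper/shared": "utf-8",
}


def encoding_for(path: str, default: str = "utf-8") -> str:
    """Return the encoding of the most specific project containing path."""
    p = PurePosixPath(path)
    best: Optional[Tuple[PurePosixPath, str]] = None
    for root, enc in PROJECT_ENCODINGS.items():
        r = PurePosixPath(root)
        if p == r or r in p.parents:
            # Prefer the deepest (most specific) project root.
            if best is None or len(r.parts) > len(best[0].parts):
                best = (r, enc)
    return best[1] if best else default
```

Files outside any listed project fall back to the default, so nothing ever has to be guessed from file contents.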
On 4. Jul 2007, at 23:12, Max Lein wrote:
[...]
The problem with TextMate is that it essentially forces people to use UTF8.
Yes, and as I tried to explain, there is good reason it does that.
I have yet to find a way to teach TextMate that my default encoding is Latin1 (even though this is the default encoding which I have set in the prefs): as long as a TeX file doesn't contain any special characters, it will automatically assume it is a UTF8 file (ignoring my preference and, if it exists, the metadata connected to that file).
When “use for existing files” is checked, it will respect your preference. However, I have fixed the problem for the next build: when a file with ASCII encoding can’t remain ASCII, it will first try your preferred encoding (even when “use for existing files” is not checked).
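The fallback order described above (current encoding, then the preferred one, then UTF-8), combined with the warning Max asks for, might look roughly like the following Python sketch. The function names and the warning text are hypothetical; this illustrates the idea and is not TextMate's implementation.

```python
from typing import List, Optional, Tuple


def unencodable(text: str, encoding: str) -> List[str]:
    """Return the distinct characters in text that encoding cannot represent."""
    bad = set()
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            bad.add(ch)
    return sorted(bad)


def choose_save_encoding(text: str, current: str,
                         preferred: str) -> Tuple[str, Optional[str]]:
    """Pick an encoding for saving; return (encoding, warning or None)."""
    # Try the file's current encoding, then the user's preference, then UTF-8.
    for enc in dict.fromkeys((current, preferred, "utf-8")):
        if unencodable(text, enc):
            continue
        if enc == current:
            return enc, None
        # Say *why* the encoding changed instead of switching silently,
        # naming the (possibly invisible) offending characters.
        culprits = ", ".join(f"U+{ord(ch):04X}" for ch in unencodable(text, current))
        return enc, (f"Cannot save as {current} (unencodable: {culprits}); "
                     f"saved as {enc} instead.")
    raise ValueError("text cannot be encoded even as UTF-8 (lone surrogates?)")
```

For example, a Latin-1 file containing an invisible thin space (U+2009) would be saved as UTF-8 together with a warning naming U+2009, rather than silently.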
[...]
However, going UTF8 is sometimes just not an option [...]
Maybe not, but the bundle commands (which were the topic of this thread) can’t be expected to work with your files when those are not UTF-8. That is, if you go with Latin-1 and use special characters, you should be prepared to see no output or garbled output from script runners (which show script output), diff commands (showing changes in your files), build commands (which quote parts of your source), log commands (showing SCM log entries), various validation/completion/pretty-printing commands, and so on.
I frequently exchange files with people who work on Windows, Linux or Solaris, and the standard encoding they use is usually Latin1. Yes, there are ways to use UTF8 on other operating systems, but have you ever tried to convince someone who still writes his papers in Plain TeX to switch to UTF8?
Would plain TeX not be ASCII? :)
Instead of blindly arguing for people to convert to UTF8 (which is what I would use if I got to choose), you should accept that people (= customers) want to and sometimes need to work with other encodings as well.
In the message you replied to, I gave a technical explanation of why it is highly infeasible to support anything other than UTF-8 for the various commands, which was the topic raised. My accepting that some can’t use UTF-8 doesn’t really change that.
I'm still longing for an “encoding per project” option which TextMate would stick to no matter what. And also an error message telling me that I cannot save my .tex file in Latin1 because there are some (invisible) characters that prevent it from doing so (right now, it just reverts to UTF8 without telling me).
I have commented on this a few times in the past: the lack of a warning is indeed very unfortunate, and it comes from the code that does this “bumping” of encoding not having access to the UI. That is a mistake 2.0 will not have. As for encoding per project, I can’t recall exactly how much I have said here, but 2.0 moves a lot of things to be more folder-oriented and takes another approach to dealing with encodings, basically offloading this to customizable import/export hooks, so non-UTF-8 users should be able to get whatever they want.