Hi,
I unsuccessfully searched the web site and the list archive, but I may have missed something.
I need to work for a client with humongous text files encoded in ISO-8859-1, with diacritical characters.
This is mostly OK, but I could not find a way to set the output window that TextMate opens to ISO-8859-1 encoding. This is a PITA.
I tried both Perl and Ruby and the issue is the same. For the sake of an example, save the following 6 lines (French composers) to a text file encoded in ISO-8859-1, named "test.txt":
Léo Delibes
César Franck
Gabriel Fauré
Edgard Varèse
Jean Françaix
Henri Büsser
Now run the following Ruby program from TextMate:
file = File.open("test.txt", "r")
file.each { |line| print line }
file.close
In the output window, all diacritical chars are replaced by the dummy "white question mark within a black diamond".
This is despite having set the preferences to Latin 1.
Note that the same small Ruby program displays diacritical chars correctly when run from a terminal window (provided the said terminal window is set to ISO Latin 1 of course).
So my question is: is there a way to set the TextMate output window to ISO-8859-1? A workaround might be to redirect standard err and standard out to another window, already open, with ISO-8859-1 encoding. Is that possible?
I hope my request makes sense.
(Unfortunately, switching to UTF-8 is not an option: the input is almost 100 GB of ISO-8859-1, and the output needs to be ISO-8859-1 as well. A round trip through UTF-8 would triple the execution time, which is already too long as it is.)
Many thanks,
Jean-Denis
On 13.02.2009, at 15:21, jdmuys@free.fr wrote:
[...] I could not find a way to set the output window that TextMate opens to ISO-8859-1 encoding. [...]
Maybe I missed the issue here, but if you open an ISO-8859-1 document in TM (possibly via File > Reopen with Encoding), TM doesn't destroy that encoding. When saving, you can choose "Save As" to ensure that TM saves it as ISO-8859-1.
Another option is to convert ISO-8859-1 to UTF-8 with the UNIX tool iconv, e.g. 'iconv -f LATIN1 -t UTF-8 THE_FILE', which can be run in a batch to convert all files in a directory.
--Hans
----- Hans-Jörg Bibiko bibiko@eva.mpg.de wrote:
[...] Maybe I missed an issue here [...]
Indeed, it seems you missed the issue. This is possibly because English is not my native language.
Stated as simply as I can: however it does it, my program *needs* to output text in ISO-8859-1 encoding, including diacritical characters. During the development and testing stages, I would like to be able to set the TextMate output window to ISO-8859-1 so that it displays my test data correctly. Is there a way to do that?
The small Ruby example I gave is for illustration purposes only. The text files I need to process are many gigabytes in size. I know iconv quite well, but as I said, converting them to UTF-8 is NOT an option for performance reasons, since the processed files need to be ISO-8859-1 as well.
Sorry, I misunderstood you.
On 13.02.2009, at 15:54, jdmuys@free.fr wrote:
I would like to be able to set the TextMate output window to ISO-8859-1 so that it displays my test data correctly.
As far as I know this is not possible. The output window is set to UTF-8.
If you want to run a Ruby script via APPLE+R you can try this:
#!/usr/bin/env ruby -wKU
require "iconv"
file = File.open("/Users/bibiko/Desktop/test.txt", "r")
file.each { |line| print Iconv.iconv("UTF-8","LATIN1",line) }
file.close
which outputs the test.txt content correctly. You can then define a function printLATIN1() or similar.
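For reference, the printLATIN1() idea could look roughly like the sketch below. It is only an illustration: the names latin1_to_utf8 and print_latin1 are made up, and it uses String#encode (Ruby 1.9 or later) rather than the iconv library, so it assumes a newer Ruby than the script above. Only the text sent to the output window is converted; the data itself stays in ISO-8859-1.

```ruby
# Hypothetical helper: turn a string of ISO-8859-1 bytes into UTF-8 for display.
# Assumes Ruby 1.9+ (String#force_encoding and String#encode).
def latin1_to_utf8(line)
  # Work on a copy, tag its bytes as ISO-8859-1, then transcode to UTF-8.
  line.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

# Hypothetical wrapper: print an ISO-8859-1 line to a UTF-8 output window.
def print_latin1(line, out = $stdout)
  out.print latin1_to_utf8(line)
end

# "\xE9" is the ISO-8859-1 byte for e-acute; String#b (Ruby 2.0+) keeps the raw bytes.
print_latin1("L\xE9o Delibes\n".b)
```

In the test script, each `print line` would then become `print_latin1 line`, which leaves the processing code untouched.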
--Hans
----- Hans-Jörg Bibiko bibiko@eva.mpg.de wrote:
[...] require "iconv" [...]
It's not ideal, but it's a useable workaround. Thank you very much.
JD
On 13 Feb 2009, at 17:15, jdmuys@free.fr wrote:
[...] require "iconv" [...]
It's not ideal, but it's a useable workaround. Thank you very much.
You can try Ruby 1.9 (and declare the file you open as Latin-1). I believe this will automatically do the proper recoding, provided the various streams are tagged properly.
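A minimal sketch of what that might look like, assuming Ruby 1.9 or later. The function name is hypothetical, and "test.txt" is the sample file from the first message. The mode string "r:ISO-8859-1:UTF-8" declares the file's external encoding as ISO-8859-1 and asks Ruby to hand back UTF-8 strings, which is what the output window expects.

```ruby
# Hypothetical helper: yield each line of an ISO-8859-1 file as a UTF-8 string.
def each_latin1_line_as_utf8(path)
  # "r:ISO-8859-1:UTF-8" = read mode, external encoding ISO-8859-1,
  # internal encoding UTF-8; Ruby transcodes each line as it is read.
  File.open(path, "r:ISO-8859-1:UTF-8") do |file|
    file.each_line { |line| yield line }
  end
end

# Print the sample file if it is present in the current directory.
if File.exist?("test.txt")
  each_latin1_line_as_utf8("test.txt") { |line| print line }
end
```

This transcodes on every read, so it is probably best kept for test runs on small samples rather than the full data set.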