On 30. Mar 2007, at 19:09, Jay Soffian wrote:
On Mar 30, 2007, at 10:58 AM, Danny Krøger wrote:
It would be nice to have an option to paste text in the current encoding and drop characters that are not available. That is a better option than destroying a document (when you are forced to keep it in Latin-1). It costs so much time to fix all the garbled text by hand afterwards.
Bundle Editor -> New Command
Input: None
Output: Replace Selected Text
Key Equivalent: <your choice>
Command(s):
__CF_USER_TEXT_ENCODING=0x1F5:0x8000100:0x8000100 /usr/bin/pbpaste | /usr/bin/iconv -c -s -f UTF-8 -t ISO-8859-1
Then use that command for pasting instead of cmd-v.
That is indeed clever :) One addition though: you need to convert back to UTF-8, since TM expects the command result to be in UTF-8 (everything outside Latin-1 has already been pruned at that point, so the conversion back is lossless and it will still work).
One can also add //TRANSLIT to the target encoding, that will make iconv try to “downgrade” the characters which could not be converted. For example curly quotes become straight quotes, ellipsis becomes three dots, etc.
So the command could read:
__CF_USER_TEXT_ENCODING=0x1F5:0x8000100:0x8000100 /usr/bin/pbpaste \
  | /usr/bin/iconv -c -s -f UTF-8 -t ISO-8859-1//TRANSLIT \
  | /usr/bin/iconv -f ISO-8859-1 -t UTF-8
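If you want to try the two iconv steps without touching the clipboard, something like the following should do as a rough sketch. The function name prune_to_latin1 is made up for this example, and it calls plain iconv from the PATH rather than /usr/bin/iconv; the optional argument is the target encoding, so you can compare the plain pruning with the //TRANSLIT variant. What //TRANSLIT produces for a given character depends on the iconv build, so no particular output is promised for that case.

```shell
#!/bin/sh
# Hypothetical helper (not part of TextMate or iconv): drop or downgrade
# everything that does not fit the target encoding, then convert back to
# UTF-8 so the result can be handed to something that expects UTF-8.
prune_to_latin1() {
    # $1 is the intermediate target encoding; defaults to plain ISO-8859-1.
    # -c discards characters that cannot be converted, -s silences warnings.
    iconv -c -s -f UTF-8 -t "${1:-ISO-8859-1}" \
        | iconv -f ISO-8859-1 -t UTF-8
}

# Latin-1 characters survive the round trip; the euro sign is not in
# Latin-1, so with -c it should simply be discarded.
printf 'æøå cost 5 €\n' | prune_to_latin1

# Same input, but asking iconv to substitute look-alikes where it can.
printf 'æøå cost 5 €\n' | prune_to_latin1 'ISO-8859-1//TRANSLIT'
```

Feeding the function from printf instead of pbpaste keeps the test independent of whatever happens to be on the clipboard; in the actual bundle command you would still start the pipe with pbpaste as shown above.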
Answering a few other things from this thread:
1) IE6 (and IE4 + IE5 for that matter) supports UTF-8 just fine, as long as you send the proper charset in the Content-Type header.
2) I am a diehard UTF-8 fan, and I do want you all to switch to UTF-8 if you haven’t already! But 2.0 will also have better encoding support in general, such as presenting errors/warnings at the proper times and making it more explicit when there are problems with encodings (like loading non-UTF-8 files with 8-bit characters).
3) If you do insist on using latin-1 for whatever project you are working on, be sure to switch to ISO-8859-1 in Preferences → Advanced → Saving. By default it is utf-8, and I think that is why it switches to utf-8 when you paste æøå from Word. If you set it to ISO-8859-1, then it should pick latin-1 instead.
Finally, a question: If your web-site is all in Latin-1, how do you deal with user input, if any? I.e. if I can post comments or in some other way submit arbitrary plain text to your site, do you just pray that I restrict myself to Latin-1, and that the browser sends my text as Latin-1? ;)