[TxMt] keep it in iso-8859-1

Allan Odgaard throw-away-1 at macromates.com
Sat Mar 31 15:10:15 UTC 2007


On 30. Mar 2007, at 19:09, Jay Soffian wrote:

> On Mar 30, 2007, at 10:58 AM, Danny Krøger wrote:
>> It would be nice to have an option to paste text in the current
>> encoding and drop characters that are not available. That is a better
>> option than destroying a document (when you are forced to keep it
>> in Latin-1). It costs so much time to fix all the garbled text
>> by hand afterwards.
> Bundle Editor -> New Command
> Input: None
> Output: Replace Selected Text
> Key Equivalent: <your choice>
> Command(s):
>
> __CFUSERTEXT_ENCODING=0x1F5:0x8000100:0x8000100 /usr/bin/pbpaste \
> | /usr/bin/iconv -c -s -f UTF-8 -t ISO-8859-1
>
> Then use that command for pasting instead of cmd-v.

That is indeed clever :) One addition though: you need to convert
back to UTF-8, since TM expects the command result to be in UTF-8.
Everything outside Latin-1 has already been pruned at that point, so
converting back does not reintroduce the problem characters.

One can also add //TRANSLIT to the target encoding, that will make  
iconv try to “downgrade” the characters which could not be  
converted. For example curly quotes become straight quotes, ellipsis  
becomes three dots, etc.

So the command could read:

     __CFUSERTEXT_ENCODING=0x1F5:0x8000100:0x8000100 /usr/bin/pbpaste \
     | /usr/bin/iconv -c -s -f UTF-8 -t ISO-8859-1//TRANSLIT \
     | /usr/bin/iconv -f ISO-8859-1 -t UTF-8
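
To try the round trip without touching the clipboard, something like
the sketch below may help. It is only an illustration under a few
assumptions: the function name paste_as_latin1 is made up, input comes
from stdin instead of pbpaste, and the installed iconv supports -c. It
leaves out //TRANSLIT because the substitutions differ between iconv
implementations and locales.

```shell
# Hypothetical helper (the name is illustrative, not part of TextMate).
# Reads UTF-8 on stdin, drops every character that has no Latin-1
# equivalent, and writes the result back out as UTF-8.
paste_as_latin1() {
    # -c discards characters that cannot be converted. stderr is
    # silenced because some iconv builds still report the discard.
    iconv -c -f UTF-8 -t ISO-8859-1 2>/dev/null \
        | iconv -f ISO-8859-1 -t UTF-8
}
```

For example, printf 'a\342\202\254b\303\251\n' | paste_as_latin1 feeds
in "a", a euro sign, "b" and "é". The euro sign has no Latin-1
equivalent and should be dropped, while "é" survives the round trip.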

Answering a few other things from this thread:

  1) IE6 (and IE4 + IE5 for that matter) supports UTF-8 just fine, as
long as you send the proper charset in the Content-Type header.

  2) I am a diehard UTF-8 fan and I do want you all to switch to
UTF-8 if you haven’t already! That said, 2.0 will also have better
encoding support in general, such as presenting errors/warnings at the
proper times and making it more explicit when there are problems with
encodings (like loading non-UTF-8 files that contain 8-bit characters).
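
Until then, one way to spot such files up front is to let iconv
validate them. This is a minimal sketch, not anything TextMate does
itself: the function name is_valid_utf8 is made up, and it assumes an
iconv that exits non-zero on an illegal input sequence (GNU libc and
libiconv both do).

```shell
# Hypothetical helper: succeeds if the file decodes cleanly as UTF-8.
# Converting UTF-8 to UTF-8 changes nothing, but iconv exits with a
# non-zero status as soon as it hits an invalid byte sequence.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}
```

Usage could look like: is_valid_utf8 index.html || echo "not UTF-8",
where index.html stands in for whatever file you want to check. A
Latin-1 file containing æøå would fail the check, since those bytes do
not form valid UTF-8 sequences.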

  3) If you do insist on using latin-1 for whatever project you are  
working on, be sure to switch to ISO-8859-1 in Preferences →  
Advanced → Saving. By default it is utf-8, and I think that is why  
it switches to utf-8 when you paste æøå from Word. If you set it to  
ISO-8859-1, then it should pick latin-1 instead.

Finally a question: If your web-site is all in Latin-1, how do you
deal with user input, if any? I.e. if I can post comments or in some
other way submit arbitrary plain text to your site, do you just pray
that I restrict myself to Latin-1, and that the browser sends my text
as Latin-1? ;)








