[TxMt] keep it in iso-8859-1
Allan Odgaard
throw-away-1 at macromates.com
Sat Mar 31 15:10:15 UTC 2007
On 30. Mar 2007, at 19:09, Jay Soffian wrote:
> On Mar 30, 2007, at 10:58 AM, Danny Krøger wrote:
>> It would be nice to have an option to paste text at the current
>> encoding and truncate characters not available. That is a better
>> option than destroying a document (when you are forced to keep it
>> in latin 1). It costs so much time to change all the garbled text
>> by hand afterwards.
> Bundle Editor -> New Command
> Input: None
> Output: Replace Selected Text
> Key Equivalent: <your choice>
> Command(s):
>
> __CFUSERTEXT_ENCODING=0x1F5:0x8000100:0x8000100 /usr/bin/pbpaste \
> | /usr/bin/iconv -c -s -f UTF-8 -t ISO-8859-1
>
> Then use that command for pasting instead of cmd-v.
That is indeed clever :) One addition though: you need to convert
back to utf-8, since TM expects the command result to be in utf-8
(the characters outside latin-1 have already been pruned at that
point, so the result is still safe to save as latin-1).
One can also add //TRANSLIT to the target encoding, that will make
iconv try to “downgrade” the characters which could not be
converted. For example curly quotes become straight quotes, ellipsis
becomes three dots, etc.
So the command could read:
__CFUSERTEXT_ENCODING=0x1F5:0x8000100:0x8000100 /usr/bin/pbpaste \
| /usr/bin/iconv -c -s -f UTF-8 -t ISO-8859-1//TRANSLIT \
| /usr/bin/iconv -f ISO-8859-1 -t UTF-8
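To try the filtering outside TextMate, the same two iconv steps can be wrapped in a shell function that reads stdin instead of the clipboard. This is only a sketch: the function name to_latin1_safe is made up for illustration, and what //TRANSLIT produces for a given character depends on the iconv implementation installed.

```shell
# Hypothetical helper (the name is an assumption, not part of TextMate):
# reads UTF-8 on stdin, drops or transliterates anything outside
# latin-1, and writes the result back out as UTF-8.
to_latin1_safe() {
    iconv -c -s -f UTF-8 -t ISO-8859-1//TRANSLIT \
    | iconv -f ISO-8859-1 -t UTF-8
}

# Text that already fits in latin-1 should pass through unchanged.
printf 'caf\303\251\n' | to_latin1_safe
```

Feeding it from pbpaste (with __CFUSERTEXT_ENCODING set as in the command) gives the same behaviour as the bundle command.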
Answering a few other things from this thread:
1) IE6 (and IE4 + IE5 for that matter) supports utf-8 just fine, as
long as you send the proper charset in the Content-Type header.
2) I am a diehard utf-8 fan and I do want you all to switch to
utf-8 if you haven’t already!!! but 2.0 will also have better
encoding support in general, like presenting errors/warnings at the
proper times, making it more explicit when there are problems with
encodings (like loading non-utf-8 files with 8-bit characters), etc.
3) If you do insist on using latin-1 for whatever project you are
working on, be sure to switch to ISO-8859-1 in Preferences →
Advanced → Saving. By default it is utf-8, and I think that is why
it switches to utf-8 when you paste æøå from Word. If you set it to
ISO-8859-1, then it should pick latin-1 instead.
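For anyone considering the switch, it can help to first find out which existing files are not valid utf-8. A rough sketch, assuming an iconv that exits non-zero on invalid input (the helper name is_valid_utf8 is hypothetical):

```shell
# Hypothetical helper: succeeds when the file decodes cleanly as UTF-8.
# Pure ASCII files also pass, since ASCII is a subset of UTF-8.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Usage sketch: list the files in the current directory that fail.
for f in ./*; do
    [ -f "$f" ] || continue
    is_valid_utf8 "$f" || printf 'not utf-8: %s\n' "$f"
done
```

Files reported here are the ones that would need converting (or that contain stray 8-bit latin-1 characters).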
Finally a question: If your web-site is all in latin-1, how do you
deal with user input, if any? I.e. if I can post comments or in some
other way submit arbitrary plain text to your site, do you just pray
I restrict myself to latin-1, and that the browser sends my text as
latin-1? ;)