On Jan 20, 2005, at 18:57, Eric Hsu wrote:
On the other hand, extended ASCII does depend on encoding, and I'm not sure how standard x80-xFF are.
Since TM uses UTF-8 to talk with external commands, you don't have to worry about encodings. The non-printable high-bit characters are 0x80-0x9F (the C1 control range), and in UTF-8 each of them is encoded as two bytes matching this pattern: “\xC2[\x80-\x9F]” (obtained using: “printf '\x80\x9F' | iconv -f iso-8859-1 -t utf-8 | xxd”).
So my candidate for a UTF-8 friendly zap gremlins becomes: perl -pe 's/[^\t\n\x20-\xFF]|\xC2[\x80-\x9F]//g'
Does anyone actually have a document with 'gremlins' to test this stuff? ;)