[TxMt] pbs with LaTeX labels

Hans-Joerg Bibiko bibiko at eva.mpg.de
Thu May 31 15:24:49 UTC 2007


On 31 May 2007, at 17:07, Xavier Cambar wrote:

>
> Is there a way to substitute an accented character by its non- 
> accented equivalent with a regular expression?
>
As far as I know, that would be very tricky with a regular expression
alone.

I use Perl for that myself.

Create a command with these settings:

Input: Selected Text or Document
Output: Replace Selected Text

Command:

perl -e'
use Unicode::Normalize;    # provides NFKD
use utf8;
no warnings;
binmode(STDIN,  ":utf8");  # read the input as UTF-8
binmode(STDOUT, ":utf8");  # write the output as UTF-8
while (<>) {
	$_ = NFKD($_);  # split each character into base + combining marks
	s/[\x{0300}-\x{0362}]//g; # combining diacritics
	s/\x{3099}//g;s/\x{FF9E}//g;s/\x{309B}//g; # Japanese voiced mark
	s/\x{309A}//g;s/\x{309C}//g;s/\x{FF9F}//g; # Japanese semi-voiced mark
	print;
}
'

You can delete the two Japanese lines if you do not need them.
The function NFKD decomposes any character with a diacritic into its
base character plus the diacritics in combining form, according to the
Unicode specification.
The next step simply deletes all combining diacritics.
Please note that this will delete ALL diacritics, i.e. cedilla,
diaeresis, acute, grave, macron, hook, ogonek, etc.!

I guess you may have to install the Perl module Unicode::Normalize
beforehand via CPAN, but I am not sure about that.

I don't know how to apply this to the LaTeX snippets for sectioning,
but maybe my hint helps.

Best,

Hans


