On 31 May 2007, at 17:07, Xavier Cambar wrote:
Is there a way to substitute an accented character by its non-accented equivalent with a regular expression?
As far as I know it would be very tricky.
I use Perl for that myself.
Create a command with these settings:
Input: Selected Text or Document
Output: Replace Selected Text
Command:
perl -e'
use Unicode::Normalize;
use utf8;
no warnings;
binmode (STDIN, ":utf8");
binmode (STDOUT, ":utf8");
while(<>){
    $_=NFKD($_);
    s/[\x{0300}-\x{0362}]//g; # combining diacritics
    s/\x{3099}//g;s/\x{FF9E}//g;s/\x{309B}//g; # Japanese voiced mark
    s/\x{309A}//g;s/\x{309C}//g;s/\x{FF9F}//g; # Japanese semi-voiced mark
    print;
}
'
You can delete the Japanese parts if you don't need them. The function NFKD decomposes any character with a diacritic into its base character plus the diacritics as combining characters, according to the Unicode specification. The next step simply deletes all combining diacritics. Please note that this will delete ALL diacritics, e.g. cedilla, diaeresis, acute, grave, macron, hook, ogonek, etc.!
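To make the two steps concrete, here is a minimal sketch of the same idea in Python rather than Perl. It is only an illustration, not the command above: the function name strip_diacritics is made up for this example, and the character range is copied from the Perl substitution.

```python
import re
import unicodedata

# Same range as in the Perl command above: combining diacritical marks.
COMBINING_MARKS = re.compile(r"[\u0300-\u0362]")


def strip_diacritics(text: str) -> str:
    """Hypothetical helper: decompose with NFKD, then drop combining marks."""
    # Step 1: split each accented character into base character + combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Step 2: delete the combining marks, keeping only the base characters.
    return COMBINING_MARKS.sub("", decomposed)


if __name__ == "__main__":
    print(strip_diacritics("Été à Besançon"))  # prints "Ete a Besancon"
```

Characters that have no Unicode decomposition, such as "ø", pass through unchanged, because NFKD yields no combining mark to delete for them.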
I guess you have to install the Perl module Unicode::Normalize via CPAN beforehand, but I am not sure about this.
I don't know how to apply this to the LaTeX snippets for sectioning, but maybe my hint helps.
Best,
Hans