Re: [TxMt] pbs with LaTeX labels

31 May 2007


      On 31 May 2007, at 17:07, Xavier Cambar wrote:
...
Is there a way to substitute an accented character by its non-accented
equivalent with a regular expression?
As far as I know, that would be very tricky.
I use Perl for that myself.
Create a command with these settings:
Input: Selected Text or Document
Output: Replace Selected Text
Command:
perl -e'
use Unicode::Normalize;
use utf8;
no warnings;
# read and write UTF-8
binmode (STDIN, ":utf8");
binmode (STDOUT, ":utf8");
while(<>){
    $_=NFKD($_); # decompose into base characters plus combining marks
    s/[\x{0300}-\x{036F}]//g; # combining diacritics (whole U+0300..U+036F block)
    s/\x{3099}//g;s/\x{FF9E}//g;s/\x{309B}//g; # Japanese voiced mark
    s/\x{309A}//g;s/\x{309C}//g;s/\x{FF9F}//g; # Japanese semi-voiced mark
    print;
}
'
You can delete the two Japanese lines if you do not need them.
The function NFKD decomposes every character that carries a diacritic
into its base character followed by the diacritics as combining
characters, as defined by the Unicode specification.
The next step simply deletes all combining diacritics.
Please note that this deletes ALL diacritics, i.e. cedilla, diaeresis,
acute, grave, macron, hook, ogonek, etc.!
I guess you may have to install the Perl module Unicode::Normalize
from CPAN beforehand, but I am not sure; Perl 5.8 and later should
already include it as a core module.
I don't know how to apply this to the LaTeX snippets for sectioning,
but maybe my hint helps.
Best,
Hans
