[TxMt] Re: textmate Digest, Vol 15, Issue 19

Juan Falgueras juanfc at uma.es
Fri Aug 21 15:52:52 UTC 2009




On 21/08/2009, at 3:14, textmate-request at lists.macromates.com wrote:

> On Aug 20, 2009, at 6:37 PM, Juan Falgueras wrote:
>
>> I am desperately looking for a more relaxed way of searching for a
>> string inside another. We need to ignore not only
>> the case
>>
>> s = "Abc"
>> if s =~ /abc/i then
>>   …
>>
>> but also a forgotten accent, etc.:
>>
>> s = "ÁBc"
>> if s =~ /abc/i then
>>   …also should match!
>>
>> Changing the encoding to the simplest one, ASCII, does not work, since
>> neither iconv nor Ruby's force_encoding() handles it: both give you
>> errors when you try to convert "á" to "a".
>
> This might be one option for you:
>
> $ irb -KU -r iconv
> >> s = "ÁBc"
> => "ÁBc"
> >> Iconv.conv("ASCII//TRANSLIT//IGNORE", "UTF-8", s).downcase.delete("^a-z") =~ /abc/
> => 0
>
> I hope that helps.
>
> James Edward Gray II

Thanks James!


I have narrowed down the set of chars you need to delete after
transliteration. Try it with the (nearly) full range of accented Latin chars:


require "iconv"

ss=  
"ÀÁÂÃÄÅĀĄĂÆÇĆČĊĎĐÈÉÊËĒĘĚĔĖĜĞĠĢĤĦÌÍÎÏĪĨĬĮİIJĴĶŁĽĹĻĿÑŃŇŅŊÒÓÔÕÖØŌŐŎŒŔŘŖŚŠŞŜȘŤŢŦȚÙÚÛÜŪŮŰŬŨŲŴÝŶŸŹŽŻàáâãäåāąăæçćĉċďđèéêëēęěĕėƒĝğġģĥħìíîïīĩĭįıijĵķĸłľĺļŀñńňņʼnòóôõöøōőŏœŕřŗśšşŝșťţŧțùúûüūůűŭũųŵÿŷžżźþßſÐð 
"

# iconv transliterates the accents into a small set of ASCII marks: "'~^`
# so deleting just those marks is enough; letters and digits are kept
puts Iconv.conv("ASCII//TRANSLIT//IGNORE", "UTF-8", ss).delete(%q{"'~^`})



This is a less aggressive way of working around this annoying behaviour of
//TRANSLIT. I would like to see a new option, //CLEANDROP or so, that simply
drops the accents and leaves the base chars. Or simply a regexp modifier
that makes a search ignore accents…
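
The accent-dropping behaviour wished for above can be approximated without
iconv. This is only a minimal sketch, assuming Ruby 2.2 or later (for
String#unicode_normalize) and UTF-8 strings. The helper names fold_accents
and loose_match? are hypothetical, not part of this thread or of any library:

```ruby
# Hypothetical helper: drop accents via Unicode decomposition rather than
# iconv transliteration (a different technique from the one used above).
def fold_accents(str)
  # NFD splits "Á" into "A" plus a combining acute accent (U+0301);
  # \p{Mn} matches such combining marks, so gsub removes them.
  str.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
end

# Hypothetical helper: case- and accent-insensitive substring test.
def loose_match?(haystack, needle)
  fold_accents(haystack).downcase.include?(fold_accents(needle).downcase)
end

puts fold_accents("\u00C1Bc")         # prints ABc
puts loose_match?("\u00C1Bc", "abc")  # prints true
```

Letters that have no canonical decomposition (ø, đ, ł, æ, ß and so on) pass
through unchanged, so they would still need an explicit mapping.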

- juan



