[TxMt] Re: textmate Digest, Vol 15, Issue 19
Juan Falgueras
juanfc at uma.es
Fri Aug 21 15:52:52 UTC 2009
On 21/08/2009, at 3:14, textmate-request at lists.macromates.com
wrote:
> On Aug 20, 2009, at 6:37 PM, Juan Falgueras wrote:
>
>> desperately looking for a more relaxed way of searching for a string
>> inside another: we need to ignore not only the case
>>
>> s = "Abc"
>> if s =~ /abc/i then
>> …
>>
>> but also a forgotten accent, etc.:
>>
>> s = "áBc"
>> if s =~ /abc/i then
>> … should also match!
>>
>> Changing the encoding to the simplest one, ASCII, does not work,
>> since neither iconv nor Ruby's force_encoding() works: both give you
>> errors when you try to convert "á" to "a"
>
> This might be one option for you:
>
> $ irb -KU -r iconv
> >> s = "áBc"
> => "áBc"
> >> Iconv.conv("ASCII//TRANSLIT//IGNORE", "UTF-8", s).downcase.delete("^a-z") =~ /abc/
> => 0
>
> I hope that helps.
>
> James Edward Gray II
Thanks James!
I have narrowed down the set of characters you need to delete after
transliteration. Try it with (roughly) the full range of accented Latin
characters:
# encoding: utf-8
require "iconv"
ss = "ÀÁÂÃÄÅĀĄĂÆÇĆČĊĎĐÈÉÊËĒĘĚĔĖĜĞĠĢĤĦÌÍÎÏĪĨĬĮİIJĴĶŁĽĹĻĿÑŃŇŅŊÒÓÔÕÖØŌŐŎŒŔŘŖŚŠŞŜȘŤŢŦȚÙÚÛÜŪŮŰŬŨŲŴÝŶŸŹŽŻàáâãäåāąăæçćĉċďđèéêëēęěĕėƒĝğġģĥħìíîïīĩĭįıijĵķĸłľĺļŀñńňņʼnòóôõöøōőŏœŕřŗśšşŝșťţŧțùúûüūůűŭũųŵÿŷžżźþßſÐð"
# transliteration writes the accents as a small set of ASCII marks: "'~^`
# so deleting just those marks leaves the base letters intact
puts Iconv.conv("ASCII//TRANSLIT//IGNORE", "UTF-8", ss).delete(%q{"'~^`})
This is a less aggressive way of working around this annoying behaviour
of transliteration. I would like to see a new //CLEANDROP option, or
something similar, that simply drops the accents and keeps the base
characters. Or simply a regexp modifier that makes the search ignore
accents…
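For newer Rubies: the iconv extension was removed from Ruby's standard library in Ruby 2.0, so here is a minimal sketch of the accent-ignoring search wished for above, assuming Ruby 2.2 or later (for the built-in String#unicode_normalize). It swaps transliteration for Unicode decomposition: NFD splits "á" into "a" plus a combining accent, and the combining marks (\p{Mn}) are then deleted, so no stray "'~^` characters appear at all. The helper names fold_accents and accent_insensitive_match? are hypothetical, not part of any library.

```ruby
# encoding: utf-8
# Sketch, assuming Ruby >= 2.2 (String#unicode_normalize is built in).
# fold_accents and accent_insensitive_match? are hypothetical helper names.

# Decompose to NFD ("á" -> "a" + U+0301 COMBINING ACUTE ACCENT), then
# delete every nonspacing combining mark, keeping only the base letters.
def fold_accents(str)
  str.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
end

# Case- and accent-insensitive substring test: fold and downcase both sides.
def accent_insensitive_match?(haystack, needle)
  fold_accents(haystack).downcase.include?(fold_accents(needle).downcase)
end

p fold_accents("áBc")                      # => "aBc"
p accent_insensitive_match?("áBc", "abc")  # => true
p accent_insensitive_match?("xyz", "abc")  # => false
puts fold_accents("Łódź")                  # prints Łodz (Ł has no decomposition)
```

Unlike iconv's //TRANSLIT, this only strips accents that Unicode can canonically decompose: letters such as "Ł", "Ø", "Æ" or "ß" have no such decomposition and are left as they are.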
- juan