On 21/08/2009, at 3:14, textmate-request@lists.macromates.com wrote:
On Aug 20, 2009, at 6:37 PM, Juan Falgueras wrote:
Desperately looking for a more relaxed way of searching for a string inside another: we need to ignore not only the case
s = "Abc"
if s =~ /abc/i then ...
but also a forgotten accent, etc.:
s = "áBc"
if s =~ /abc/i then ...   # should also match!
Changing the encoding to the simplest one, ASCII, does not work: neither iconv nor Ruby's force_encoding() does the job, and you get errors when you try to convert "á" to "a".
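The failure described above can be reproduced with a short sketch. It assumes Ruby 1.9 or later, where strings carry an encoding; the variable names are only for illustration and are not from the original message.

```ruby
# encoding: utf-8
s = "áBc"

# force_encoding only relabels the existing bytes; it converts nothing,
# so the two bytes of "á" become invalid US-ASCII.
relabelled = s.dup.force_encoding("US-ASCII")
puts relabelled.valid_encoding?   # false

# encode does convert, but US-ASCII has no character for "á", so it raises.
conversion_error = nil
begin
  s.encode("US-ASCII")
rescue Encoding::UndefinedConversionError => e
  conversion_error = e
end
puts conversion_error.class       # Encoding::UndefinedConversionError
```

Neither call maps the accented letter to its base letter, which is why some transliteration step is needed.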
This might be one option for you:
$ irb -KU -r iconv
s = "áBc"
=> "áBc"
Iconv.conv("ASCII//TRANSLIT//IGNORE", "UTF-8", s).downcase.delete("^a-z") =~ /abc/
=> 0
I hope that helps.
James Edward Gray II
Thanks James!
I have narrowed down the set of characters you need to delete after transliteration. Try it with the (almost) full range of accented Latin characters:
require "iconv"
ss= "ÀÁÂÃÄÅĀĄĂÆÇĆČĊĎĐÈÉÊËĒĘĚĔĖĜĞĠĢĤĦÌÍÎÏĪĨĬĮİIJĴĶŁĽĹĻĿÑŃŇŅŊÒÓÔÕÖØŌŐŎŒŔŘŖŚŠŞŜȘŤŢŦȚÙÚÛÜŪŮŰŬŨŲŴÝŶŸŹŽŻàáâãäåāąăæçćĉċďđèéêëēęěĕėƒĝğġģĥħìíîïīĩĭįıijĵķĸłľĺļŀñńňņʼnòóôõöøōőŏœŕřŗśšşŝșťţŧțùúûüūůűŭũųŵÿŷžżźþßſÐð "
# you can see a small set of chars that are used to transliterate: "'~^`
puts Iconv.conv("ASCII//TRANSLIT//IGNORE", "UTF-8", ss).delete(%q{"'~^`})
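To show why the narrower delete is less aggressive, here is a small comparison of the two filters. It is only a sketch: the input is a hypothetical example of transliterated text in which the accents survive as ASCII markers next to the base letters (the exact output depends on the iconv build), and the method and constant names are made up for illustration.

```ruby
# Hypothetical transliteration result: markers such as ' and ~ are left
# next to the base letters.
transliterated = %q{'El Ni~no, 2009}

# The marker characters named in the comment above.
MARKERS = %q{"'~^`}

# First suggestion in the thread: downcase, then keep only a-z.
def keep_letters_only(text)
  text.downcase.delete("^a-z")
end

# Narrower filter: drop only the marker characters.
def drop_markers(text)
  text.delete(MARKERS)
end

puts keep_letters_only(transliterated)  # elnino
puts drop_markers(transliterated)       # El Nino, 2009
```

The narrower filter keeps case, spaces, digits and punctuation, at the cost of also removing any genuine quotes or apostrophes that were in the text.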
This is a less aggressive way of patching this annoying transliteration behaviour. I would like to see a new //CLEANDROP flag or similar that simply drops the accents and leaves the base characters. Or simply a regexp modifier that makes the search ignore accents…
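For what it is worth, an accent-ignoring search can be approximated without iconv by using Unicode normalization instead. This is a different technique from the one discussed in this thread, and the sketch assumes Ruby 2.2 or later (for String#unicode_normalize); strip_accents and accent_insensitive_match? are hypothetical helper names.

```ruby
# encoding: utf-8

# NFD splits "á" into "a" plus a combining acute accent; combining marks
# belong to the Unicode category Mn, which the gsub removes.
def strip_accents(text)
  text.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
end

# Match the pattern against the accent-stripped text.
def accent_insensitive_match?(text, pattern)
  !!(strip_accents(text) =~ pattern)
end

puts strip_accents("áBc")                      # aBc
puts accent_insensitive_match?("áBc", /abc/i)  # true
puts accent_insensitive_match?("xyz", /abc/i)  # false
```

Letters that have no canonical decomposition, such as "ø", "ł" or "ß", pass through unchanged, so this covers accents but not every character in the list above.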
- juan