[TxMt] Re: how to match unicode?

Hans-Jörg Bibiko bibiko at eva.mpg.de
Wed Sep 24 10:00:15 UTC 2008


On 24.09.2008, at 11:08, Piero D'Ancona wrote:

> Writing a ruby command for TextMate to reformat author names
> in a list of papers I run into the obvious but sad fact that
> /[A-z]/ =~ "ü"
> does not match anything.  Is there a simple workaround?
> I mean, simpler than a very long and unelegant list of Unicode
> ranges such as the one here
> http://forums.mozillazine.org/viewtopic.php?f=25&t=834075

This is a tricky point.

If you are using Ruby 1.9, then Oniguruma's class /[[:alpha:]]/u should
work.
As far as I remember, Oniguruma's regexp engine can also be installed
for Ruby 1.8.
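
A minimal sketch of the Ruby 1.9 case, assuming a UTF-8 source file and
Ruby 1.9 or later. The sample name is made up for illustration. It also
shows why /[A-z]/ is a risky class even for plain ASCII, since the range
from "A" to "z" includes a few punctuation characters.

```ruby
# encoding: utf-8
# Assumes Ruby 1.9 or later, where regexps are encoding-aware.

name = "Müller"  # hypothetical sample input

# POSIX bracket class: also matches letters beyond ASCII in a UTF-8 string.
alpha_hits = name.scan(/[[:alpha:]]/u)

# ASCII-only class: skips the umlaut.
ascii_hits = name.scan(/[A-Za-z]/)

# [A-z] spans code points 0x41..0x7A, which also covers [ \ ] ^ _ and `.
matches_underscore = !(/[A-z]/ =~ "_").nil?

puts alpha_hits.join
puts ascii_hits.join
puts matches_underscore
```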

Myself, I tried rewriting my regexp in a negated form, i.e. instead
of looking for \w I wrote e.g. [^\d\s\-_].
Another way would be to look for a significant string after the
author. With /./u, Ruby 1.8 also matches the ü as a single character.
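
Both ideas can be sketched roughly as follows. The sample line and the
variable names are assumptions for illustration, and the sketch assumes
a UTF-8 string. The hyphen is escaped inside the class so that it stays
a literal character and does not form a range with its neighbours.

```ruby
# encoding: utf-8
line = "Hans-Jörg Müller_2008"  # hypothetical input

# Negated class: anything that is not a digit, whitespace, hyphen or
# underscore counts as part of a name, so accented letters are included.
parts = line.scan(/[^\d\s\-_]+/u)

# With the /u flag, "." consumes one whole character, not one byte.
chars = "ü".scan(/./u)

p parts
p chars.length
```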

If this doesn't work for you, then you should use the Unicode
ranges, but you can shorten the list if you are only dealing with names
written in Latin script.
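
As a rough sketch of such a shortened list, assuming Ruby 1.9 or later
for the \u escapes: the ranges below cover the basic Latin letters, the
letters of Latin-1 Supplement, and Latin Extended-A and B. The variable
names and the sample string are made up. Names using characters outside
these blocks (for example Latin Extended Additional) would need further
ranges.

```ruby
# encoding: utf-8
# Hypothetical pattern: letters commonly found in Latin-script names.
# U+00D7 (multiplication sign) and U+00F7 (division sign) are skipped
# on purpose, which is why Latin-1 Supplement is split into three ranges.
latin_letter = /[A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u024F]/

sample = "D'Ancona, Bibiko, Åström 2008"  # made-up test input
letters = sample.scan(latin_letter).join

puts letters
```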

--Hans

