On 11/09/2005, at 12.10, Andreas Wahlin wrote:
But I still need to learn regexp for complete customization, any good tutorials you know? Is it as hard as it looks?
O'Reilly has published a book about them. I haven't read it, so I can't say how useful it is to newbies, but it's probably a good bet.
There's also a tutorial in the Perl manuals: man perlretut. I haven't read it either; man perlre is more of a reference manual.
They probably look harder than they are, since today's regexp languages support _a lot_ of features, but you don't have to learn them all at once.
E.g. letters and digits are literal matches. So foo will match foo.
Many other characters are special and thus need to be escaped with a backslash to match literally, e.g. foo\+bar would match foo+bar.
. matches any character, except newline. So f.o matches flo, foo, fro, etc.
[a-z] matches any one character in the a-z range. It could be another range, or multiple ranges like [a-fA-F0-9], and we can negate the set with [^a-z] to match anything but a-z.
There are also escape sequences that match classes of characters: \w matches word characters, \d matches digits, \s matches whitespace, etc. Here you can generally negate by uppercasing the letter (\W, \D, \S).
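The literal-matching rules above can be tried out in a few lines. This is only a sketch using Python's re module, which is my choice for illustration and not the Oniguruma library discussed here; the basics shown behave the same in both as far as I know. The sample strings are made up.

```python
import re

# Letters and digits match themselves.
literal = re.search(r"foo", "some food")

# "+" is special, so it must be escaped to match a literal plus sign.
escaped = re.fullmatch(r"foo\+bar", "foo+bar")
# Unescaped, "o+" means "one or more o", so the literal text does not match.
unescaped = re.fullmatch(r"foo+bar", "foo+bar")

# "." matches any single character except a newline.
dot_hits = [w for w in ["flo", "foo", "fro", "f\no"] if re.fullmatch(r"f.o", w)]

# Character classes: multiple ranges, and a negated range.
hex_digits = re.findall(r"[a-fA-F0-9]", "0xBEEF!")
non_lower = re.findall(r"[^a-z]", "ab1 C")

# Shorthand classes, and their uppercase negations.
digits = re.findall(r"\d", "a1 b22")
non_digits = re.findall(r"\D", "a1 b22")
word_chars = re.findall(r"\w", "a_1 -")

print(dot_hits, hex_digits, non_lower, digits, non_digits, word_chars)
```

Note that the newline case ("f\no") is the only one of the four test words that f.o rejects.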
That's most of the “literal” matching. You can specify that the previous match should be applied more than once, e.g.:
a*     -- match a 0-n times
a+     -- match a 1-n times
a{5,7} -- match a 5-7 times
These are greedy, so they will match as many times as possible. By adding ? after the quantifier (a*?, a+?, a{5,7}?), they will match as few times as possible.
If you want to repeat more than a character, you can group stuff with (...), e.g. (foo)+ to match 1 or more occurrences of foo.
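To see repetition, greedy versus non-greedy matching, and grouping side by side, here is a small sketch. Again it assumes Python's re module as a stand-in for illustration, and the input strings are invented.

```python
import re

# a* allows zero repetitions, a+ needs at least one.
star = re.fullmatch(r"ba*", "b")
plus = re.fullmatch(r"ba+", "b")

# a{5,7} is greedy: given nine a's it takes seven, the most it is allowed.
counted = re.match(r"a{5,7}", "a" * 9).group(0)

# Greedy vs. non-greedy on the same input.
text = "<b>one</b><b>two</b>"
greedy = re.search(r"<b>.*</b>", text).group(0)   # runs to the last </b>
lazy = re.search(r"<b>.*?</b>", text).group(0)    # stops at the first </b>

# Grouping lets the quantifier apply to the whole of "foo".
grouped = re.fullmatch(r"(foo)+", "foofoofoo")
not_grouped = re.fullmatch(r"foo+", "foofoofoo")  # only the last "o" repeats

print(counted, greedy, lazy)
```

The last two lines show why the parentheses matter: without them, + applies only to the final o.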
That's really the basics. The entire grammar of the Oniguruma regexp library is here: http://www.geocities.jp/kosako3/oniguruma/doc/RE.txt -- it's also a reference manual, but once you know the basics, that's really all you need.
One final thing about regexps: everything you group (i.e. put in (...)) results in a “capture”, which basically means you can refer to that part after the match.
E.g. if we search for: <img src="(.*?)"> then the value of the src attribute is in capture #1, and we can refer to it using $1 in our replace string. The language grammars also allow you to refer to captures.
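Here is the img example as a runnable sketch. It assumes Python's re module, where the replacement string refers to capture #1 as \1 rather than $1; the HTML snippet and the replacement text are made up for illustration.

```python
import re

html = '<img src="logo.png"> and <img src="photo.jpg">'
pattern = r'<img src="(.*?)">'

# group(1) is capture #1 of the first match.
first = re.search(pattern, html).group(1)

# With exactly one group, findall returns the capture from every match.
sources = re.findall(pattern, html)

# In the replacement string, \1 stands for capture #1 ($1 in the post above).
replaced = re.sub(pattern, r"[image: \1]", html)

print(first, sources, replaced)
```

The non-greedy .*? matters here: with a greedy .* the first match would run on to the last "> in the string.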