[TxMt] Re: Search & replace regex question

Hans-Jörg Bibiko bibiko at eva.mpg.de
Wed Sep 3 20:04:02 UTC 2008


On 03.09.2008, at 18:17, Allan Odgaard wrote:
> On 3 Sep 2008, at 12:53, Hans-Jörg Bibiko wrote:
>> Is there also a plan to support within the replacement format string
>> Oniguruma's named groups and the entire back reference functionality
>> (including back reference with nested levels)?
>
> The named captures will be available as $variables.
This will be awesome ;)

> Not sure what the other thing you refer to is. Can you give an  
> example?

OK. Maybe you remember that Thomas Aylott and I fiddled around trying to
implement a command that can select/find balanced HTML/XML tags.
We finally found a solution, but it needs many, many lines of
source code (the command should be in the experimental part of the TM trunk).

A while ago I read Oniguruma's RE.txt carefully. Oniguruma supports
this kind of match natively; the feature is called 'back reference
with nest level'.
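A minimal sketch of the feature on its own, written in Ruby because its built-in regex engine descends from Oniguruma and accepts the same `\g<...>` and `\k<name+level>` syntax (assumption: a recent Ruby, 2.4 or later for `match?`). It is modelled on the palindrome example in Oniguruma's documentation: `\g<a>` calls group `a` recursively, and `\k<b+0>` must match whatever `b` captured at the same recursion level, so every level has to close with the character it opened with. The helper name `palindrome?` is only for illustration.

```ruby
# \g<a> calls group `a` recursively; \k<b+0> must match the text that
# group `b` captured at the same recursion level as the back reference.
# So each level is empty, a single middle character, or
# x + (inner palindrome) + x.
PALINDROME = /\A(?<a>|.|(?:(?<b>.)\g<a>\k<b+0>))\z/

# True if the whole string reads the same forwards and backwards.
def palindrome?(s)
  PALINDROME.match?(s)
end

%w[reer ava abba abc].each do |word|
  puts "#{word}: #{palindrome?(word)}"
end
```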

Example 1:

I have a string: "<foo>f<foo>b<bar>123<bar>456</bar></bar>bb</foo>f</foo>"
and this regexp (please don't be frightened ;):

(?<element>\g<stag>\g<content>*\g<etag>){0}(?<stag><\g<name>\s*>){0}(?<name>[a-zA-Z_:]+){0}(?<content>[^<&]+(\g<element>|[^<&]+)*){0}(?<etag></\k<name+1>>){0}\g<element>

If I run this through Oniguruma, I get these named groups:
[syntax: group-name (group number): (start-stop string indices) content]

stag (2): (20-25) <bar>
content (4): (5-49) f<foo>b<bar>123<bar>456</bar></bar>bb</foo>f
element (1): (0-55) <foo>f<foo>b<bar>123<bar>456</bar></bar>bb</foo>f</foo>
etag (5): (49-55) </foo>
name (3): (21-24) bar
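For readability, here is the same expression in extended mode with one named group per line, plus a small hypothetical Ruby helper, `dump_groups`, that prints the captures in roughly the format used above. It assumes Ruby's engine, like Oniguruma in the listing above, reports the last capture made by a recursively called group; the output follows group definition order rather than the order listed above.

```ruby
# Example 1's expression in extended (/x) mode. Each (?<name>...){0}
# only defines a group (it is matched zero times here); the final
# \g<element> is what actually starts the match.
ELEMENT = /
  (?<element> \g<stag> \g<content>* \g<etag> ){0}
  (?<stag>    < \g<name> \s* > ){0}
  (?<name>    [a-zA-Z_:]+ ){0}
  (?<content> [^<&]+ (\g<element> | [^<&]+)* ){0}
  (?<etag>    <\/ \k<name+1> > ){0}
  \g<element>
/x

# Print each named group as "name (number): (start-stop) text".
# Returns the MatchData, or nil if nothing matched.
def dump_groups(str)
  m = ELEMENT.match(str)
  return nil unless m

  ELEMENT.named_captures.each do |name, numbers|
    puts "#{name} (#{numbers.first}): (#{m.begin(name)}-#{m.end(name)}) #{m[name]}"
  end
  m
end

dump_groups("<foo>f<foo>b<bar>123<bar>456</bar></bar>bb</foo>f</foo>")
```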


Example 2:
string: "o>b<bar>123<bar>456</bar></bar>bb</foo>f</foo>"

stag (2): (11-16) <bar>
content (4): (8-25) 123<bar>456</bar>
element (1): (3-31) <bar>123<bar>456</bar></bar>
etag (5): (25-31) </bar>
name (3): (12-15) bar

In other words, it should be possible to use Oniguruma's power to
find/select the next balanced HTML/XML tag, depending on the position
of the caret, with just one more or less simple regular expression.
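A sketch of the caret idea, again in Ruby and under the same assumptions. `next_balanced_tag` is a hypothetical helper, not a TextMate API: it passes the caret offset as the start position of `Regexp#match` and returns the bounds of the first balanced element found from there. Starting at offset 9 of the Example 1 string is the same as searching the Example 2 string (that text without its first nine characters), so the result should be Example 2's element, (3-31), shifted by 9.

```ruby
# Same expression as in Example 1 (extended mode, one group per line).
ELEMENT = /
  (?<element> \g<stag> \g<content>* \g<etag> ){0}
  (?<stag>    < \g<name> \s* > ){0}
  (?<name>    [a-zA-Z_:]+ ){0}
  (?<content> [^<&]+ (\g<element> | [^<&]+)* ){0}
  (?<etag>    <\/ \k<name+1> > ){0}
  \g<element>
/x

# Hypothetical helper: [start, stop] of the first balanced element that
# begins at or after `caret`, or nil. Regexp#match(str, pos) starts the
# search at character offset pos.
def next_balanced_tag(text, caret)
  m = ELEMENT.match(text, caret)
  m && [m.begin(0), m.end(0)]
end

text = "<foo>f<foo>b<bar>123<bar>456</bar></bar>bb</foo>f</foo>"
p next_balanced_tag(text, 0) # search from the very start
p next_balanced_tag(text, 9) # same starting point as Example 2
```

Offsets here are character indices; a real command would first have to turn the caret's line/column position into such an offset.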

As far as I know, Ruby 1.9 supports this (?). I myself am using a C
program linked against the onig lib to match this kind of nested structure.

This also leads to a question: would it be possible to use TM's
Oniguruma engine from outside, i.e. through an API?


Best,

--Hans




